Instructions to use dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP with MLX:
# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP")
config = load_config("dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP
Run Hermes
hermes
Qwen 3.6 35B-A3B — MXFP8 CRACK + d3 MTP
CRACK abliterated · MXFP8 (8-bit microscaling) · d3 MTP self-speculative (1.50× faster) · Vision + Video · Reasoning toggle · 35 GB
What Is This?
This is Qwen 3.6 35B-A3B, a vision-language model with a Mixture-of-Experts architecture (256 routed experts, 10 active), hybrid SSM + full attention, 40 layers, and native image and video understanding. It has been:
- CRACK abliterated — refusal behavior removed at the weight level, so it complies across task categories instead of refusing, while keeping its knowledge, reasoning, and vision intact.
- MXFP8 (8-bit microscaling) quantized for MLX on Apple Silicon — 35 GB.
- MTP-preserved — the native multi-token-prediction head is kept and abliterated too, so d3 self-speculative decoding works (~1.50× faster) on an MTP-aware runtime (vMLX).
Vision and video processing are fully preserved.
Results
Evaluated through the vMLX inference engine. HarmBench scored with a strict classifier (rejects loops, empty/template dumps, and thinking-trace leakage). MMLU is the standard 57-subject multiple-choice benchmark.
| Metric | Result |
|---|---|
| HarmBench-320 (compliance / ASR) | 99.4% (318/320) |
| MMLU (57-subject) | 84.6% |
| d3 MTP speedup | 1.50× vs autoregressive |
Abliteration preserves the model's knowledge and reasoning — it stays coherent in both direct and reasoning modes.
Features
- Vision + video: image-text-to-text, native frame/video understanding preserved.
- d3 MTP speculative decoding: native MTP head preserved and abliterated, for ~1.50× faster generation on an MTP-aware runtime.
- Reasoning toggle: enable_thinking=True (default, full chain-of-thought) or enable_thinking=False (direct answers).
Usage
Run with vMLX (recommended — supports VL + video + native MTP) or an MLX runtime with Qwen 3.6 support.
Recommended sampling (from the model's generation_config): temperature 1.0, top_p 0.95, top_k 20.
# vMLX OpenAI-compatible endpoint
# POST /v1/chat/completions
{
"model": "dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP",
"messages": [{"role": "user", "content": "..."}],
"temperature": 1.0, "top_p": 0.95, "top_k": 20,
"enable_thinking": true
}
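The request above can also be sent from Python. The following is a minimal sketch using only the standard library, not an official client. The base URL is an assumption (port 8080 is what the `mlx_lm.server` snippets on this page use; a vMLX server may listen elsewhere), the helper names `build_payload` and `chat` are hypothetical, and the response parsing assumes the standard OpenAI chat-completions shape.

```python
import json
import urllib.request

MODEL_ID = "dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP"
# Assumption: the server listens on localhost:8080; adjust to your setup.
BASE_URL = "http://localhost:8080/v1"


def build_payload(prompt, enable_thinking=True):
    """Build a chat-completions body with the recommended sampling settings."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        "top_k": 20,
        # Passed as a top-level field, mirroring the JSON example above.
        "enable_thinking": enable_thinking,
    }


def chat(prompt, enable_thinking=True, base_url=BASE_URL, timeout=600):
    """POST the payload and return the first choice's message content."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt, enable_thinking)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes the standard OpenAI response shape.
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("Describe MXFP8 in one sentence.", enable_thinking=False)` would request a direct answer without the chain-of-thought. Whether the server honors `top_k` and `enable_thinking` as top-level fields depends on the runtime.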
About CRACK
CRACK (Controlled Refusal Ablation via Calibrated Knockouts) removes safety-refusal behavior at the weight level by projecting refusal directions out of the residual-stream writer matrices, with strengths calibrated to preserve reasoning quality and coherence.
Support dealignai
All models are built from original research and released free.
Support us on Ko-fi — membership gets early access and extras.
Ko-fi · X @dealignai · dealign.ai
See our research: Safety Generalization in Frontier Models

Disclaimer
This model has had its safety-refusal behavior removed for research purposes. It will follow instructions across all categories without refusing. You are solely responsible for how you use it and for complying with all applicable laws. Published for AI-safety research and authorized security testing.
Model tree for dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP
Base model
Qwen/Qwen3.6-35B-A3B