Qwen 3.6 35B-A3B — MXFP8 CRACK + d3 MTP

CRACK abliterated · MXFP8 (8-bit microscaling) · d3 MTP self-speculative (1.50× faster) · Vision + Video · Reasoning toggle · 35 GB


What Is This?

This is Qwen 3.6 35B-A3B, a vision-language Mixture-of-Experts model (256 routed experts, 10 active per token) built on a 40-layer hybrid SSM + full-attention stack, with native image and video understanding. It has been:

  1. CRACK abliterated — refusal behavior removed at the weight level, so it complies across task categories instead of refusing, while keeping its knowledge, reasoning, and vision intact.
  2. MXFP8 (8-bit microscaling) quantized for MLX on Apple Silicon — 35 GB.
  3. MTP-preserved — the native multi-token-prediction head is kept and abliterated too, so d3 self-speculative decoding works (~1.50× faster) on an MTP-aware runtime (vMLX).

Vision and video processing are fully preserved.
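The MXFP8 step in item 2 can be illustrated with a minimal sketch in pure Python. It assumes the OCP Microscaling (MX) v1.0 layout: blocks of 32 elements share one power-of-two (E8M0) scale, and each element is stored as FP8 E4M3. Whether this MLX build uses E4M3 elements and exactly this scale rule is an assumption; the code shows the format itself, not MLX's kernels. The shared scale costs 8 bits per 32 weights, so storage works out to 8.25 bits per weight.

```python
import math
from typing import List, Tuple

# Assumed OCP MX v1.0 parameters for MXFP8 with FP8 E4M3 elements.
BLOCK_SIZE = 32            # elements that share one E8M0 (power-of-two) scale
E4M3_EMAX = 8              # exponent of the largest normal E4M3 binade
E4M3_MAX = 448.0           # largest finite E4M3 magnitude (saturation point)
E4M3_MANTISSA_BITS = 3
E4M3_MIN_NORMAL_EXP = -6   # smallest normal E4M3 value is 2**-6


def round_to_e4m3(v: float) -> float:
    """Round to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if v == 0.0:
        return 0.0
    mag = abs(v)
    _, e = math.frexp(mag)                       # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_NORMAL_EXP)        # floor(log2(mag)); clamped for subnormals
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)     # spacing of representable values
    q = round(mag / step) * step                 # Python's round() ties to even
    return math.copysign(min(q, E4M3_MAX), v)


def mx_quantize_block(block: List[float]) -> Tuple[int, List[float]]:
    """Quantize one block: returns (shared scale exponent, E4M3 element values)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    _, e = math.frexp(amax)
    shared_exp = (e - 1) - E4M3_EMAX             # floor(log2(amax)) - emax of element format
    scale = 2.0 ** shared_exp                    # E8M0 scale; exponent-range clamping omitted
    return shared_exp, [round_to_e4m3(x / scale) for x in block]


def mx_dequantize_block(shared_exp: int, elems: List[float]) -> List[float]:
    """Recover approximate values by multiplying each element by the shared scale."""
    scale = 2.0 ** shared_exp
    return [scale * q for q in elems]


def quantize_tensor(values: List[float]) -> List[Tuple[int, List[float]]]:
    """Split a flat tensor into 32-element blocks and quantize each independently."""
    return [mx_quantize_block(values[i:i + BLOCK_SIZE])
            for i in range(0, len(values), BLOCK_SIZE)]
```

Because every block gets its own scale, one outlier only coarsens the 31 weights that share its block, which is the main advantage of microscaling over a single per-tensor scale.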

Results

All results were obtained through the vMLX inference engine. HarmBench responses were scored with a strict classifier that counts repetition loops, empty or template output, and leaked thinking traces as failures. MMLU is the standard 57-subject multiple-choice benchmark.

| Metric | Result |
|---|---|
| HarmBench-320 compliance (ASR) | 99.4% (318/320) |
| MMLU (57 subjects) | 84.6% |
| d3 MTP speedup | 1.50× vs. autoregressive decoding |

Abliteration preserves the model's knowledge and reasoning — it stays coherent in both direct and reasoning modes.

Features

  • Vision + video: image-text-to-text, with native frame and video understanding preserved.
  • d3 MTP speculative decoding: the native MTP head is preserved and abliterated, giving ~1.50× faster generation on an MTP-aware runtime.
  • Reasoning toggle: enable_thinking=True (default, full chain-of-thought) or enable_thinking=False (direct answers).
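The speedup from MTP self-speculative decoding comes from drafting several tokens cheaply and then verifying them with the full model, keeping the longest prefix the full model agrees with. The sketch below is a toy illustration under stated assumptions: "d3" is read as a draft depth of 3, `target` and `draft` are hypothetical stand-in functions (in vMLX the draft comes from the model's own MTP head), and acceptance is greedy. With sampling at temperature 1.0, runtimes use a probabilistic accept/reject rule instead, which this sketch does not implement.

```python
from typing import Callable, List, Tuple

Token = int
# A "model" here is any greedy next-token function: context -> next token.
NextTokenFn = Callable[[List[Token]], Token]


def greedy_decode(target: NextTokenFn, prompt: List[Token], max_new: int) -> List[Token]:
    """Plain autoregressive decoding: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(
    target: NextTokenFn,
    draft: NextTokenFn,
    prompt: List[Token],
    max_new: int,
    depth: int = 3,
) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify decoding; returns (tokens, verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # Draft `depth` tokens with the cheap predictor (stand-in for the MTP head).
        ctx = list(out)
        proposal: List[Token] = []
        for _ in range(depth):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # Verify against the target. A real runtime scores all positions in one
        # batched forward pass; this toy calls `target` once per position.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)
            if expected != tok:
                break  # first mismatch: keep the target's token, discard the rest
        else:
            accepted.append(target(out + accepted))  # all drafts matched: bonus token
        out.extend(accepted)
        rounds += 1
    return out[: len(prompt) + max_new], rounds
```

Each verification round emits between 1 and depth + 1 tokens, and with greedy acceptance the output is identical to `greedy_decode` on the target alone. The real-world gain therefore depends on how often the drafted tokens are accepted, not on any change to the final text.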

Usage

Run with vMLX (recommended: it supports vision, video, and native MTP) or any MLX runtime with Qwen 3.6 support. The MTP speedup is only available on an MTP-aware runtime.

Recommended sampling (from the model's generation_config): temperature 1.0, top_p 0.95, top_k 20.

Request body for the vMLX OpenAI-compatible endpoint (POST /v1/chat/completions):

```json
{
  "model": "dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP",
  "messages": [{"role": "user", "content": "..."}],
  "temperature": 1.0,
  "top_p": 0.95,
  "top_k": 20,
  "enable_thinking": true
}
```
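A minimal Python client for the endpoint above, using only the standard library. The server address `http://localhost:8000` is a hypothetical placeholder; substitute wherever your vMLX server listens. The sketch assumes the server accepts the extra `top_k` and `enable_thinking` fields exactly as in the request body shown, and returns the standard OpenAI `choices[0].message.content` shape. How the thinking trace is returned when `enable_thinking` is true depends on the runtime.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical: point this at your vMLX server
MODEL_ID = "dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP"


def build_chat_request(prompt: str, thinking: bool = True) -> dict:
    """Build the request body with the recommended sampling settings."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        "top_k": 20,
        "enable_thinking": thinking,  # False = direct answers, True = chain-of-thought
    }


def chat(prompt: str, thinking: bool = True, base_url: str = BASE_URL,
         timeout: float = 600.0) -> str:
    """POST to /v1/chat/completions and return the assistant message text."""
    body = json.dumps(build_chat_request(prompt, thinking)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    # OpenAI-compatible servers put the reply in choices[0].message.content.
    return data["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat("Summarize MXFP8 in one sentence.", thinking=False))
```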

About CRACK

CRACK (Controlled Refusal Ablation via Calibrated Knockouts) removes safety-refusal behavior at the weight level by projecting refusal directions out of the residual-stream writer matrices, with strengths calibrated to preserve reasoning quality and coherence.
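"Projecting a direction out of" a matrix usually takes the form below, shown for a matrix W whose output y = Wx lands in the residual stream. The direction r, the scalar strength α, and this exact parameterization are assumptions for illustration; the card does not publish how CRACK estimates its refusal directions or calibrates its strengths.

```latex
\hat{r} = \frac{r}{\lVert r \rVert_2}, \qquad
W' = \bigl(I - \alpha\, \hat{r}\hat{r}^{\top}\bigr) W
   = W - \alpha\, \hat{r}\,\bigl(\hat{r}^{\top} W\bigr)

% Component of the edited output along \hat{r}, using \hat{r}^{\top}\hat{r} = 1:
\hat{r}^{\top} W' x = \hat{r}^{\top} W x - \alpha\,(\hat{r}^{\top}\hat{r})\,\hat{r}^{\top} W x
                    = (1 - \alpha)\,\hat{r}^{\top} W x
```

With α = 1 the edited matrix can no longer write anything along the refusal direction; with 0 < α < 1 that component is only scaled down, which is one plausible reading of a "calibrated" strength.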

Support dealignai

All models are built from original research and released free.

Support us on Ko-fi — membership gets early access and extras.

Ko-fi · X @dealignai · dealign.ai

See our research: Safety Generalization in Frontier Models


Disclaimer

This model has had its safety-refusal behavior removed for research purposes. It will follow instructions across all categories without refusing. You are solely responsible for how you use it and for complying with all applicable laws. Published for AI-safety research and authorized security testing.
