Built for vMLX — the MLX inferencer with VL + video, KV-cache quantization, prefix-cache reuse, agentic tool calling, and native MTP speculative decoding.
Free for macOS · vmlx.net

Qwen 3.6 35B-A3B — MXFP8 CRACK + d3 MTP

CRACK abliterated · MXFP8 (8-bit microscaling) · d3 MTP self-speculative (1.50× faster) · Vision + Video · Reasoning toggle · 35 GB

What Is This?

This is Qwen 3.6 35B-A3B, a vision-language model with native image + video understanding. Its architecture is a Mixture-of-Experts (256 routed experts, 10 active) with hybrid SSM + full attention across 40 layers. It has been:

CRACK abliterated — refusal behavior removed at the weight level, so it complies across task categories instead of refusing, while keeping its knowledge, reasoning, and vision intact.
MXFP8 (8-bit microscaling) quantized for MLX on Apple Silicon — 35 GB.
MTP-preserved — the native multi-token-prediction head is kept and abliterated too, so d3 self-speculative decoding works (~1.50× faster) on an MTP-aware runtime (vMLX).

Vision and video processing are fully preserved.
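
As a rough sanity check on the 35 GB figure, the sketch below estimates the weight footprint of an 8-bit microscaling format. It assumes the block layout from the OCP Microscaling (MX) specification, where each block of 32 elements shares one 8-bit scale, and a round 35 billion parameters taken from the model name. Both are assumptions rather than values read from this checkpoint, and any tensors kept in higher precision are ignored.

```python
def mx_bits_per_weight(element_bits: int = 8,
                       block_size: int = 32,
                       scale_bits: int = 8) -> float:
    """Effective bits per weight: element bits plus the amortized shared scale."""
    return element_bits + scale_bits / block_size


def estimated_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: 35e9 parameters, inferred from the "35B" in the model name.
ASSUMED_PARAMS = 35e9

bpw = mx_bits_per_weight()
size_gb = estimated_size_gb(ASSUMED_PARAMS, bpw)
print(f"{bpw} bits/weight, about {size_gb:.1f} GB")
```

Under these assumptions the estimate lands close to the listed size. The remaining gap depends on the true parameter count, on which tensors are left unquantized, and on whether sizes are reported in GB or GiB.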

Results

All results were measured through the vMLX inference engine. HarmBench was scored with a strict classifier that rejects loops, empty or template dumps, and thinking-trace leakage. MMLU is the standard 57-subject multiple-choice benchmark.

Metric	Result
HarmBench-320 (compliance / ASR)	99.4% (318/320)
MMLU (57-subject)	84.6%
d3 MTP speedup	1.50× vs autoregressive

Abliteration preserves the model's knowledge and reasoning — it stays coherent in both direct and reasoning modes.
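
The figures in the table can be related back to raw counts and wall-clock time with a little arithmetic. The helper names below are illustrative only. The sketch assumes the 1.50× figure is a throughput ratio (tokens per second with MTP divided by tokens per second without it).

```python
def compliance_rate(complied: int, total: int) -> float:
    """Percentage of prompts scored as compliant."""
    return 100.0 * complied / total


def time_saved_fraction(speedup: float) -> float:
    """Fraction of generation time saved at a given throughput speedup."""
    return 1.0 - 1.0 / speedup


rate = compliance_rate(318, 320)
saved = time_saved_fraction(1.50)

print(f"HarmBench compliance: {rate:.1f}%")           # 99.4%
print(f"Generation time saved at 1.50x: {saved:.0%}")  # 33%
```

Read this way, a 1.50× speedup means the same output takes about two thirds of the autoregressive time.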

Features

Vision + video — image-text-to-text, native frame/video understanding preserved.
d3 MTP speculative decoding — native MTP head preserved and abliterated → ~1.50× faster generation on an MTP-aware runtime.
Reasoning toggle — enable_thinking=True (default, full chain-of-thought) or enable_thinking=False (direct answers).
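
To illustrate why speculative decoding speeds up generation without changing the output, here is a toy greedy draft-and-verify loop. It is a minimal sketch, not vMLX's implementation. It assumes "d3" means a draft depth of 3 tokens, stands in simple functions for the MTP head (`draft_next`) and the full model (`target_next`), and uses greedy matching where a real runtime under sampling would use a probabilistic acceptance rule.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def speculative_step(target_next: NextToken, draft_next: NextToken,
                     context: List[int], depth: int = 3) -> List[int]:
    """One draft-and-verify round; returns the tokens accepted this round."""
    # 1. Draft `depth` tokens with the cheap predictor.
    drafted, ctx = [], list(context)
    for _ in range(depth):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2. Verify. A real runtime scores all drafted positions in a single
    #    target forward pass; this toy calls the target once per position.
    accepted, ctx = [], list(context)
    for tok in drafted:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # keep the target's token, drop the rest
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft matched, so the verification also yields a bonus token.
    accepted.append(target_next(ctx))
    return accepted


def generate(target_next: NextToken, draft_next: NextToken,
             prompt: List[int], max_new_tokens: int,
             depth: int = 3) -> Tuple[List[int], int]:
    """Generate tokens; also return the number of draft-and-verify rounds."""
    out, rounds = list(prompt), 0
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target_next, draft_next, out, depth))
        rounds += 1
    return out[:len(prompt) + max_new_tokens], rounds


def greedy(target_next: NextToken, prompt: List[int],
           max_new_tokens: int) -> List[int]:
    """Plain autoregressive baseline: one target call per new token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out


def target_next(ctx: List[int]) -> int:
    """Toy 'full model': counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[int]) -> int:
    """Toy 'MTP head': agrees with the target except right after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


spec_out, rounds = generate(target_next, draft_next, [0], max_new_tokens=12)
baseline = greedy(target_next, [0], max_new_tokens=12)
print(spec_out == baseline, rounds)
```

In this toy run the speculative output is identical to the baseline, but it needs 4 verification rounds for 12 tokens. The realized speedup in practice depends on how often the drafts are accepted and on the cost of drafting.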

Usage

Run with vMLX (recommended — supports VL + video + native MTP) or an MLX runtime with Qwen 3.6 support.

Recommended sampling (from the model's generation_config): temperature 1.0, top_p 0.95, top_k 20.

# vMLX OpenAI-compatible endpoint
# POST /v1/chat/completions
{
  "model": "dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP",
  "messages": [{"role": "user", "content": "..."}],
  "temperature": 1.0, "top_p": 0.95, "top_k": 20,
  "enable_thinking": true
}
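
The same request can be sent from Python with only the standard library. This is a minimal sketch: the base URL `http://localhost:8000` is a placeholder for wherever your vMLX server listens, and the response parsing assumes the usual OpenAI chat-completions shape (`choices[0].message.content`), which an OpenAI-compatible endpoint is expected to follow.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: adjust to your server address
MODEL = "dealignai/Qwen3.6-35B-A3B-MXFP8-CRACK-MTP"


def build_payload(prompt: str, thinking: bool = True) -> dict:
    """Request body mirroring the JSON example, with recommended sampling."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        "top_k": 20,
        "enable_thinking": thinking,  # False requests direct answers
    }


def chat(prompt: str, thinking: bool = True, base_url: str = BASE_URL) -> str:
    """POST to /v1/chat/completions and return the assistant's text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt, thinking)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
#   print(chat("Describe MXFP8 in one sentence.", thinking=False))
```

Keeping `build_payload` separate from `chat` makes it easy to inspect or log the exact request before sending it.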

About CRACK

CRACK (Controlled Refusal Ablation via Calibrated Knockouts) removes safety-refusal behavior at the weight level by projecting refusal directions out of the residual-stream writer matrices, with strengths calibrated to preserve reasoning quality and coherence.

Support dealignai

All models are built from original research and released free.

Support us on Ko-fi — membership gets early access and extras.

Ko-fi · X @dealignai · dealign.ai

See our research: Safety Generalization in Frontier Models

Disclaimer

This model has had its safety-refusal behavior removed for research purposes. It will follow instructions across all categories without refusing. You are solely responsible for how you use it and for complying with all applicable laws. Published for AI-safety research and authorized security testing.
