socrates-state-classifier-qwen3.5-lora

A Chinese-text Socratic-teaching state classifier — a fast, deterministic drop-in for the LLM-consultant call in the KELE pipeline.

Given a student/teacher dialogue turn, it predicts the active pedagogical state (one of the SocRule strategies, e.g. c16, d33) that the teacher should occupy next. In the KELE reproduction study it replaces a per-turn LLM consultant call with a single forward pass, so a full kele.py evaluation runs without an external consultant model. This is a LoRA fine-tune of Qwen/Qwen3.5-0.8B-Base with the adapter merged into the base weights — the repo is a plain AutoModelForSequenceClassification checkpoint, no PEFT required to load it.

Built as part of the KELE reproduction and extension study — CSEN 346 (Natural Language Processing), Santa Clara University, 2026.


Model Summary

| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-0.8B-Base (Apache-2.0) |
| Task | Single-turn dialogue state classification |
| Language | Chinese (Simplified) |
| Classes | 35 state labels (a0–e34); 34 active in the test set (a0 is unused → 34 SocRule strategies across 5 stages) |
| Method | LoRA (r=8, α=16, auto-selected target modules), merged |
| Trainable params | ~2.05M (0.24% of 855M) |
| Precision | bf16 autocast over fp32 master weights |
| Framework | SocRule (KELE, Peng et al. 2025) |
| Training data | SocratDataset train split |
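
For readers unfamiliar with the LoRA row, here is a minimal sketch of the standard LoRA arithmetic: an adapter on a d_out × d_in weight adds r·(d_in + d_out) trainable parameters, and merging folds the update into the base weight as W + (α/r)·B·A. The helper names are hypothetical and the matrices are toy values, not the model's real shapes; this is not the project's training or merge code.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)


def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, a, b, r, alpha):
    """Return the merged weight W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy rank-1 example (assumed values). The card's run uses r=8, alpha=16,
# which gives the same scale factor alpha / r = 2.0.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]      # r x d_in
B = [[1.0], [2.0]]    # d_out x r
merged = merge_lora(W, A, B, r=1, alpha=2)

# Share of trainable parameters, using the figures quoted in the summary.
trainable_share = 2.05e6 / 855e6
print(f"{trainable_share:.2%}")
```

After merging, the checkpoint holds only ordinary dense weights, which is why it loads without PEFT.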

State Taxonomy

The label space follows the SocRule schema — 34 teaching strategies grouped into five monotonic pedagogical stages (a → b → c → d → e):

| Stage | Code range | Description |
|---|---|---|
| a (Initiation) | a1 | Student poses the question |
| b (Concept Probing) | b2–b7 | Teacher probes prior knowledge |
| c (Inductive Reasoning) | c8–c29 | Core teaching stage; may repeat |
| d (Answer Derivation) | d30–d33 | Guide the student to the answer |
| e (Summary) | e34 | Teacher summarises; dialogue ends |

(a0 is a reserved slot with no examples in the corpus.)
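
The taxonomy above can be encoded as a small lookup for validating labels and checking stage order. This is a sketch, not code from the KELE or project repositories; `STAGE_RANGES`, `parse_state`, and `is_monotonic` are hypothetical names, and the ranges are copied from the stage table.

```python
import re

# Stage -> inclusive range of state indices, copied from the stage table.
STAGE_RANGES = {
    "a": (1, 1),
    "b": (2, 7),
    "c": (8, 29),
    "d": (30, 33),
    "e": (34, 34),
}
STAGE_ORDER = "abcde"


def parse_state(label):
    """Split a label such as 'c16' into ('c', 16) and check it against STAGE_RANGES."""
    match = re.fullmatch(r"([a-e])(\d+)", label)
    if match is None:
        raise ValueError(f"not a SocRule state label: {label!r}")
    stage, index = match.group(1), int(match.group(2))
    if label == "a0":  # reserved slot with no examples in the corpus
        return stage, index
    low, high = STAGE_RANGES[stage]
    if not low <= index <= high:
        raise ValueError(f"{label!r} is outside stage {stage} ({low}-{high})")
    return stage, index


def is_monotonic(labels):
    """True if stages never move backwards; repeats within a stage are allowed."""
    stages = [STAGE_ORDER.index(parse_state(label)[0]) for label in labels]
    return all(prev <= cur for prev, cur in zip(stages, stages[1:]))


n_strategies = sum(high - low + 1 for low, high in STAGE_RANGES.values())
```

The ranges sum to 34 strategies, matching the count in the model summary.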


Evaluation

Held-out test split, 8526 turns (the SocratDataset test split — separate from the 5% in-domain hold-out used during training):

  • Overall state accuracy: 69.77%
  • Turn-weighted stage accuracy: 90.24%

| Stage | Stage accuracy (via state prediction) |
|---|---|
| a | 100.00% |
| b | 93.51% |
| c | 87.04% |
| d | 78.85% |
| e | 95.81% |

Per-state caveat. Accuracy is strong on well-represented states and drops sharply on the long tail of rare states (those with < 30 test turns), which the classifier rarely emits — a class-imbalance artifact of SocratDataset's natural state distribution, not of the architecture. Full per-state numbers are in eval_results.json. For a tail-balanced variant, see scripts/train_state_classifier_34way_balanced.py in the project repo.
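
The sketch below shows one way these metrics could be computed from gold and predicted labels. It assumes that "stage accuracy via state prediction" means the predicted state's leading letter matches the gold stage, averaged over all turns; the project's actual evaluation script may differ. The function names and the toy labels are hypothetical and not taken from the dataset.

```python
from collections import Counter


def stage_of(label):
    """Stage is the leading letter of a state label, e.g. 'c16' -> 'c'."""
    return label[0]


def evaluate(gold, pred, rare_threshold=30):
    """Return state accuracy, turn-weighted stage accuracy, and per-state accuracy."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    n = len(gold)
    state_acc = sum(g == p for g, p in zip(gold, pred)) / n
    # A stage counts as correct when the predicted state falls in the gold stage.
    stage_acc = sum(stage_of(g) == stage_of(p) for g, p in zip(gold, pred)) / n
    support = Counter(gold)
    hits = Counter(g for g, p in zip(gold, pred) if g == p)
    per_state = {s: hits[s] / support[s] for s in support}
    # States with fewer than rare_threshold gold turns form the long tail.
    rare = sorted(s for s, count in support.items() if count < rare_threshold)
    return {"state_acc": state_acc, "stage_acc": stage_acc,
            "per_state": per_state, "rare_states": rare}


# Toy labels for illustration only.
gold = ["a1", "b4", "c16", "c16", "d33"]
pred = ["a1", "b5", "c16", "c9", "d33"]
report = evaluate(gold, pred, rare_threshold=2)
```

Stage accuracy is always at least as high as state accuracy under this definition, because a wrong state inside the right stage still counts as a stage hit.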


How to Use

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "ulises-c/socrates-state-classifier-qwen3.5-lora"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo).eval()

# Classify a single turn ("Student: What is a chemical bond?").
inputs = tok("学生: 什么是化学键?", return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    logits = model(**inputs).logits
pred = model.config.id2label[int(logits.argmax(-1))]   # e.g. "b4"
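
If a ranked list with probabilities is more useful than a single argmax, the logits can be passed through a softmax first. The helper below is a hypothetical, dependency-free sketch that works on a flat list of logits; the toy values and label map are made up for illustration.

```python
import math


def top_k_states(logits, id2label, k=3):
    """Softmax a flat list of logits and return the k most probable (label, prob) pairs."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy values. With the real model, pass logits[0].tolist() and model.config.id2label.
toy_id2label = {0: "b4", 1: "b5", 2: "c16"}
ranked = top_k_states([2.0, 1.0, 0.0], toy_id2label, k=2)
```

Given the long-tail caveat in the evaluation section, these probabilities should be treated as uncalibrated scores rather than reliable confidences.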

As the consultant inside a KELE evaluation:

uv run python kele.py --bert-consultant ulises-c/socrates-state-classifier-qwen3.5-lora   # ... + your usual eval flags

Training Procedure

uv run python scripts/train_state_classifier_34way.py --model-id Qwen/Qwen3.5-0.8B-Base --lora --lora-r 8 --lora-alpha 16 --batch_size 8 --bf16-autocast
  • Data: SocratDataset train split (77,258 labeled turns), with 5% held out for in-domain eval (≈73.4K train / 3.9K eval); seed=42; 5 epochs.
  • Hardware / speed: trained on a single NVIDIA RTX 4000 Ada (20 GB) in 3.3 h. Qwen3.5-0.8B is 75% gated-DeltaNet linear attention, so flash-linear-attention is installed to engage its Triton GDN kernels (2.8× over the pure-PyTorch fallback); combined with bf16 autocast and no gradient-checkpointing (the run fits in ~12.6 GB) this is ~4× faster than the original fp32 config.
  • The full training loop is scripts/train_state_classifier_34way.py in the project repo.
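
The split sizes quoted above can be reproduced with a seeded shuffle. This is only a sketch of the arithmetic: how the real script partitions the data (library used, rounding, any stratification) is not stated here, so `split_indices` and its rounding rule are assumptions.

```python
import random


def split_indices(n_items, eval_fraction=0.05, seed=42):
    """Shuffle indices with a fixed seed and hold out eval_fraction of them."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_eval = round(n_items * eval_fraction)  # assumed rounding rule
    return indices[n_eval:], indices[:n_eval]


train_idx, eval_idx = split_indices(77_258)
print(len(train_idx), len(eval_idx))
```

Fixing the seed makes the hold-out reproducible across runs, so in-domain eval numbers stay comparable between training configurations.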

Limitations

  • Chinese-only, single-domain (elementary-school science / KELE SocratDataset); no out-of-distribution or cross-lingual validation.
  • Long-tail blind spots: rare states are under-predicted (see the per-state caveat above).
  • Trained and evaluated on SocratDataset only. The dataset has documented benchmark contamination affecting the generative SocratTeachLLM; this model is a discriminative state classifier rather than a generator, but the same single-corpus caveat applies.

Citation

If you use this model, cite the original KELE paper and this checkpoint:

@inproceedings{peng-etal-2025-kele,
  title     = {{KELE}: A Multi-Agent Framework for Structured {S}ocratic Teaching with Large Language Models},
  author    = {Peng, Yuan and others},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
  year      = {2025},
  url       = {https://aclanthology.org/2025.findings-emnlp.888/}
}
@misc{chavarria-khan-2026-socrates-state-classifier,
  title  = {Socrates State Classifier ({Q}wen3.5-0.8{B} {L}o{RA}): A Socratic Dialogue State Classifier for the {KELE} Pipeline},
  author = {Chavarria, Ulises and Khan, Maximilian},
  year   = {2026},
  url    = {https://huggingface.co/ulises-c/socrates-state-classifier-qwen3.5-lora}
}

Related Resources

Resource Link
KELE paper (EMNLP 2025 Findings) https://aclanthology.org/2025.findings-emnlp.888/
KELE GitHub repository https://github.com/yuanpan1020/KELE
Base model — Qwen3.5-0.8B-Base https://huggingface.co/Qwen/Qwen3.5-0.8B-Base
SocratTeachLLM (original) https://huggingface.co/yuanpan/SocratTeachLLM
Training dataset — SocratDataset https://huggingface.co/datasets/ulises-c/SocratDataset
Clean-probe synthetic eval set https://huggingface.co/datasets/ulises-c/SocratDataset-SYNTHETIC
Training + evaluation code https://github.com/ulises-c/csen-346

License

Apache-2.0, inherited from the Qwen/Qwen3.5-0.8B-Base base model. If you use this model, please also cite the KELE paper, whose SocRule schema defines the label space.
