Model Card for t0-s1-3B

t0-s1-3B is a fine-tuned language model developed at the Alan Turing Institute as part of the t0 research initiative. It is a replication of the S1 work: Qwen2.5-3B-Instruct fine-tuned on the s1K dataset using supervised fine-tuning (SFT) via TRL. The s1K dataset is a curated set of 1,000 high-quality reasoning traces designed to elicit test-time scaling behaviour in smaller language models.

Model Details

Model Description

  • Developed by: t0 team at the Alan Turing Institute
  • Authors: Ryan Sze-Yin Chan, Federico Nanni, Tomas Lazauskas, Rosie Wood, Penelope Yong, Lionel Tarassenko, Mark Girolami, James Geddes, Andrew Duncan
  • Model type: Text Generation (causal language model)
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Finetuned from model: Qwen2.5-3B-Instruct

Uses

Direct Use

This model can be used for text generation and reasoning tasks. It is intended to explore test-time scaling behaviour in small language models, following the methodology of the S1 paper.
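For readers unfamiliar with the S1 paper, its test-time scaling technique ("budget forcing") lengthens the reasoning trace by suppressing the end-of-thinking delimiter and appending the string "Wait". The following is a minimal sketch of that idea only, not the t0 or S1 implementation. The delimiter string, the `generate` callable and the function name are hypothetical placeholders; the real delimiter depends on the model's chat template.

```python
from typing import Callable

# Hypothetical placeholder: the real delimiter depends on the chat template.
END_OF_THINKING = "<end_of_thinking>"
# Cue appended to nudge the model into reasoning further.
CONTINUE_CUE = "Wait"


def budget_forced_thinking(
    generate: Callable[[str], str],
    prompt: str,
    num_extensions: int,
) -> str:
    """Return a reasoning trace that has been extended `num_extensions` times.

    `generate(text)` is a stand-in for any function that continues `text`
    and stops after emitting END_OF_THINKING (or at a token limit).
    """
    trace = generate(prompt)
    for _ in range(num_extensions):
        # Drop the end-of-thinking marker so the trace cannot end here,
        # then append the cue and let the model continue from that point.
        trace = trace.removesuffix(END_OF_THINKING).rstrip() + " " + CONTINUE_CUE
        trace += generate(prompt + trace)
    return trace
```

With a real model, `generate` would wrap a call such as `model.generate` or a request to a serving endpoint; the number of extensions is the knob that trades extra compute for longer reasoning.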

Downstream Use

The model can be used as a reasoning-capable base for downstream tasks or plugged into larger pipelines. It is part of the broader t0 research initiative into lean yet capable LLMs.

Out-of-Scope Use

This model is not intended for:

  • Medical diagnosis or clinical decision-making without appropriate oversight
  • Use cases outside of English-language text
  • Production deployments without further evaluation and safety review

Bias, Risks, and Limitations

The model inherits biases from its base model (Qwen2.5-3B-Instruct) and from the s1K training dataset. The s1K dataset is small and curated for reasoning; performance on other tasks may vary.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Outputs should be reviewed appropriately for the intended use case.

How to Get Started with the Model

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="alan-turing-institute/t0-s1-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("alan-turing-institute/t0-s1-3B")
model = AutoModelForCausalLM.from_pretrained("alan-turing-institute/t0-s1-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

For serving with vLLM:

pip install vllm
vllm serve "alan-turing-institute/t0-s1-3B"

See the t0-1 repository for more details.
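Once the vLLM server above is running, it can be queried over its OpenAI-compatible HTTP API. The sketch below uses only the Python standard library and assumes the server listens on vLLM's default address, `http://localhost:8000`; the helper names `build_chat_payload` and `chat` and the `max_tokens` default are illustrative choices, not part of the t0 codebase.

```python
import json
import urllib.request

# Assumption: vLLM's default host and port, with its OpenAI-compatible routes.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "alan-turing-institute/t0-s1-3B"


def build_chat_payload(prompt: str, max_tokens: int = 512) -> dict:
    """Build the JSON body for a single-turn chat completion request."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str, max_tokens: int = 512) -> str:
    """Send the prompt to the running server and return the reply text."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(prompt, max_tokens)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Who are you?"))` prints the model's reply. If the server was started on a different host or port, adjust `BASE_URL` accordingly.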

Training Details

Training Data

This model was fine-tuned on the s1K dataset — a curated set of 1,000 reasoning traces used in the original S1 work. See the S1 paper and the t0-1 repository for more information on the training data and procedure.

Training Procedure

Supervised fine-tuning (SFT) via TRL, replicating the methodology described in the S1 paper. See the t0-1 repository for full details on hyperparameters and evaluation.

Citation

If you use this model, please cite both the t0 paper and the original S1 work:

BibTeX:

@article{chan2025retrieval,
  title={Retrieval-augmented reasoning with lean language models},
  author={Chan, Ryan Sze-Yin and Nanni, Federico and Lazauskas, Tomas and Wood, Rosie and Yong, Penelope and Tarassenko, Lionel and Girolami, Mark and Geddes, James and Duncan, Andrew},
  journal={arXiv preprint arXiv:2508.11386},
  year={2025}
}

@article{muennighoff2025s1,
  title={s1: Simple Test-Time Scaling},
  author={Muennighoff, Niklas and Yang, Zitong and Shi, Weijia and Li, Xiang Lisa and Fei-Fei, Li and Hajishirzi, Hannaneh and Zettlemoyer, Luke and Liang, Percy and Candès, Emmanuel and Hashimoto, Tatsunori},
  journal={arXiv preprint arXiv:2501.19393},
  year={2025}
}