Model Card for t0-s1-3B
t0-s1-3B is a fine-tuned language model developed at the Alan Turing Institute as part of the t0 research initiative. It replicates the S1 work: Qwen2.5-3B-Instruct fine-tuned on the s1K dataset with supervised fine-tuning (SFT) via TRL. The s1K dataset is a curated set of 1,000 high-quality reasoning traces designed to elicit test-time scaling behaviour in smaller language models.
Model Details
Model Description
- Developed by: t0 team at the Alan Turing Institute
- Authors: Ryan Sze-Yin Chan, Federico Nanni, Tomas Lazauskas, Rosie Wood, Penelope Yong, Lionel Tarassenko, Mark Girolami, James Geddes, Andrew Duncan
- Model type: Text Generation (causal language model)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: Qwen2.5-3B-Instruct
Model Sources
- Repository: https://github.com/alan-turing-institute/t0-1
- Paper (t0): https://arxiv.org/abs/2508.11386
- Paper (S1 — original work being replicated): https://arxiv.org/abs/2501.19393
Uses
Direct Use
This model can be used for text generation and reasoning tasks. It is intended to explore test-time scaling behaviour in small language models, following the methodology of the S1 paper.
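The S1 paper obtains test-time scaling through budget forcing: when the model tries to stop reasoning early, the stop is suppressed and a cue such as "Wait" is appended so that it keeps thinking. The sketch below illustrates only that control loop. The `generate` callable, the `END_OF_THINKING` marker, and the function name are hypothetical placeholders, not part of this model's API or the t0-1 codebase.

```python
from typing import Callable

# Hypothetical marker for the end of the reasoning phase. The real delimiter
# depends on the chat template used during fine-tuning.
END_OF_THINKING = "</think>"


def extend_reasoning(
    generate: Callable[[str], str],
    prompt: str,
    min_rounds: int = 2,
    cue: str = "Wait",
) -> str:
    """Force at least `min_rounds` reasoning segments before closing.

    `generate` stands in for any backend: it receives the text so far and
    returns a continuation that ends where the model wanted to stop.
    """
    text = prompt
    for round_index in range(min_rounds):
        text += generate(text)
        if round_index < min_rounds - 1:
            # Suppress the stop and nudge the model to keep reasoning.
            text += f" {cue},"
    # Only now is the reasoning phase allowed to close.
    return text + END_OF_THINKING
```

In practice `generate` would wrap a call to the model with the end-of-thinking delimiter as a stop sequence. The S1 paper also describes the opposite direction, capping reasoning by forcing the delimiter once a token budget is reached, which this sketch omits.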
Downstream Use
The model can be used as a reasoning-capable base for downstream tasks or plugged into larger pipelines. It is part of the broader t0 research initiative into lean yet capable LLMs.
Out-of-Scope Use
This model is not intended for:
- Medical diagnosis or clinical decision-making without appropriate oversight
- Use cases outside of English-language text
- Production deployments without further evaluation and safety review
Bias, Risks, and Limitations
The model inherits biases from its base model (Qwen2.5-3B-Instruct) and from the s1K training dataset. The s1K dataset is small and curated for reasoning; performance on other tasks may vary.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Outputs should be reviewed appropriately for the intended use case.
How to Get Started with the Model
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="alan-turing-institute/t0-s1-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("alan-turing-institute/t0-s1-3B")
model = AutoModelForCausalLM.from_pretrained("alan-turing-institute/t0-s1-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
For serving with vLLM:
pip install vllm
vllm serve "alan-turing-institute/t0-s1-3B"
See the t0-1 repository for more details.
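Once the vLLM server is running, it can be queried over its OpenAI-compatible chat completions endpoint. The following is a minimal client sketch using only the Python standard library. It assumes the server listens on vLLM's default address `http://localhost:8000`; the helper names `build_chat_payload` and `ask` and the `max_tokens` default are illustrative choices, not part of vLLM or the t0-1 repository.

```python
import json
import urllib.request

MODEL_ID = "alan-turing-institute/t0-s1-3B"


def build_chat_payload(prompt: str, max_tokens: int = 512) -> dict:
    """Build the JSON body for a single-turn chat completion request."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def ask(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send the prompt to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible responses put the reply under choices[0].message.
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(ask("What is the capital of France?"))` prints the model's reply. Reasoning models tend to produce long outputs, so `max_tokens` may need to be raised for harder questions.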
Training Details
Training Data
This model was fine-tuned on the s1K dataset — a curated set of 1,000 reasoning traces used in the original S1 work. See the S1 paper and the t0-1 repository for more information on the training data and procedure.
Training Procedure
Supervised fine-tuning (SFT) via TRL, replicating the methodology described in the S1 paper. See the t0-1 repository for full details on hyperparameters and evaluation.
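As a rough illustration of the data preparation step, the sketch below packs one reasoning trace into the conversational `messages` format that TRL's `SFTTrainer` accepts. The field names (`question`, `reasoning`, `answer`) and the way the trace and answer are joined are assumptions made for this example; the actual s1K columns and the formatting used for this model are documented in the S1 paper and the t0-1 repository.

```python
def to_sft_example(question: str, reasoning: str, answer: str) -> dict:
    """Pack one reasoning trace into a single-turn chat example.

    The "Final answer:" separator is an illustrative choice, not the
    delimiter used to train this model.
    """
    assistant_text = f"{reasoning.strip()}\n\nFinal answer: {answer.strip()}"
    return {
        "messages": [
            {"role": "user", "content": question.strip()},
            {"role": "assistant", "content": assistant_text},
        ]
    }
```

A list of such dictionaries can be turned into a dataset and passed to the trainer, which applies the base model's chat template before tokenization.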
Citation
If you use this model, please cite both the t0 paper and the original S1 work:
BibTeX:
@article{chan2025retrieval,
  title={Retrieval-augmented reasoning with lean language models},
  author={Chan, Ryan Sze-Yin and Nanni, Federico and Lazauskas, Tomas and Wood, Rosie and Yong, Penelope and Tarassenko, Lionel and Girolami, Mark and Geddes, James and Duncan, Andrew},
  journal={arXiv preprint arXiv:2508.11386},
  year={2025}
}

@article{muennighoff2025s1,
  title={s1: Simple Test-Time Scaling},
  author={Muennighoff, Niklas and Yang, Zitong and Shi, Weijia and Li, Xiang Lisa and Fei-Fei, Li and Hajishirzi, Hannaneh and Zettlemoyer, Luke and Liang, Percy and Candès, Emmanuel and Hashimoto, Tatsunori},
  journal={arXiv preprint arXiv:2501.19393},
  year={2025}
}