Local Apps

vLLM

How to use alan-turing-institute/t0-s1-3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "alan-turing-institute/t0-s1-3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "alan-turing-institute/t0-s1-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
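
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and the URL and model ID are assumed to match the curl example above.

```python
import json
import urllib.request

# Assumed defaults, copied from the curl example above.
MODEL_ID = "alan-turing-institute/t0-s1-3B"
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(
    prompt: str, base_url: str = BASE_URL, model: str = MODEL_ID
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send the request to a running server and return the reply text."""
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` should return the reply as a string. For a server on a different port, such as the SGLang examples in this document that use port 30000, pass `base_url="http://localhost:30000/v1"`.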

Use Docker

docker model run hf.co/alan-turing-institute/t0-s1-3B

SGLang

How to use alan-turing-institute/t0-s1-3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "alan-turing-institute/t0-s1-3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "alan-turing-institute/t0-s1-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "alan-turing-institute/t0-s1-3B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "alan-turing-institute/t0-s1-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use alan-turing-institute/t0-s1-3B with Docker Model Runner:
```
docker model run hf.co/alan-turing-institute/t0-s1-3B
```

Model Card for t0-s1-3B

t0-s1-3B is a fine-tuned language model developed at the Alan Turing Institute as part of the t0 research initiative. It replicates the S1 work: Qwen2.5-3B-Instruct fine-tuned on the s1K dataset using supervised fine-tuning (SFT) via TRL. The s1K dataset is a curated set of 1,000 high-quality reasoning traces designed to elicit test-time scaling behaviour in smaller language models.

Model Details

Model Description

Developed by: t0 team at the Alan Turing Institute
Authors: Ryan Sze-Yin Chan, Federico Nanni, Tomas Lazauskas, Rosie Wood, Penelope Yong, Lionel Tarassenko, Mark Girolami, James Geddes, Andrew Duncan
Model type: Text Generation (causal language model)
Language(s) (NLP): English
License: Apache 2.0
Finetuned from model: Qwen2.5-3B-Instruct

Model Sources

Repository: https://github.com/alan-turing-institute/t0-1
Paper (t0): https://arxiv.org/abs/2508.11386
Paper (S1 — original work being replicated): https://arxiv.org/abs/2501.19393

Uses

Direct Use

This model can be used for text generation and reasoning tasks. It is intended to explore test-time scaling behaviour in small language models, following the methodology of the S1 paper.
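
To make "test-time scaling" concrete: the S1 paper describes a technique it calls budget forcing, in which the model's reasoning is lengthened by appending a word such as "Wait" whenever it tries to stop. The sketch below only illustrates that idea at the string level. `continue_fn` is a hypothetical callable standing in for a model call, and nothing here is taken from the t0-1 repository or confirmed as the setup used for this model.

```python
from typing import Callable


def budget_forced_reasoning(
    question: str,
    continue_fn: Callable[[str], str],
    num_extensions: int = 2,
    wait_token: str = "Wait",
) -> str:
    """Extend a reasoning trace by appending `wait_token` each time the model stops.

    `continue_fn` is a hypothetical callable: given the text so far, it returns
    the model's continuation up to the point where it would stop reasoning.
    """
    trace = continue_fn(question)
    for _ in range(num_extensions):
        # Instead of accepting the stop, nudge the model to keep reasoning.
        trace = trace + "\n" + wait_token
        trace = trace + continue_fn(question + "\n" + trace)
    return trace
```

In practice `continue_fn` would wrap a call to the model with a stop condition on its end-of-reasoning marker; that marker and the prompt format are assumptions to check against the model's chat template.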

Downstream Use

The model can be used as a reasoning-capable base for downstream tasks or plugged into larger pipelines. It is part of the broader t0 research initiative into lean yet capable LLMs.

Out-of-Scope Use

This model is not intended for:

Medical diagnosis or clinical decision-making without appropriate oversight
Use with text in languages other than English
Production deployments without further evaluation and safety review

Bias, Risks, and Limitations

The model inherits biases from its base model (Qwen2.5-3B-Instruct) and from the s1K training dataset. Because s1K is small (1,000 examples) and curated specifically for reasoning, performance on non-reasoning tasks may vary.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Outputs should be reviewed appropriately for the intended use case.

How to Get Started with the Model

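# Option 1: use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="alan-turing-institute/t0-s1-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is visible when run as a script, not only in a notebook
print(pipe(messages))

# Option 2: load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("alan-turing-institute/t0-s1-3B")
model = AutoModelForCausalLM.from_pretrained("alan-turing-institute/t0-s1-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Apply the chat template and tokenize the conversation into model-ready tensors
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate up to 40 new tokens, then decode only the newly generated part
# (the slice drops the prompt tokens from the front of the output)
outputs = model.generate(**inputs, max_new_tokens=40)
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))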
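
The pipeline call above returns a nested structure rather than a plain string. The helper below is a small sketch for pulling out the reply; `last_assistant_reply` is a hypothetical name, and the output shape (a list of dicts whose `generated_text` holds the conversation with the new assistant message appended) is assumed from how recent Transformers versions handle chat-style input.

```python
from typing import Any


def last_assistant_reply(pipeline_output: list[dict[str, Any]]) -> str:
    """Return the newest assistant message from a text-generation pipeline result."""
    generated = pipeline_output[0]["generated_text"]
    # Plain-string prompts yield a plain string; chat prompts yield a message list.
    if isinstance(generated, str):
        return generated
    for message in reversed(generated):
        if message.get("role") == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")
```

Typical use would be `last_assistant_reply(pipe(messages))`.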

For serving with vLLM:

pip install vllm
vllm serve "alan-turing-institute/t0-s1-3B"

See the t0-1 repository for more details.

Training Details

Training Data

This model was fine-tuned on the s1K dataset — a curated set of 1,000 reasoning traces used in the original S1 work. See the S1 paper and the t0-1 repository for more information on the training data and procedure.

Training Procedure

The model was trained with supervised fine-tuning (SFT) via TRL, replicating the methodology described in the S1 paper. See the t0-1 repository for full details on hyperparameters and evaluation.
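
As a rough illustration of what SFT optimises, the sketch below computes a mean negative log-likelihood over a chosen subset of token positions. It is a toy in plain Python, not the TRL implementation. Restricting the loss to response tokens via `loss_mask` is an assumption about one common setup, and `masked_sft_loss` is a hypothetical name; the actual configuration is documented in the t0-1 repository.

```python
import math
from typing import Sequence


def masked_sft_loss(token_probs: Sequence[float], loss_mask: Sequence[int]) -> float:
    """Mean negative log-likelihood over positions where loss_mask is 1.

    token_probs[i] is the probability the model assigned to the correct next
    token at position i (toy input; a real trainer works on logits).
    """
    if len(token_probs) != len(loss_mask):
        raise ValueError("token_probs and loss_mask must have the same length")
    selected = [-math.log(p) for p, m in zip(token_probs, loss_mask) if m]
    if not selected:
        raise ValueError("loss_mask selects no tokens")
    return sum(selected) / len(selected)
```

Setting every mask entry to 1 gives the loss over the whole sequence, prompt included.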

Citation

If you use this model, please cite both the t0 paper and the original S1 work:

BibTeX:

@article{chan2025retrieval,
  title={Retrieval-augmented reasoning with lean language models},
  author={Chan, Ryan Sze-Yin and Nanni, Federico and Lazauskas, Tomas and Wood, Rosie and Yong, Penelope and Tarassenko, Lionel and Girolami, Mark and Geddes, James and Duncan, Andrew},
  journal={arXiv preprint arXiv:2508.11386},
  year={2025}
}

@article{muennighoff2025s1,
  title={s1: Simple Test-Time Scaling},
  author={Muennighoff, Niklas and Yang, Zitong and Shi, Weijia and Li, Xiang Lisa and Fei-Fei, Li and Hajishirzi, Hannaneh and Zettlemoyer, Luke and Liang, Percy and Candès, Emmanuel and Hashimoto, Tatsunori},
  journal={arXiv preprint arXiv:2501.19393},
  year={2025}
}