This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the princeton-nlp/llama3-ultrafeedback dataset. It achieves the following results on the evaluation set:
More information needed
The following hyperparameters were used during training:
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.2439 | 0.8550 | 400 | 1.2415 | -0.3366 | -0.4015 | 0.5874 | 0.0649 | -0.4015 | -0.3366 | 0.0060 | 0.0153 |
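
As a quick sanity check on the row above, the reported Rewards/margins value can be recomputed from the other two reward columns. This is a minimal sketch that assumes the margin is defined as the chosen reward minus the rejected reward, which is the usual convention in preference-optimization training logs but is not stated in this card. The dictionary keys and the `reward_margin` helper are illustrative names, not part of any library.

```python
# Values copied from the step-400 row of the training results table.
row = {
    "rewards/chosen": -0.3366,
    "rewards/rejected": -0.4015,
    "rewards/margins": 0.0649,
    "rewards/accuracies": 0.5874,
}


def reward_margin(chosen: float, rejected: float) -> float:
    """Return the reward margin, assumed here to be chosen - rejected."""
    return chosen - rejected


margin = reward_margin(row["rewards/chosen"], row["rewards/rejected"])

# The table rounds to four decimals, so compare with a small tolerance.
matches = abs(margin - row["rewards/margins"]) < 1e-4
print(f"recomputed margin = {margin:.4f}, matches table: {matches}")
```

A positive margin means the chosen responses received a higher average reward than the rejected ones, which is consistent with the Rewards/accuracies value being above 0.5.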
Base model: meta-llama/Meta-Llama-3-8B-Instruct