nevmenandr/char-based-lstm-russian-poetry-pasternak
LSTM Language Model Visualization: A Deep Dive into Char-RNN
Model Architecture at a Glance
- Model Type: 5-layer LSTM
- Hidden Size: 512
- Vocabulary: 137 characters
- Sequence Length: 50
- Total Parameters: ~9.8 million
- Training: 50 epochs, 10,750 iterations
- Final Validation Loss: 1.1266
- The model learned to generate Pasternak-style poetry - pretty impressive for a char-rnn!
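The ~9.8 million figure can be sanity-checked from the numbers above. This is a rough sketch, not the model's actual code: it assumes one-hot character input with no embedding layer, one bias vector per gate, and a linear output layer back to the vocabulary. The function name `lstm_param_count` is made up for illustration.

```python
def lstm_param_count(vocab_size, hidden_size, num_layers):
    """Count parameters of a stacked LSTM (assumed layout, see lead-in)."""
    total = 0
    for layer in range(num_layers):
        # Assumption: the first layer reads one-hot characters,
        # deeper layers read the hidden state of the layer below.
        input_size = vocab_size if layer == 0 else hidden_size
        # 4 gates, each with input weights, recurrent weights and one bias.
        total += 4 * hidden_size * (input_size + hidden_size + 1)
    # Assumption: a linear projection from hidden state to vocabulary, plus bias.
    total += hidden_size * vocab_size + vocab_size
    return total


params = lstm_param_count(vocab_size=137, hidden_size=512, num_layers=5)
iterations_per_epoch = 10_750 // 50

print(f"parameters: {params:,}")
print(f"iterations per epoch: {iterations_per_epoch}")
```

Under these assumptions the count is 9,798,281, which matches the reported ~9.8 million, and the training run works out to 215 iterations per epoch. An implementation with separate input and recurrent biases would give a slightly higher total.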
The Beautiful Mess
Check out this heatmap visualization - it's like a Persian carpet!
- Each gate has its own patterns:
  - Input Gate: Controls what new info enters the cell
  - Forget Gate: Decides what to discard
  - Cell Gate: Creates new candidate values
  - Output Gate: Determines what to output
- The weights show beautiful structured patterns - different gates learned distinct strategies for processing text.
https://huggingface.co/papers/2306.02771
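To make the four gate roles concrete, here is a minimal sketch of one step of the standard LSTM cell equations. It uses a single hidden unit with scalar values for readability. The weights in the example are invented for illustration and are not taken from the trained model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights):
    """One LSTM time step for a single hidden unit.

    `weights` maps each gate name to a hypothetical (w_x, w_h, bias) triple.
    """
    def pre_activation(gate):
        w_x, w_h, bias = weights[gate]
        return w_x * x + w_h * h_prev + bias

    i = sigmoid(pre_activation("input"))    # how much new info enters the cell
    f = sigmoid(pre_activation("forget"))   # how much of the old cell state is kept
    g = math.tanh(pre_activation("cell"))   # new candidate value
    o = sigmoid(pre_activation("output"))   # how much of the cell state is output

    c = f * c_prev + i * g                  # updated cell state
    h = o * math.tanh(c)                    # updated hidden state
    return h, c


# Made-up weights, purely for illustration.
example_weights = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.3, -0.2, 1.0),
    "cell": (0.8, 0.4, 0.0),
    "output": (0.6, 0.2, 0.0),
}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, weights=example_weights)
print(h, c)
```

In a full layer each gate has its own weight matrix rather than a single scalar. That is why a heatmap of the weights can show a distinct pattern per gate.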