TinyStories 25M LLM Training on 8GB VRAM

by /u/tevlon

Try the train-a-model-from-scratch repo on an 8 GB GPU to build a 25M-parameter TinyStories model, and compare the results with mHC, BitNet, TurboQuant, and MTP.

What to do now

Clone the repo, set up a CUDA environment on a GPU with 8 GB of VRAM, and run the training script to produce the 25M-parameter TinyStories model; then benchmark it against other small models.
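Before starting a run, it can help to confirm how much VRAM the card actually reports. The sketch below is not part of the repository. It assumes an NVIDIA driver with `nvidia-smi` on the PATH, and the helper names `parse_vram_mib` and `query_vram_mib` are made up for illustration.

```python
import shutil
import subprocess


def parse_vram_mib(output: str) -> list[int]:
    """Parse one total-memory value (MiB) per line; skip lines that are not plain integers."""
    return [int(line.strip()) for line in output.splitlines() if line.strip().isdigit()]


def query_vram_mib() -> list[int]:
    """Return total VRAM in MiB for each visible GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_vram_mib(result.stdout)


if __name__ == "__main__":
    for index, mib in enumerate(query_vram_mib()):
        print(f"GPU {index}: {mib} MiB total ({mib / 1024:.1f} GiB)")
```

A card sold as 8 GB typically reports a little under 8192 MiB, and part of that is taken by the CUDA context and any display output.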

Summary

The author shares a GitHub repository that trains a 25-million-parameter TinyStories model from scratch using only 8 GB of VRAM. The post reports four findings: the model is too small for mHC to be effective, BitNet trains slowly and brings no memory savings, TurboQuant is unnecessary for this setup, and MTP works but adds training overhead. The resulting TinyStories 25M model is published on HuggingFace at https://huggingface.co/epoyraz/tinystories-25m. The training pipeline runs on a single GPU, which makes it a practical example of building a small LLM on modest hardware.
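For a sense of what a 25-million-parameter model looks like, here is a rough parameter count for a GPT-style decoder. The configuration is hypothetical: the summary does not state the model's width, depth, or vocabulary size, and these values were picked only because they land near 25M. Biases, normalization weights, and positional embeddings are ignored.

```python
def gpt_param_count(vocab_size: int, d_model: int, n_layers: int,
                    tied_embeddings: bool = True) -> int:
    """Approximate parameter count of a GPT-style decoder (biases and norms ignored)."""
    attention = 4 * d_model * d_model      # Q, K, V and output projections
    mlp = 2 * d_model * (4 * d_model)      # up- and down-projection with 4x hidden width
    embeddings = vocab_size * d_model      # token embedding table
    if not tied_embeddings:
        embeddings *= 2                    # separate output (unembedding) matrix
    return n_layers * (attention + mlp) + embeddings


if __name__ == "__main__":
    # Hypothetical config, not taken from the repository.
    total = gpt_param_count(vocab_size=12000, d_model=512, n_layers=6)
    print(f"{total / 1e6:.1f}M parameters")
```

At this size the embedding table is a large share of the total, so vocabulary size matters nearly as much as depth.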

The repository includes scripts for data preprocessing, model training, and evaluation, all designed to run on an 8 GB GPU. The author highlights that the training process is feasible on consumer‑grade hardware, opening up LLM experimentation to a broader audience.
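A back-of-the-envelope budget shows why 8 GB is plausible for a model of this size. The sketch assumes plain fp32 training with an Adam-style optimizer, which keeps weights, gradients, and two moment buffers per parameter. The summary does not say which precision or optimizer the repository uses, and activation memory, which depends on batch size and sequence length, is not counted.

```python
def training_state_bytes(n_params: int, bytes_per_value: int = 4) -> dict[str, int]:
    """Bytes held by model states during Adam-style training (activations excluded)."""
    per_copy = n_params * bytes_per_value
    return {
        "weights": per_copy,
        "gradients": per_copy,
        "adam_moments": 2 * per_copy,  # first and second moment estimates
    }


if __name__ == "__main__":
    # 25M parameters, fp32 (4 bytes per value); both are assumptions for illustration.
    budget = training_state_bytes(25_000_000)
    for name, size in budget.items():
        print(f"{name:>12}: {size / 1e6:.0f} MB")
    print(f"{'total':>12}: {sum(budget.values()) / 1e6:.0f} MB")
```

Under these assumptions the model states come to about 400 MB, so most of an 8 GB card remains for activations, the CUDA context, and data buffers.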

Overall, the post offers a concrete, step-by-step guide for developers interested in low-resource LLM training, showing the trade-offs between the techniques tested and the practical feasibility of training on limited GPU memory.

Key changes

Provides a repo to train TinyStories 25M from scratch on 8 GB of VRAM
Reports that the model is too small for mHC to be effective
Reports that BitNet trains slowly with no memory gain
Finds TurboQuant unnecessary for this setup
Finds that MTP works but slows training
Publishes the model on HuggingFace
Runs the training pipeline on a single GPU
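Assuming MTP here stands for multi-token prediction (the summary does not expand the acronym), the slowdown would come from training extra heads that predict tokens further ahead than the next one. The sketch below only builds the shifted targets for each head. It is a generic illustration, not the repository's code, and `build_mtp_targets` is a made-up name.

```python
def build_mtp_targets(tokens: list[int], n_future: int) -> list[list[tuple[int, int]]]:
    """For head k (1..n_future), list (position, target_token) pairs where the
    target is the token k steps after that position."""
    targets = []
    for k in range(1, n_future + 1):
        pairs = [(i, tokens[i + k]) for i in range(len(tokens) - k)]
        targets.append(pairs)
    return targets


if __name__ == "__main__":
    # Toy token ids; head 1 is ordinary next-token prediction, head 2 looks two steps ahead.
    for k, pairs in enumerate(build_mtp_targets([10, 11, 12, 13], n_future=2), start=1):
        print(f"head {k}: {pairs}")
```

Each additional head contributes at least one more loss term per training step, which is consistent with the reported overhead.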

Affects

internal
