Exploring the depths of LLMs and VLMs: a review of LLM and VLM content. I have been working on the basics, but now is a good time to go deeper and refresh.
Here is a list of resources I have gone through:
The top three get us up to date on data manipulation, the basic architecture of attention, and tokenisation (a minimal attention sketch follows this list).
- makemore by Andrej Karpathy
- minbpe by karpathy
- Attention? Attention! by Lilian Weng
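
As a quick reference for what those three build up to, here is a minimal sketch of scaled dot-product attention in PyTorch. The single-head shapes, the causal mask, and the toy tensors are my own assumptions rather than anything taken from those posts.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k), a single head for simplicity
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5    # (batch, seq_len, seq_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v                             # weighted sum of values

# causal mask: position i may only attend to positions <= i
B, T, D = 2, 8, 16
q = k = v = torch.randn(B, T, D)
causal = torch.tril(torch.ones(T, T))
print(scaled_dot_product_attention(q, k, v, mask=causal).shape)  # torch.Size([2, 8, 16])
```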
Then we get into actually training a model, including llm.c, which is worth the deep dive (a minimal training-step sketch follows this list).
- gpt-2 again by karpathy
- llama3 from scratch by naklecha
- llm training in simple, raw C/CUDA by karpathy
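
The common thread in those three is the same next-token-prediction training step, sketched minimally below. The tiny embedding-plus-linear "model" and the random token stream are placeholders I made up to keep the snippet self-contained, not what any of those resources actually train.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# placeholder "model": embedding + linear head standing in for a real transformer
vocab_size, d_model = 256, 64
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# fake token stream; inputs are the sequence, targets are the same sequence shifted by one
tokens = torch.randint(0, vocab_size, (8, 33))
x, y = tokens[:, :-1], tokens[:, 1:]

logits = model(x)                                  # (batch, seq_len, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), y.reshape(-1))
loss.backward()
opt.step()
opt.zero_grad()
print(f"loss: {loss.item():.3f}")                  # roughly ln(256) ≈ 5.55 at initialisation
```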
And some extra bits we can get into once the basics are complete, especially surrounding modern techniques such as LoRA/QLoRA, quantisation, evals, MoE, ViTs, and Flash Attention (a tiny quantisation sketch follows this list).
- decoding strategies in large language models by mlabonne
- how to make llms go fast by vgel
- a visual guide to quantization by maarten
- the novice’s llm training guide by alpin
- a survey on evaluation of large language models (paper)
- mixture of experts explained by huggingface
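
To make one of those topics concrete, here is a minimal sketch of symmetric absmax int8 quantisation, one of the simplest schemes that quantisation write-ups like maarten's tend to start from. The function names and random weights are mine.

```python
import numpy as np

def absmax_quantize(w: np.ndarray):
    # symmetric int8 quantisation: scale so the largest |weight| maps to 127
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = absmax_quantize(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```

A single per-tensor scale like this is the crudest version; per-channel or per-group scales are the usual next step.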
Some notes on ViTs, CLIP, and PaliGemma (a minimal patch-embedding sketch follows this list)
- vision transformer by aman-arora
- clip, siglip and paligemma by umar-jamil
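
The input side of a ViT is easy to sketch: split the image into patches and linearly project each one into a token, which a strided convolution does in one step. The 224x224 input, patch size 16, and embed dim 768 below are the usual ViT-Base numbers, assumed here purely for illustration.

```python
import torch
import torch.nn as nn

B, C, H, W, P, D = 1, 3, 224, 224, 16, 768            # patch size P, embed dim D
img = torch.randn(B, C, H, W)

patchify = nn.Conv2d(C, D, kernel_size=P, stride=P)   # one conv = linear projection per patch
tokens = patchify(img)                                 # (B, D, H/P, W/P)
tokens = tokens.flatten(2).transpose(1, 2)             # (B, num_patches, D)
print(tokens.shape)                                    # torch.Size([1, 196, 768])
```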
Some further extra reading (less truly evergreen, but still useful; a tiny RoPE sketch follows this list)
- extending the RoPE by eleutherai
- Flash Attention by Aleksa Gordić
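
Since RoPE is the subject of that EleutherAI post, here is a minimal sketch of the basic rotation it extends, written in the split-into-halves ("rotate half") convention. The shapes and base value are assumptions, and real implementations precompute and cache the cos/sin tables.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (seq_len, dim), dim even; rotate pairs of dims by position-dependent angles
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)     # per-pair frequency
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs  # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]          # pair dim i with dim i + half
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(8, 64)
print(rope(q).shape)  # torch.Size([8, 64])
```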