Exploring the depths of LLMs and VLMs: a review of LLM and VLM content. I have been working on the basics, but now is a good time to go deeper and refresh.
Here is a list of resources I have gone through:
The top three get us up to date on data manipulation, the basic architecture of attention, and tokenisation (a minimal attention sketch follows this list).
- makemore by Andrej Karpathy
- minbpe by karpathy
- Attention? Attention! by Lilian Weng
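
As a quick reference for what those three build up to, here is a minimal sketch of scaled dot-product attention in PyTorch. The single-head shapes, the causal mask, and the toy tensors are my own assumptions rather than anything taken from those posts.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k), a single head for simplicity
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5    # (batch, seq_len, seq_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v                             # weighted sum of values

# causal mask: position i may only attend to positions <= i
B, T, D = 2, 8, 16
q = k = v = torch.randn(B, T, D)
causal = torch.tril(torch.ones(T, T))
print(scaled_dot_product_attention(q, k, v, mask=causal).shape)  # torch.Size([2, 8, 16])
```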
Then we get into actually training a model, including llm.c, which is worth the deep dive (a minimal training-step sketch follows this list).
- gpt-2 again by karpathy
- llama3 from scratch by naklecha
- llm training in simple, raw C/CUDA by karpathy
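
The common thread in those three is the same next-token-prediction training step, sketched minimally below. The tiny embedding-plus-linear "model" and the random token stream are placeholders I made up to keep the snippet self-contained, not what any of those resources actually train.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# placeholder "model": embedding + linear head standing in for a real transformer
vocab_size, d_model = 256, 64
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# fake token stream; inputs are the sequence, targets are the same sequence shifted by one
tokens = torch.randint(0, vocab_size, (8, 33))
x, y = tokens[:, :-1], tokens[:, 1:]

logits = model(x)                                  # (batch, seq_len, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), y.reshape(-1))
loss.backward()
opt.step()
opt.zero_grad()
print(f"loss: {loss.item():.3f}")                  # roughly ln(256) ≈ 5.55 at initialisation
```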
And some extra bits we can get into once the basics are complete, especially surrounding modern techniques such as LoRA/QLoRA, quantisation, evals, MoE, ViTs, and Flash Attention (a tiny quantisation sketch follows this list).
- decoding strategies in large language models by mlabonne
- how to make llms go fast by vgel
- a visual guide to quantization by maarten
- the novice’s llm training guide by alpin
- a survey on evaluation of large language models (paper)
- mixture of experts explained by huggingface
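
To make one of those topics concrete, here is a minimal sketch of symmetric absmax int8 quantisation, one of the simplest schemes that quantisation write-ups like maarten's tend to start from. The function names and random weights are mine.

```python
import numpy as np

def absmax_quantize(w: np.ndarray):
    # symmetric int8 quantisation: scale so the largest |weight| maps to 127
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = absmax_quantize(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```

A single per-tensor scale like this is the crudest version; per-channel or per-group scales are the usual next step.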
Some notes on ViTs, CLIP, and PaliGemma (a minimal patch-embedding sketch follows this list)
- vision transformer by aman-arora
- clip, siglip and paligemma by umar-jamil
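
The input side of a ViT is easy to sketch: split the image into patches and linearly project each one into a token, which a strided convolution does in one step. The 224x224 input, patch size 16, and embed dim 768 below are the usual ViT-Base numbers, assumed here purely for illustration.

```python
import torch
import torch.nn as nn

B, C, H, W, P, D = 1, 3, 224, 224, 16, 768            # patch size P, embed dim D
img = torch.randn(B, C, H, W)

patchify = nn.Conv2d(C, D, kernel_size=P, stride=P)   # one conv = linear projection per patch
tokens = patchify(img)                                 # (B, D, H/P, W/P)
tokens = tokens.flatten(2).transpose(1, 2)             # (B, num_patches, D)
print(tokens.shape)                                    # torch.Size([1, 196, 768])
```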
Some further extra reading (less truly evergreen, but still useful; a tiny RoPE sketch follows this list)
- extending the RoPE by eleutherai
- Flash Attention by Aleksa Gordić
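
Since RoPE is the subject of that EleutherAI post, here is a minimal sketch of the basic rotation it extends, written in the split-into-halves ("rotate half") convention. The shapes and base value are assumptions, and real implementations precompute and cache the cos/sin tables.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (seq_len, dim), dim even; rotate pairs of dims by position-dependent angles
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)     # per-pair frequency
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs  # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]          # pair dim i with dim i + half
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(8, 64)
print(rope(q).shape)  # torch.Size([8, 64])
```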