SQLite Internals

There's a comfort with being able to read through a spec or a code repository and know that you've covered the full breadth of a tool. That's why I love SQLite's 130KLOC core code base. It's something that you can read through in a long weekend if, ya know, that's what you do on weekends. post

Varints use a simple trick. The high bit is used as a flag to indicate if there are more bytes to be read and the other 7 bits are our actual data. To represent 1,000, we start with its binary representation split into 7 bit chunks.

SQLite groups rows together into 4KB chunks called "pages". Why 4KB? That's what file systems typically use as their page size so keeping everything aligned reduces page fetches from disk. Disks are usually the slowest part of a database so limiting page fetches can have a huge performance win.

SQLite is structured as a b-tree, which is a data structure where each node can point to two or more child nodes and the values in these child nodes are all sorted. There are a ton of different variants of b-trees but the one that SQLite uses is called a b+tree.

At a tree depth of 4, we can hold about 500³ leaf pages, or about 476GB. That means we only need to read 4 pages to find a given record—even in a huge database.

I'll be writing more on SQLite internals in future posts—from rollback journals to write-ahead logs to the SQLite virtual machine.