Changelog: New Images, Search

Two pieces of infrastructure went onto the site this weekend. Both make a similar choice (static and deterministic over dynamic and server-bound) and both turned out smaller than I expected. A quick tour of each.

(Not worth mentioning but nice: new sidenotes; KaTeX for math in posts.)

The new feature images

Stock photography is bad. AI illustration looks like everyone else's AI illustration. Much as I would like to commission art for a personal blog, that would be unhinged. So with Claude's help I built a small Cloudflare Worker that takes a post slug^[1], hashes it, and renders a deterministic pixel-art SVG from the bits.

The system has roughly the shape of a tiny pattern language. Each image is two to four non-overlapping panels, laid out by binary space partitioning^[2] with a 24px outer margin and 16px gaps between them. Each panel runs one of ten strategies (grid, quilt, checker, strata, columns, field, plus the more characterful gravity, chaotic, scatter, clusters) and draws from a vocabulary of seven atomic marks.

About forty percent of panels run a secondary pass with a different mark on a palette subset—producing a sense of two systems negotiating inside one rectangle.
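The panel layout described above can be sketched in a few lines. This is a minimal illustration, not the code from the repository: the `Rect` type, the `partition` function, the `nextBit` callback, and the rule of always splitting the largest remaining panel are all assumptions made for the example. Only the 24px margin and 16px gap come from the description.

```typescript
// Hypothetical sketch of a binary space partition with a margin and gaps.
interface Rect { x: number; y: number; w: number; h: number; }

const MARGIN = 24; // outer margin in px (from the post)
const GAP = 16;    // gap between panels in px (from the post)

/**
 * Split a canvas into `count` panels.
 * `nextBit` supplies the hash-driven axis choice: 0 cuts vertically
 * (left/right halves), 1 cuts horizontally (top/bottom halves).
 */
function partition(
  width: number,
  height: number,
  count: number,
  nextBit: () => number,
): Rect[] {
  const panels: Rect[] = [
    { x: MARGIN, y: MARGIN, w: width - 2 * MARGIN, h: height - 2 * MARGIN },
  ];
  while (panels.length < count) {
    // Assumption: always split the largest panel (first one wins ties).
    let idx = 0;
    for (let i = 1; i < panels.length; i++) {
      if (panels[i].w * panels[i].h > panels[idx].w * panels[idx].h) idx = i;
    }
    const p = panels[idx];
    const vertical = nextBit() === 0;
    // Each half gives up half of the gap so the two never touch.
    const halfW = (p.w - GAP) / 2;
    const halfH = (p.h - GAP) / 2;
    const first: Rect = vertical
      ? { x: p.x, y: p.y, w: halfW, h: p.h }
      : { x: p.x, y: p.y, w: p.w, h: halfH };
    const second: Rect = vertical
      ? { x: p.x + halfW + GAP, y: p.y, w: halfW, h: p.h }
      : { x: p.x, y: p.y + halfH + GAP, w: p.w, h: halfH };
    panels.splice(idx, 1, first, second);
  }
  return panels;
}
```

Calling `partition(1200, 630, 3, nextBit)` on a hypothetical 1200 by 630 canvas returns three rectangles that never overlap, because every cut removes a 16px strip between the two halves.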

The math underneath is just bit reading. SHA-256^[3] of the slug gives 256 bits of pseudo-random output, and the renderer doesn't really generate from that seed so much as address into it. Every design decision (strategy, mark, palette pick, density, cell size, which cells get filled) pulls a few bits off the stream, treats them as a number, and uses that number to look up an answer from a fixed list of options. Read three bits, get a number from 0 to 7, pick one of eight options. Read five bits, get a number from 0 to 31, pick a palette entry. Because the hash is fixed and the reads happen in a fixed order, the whole process is deterministic.
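To make the "pull a few bits off the stream" idea concrete, here is a small sketch of a bit reader. It is an illustration rather than the renderer's actual code: `bitStream` is a hypothetical helper, the most-significant-bit-first order is an assumption, and so is wrapping around to the start when the bytes run out. In the real pipeline the bytes would be the SHA-256 digest of the slug; a fixed two-byte array stands in here so the example needs no dependencies.

```typescript
// Hypothetical helper: returns a function that reads the next n bits
// of `bytes` as an unsigned integer, most significant bit first.
function bitStream(bytes: Uint8Array): (n: number) => number {
  let pos = 0; // index of the next unread bit
  return (n: number): number => {
    let value = 0;
    for (let i = 0; i < n; i++) {
      // Assumption: wrap to the start once every bit has been consumed.
      const bitIndex = pos % (bytes.length * 8);
      const byte = bytes[bitIndex >> 3];
      const bit = (byte >> (7 - (bitIndex & 7))) & 1;
      value = value * 2 + bit; // avoids signed 32-bit shifts
      pos++;
    }
    return value;
  };
}

// Stand-in for a digest: bits 10110100 01100000.
const read = bitStream(new Uint8Array([0b10110100, 0b01100000]));

// Hypothetical list of eight options, indexed by a 3-bit read.
const cellSizes = [4, 6, 8, 10, 12, 16, 20, 24];

const first = read(3);           // bits 101   -> 5
const second = read(5);          // bits 10100 -> 20
const cellSize = cellSizes[read(3)]; // bits 011 -> index 3 -> 10
```

Two readers built over the same bytes return the same sequence of numbers, which is the whole source of the determinism: nothing in the reader depends on time, randomness, or outside state.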

For example, when a post has a particular tag, the math weights the tag color eight, the two color neighbors five each, and the other brand colors and a muted accent one each, for a total weight of twenty. So picking a color is "read five bits, take mod twenty^[4], see which bucket the integer lands in": $n \in [0, 32)$, index by $n \bmod 20$ into a length-20 array. The nominal probabilities fall out as $P_\text{tag} = 8/20$, $P_\text{neighbor} = 5/20$ for each of the two neighbors, and $1/20$ each for the distant arcs and the muted accent. (Nominal because 32 is not a multiple of 20: reads of 20 through 31 wrap back onto indices 0 through 11, so entries near the front of the array come up a little more often.) Over a few hundred draws across an image, the tag color will usually be dominant. Per-cell decisions like "is this grid square filled" or "does this column get a stripe" are a weighted coin flip against a density threshold that itself came from earlier reads. Read a byte^[5] $b_c \in [0, 256)$, fill cell $c$ if $b_c < \tau$ where $\tau$ is the threshold, then move on.
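The weighted pick and the per-cell threshold test can be sketched as follows. The bucket names, the order of the buckets in the table, and the function names are assumptions for illustration; only the weights (8, 5, 5, 1, 1) and the "five bits mod twenty" rule come from the description above. The sketch also counts how often each bucket is hit across all 32 possible reads, which shows how far the real odds sit from the nominal ones under this particular table order.

```typescript
// Hypothetical bucket names; the weights follow the post.
const WEIGHTS: [string, number][] = [
  ["tag", 8],
  ["neighborA", 5],
  ["neighborB", 5],
  ["distant", 1],
  ["muted", 1],
];

// Expand the weights into a length-20 lookup table.
const TABLE: string[] = [];
for (const [name, weight] of WEIGHTS) {
  for (let i = 0; i < weight; i++) TABLE.push(name);
}

/** Map a 5-bit read n in [0, 32) to a palette bucket. */
function pickColor(n: number): string {
  return TABLE[n % TABLE.length];
}

/** Count how many of the 32 equally likely reads land in each bucket. */
function hitsOutOf32(): Map<string, number> {
  const hits = new Map<string, number>();
  for (let n = 0; n < 32; n++) {
    const name = pickColor(n);
    hits.set(name, (hits.get(name) ?? 0) + 1);
  }
  return hits;
}

/** Per-cell decision: a byte b in [0, 256) fills the cell when b < tau. */
function isFilled(b: number, tau: number): boolean {
  return b < tau;
}
```

With the table ordered as above, `hitsOutOf32()` reports 16 hits for the tag color, 9 and 5 for the two neighbors, and 1 each for the remaining buckets. The tag color therefore lands at 16/32 rather than 8/20, and the two neighbors are no longer equal. Reading enough bits to make the range a multiple of twenty, or rejecting reads of 20 and above, would remove that skew if it ever mattered.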

$$\text{slug} \xrightarrow{\text{SHA-256}} \text{seed} \xrightarrow{\text{read}} \text{choices} \xrightarrow{\text{render}} \text{SVG}$$

Same slug → same bits → same sequence of reads → byte-identical^[6] SVG. I don't want my post art to depend on which image model was state-of-the-art the month I published, and I don't want it to drift if I regenerate it next year. The inspiration is Anna Lucia, whose generative^[7] work I saw last week and felt immediately drawn to. She even does fiber art with the generative output (my mom is a fiber artist), so I had to borrow the general feel of her work for this, while adapting it to the look of this site.

Code is at github.com/cpj-fyi/art, MIT licensed. One Cloudflare Worker, ~1500 lines of TypeScript, no model calls, no manual design step.

The new search

Most Ghost blogs have the standard search tool built in, but it only searches titles. I wanted the full text included so that folks can find even the citations that are relevant to their work. So again, with Claude's help, I built full-text site search running on Pagefind. Pagefind crawls the rendered HTML of a site, builds a search index, and ships that index as a folder of static files. Searches run entirely in the browser via WebAssembly^[8]. The index is just files that live on GitHub; the query runs on the reader's machine.

The architecture: a GitHub Action runs every night, builds the Pagefind index against cpj.fyi, and pushes the result to a gh-pages^[9] branch where it gets served as static assets. The theme's JavaScript loads the Pagefind library from that index folder when someone opens the search box. When you type, the engine reads only the index chunks relevant to your query and returns ranked results.
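The theme-side hook can be quite small. The sketch below is not the code from the theme: `createSearch`, the `Loader` type, and the result limit are hypothetical, and the two interfaces describe only the slice of Pagefind's browser API used here (a `search()` call that returns results, each with an async `data()` method). The loader is passed in so the library is fetched only when someone first searches.

```typescript
// Minimal slice of the Pagefind browser API assumed by this sketch.
interface PagefindResultData {
  url: string;
  excerpt: string;
  meta: Record<string, string>;
}
interface PagefindModule {
  search(query: string): Promise<{
    results: { data(): Promise<PagefindResultData> }[];
  }>;
}

// Hypothetical: a function that fetches the Pagefind module on demand.
type Loader = () => Promise<PagefindModule>;

/** Build a search function that loads Pagefind lazily, at most once. */
function createSearch(load: Loader, limit = 5) {
  let mod: Promise<PagefindModule> | undefined;
  return async function search(query: string): Promise<PagefindResultData[]> {
    const q = query.trim();
    if (q === "") return []; // nothing typed yet, skip the load
    if (!mod) mod = load();  // first real query triggers the load
    const pagefind = await mod;
    const { results } = await pagefind.search(q);
    // Each result's content is fetched separately, so only resolve
    // the ones that will actually be shown.
    return Promise.all(results.slice(0, limit).map((r) => r.data()));
  };
}
```

In a browser the loader would be a dynamic import of the `pagefind.js` file inside the published index folder, for example `() => import("/pagefind/pagefind.js")`, with the path adjusted to wherever the gh-pages branch is served. That path is an assumption here, not the site's actual URL.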

Pagefind splits the index into small chunks keyed by the words they contain, so the browser fetches only the chunks matching what you typed. For a site my size the savings are academic, but it's why this approach scales to documentation sites with thousands of pages, and I think it helps the search feel especially snappy. Try it!

Building nightly means search can be up to twenty-four hours stale, but this is fine: I almost never post daily like I say I will, and new posts almost never get searched in their first day (readers are reading them, not looking for them). If it ever started to bother me I could trigger the workflow on publish via webhook^[10]; for now, the cron^[11] is enough.

Pagefind is by CloudCannon, MIT licensed. The site theme is at cpj-fyi/cpj-theme — one workflow file, a small JavaScript hook, and the search box markup itself.

^[1] A slug is the URL ending that identifies a post: the-end-of-role-clarity is the slug for the post at cpj.fyi/the-end-of-role-clarity. Every post on the site has one and it's unique.
^[2] Binary space partitioning is a recursive splitting algorithm: take a rectangle, slice it in half along an axis chosen by the hash, then do the same to each half until you have the target number of panels. Tiles cleanly, no gaps or overlaps.
^[3] SHA-256 is a cryptographic hash function. Feed it any input (a slug, a book, a movie) and it returns a 256-bit fingerprint that's effectively unique to that input. Same input always produces the same output; trivially small input changes produce wildly different outputs.
^[4] Mod is modular arithmetic. n mod 20 is "the remainder when n is divided by 20." If n = 35, then n mod 20 = 15. It's how you fold a big integer into a small range.
^[5] A byte is an integer between 0 and 255, stored as eight bits. Reading "a byte" from the hash stream means consuming the next eight bits and treating them as a number in that range.
^[6] Byte-identical means the two SVGs aren't just visually identical; every character of the file is the same. Diff them as text and you get nothing back.
^[7] Generative art is produced by code following rules rather than drawn by hand. The artist designs the system; the system makes the pieces.
^[8] WebAssembly (often shortened to WASM) is a way to run compiled code, written in languages like Rust or C++, inside a browser at close to native speed. Pagefind's search engine is a WASM module that loads alongside the JavaScript on the page.
^[9] gh-pages is a special branch on a GitHub repository: anything you push there gets auto-published as a free website at a github.io URL. A common pattern for hosting static assets that need to live on the public internet without standing up a real server.
^[10] A webhook is the opposite trigger model from a scheduled job: instead of running on a clock, it runs the moment something happens (a post is published, a commit is pushed) by having one system call a URL on another.
^[11] Cron is the scheduled-job model: "run this every night at 4am."
