LOGBOOK LOG-397
WORKING · SOFTWARE
RUST · RATATUI · OLLAMA · LLAMA3 · TUI · CLI · SECOND-BRAIN

Building Zion — Rust + Ratatui + Ollama

Architecture and key decisions behind Zion, a terminal second-brain CLI built in Rust with a ratatui TUI and a local LLM backend.

What It Is

Zion is a terminal app — a second brain that lives in the shell. You talk to it, it searches your notes semantically, generates new ones, and lets you write without leaving the terminal. No Electron, no web server, no cloud.

Stack: Rust for the runtime, ratatui for the TUI, Ollama as the local LLM backend running llama3.1:8b, and nomic-embed-text for semantic search.

Why Rust

A TUI that streams LLM tokens needs to handle async I/O, render at ~60fps, and stay responsive while waiting on HTTP. Rust’s async story (tokio) handles this cleanly. The alternative — Python plus a TUI library — works, but GIL contention and interpreter startup time both matter when the app redraws on every keypress.

The borrow checker is genuinely painful for UI state (shared mutable state is its natural enemy), but the payoff is zero-cost async and a binary that starts in milliseconds.

Ratatui

Ratatui is an immediate-mode TUI framework. Every frame, you describe what the screen should look like — layout constraints, widgets, styles — and it diffs against the terminal buffer and flushes only the changes.

use ratatui::layout::{Constraint, Direction, Layout};

let chunks = Layout::default()
    .direction(Direction::Vertical)
    .constraints([
        Constraint::Min(3),    // chat area
        Constraint::Length(3), // input bar
        Constraint::Length(1), // status line
    ])
    .split(frame.area());

The layout engine is constraint-based: Min, Max, Length, Percentage, Ratio. Nest layouts to build columns inside rows. It composes cleanly.
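Nesting in practice means applying another Layout to one of the chunks. A sketch (the 70/30 split is illustrative, not Zion’s actual layout):

// Split the chat area (chunks[0]) into two columns.
let columns = Layout::default()
    .direction(Direction::Horizontal)
    .constraints([
        Constraint::Percentage(70), // conversation
        Constraint::Percentage(30), // retrieved-notes sidebar
    ])
    .split(chunks[0]);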

The hard part isn’t layout — it’s text. Ratatui works in Line and Span units (styled text runs), not raw strings. Markdown rendering, word-wrap, and scroll position all have to be computed manually and cached, or the app stutters on large chat histories.
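For instance, a prompt line with a styled prefix is built from spans, not a string (a minimal sketch):

use ratatui::style::{Color, Modifier, Style};
use ratatui::text::{Line, Span};

// One visual line made of two styled runs.
let line = Line::from(vec![
    Span::styled("you> ", Style::default().fg(Color::Cyan).add_modifier(Modifier::BOLD)),
    Span::raw("what did I write about tokio channels?"),
]);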

Streaming Tokens

Ollama’s generate API (POST /api/generate) streams newline-delimited JSON:

{"response": "The", "done": false}
{"response": " cat", "done": false}
...
{"response": "", "done": true}

Each chunk arrives over HTTP as the model generates. The Rust side reads this with reqwest streaming + futures::StreamExt, and sends each token over a tokio::sync::mpsc channel to the UI loop:

// `stream` yields already-parsed chunks; see the expanded decode sketch below.
while let Some(chunk) = stream.next().await {
    let text = chunk?.response;
    tx.send(AppEvent::Token(text)).await?;
}
tx.send(AppEvent::Done).await?;
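That loop glosses over the decode step. An expanded sketch of the whole read, raw bytes in and tokens out, assuming anyhow for errors and an illustrative GenerateChunk struct (reqwest’s stream feature provides bytes_stream):

use futures::StreamExt;
use serde::Deserialize;
use tokio::sync::mpsc;

#[derive(Deserialize)]
struct GenerateChunk {
    response: String, // the token text
    done: bool,       // true on the final chunk
}

// Expanded version of the loop above: split the body on newlines,
// deserialize each line, forward tokens to the UI.
async fn stream_tokens(resp: reqwest::Response, tx: mpsc::Sender<AppEvent>) -> anyhow::Result<()> {
    let mut body = resp.bytes_stream();
    let mut buf: Vec<u8> = Vec::new();
    while let Some(part) = body.next().await {
        buf.extend_from_slice(&part?);
        // Each complete line in the buffer is one JSON object.
        while let Some(pos) = buf.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = buf.drain(..=pos).collect();
            let chunk: GenerateChunk = serde_json::from_slice(&line[..pos])?;
            tx.send(AppEvent::Token(chunk.response)).await?;
            if chunk.done {
                tx.send(AppEvent::Done).await?;
                return Ok(());
            }
        }
    }
    tx.send(AppEvent::Done).await?;
    Ok(())
}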

The UI thread receives tokens, appends them to the current assistant message, marks the chat dirty, and re-renders. The result: text appears word by word in the terminal, same as any web chat UI — but entirely local.

Semantic Search

Before sending a query to the LLM, Zion retrieves the most relevant notes. Each note is embedded at index time using nomic-embed-text (also running via Ollama), and embeddings are cached to disk as JSON.
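Indexing calls Ollama’s embeddings endpoint once per note; request and (trimmed) response look like:

POST /api/embeddings
{"model": "nomic-embed-text", "prompt": "<full note text>"}

{"embedding": [0.021, -0.154, 0.087, ...]}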

At query time: embed the user’s message → cosine similarity against all cached embeddings → take top-K → inject into the LLM prompt as context. Retrieval-augmented generation, local, no vector database needed.

// Cosine similarity between two embedding vectors (assumes non-zero norms).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}
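Top-K is then a scan-and-sort over the cache. A sketch, assuming the cache is a slice of (id, embedding) pairs:

// Return the ids of the k most similar notes, best first.
fn top_k<'a>(query: &[f32], notes: &'a [(String, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = notes
        .iter()
        .map(|(id, emb)| (cosine(query, emb), id.as_str()))
        .collect();
    // total_cmp gives a total order over f32, highest similarity first.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(k).map(|(_, id)| id).collect()
}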

At ~110 notes, a linear scan over embeddings is fast enough — sub-millisecond. No HNSW index needed yet.
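The retrieved notes are spliced into the prompt ahead of the question. A sketch; the template wording is an assumption, not Zion’s exact format:

// Prepend top-K notes as context before the user's question.
fn build_prompt(query: &str, notes: &[&str]) -> String {
    let mut p = String::from("Answer using these notes where relevant.\n\n");
    for (i, note) in notes.iter().enumerate() {
        p.push_str(&format!("--- Note {} ---\n{}\n\n", i + 1, note));
    }
    p.push_str(&format!("Question: {query}\nAnswer:"));
    p
}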

App State

The App struct owns all mutable state. The event loop, sketched after this list, is a tight loop that:

  1. Polls crossterm for keyboard events
  2. Drains the mpsc channel for LLM tokens and async results
  3. Mutates App state
  4. Calls terminal.draw(|frame| ui(frame, &app))
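A sketch of that loop (handle_key, handle_event, and should_quit are assumed helpers, not Zion’s exact names):

use std::time::Duration;
use crossterm::event::{self, Event};

loop {
    // 1. Keyboard: poll with a short timeout so the loop never blocks.
    if event::poll(Duration::from_millis(16))? {
        if let Event::Key(key) = event::read()? {
            app.handle_key(key);
        }
    }
    // 2. Drain tokens and async results that arrived since the last frame.
    while let Ok(ev) = rx.try_recv() {
        app.handle_event(ev); // 3. mutate state
    }
    // 4. Render the current state.
    terminal.draw(|frame| ui(frame, &app))?;
    if app.should_quit {
        break;
    }
}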

No reactive framework, no signals — just a state machine and a render function. Simple to reason about, easy to debug. The tradeoff is that complex interactions (pending confirms, modal dialogs, writing mode) accumulate as boolean flags and Option fields on App, which gets unwieldy.
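The shape of that struct, roughly (field names are illustrative):

struct App {
    messages: Vec<String>, // chat history
    input: String,         // input bar contents
    dirty: bool,           // does the chat need repainting?
    should_quit: bool,
    // The unwieldy part: modal state as flags and Options.
    writing_mode: bool,
    pending_confirm: Option<String>, // e.g. "delete note X?"
}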

What’s Hard

Shared state across async tasks. The LLM call runs on a spawned tokio task. It can’t hold a reference to App — tokio::spawn requires 'static, so the borrow checker won’t allow it. The solution: pass only what the task needs by value, and communicate results back via channel. Clean, but it takes discipline not to reach for Arc<Mutex<App>> as a shortcut.
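In practice the spawn looks something like this (a sketch; build_prompt and stream_tokens are the helpers sketched earlier, and AppEvent::Error is an assumed variant):

// Clone owned inputs into the task; it never borrows App.
let prompt = build_prompt(&query, &top_notes);
let tx = tx.clone();
tokio::spawn(async move {
    let resp = reqwest::Client::new()
        .post("http://localhost:11434/api/generate")
        .json(&serde_json::json!({
            "model": "llama3.1:8b",
            "prompt": prompt,
            "stream": true,
        }))
        .send()
        .await;
    match resp {
        Ok(resp) => { let _ = stream_tokens(resp, tx).await; }
        Err(e) => { let _ = tx.send(AppEvent::Error(e.to_string())).await; }
    }
});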

Text layout. Ratatui’s Paragraph can word-wrap, but it won’t tell you where the wraps landed. Computing visual line positions, cursor coordinates, and scroll offsets for the writing-mode editor required essentially reimplementing a small text layout engine inside the app.
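A flavor of that engine: greedy word wrap over display widths rather than byte lengths. A sketch using the unicode-width crate:

use unicode_width::UnicodeWidthStr;

// Break text into visual lines at most `max` columns wide, wrapping at spaces.
// Oversized words get a line of their own.
fn wrap(text: &str, max: usize) -> Vec<String> {
    let mut lines = vec![String::new()];
    for word in text.split_whitespace() {
        let cur = lines.last_mut().unwrap();
        if cur.is_empty() || cur.width() + 1 + word.width() <= max {
            if !cur.is_empty() {
                cur.push(' ');
            }
            cur.push_str(word);
        } else {
            lines.push(word.to_string());
        }
    }
    lines
}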

Terminal repainting. On fast machines the render loop keeps up and streamed tokens look smooth; on slower ones, redrawing on every loop iteration caused visible stutter. The fix was to mark the chat dirty only when new content arrives and skip repaints otherwise.
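Concretely, the unconditional draw in the event loop becomes gated (a sketch; dirty is the flag on App from above):

// Repaint only when state actually changed since the last frame.
if app.dirty {
    terminal.draw(|frame| ui(frame, &app))?;
    app.dirty = false;
}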