Semantic Search Layer for the Second-Brain Agent
Swapping keyword search for vector embeddings in the Marimo second-brain notebook — LanceDB as the local vector store, Ollama's nomic-embed-text for embeddings, and two new tools: find_connections and write_bench_note.
What Changed
The previous version of the agent used keyword matching — `query.lower() in text.lower()`. Functional, but brittle. Search for “plasticity” and you miss a note that says “adaptability”. Search for “energy storage” and you miss everything filed under “battery”.
This version replaces that with a proper semantic index: `nomic-embed-text` embeddings via Ollama, stored in LanceDB at `~/.zion/index`. The agent now finds notes by meaning, not character overlap.
The Index
```python
def embed(text: str) -> list[float]:
    res = ollama.embeddings(model="nomic-embed-text", prompt=text[:2000])
    return res["embedding"]
```
```python
def build_index():
    db = lancedb.connect(str(DB_PATH))
    records = []
    for collection in ["bench", "ideas", "signals", "library"]:
        folder = CONTENT_PATH / collection
        for f in folder.glob("*.md"):
            text = f.read_text()
            records.append({
                "slug": f.stem,
                "collection": collection,
                "vector": embed(text),
                "text": text[:500]
            })
    db.create_table("notes", data=records, mode="overwrite")
```
`nomic-embed-text` runs fully local via Ollama — no API call, no data leaving the machine. The index rebuilds with `mode="overwrite"`, so you call `build_index()` whenever notes are added. LanceDB stores everything as an Arrow table on disk; subsequent reads are fast without re-embedding.
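A full rebuild re-embeds every note on every call, which gets slow as the vault grows. One way to cheapen it, sketched here rather than taken from the notebook, is to track file mtimes in a small manifest and only re-embed what changed; the fresh records could then go through LanceDB's `table.add()` instead of a full `create_table`. The manifest location and helper names below are hypothetical:

```python
import json
from pathlib import Path

def changed_notes(content_path: Path, manifest_path: Path) -> list[Path]:
    """Return markdown files whose mtime differs from the recorded one."""
    try:
        seen = json.loads(manifest_path.read_text())
    except FileNotFoundError:
        seen = {}  # first run: everything counts as changed
    stale = []
    for f in sorted(content_path.rglob("*.md")):
        key = str(f.relative_to(content_path))
        if seen.get(key) != f.stat().st_mtime:
            stale.append(f)
    return stale

def record_notes(content_path: Path, manifest_path: Path) -> None:
    """Snapshot current mtimes so the next run skips unchanged files."""
    seen = {str(f.relative_to(content_path)): f.stat().st_mtime
            for f in content_path.rglob("*.md")}
    manifest_path.write_text(json.dumps(seen))
```

Only the files returned by `changed_notes()` would need a trip through `embed()`.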
Semantic Search Tool
```python
@tool
def search_notes(query: str, collection: str | None = None) -> str:
    """Search notes by meaning, not just keywords.
    Leave collection empty to search across all collections."""
    db = lancedb.connect(str(DB_PATH))
    table = db.open_table("notes")
    vector = embed(query)
    results = table.search(vector).limit(8)
    if collection:
        results = results.where(f"collection = '{collection}'")
    rows = results.to_list()
    output = []
    for r in rows:
        output.append(f"[{r['collection']}] {r['slug']}\n{r['text'][:150]}")
    return "\n\n".join(output)
```
The query gets embedded at runtime, compared against the stored vectors, and the top 8 matches come back. The optional collection filter applies post-search via LanceDB's `where()` — same vector math, narrower result set.
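Under the hood that "vector math" is plain nearest-neighbour ranking. A toy version of the same idea, in pure Python with made-up two-dimensional vectors standing in for the 768 dimensions the real embeddings have:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical direction, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], notes: dict[str, list[float]], k: int = 8) -> list[str]:
    """Rank note slugs by similarity to the query vector, most similar first."""
    ranked = sorted(notes, key=lambda slug: cosine(query, notes[slug]), reverse=True)
    return ranked[:k]
```

LanceDB does the equivalent ranking over the stored table, just vectorised and on disk.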
find_connections — Cross-Domain Discovery
The more interesting new tool:
```python
@tool
def find_connections(topic_a: str, topic_b: str) -> str:
    """Find notes that connect two different topics or domains."""
    db = lancedb.connect(str(DB_PATH))
    table = db.open_table("notes")
    results_a = table.search(embed(topic_a)).limit(6).to_list()
    results_b = table.search(embed(topic_b)).limit(6).to_list()
    slugs_a = {r["slug"]: r for r in results_a}
    slugs_b = {r["slug"]: r for r in results_b}
    overlap = set(slugs_a.keys()) & set(slugs_b.keys())
    # notes appearing in both → bridge concepts
```
Two separate searches, set intersection. A note showing up near both “feedback loops” and “hormonal regulation” is probably doing something interesting at the boundary. This is the use case keyword search can’t touch — you’d never think to search for the exact term that bridges two domains.
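Stripped of the vector store, the bridge detection is just an intersection over two ranked slug lists. A toy version with hypothetical slugs, ordered so that notes ranking well for both topics come first (an ordering the tool above doesn't bother with):

```python
def bridges(hits_a: list[str], hits_b: list[str]) -> list[str]:
    """Slugs present in both ranked lists, best combined rank first."""
    common = set(hits_a) & set(hits_b)
    # lower index = better rank, so sort by the sum of positions
    return sorted(common, key=lambda s: hits_a.index(s) + hits_b.index(s))
```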
write_bench_note — Agent-Authored Notes
The agent can now write notes directly:
```python
@tool
def write_bench_note(slug, title, description, topic, tags, status, body) -> str:
    today = date.today().isoformat()
    content = f"""---
title: "{title}"
...
---
{body}"""
    path = CONTENT_PATH / "bench" / f"{slug}.md"
    path.write_text(content)
    return f"Written: {slug}"
```
The system prompt instructs it to write in first person, short paragraphs, em-dashes over parentheses, end with an open question rather than a conclusion. Whether the output actually sounds like anything is a different matter — but the loop is closed. The agent can read notes, search across them, find connections, and write new ones.
What the Tool Set Enables
| Action | Tool |
|---|---|
| “how many bench notes?” | list_notes |
| “anything on fermentation?” | search_notes |
| “what bridges thermodynamics and biology?” | find_connections |
| read a specific file | file_read (strands_tools) |
| write a new note | write_bench_note |
Together this is closer to a read-write second brain than a search interface — it can ingest, retrieve, connect, and generate within the same conversation.
What’s Still Open
nomic-embed-text produces 768-dimensional vectors. That’s fine for hundreds of notes; unclear how it holds up at thousands without an HNSW index or partitioning strategy. LanceDB supports ANN indexing — haven’t benchmarked whether it’s needed here yet. The find_connections overlap logic is also naive: set intersection of top-6 results means a note has to rank highly for both topics independently. A weighted union with similarity scores would be more principled.
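A sketch of what that weighted union could look like, in pure Python over rows shaped like LanceDB's `to_list()` output. It assumes the `_distance` field LanceDB attaches to search hits (smaller means closer) and a hypothetical `1/(1+d)` distance-to-similarity transform:

```python
def connection_scores(rows_a: list[dict], rows_b: list[dict]) -> list[tuple[str, float]]:
    """Rank every slug seen for either topic by summed similarity to both."""
    sim_a = {r["slug"]: 1.0 / (1.0 + r["_distance"]) for r in rows_a}
    sim_b = {r["slug"]: 1.0 / (1.0 + r["_distance"]) for r in rows_b}
    scores = {
        slug: sim_a.get(slug, 0.0) + sim_b.get(slug, 0.0)
        for slug in set(sim_a) | set(sim_b)
    }
    # a note moderately close to both topics tends to outrank
    # one that is close to only one of them
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Unlike the set intersection, this never returns empty: single-topic hits stay in the ranking, just below the genuine bridges.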
Full notebook — semantic-brain.py
Cell: Imports
```python
import os
from dotenv import load_dotenv
import marimo as mo
import ollama
import lancedb
import requests
from pathlib import Path
from datetime import date
```
Cell: Load env
```python
load_dotenv()
ANTHROPIC_KEY = os.getenv("ANTHROPIC_KEY")
CONTENT_PATH = Path(os.getenv("CONTENT_PATH"))
DB_PATH = Path.home() / ".zion" / "index"
DB_PATH.mkdir(parents=True, exist_ok=True)
```
Cell: Strands imports
```python
from strands import Agent, tool
from strands.models.anthropic import AnthropicModel
from strands_tools import file_read
```
Cell: Model
```python
model = AnthropicModel(
    client_args={"api_key": ANTHROPIC_KEY},
    max_tokens=1028,
    model_id="claude-haiku-4-5-20251001",
    params={"temperature": 0.7}
)
```
Cell: Embed function
```python
def embed(text: str) -> list[float]:
    res = ollama.embeddings(model="nomic-embed-text", prompt=text[:2000])
    return res["embedding"]
```
Cell: Build index
```python
def build_index():
    db = lancedb.connect(str(DB_PATH))
    records = []
    for collection in ["bench", "ideas", "signals", "library"]:
        folder = CONTENT_PATH / collection
        for f in folder.glob("*.md"):
            text = f.read_text()
            vector = embed(text)
            records.append({
                "slug": f.stem,
                "collection": collection,
                "vector": vector,
                "text": text[:500]
            })
            print(f"Indexed: {collection}/{f.stem}")
    db.create_table("notes", data=records, mode="overwrite")

build_index()
```
Cell: list_notes tool
```python
@tool
def list_notes(collection: str = "bench") -> str:
    """Count notes in a content collection. Valid collections: bench, ideas, signals, library, engine.
    Only accepts 'collection' as a parameter."""
    folder = CONTENT_PATH / collection
    count = len(list(folder.glob("*.md")))
    return f"{collection}: {count} notes"
```
Cell: search_notes tool
```python
@tool
def search_notes(query: str, collection: str | None = None) -> str:
    """Search notes by meaning, not just keywords.
    Leave collection empty to search across all collections.
    Returns top matching slugs with snippets."""
    db = lancedb.connect(str(DB_PATH))
    table = db.open_table("notes")
    vector = embed(query)
    results = table.search(vector).limit(8)
    if collection:
        results = results.where(f"collection = '{collection}'")
    rows = results.to_list()
    if not rows:
        return "No matches found."
    output = []
    for r in rows:
        output.append(f"[{r['collection']}] {r['slug']}\n{r['text'][:150]}")
    return "\n\n".join(output)
```
Cell: find_connections tool
```python
@tool
def find_connections(topic_a: str, topic_b: str) -> str:
    """Find notes that connect two different topics or domains.
    Searches for each topic, then identifies overlapping themes."""
    db = lancedb.connect(str(DB_PATH))
    table = db.open_table("notes")
    results_a = table.search(embed(topic_a)).limit(6).to_list()
    results_b = table.search(embed(topic_b)).limit(6).to_list()
    slugs_a = {r["slug"]: r for r in results_a}
    slugs_b = {r["slug"]: r for r in results_b}
    overlap = set(slugs_a.keys()) & set(slugs_b.keys())
    output = []
    if overlap:
        output.append("**Notes connecting both:**")
        for slug in overlap:
            r = slugs_a[slug]
            output.append(f"- {r['collection']}/{slug}: {r['text'][:100]}")
    return "\n".join(output) if output else "No connecting notes found."
```
Cell: write_bench_note tool
```python
@tool
def write_bench_note(
    slug: str,
    title: str,
    description: str,
    topic: str,
    tags: list[str],
    status: str,
    body: str,
) -> str:
    """Write a new bench note.
    topic: electronics, robotics, hardware, software, biology, physics, general.
    status: exploring, working, complete, shelved.
    Use ₹ for any prices, never USD."""
    today = date.today().isoformat()
    content = f"""---
title: "{title}"
description: "{description}"
date: {today}
status: {status}
topic: {topic}
tags: {tags}
---
{body}"""
    path = CONTENT_PATH / "bench" / f"{slug}.md"
    path.write_text(content)
    return f"Written: {slug}"
```
Cell: Agent
```python
agent = Agent(
    model=model,
    tools=[list_notes, search_notes, file_read, write_bench_note, find_connections],
    system_prompt="""You are a second brain assistant embedded in a personal content repo.
The owner is an engineer, entrepreneur, and educator who thinks across domains —
electronics, biology, philosophy, history, AI. He reads deeply and notices connections others miss.

## Your job
Listen to what he shares. When something is worth capturing, write it in his voice — not a
textbook summary, but how he thinks about it. First person, sharp, personal. What surprised him.
What it connects to. What he wants to dig into next.

## Writing rules
- Short paragraphs, not bullet-heavy
- Em-dashes over parentheses
- No hedging. No "it's worth noting that"
- Capture the *angle*, not the definition
- End bench notes with what's still open — the next question, not a conclusion

## Tool rules
- Always use search_notes before answering questions about existing notes
- Always use file_read to get actual content before summarising anything
- Never make up note contents
- When writing a bench note: ask if unsure about topic or status, but don't over-ask
- Tags: plain list, no quotes — [electronics, battery, energy]

## Content rules
- Prices always in ₹, never USD
- Dates in YYYY-MM-DD
- status: exploring | working | complete | shelved
- topic: electronics | robotics | hardware | software | biology | physics | general | philosophy
  | neuroscience | psychology | economics | history | creativity"""
)
```
Cell: Chat UI
```python
def chat_respond(messages):
    last = messages[-1].content
    response = agent(last)
    return str(response)

chat = mo.ui.chat(chat_respond)
chat
```