Memory Service

The constellation — every memory a node, every connection a reason to remember

The Memory Service is a Rust binary (port 42069) that gives every Sanctum agent persistent memory without requiring any of them to manage it. It replaces the memory-vault-mcp shim with something faster, smaller, and wired directly into the proxy, so that remembering happens as a side effect of thinking — which, if you squint, is how it works for the rest of us too.

Dual storage: SQLite with FTS5 for search, markdown files for humans and Obsidian. The database is fast. The files are legible. The agents don’t care which one you read. You will care, at 2 AM, when you need to understand why Yoda thinks the internet goes down every Thursday.

The proxy (port 4040) already sees every conversation in the house. Making it the memory capture point was less a design decision than an observation: the data was already flowing through the wire. We just started writing it down.

The integration is three hooks, all non-blocking:

  1. Pre-request — The proxy queries sanctum-memory for cached context relevant to the incoming conversation and injects it into the system message. The agent receives memories it didn’t ask for and doesn’t know it received. This is, technically, inception.
  2. Post-response — After streaming the response, the proxy fires an async ingest call with the conversation data. No waiting. No acknowledgment. Fire and forget.
  3. Failure isolation — Memory failures never block or slow requests. If the memory service is down, the proxy sends the request without context and logs a warning. Agents can think without remembering. They just think less well.
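The three hooks above can be sketched as follows. This is an illustrative Python sketch, not the proxy's actual Rust code; the helper names (`fetch_context`, `ingest_async`) and the `/recall` response shape are assumptions, while the host, port, and endpoints come from this page.

```python
import json
import threading
import urllib.request

MEMORY_URL = "http://127.0.0.1:42069"  # sanctum-memory, per this page

def fetch_context(query: str, timeout: float = 0.25) -> str:
    """Pre-request hook: ask /recall for cached context to inject into
    the system message. Any failure degrades to 'no context' instead of
    blocking the request (failure isolation)."""
    try:
        req = urllib.request.Request(
            f"{MEMORY_URL}/recall",
            data=json.dumps({"query": query}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp).get("context", "")
    except OSError:
        # Service down or slow: log a warning, proceed without memories.
        print("warning: memory service unreachable, sending request without context")
        return ""

def ingest_async(conversation: dict) -> None:
    """Post-response hook: fire-and-forget ingest on a background thread.
    No waiting, no acknowledgment."""
    def _send():
        try:
            req = urllib.request.Request(
                f"{MEMORY_URL}/ingest",
                data=json.dumps(conversation).encode(),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req, timeout=2)
        except OSError:
            pass  # errors never surface to the caller
    threading.Thread(target=_send, daemon=True).start()
```

With the service stopped, `fetch_context` returns an empty string and the request proceeds anyway, which is the whole point of the third hook.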

Every memory has a type. The type determines where it lives, how long it survives, and how it’s retrieved.

| Type | Purpose | Example |
|---|---|---|
| `semantic` | Facts, preferences, knowledge | "User prefers terse responses" |
| `episodic` | Events with timestamps | "Internet outage March 23 at 3:39 AM" |
| `procedural` | How-to knowledge, runbooks | "To restart LM Studio, kill the process then…" |
| `observation` | Agent-noted patterns | "Disk usage trending up 2% per week" |
| `session_summary` | Compressed conversation logs | End-of-session distillation |

The distinction between semantic and episodic matters for retrieval. When an agent asks “what does the user prefer,” you search semantic. When it asks “what happened last Thursday,” you search episodic. Conflating them is how you get a memory system that answers “what happened last Thursday” with “the user prefers dark mode.”
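A toy sketch of why the type field matters at retrieval time. The in-memory list and the `search` helper are hypothetical; the real service queries SQLite with FTS5.

```python
from datetime import datetime

# Hypothetical records; real storage is SQLite + FTS5, not a Python list.
memories = [
    {"type": "semantic", "text": "User prefers dark mode"},
    {"type": "episodic", "text": "Internet outage", "at": datetime(2025, 3, 23, 3, 39)},
    {"type": "semantic", "text": "User prefers terse responses"},
]

def search(mem_type: str, keyword: str = "") -> list[dict]:
    """Route the query to the right type index, so 'what happened last
    Thursday' never comes back with 'the user prefers dark mode'."""
    return [
        m for m in memories
        if m["type"] == mem_type and keyword.lower() in m["text"].lower()
    ]
```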

Dual storage, matching the existing vault layout:

| Backend | Role | Format |
|---|---|---|
| SQLite (`.vault.db`) | Search, metadata, indexes | FTS5 full-text, JSON1 metadata |
| Markdown files | Human-readable, git-tracked | YAML frontmatter + body |

The markdown directories — inbox/, knowledge/, events/, procedures/ — are unchanged from the vault. Obsidian still works. Git history still works. The database is the index; the files are the truth.

Every memory gets a score between 0.0 and 1.0. The score determines how long it lives.

Formula: base × source_weight × recency × access_boost × link_boost

| Factor | Calculation | Rationale |
|---|---|---|
| Source weight | user=0.9, system=0.85, claude-code=0.7, openclaw=0.7, HA=0.5 | User-stated facts outrank machine observations |
| Recency | hours^(-0.3) (power-law decay) | Recent memories matter more, but the decay is gentle |
| Access boost | 1 + ln(access_count + 1) | Frequently accessed memories earn protection |
| Link boost | Proportional to inbound wikilinks | Well-connected knowledge survives longer |
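The formula and factors above, as a runnable sketch. Two details are assumptions: the age is floored at one hour (hours^(-0.3) diverges as the age approaches zero), and `link_coeff` is a made-up proportionality constant, since the page only says the link boost is proportional to inbound wikilinks.

```python
import math

SOURCE_WEIGHTS = {"user": 0.9, "system": 0.85, "claude-code": 0.7,
                  "openclaw": 0.7, "ha": 0.5}

def importance(base: float, source: str, age_hours: float,
               access_count: int, inbound_links: int,
               link_coeff: float = 0.05) -> float:
    """base x source_weight x recency x access_boost x link_boost,
    clamped to [0, 1]. link_coeff is an assumed constant."""
    recency = max(age_hours, 1.0) ** -0.3          # power-law decay, gentle
    access_boost = 1 + math.log(access_count + 1)  # 1 + ln(n + 1)
    link_boost = 1 + link_coeff * inbound_links    # proportional to wikilinks
    score = base * SOURCE_WEIGHTS[source] * recency * access_boost * link_boost
    return min(1.0, score)
```

A fresh user-stated fact with `base=1.0` scores 1.0 × 0.9 × 1 × 1 × 1 = 0.9, landing it in the permanent tier; the same fact from Home Assistant would start at 0.5.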

Importance determines lifespan. The system forgets on purpose — and considers this a feature.

| Importance | TTL | Notes |
|---|---|---|
| > 0.8 | Permanent | Core knowledge, user-stated preferences |
| 0.5 – 0.8 | 90 days | Agent-observed patterns, recurring events |
| 0.3 – 0.5 | 30 days | Single observations, transient context |
| < 0.3 | 7 days | Ephemeral session data |

Protection rules: Memories with importance above 0.8 or an access count of 5 or more are exempt from expiry. If the system keeps reaching for a memory, the memory stays. Even if the math says otherwise.

Consolidation runs every 6 hours. The process is hybrid: regex extraction happens immediately; LLM enrichment is deferred to council-27b on a best-effort basis. If the local model is busy or down, consolidation finishes without enrichment and tries again next cycle.

  1. Scan inbox — Find raw notes older than 24 hours
  2. Recompute scores — Update importance for all active memories
  3. LLM enrichment — Extract entities, tags, and relationships via council-27b (best-effort)
  4. Promote — Move consolidated notes to knowledge/, events/, or procedures/
  5. Expire — Apply TTL rules, archive expired notes (retained 90 days)
  6. Enforce caps — Inbox: 300, Knowledge: 1000, Events: 500, Procedures: 200
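Step 6 can be sketched as a rank-and-truncate pass. This is a simplification: the tie-breaking order and what "archive" means for overflow notes are assumptions, since the page only states the numeric caps.

```python
CAPS = {"inbox": 300, "knowledge": 1000, "events": 500, "procedures": 200}

def enforce_cap(notes: list[dict], cap: int) -> tuple[list[dict], list[dict]]:
    """Keep the `cap` highest-importance notes; everything past the cap
    is returned separately for archiving (assumed policy)."""
    ranked = sorted(notes, key=lambda n: n["importance"], reverse=True)
    return ranked[:cap], ranked[cap:]
```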

All endpoints accept and return JSON. The service binds to 127.0.0.1:42069 by default.

| Method | Endpoint | Description |
|---|---|---|
| POST | `/recall` | Context-aware retrieval — returns memories ranked by relevance to a query |
| POST | `/search` | Full-text search with filters (type, tags, date range, source) |
| POST | `/ingest` | Async ingestion of conversation data (called by proxy) |
| POST | `/write` | Create or update a memory with schema enforcement |
| GET | `/read/{id}` | Read a memory by ID (auto-tracks access count) |
| DELETE | `/delete/{id}` | Remove a memory |
| GET | `/health` | Service health and vault metrics |
| POST | `/consolidate` | Trigger manual consolidation (dry-run by default) |
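For instance, a `/recall` call is an ordinary JSON POST. The page does not document the request schema, so every field beyond `query` here (`top_k`, `types`) is a guess; treat this as a shape sketch, not the contract.

```python
import json

def recall_payload(query: str, top_k: int = 5, mem_types=None) -> bytes:
    """Build a JSON body for POST /recall. Field names other than
    `query` are assumptions; check the service's actual schema."""
    body = {"query": query, "top_k": top_k}
    if mem_types:
        body["types"] = mem_types  # e.g. ["semantic"] vs ["episodic"]
    return json.dumps(body).encode()

# To actually send (requires the service on 127.0.0.1:42069):
# import urllib.request
# req = urllib.request.Request(
#     "http://127.0.0.1:42069/recall",
#     data=recall_payload("user preferences", mem_types=["semantic"]),
#     headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read())
```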

All settings live in instance.yaml under services.memory_vault:

```yaml
services:
  memory_vault:
    enabled: true
    port: 42069
    db_path: "~/.sanctum/memory/.vault.db"
    markdown_dir: "~/.sanctum/memory"
    consolidation_interval: 21600  # 6 hours in seconds
    model: "council-27b"           # LLM for enrichment
    max_context_tokens: 2048       # injected context budget
    ttl_check_interval: 3600       # hourly TTL sweep
```
| Property | Value |
|---|---|
| Host | 127.0.0.1 |
| Port | 42069 |
| Binary | ~3.8MB (Rust, statically linked) |
| Storage | SQLite 3 + FTS5, markdown files |
| Model tier | council-27b (enrichment only, best-effort) |
| Dependencies | None at runtime (SQLite compiled in) |
| LaunchAgent | com.sanctum.memory-vault |