# Plan: LLM Query Processing for KB Entries

## Summary

Add opt-in LLM processing for KB entries, with responses cached on disk. Supports three operations: summarize, synthesize, and answer.
## CLI Interface

```shell
mx get path/entry.md --llm            # Summarize entry
mx get path/entry.md --llm --no-cache # Force fresh generation
mx search "query" --llm               # Synthesize results to answer query
mx ask "How do I deploy?"             # Shorthand for search + synthesize
mx ask "..." --show-sources           # Include source paths
mx llm-cache stats                    # Cache statistics
mx llm-cache clear                    # Clear cache
```
## Files to Create

### `src/memex/llm_query.py`

Core LLM operations:

- `content_hash(content: str) -> str`: SHA-256 digest used for cache keys
- `summarize_entry(entry: KBEntry) -> LLMResponse`
- `synthesize_entries(entries: list[KBEntry]) -> LLMResponse`
- `answer_query(query: str, entries: list[KBEntry]) -> LLMResponse`
- `_truncate_content()`: enforce token limits with tiktoken
### `src/memex/llm_cache.py`

Cache management (stored in `.indices/llm_cache.json`):

- `get_cached_response(operation, content_hashes, query) -> CachedLLMResponse | None`
- `cache_response(operation, paths, hashes, query, model, response, tokens)`
- `clear_cache() -> int`
- `cache_stats() -> dict`

Cache key: `sha256(operation + sorted(content_hashes) + query)[:16]`
## Files to Modify

### `src/memex/config.py`

Add an `LLMQueryConfig` dataclass:

```python
@dataclass
class LLMQueryConfig:
    enabled: bool = False
    model: str = "anthropic/claude-3-5-haiku"
    max_input_tokens: int = 8000
    cache_enabled: bool = True
```

Add `get_llm_query_config()` (follow the existing config loader patterns).
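Since the existing loader patterns aren't shown here, one plausible sketch of the config construction takes the already-parsed `.kbconfig` mapping and overlays it on the dataclass defaults; the helper name `llm_query_config_from` is illustrative, not the plan's `get_llm_query_config()` signature:

```python
from dataclasses import dataclass, fields


@dataclass
class LLMQueryConfig:
    enabled: bool = False
    model: str = "anthropic/claude-3-5-haiku"
    max_input_tokens: int = 8000
    cache_enabled: bool = True


def llm_query_config_from(raw: dict) -> LLMQueryConfig:
    """Build an LLMQueryConfig from a parsed .kbconfig mapping.

    Unknown keys are ignored; missing keys fall back to dataclass defaults.
    """
    section = raw.get("llm_query") or {}
    known = {f.name for f in fields(LLMQueryConfig)}
    return LLMQueryConfig(**{k: v for k, v in section.items() if k in known})
```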
### `src/memex/models.py`

Add:

```python
class LLMResponse(BaseModel):
    operation: Literal["summarize", "synthesize", "answer"]
    input_paths: list[str]
    response: str
    model: str
    cached: bool = False
    token_count: int = 0
```
### `src/memex/cli.py`

- Add to the `get` command (line ~1189): `--llm` and `--no-cache` flags
- Add to the `search` command (line ~973): `--llm` and `--no-cache` flags
- Add a new `ask` command:
  - Search → load entries → call `answer_query()`
  - Options: `--limit`, `--show-sources`, `--no-cache`
- Add an `llm-cache` command group with `stats` and `clear` subcommands
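The `ask` pipeline (search → load → answer) can be sketched framework-neutrally; the stubbed `search_kb` and `answer_query` below are placeholders for the real index lookup and the LLM call in `llm_query.py`, and every name here is illustrative:

```python
from dataclasses import dataclass


@dataclass
class Entry:  # stand-in for the real KBEntry model
    path: str
    content: str


def search_kb(query: str, limit: int) -> list[Entry]:
    # Placeholder: the real command reuses the existing search index.
    return [Entry("kb/guides/deploy.md", "Run `make deploy`.")][:limit]


def answer_query(query: str, entries: list[Entry]) -> str:
    # Placeholder for the LLM call implemented in llm_query.py.
    return f"Answer to {query!r} based on {len(entries)} entries."


def ask(query: str, limit: int = 5, show_sources: bool = False) -> str:
    """Pipeline behind `mx ask`: search, load entries, answer."""
    entries = search_kb(query, limit)
    answer = answer_query(query, entries)
    if show_sources:
        answer += "\n\nSources:\n" + "\n".join(f"- {e.path}" for e in entries)
    return answer
```

The CLI flags (`--limit`, `--show-sources`, `--no-cache`) would map directly onto the function's parameters.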
## Config (`.kbconfig`)

```yaml
llm_query:
  enabled: true
  model: anthropic/claude-3-5-haiku
  max_input_tokens: 8000
  cache_enabled: true
```
Cache Format (.indices/llm_cache.json)
{
"version": 1,
"entries": {
"cache_key_hash": {
"operation": "summarize",
"input_hashes": ["sha256..."],
"input_paths": ["path.md"],
"query": null,
"model": "anthropic/claude-3-5-haiku",
"response": "...",
"created_at": "2024-01-14T12:00:00Z",
"token_count": 150
}
}
}
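Reading and writing that store is straightforward with the stdlib; this sketch assumes the function names (`load_cache`, `store_response`) and the write-whole-file strategy, none of which the plan fixes:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_cache(path: Path) -> dict:
    """Read .indices/llm_cache.json, returning an empty v1 store if missing."""
    if path.exists():
        return json.loads(path.read_text())
    return {"version": 1, "entries": {}}


def store_response(path: Path, key: str, record: dict) -> None:
    """Insert one cached response under its 16-char key and persist."""
    cache = load_cache(path)
    record.setdefault(
        "created_at",
        datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    )
    cache["entries"][key] = record
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(cache, indent=2))
```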
## Implementation Order

1. Add `LLMQueryConfig` to `config.py`
2. Create `llm_cache.py`
3. Create `llm_query.py`
4. Add `LLMResponse` to `models.py`
5. Add `--llm` to `mx get`
6. Add `--llm` to `mx search`
7. Add `mx ask` command
8. Add `mx llm-cache` commands
## Verification

```shell
# Enable in config (printf, not echo: echo does not expand \n portably)
printf 'llm_query:\n  enabled: true\n' >> .kbconfig

# Test summarize
mx get kb/guides/quick-start.md --llm

# Test cached response
mx get kb/guides/quick-start.md --llm  # Should say "(cached)"

# Test answer query
mx ask "How do I track issues?"

# Test cache management
mx llm-cache stats
mx llm-cache clear

# Run tests
uv run pytest tests/test_llm_query.py -v
```