design/llm-query-processing-feature-plan

Plan: LLM Query Processing for KB Entries

Summary

Add opt-in LLM processing for KB entries, with cached responses. Supports three operations: summarize a single entry, synthesize multiple entries, and answer a natural-language query.

CLI Interface

mx get path/entry.md --llm           # Summarize entry
mx get path/entry.md --llm --no-cache # Force fresh generation
mx search "query" --llm              # Synthesize results to answer query
mx ask "How do I deploy?"            # Shorthand for search + synthesize
mx ask "..." --show-sources          # Include source paths
mx llm-cache stats                   # Cache statistics
mx llm-cache clear                   # Clear cache

Files to Create

src/memex/llm_query.py

Core LLM operations (hashing and truncation helpers are sketched after this list):

  • content_hash(content: str) -> str - SHA256 for cache keys
  • summarize_entry(entry: KBEntry) -> LLMResponse
  • synthesize_entries(entries: list[KBEntry]) -> LLMResponse
  • answer_query(query: str, entries: list[KBEntry]) -> LLMResponse
  • _truncate_content() - Handle token limits with tiktoken
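
A minimal sketch of the hashing and truncation helpers, assuming the cl100k_base tiktoken encoding (swap in whichever encoding matches the configured model). The summarize/synthesize/answer functions themselves would wrap the project's existing LLM client and are not shown here.

import hashlib

import tiktoken


def content_hash(content: str) -> str:
    """SHA256 of entry content, used to build cache keys."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def _truncate_content(content: str, max_tokens: int = 8000) -> str:
    """Trim content to the configured token budget using tiktoken."""
    # "cl100k_base" is an assumption; pick the encoding for the configured model.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(content)
    if len(tokens) <= max_tokens:
        return content
    return enc.decode(tokens[:max_tokens])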

src/memex/llm_cache.py

Cache management (stored in .indices/llm_cache.json):

  • get_cached_response(operation, content_hashes, query) -> CachedLLMResponse | None
  • cache_response(operation, paths, hashes, query, model, response, tokens)
  • clear_cache() -> int
  • cache_stats() -> dict

Cache key: sha256(operation + sorted(content_hashes) + query)[:16]
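
The key derivation above could look roughly like this; exact concatenation order and handling of a missing query are assumptions to settle during implementation.

import hashlib


def cache_key(operation: str, content_hashes: list[str], query: str | None) -> str:
    # Sort hashes so the same set of entries always yields the same key.
    material = operation + "".join(sorted(content_hashes)) + (query or "")
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]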

Files to Modify

src/memex/config.py

Add LLMQueryConfig dataclass:

@dataclass
class LLMQueryConfig:
    enabled: bool = False
    model: str = "anthropic/claude-3-5-haiku"
    max_input_tokens: int = 8000
    cache_enabled: bool = True

Add get_llm_query_config() (follow existing config loader patterns)
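
A rough sketch of the loader, to live in config.py next to the dataclass. _load_raw_config() is a hypothetical stand-in for however config.py already reads .kbconfig into a dict; the real implementation should mirror the existing loader functions.

def get_llm_query_config() -> LLMQueryConfig:
    raw = _load_raw_config().get("llm_query", {})  # hypothetical helper
    return LLMQueryConfig(
        enabled=raw.get("enabled", False),
        model=raw.get("model", "anthropic/claude-3-5-haiku"),
        max_input_tokens=raw.get("max_input_tokens", 8000),
        cache_enabled=raw.get("cache_enabled", True),
    )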

src/memex/models.py

Add:

class LLMResponse(BaseModel):
    operation: Literal["summarize", "synthesize", "answer"]
    input_paths: list[str]
    response: str
    model: str
    cached: bool = False
    token_count: int = 0

src/memex/cli.py

  1. Add to get command (line ~1189):

    • --llm flag
    • --no-cache flag
  2. Add to search command (line ~973):

    • --llm flag
    • --no-cache flag
  3. Add new ask command (see the sketch after this list):

    • Search → load entries → call answer_query()
    • Options: --limit, --show-sources, --no-cache
  4. Add llm-cache command group with stats and clear subcommands
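
Rough shape of the ask command, assuming cli.py is built on click; translate to whatever framework cli.py actually uses. search_index and load_entry are placeholders for the search and entry-loading helpers the CLI already calls.

import click

from memex.llm_query import answer_query


@click.command("ask")
@click.argument("query")
@click.option("--limit", default=5, help="Max entries to feed the LLM.")
@click.option("--show-sources", is_flag=True, help="Print source entry paths.")
@click.option("--no-cache", is_flag=True, help="Bypass the LLM response cache.")
def ask(query: str, limit: int, show_sources: bool, no_cache: bool) -> None:
    """Search the KB and synthesize an answer to QUERY."""
    results = search_index(query, limit=limit)        # placeholder helper
    entries = [load_entry(r.path) for r in results]   # placeholder helper
    response = answer_query(query, entries)           # pass use_cache=not no_cache if supported
    click.echo(response.response)
    if show_sources:
        for path in response.input_paths:
            click.echo(f"  source: {path}")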

Config (.kbconfig)

llm_query:
  enabled: true
  model: anthropic/claude-3-5-haiku
  max_input_tokens: 8000
  cache_enabled: true

Cache Format (.indices/llm_cache.json)

{
  "version": 1,
  "entries": {
    "cache_key_hash": {
      "operation": "summarize",
      "input_hashes": ["sha256..."],
      "input_paths": ["path.md"],
      "query": null,
      "model": "anthropic/claude-3-5-haiku",
      "response": "...",
      "created_at": "2024-01-14T12:00:00Z",
      "token_count": 150
    }
  }
}
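
Loading and saving this file in llm_cache.py could look like the sketch below; the version check discards unknown formats rather than guessing. Paths and error handling are assumptions for illustration.

import json
from pathlib import Path

CACHE_PATH = Path(".indices/llm_cache.json")
CACHE_VERSION = 1


def _load_cache() -> dict:
    if not CACHE_PATH.exists():
        return {"version": CACHE_VERSION, "entries": {}}
    data = json.loads(CACHE_PATH.read_text())
    if data.get("version") != CACHE_VERSION:
        # Unknown or older format: start fresh instead of migrating blindly.
        return {"version": CACHE_VERSION, "entries": {}}
    return data


def _save_cache(cache: dict) -> None:
    CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
    CACHE_PATH.write_text(json.dumps(cache, indent=2))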

Implementation Order

  1. Add LLMQueryConfig to config.py
  2. Add LLMResponse to models.py
  3. Create llm_cache.py
  4. Create llm_query.py (depends on LLMResponse and the cache)
  5. Add --llm to mx get
  6. Add --llm to mx search
  7. Add mx ask command
  8. Add mx llm-cache commands

Verification

# Enable in config
printf 'llm_query:\n  enabled: true\n' >> .kbconfig

# Test summarize
mx get kb/guides/quick-start.md --llm

# Test cached response
mx get kb/guides/quick-start.md --llm  # Should say "(cached)"

# Test answer query
mx ask "How do I track issues?"

# Test cache management
mx llm-cache stats
mx llm-cache clear

# Run tests
uv run pytest tests/test_llm_query.py -v
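
A possible starting point for tests/test_llm_query.py; names mirror the plan and may need adjusting once the module exists.

from memex.llm_query import content_hash


def test_content_hash_is_stable():
    # Same input hashes identically; different input does not.
    assert content_hash("hello") == content_hash("hello")
    assert content_hash("hello") != content_hash("hello!")
    # Full SHA256 hex digest is 64 characters.
    assert len(content_hash("hello")) == 64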