Focusgroup Agent UX Evaluation (2026-01)

Evaluation of mx CLI usability for AI agents, run with focusgroup: 5 sessions with 12 agents in total (8 Claude, 4 GPT).

Overall Rating: 8-8.5/10

Sessions

Session ID | Focus | Agents | Key Finding
20260111-cb7d8f20 | General usability | 4 Claude | Validation crashes, JSON error gaps
20260111-d86b9983 | General usability | 4 Claude | 8.5/10 rating
20260111-2b993462 | Error messages | 2 Claude | 80% self-service, no command suggestions
20260111-7c39a291 | Intuition test | 4 Claude | 100% command convergence
20260111-6e6f778b | Intuition test | 4 GPT | More variance, same flag mistakes

Key Strengths

Feature | Why It Works
Token-conscious output | --compact, --terse and --json fit different context budgets
Schema introspection | mx schema is "agent gold" for self-discovery
Batch operations | mx batch reduces round-trips
Dry-run support | mx add --dry-run lets agents validate input without writing anything
Error handling | --json-errors returns structured codes such as ENTRY_NOT_FOUND (1002)

Issues Found

P0: Crashes

  • mx search "test" --limit 0 crashes with a Python traceback
  • mx list --category=nonexistent crashes with a Python traceback
  • --json-errors does not cover Click validation failures, so those errors are not reported as JSON
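
One way to keep the P0 crashes from regressing is a small check that runs each reproduction and fails if a Python traceback reaches stderr. This is a minimal sketch, not part of mx or focusgroup: the function name no_traceback is hypothetical, and it assumes tracebacks carry Python's standard "Traceback (most recent call last):" header. A non-zero exit with a clean error message counts as a pass.

```shell
# Hypothetical regression check (not part of mx): run a command and fail
# if it leaks a Python traceback instead of a clean error message.
no_traceback() {
    # 2>&1 sends stderr into the capture, then >/dev/null discards stdout.
    # "|| true" keeps an expected non-zero exit from aborting under set -e.
    err=$("$@" 2>&1 >/dev/null) || true
    case $err in
        *"Traceback (most recent call last):"*)
            printf 'CRASH: %s\n' "$*" >&2
            return 1 ;;
    esac
    return 0
}

# Usage against the P0 reproductions (requires mx on PATH):
#   no_traceback mx search "test" --limit 0
#   no_traceback mx list --category=nonexistent
```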

P1: Flag Discoverability

  • All 8 agents guessed --find/--replace for mx patch; the actual flags are --old/--new
  • Because both models made the same wrong guess, the flag names themselves are most likely unintuitive
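
Until the rename tracked in voidlabs-kb-i24u ships, an agent-side wrapper could accept the guessed flags and forward the real ones. This is a minimal sketch, assuming mx patch otherwise takes its arguments unchanged; the wrapper name mx_patch is hypothetical, and only the flag names --old/--new come from this evaluation.

```shell
# Hypothetical agent-side shim (not part of mx): accept the flags agents
# guess for patch (--find/--replace) and forward mx's actual --old/--new.
mx_patch() {
    n=$#
    i=0
    # Rotate through the arguments once: take the first, rename it if
    # needed, and append it to the end, so the original order is preserved.
    while [ "$i" -lt "$n" ]; do
        arg=$1
        shift
        case $arg in
            --find)      arg=--old ;;
            --replace)   arg=--new ;;
            --find=*)    arg="--old=${arg#--find=}" ;;
            --replace=*) arg="--new=${arg#--replace=}" ;;
        esac
        set -- "$@" "$arg"
        i=$((i + 1))
    done
    mx patch "$@"
}
```

Example (the positional entry path is assumed): mx_patch path/file.md --find "old text" --replace "new text". The shim renames any argument spelled exactly --find or --replace, so it would also rewrite a search string that happens to be one of those words.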

P2: Command Suggestions

  • Flag typos get suggestions: --jason → "Possible options: --json"
  • Command typos do not: mx serach gets no "Did you mean 'search'?" hint
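
The missing command suggestion can be approximated with edit distance, the idea behind most "Did you mean" hints. This is a hypothetical sketch, not mx's implementation: suggest_command and MX_COMMANDS are illustrative names, the command list is limited to the subcommands mentioned in this evaluation, and the cutoff of 2 edits is arbitrary. Plain Levenshtein distance counts a swapped pair such as serach/search as two edits, which the cutoff still allows.

```shell
# Hypothetical "Did you mean" helper (not part of mx). The command list is
# limited to the subcommands mentioned in this evaluation.
MX_COMMANDS="add batch get list patch prime schema search"

suggest_command() {
    # Unquoted $MX_COMMANDS is intentional: split it into one name per line.
    printf '%s\n' $MX_COMMANDS | awk -v typo="$1" -v q="'" '
        # Levenshtein distance via the standard dynamic-programming table.
        function lev(a, b,    la, lb, i, j, cost, d) {
            la = length(a); lb = length(b)
            for (i = 0; i <= la; i++) d[i, 0] = i
            for (j = 0; j <= lb; j++) d[0, j] = j
            for (i = 1; i <= la; i++)
                for (j = 1; j <= lb; j++) {
                    cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
                    d[i, j] = min3(d[i-1, j] + 1, d[i, j-1] + 1, d[i-1, j-1] + cost)
                }
            return d[la, lb]
        }
        function min3(x, y, z) { return x < y ? (x < z ? x : z) : (y < z ? y : z) }
        # Keep the first command with the smallest distance to the typo.
        {
            dist = lev(typo, $0)
            if (best == "" || dist < bestd) { best = $0; bestd = dist }
        }
        # Stay silent when nothing is within 2 edits (an arbitrary cutoff).
        END { if (best != "" && bestd <= 2) print "Did you mean " q best q "?" }
    '
}
```

For example, suggest_command serach prints Did you mean 'search'?, while an input far from every command prints nothing.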

Agent-Recommended Patterns

# Session start
mx prime

# Precise search (avoid semantic surprises)
mx search "query" --strict --terse --limit=1

# Check existence
mx get path.md --metadata  # Exit code 0 if the entry exists, 1 if not

# Batch for efficiency
printf '%s\n' "search 'docker'" "get path/file.md" | mx batch

# Safe writes
mx add --title="..." --dry-run --json
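
When the list of entries is dynamic, the batch input can be generated instead of typed. This is a minimal sketch with a hypothetical helper, batch_get; it relies on printf reusing its format string for every remaining argument, and it assumes paths contain no spaces, since the quoting rules of mx batch are not documented here.

```shell
# Hypothetical helper (not part of mx): emit one "get <path>" batch line
# per argument. printf reuses its format for each remaining argument,
# so no explicit loop is needed.
batch_get() {
    # With no arguments printf would still print "get " once; skip that case.
    [ "$#" -gt 0 ] || return 0
    printf 'get %s\n' "$@"
}

# Usage (requires mx on PATH):
#   batch_get path/a.md path/b.md | mx batch
```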

Cross-Model Comparison

Metric | Claude | GPT
Response variance | 0% (identical) | High (3+ variants)
Correct commands | 8/8 | 8/8
Correct flags | ~6/8 | ~5/8

Both models guessed the patch flags wrong, which indicates a real UX gap rather than a model-specific quirk.
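
The response-variance metric can be approximated from raw agent answers by counting distinct variants. This is a hypothetical scoring sketch, not focusgroup's own method: the function name convergence is illustrative, it expects one answer per line on stdin, and it compares exact strings without normalizing whitespace or flag order.

```shell
# Hypothetical scoring sketch (not focusgroup's method): read one agent
# answer per line and report the number of distinct variants plus how
# many agents gave the most common answer. Strings must match exactly.
convergence() {
    # uniq -c prefixes each distinct line with its count; sort -rn puts
    # the most common answer first, so its count lands on line 1.
    sort | uniq -c | sort -rn | awk '
        { total += $1; variants++; if (NR == 1) top = $1 }
        END { printf "variants=%d top=%d/%d\n", variants, top, total }
    '
}
```

A run where every agent answered identically reports variants=1, which corresponds to the 0% variance row in the table.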

Related Beads Issues

  • voidlabs-kb-ktfu - Epic: mx CLI Agent UX Improvements
  • voidlabs-kb-i24u - Patch flag rename (--old/--new → --find/--replace)
  • voidlabs-kb-77q4 - Flag discoverability
  • voidlabs-kb-skeb - Error message improvements
  • voidlabs-kb-pgml - General agent usability
  • voidlabs-kb-jll6 - Append workflow confusion

Session Logs

Retrieve full session data:

focusgroup logs show 20260111-cb7d8f20
focusgroup logs show 20260111-2b993462
focusgroup logs show 20260111-7c39a291
focusgroup logs show 20260111-6e6f778b
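
To archive the logs listed above in one step, a loop over the session IDs is enough. This is a sketch: the function name save_session_logs and the focusgroup-<id>.log file naming are assumptions; only focusgroup logs show <id> comes from this page.

```shell
# Hypothetical archiving loop: save each session log listed above to
# <dir>/focusgroup-<id>.log (defaults to the current directory).
save_session_logs() {
    dir=${1:-.}
    rc=0
    for id in 20260111-cb7d8f20 20260111-2b993462 20260111-7c39a291 20260111-6e6f778b; do
        if ! focusgroup logs show "$id" > "$dir/focusgroup-$id.log"; then
            printf 'failed to fetch %s\n' "$id" >&2
            rc=1
        fi
    done
    # Non-zero if any session could not be fetched.
    return "$rc"
}
```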