Focusgroup Agent UX Evaluation (2026-01)
Comprehensive evaluation of mx CLI agent usability using focusgroup - 5 sessions with 12 total agents (8 Claude, 4 GPT).
Overall Rating: 8-8.5/10
Sessions
| Session ID | Focus | Agents | Key Finding |
|---|---|---|---|
| 20260111-cb7d8f20 | General usability | 4 Claude | Validation crashes, JSON error gaps |
| 20260111-d86b9983 | General usability | 4 Claude | 8.5/10 rating |
| 20260111-2b993462 | Error messages | 2 Claude | 80% self-service, no command suggestions |
| 20260111-7c39a291 | Intuition test | 4 Claude | 100% command convergence |
| 20260111-6e6f778b | Intuition test | 4 GPT | More variance, same flag mistakes |
Key Strengths
| Feature | Why It Works |
|---|---|
| Token-conscious output | --compact, --terse, --json serve different context budgets |
| Schema introspection | mx schema - "agent gold" for self-discovery |
| Batch operations | mx batch reduces round-trips |
| Dry-run support | mx add --dry-run allows validation |
| Error handling | --json-errors with codes like ENTRY_NOT_FOUND (1002) |
Issues Found
P0: Crashes
mx search "test" --limit 0- Python tracebackmx list --category=nonexistent- Python traceback--json-errorsdoesn't catch Click validation failures
P1: Flag Discoverability
- All 8 agents guessed
--find/--replacefor patch (actual:--old/--new) - Cross-model consensus = strong signal flags are unintuitive
P2: Command Suggestions
- Flag typos get suggestions:
--jason→ "Possible options: --json" - Command typos don't:
mx serach→ no "Did you mean 'search'?"
Agent-Recommended Patterns
# Session start
mx prime
# Precise search (avoid semantic surprises)
mx search "query" --strict --terse --limit=1
# Check existence
mx get path.md --metadata # Exit code 0/1
# Batch for efficiency
echo -e "search 'docker'\nget path/file.md" | mx batch
# Safe writes
mx add --title="..." --dry-run --json
Cross-Model Comparison
| Metric | Claude | GPT |
|---|---|---|
| Response variance | 0% (identical) | High (3+ variants) |
| Correct commands | 8/8 | 8/8 |
| Correct flags | ~6/8 | ~5/8 |
Both models guessed wrong on patch flags - validates this is a real UX gap.
Related Beads Issues
voidlabs-kb-ktfu- Epic: mx CLI Agent UX Improvementsvoidlabs-kb-i24u- Patch flag rename (--old/--new → --find/--replace)voidlabs-kb-77q4- Flag discoverabilityvoidlabs-kb-skeb- Error message improvementsvoidlabs-kb-pgml- General agent usabilityvoidlabs-kb-jll6- Append workflow confusion
Session Logs
Retrieve full session data:
focusgroup logs show 20260111-cb7d8f20
focusgroup logs show 20260111-2b993462
focusgroup logs show 20260111-7c39a291
focusgroup logs show 20260111-6e6f778b