Exploration Mode
Exploration mode lets agents actually run the tool being evaluated, rather than just reading its help output. This provides more authentic feedback based on real interaction.
Overview
By default, focusgroup shows agents the tool's --help output and asks them to provide feedback. With exploration mode enabled, agents can:
- Run the tool with various arguments
- Explore subcommands
- Test edge cases
- Discover issues through actual use
Enabling Exploration
Via CLI
focusgroup ask mx "Try searching for deployment docs" --explore
Via Config
[session]
exploration = true # Enable for all agents
# Or per-agent:
[[agents]]
provider = "claude"
exploration = true # Just this agent can explore
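The two levels above (session-wide and per-agent) raise the question of which one applies to a given agent. The sketch below shows one plausible resolution rule. It is an assumption for illustration, not focusgroup's documented behavior: a per-agent `exploration` key is taken to override the session value, and a missing key falls back to off, matching the default described in the overview. The helper name `effective_exploration` is hypothetical.

```python
def effective_exploration(session: dict, agent: dict) -> bool:
    """Assumed rule: a per-agent `exploration` key wins; otherwise use the
    session value; otherwise exploration is off (the default)."""
    if "exploration" in agent:
        return bool(agent["exploration"])
    return bool(session.get("exploration", False))


# Plain dicts standing in for a parsed config file.
config = {
    "session": {"exploration": False},
    "agents": [
        {"provider": "claude", "exploration": True},  # explicit per-agent opt-in
        {"provider": "codex"},                        # inherits the session value
    ],
}

for agent in config["agents"]:
    print(agent["provider"], effective_exploration(config["session"], agent))
```

If the real precedence differs, setting the value explicitly on every agent avoids relying on it.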
How It Works
- Context Enhancement: Agents receive instructions on how to run the tool
- Tool Access: CLI agents can execute the tool command
- Interactive Feedback: Agents explore, then report findings
What Agents See
With exploration enabled, agents receive additional context:
## Interactive Exploration
**IMPORTANT**: You can and should run `mytool` commands to explore
this tool before giving feedback!
### How to Explore
1. Try the basic help: mytool --help
2. Explore subcommands: mytool <subcommand> --help
3. Try common operations
4. Explore subcommands that interest you
Requirements
Exploration works with CLI-based agents (see providers for setup):
| Provider | Exploration Support |
|---|---|
| Claude | Full support |
| Codex | Full support |
Example: Exploring a Search Tool
[session]
name = "memex-exploration"
mode = "single"
exploration = true
moderator = true
[tool]
command = "mx"
[[agents]]
provider = "claude"
name = "Explorer-1"
[[agents]]
provider = "codex"
name = "Explorer-2"
[questions]
rounds = [
"Explore this knowledge base tool. Try searching for various topics, then report what worked well and what was confusing.",
]
Sample Session Output
## Explorer-1 (Claude)
I explored the `mx` tool by running several commands:
1. `mx --help` - Good overview, clear subcommand list
2. `mx search "deployment"` - Found relevant docs quickly
3. `mx search "nonexistent-topic"` - Helpful "no results" message
4. `mx get docs/deployment.md` - Retrieved full content
**What worked well:**
- Search is fast and results are relevant
- Error messages are clear
**What was confusing:**
- Unclear difference between `search` and `list`
- No obvious way to see all available tags
## Explorer-2 (Codex)
I tested the tool with various inputs:
1. Basic search worked well
2. Tried `mx add` but wasn't sure about required fields
3. `mx tree` gave a good overview of structure
**Suggestions:**
- Add examples to each subcommand's help
- Show available tags in search results
Security Considerations
WARNING: Exploration mode grants agents significant system access. Read this section carefully before enabling exploration.
What Permissions Are Granted
CLI agents run with relaxed permission controls to enable tool exploration:
| Provider | Permission Flags | What This Means |
|---|---|---|
| Claude | --dangerously-skip-permissions | Bypasses all permission prompts; agent can run any command without approval |
| Codex (explore) | --sandbox danger-full-access | Removes sandbox restrictions; full filesystem and network access |
| Codex (no explore) | --full-auto | Standard Codex safety checks apply |
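As a quick reference, the table can be written as a lookup from provider and mode to the flags listed. This is a minimal sketch for auditing or scripting, not focusgroup's actual code; `PERMISSION_FLAGS` and `permission_flags` are hypothetical names, and combinations the table does not cover return an empty list rather than a guess.

```python
# Hypothetical lookup mirroring the permissions table above.
PERMISSION_FLAGS = {
    ("claude", True): ["--dangerously-skip-permissions"],
    ("codex", True): ["--sandbox", "danger-full-access"],
    ("codex", False): ["--full-auto"],
}


def permission_flags(provider: str, exploration: bool) -> list:
    """Return the flags the table lists for this provider and mode.

    Unknown combinations (for example Claude without exploration, which the
    table does not describe) yield an empty list.
    """
    return list(PERMISSION_FLAGS.get((provider.lower(), exploration), []))


print(permission_flags("codex", True))
```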
What Agents Can Access
With exploration mode enabled, CLI agents can:
- Filesystem: Read, write, and delete any files accessible to your user account
- Network: Make HTTP requests, download files, connect to services
- Shell Commands: Run arbitrary shell commands (git, curl, rm, etc.)
- Environment: Access environment variables, including secrets in your shell
- Other Tools: Invoke any CLI tools installed on your system
Why These Permissions?
Exploration mode exists to let agents actually use tools and give authentic feedback. An agent exploring a CLI tool needs to run that tool, which requires shell access. Sandbox restrictions would prevent agents from testing the very tool you want feedback on.
This is a deliberate trade-off: useful exploration feedback requires real tool access.
Recommendations
1. Run in Isolated Environments
The safest approach is to run focusgroup exploration in an isolated environment:
# Use a container
docker run -it --rm -v "$(pwd)":/workspace myimage focusgroup ask mytool "..." --explore
# Use a VM
# Run focusgroup inside a disposable VM or devcontainer
# Use a separate user account
# Create a low-privilege account specifically for exploration
2. Review Tools Before Exploration
Before enabling exploration for a tool:
- Understand what the tool can do (especially write operations)
- Check if the tool accesses sensitive data or services
- Consider using read-only tools first to test the setup
3. Limit What's in the Environment
# Run with a clean environment to limit exposed secrets
env -i PATH="$PATH" HOME="$HOME" focusgroup ask mytool "..." --explore
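# Or pass only the variables the tool needs on top of the clean environment
# (a plain export would add to your full environment rather than limit it)
env -i PATH="$PATH" HOME="$HOME" MYTOOL_CONFIG=/path/to/config focusgroup ask mytool "..." --explore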
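The same allowlist idea as the env -i command above can be scripted when exploration is launched from Python. The sketch below is illustrative only: `clean_env` and `run_exploration` are hypothetical helpers, and the allowlist is an assumption. If your agent CLIs read credentials from environment variables, those names would need to be allowed as well.

```python
import os
import subprocess

# Variables kept by default; mirrors the env -i example (PATH and HOME only).
DEFAULT_ALLOWED = ("PATH", "HOME")


def clean_env(extra_allowed=(), source=None):
    """Copy only allowlisted variables from `source` (default: os.environ)."""
    source = os.environ if source is None else source
    allowed = set(DEFAULT_ALLOWED) | set(extra_allowed)
    return {name: value for name, value in source.items() if name in allowed}


def run_exploration(tool, question, extra_allowed=()):
    """Run `focusgroup ask <tool> <question> --explore` with a scrubbed environment."""
    cmd = ["focusgroup", "ask", tool, question, "--explore"]
    # env= replaces the child's environment entirely, so nothing else leaks through.
    return subprocess.run(cmd, env=clean_env(extra_allowed), check=False)
```

For example, `run_exploration("mytool", "...", extra_allowed=["MYTOOL_CONFIG"])` would expose only PATH, HOME, and MYTOOL_CONFIG to the agents.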
Risk Summary
| Scenario | Risk Level | Recommendation |
|---|---|---|
| Exploring your own read-only tool in a dev container | Low | Safe to proceed |
| Exploring any tool in a VM/container | Low | Safe to proceed |
| Exploring read-only tools on your main system | Medium | Acceptable with caution |
| Exploring write-capable tools on your main system | High | Use isolated environment |
| Running exploration with sensitive secrets in env | High | Clean environment or container |
Future: Sandbox Level Control
A --sandbox-level flag for granular control is planned but not yet implemented. For now, exploration mode is all-or-nothing with respect to permissions.
Best Practices
1. Use Specific Questions
# Good: Specific exploration task
rounds = ["Search for 'deployment' topics, then report what you found"]
# Less effective: Vague request
rounds = ["Explore this tool"]
2. Combine with Discussion Mode
[session]
mode = "discussion"
exploration = true
[questions]
rounds = [
"Each of you explore different aspects of this tool.",
"Share what you found. What patterns do you see?",
"Based on everyone's exploration, what should be prioritized?",
]
3. Use Moderator for Synthesis
[session]
exploration = true
moderator = true # Synthesize exploration findings
4. Mix Exploration Modes
# One agent explores, another analyzes
[[agents]]
provider = "claude"
name = "Explorer"
exploration = true
[[agents]]
provider = "codex"
name = "Analyst"
exploration = false
system_prompt = "Analyze the explorer's findings and suggest improvements."
Troubleshooting
Agent Can't Run Commands
- Verify the CLI tool is installed (claude --version, codex --version)
- Check that the tool being evaluated is in PATH
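Both checks above come down to whether a command resolves on PATH, which can be tested in one pass before starting a session. This is a minimal sketch using the standard library's `shutil.which`; `missing_commands` is a hypothetical helper, and `mx` stands in for whatever tool you are evaluating.

```python
import shutil


def missing_commands(commands):
    """Return the entries of `commands` that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    # Agent CLIs plus the tool under evaluation ("mx" is a placeholder).
    missing = missing_commands(["claude", "codex", "mx"])
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All commands found")
```

Run it from the same shell and environment you use to launch focusgroup, since PATH can differ between shells.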
Exploration Too Slow
- Reduce number of agents
- Use parallel_agents = false to run sequentially
- Limit the scope of exploration in your question
Agent Runs Wrong Commands
- Provide more specific instructions in your question
- Use working_dir to set the right directory
- Ensure tool name is clear and unambiguous