LLM Command Node
Natural language command parsing via LLM APIs (OpenAI, Claude, Ollama) with dashboard control.
Quick Start
1. Copy `.env.example` to `.env` and add your API key:

   ```shell
   cp .env.example .env
   # Edit .env and set: OPENAI_API_KEY=sk-...
   ```

2. Start development mode:

   ```shell
   mecha10 dev
   ```

3. Open the dashboard at http://localhost:3000/dashboard/robot-control

4. Send commands via the AI Command Control panel!
Overview
The LLM Command node allows users to control robots using natural language commands. It leverages large language models (LLMs) to parse commands and convert them into structured actions.
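To make the parse-and-route step concrete, here is a minimal sketch in Python (the crate itself is Rust) of how a structured LLM reply could be mapped to an output topic. The `route_action` helper and its routing table are illustrative assumptions, not the crate's API; the topic names follow the Topics section below.

```python
import json

# Hypothetical routing table: maps the LLM's "action" field to an output topic.
ACTION_TOPICS = {
    "motor": "/motor/cmd_vel",
    "navigate": "/nav/goal",
    "behavior": "/behavior/execute",
}

def route_action(llm_reply):
    """Parse an LLM reply and decide which topic (if any) receives it.

    Returns (topic, payload) for structured actions, or (None, text)
    for conversational replies that only go to /ai/response.
    """
    try:
        payload = json.loads(llm_reply)
    except json.JSONDecodeError:
        return None, llm_reply  # plain-text answer, no action taken
    topic = ACTION_TOPICS.get(payload.get("action"))
    return topic, payload
```

For example, `route_action('{"action": "motor", "linear": 0.5, "angular": 0.0}')` yields the `/motor/cmd_vel` topic plus the parsed payload, while a conversational reply falls through with no action topic.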
Features
- Multi-Provider Support: OpenAI, Claude (Anthropic), and local Ollama
- Command Parsing: Converts natural language into structured robot actions
- Action Routing: Publishes to appropriate topics (motor commands, navigation goals, behaviors)
- Vision Queries: Uses object detection data to answer "what do you see?" questions
- Behavior Interruption: Automatically pauses autonomous behaviors when user commands are issued
- Auto-Resume: Configurable automatic resumption of behaviors after timeout
- Dashboard Integration: Real-time command input and response display
- Error Handling: Clear error messages and timeout handling
Configuration
The node is configured via `configs/*/llm-command.toml` (or through `mecha10.json`):

```toml
# LLM Provider Configuration
provider = "openai"   # Options: "openai", "claude", "local"
model = "gpt-4o-mini"
temperature = 0.7
max_tokens = 500
vision_enabled = false

# Topic Configuration
[topics]
command_in = "/ai/command"
response_out = "/ai/response"
camera_in = "/robot/sensors/camera/rgb"
nav_goal_out = "/nav/goal"
motor_cmd_out = "/motor/cmd_vel"
behavior_out = "/behavior/execute"

# Behavior Interrupt Configuration
[behavior_interrupt]
enabled = true
mode = "interrupt_with_auto_resume"  # Options: "disabled", "interrupt_only", "interrupt_with_auto_resume"
timeout_secs = 30                    # Auto-resume timeout (for interrupt_with_auto_resume mode)
await_completion = false
control_topic = "/behavior/control"
```
Behavior Interrupt Configuration
When the LLM issues motor or navigation commands, it can automatically interrupt autonomous behaviors:
- `enabled`: Enable/disable behavior interruption (default: `true`)
- `mode`: Interrupt behavior (options below):
  - `"disabled"`: Never interrupt the behavior tree
  - `"interrupt_only"`: Interrupt but don't auto-resume (manual resume required)
  - `"interrupt_with_auto_resume"`: Interrupt and automatically resume after timeout
- `timeout_secs`: Seconds before auto-resume (default: `30`)
- `await_completion`: Wait for command completion before resuming (not yet implemented)
- `control_topic`: Topic for behavior control commands (default: `"/behavior/control"`)
Environment Variables
Recommended: Use .env file in your project root
Copy .env.example to .env and add your API keys:
```shell
# Copy the example file
cp .env.example .env
# Edit .env and add your API key
```

```shell
# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```
The .env file is automatically loaded by mecha10 dev and passed to all nodes.
Alternative: Set environment variables directly
```shell
# For OpenAI
export OPENAI_API_KEY="sk-..."

# For Claude
export ANTHROPIC_API_KEY="sk-ant-..."

# For local Ollama (no key needed)
# Ensure Ollama is running on localhost:11434
```
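Which variable is required depends on the configured provider. A small sketch of that lookup, in Python for illustration (the `required_key` helper is hypothetical, not part of the crate):

```python
import os

# Hypothetical mapping from provider name to the credential it requires.
# Local Ollama runs on localhost:11434 and needs no key.
PROVIDER_KEYS = {"openai": "OPENAI_API_KEY", "claude": "ANTHROPIC_API_KEY", "local": None}

def required_key(provider):
    """Return the API key for the provider, or None for keyless local Ollama."""
    var = PROVIDER_KEYS[provider]
    if var is None:
        return None
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set; add it to .env or export it")
    return value
```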
Topics
Input Topics
- `/ai/command` (CommandMessage): Natural language command from the user

  ```json
  { "text": "move forward", "timestamp": 1234567890, "user_id": "optional_user_id" }
  ```
Output Topics
- `/ai/response` (ResponseMessage): LLM response with action feedback

  ```json
  { "text": "Moving the robot forward", "timestamp": 1234567890, "action_taken": true, "error": null }
  ```

- `/motor/cmd_vel` (MotorCommand): Motor velocity commands

  ```json
  { "linear": 0.5, "angular": 0.0, "timestamp": 1234567890 }
  ```

- `/nav/goal` (NavigationGoal): Navigation waypoint goals

  ```json
  { "x": 5.0, "y": 3.0, "theta": 0.0, "timestamp": 1234567890 }
  ```

- `/behavior/execute` (BehaviorCommand): Behavior execution commands

  ```json
  { "name": "follow_person", "params": null, "timestamp": 1234567890 }
  ```
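For illustration, a payload matching the MotorCommand shape above can be built and sanity-checked in a few lines; `motor_command` here is a hypothetical helper sketched in Python, not one of the crate's actual message types (those live in mecha10-core, in Rust).

```python
import json
import time

def motor_command(linear, angular):
    """Serialize a payload matching the MotorCommand shape shown above."""
    return json.dumps({
        "linear": linear,
        "angular": angular,
        "timestamp": int(time.time()),  # Unix timestamp, as in the examples
    })

msg = json.loads(motor_command(0.5, 0.0))
assert set(msg) == {"linear", "angular", "timestamp"}
```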
Command Examples
Motor Commands
- "move forward" → `{"action": "motor", "linear": 0.5, "angular": 0.0}`
- "turn left" → `{"action": "motor", "linear": 0.0, "angular": 0.5}`
- "stop" → `{"action": "motor", "linear": 0.0, "angular": 0.0}`
Navigation Commands
- "go to x:5 y:3" → `{"action": "navigate", "goal": {"x": 5.0, "y": 3.0, "theta": 0.0}}`
- "move to the door" → Extracts coordinates and navigates
Behavior Commands
- "follow that person" → `{"action": "behavior", "name": "follow_person"}`
- "patrol the area" → `{"action": "behavior", "name": "patrol"}`
Vision Queries
The node subscribes to /vision/detections from the object-detector node and uses this data to answer vision questions:
- "what do you see?" → "I see a person (95% confidence) and a car (87% confidence)"
- "is there a person in front of me?" → "Yes, I detect 1 person with 95% confidence"
- "how many cars?" → "I see 2 cars: car (87% confidence) and car (82% confidence)"
- "describe what's visible" → Natural language description based on detections
How it works:
1. The object-detector node continuously publishes detections to `/vision/detections`
2. The LLM command node stores the latest detections
3. When a vision query is detected, detections are formatted as text context
4. The LLM analyzes the detections and provides a natural language response
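The "formatted as text context" step can be sketched as follows; this is an illustrative Python sketch, and the detection field names (`label`, `confidence`) are assumptions about the `/vision/detections` payload rather than a documented schema.

```python
def detections_to_context(detections):
    """Turn stored /vision/detections data into the text context
    handed to the LLM when answering a vision query."""
    if not detections:
        return "No objects are currently detected."
    lines = [
        f"- {d['label']} ({d['confidence'] * 100:.0f}% confidence)"
        for d in detections
    ]
    return "Current detections:\n" + "\n".join(lines)

ctx = detections_to_context([
    {"label": "person", "confidence": 0.95},
    {"label": "car", "confidence": 0.87},
])
```

Because only this short text (rather than an encoded image) is sent to the LLM, vision queries stay cheap and fast, as noted below.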
Benefits over vision APIs:
- ✅ Much cheaper - No image tokens, just structured detection data
- ✅ Faster - No need to encode/send images
- ✅ More accurate - Uses specialized YOLO model for detection
Behavior Interruption
The LLM command node intelligently manages the interaction between user commands and autonomous behaviors:
How It Works
- Automatic Interruption: When the LLM parses a motor or navigation command, it interrupts the behavior tree
- User Priority: Direct user commands always take priority over autonomous behaviors
- Auto-Resume: After a timeout (configurable), the behavior tree automatically resumes
- Manual Resume: Users can manually re-enable behaviors via the dashboard
Interrupt Modes
Disabled (mode = "disabled")
- Behavior tree is never interrupted by LLM commands
- User commands may be overridden by autonomous behaviors
- Use when you want autonomous behaviors to have priority
Interrupt Only (mode = "interrupt_only")
- Behavior tree is paused when motor/navigation commands are issued
- No automatic resumption - requires manual re-enable from dashboard
- Use when you want explicit control over behavior resumption
Interrupt with Auto-Resume (mode = "interrupt_with_auto_resume")
- Behavior tree is paused when motor/navigation commands are issued
- Automatically resumes after `timeout_secs` (default: 30s)
- Use for seamless switching between manual and autonomous control
Example Scenario
1. Robot is running "patrol" behavior (autonomous)
2. User says: "stop" via LLM command
→ Behavior tree is interrupted
→ Motor command published: {linear: 0.0, angular: 0.0}
3. Robot stops and remains idle
4. After 30 seconds (timeout):
→ Behavior tree automatically resumes
→ Robot continues patrolling
Control Messages
The system uses enhanced BehaviorControl messages:
```json
{
  "action": "interrupt",
  "source": "llm-command",
  "duration_secs": 30,
  "timestamp": 1234567890
}
```
Actions:
- `interrupt`: Pause behavior tree (from LLM command)
- `resume`: Resume behavior tree (manual or auto)
- `enable`: Enable behavior tree (from dashboard)
- `disable`: Disable behavior tree (from dashboard)
System Prompt
The default system prompt guides the LLM to parse commands into structured JSON actions:
```
You are a helpful robot assistant. Parse user commands and respond with structured actions.

For navigation commands (e.g., "go to the door", "move to coordinates"), extract the goal and respond with JSON:
{"action": "navigate", "goal": {"x": 5.0, "y": 3.0, "theta": 0.0}}

For motor commands (e.g., "move forward", "turn left", "stop"), respond with JSON:
{"action": "motor", "linear": 0.5, "angular": 0.0}

For behavior commands (e.g., "follow that person", "patrol the area"), respond with JSON:
{"action": "behavior", "name": "follow_person"}

For vision queries (e.g., "what do you see?"), describe what's visible in the camera feed.

For general questions, respond conversationally.
```
You can customize this prompt in the configuration.
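A provider-agnostic request typically pairs the system prompt with the user's command, plus any vision context. A sketch in OpenAI-style chat message format; the abbreviated prompt and the `build_messages` helper are illustrative assumptions, not the crate's implementation.

```python
# Abbreviated stand-in for the full system prompt shown above.
SYSTEM_PROMPT = ("You are a helpful robot assistant. "
                 "Parse user commands and respond with structured actions.")

def build_messages(command, vision_context=None):
    """Assemble a chat-style request. vision_context, when present, carries
    formatted object detections so the LLM can answer vision queries."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    if vision_context:
        messages.append({"role": "system", "content": vision_context})
    messages.append({"role": "user", "content": command})
    return messages
```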
Dashboard Integration
The dashboard provides a user-friendly interface for:
- Command Input: Text field for natural language commands
- Command History: Shows past commands with status indicators
- Response Display: Shows LLM responses and action feedback
- Status Badges: Connection status and processing indicators
Access the dashboard at http://localhost:3000/dashboard/robot-control
Architecture
```
┌─────────────────┐
│  Dashboard UI   │
│ (Command Input) │
└────────┬────────┘
         │ publishes
         ▼
    /ai/command
         │
         ▼
┌──────────────────┐
│ LLM Command Node │
│                  │
│  ┌────────────┐  │
│  │  LlmNode   │  │
│  │ (mecha10-  │  │
│  │  ai-llm)   │  │
│  └────────────┘  │
│        │         │
│   Parse JSON     │
│        │         │
└────────┼─────────┘
         │
   ┌─────┴─────┬──────────────┬───────────────┐
   ▼           ▼              ▼               ▼
/motor/cmd_vel /nav/goal /behavior/execute /ai/response
   │           │              │               │
   ▼           ▼              ▼               ▼
┌────────┐ ┌──────────┐ ┌─────────────┐ ┌─────────┐
│ Motor  │ │Navigation│ │  Behavior   │ │Dashboard│
│ Driver │ │  Stack   │ │  Executor   │ │   UI    │
└────────┘ └──────────┘ └─────────────┘ └─────────┘
```
Dependencies
- mecha10-core: Framework core (Context, Topic, Message)
- mecha10-ai-llm: LLM integration library (providers, LlmNode)
- tokio: Async runtime
- serde/serde_json: Serialization
- anyhow: Error handling
- reqwest: HTTP client (for API calls)
Running
The node is launched automatically by mecha10 dev when included in mecha10.json.
To run manually:
```shell
cargo run -p mecha10-nodes-llm-command
```
Testing
Test the node with simulation:
1. Start the control plane and simulation:

   ```shell
   docker compose up -d
   mecha10 dev
   ```

2. Send a test command via the dashboard or Redis CLI:

   ```shell
   redis-cli PUBLISH "/ai/command" '{"text":"move forward","timestamp":1234567890}'
   ```

3. Subscribe to the response topics:

   ```shell
   redis-cli SUBSCRIBE "/ai/response"
   redis-cli SUBSCRIBE "/motor/cmd_vel"
   ```
Limitations
- Camera-based vision queries not yet supported: vision answers use object-detection data from `/vision/detections`; raw camera frame integration is pending
- No conversation context: Each command is processed independently
- API rate limits: Subject to provider rate limits (OpenAI, Claude)
- Network latency: Response time depends on LLM API latency
Future Enhancements
- Camera-based vision query support (integrate raw camera feed)
- Conversation context (multi-turn dialogue)
- Voice input integration
- Command validation and safety checks
- Multi-language support
- Offline fallback mode
License
MIT