Plan Learning Flywheel
Concept
The learning flywheel converts expensive L2 (LLM-assisted) plans into cheap L0 (deterministic) rules. Over time, this drives the escalation rate toward zero as the system learns to handle more situations autonomously.
Request → L2 (LLM + A*) → Successful Plan → Extract Rule → Register Rule
                                                              ↓
Request → Check Learned Rules → Match! → Use Learned Plan → Record Success
                                                              ↓
                                             Promote to L0 (after N successes)

Components
RuleExtractor
Extracts generalizable conditions from successful L2 plans:
- Uses the plan’s first action preconditions as rule conditions
- Captures relevant state keys from the initial state
- Configurable minimum confidence threshold (default: 0.8)
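
The extraction step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual API: the `Action`, `Plan`, and `LearnedRule` types and the `extract_rule` function are hypothetical names, and "relevant state keys" is assumed here to mean the initial-state keys referenced by the first action's preconditions. Only the 0.8 default threshold comes from the text.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Action:
    name: str
    preconditions: dict[str, Any] = field(default_factory=dict)


@dataclass
class Plan:
    goal: str
    actions: list[Action]
    confidence: float


@dataclass
class LearnedRule:
    goal: str
    conditions: dict[str, Any]   # taken from the first action's preconditions
    state_keys: dict[str, Any]   # snapshot of the relevant initial-state values
    actions: list[str]
    confidence: float


def extract_rule(plan: Plan, initial_state: dict[str, Any],
                 min_confidence: float = 0.8) -> Optional[LearnedRule]:
    """Return a rule for a successful plan, or None if the plan is not eligible."""
    if not plan.actions or plan.confidence < min_confidence:
        return None
    first = plan.actions[0]
    # Rule conditions are the first action's preconditions.
    conditions = dict(first.preconditions)
    # Assumption: a state key is "relevant" if a precondition mentions it.
    state_keys = {k: initial_state[k] for k in first.preconditions if k in initial_state}
    return LearnedRule(plan.goal, conditions, state_keys,
                       [a.name for a in plan.actions], plan.confidence)
```

Plans below the threshold return `None`, so a caller can leave the request on the L2 path without registering anything.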
RuleRegistry
Stores learned rules with:
- Success/failure tracking
- Version history
- Active/inactive status
- Promotion tracking (to L0)
- Automatic pruning of low-performing rules
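
One way to picture the registry's bookkeeping is the sketch below. The class and method names are hypothetical, and the pruning policy (deactivate rules under a 0.5 success rate after at least 5 trials, evict the worst rule when the registry is full) is an assumption chosen for illustration; the source only states that low-performing rules are pruned automatically.

```python
from dataclasses import dataclass, field


@dataclass
class RuleRecord:
    rule_id: str
    version: int = 1
    successes: int = 0
    failures: int = 0
    active: bool = True
    promoted: bool = False
    history: list[int] = field(default_factory=list)  # earlier version numbers

    @property
    def success_rate(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 0.0


class RuleRegistry:
    def __init__(self, max_rules: int = 500, promotion_threshold: int = 3):
        self.max_rules = max_rules
        self.promotion_threshold = promotion_threshold
        self.rules: dict[str, RuleRecord] = {}

    def register(self, rule_id: str) -> RuleRecord:
        """Add a new rule, or bump the version of an existing one."""
        if rule_id in self.rules:
            record = self.rules[rule_id]
            record.history.append(record.version)
            record.version += 1
            return record
        if len(self.rules) >= self.max_rules:
            # Assumed eviction policy: drop the rule with the lowest success rate.
            worst = min(self.rules.values(), key=lambda r: r.success_rate)
            del self.rules[worst.rule_id]
        self.rules[rule_id] = RuleRecord(rule_id)
        return self.rules[rule_id]

    def record_outcome(self, rule_id: str, success: bool) -> None:
        record = self.rules[rule_id]
        if success:
            record.successes += 1
        else:
            record.failures += 1
        # Mark for promotion to L0 after N successes.
        if record.successes >= self.promotion_threshold:
            record.promoted = True

    def prune(self, min_success_rate: float = 0.5, min_trials: int = 5) -> None:
        """Deactivate rules that have enough trials and a poor success rate."""
        for record in self.rules.values():
            trials = record.successes + record.failures
            if trials >= min_trials and record.success_rate < min_success_rate:
                record.active = False
```

Deactivating rather than deleting on `prune` keeps the counters and version history available for later inspection.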
PatternMatcher
Checks incoming requests against learned rules:
- Exact goal pattern matching
- State condition evaluation
- Selects best rule by: success rate → confidence → cost
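
The selection order above maps naturally onto a tuple sort key. The sketch below assumes that conditions are simple key/value equality checks against the current state and that lower cost is preferred; the `Rule` type and `match_rule` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Rule:
    name: str
    goal: str
    conditions: dict[str, Any]
    success_rate: float
    confidence: float
    cost: float


def match_rule(goal: str, state: dict[str, Any], rules: list[Rule]) -> Optional[Rule]:
    """Return the best learned rule for this request, or None to escalate."""
    candidates = [
        r for r in rules
        if r.goal == goal  # exact goal pattern match
        and all(k in state and state[k] == v for k, v in r.conditions.items())
    ]
    if not candidates:
        return None
    # Highest success rate first, then highest confidence, then lowest cost.
    return max(candidates, key=lambda r: (r.success_rate, r.confidence, -r.cost))
```

A `None` result is the signal to fall back to the L2 planner for that request.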
PlanConverter
Orchestrates the conversion pipeline:
- Validates plan eligibility
- Creates or updates rules
- Triggers promotion check
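
The three pipeline steps can be sketched as a single `convert` call. Everything here is illustrative: the `StoredRule` type, the eligibility test (plan succeeded, confidence at or above the threshold, at least one step), and keying rules by goal are assumptions rather than the project's real behavior. Following the flow diagram, successes are counted when a learned rule is reused, not when the L2 plan is first converted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredRule:
    goal: str
    steps: list[str]
    successes: int = 0
    promoted: bool = False


class PlanConverter:
    def __init__(self, min_confidence: float = 0.8, promotion_threshold: int = 3):
        self.min_confidence = min_confidence
        self.promotion_threshold = promotion_threshold
        self.rules: dict[str, StoredRule] = {}

    def is_eligible(self, steps: list[str], succeeded: bool, confidence: float) -> bool:
        return succeeded and bool(steps) and confidence >= self.min_confidence

    def check_promotion(self, rule: StoredRule) -> None:
        if rule.successes >= self.promotion_threshold:
            rule.promoted = True

    def convert(self, goal: str, steps: list[str], succeeded: bool,
                confidence: float) -> Optional[StoredRule]:
        # 1. Validate plan eligibility.
        if not self.is_eligible(steps, succeeded, confidence):
            return None
        # 2. Create the rule, or update the existing one for this goal.
        rule = self.rules.get(goal)
        if rule is None:
            rule = StoredRule(goal, list(steps))
            self.rules[goal] = rule
        else:
            rule.steps = list(steps)
        # 3. Trigger the promotion check.
        self.check_promotion(rule)
        return rule

    def record_success(self, goal: str) -> None:
        """Called when a learned rule was reused successfully."""
        rule = self.rules[goal]
        rule.successes += 1
        self.check_promotion(rule)
```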
RuleMigration
Safe promotion from L2 registry to L0:
- Version snapshots before promotion
- Audit trail of all events
- Auto-rollback on performance degradation
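
A rough sketch of how snapshot, audit, and rollback could fit together is shown below. The `RuleMigration` methods, the dict-based rule representation, and the degradation test (observed success rate more than 0.1 below a baseline) are all assumptions for illustration; the source does not define how degradation is measured.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RuleMigration:
    auto_rollback_enabled: bool = True
    l0_rules: dict[str, dict[str, Any]] = field(default_factory=dict)
    snapshots: dict[str, list[Optional[dict[str, Any]]]] = field(default_factory=dict)
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def promote(self, rule_id: str, rule: dict[str, Any]) -> None:
        # Snapshot the current L0 version (None if absent) before overwriting it.
        previous = copy.deepcopy(self.l0_rules.get(rule_id))
        self.snapshots.setdefault(rule_id, []).append(previous)
        self.l0_rules[rule_id] = copy.deepcopy(rule)
        self.audit_log.append(("promoted", rule_id))

    def rollback(self, rule_id: str) -> None:
        # Restore the most recent snapshot, or remove the rule if there was none.
        previous = self.snapshots[rule_id].pop()
        if previous is None:
            self.l0_rules.pop(rule_id, None)
        else:
            self.l0_rules[rule_id] = previous
        self.audit_log.append(("rolled_back", rule_id))

    def check_health(self, rule_id: str, baseline_rate: float,
                     observed_rate: float, tolerance: float = 0.1) -> bool:
        """Roll back if performance degraded; return True when a rollback happened."""
        degraded = observed_rate < baseline_rate - tolerance
        if degraded and self.auto_rollback_enabled:
            self.rollback(rule_id)
            return True
        return False
```

Every promotion and rollback appends to `audit_log`, so the sequence of events for a rule can be replayed later.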
Configuration
learning:
  min_learning_confidence: 0.8
  promotion_threshold: 3
  max_learned_rules: 500
  auto_rollback_enabled: true

Monitoring
Track flywheel health via:
- rulesLearned: Total rules in registry
- rulesPromoted: Rules promoted to L0
- l0HitRate: Ratio of requests handled by learned rules
- escalationRateDelta: Change in escalation rate (negative = improving)
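
To make the metric definitions concrete, the sketch below derives them from raw counters. The `compute_metrics` function and its parameter names are hypothetical, and `escalationRateDelta` is assumed to be the current escalation rate minus the rate from a previous measurement window.

```python
def compute_metrics(rules_learned: int, rules_promoted: int,
                    total_requests: int, learned_rule_hits: int,
                    escalations: int, previous_escalation_rate: float) -> dict[str, float]:
    """Build the flywheel health metrics from raw counters."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    escalation_rate = escalations / total_requests
    return {
        "rulesLearned": rules_learned,
        "rulesPromoted": rules_promoted,
        # Share of requests answered by a learned rule instead of the L2 planner.
        "l0HitRate": learned_rule_hits / total_requests,
        # Negative values mean fewer escalations than in the previous window.
        "escalationRateDelta": escalation_rate - previous_escalation_rate,
    }


metrics = compute_metrics(
    rules_learned=40, rules_promoted=12,
    total_requests=200, learned_rule_hits=150,
    escalations=50, previous_escalation_rate=0.5,
)
```

With these example counters, a quarter of requests escalate in the current window against half in the previous one, so the delta comes out negative, which is the improving direction.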