
L2 Planner Operational Runbook

See monitoring/dashboards/l2-planner-dashboard.json for the dashboard definition.

Key panels:

  • Escalation Rate: Should trend downward over time
  • Success Rate: Target > 80%
  • Confidence Distribution: A healthy system has most plans above 0.7
  • Learned Rules Count: Should grow steadily

Condition                            Severity   Action
Escalation rate > 50% (1h window)    Warning    Check L2 planner health
Escalation rate > 80% (1h window)    Critical   Check A2A connectivity and action graph
P95 latency > 30s                    Warning    Check A2A response times
Learned rule failure rate > 30%      Warning    Trigger rule audit
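
The alert thresholds above can be expressed as a small evaluation function. This is a minimal sketch, not the project's alerting code: the PlannerMetrics shape, its field names, and evaluateAlerts are hypothetical, and rates are assumed to be fractions between 0 and 1 measured over the stated window.

```typescript
// Hypothetical metrics snapshot; field names are assumptions, not the project's API.
interface PlannerMetrics {
  escalationRate1h: number;       // fraction of goals escalated in the last hour (0..1)
  p95LatencyMs: number;           // P95 latency in milliseconds
  learnedRuleFailureRate: number; // fraction of learned-rule runs that failed (0..1)
}

interface Alert {
  severity: "Warning" | "Critical";
  condition: string;
  action: string;
}

// Applies the thresholds from the alert table. When both escalation
// thresholds are exceeded, only the Critical alert is reported.
function evaluateAlerts(m: PlannerMetrics): Alert[] {
  const alerts: Alert[] = [];
  if (m.escalationRate1h > 0.8) {
    alerts.push({
      severity: "Critical",
      condition: "Escalation rate > 80% (1h window)",
      action: "Check A2A connectivity and action graph",
    });
  } else if (m.escalationRate1h > 0.5) {
    alerts.push({
      severity: "Warning",
      condition: "Escalation rate > 50% (1h window)",
      action: "Check L2 planner health",
    });
  }
  if (m.p95LatencyMs > 30_000) {
    alerts.push({
      severity: "Warning",
      condition: "P95 latency > 30s",
      action: "Check A2A response times",
    });
  }
  if (m.learnedRuleFailureRate > 0.3) {
    alerts.push({
      severity: "Warning",
      condition: "Learned rule failure rate > 30%",
      action: "Trigger rule audit",
    });
  }
  return alerts;
}
```

Reporting only the most severe escalation alert is a choice made for this sketch; a real alerting setup may fire both.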

Edit src/config/routing-config.yaml:

routing:
  l2_confidence_threshold: 0.5  # Lower = more autonomous, higher = more escalations
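
The comment on l2_confidence_threshold implies that a plan is handled autonomously when its confidence reaches the threshold and is escalated otherwise. A minimal sketch of that gate follows. The routePlan function and the Route type are hypothetical, and treating a confidence equal to the threshold as sufficient is an assumption.

```typescript
type Route = "execute" | "escalate";

// Hypothetical gate: compares plan confidence against l2_confidence_threshold.
// Inclusive comparison at the threshold is an assumption of this sketch.
function routePlan(confidence: number, threshold: number = 0.5): Route {
  if (Number.isNaN(confidence) || confidence < 0 || confidence > 1) {
    throw new RangeError("confidence must be between 0 and 1");
  }
  return confidence >= threshold ? "execute" : "escalate";
}
```

With the default of 0.5, a plan scored 0.6 is executed; raising the threshold to 0.7 escalates the same plan, which is why a higher value produces more escalations.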

// Inspect the learned rule registry
const registry = dispatcher.getRuleRegistry();
const stats = registry.getStats();
const rules = registry.getAll();

// Roll back a learned rule (versioning and auditor must already be in scope)
const migration = new RuleMigration(registry, versioning, auditor);
migration.rollback("rule-id");

// Review escalation trends and the most common escalation reasons
const tracker = dispatcher.getEscalationTracker();
const trend = tracker.getTrend(3600_000); // 1-hour buckets
const reasons = tracker.getTopReasons(10);

High escalation rate:

  1. Check A2A client connectivity (is Ava responding?)
  2. Verify that the action graph has sufficient actions for the goal types
  3. Check whether the confidence threshold (l2_confidence_threshold) is set too high
  4. Review top escalation reasons: tracker.getTopReasons(10)

Failing learned rule:

  1. Identify the rule: registry.findByGoal(goalPattern)
  2. Check failure count: rule.failureCount
  3. Roll back if needed: migration.rollback(rule.id)
  4. Review the audit trail: auditor.getForRule(rule.id)
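
The rule triage steps above can be sketched as a single helper. This is illustrative only: the interfaces below are hypothetical stand-ins that mirror the calls named in the steps, the successCount field and the single-rule return value of findByGoal are assumptions, and the 30% cutoff is borrowed from the alert threshold for learned rule failure rate.

```typescript
// Hypothetical shapes; the real registry and migration types may differ.
interface LearnedRule {
  id: string;
  failureCount: number;
  successCount: number; // assumed field, not confirmed by the runbook
}
interface RuleRegistryLike {
  findByGoal(goalPattern: string): LearnedRule | undefined;
}
interface RuleMigrationLike {
  rollback(ruleId: string): void;
}

type TriageResult = "not-found" | "healthy" | "rolled-back";

// Finds the rule for a goal pattern and rolls it back when its
// observed failure rate exceeds maxFailureRate.
function triageRule(
  registry: RuleRegistryLike,
  migration: RuleMigrationLike,
  goalPattern: string,
  maxFailureRate: number = 0.3,
): TriageResult {
  const rule = registry.findByGoal(goalPattern);
  if (!rule) return "not-found";
  const total = rule.failureCount + rule.successCount;
  // A rule with no recorded runs has no failure rate to act on.
  if (total === 0 || rule.failureCount / total <= maxFailureRate) return "healthy";
  migration.rollback(rule.id);
  return "rolled-back";
}
```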

Rules are not being learned:

  1. Check minimum learning confidence: plans below 0.8 are not extracted
  2. Check promotion threshold: rules need 3 successes before promotion
  3. Verify that the registry is not at max capacity (500 by default)
  4. Check whether plans are too long (max 10 actions for extraction)
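
The learning checks above can be combined into one diagnostic that reports why a plan would not be extracted. This is a sketch under stated assumptions: the numeric limits come from the checklist, but the LearningLimits and CandidatePlan types and both function names are hypothetical, not the project's API.

```typescript
// Limits taken from the checklist; the type and field names are hypothetical.
interface LearningLimits {
  minLearningConfidence: number;
  promotionSuccesses: number;
  maxRegistrySize: number;
  maxPlanActions: number;
}

const DEFAULT_LIMITS: LearningLimits = {
  minLearningConfidence: 0.8,
  promotionSuccesses: 3,
  maxRegistrySize: 500,
  maxPlanActions: 10,
};

interface CandidatePlan {
  confidence: number;
  actionCount: number;
}

// Returns the reasons a plan would be skipped for extraction (empty if none).
function extractionBlockers(
  plan: CandidatePlan,
  registrySize: number,
  limits: LearningLimits = DEFAULT_LIMITS,
): string[] {
  const blockers: string[] = [];
  if (plan.confidence < limits.minLearningConfidence) {
    blockers.push("confidence below minimum learning confidence");
  }
  if (plan.actionCount > limits.maxPlanActions) {
    blockers.push("plan has too many actions");
  }
  if (registrySize >= limits.maxRegistrySize) {
    blockers.push("registry at max capacity");
  }
  return blockers;
}

// A rule is promotable once it has reached the required number of successes.
function isPromotable(successCount: number, limits: LearningLimits = DEFAULT_LIMITS): boolean {
  return successCount >= limits.promotionSuccesses;
}
```

Returning the full list of blockers rather than a boolean makes it easier to see which of the four checks is responsible when no rules are appearing.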