afrexai-agent-observability

// Score, monitor, and troubleshoot AI agent fleets in production. Built for ops teams running 1-100+ agents.

$ git log --oneline --stat

stars:1,933

forks:367

updated:March 4, 2026

SKILL.mdreadonly

Agent Observability & Monitoring

Score, monitor, and troubleshoot AI agent fleets in production. Built for ops teams running 1-100+ agents.

What This Does

Evaluates your agent deployment across 6 dimensions and returns a 0-100 health score with specific fixes.

6-Dimension Assessment

1. Execution Visibility (0-20 pts)

Can you see what every agent is doing right now?
Task queue depth, active/idle ratio, error rates
Benchmark: Top quartile tracks 95%+ of agent actions in real-time

2. Cost Attribution (0-20 pts)

Do you know exactly what each agent costs per task?
Token spend, API calls, compute time, tool invocations
Benchmark: Unmonitored agents waste 30-55% on retries and hallucination loops

3. Output Quality (0-15 pts)

Are agent outputs validated before reaching users or systems?
Accuracy sampling, hallucination detection, regression tracking
Benchmark: 1 in 12 agent outputs contains a material error without monitoring

4. Failure Recovery (0-15 pts)

What happens when an agent fails mid-task?
Retry logic, graceful degradation, human escalation paths
Benchmark: Mean time to detect agent failure without monitoring: 4.2 hours

5. Security & Boundaries (0-15 pts)

Are agents staying within authorized scope?
Tool access auditing, data exfiltration checks, permission drift
Benchmark: 23% of production agents access tools outside their intended scope

6. Fleet Coordination (0-15 pts)

Do multi-agent workflows hand off cleanly?
Message passing reliability, deadlock detection, duplicate work
Benchmark: Uncoordinated fleets duplicate 18-25% of work

Scoring

Score	Rating	Action
80-100	Production-grade	Optimize and scale
60-79	Operational	Fix gaps before scaling
40-59	Risky	Immediate remediation needed
0-39	Blind	Stop scaling, instrument first

Quick Assessment Prompt

Ask the agent to evaluate your setup:

Run the agent observability assessment against our current deployment:
- How many agents are running?
- What monitoring exists today?
- What broke in the last 30 days?
- What's our monthly agent spend?
- Who gets alerted when an agent fails?

Cost Framework

Company Size	Unmonitored Waste	Monitoring Investment	Net Savings
1-5 agents	$2K-$8K/mo	$500-$1K/mo	$1.5K-$7K/mo
5-20 agents	$8K-$45K/mo	$2K-$5K/mo	$6K-$40K/mo
20-100 agents	$45K-$200K/mo	$8K-$20K/mo	$37K-$180K/mo

90-Day Monitoring Roadmap

Week 1-2: Inventory all agents, document intended scope, tag cost centers Week 3-4: Deploy execution logging (every tool call, every output) Month 2: Build dashboards — cost per task, error rate, latency P95 Month 3: Automated alerting — failure detection <5 min, cost anomaly flags, scope violations

7 Monitoring Mistakes

Logging only errors (miss the slow degradation)
No cost attribution (agents burn budget invisibly)
Monitoring agents like servers (they need task-level observability)
Manual review of agent outputs (doesn't scale past 3 agents)
No baseline metrics (can't detect regression without a baseline)
Alerting on everything (alert fatigue kills response time)
Skipping agent-to-agent handoff monitoring (where most fleet failures happen)

Industry Adjustments

Industry	Critical Dimension	Why
Financial Services	Security & Boundaries	Regulatory audit trails mandatory
Healthcare	Output Quality	Clinical accuracy non-negotiable
Legal	Execution Visibility	Billing requires task-level tracking
Ecommerce	Cost Attribution	Margin-sensitive, waste kills profit
SaaS	Fleet Coordination	Multi-tenant agent isolation
Manufacturing	Failure Recovery	Downtime = production line stops
Construction	Security & Boundaries	Safety-critical document handling
Real Estate	Output Quality	Valuation errors = liability
Recruitment	Fleet Coordination	Candidate pipeline handoffs
Professional Services	Cost Attribution	Client billing accuracy

Go Deeper

AI Agent Context Packs — industry-specific decision frameworks: https://afrexai-cto.github.io/context-packs/
AI Revenue Leak Calculator — find where your business loses money to manual processes: https://afrexai-cto.github.io/ai-revenue-calculator/
Agent Setup Wizard — configure your agent stack in 5 minutes: https://afrexai-cto.github.io/agent-setup/

Built by AfrexAI — we help businesses run AI agents that actually make money.