Should we chaos test our agents?

(github.com)

1 point | by himmi-01 2 hours ago

1 comment

himmi-01 2 hours ago
We're building and deploying agents faster than traditional SaaS because there are so many frameworks. I've talked to teams that see odd problems with their agents once they're in production: tool failures, backend latency, hallucinations, known prompt injections, etc. I'm sharing EvalMonkey so you can test your agent up front and save yourself the frustration of chasing a single trace later. It also comes with a CLI for Claude Code/Cursor.
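To make the idea concrete, here is a minimal sketch of what injecting tool failures and latency can look like. This is not EvalMonkey's API; `chaos_wrap`, `ToolFailure`, `lookup_order`, and `agent_step` are hypothetical names made up for illustration, and a real agent would call an LLM where this toy loop just retries.

```python
import random
import time
from typing import Any, Callable, Optional


class ToolFailure(RuntimeError):
    """Raised in place of a real tool result to simulate an outage."""


def chaos_wrap(
    tool: Callable[..., Any],
    failure_rate: float = 0.2,
    max_delay_s: float = 0.0,
    seed: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[..., Any]:
    """Return a version of `tool` that randomly fails or responds slowly."""
    rng = random.Random(seed)  # seeded so a failing run can be reproduced

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        if max_delay_s > 0:
            # Simulate backend latency before the tool answers.
            sleep(rng.uniform(0, max_delay_s))
        if rng.random() < failure_rate:
            # Simulate the tool being down.
            raise ToolFailure(f"injected failure in {tool.__name__}")
        return tool(*args, **kwargs)

    return wrapped


def lookup_order(order_id: str) -> dict:
    """Hypothetical tool the agent depends on."""
    return {"order_id": order_id, "status": "shipped"}


def agent_step(tool: Callable[[str], dict], order_id: str, retries: int = 2) -> str:
    """Toy stand-in for an agent: retry the tool, then degrade gracefully."""
    for _ in range(retries + 1):
        try:
            return tool(order_id)["status"]
        except ToolFailure:
            continue
    return "unavailable"


if __name__ == "__main__":
    always_down = chaos_wrap(lookup_order, failure_rate=1.0, seed=0)
    print(agent_step(always_down, "A1"))  # prints "unavailable"
```

The point of the wrapper is that the agent code stays unchanged: you swap the real tool for the wrapped one in a test run and check whether the agent retries, falls back, or produces a confident wrong answer.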