What is an eval?
An eval is a way of measuring how well your agent performs against a set of criteria. There are two types of evals:
- Offline evals use historical data to assess agent performance. They are helpful for preventing regressions when an agent is edited.
- Online evals use real-time data to assess agent performance. They are helpful for monitoring the real-world performance and outputs of your agent.
Key Terminology
In Lindy, an eval is a reference task combined with a set of scorers. The reference task is typically one that has performed well, so that you can catch regressions against it. Think of evals as tests that you want to pass every time you deploy a new iteration of an agent. A scorer defines how the eval should be scored.
Creating an eval
To create an offline eval, click the test tube icon under any step.


Monitoring and running your evals
To monitor and run your evals, go to the evals tab.
Eval runs consume credits from your account.
Note: Running an eval is a safe simulation. It does not execute real actions; it only simulates how the run would behave with the current version of the agent.
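
To make the terminology concrete, here is a minimal conceptual sketch in Python of an eval as a reference task plus a set of scorers, run against two versions of an agent. This is not Lindy's API or implementation. Every name here (Eval, run_eval, the two scorers, and the two stand-in agents) is hypothetical and exists only for illustration, and the sketch assumes each scorer is a simple pass/fail check.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical types for illustration only; this is not Lindy's API.
Scorer = Callable[[str], bool]  # assumption: a scorer is a pass/fail check


@dataclass
class Eval:
    reference_task: str    # a task that previously performed well
    scorers: List[Scorer]  # how the output of a run is judged


def run_eval(agent: Callable[[str], str], ev: Eval) -> bool:
    """Replay the reference task and pass only if every scorer passes."""
    output = agent(ev.reference_task)  # stands in for a simulated run
    return all(scorer(output) for scorer in ev.scorers)


# Two hypothetical scorers.
def mentions_refund(output: str) -> bool:
    return "refund" in output.lower()


def is_brief(output: str) -> bool:
    return len(output) <= 200


# Stand-ins for two iterations of the same agent.
def agent_v1(task: str) -> str:
    return "We have issued your refund."


def agent_v2(task: str) -> str:
    return "Thanks for reaching out!"


ev = Eval("Customer asks for a refund on order 123", [mentions_refund, is_brief])
results = {"v1": run_eval(agent_v1, ev), "v2": run_eval(agent_v2, ev)}
print(results)
```

In this sketch the second iteration fails the eval because its output no longer mentions the refund, which is the kind of regression an offline eval is meant to surface before a new iteration is deployed.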