The Evaluation feature enables Magnet AI admins to test RAG Tools and Prompt Templates against custom test sets to track, compare, and improve the quality of Gen AI-generated output. This helps keep the generated output both reliable and cost-effective.
Admins can create sets of test inputs manually or import them from an Excel file, then run evaluation jobs against one or more RAG Tools or Prompt Templates. When a job completes, admins can download a report containing the test inputs and the Gen AI-generated outputs and analyze it. This helps them evaluate the relevance, accuracy, and other quality parameters of AI tools and adjust tool configurations accordingly.
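As an illustration of preparing a test set for import, the sketch below builds a small Excel file with pandas. The column names ("input", "expected_output") and the file name are assumptions for the example only, not the product's required import template.

```python
# Minimal sketch: build an Excel test-set file with pandas.
# Column names and file name are illustrative assumptions, not Magnet AI's schema.
# Writing .xlsx with pandas requires the openpyxl package to be installed.
import pandas as pd

test_cases = [
    {
        "input": "How do I reset my password?",
        "expected_output": "Steps to reset a password from the account settings page.",
    },
    {
        "input": "What is the refund policy?",
        "expected_output": "A summary of the 30-day refund policy.",
    },
]

# One row per test case; the resulting file can serve as the basis for an import.
pd.DataFrame(test_cases).to_excel("test_set.xlsx", index=False)
```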
Run the same Test Set against an AI tool with different configuration parameters to find the combination that produces the most consistent and accurate output.
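One way to compare such runs is to join the downloaded reports on the shared test inputs so the outputs from each configuration sit side by side. The sketch below assumes two reports saved as Excel files with an "input" column; the file names and column names are illustrative, not the report's actual schema.

```python
# Minimal sketch: compare two evaluation reports produced from the same Test Set,
# e.g. runs of one RAG Tool with different configuration parameters.
# File names and column names are assumptions, not the product's report format.
import pandas as pd

run_a = pd.read_excel("report_config_a.xlsx")  # e.g. first parameter combination
run_b = pd.read_excel("report_config_b.xlsx")  # e.g. second parameter combination

# Join on the shared test input so each row shows both generated outputs side by side.
comparison = run_a.merge(run_b, on="input", suffixes=("_config_a", "_config_b"))
comparison.to_excel("comparison.xlsx", index=False)
```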