Inspect AI is an evaluation framework created by the UK AI Security Institute. It can run a wide range of evaluations covering coding, reasoning, agentic tasks, knowledge, behavior, and multimodal understanding. With Inspect AI, evaluations and benchmarking are simple, reproducible, and consistent across multiple models and providers.

Prerequisites

Before you begin, ensure you have:
  1. A SambaCloud account and an active API key, available at SambaCloud API Keys.
  2. Your SambaNova API key set as an environment variable:
export SAMBANOVA_API_KEY=your-sambacloud-api-key
  3. A Python environment with the required packages installed:
python3 -m venv .venv
source .venv/bin/activate
pip install inspect-ai
pip install openai
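
If you want to confirm the key is visible to Python before running an evaluation, a quick optional check like the following works; it only uses the standard library and matches the variable name exported above:

import os

# Confirm the SambaNova API key is available in the current environment.
api_key = os.environ.get("SAMBANOVA_API_KEY")
if api_key:
    print("SAMBANOVA_API_KEY is set")
else:
    print("SAMBANOVA_API_KEY is missing - export it before running evaluations")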

Running evaluations

Before you can run your first evaluation, you’ll need to define a task in a Python script. Each task has three main components:
  1. Dataset – the list of inputs and expected results
  2. Solver – how the model produces its outputs
  3. Scorer – how outputs are evaluated against the expected results

Example: Hello world

Save the following code into a hello_world.py file.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import exact
from inspect_ai.solver import generate

@task
def hello_world():
    return Task(
        dataset=[
            Sample(
                input="Just reply with Hello World",
                target="Hello World",
            )
        ],
        solver=[generate()],
        scorer=exact(),
    )
Then run the evaluation with SambaCloud. Here’s an example using the Llama-4-Maverick-17B-128E-Instruct model:
inspect eval hello_world.py --model sambanova/llama-4-maverick-17b-128e-instruct
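
If you prefer to launch evaluations from Python instead of the CLI, Inspect also exposes an eval() function. The sketch below assumes it is run from the same directory as hello_world.py and uses the same model string as above:

from inspect_ai import eval

from hello_world import hello_world  # the task defined above

# Runs the evaluation and writes logs to ./logs, just like the CLI command.
eval(hello_world(), model="sambanova/llama-4-maverick-17b-128e-instruct")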

Viewing results

  • Results are stored in the ./logs directory.
  • Use the Inspect web UI for interactive viewing:
inspect view
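
Logs can also be read programmatically with the helpers in inspect_ai.log. The sketch below is a minimal example; it assumes logs live in the default ./logs directory, and the attribute names printed at the end (status, eval.model) are assumptions that may vary by Inspect version:

from inspect_ai.log import list_eval_logs, read_eval_log

# list_eval_logs() discovers logs in the default ./logs directory.
logs = list_eval_logs()
if logs:
    log = read_eval_log(logs[0])
    # Assumed fields: overall run status and the model that was evaluated.
    print(log.status)
    print(log.eval.model)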

More information