Inspect AI is an evaluation framework created by the UK AI Security Institute. It can run a wide range of evaluations covering coding, reasoning, agentic tasks, knowledge, behavior, and multimodal understanding. With Inspect AI, evaluations and benchmarking are simple, reproducible, and consistent across multiple models and providers.

Prerequisites

Before you begin, ensure you have:
  1. A SambaCloud account and an active API key, available at SambaCloud API Keys.
  2. Your SambaNova API key set as an environment variable:
export SAMBANOVA_API_KEY=your-sambacloud-api-key
  3. A Python environment with the required packages installed:
python3 -m venv .venv
source .venv/bin/activate
pip install inspect-ai
pip install openai
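
If you want to confirm the key is visible to Python before running an evaluation, a quick optional check like the following works; it only uses the standard library and matches the variable name exported above:

import os

# Confirm the SambaNova API key is available in the current environment.
api_key = os.environ.get("SAMBANOVA_API_KEY")
if api_key:
    print("SAMBANOVA_API_KEY is set")
else:
    print("SAMBANOVA_API_KEY is missing - export it before running evaluations")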

Running evaluations

Before you can run your first evaluation, you’ll need to define a task in a Python script. Each task has three main components:
  1. Dataset – the list of inputs and expected results
  2. Solver – how the model produces its outputs
  3. Scorer – how outputs are evaluated against the expected results

Example: Hello world

Save the following code into a hello_world.py file.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import exact
from inspect_ai.solver import generate

@task
def hello_world():
    return Task(
        dataset=[
            Sample(
                input="Just reply with Hello World",
                target="Hello World",
            )
        ],
        solver=[generate()],
        scorer=exact(),
    )
Then run the evaluation with SambaCloud. Here’s an example using the Llama-4-Maverick-17B-128E-Instruct model:
inspect eval hello_world.py --model sambanova/llama-4-maverick-17b-128e-instruct
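
If you prefer to launch evaluations from Python instead of the CLI, Inspect also exposes an eval() function. The sketch below assumes it is run from the same directory as hello_world.py and uses the same model string as above:

from inspect_ai import eval

from hello_world import hello_world  # the task defined above

# Runs the evaluation and writes logs to ./logs, just like the CLI command.
eval(hello_world(), model="sambanova/llama-4-maverick-17b-128e-instruct")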

Viewing results

  • Results are stored in the ./logs directory.
  • Use the Inspect web UI for interactive viewing:
inspect view
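
Logs can also be read programmatically with the helpers in inspect_ai.log. The sketch below is a minimal example; it assumes logs live in the default ./logs directory, and the attribute names printed at the end (status, eval.model) are assumptions that may vary by Inspect version:

from inspect_ai.log import list_eval_logs, read_eval_log

# list_eval_logs() discovers logs in the default ./logs directory.
logs = list_eval_logs()
if logs:
    log = read_eval_log(logs[0])
    # Assumed fields: overall run status and the model that was evaluated.
    print(log.status)
    print(log.eval.model)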

More information