Vapi is a developer platform for building voice AI agents. It handles the underlying infrastructure so developers can focus on creating high-quality voice experiences. Voice agents built with Vapi can engage in natural conversations with users, make and receive phone calls, integrate seamlessly with existing systems and APIs, and support complex workflows such as appointment scheduling, customer support, and other advanced use cases. SambaNova’s high-speed inference enables low-latency voice interactions, which is critical for natural-sounding conversations.

Prerequisites

Before starting, ensure you have:
  • A SambaNova Cloud account and API key.
  • A Vapi account with dashboard access.
  • Python 3.11 or later.
  • ngrok installed for exposing your local server.

Installation and setup

Clone the repository

git clone https://github.com/sambanova/integrations.git
cd integrations/vapi

Create a virtual environment

python -m venv .venv
source .venv/bin/activate

Install dependencies

pip install flask==3.1.2 sambanova==1.2.0

Install ngrok

On macOS:
brew install ngrok
Then add your ngrok auth token. For more information, see the ngrok documentation:
ngrok config add-authtoken $YOUR_NGROK_AUTHTOKEN

Set environment variables

Create a .env file or export your API key directly:
export SAMBANOVA_API_KEY="your-sambanova-api-key"
You can obtain your API key from the SambaNova Cloud portal.
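If you read the key from Python code rather than relying on the shell export, a small helper makes a missing key fail loudly instead of surfacing later as an authentication error. This is an illustrative sketch (the function name is not part of the example repository):

```python
import os

def get_sambanova_api_key() -> str:
    """Read SAMBANOVA_API_KEY from the environment, failing loudly if unset."""
    key = os.environ.get("SAMBANOVA_API_KEY")
    if not key:
        raise RuntimeError(
            "SAMBANOVA_API_KEY is not set. Export it or add it to your .env file."
        )
    return key
```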

Run the local LLM server

Start the Flask server:
python app.py
The server will run on http://localhost:5000/.
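Conceptually, the server's job is to accept Vapi's request and forward the OpenAI-compatible fields to SambaNova's chat completions API. As a sketch of that translation step (this is an illustration, not the repository's actual app.py; the field list and endpoint URL are assumptions based on the cURL example below):

```python
# Assumed SambaNova Cloud chat completions endpoint.
SAMBANOVA_URL = "https://api.sambanova.ai/v1/chat/completions"

def build_upstream_payload(vapi_body: dict) -> dict:
    """Drop Vapi-specific fields (e.g. "call", "metadata") and keep the
    OpenAI-compatible ones the upstream chat completions API expects."""
    allowed = {"model", "messages", "temperature", "max_tokens", "stream"}
    return {k: v for k, v in vapi_body.items() if k in allowed}
```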

Expose the server using ngrok

In a separate terminal, run:
ngrok http 5000
ngrok will generate a public URL similar to:
https://abcd-1234.ngrok-free.dev
This is the endpoint Vapi will call.

Test the endpoint

Verify your setup with a cURL request:
curl -X POST https://abcd-1234.ngrok-free.dev/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "call": "chat.completions",
    "metadata": {
      "request_id": "example-123"
    },
    "model": "gpt-oss-120b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello! Explain what an LLM is in one sentence."
      }
    ],
    "temperature": 0.7,
    "max_tokens": 150,
    "stream": true
  }'
Replace https://abcd-1234.ngrok-free.dev with your actual ngrok URL.
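Because the request sets "stream": true, the response typically arrives as server-sent events in the OpenAI-compatible streaming format: each line is `data:` followed by a JSON chunk, terminated by `data: [DONE]`. A minimal parser sketch, assuming that chunk shape:

```python
import json

def parse_sse_content(raw: str) -> str:
    """Concatenate the streamed delta content from an OpenAI-style SSE body."""
    parts = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines between events
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)
```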

Configure Vapi with your custom LLM

  1. Log in to the Vapi Dashboard.
  2. Create a new assistant using a Blank Template.
  3. Navigate to Model > Provider > Custom LLM.
  4. Enter the model name you’ll use (for example, gpt-oss-120b).
  5. Paste your ngrok URL into the endpoint URL field:
    https://abcd-1234.ngrok-free.dev/chat/completions
    
  6. Save the configuration.

Additional resources

For detailed instructions and the complete source code, see the Vapi integration example on GitHub.

Vapi documentation

For more information about Vapi, see the official Vapi documentation.