Vapi is a developer platform for building voice AI agents. It handles the underlying infrastructure so developers can focus on creating high-quality voice experiences. Voice agents built with Vapi can engage in natural conversations with users, make and receive phone calls, integrate seamlessly with existing systems and APIs, and support complex workflows such as appointment scheduling, customer support, and other advanced use cases.
SambaNova’s high-speed inference enables low-latency voice interactions, which is critical for natural-sounding conversations.
Prerequisites
Before starting, ensure you have:
- A SambaNova Cloud account and API key.
- A Vapi account with dashboard access.
- Python 3.11 or later.
- ngrok installed for exposing your local server.
Installation and setup
Clone the repository
git clone https://github.com/sambanova/integrations.git
cd integrations/vapi
Create a virtual environment
python -m venv .venv
source .venv/bin/activate
Install dependencies
pip install flask==3.1.2 sambanova==1.2.0
Install ngrok
On macOS, install ngrok with Homebrew:
brew install ngrok
Then add your ngrok auth token (for details, see the ngrok documentation):
ngrok config add-authtoken $YOUR_NGROK_AUTHTOKEN
Set environment variables
Create a .env file or export your API key directly:
export SAMBANOVA_API_KEY="your-sambanova-api-key"
You can obtain your API key from the SambaNova Cloud portal.
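Whether you use a .env file or export the variable directly, the server needs the key in its environment at startup. A minimal fail-fast check (an illustrative sketch, not the repository's exact code) might look like:

```python
import os

def load_api_key() -> str:
    """Return the SambaNova API key from the environment, failing fast if unset."""
    key = os.environ.get("SAMBANOVA_API_KEY", "")
    if not key:
        raise RuntimeError(
            "SAMBANOVA_API_KEY is not set; export it before starting the server"
        )
    return key
```

Failing at startup is preferable to failing on the first Vapi call, when the error would surface as a dropped voice response.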
Run the local LLM server
Start the Flask server:
The server will run on http://localhost:5000/.
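Vapi's Custom LLM provider expects an OpenAI-compatible streaming response: one `data: <json>` line per completion chunk, delivered as server-sent events and terminated by a `data: [DONE]` sentinel. As an illustrative sketch of that framing (assumed helper, not the repository's exact code):

```python
import json
from typing import Iterable, Iterator

def to_sse(chunks: Iterable[dict]) -> Iterator[str]:
    """Wrap OpenAI-style completion chunks as server-sent events.

    Each chunk becomes one `data: <json>` event; the stream ends with
    the `data: [DONE]` sentinel that OpenAI-compatible clients expect.
    """
    for chunk in chunks:
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"
```

In the Flask server, a generator like this would be returned as the response body with the `text/event-stream` content type.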
Expose the server using ngrok
In a separate terminal, run:
ngrok http 5000
ngrok will generate a public forwarding URL similar to:
https://abcd-1234.ngrok-free.dev
This is the endpoint Vapi will call.
Test the endpoint
Verify your setup with a cURL request:
curl -X POST https://abcd-1234.ngrok-free.dev/chat/completions \
-H "Content-Type: application/json" \
-d '{
"call": "chat.completions",
"metadata": {
"request_id": "example-123"
},
"model": "gpt-oss-120b",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello! Explain what an LLM is in one sentence."
}
],
"temperature": 0.7,
"max_tokens": 150,
"stream": true
}'
Replace https://abcd-1234.ngrok-free.dev with your actual ngrok URL.
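Because the request sets "stream": true, the endpoint replies with server-sent events rather than a single JSON object. A small helper like the following (illustrative only, not part of the repository) can reassemble the streamed deltas into the final reply when checking the endpoint from a script:

```python
import json

def collect_stream(sse_text: str) -> str:
    """Reassemble assistant text from an OpenAI-style SSE response body."""
    parts = []
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue
        payload = line.removeprefix("data: ")
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

If the reassembled text reads as a coherent answer, the proxy is forwarding and streaming correctly.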
Configure the Vapi assistant
- Log in to the Vapi Dashboard.
- Create a new assistant using a Blank Template.
- Navigate to Model → Provider → Custom LLM.
- Enter the model name you’ll use (for example, gpt-oss-120b).
- Paste your ngrok URL into the endpoint URL field: https://abcd-1234.ngrok-free.dev/chat/completions
- Save the configuration.
Additional resources
For detailed instructions and the complete source code, see the Vapi integration example on GitHub.
Vapi documentation
For more information about Vapi, see the official Vapi documentation.