# Getting started with custom chat templates and output parsing

When you call the SambaNova Chat Completions API, the platform applies the model's default Jinja-based chat template server-side, formatting your messages into the raw prompt the model receives. For most use cases this is the right behavior. However, some scenarios require you to take control of prompt formatting and output parsing on the client side.

This page explains when and why to use the Completions API with custom templates, and how to implement custom output parsers. For a complete interactive walkthrough, see the [Custom Chat Templates AI Starter Kit](https://github.com/sambanova/ai-starter-kit/tree/main/chat_templates).

## When to use custom chat templates

Use the Completions API with a custom chat template instead of the Chat Completions API when:

* **You need full control over prompt structure.** Some workflows require injecting custom variables, special tokens, or instructions that are not exposed through the Chat Completions API parameters. For standard models available on SambaCloud with no customization, the Chat Completions API with built-in function calling is the recommended approach. See [Function calling and JSON mode](/en/features/function-calling).
* **You are using a BYOC (Bring Your Own Checkpoint) model.** Fine-tuned checkpoints deployed on SambaStack may use a different chat template than the base model. Letting the server apply the base model's default template produces incorrect prompts for these checkpoints.
* **Your model uses a non-standard tool-call output format.** Fine-tuned models may emit tool calls in a format the default parsers do not handle, for example XML markers instead of JSON.

## How it works

Instead of calling `/v1/chat/completions`, you render the prompt string yourself and send it directly to `/v1/completions`. The server receives a raw string and continues generation from it, applying no template of its own.

The workflow has four steps:

1. **Load a chat template.** Either pull the Jinja template from a Hugging Face tokenizer or write a custom one.
2. **Render the prompt.** Apply the template to your messages and tool definitions to produce a raw prompt string.
3. **Call the Completions API.** Send the rendered string to `/v1/completions`.
4. **Parse the output.** Convert the raw text response into a structured assistant message with tool calls.
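
To make the difference concrete, the two request bodies can be sketched side by side. This is an illustration only: the `<|User|>` and `<|Assistant|>` markers are placeholders borrowed from the custom template example on this page, not a real model's format, and the variable names are hypothetical.

```python
messages = [{"role": "user", "content": "What is the population of Bogota?"}]

# Chat Completions: the server renders `messages` with the model's default template.
chat_request = {
    "model": "Meta-Llama-3.1-8B-Instruct",
    "messages": messages,
}

# Completions: the client renders the prompt itself and sends a plain string.
# The markers below are illustrative placeholders, not a real chat template.
rendered_prompt = "<|User|>" + messages[0]["content"] + "<|Assistant|>"
completions_request = {
    "model": "Meta-Llama-3.1-8B-Instruct",
    "prompt": rendered_prompt,
}

print(sorted(chat_request))
print(sorted(completions_request))
```

The only structural change is that `messages` is replaced by `prompt`; everything the template would have added now has to be present in that string.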

## Load a chat template

### From a Hugging Face model

Use the `transformers` library to load the tokenizer for your base model and extract its built-in chat template.

```python theme={null}
from transformers import AutoTokenizer
 
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    token="<your-hf-token>",  # replaces the deprecated `use_auth_token` argument
)
chat_template = tokenizer.chat_template
```

A Hugging Face token is required for gated models such as those in the Llama family. See [Hugging Face token settings](https://huggingface.co/settings/tokens).

### Define a custom Jinja template

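If your checkpoint uses a different template than the base model, write a Jinja2 template directly. Your template must handle the `messages`, `tools`, and `add_generation_prompt` variables at minimum. The minimal example below covers only `messages` and `add_generation_prompt`, and it ignores system messages; extend it with a `tools` section if your checkpoint supports tool calling.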

```python theme={null}
from jinja2 import Environment, TemplateSyntaxError
 
# The "-" markers strip the newlines and indentation around each tag so that
# only the intended tokens and message text reach the rendered prompt.
custom_template = """
{{- bos_token }}
{%- for message in messages %}
    {%- if message['role'] == 'user' %}
        {{- '<|User|>' + message['content'] }}
    {%- elif message['role'] == 'assistant' %}
        {{- '<|Assistant|>' + message['content'] + eos_token }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}{{ '<|Assistant|>' }}{% endif %}
"""
 
# Validate syntax before use
try:
    Environment().parse(custom_template)
except TemplateSyntaxError as e:
    raise ValueError(f"Invalid Jinja template at line {e.lineno}: {e.message}") from e
```
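
Whitespace inside a template ends up in the prompt unless it is stripped explicitly, which is easy to miss when a template is indented for readability. The sketch below, which assumes `jinja2` is installed and uses two deliberately tiny hypothetical templates, renders the same loop with and without Jinja's `-` whitespace-control markers and prints both results with `repr()` so that stray newlines are visible.

```python
from jinja2 import Template

messages = [
    {"role": "user", "content": "Hi"},
    {"role": "user", "content": "Bye"},
]

# Without "-" markers, the newlines and indentation around each tag are emitted.
loose = Template(
    "{% for m in messages %}\n"
    "    {{ '<|User|>' + m['content'] }}\n"
    "{% endfor %}"
)

# With "-" markers, whitespace to the left of each tag is stripped.
tight = Template(
    "{%- for m in messages %}\n"
    "    {{- '<|User|>' + m['content'] }}\n"
    "{%- endfor %}"
)

loose_prompt = loose.render(messages=messages)
tight_prompt = tight.render(messages=messages)

print(repr(loose_prompt))  # includes newlines and indentation between turns
print(repr(tight_prompt))  # only the markers and message text
```

Inspecting the rendered string this way before sending it is a cheap check that the prompt matches what the checkpoint saw during fine-tuning.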

## Render the prompt

Apply the template to your messages and tool definitions using Jinja2. Pass tokenizer attributes such as `bos_token` and `eos_token` as context variables when using a template loaded from a tokenizer. For custom templates, supply these values explicitly.

```python theme={null}
from jinja2 import Template
from datetime import datetime
 
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the population of Bogota?"}
]
 
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_population",
            "description": "Returns the population of a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "Name of the city"}
                },
                "required": ["city"]
            }
        }
    }
]
 
context = {
    "messages": messages,
    "tools": tools,
    "add_generation_prompt": True,
    "bos_token": "<|begin_of_text|>",
    "eos_token": "<|eot_id|>",
    "date_string": datetime.now().strftime("%d %b %Y"),
}
 
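# Note: .strip() also removes trailing whitespace from the generation prompt
# (for example, newlines after an assistant header). Drop it if your template
# relies on that whitespace.
rendered_prompt = Template(chat_template).render(**context).strip()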
```
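
Templates pulled from a Hugging Face tokenizer are often written for the environment that `transformers` uses to compile them. To my understanding, that environment is sandboxed, enables `trim_blocks` and `lstrip_blocks`, and exposes helper functions such as `raise_exception` and `strftime_now`, none of which a bare `jinja2.Template` provides. The sketch below approximates that setup; `compile_chat_template` and the demo template are hypothetical names introduced here, and the exact set of helpers may differ between library versions.

```python
from datetime import datetime

from jinja2.exceptions import TemplateError
from jinja2.sandbox import ImmutableSandboxedEnvironment


def raise_exception(message: str):
    """Let a template abort rendering with a readable error."""
    raise TemplateError(message)


def strftime_now(fmt: str) -> str:
    """Let a template insert the current date in the given format."""
    return datetime.now().strftime(fmt)


def compile_chat_template(source: str):
    # Assumption: these settings approximate what transformers uses internally.
    env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
    env.globals["raise_exception"] = raise_exception
    env.globals["strftime_now"] = strftime_now
    return env.from_string(source)


# Tiny demo template that uses one of the helpers.
demo = compile_chat_template(
    "{% if not messages %}{{ raise_exception('messages must not be empty') }}{% endif %}"
    "{{ messages[0]['content'] }}"
)
print(demo.render(messages=[{"role": "user", "content": "hi"}]))
```

If a tokenizer template fails to render with a plain `Template`, compiling it this way and passing the same context is a reasonable first thing to try.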

## Call the Completions API

Send the rendered prompt string to the `/v1/completions` endpoint. The endpoint accepts a raw string and returns raw generated text; no template is applied server-side.

```python theme={null}
from sambanova import SambaNova
 
client = SambaNova(
    api_key="<your-sambanova-api-key>",
    base_url="https://api.sambanova.ai/v1"
)
 
response = client.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    prompt=rendered_prompt,
    max_tokens=2048,
    temperature=0.0,
)
 
raw_output = response.choices[0].text
```
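
If you prefer not to depend on the SDK, the same call can be expressed as a plain HTTP request. The sketch below only builds the request object with the standard library and does not send it. It assumes the endpoint URL is the `base_url` above plus `/completions` and that the API key is passed as a bearer token, as in other OpenAI-compatible APIs; `build_completions_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_completions_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Completions endpoint."""
    body = {
        "model": "Meta-Llama-3.1-8B-Instruct",
        "prompt": prompt,
        "max_tokens": 2048,
        "temperature": 0.0,
    }
    return urllib.request.Request(
        "https://api.sambanova.ai/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: bearer-token authentication.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_completions_request("<rendered prompt>", "<your-sambanova-api-key>")
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; the generated text is then read from `choices[0]["text"]` in the decoded JSON body, mirroring the SDK example.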

## Parse model output

The raw text response must be parsed into a structured assistant message. The correct parser depends on the tool-call format your model emits.

### JSON format (Llama-style)

Llama instruction-tuned models emit tool calls as JSON objects in the response text.

```python theme={null}
import json
 
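def parse_llama_output(response: str) -> list[dict]:
    """Extract JSON tool calls from a Llama-style response.

    Brace counting does not account for braces inside JSON strings, so
    blocks that fail to parse are skipped instead of raising.
    """
    tool_calls = []
    brace_count, start = 0, None

    for i, ch in enumerate(response):
        if ch == "{":
            if brace_count == 0:
                start = i
            brace_count += 1
        elif ch == "}" and start is not None:
            brace_count -= 1
            if brace_count == 0:
                block = response[start:i + 1]
                start = None
                try:
                    obj = json.loads(block)
                except json.JSONDecodeError:
                    continue  # not valid JSON; treat it as plain text
                if "name" not in obj:
                    continue  # valid JSON, but not a tool call
                tool_calls.append({
                    "type": "function",
                    "function": {
                        "name": obj["name"],
                        "arguments": json.dumps(obj.get("parameters", {}))
                    }
                })
    return tool_calls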
```
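
Brace counting can be thrown off by `{` or `}` characters that appear inside JSON string values. As an alternative sketch (not the starter kit's method), the standard library's `json.JSONDecoder.raw_decode` can parse an object starting at a given offset and report where it ends, which handles such strings correctly. The helper name `extract_json_objects` is hypothetical.

```python
import json


def extract_json_objects(text: str) -> list[dict]:
    """Return every top-level JSON object embedded in `text`."""
    decoder = json.JSONDecoder()
    objects, pos = [], 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            break
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not JSON here; keep scanning after this brace
            continue
        objects.append(obj)
        pos = end  # resume after the object that was just parsed
    return objects


sample = 'Sure. {"name": "get_population", "parameters": {"city": "Bogota {DC}"}} {not json}'
print(extract_json_objects(sample))
```

The returned dicts can then be mapped to tool-call entries in the same way as in `parse_llama_output`.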

### XML format (DeepSeek-style)

DeepSeek models wrap each tool call in XML-like special-token markers, with a separator token between the function name and its arguments.

```python theme={null}
import re
import json
 
def parse_deepseek_output(response: str) -> list[dict]:
    """Extract XML-delimited tool calls from a DeepSeek-style response."""
    tool_calls = []
    pattern = r"<｜tool▁call▁begin｜>(.*?)<｜tool▁sep｜>(.*?)<｜tool▁call▁end｜>"
 
    for name, args in re.findall(pattern, response, re.DOTALL):
        tool_calls.append({
            "type": "function",
            "function": {
                "name": name.strip(),
                "arguments": args.strip()
            }
        })
    return tool_calls
```
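
The regular expression returns the arguments exactly as the model wrote them, so malformed JSON would be passed downstream unchanged. The sketch below runs the same pattern against a made-up sample response and keeps only calls whose arguments parse as a JSON object. The sample string and the `validated_calls` helper are illustrative assumptions, not captured model output.

```python
import json
import re

# Marker strings copied from the parser above; the sample response is made up.
PATTERN = r"<｜tool▁call▁begin｜>(.*?)<｜tool▁sep｜>(.*?)<｜tool▁call▁end｜>"
sample = (
    "<｜tool▁call▁begin｜>get_population<｜tool▁sep｜>"
    '{"city": "Bogota"}<｜tool▁call▁end｜>'
)


def validated_calls(response: str) -> list[dict]:
    """Extract tool calls, dropping any whose arguments are not a JSON object."""
    calls = []
    for name, args in re.findall(PATTERN, response, re.DOTALL):
        try:
            parsed = json.loads(args.strip())
        except json.JSONDecodeError:
            continue  # arguments are not valid JSON
        if not isinstance(parsed, dict):
            continue  # valid JSON, but not an object of named arguments
        calls.append({
            "type": "function",
            "function": {"name": name.strip(), "arguments": json.dumps(parsed)},
        })
    return calls


print(validated_calls(sample))
```

Whether to drop, repair, or surface invalid calls is an application decision; dropping them silently is just the simplest choice for a sketch.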

### Build the assistant message

Once tool calls are extracted, assemble the final assistant message in OpenAI-compatible format.

```python theme={null}
def build_assistant_message(response: str, tool_calls: list) -> dict:
    if tool_calls:
        return {"role": "assistant", "content": None, "tool_calls": tool_calls}
    return {"role": "assistant", "content": response.strip(), "tool_calls": []}
```
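
With the assistant message assembled, a typical next step is to run the requested tools and append their results as `tool` messages before rendering the next prompt. The sketch below shows one way to do that under stated assumptions: the `get_population` implementation is a stub, `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names, and an `id` is generated for each call because the parsers above do not produce one.

```python
import json
import uuid


def get_population(city: str) -> dict:
    # Stub: a real implementation would look the value up somewhere.
    return {"city": city, "population": None, "note": "stub result"}


TOOL_REGISTRY = {"get_population": get_population}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each tool call and return OpenAI-style `tool` messages."""
    tool_messages = []
    for call in assistant_message.get("tool_calls") or []:
        # Assign an id so the result can be matched to its call.
        call_id = call.setdefault("id", f"call_{uuid.uuid4().hex[:8]}")
        fn = TOOL_REGISTRY.get(call["function"]["name"])
        if fn is None:
            result = {"error": "unknown tool"}
        else:
            try:
                result = fn(**json.loads(call["function"]["arguments"]))
            except (json.JSONDecodeError, TypeError) as exc:
                result = {"error": f"bad arguments: {exc}"}
        tool_messages.append(
            {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}
        )
    return tool_messages
```

Appending the assistant message and the returned `tool` messages to `messages`, then rendering and calling the Completions API again, continues the conversation, provided your chat template knows how to render the `tool` role.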

## Custom parsers

If your model emits tool calls in a format that neither parser above handles, implement a custom parser. Your parser must accept the raw response string and return a list of tool-call dicts in OpenAI-compatible format.

```python theme={null}
def parse(response: str) -> list[dict]:
    """
    Custom parser template.
    Returns a list of tool-call dicts:
    [{"type": "function", "function": {"name": str, "arguments": str}}, ...]
    """
    # Implement your extraction logic here
    return []
```
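
As an example of filling in that template, the sketch below handles a hypothetical format in which each call is a JSON object wrapped in `<tool_call>` and `</tool_call>` tags, with the arguments under an `arguments` key. Both the tag names and the key names are assumptions chosen for illustration; adapt them to whatever your checkpoint actually emits.

```python
import json
import re

# Hypothetical format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse(response: str) -> list[dict]:
    """Extract tag-delimited JSON tool calls from a raw response string."""
    tool_calls = []
    for block in TOOL_CALL_RE.findall(response):
        try:
            obj = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        if not isinstance(obj, dict) or "name" not in obj:
            continue  # skip JSON that does not describe a call
        tool_calls.append({
            "type": "function",
            "function": {
                "name": obj["name"],
                "arguments": json.dumps(obj.get("arguments", {})),
            },
        })
    return tool_calls


sample = 'Let me check.\n<tool_call>\n{"name": "get_population", "arguments": {"city": "Bogota"}}\n</tool_call>'
print(parse(sample))
```

Returning an empty list for text without tool calls keeps the parser compatible with `build_assistant_message`, which then falls back to a plain-content reply.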

> **Note:** Custom parsers execute user-supplied code. Only run code you trust. This is not a sandboxed execution environment.

## Next steps

* Explore the full end-to-end workflow, interactive Streamlit app, and Jupyter notebook in the [Custom Chat Templates AI Starter Kit](https://github.com/sambanova/ai-starter-kit/tree/main/chat_templates).
* For standard function calling without custom templates, see [Function calling and JSON mode](/en/features/function-calling).
