This document describes different aspects of text generation, including types of generation, model selection, creating prompts, and managing multi-turn conversations.

Types of generation

You can generate text synchronously or asynchronously, each with or without streaming.

Simple generation (non-streaming)

Use the following code to perform non-streaming text generation with the OpenAI Python client.

Simple text generation Python code
from openai import OpenAI

# Create a client pointed at the SambaNova OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key="<your-api-key>"
)

completion = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "Answer the question in a couple sentences."},
        {"role": "user", "content": "Share a happy story with me"}
    ]
)

print(completion.choices[0].message.content)

Asynchronous generation (non-streaming)

For asynchronous completions, use the AsyncOpenAI Python client, as shown below.

Asynchronous text generation Python code
from openai import AsyncOpenAI
import asyncio

async def main():
    client = AsyncOpenAI(
        base_url="https://api.sambanova.ai/v1",
        api_key="<your-api-key>"
    )
    completion = await client.chat.completions.create(
        model="Meta-Llama-3.1-8B-Instruct",
        messages=[
            {"role": "system", "content": "Answer the question in a couple sentences."},
            {"role": "user", "content": "Share a happy story with me"}
        ]
    )
    print(completion.choices[0].message.content)

asyncio.run(main())

Streaming response

For real-time streaming completions, use the following approach with the OpenAI Python client.

Streaming response Python code
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key="<your-api-key>"
)

completion = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "Answer the question in a couple sentences."},
        {"role": "user", "content": "Share a happy story with me"}
    ],
    stream=True
)

# Print tokens as they arrive; guard against chunks that carry no content
for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Asynchronous streaming

Use the AsyncOpenAI Python client for asynchronous streaming, as shown below.

Asynchronous streaming Python code
from openai import AsyncOpenAI
import asyncio

async def main():
    client = AsyncOpenAI(
        base_url="https://api.sambanova.ai/v1",
        api_key="<your-api-key>"
    )
    completion = await client.chat.completions.create(
        model="Meta-Llama-3.1-8B-Instruct",
        messages=[
            {"role": "system", "content": "Answer the question in a couple sentences."},
            {"role": "user", "content": "Share a happy story with me"}
        ],
        stream=True
    )
    # Print tokens as they arrive; guard against chunks that carry no content
    async for chunk in completion:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

asyncio.run(main())

Model selection

Models differ in architecture and size, which affect their speed and response quality. Selecting a model depends on the factors shown below.

Factor | Consideration
Task complexity | Larger models are better suited for complex tasks.
Accuracy requirements | Larger models generally offer higher accuracy.
Cost and resources | Larger models come with increased costs and resource demands.

Experiment with various models to find the one that best fits your specific use case.
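
Since the API is OpenAI-compatible, you can usually enumerate the models an endpoint serves. The sketch below assumes the SambaNova endpoint supports the standard model-listing route; check the provider's documentation for the authoritative list.

Listing available models Python code
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key="<your-api-key>"
)

# Iterate over the models the endpoint reports (assumes the
# OpenAI-compatible /v1/models route is supported)
for model in client.models.list():
    print(model.id)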

Creating effective prompts

Prompt engineering is the practice of designing and refining prompts to optimize responses from large language models (LLMs). This process is iterative and requires experimentation to achieve the best possible outcomes.

Building a prompt

A basic prompt can be as simple as a few words to elicit a response from the LLM. However, for more complex use cases, additional elements may be needed, as shown below.

Element | Description
Defining a persona | Assigning a specific role to the model (e.g., “You are a financial advisor”).
Providing context | Supplying background information to guide the model’s response.
Specifying output format | Instructing the model to respond in a particular style (e.g., JSON, bullet points, structured text).
Describing a use case | Clarifying the goal of the interaction.
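
These elements can be combined in a single request. The sketch below reuses the client from the earlier examples; the prompt wording is illustrative, not prescriptive.

Prompt combining persona, context, and output format Python code
completion = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a financial advisor. "                       # persona
                "The user is saving for retirement over 30 years. "   # context
                "Answer as a bulleted list of at most three points."  # output format
            )
        },
        # The use case itself: a concrete question for the advisor
        {"role": "user", "content": "How should I balance stocks and bonds?"}
    ]
)
print(completion.choices[0].message.content)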

Advanced prompting techniques

To improve response quality and reasoning, more advanced techniques can be used.

Technique | Description
In-context learning | Providing examples of desired outputs to guide the model.
Chain-of-Thought (CoT) prompting | Encouraging the model to articulate its reasoning before delivering a response.
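
Both techniques are applied purely through the messages you send. The sketch below reuses the client from the earlier examples and combines them: prior user/assistant turns serve as few-shot examples, and the final instruction nudges the model to reason step by step. The task and wording are illustrative.

In-context learning with a CoT instruction Python code
completion = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "Classify the sentiment of each review as positive or negative."},
        # Few-shot examples establish the expected output format
        {"role": "user", "content": "Review: The food was amazing."},
        {"role": "assistant", "content": "positive"},
        {"role": "user", "content": "Review: The service was painfully slow."},
        {"role": "assistant", "content": "negative"},
        # The real query, with a simple Chain-of-Thought nudge
        {"role": "user", "content": "Review: Great decor, but the food was cold. Think step by step, then give the label."}
    ]
)
print(completion.choices[0].message.content)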

For more details about prompt engineering, see A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications.

Messages and roles

In chat-based interactions, messages are represented as dictionaries with specific roles and content.

Element | Description
role | Specifies who is sending the message.
content | Contains the message text.

Common roles

Roles are typically categorized as system, user, or assistant.

Role | Description
system | Provides general instructions to the model.
user | Represents user input.
assistant | Contains the model’s response.

Multi-turn conversation

To maintain context across multiple exchanges, messages in a conversational AI system are typically stored as a list of dictionaries, each specifying the sender’s role and the message content. Re-sending this list with every request lets the model track the conversation across turns.

Below is an example of how a multi-turn conversation is structured using the Meta-Llama-3.1-8B-Instruct model:

Multi-turn conversation with Meta-Llama-3.1-8B-Instruct Python code
completion = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "user", "content": "Hi! My name is Peter and I am 31 years old. What is 1+1?"},
        {"role": "assistant", "content": "Nice to meet you, Peter. 1 + 1 is equal to 2."},
        {"role": "user", "content": "What is my age?"}
    ],
    stream=True
)

for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

After running the program, you should see an output similar to the following.

Example output
You told me earlier, Peter. You're 31 years old.

By structuring conversations this way, the model can maintain context, recall prior user inputs, and provide more coherent responses.

Considerations for long conversations

When engaging in long conversations with LLMs, certain factors such as token limits and memory constraints must be considered to ensure accuracy and coherence.

  • Token limits - LLMs have a fixed context window, limiting the number of tokens they can process in a single request. If the input exceeds this limit, the system might truncate it, leading to incomplete or incoherent responses.

  • Memory constraints - The model does not retain context beyond its input window. To preserve context, past messages must be re-included in each request, trimming the oldest turns when the history grows too large (see the sketch after this list).
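
A common approach is to always keep the system message and drop the oldest turns until the history fits a size budget. The sketch below uses a rough characters-per-token estimate rather than the model's real tokenizer, so the budget is approximate; the constant and function names are illustrative.

Trimming conversation history Python code
MAX_TOKENS = 4000  # hypothetical budget, kept below the model's context window

def estimate_tokens(message):
    # Crude heuristic: roughly 4 characters per token, plus a small
    # per-message overhead. Real counts depend on the model's tokenizer.
    return len(message["content"]) // 4 + 4

def trim_history(messages, budget=MAX_TOKENS):
    # Assumes the first message is the system message, which is always kept;
    # the oldest user/assistant turns are dropped first.
    system, turns = messages[0], list(messages[1:])
    while turns and sum(estimate_tokens(m) for m in [system] + turns) > budget:
        turns.pop(0)
    return [system] + turns

# Pass the trimmed history instead of the full conversation, e.g.:
# completion = client.chat.completions.create(
#     model="Meta-Llama-3.1-8B-Instruct",
#     messages=trim_history(conversation_history)
# )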

By structuring prompts effectively and managing conversation history, you can optimize interactions with LLMs for better accuracy and coherence.