The Playground provides an in-platform experience for generating predictions from deployed generative tuning endpoints. You can select a user preset to populate the input field and quickly try out generative tuning predictions, or enter text directly into the input field without selecting a preset.

A live generative tuning endpoint is required to use the Playground.

For more information on endpoints, see the Create and use endpoints document.

Option 1: Access the Playground from the left menu

To access the Playground experience directly, click Playground from the left menu. The Playground window will open.

Playground menu
Figure 1. Playground menu icon

Option 2: Access the Playground from an endpoint window

From an Endpoint window, click Try now. The Playground window will open.

Try Now
Figure 2. Generative tuning endpoint Try Now

Using the Playground

  1. Select your generative tuning endpoint from the Select endpoint drop-down.

    Endpoint select
    Figure 3. Playground endpoint select
  2. Select one of the presets from the User preset drop-down to populate the input/output field to quickly experience generative tuning. Alternatively, you can input text directly into the input/output field, without selecting a User preset.

  3. Click Submit to initiate a generative tuning prediction by the platform.

    1. The generated response will be displayed in the input/output field, highlighted in blue.

      Playground window
      Figure 4. Playground user preset


Tokens are the basic units of text used when processing a prompt to generate a language output, or prediction. They can be thought of as pieces of words. Token boundaries do not align exactly with where words begin or end; a token can include trailing spaces (spaces after a word) and subwords. Before a prompt is processed, the input is broken into tokens.

The SambaStudio Playground displays a token count for each submission. The display shows the total number of tokens for the current submission relative to the maximum number of tokens supported by the model.

Token count
Figure 5. Token count
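As a rough illustration, a greedy longest-match subword split shows how text breaks into tokens that can be subwords and can carry spaces. The vocabulary and matching rule below are toy assumptions, not the model's real tokenizer:

```python
# Toy subword tokenizer. The vocabulary is hypothetical -- real models
# ship their own learned vocabularies with tens of thousands of entries.
VOCAB = ["Samba", "Studio", " play", "ground", " is", " live"]

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match split of text into subword tokens."""
    pieces = sorted(vocab, key=len, reverse=True)  # try longest pieces first
    tokens = []
    i = 0
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            # Fall back to a single character for out-of-vocabulary input
            tokens.append(text[i])
            i += 1
    return tokens

tokens = tokenize("SambaStudio playground is live")
print(tokens)       # ['Samba', 'Studio', ' play', 'ground', ' is', ' live']
print(len(tokens))  # 6 -- the kind of count the Playground display reports
```

Note that " play" and "ground" are separate tokens, and that spaces travel with tokens rather than delimiting them.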

Tuning parameters

Tuning parameters provide additional flexibility and options for generative tuning. Adjusting these parameters allows you to search for the optimal values to maximize the performance and output of the response.

Tuning parameters
Figure 6. Tuning parameters panel

The following parameters are available in the Playground.

Do sampling

Toggles whether sampling is used. When disabled, greedy decoding is used: the model always picks the most probable next token, so generation is deterministic and the model is less likely to generate unexpected or unusual words. When enabled, the platform randomly picks the next token according to its conditional probability distribution, so generation is no longer deterministic. Sampling gives the model a better chance of generating a creative, high-quality response, but it can lead to more hallucinations and non-determinism, which may be undesirable in a production pipeline. If you need deterministic results, leave this setting off.

Default value: Off
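The difference between the two modes can be sketched with a toy next-token distribution (the tokens and probabilities below are made up for illustration):

```python
import random

# Hypothetical next-token distribution (token -> probability); in practice
# this comes from the model's softmax output for the current context.
probs = {"blue": 0.6, "green": 0.3, "purple": 0.1}

def greedy_pick(probs):
    """Do sampling = Off: always take the most probable token (deterministic)."""
    return max(probs, key=probs.get)

def sample_pick(probs):
    """Do sampling = On: draw a token according to its probability."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print(greedy_pick(probs))  # always 'blue'
print(sample_pick(probs))  # usually 'blue', but can differ from call to call
```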

Max tokens to generate

The maximum number of tokens to generate, not counting the tokens in the prompt. When using max tokens to generate, make sure that the total tokens for the prompt plus the requested max tokens to generate do not exceed the supported sequence length of the model. You can use this parameter to limit the response to a certain number of tokens. The generation will stop under the following conditions:

  1. The model generates the <|endoftext|> token.

  2. The generation encounters a stop sequence set up in the parameters.

  3. The generation reaches the limit for max tokens to generate.

    Default value: 100
    Min value: 1
    Max value: 2048
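The three stopping rules above can be sketched as a generation loop. The canned token stream below stands in for a real model, and the stop-sequence handling is simplified (the platform strips the stop sequence from the returned completion):

```python
END_OF_TEXT = "<|endoftext|>"

def generate(next_token_fn, max_tokens, stop_sequences=()):
    """Minimal sketch of the three stopping rules: end-of-text token,
    a configured stop sequence, or the max-tokens-to-generate limit."""
    out = []
    for _ in range(max_tokens):                      # rule 3: token limit
        tok = next_token_fn(out)
        if tok == END_OF_TEXT:                       # rule 1: end-of-text
            break
        out.append(tok)
        text = "".join(out)
        if any(s in text for s in stop_sequences):   # rule 2: stop sequence
            break
    return "".join(out)

# Hypothetical "model" that just replays a canned token stream.
stream = iter(["Hello", ",", " world", ".", END_OF_TEXT])
print(generate(lambda ctx: next(stream), max_tokens=100))  # 'Hello, world.'
```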

Repetition penalty

The repetition penalty, also known as frequency penalty, controls the model’s tendency to repeat predictions. The repetition penalty reduces the probability of words that have previously been generated. The penalty depends on how many times a word has previously occurred in the prediction. This parameter can be used to penalize words that were previously generated or belong to the context. It decreases the model’s likelihood to repeat the same line verbatim.

A value setting of 1 means no penalty.

Default value: 1
Min value: 1
Max value: 10
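One common formulation of the penalty (assumed here for illustration; implementations vary) divides the positive logits of previously generated tokens by the penalty value, so a setting of 1 changes nothing:

```python
def apply_repetition_penalty(logits, generated_tokens, penalty):
    """Lower the scores of tokens that already appeared in the prediction.
    penalty=1 is a no-op; higher values discourage repetition more strongly."""
    out = dict(logits)
    for tok in generated_tokens:
        if out[tok] > 0:
            out[tok] /= penalty   # positive logits shrink toward zero
        else:
            out[tok] *= penalty   # negative logits move further down
    return out

logits = {"the": 2.0, "cat": 1.0}
print(apply_repetition_penalty(logits, ["the"], penalty=2.0))
# {'the': 1.0, 'cat': 1.0} -- 'the' is no longer favored over 'cat'
```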

Return logits

Displays the top <number> (the numerical value entered) tokens ranked by their probability of being generated next. This shows how likely each candidate token was and helps you debug a given generation and see alternatives to the generated token. The highlighted token is the one the model predicted; the list is sorted by probability from high to low, up to the top <number>. As you tune other parameters, you can use the logits feature to analyze how the tokens predicted by the model change.

Default value: 0
Min value: 0
Max value: 20

Click the highlighted token to display the list.

Return logits
Figure 7. Return logits set to 4
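Conceptually, the displayed list corresponds to a softmax over the model's logits, sorted and truncated to the requested count. The logit values below are illustrative:

```python
import math

def top_logits(logits, n):
    """Softmax the logits and return the top-n tokens sorted high to low,
    like the list shown when Return logits is set to n."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}  # shift for stability
    z = sum(exps.values())
    probs = {t: e / z for t, e in exps.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:n]

logits = {"blue": 3.0, "green": 2.0, "red": 1.0, "purple": 0.5, "gray": 0.1}
for tok, p in top_logits(logits, 4):
    print(f"{tok}: {p:.3f}")
```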

Stop sequences

Stop sequences are used to make the model stop generating text at a desired point, such as the end of a sentence or a list. It is an optional setting that tells the API when to stop generating tokens. The completion will not contain the stop sequence. If nothing is passed, it defaults to the token <|endoftext|>. This token represents a probable stopping point in the text.

Max words: 4
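The truncation behavior can be sketched as follows (a minimal sketch; the platform performs this server-side):

```python
def truncate_at_stop(text, stop_sequences):
    """Cut the completion at the earliest stop sequence; the stop sequence
    itself is not included in the returned completion."""
    cut = len(text)
    for s in stop_sequences:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]

print(truncate_at_stop("Item 1\nItem 2\n###\nmore text", ["###"]))
# 'Item 1\nItem 2\n' -- everything from '###' onward is dropped
```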


Temperature

The value used to modulate the next token probabilities. As the value decreases, the model becomes more deterministic and repetitive. With a temperature between 0 and 1, the randomness and creativity of the model's predictions can be controlled. A temperature close to 1 means the logits are passed through the softmax function without modification. If the temperature is close to 0, the highest-probability tokens become very likely compared to the other tokens: the model becomes more deterministic and will always output the same set of tokens after a given sequence of words.

Default value: 1
Min value: 0.01
Max value: 1
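Temperature scaling can be sketched as dividing the logits by the temperature before the softmax (illustrative values):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature before softmax. temperature=1 leaves the
    distribution unchanged; values near 0 sharpen it toward the top token."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)                      # shift for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 1.0))   # moderate spread of probability
print(softmax_with_temperature(logits, 0.01))  # nearly all mass on the top token
```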

Top k

The number of highest probability vocabulary tokens to keep for top k filtering. Top k means allowing the model to choose randomly among the top k tokens by their respective probabilities. For example, choosing the top three tokens means setting the top k parameter to a value of 3. Changing the top k parameter sets the size of the shortlist the model samples from as it outputs each token. Setting top k to 1 gives us greedy decoding.

Default value: 50
Min value: 1
Max value: 50257
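Top k filtering can be sketched as keeping the k most probable tokens and renormalizing before sampling (toy distribution):

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize; the model then
    samples from this shortlist. k=1 reduces to greedy decoding."""
    shortlist = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in shortlist)
    return {t: p / total for t, p in shortlist}

probs = {"blue": 0.5, "green": 0.3, "red": 0.15, "purple": 0.05}
print(top_k_filter(probs, 3))  # only blue, green, red remain, renormalized
print(top_k_filter(probs, 1))  # {'blue': 1.0} -- greedy decoding
```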

Top p

Top p sampling, sometimes called nucleus sampling, is a technique used to sample possible outcomes of the model. It controls diversity via nucleus sampling as well as the randomness and originality of the model. The top p parameter specifies a sampling threshold during inference time. Top p shortlists the top tokens whose cumulative probability reaches the top p threshold. If set to less than 1, only the smallest set of most probable tokens whose probabilities add up to top p or higher is kept for generation.

Default value: 1
Min value: 0.01
Max value: 1
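Nucleus sampling can be sketched as follows (toy distribution; a real implementation operates on the model's full vocabulary):

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches p, then renormalize (nucleus sampling)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cum = [], 0.0
    for tok, prob in ranked:
        nucleus.append((tok, prob))
        cum += prob
        if cum >= p:          # smallest set that reaches the threshold
            break
    total = sum(pr for _, pr in nucleus)
    return {t: pr / total for t, pr in nucleus}

probs = {"blue": 0.5, "green": 0.3, "red": 0.15, "purple": 0.05}
print(top_p_filter(probs, 0.7))  # nucleus is {blue, green}: 0.5 + 0.3 >= 0.7
```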

View tuning parameter information in the GUI

A tuning parameter’s definition and values are viewable within the SambaStudio GUI. Follow the steps below to view information for a tuning parameter.

  1. In the Tuning parameters panel, hover over the parameter name you wish to view. An overview parameter card will display.

  2. Click the > (right arrow) to display the complete parameter card that includes its definition and values.

  3. Click the X to close the complete parameter card.

Tuning parameter card
Figure 8. Tuning parameter information

View code

Follow the steps below to view and copy code generated from the current input.

  1. Click View code to open the View code window.

  2. Click the CURL, CLI, or Python SDK tab to view the corresponding code block and make a request programmatically.

  3. Click Copy code to copy the selected code block to your clipboard.

  4. Click Close, or the upper right X, to close the window and return to the Playground.

View code window
Figure 9. View code
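As a sketch of making a request programmatically, the helper below assembles a JSON payload of the kind the View code window produces. The URL, header name, and payload field names here are assumptions for illustration only; copy the exact values from the View code window for your endpoint:

```python
import json

# Hypothetical request builder. Every name below (URL path, 'key' header,
# payload fields) is illustrative -- use the code from the View code window.
def build_request(prompt, endpoint_url, api_key,
                  do_sampling=False, max_tokens_to_generate=100,
                  repetition_penalty=1.0, temperature=1.0,
                  top_k=50, top_p=1.0):
    """Return (url, headers, body) for a generative tuning prediction call."""
    headers = {"key": api_key, "Content-Type": "application/json"}
    payload = {
        "inputs": [prompt],
        "params": {
            "do_sample": do_sampling,
            "max_tokens_to_generate": max_tokens_to_generate,
            "repetition_penalty": repetition_penalty,
            "temperature": temperature,
            "top_k": top_k,
            "top_p": top_p,
        },
    }
    return endpoint_url, headers, json.dumps(payload)

url, headers, body = build_request(
    "Summarize this ticket:", "https://example.com/api/predict", "MY_API_KEY")
print(body)
# Send with e.g. requests.post(url, headers=headers, data=body)
```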

Download results

The platform allows you to download the results of the last response provided. After receiving a generative tuning prediction from the platform, click Download results. The file will be downloaded to the location configured by your browser.

Download results
Figure 10. Download results