Human Aligned (HA) models
This document provides information about SambaStudio’s Human Aligned (HA) models. These checkpoints have been trained on a small amount of data in which prompts are given to humans, who manually write the completions. This data is optimized for use in human-facing applications, such as a chatbot.
Data preparation for training
The Generative data preparation repo describes how to prepare data to be used to train SambaNova’s Human Aligned (HA) models. To access the data preparation package, including its associated documentation, please visit the SambaNova public GitHub using the following link: https://github.com/sambanova/generative_data_prep
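For illustration, the sketch below writes a few prompt/completion pairs to a JSON Lines file, the kind of input the data preparation package consumes. The field names ("prompt", "completion") and file layout are assumptions; confirm the exact format and invocation against the repo’s README.

```python
# Illustrative only: write prompt/completion pairs as JSON Lines for use with
# the generative_data_prep package. Field names are assumptions here; confirm
# them against the repo's documentation.
import json

examples = [
    {"prompt": "Please summarize the previous article:",
     "completion": "The article covers ..."},
    {"prompt": "What are the biggest challenges traditional banks face with users these days?",
     "completion": "Traditional banks struggle with ..."},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```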
Prompt guidelines
End prompts with a colon (:), a question mark (?), or another signal that lets the model know it is time to start generating. For example, Please summarize the previous article: (with a colon) is a better prompt than Please summarize the previous article (without a colon). Adding these annotations tends to lead to better generations, as it indicates to the model that you’ve finished your question and are expecting an answer.
Example prompts
The examples below demonstrate prompts for SambaNova’s Human Aligned (HA) models. Each example is identified by a task type.
Open domain Q&A example 1
Prompt:
What does the future outlook of Apple Inc. look like?
Open domain Q&A example 2
Prompt:
What are the biggest challenges traditional banks face with users these days?
Extractive summarization
Prompt:
Please summarize the following paragraph into a markdown table: I have three options to consider. Option one is to walk to work that will take me about 3 hours but will cost me $0. Option two is to take my car, which will take me 20 minutes, but will cost me $5. Option 3 is to take an Uber that will take me 10 minutes but will cost me $20.
Sentence rephrase example 1
Prompt:
Reword the sentence 'I like go to the carnival, because its so much good street food.' to enhance its fluency.
Sentence rephrase example 2
Prompt:
Combine the following two sentences into one coherent sentence.\n1. Soon after the 1998 election Fred proclaimed he will step down as party leader. \n2. Soon after the 1998 election Fred began working as a chef.
Usage
Use the Human Aligned (HA) models for any Playground use cases or where there is direct human interaction with the checkpoint itself.
Playground tuning parameter settings
The Playground tuning parameters provide additional flexibility and options for generative tuning. We recommend the following settings for Human Aligned (HA) models used in the Playground.
- Setting Do sampling to On is recommended when using Human Aligned (HA) models.
- A Temperature of 0.7 or higher is recommended when using Human Aligned (HA) models (see the example settings below).
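A minimal sketch of these recommendations as a settings dictionary is shown below. The key names mirror the parameter names used in the tables that follow and are not an official API payload.

```python
# Recommended Playground settings for Human Aligned (HA) models
# (illustrative dictionary, not an official SambaStudio request payload).
recommended_playground_settings = {
    "do_sample": True,    # turn sampling on
    "temperature": 0.7,   # 0.7 or higher is recommended
}
```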
Hyperparameters and settings
The hyperparameters and settings for the Human Aligned (HA) models when creating a training job are described below.
Parameter | Definition | Allowed values |
---|---|---|
 | Specifies whether a final evaluation is performed. | true, false |
 | Period for evaluating the model, in number of training steps. | Integer > 0 |
 | Strategy for validating the model during training. | no, steps, epoch |
 | The learning rate to use in the optimizer. | 0.0 < float < 1.0 |
 | Period for logging training loss, in number of training steps. | Integer > 0 |
 | Type of learning rate scheduler to use. | polynomial_decay_schedule_with_warmup, cosine_schedule_with_warmup, fixed_lr |
 | Sequence length to pad or truncate the dataset to. Should be set to align your dataset with your chosen model. | Defined by selected model. |
 | The number of iterations to run. | Integer > 0 |
 | Loss scale for the prompt tokens. | 0.0 < float < 1.0 |
 | Determines whether to save the optimizer state when saving a checkpoint. | true, false |
 | Period for saving model checkpoints, in number of training steps. | Integer > 0 |
 | Determines whether or not to skip the checkpoint. | true, false |
 | Subsample fraction for the evaluation dataset. | 0.0 < float < 1.0 |
 | Random seed to use for the evaluation subsample. | Integer > 0 |
 | Determines whether to use token_type_ids to compute the loss. | true, false. Setting to true is recommended if Generative data preparation was used. |
 | Maximum size of the vocabulary. | Defined by selected model. |
 | Number of warmup steps to use in the learning rate scheduler. | Integer > 0 |
 | Weight decay rate to use in the optimizer. | 0.0 < float < 1.0 |
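As a rough sketch, a training-job configuration following the table above might look like the dictionary below. The key names are hypothetical stand-ins (take SambaStudio’s actual parameter names from the job-creation form), and the values are only examples within the allowed ranges.

```python
# Hypothetical training configuration sketch -- the key names below are illustrative
# stand-ins for the parameters described above, not SambaStudio's official names.
example_training_config = {
    "evaluation_strategy": "steps",                # no | steps | epoch
    "eval_steps": 50,                              # evaluate every 50 training steps
    "learning_rate": 1e-5,                         # 0.0 < float < 1.0
    "lr_schedule": "cosine_schedule_with_warmup",  # one of the listed schedulers
    "warmup_steps": 100,                           # Integer > 0
    "weight_decay": 0.1,                           # 0.0 < float < 1.0
    "prompt_loss_weight": 0.1,                     # loss scale for prompt tokens
    "use_token_type_ids": True,                    # recommended with Generative data preparation
    "save_steps": 200,                             # checkpoint every 200 steps
    "logging_steps": 10,                           # log training loss every 10 steps
}
```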
Inference settings
The inference settings for Human Aligned (HA) models when creating a batch inference job are described below.
Parameter | Definition | Allowed values |
---|---|---|
 | Toggles whether to use sampling. If not enabled, greedy decoding is used. When enabled, the platform randomly picks the next word according to its conditional probability distribution, so generation is not deterministic. If you need deterministic results, set this to off; the model is then less likely to generate unexpected or unusual words. Setting it to on gives the model a better chance of generating a high-quality response, but in an industrial pipeline it can lead to more hallucinations and non-determinism. | true, false. Setting to true is recommended. If set to false, temperature, top_k, and top_p are ignored and have no effect. |
 | Sequence length to pad or truncate the dataset to. Should be set to align your dataset with your chosen model. | Defined by selected model. |
 | The maximum number of tokens to generate, ignoring the number of tokens in the prompt. Make sure the total of the prompt tokens plus the requested tokens to generate does not exceed the supported sequence length of the model. You can use this parameter to limit the response to a certain number of tokens. | Integer ≥ 1; must not exceed the model’s supported sequence length. |
 | The repetition penalty, also known as frequency penalty, controls the model’s tendency to repeat predictions. It reduces the probability of words that have already been generated; the penalty depends on how many times a word has previously occurred in the prediction. This parameter can be used to penalize words that were previously generated or belong to the context, and it decreases the model’s likelihood of repeating the same line verbatim. | Between 1 and 2; ~1.2–1.5 is typical. A value of 1 means no penalty. |
 | Stop sequences make the model stop generating text at a desired point, such as the end of a sentence or a list. This optional setting tells the API when to stop generating tokens. The completion will not contain the stop sequence. If nothing is passed, generation stops at the model’s default end-of-text token. | Any comma-separated strings; each stop phrase must be enclosed in double quotes. Example: "Stop phrase 1", "stop phrase 2 with sp3ciAl token$" |
 | The value used to modulate the next-token probabilities. As the value decreases, the model becomes more deterministic and repetitive. | 0 < x ≤ 1. Has no effect when do_sample is set to false. |
 | The number of highest-probability vocabulary tokens to keep for top-k filtering. Top k allows the model to choose randomly among the k most probable tokens; for example, to choose among the top three tokens, set top k to 3. | 1 ≤ x ≤ vocab_size. Has no effect when do_sample is set to false. |
 | Shows the top tokens by log probability at each generation step. | 0 ≤ x ≤ 20 |
 | Top p sampling, sometimes called nucleus sampling, controls diversity as well as the randomness and originality of the model. The top p parameter specifies a sampling threshold at inference time: it shortlists the smallest set of most probable tokens whose cumulative probability reaches the threshold, and only those tokens are kept for generation. | 0 < x ≤ 1. Has no effect when do_sample is set to false. |
 | Maximum size of the vocabulary. | Defined by selected model. |
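To make the interaction of these settings concrete, here is a minimal, self-contained sketch (using NumPy) of how greedy decoding, temperature, top k, and top p typically combine when picking the next token. It is illustrative only and is not SambaStudio’s implementation.

```python
# Illustrative sketch (not SambaStudio's implementation) of how do_sample,
# temperature, top_k, and top_p typically interact when selecting the next token.
import numpy as np

def pick_next_token(logits, do_sample=True, temperature=0.7, top_k=50, top_p=0.95, rng=None):
    """Return a token id chosen from raw logits."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    if not do_sample:
        # Greedy decoding: temperature, top_k, and top_p have no effect.
        return int(np.argmax(logits))

    # Lower temperature sharpens the distribution (more deterministic output).
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()

    # top_k: keep only the k most probable tokens.
    if top_k and top_k < probs.size:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)

    # top_p (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches the top_p threshold.
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep_count = int(np.searchsorted(cumulative, top_p * probs.sum())) + 1
        mask = np.zeros_like(probs)
        mask[order[:keep_count]] = probs[order[:keep_count]]
        probs = mask

    probs /= probs.sum()
    return int(rng.choice(probs.size, p=probs))

# Example: sample over a toy 5-token vocabulary.
print(pick_next_token([2.0, 1.5, 0.3, -1.0, -2.5]))
```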