BLOOMChat
BLOOMChat is a 176 billion parameter multilingual chat model. It is instruction tuned from BLOOM (176B) on assistant-style conversation datasets and supports conversation, question answering, and generative answers in multiple languages.
Data preparation
The BLOOMChat training repo describes how to prepare data for training SambaNova's BLOOMChat model. The repo includes the data preparation, tokenization, training, and inference code for BLOOMChat.
To access the data preparation package, including its associated documentation, please visit the SambaNova public GitHub using the following link: https://github.com/sambanova/bloomchat
Training and validation requirements
The expected requirements for training and validation datasets are described below.
- The expected data format is HDF5, which is produced by the generative data preparation. Example data directory path: DATA_DIR=../dataset_prep/oasst1_dolly_out/hdf5
- Use tags to indicate which parts of the conversation are from the user and which are from the model, as demonstrated in the repo (see the sketch after this list):
  - Use <human> to indicate user parts of the conversation.
  - Use <bot> to indicate model parts of the conversation.
  Example tags: text = f"<human>: {item['instruction']}\n<bot>: {item['response']}"
- The code to load and get the datasets is demonstrated in the repo.
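As an illustration of the tagging step, the sketch below applies the template above to instruction/response records and writes them out as JSON lines. The record field names and the output file name are assumptions taken from the example tag line; the repo's data preparation scripts define the exact expected input and produce the final HDF5 files.

import json

# Hypothetical instruction/response records; the field names follow the
# example tag line above (e.g., Dolly-style data).
records = [
    {"instruction": "What is the capital of France?", "response": "Paris."},
    {"instruction": "Name a primary color.", "response": "Red."},
]

# Tag each record so the user turn is marked <human> and the model turn <bot>.
with open("tagged_conversations.jsonl", "w") as f:
    for item in records:
        text = f"<human>: {item['instruction']}\n<bot>: {item['response']}"
        f.write(json.dumps({"text": text}) + "\n")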
Batch inference requirements
The expected format for batch inference is a JSON file, or a directory of JSON files, as demonstrated in the repo.
Example: --data_dir /import/mlds-sc-speech/daas/datasets/toy_generative
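As an illustration only (the repo demonstrates the exact schema for these JSON files), the sketch below writes a small directory of prompt files; the prompt field name and file layout are assumptions.

import json
from pathlib import Path

# Hypothetical layout: a directory of JSON files, each holding a list of prompts.
# The exact schema expected by batch inference is demonstrated in the repo.
out_dir = Path("toy_generative")
out_dir.mkdir(exist_ok=True)

prompts = [
    "<human>: What does the Fed do?\n<bot>:",
    "<human>: List three colors.\n<bot>:",
]

with open(out_dir / "prompts_0.json", "w") as f:
    json.dump([{"prompt": p} for p in prompts], f, indent=2)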
Prompt guidelines
Set the stop sequence (stop token) to <human>. This helps avoid the model hallucinating new conversation dialog, because BLOOMChat is trained to continue conversations.
For example, given the prompt <human>: how are you, the model will generate a whole continuation that includes both parties:
<bot>: I am good! How are you?
<human>: doing great! You like ice cream?
<bot>: You bet!
This is because the model does not stop generating until it either hits a stop token or reaches the maximum number of tokens it is allowed to generate. To return a meaningful response, keep only the model's first reply, truncating the output at the next <human> tag. The example below demonstrates a simple string truncation to accomplish this.
generated_output = "<bot>: I am good! How are you?\n<human>: doing great! You like ice cream?\n<bot>: You bet!"
# Keep only the text before the first <human> tag, i.e., the model's first reply
true_response = generated_output.split("<human>")[0]
print(true_response)
# Output of true_response: <bot>: I am good! How are you?\n
Prompts for chat
- Prepend the <human>: tag to anything the user has said or wants to say.
- Additionally, append the <bot>: tag, with no text following the tag, to indicate where the model should generate.
To have a conversation history, prepend the necessary tags to the conversation as shown in the example below.
<human>: First question that was asked
<bot>: Model answer to the first question
<human>: Question that you want answered
<bot>:
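Building this prompt by hand gets tedious for long conversations. The helper below is a minimal sketch (the function and the history structure are illustrative, not part of the BLOOMChat repo) that assembles a prompt from prior turns:

def build_prompt(history, new_question):
    # history is a list of (user_text, bot_text) tuples from earlier turns.
    lines = []
    for user_text, bot_text in history:
        lines.append(f"<human>: {user_text}")
        lines.append(f"<bot>: {bot_text}")
    # The new question goes last, followed by an empty <bot>: tag so the
    # model generates its answer there.
    lines.append(f"<human>: {new_question}")
    lines.append("<bot>:")
    return "\n".join(lines)

prompt = build_prompt(
    [("First question that was asked", "Model answer to the first question")],
    "Question that you want answered",
)
print(prompt)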
Example prompts
The examples below demonstrate prompts for BLOOMChat. Each example is identified by a task type.
Closed-book QA
<human>: What does it mean for the Fed to raise or lower interest rates?
<bot>: The Federal Reserve (also known as the Fed) is the central banking system of the United States. It was created in 1913, and is an independent agency of the federal government. The Fed has a dual mandate of maximizing employment and stabilizing prices. The Fed sets interest rates, and can influence the money supply and the credit markets through its actions.
Raising interest rates means that the Fed is trying to reduce the money supply in the economy, and is trying to discourage borrowing and encourage saving. Lowering interest rates means that the Fed is trying to stimulate economic growth by encouraging borrowing and reducing the cost of credit.
The Fed raises or lowers interest rates in order to influence the economy and achieve its goals for employment and price stability. Interest rates are one of the primary tools that the Fed uses to achieve its goals.
Brainstorming
<human>: Write a script in which Bob accidentally breaks his dad’s guitar
<bot>: Bob was in his room, practicing for his upcoming guitar solo, when he accidentally stepped on the guitar, causing it to fall and break. His dad, who was in the other room, heard the commotion and came running into Bob’s room, visibly angry. Bob tried to explain what had happened, but his dad wasn’t having any of it. He yelled at Bob, calling him careless and irresponsible, and told him he was grounded for a week. Bob was devastated, as this would mean missing his solo and disappointing his friends and family. He knew he was in trouble, but he couldn’t help but feel a little bit guilty too.
Tourism
<human>: List the top ten places to visit in Delhi
<bot>: 1. Qutub Minar
2. Red Fort
3. India Gate
4. Jama Masjid
5. Lotus Temple
6. Akshardham Temple
7. Humayun’s Tomb
8. Purana Quila
9. National Museum
10. Raj Ghat
Sentiment analysis
<human>: Classify the sentiment of the following sentence into Positive, Neutral, or Negative. Do it on a scale of 1/10: How about the following sentence: It is raining outside and I feel so blue
<bot>: The sentiment of the sentence "It is raining outside and I feel so blue" is negative, with a score of 9/10.
Information extraction
<human>: Answer the following question using the context below:
Q: Which regulatory body is involved?
Context: U.S. authorities launched emergency measures on Sunday to shore up confidence in the banking system after the failure of Silicon Valley Bank (SIVB.O) threatened to trigger a broader financial crisis. After a dramatic weekend, regulators said the failed bank’s customers will have access to all their deposits starting Monday and set up a new facility to give banks access to emergency funds. The Federal Reserve also made it easier for banks to borrow from it in emergencies. While the measures provided some relief for Silicon Valley firms and global markets on Monday, worries about broader banking risks remain and have cast doubts over whether the Fed will stick with its plan for aggressive interest rate hikes.
<bot>: The relevant regulatory body in this context is the Federal Reserve.
QA (concept explanation)
<human>: Tell me something about organic photovoltaic
<bot>: Organic photovoltaics (OPVs) are a type of photovoltaic (PV) technology that use organic materials as the active layer in the device. Organic materials have the advantage of being lightweight, flexible, and low-cost compared to traditional inorganic semiconductors used in photovoltaics. The active layer in an OPV device absorbs photons from sunlight, creating electron-hole pairs that are separated by the device’s internal electric field, generating a flow of electrons that can be collected and used to power a device. The active layer in an OPV device is typically a blend of organic semiconducting materials and can be printed or sprayed onto a substrate using low-cost manufacturing techniques. This makes OPVs an attractive technology for applications where traditional photovoltaics are not suitable, such as on building or vehicle surfaces, or on flexible substrates.
Usage
BLOOMChat was trained with the <human>/<bot> templating in mind. Follow the prompt guidelines to get the most out of the model.
- Use this checkpoint for chatbot-related tasks.
- Use this checkpoint with few-shot prompting to dictate how you want the model to respond, as shown in the sketch after this list.
- Use this checkpoint with context retrieval to reduce hallucinations.
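Few-shot prompting reuses the same tags to show the model the response format you want before asking the real question. The example turns below are illustrative:

# Worked examples teach the model the desired response format.
few_shot_examples = [
    ("Classify the sentiment: I love this phone", "Positive"),
    ("Classify the sentiment: The battery died in an hour", "Negative"),
]

prompt_lines = []
for question, answer in few_shot_examples:
    prompt_lines.append(f"<human>: {question}")
    prompt_lines.append(f"<bot>: {answer}")

# The real question goes last, with an empty <bot>: tag for the model to fill in.
prompt_lines.append("<human>: Classify the sentiment: The screen is okay, I guess")
prompt_lines.append("<bot>:")
prompt = "\n".join(prompt_lines)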
Playground tuning parameters
The Playground tuning parameters provide additional flexibility and options for generative tuning.
We recommend setting Do sampling to On when using BLOOMChat in the Playground. This is to ensure you get varied results when chatting with the model.
Inference settings
The inference settings for human aligned (HA) models when creating a batch inference job are described below.
Parameter | Definition | Allowed values
---|---|---
Do sampling | Toggles whether to use sampling. If not enabled, greedy decoding is used. When enabled, the platform randomly picks the next word according to its conditional probability distribution, so generation is not deterministic. If you need deterministic results, set this to off; the model is then less likely to generate unexpected or unusual words. Setting it to on gives the model a better chance of generating a high-quality response, but can lead to more hallucinations and non-determinism, which may be undesirable in an industrial pipeline. | true, false. Setting to true is recommended. Set to false to evaluate the model for benchmark tests.
Max tokens to generate | The maximum number of tokens to generate, ignoring the number of tokens in the prompt. Make sure the total tokens for the prompt plus the requested max tokens to generate do not exceed the supported sequence length of the model. You can use this parameter to limit the response to a certain number of tokens. Generation stops when the model hits a stop sequence or reaches this maximum. | 1 → the model's supported sequence length
Repetition penalty | Controls the model's tendency to repeat predictions; also known as frequency penalty. The repetition penalty reduces the probability of words that have previously been generated, and the penalty depends on how many times a word has already occurred in the prediction. It can be used to penalize words that were previously generated or belong to the context, and it decreases the model's likelihood of repeating the same line verbatim. | Between 1 and 2. A value of 1 means no penalty.
Stop sequences | Stop sequences make the model stop generating text at a desired point, such as the end of a sentence or a list. This optional setting tells the API when to stop generating tokens. The completion will not contain the stop sequence. If nothing is passed, generation stops at the model's end-of-text token. | Any comma-separated strings; each stop word must be enclosed in double quotes. Using "<human>" is recommended for BLOOMChat.
Temperature | The value used to modulate the next-token probabilities. As the value decreases, the model becomes more deterministic and repetitive. | 0 < x ≤ 1
Top k | The number of highest-probability vocabulary tokens to keep for top-k filtering. Top k allows the model to choose randomly among the top k tokens by their respective probabilities. For example, choosing among the top three tokens means setting top k to a value of 3. | 1 ≤ x ≤ the model's vocabulary size
Top logprobs | Shows the top tokens by probability for each generated token, up to the requested number. | 0 ≤ x ≤ 20
Top p | Top p sampling, sometimes called nucleus sampling, is a technique used to sample possible outcomes of the model. It controls diversity via nucleus sampling as well as the randomness and originality of the model. The top p parameter specifies a sampling threshold during inference time: it shortlists the top tokens whose cumulative likelihood does not exceed the threshold. If set to less than 1, only the most probable tokens whose probabilities sum to the top p value or higher are kept for generation. | 0 < x ≤ 1
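To see how these settings fit together, here is a hypothetical configuration following the prompt guidelines above. The snake_case key names are descriptive stand-ins; the exact field names and API call depend on the platform's batch inference interface.

inference_settings = {
    "do_sample": True,               # recommended for varied chat responses
    "max_tokens_to_generate": 256,   # keep prompt + generation within the model's sequence length
    "repetition_penalty": 1.15,      # 1 means no penalty; allowed range is 1 to 2
    "stop_sequences": ["<human>"],   # recommended stop sequence for BLOOMChat
    "temperature": 0.8,              # 0 < x <= 1; lower values are more deterministic
    "top_k": 50,                     # sample among the 50 most probable tokens
    "top_p": 0.9,                    # nucleus sampling threshold, 0 < x <= 1
}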