API reference and Swagger
This document contains API reference information and describes how to access and interact with the SambaStudio Swagger framework.
Online generative inference
Once you have deployed an endpoint for a generative model, you can run online inference against it to get completions for prompts.
Text models
Creates a model response for the given chat conversation or a text prompt.
API Type | HTTP Method | Endpoint |
---|---|---|
Predict | POST | The URL of the endpoint displayed in the Endpoint window. |
Stream | POST | The Stream URL of the endpoint displayed in the Endpoint window. |
Request body
Attributes | Type | Description |
---|---|---|
inputs | Array (strings) | A list of prompts to provide to the model. |
params | JSON object | The tuning parameters to use, specified as key-value pairs. |
curl --location '<your-endpoint-url>' \
--header 'Content-Type: application/json' \
--header 'key: <your-endpoint-key>' \
--data '{
"inputs": [
"Whats the capital of Austria?"
],
"params": {
"do_sample": {
"type": "bool",
"value": "false"
},
"max_tokens_to_generate": {
"type": "int",
"value": "100"
},
"repetition_penalty": {
"type": "float",
"value": "1"
},
"temperature": {
"type": "float",
"value": "1"
},
"top_k": {
"type": "int",
"value": "50"
},
"top_logprobs": {
"type": "int",
"value": "0"
},
"top_p": {
"type": "float",
"value": "1"
}
}
}'
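For programmatic use, the same request can be assembled in Python. The sketch below only builds the request body shown in the curl example; the commented-out `requests` call, endpoint URL, and key are placeholders you must fill in:

```python
import json

def build_predict_payload(prompts, max_tokens=100, temperature=1.0):
    """Assemble the Predict request body shown in the curl example.

    Note that parameter values are sent as strings, each with an
    explicit "type" field, exactly as in the example above.
    """
    return {
        "inputs": list(prompts),
        "params": {
            "do_sample": {"type": "bool", "value": "false"},
            "max_tokens_to_generate": {"type": "int", "value": str(max_tokens)},
            "temperature": {"type": "float", "value": str(temperature)},
        },
    }

payload = build_predict_payload(["Whats the capital of Austria?"])
body = json.dumps(payload)

# Sending it (requires the third-party `requests` package; URL and key
# are placeholders):
# import requests
# resp = requests.post("<your-endpoint-url>",
#                      headers={"Content-Type": "application/json",
#                               "key": "<your-endpoint-key>"},
#                      data=body)
```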
Predict response
Attributes | Type | Description |
---|---|---|
data | Array | Array of responses, one for each prompt in the input array. |
stop_reason | String | Indicates why the model stopped generating tokens. Possible values include end_of_text. |
completion | String | The model’s prediction for the input prompt. |
total_tokens_count | Float | Count of the total tokens generated by the model. |
tokens | Array | Array of the tokens generated for the given prompt. |
logprobs | JSON | The top N tokens by probability of being generated next. The value will be null if top_logprobs is set to 0. |
prompt | String | The prompt provided in the input. |
{
"data": [
{
"prompt": "Whats the capital of Austria?",
"tokens": [
"\n",
"\n",
"Answer",
":",
"The",
"capital",
"of",
"Austria",
"is",
"Vienna",
"(",
"G",
"erman",
":",
"Wien",
")."
],
"total_tokens_count": 24.0,
"completion": "\n\nAnswer: The capital of Austria is Vienna (German: Wien).",
"logprobs": {
"top_logprobs": null,
"text_offset": null
},
"stop_reason": "end_of_text"
}
]
}
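Assuming a response shaped like the example above, the completions can be extracted with a few lines of Python:

```python
def extract_completions(response):
    """Return (completion, stop_reason) pairs from a Predict response."""
    return [(item["completion"], item["stop_reason"]) for item in response["data"]]

# Trimmed-down version of the sample response above:
sample = {
    "data": [
        {
            "prompt": "Whats the capital of Austria?",
            "completion": "\n\nAnswer: The capital of Austria is Vienna (German: Wien).",
            "total_tokens_count": 24.0,
            "stop_reason": "end_of_text",
        }
    ]
}

for completion, reason in extract_completions(sample):
    print(completion.strip())
```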
Stream response
If the request is streamed, the response is a sequence of completion objects, ending with a final object that indicates the stream is complete.
Attributes | Type | Description |
---|---|---|
stop_reason | String | Indicates why the model stopped generating tokens. Possible values include end_of_text. |
completion | String | The model’s prediction for the input prompt. |
total_tokens_count | Float | Count of the total tokens generated by the model. |
tokens | Array | Array of the tokens generated for the given prompt. |
logprobs | JSON | The top N tokens by probability of being generated next. The value will be null if top_logprobs is set to 0. |
prompt | String | The prompt provided in the input. |
is_last_response | Boolean | Indicates whether this is the last response from the model. |
stream_token | String | The stream token for the response. |
{
"prompt": "",
"tokens": null,
"stop_reason": "",
"logprobs": {
"top_logprobs": null,
"text_offset": null
},
"is_last_response": false,
"completion": "",
"total_tokens_count": 0.0,
"stream_token": "?\n\nAnswer: "
}
{
"prompt": "Whats the capital of Austria",
"tokens": [
"?",
"\n",
"\n",
"Answer",
":",
"The",
"capital",
"of",
"Austria",
"is",
"Vienna",
"(",
"G",
"erman",
":",
"Wien",
")."
],
"stop_reason": "end_of_text",
"logprobs": {
"top_logprobs": null,
"text_offset": null
},
"is_last_response": true,
"completion": "?\n\nAnswer: The capital of Austria is Vienna (German: Wien).",
"total_tokens_count": 24.0,
"stream_token": ""
}
{"prompt": "", "tokens": null, "stop_reason": "", "logprobs": {"top_logprobs": null, "text_offset": null}, "is_last_response": false, "completion": "", "total_tokens_count": 0.0, "stream_token": "?\n\nAnswer: "}
{"prompt": "", "tokens": null, "stop_reason": "", "logprobs": {"top_logprobs": null, "text_offset": null}, "is_last_response": false, "completion": "", "total_tokens_count": 0.0, "stream_token": "The "}
...
{"stop_reason": "", "tokens": null, "prompt": "", "logprobs": {"top_logprobs": null, "text_offset": null}, "is_last_response": false, "completion": "", "total_tokens_count": 0.0, "stream_token": "Wien)."}
{"prompt": "Whats the capital of Austria", "tokens": ["?", "\n", "\n", "Answer", ":", "The", "capital", "of", "Austria", "is", "Vienna", "(", "G", "erman", ":", "Wien", ")."], "stop_reason": "end_of_text", "logprobs": {"top_logprobs": null, "text_offset": null}, "is_last_response": true, "completion": "?\n\nAnswer: The capital of Austria is Vienna (German: Wien).", "total_tokens_count": 24.0, "stream_token": ""}
Multimodal model
You can use the LLaVA multimodal API to run inference on combined text and image inputs.
HTTP Method | Endpoint |
---|---|
POST | The URL of the endpoint displayed in the Endpoint window. |
Request body
Attributes | Type | Description |
---|---|---|
instances | Array (JSON) | An array of prompts, each with an image, to provide to the model (currently only one prompt and image are supported). |
params | JSON object | The tuning parameters to use, specified as key-value pairs. |
curl --location 'https://<host>/api/predict/generic/<project-id>/<endpoint-id>' \
--header 'Content-Type: application/json' \
--header 'key: <your-endpoint-key>' \
--data '{
"instances": [
{
"prompt": "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the humans questions. USER: What are the things I should be cautious about when I visit here? ASSISTANT:",
"image_content": "base64-encoded-string-of-image"
}
],
"params": {
"do_sample": {
"type": "bool",
"value": "false"
},
"max_tokens_to_generate": {
"type": "int",
"value": "100"
},
"repetition_penalty": {
"type": "float",
"value": "1"
},
"stop_sequences": {
"type": "str",
"value": ""
},
"temperature": {
"type": "float",
"value": "1"
},
"top_k": {
"type": "int",
"value": "50"
},
"top_logprobs": {
"type": "int",
"value": "0"
},
"top_p": {
"type": "float",
"value": "1"
}
}
}'
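The image_content field carries the image as a base64-encoded string. One way to produce it in Python (the prompt text and file path in the usage comment are illustrative):

```python
import base64

def encode_image(image_bytes):
    """Base64-encode raw image bytes for the image_content field."""
    return base64.b64encode(image_bytes).decode("ascii")

def build_instance(prompt, image_bytes):
    """Build one entry of the "instances" array from the request body."""
    return {"prompt": prompt, "image_content": encode_image(image_bytes)}

# Typical usage with a file on disk (path is a placeholder):
# with open("photo.jpg", "rb") as f:
#     instance = build_instance(
#         "USER: What is shown in the image? ASSISTANT:", f.read())
```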
Response
Attributes | Type | Description |
---|---|---|
predictions | Array | Array of responses, one for each prompt in the input array. |
stop_reason | String | Indicates why the model stopped generating tokens. Possible values include end_of_text. |
completion | String | The model’s prediction for the input prompt. |
total_tokens_count | Float | Count of the total tokens generated by the model. |
tokens | Array | Array of the tokens generated for the given prompt. |
logprobs | JSON | The top N tokens by probability of being generated next. The value will be null if top_logprobs is set to 0. |
prompt | String | The prompt provided in the input. |
status | JSON | Details of the request. |
{
"status": {
"complete": true,
"exitCode": 0,
"elapsedTime": 2.8143582344055176,
},
"predictions": [
{
"completion": "The image shows a person standing in front of a large body of water, which could be an ocean or a lake. The person is wearing a wetsuit and appears to be preparing to go into the water.",
"logprobs": {
"top_logprobs": null,
"text_offset": null
},
"prompt": "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the humans questions. USER: What is shown in the image? ASSISTANT:",
"stop_reason": "end_of_text",
"tokens": [
"The",
"image",
"shows",
"a",
"person",
"standing",
"in",
"front",
"of",
"a",
"large",
"body",
"of",
"water",
",",
"which",
"could",
"be",
"an",
"ocean",
"or",
"a",
"lake",
".",
"The",
"person",
"is",
"we",
"aring",
"a",
"w",
"ets",
"uit",
"and",
"appears",
"to",
"be",
"prepar",
"ing",
"to",
"go",
"into",
"the",
"water",
"."
],
"total_tokens_count": 89
}
  ]
}
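Because the multimodal response wraps its results in a status object, it is worth checking that block before reading predictions. A small helper, assuming the response shape shown above:

```python
def get_predictions(response):
    """Return the predictions list, raising if the request did not complete."""
    status = response.get("status", {})
    if not status.get("complete") or status.get("exitCode", 0) != 0:
        raise RuntimeError("request failed: %s" % status)
    return response["predictions"]

# Trimmed-down version of the sample response above:
sample = {
    "status": {"complete": True, "exitCode": 0, "elapsedTime": 2.81},
    "predictions": [
        {"completion": "The image shows a person standing in front of a large body of water...",
         "stop_reason": "end_of_text"}
    ],
}
preds = get_predictions(sample)
```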
Online ASR inference
SambaStudio allows you to deploy an endpoint for automatic speech recognition (ASR) and run online inference against it, enabling live-transcription scenarios.
To run online inference for ASR, submit an audio file to the endpoint. The sample rate of the audio file must be 16 kHz, and the file must contain no more than 15 seconds of audio.
API reference
Request
HTTP Method | Endpoint |
---|---|
POST | The URL from the Endpoint window. |
Headers
Param | Description |
---|---|
key | The API Key from the Endpoint window. |
Examples
The examples below demonstrate a request and a response.
curl -k -X POST "<your-endpoint-url>" \
-H "key:<your-endpoint-key>" \
--form 'predict_file=@"/Users/username/Downloads/1462-170138-0001.flac"'
{
"status_code":200,
"data":["He has written a delightful part for her and she's quite inexpressible."]
}
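In Python, the equivalent request is a multipart upload, and the transcription is read from the data array. A sketch, assuming the response shape above (the commented-out `requests` call, URL, key, and file path are placeholders):

```python
def transcriptions_from(response):
    """Pull the transcription strings out of an ASR response like the
    example above, failing loudly on a non-200 status_code."""
    if response.get("status_code") != 200:
        raise RuntimeError("ASR request failed: %s" % response)
    return response["data"]

# Sending the audio file with the third-party `requests` package:
# import requests
# with open("/path/to/audio.flac", "rb") as f:
#     resp = requests.post("<your-endpoint-url>",
#                          headers={"key": "<your-endpoint-key>"},
#                          files={"predict_file": f})
# print(transcriptions_from(resp.json()))
```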
Online inference for other NLP tasks
For non-generative tasks, the Try It feature provides an in-platform prediction generation experience. To use the Try It feature and generate predictions, your endpoint must have reached the Live status. Follow the steps below to use the Try It feature to generate predictions.
See the Create and use endpoints document for information on how to use endpoints in the platform.
- From an Endpoint window, click the Try Now button. The Try It window will open.

  Figure 1. Try Now button

- Input text into the Try It window to use the following options:

  - Click the Run button to view a response relative to the endpoint’s task.

    Figure 2. Try It inputted text

  - Click the Curl command, CLI Command, and Python SDK buttons to view how to make a request programmatically for each option.

    Figure 3. Try It Curl command
SambaStudio Swagger framework
SambaStudio implements the OpenAPI Specification (OAS) Swagger framework to describe and use its REST APIs.
Access the SambaStudio Swagger framework
To access SambaStudio’s OpenAPI Swagger framework, add /api/docs
to your host server URL.
http://<sambastudio-host-domain>/api/docs
Interact with the SambaStudio APIs
For the Predict and Predict File APIs, use the information described in the Online generative inference and Online ASR inference sections of this document.
You will need the following information when interacting with the SambaStudio Swagger framework.
- Project ID
-
When you are viewing a Project window, the Project ID is displayed in the browser URL path after …/projects/details/. In the example below, cd6c07ca-2fd4-452c-bf3e-f54c3c2ead83 is the Project ID.

Example Project ID path: http://<sambastudio-host-domain>/ui/projects/details/cd6c07ca-2fd4-452c-bf3e-f54c3c2ead83
See Projects for more information.
- Job ID
-
When you are viewing a Job window, the Job ID is displayed in the browser URL path after …/projects/details/<project-id>/jobs/. In the example below, cb1ca778-e25e-42b0-bf43-056ab34374b0 is the Job ID.

Example Job ID path: http://<sambastudio-host-domain>/ui/projects/details/cd6c07ca-2fd4-452c-bf3e-f54c3c2ead83/jobs/cb1ca778-e25e-42b0-bf43-056ab34374b0
See the Train jobs document for information on training jobs. See the Batch inference document for information on batch inference jobs.
- Endpoint ID
-
The Endpoint ID is displayed in the URL path of the Endpoint information window. The Endpoint ID is the last sequence of numbers.
Figure 4. Endpoint ID

See Create and use endpoints for more information.
- Key
-
The SambaStudio Swagger framework requires the SambaStudio API authorization key. This Key is generated in the Resources section of the platform. See SambaStudio resources for information on how to generate your API authorization key.
Figure 5. Resources section
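The Project ID and Job ID URL conventions described above can be captured in a small helper. A sketch, assuming URLs shaped like the examples (the function name is illustrative):

```python
def ids_from_url(url):
    """Extract the Project ID, and the Job ID if present, from a
    .../projects/details/<project-id>[/jobs/<job-id>] URL."""
    parts = url.rstrip("/").split("/")
    project_id = parts[parts.index("details") + 1]
    job_id = parts[parts.index("jobs") + 1] if "jobs" in parts else None
    return project_id, job_id

# Using the example Job ID path from above:
url = ("http://sambastudio-host/ui/projects/details/"
       "cd6c07ca-2fd4-452c-bf3e-f54c3c2ead83/jobs/"
       "cb1ca778-e25e-42b0-bf43-056ab34374b0")
project_id, job_id = ids_from_url(url)
```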