SambaStudio Python SDK#

Copyright © 2024 by SambaNova Systems, Inc. Disclosure, reproduction, reverse engineering, or any other use made without the advance written permission of SambaNova Systems, Inc. is unauthorized and strictly prohibited. All rights of ownership and enforcement are reserved.

class snsdk.SnSdk(host_url: str, access_key: str, command: str | None = None, user_agent: str = 'SambaStudioSDK/23.11.2')#

SambaNova Systems Python SDK for SambaStudio. You can create an instance of the SnSdk class by passing the SambaStudio hostname and the access key.

Parameters:
  • host_url (str) – Base URL of the SambaStudio API service. Example: https://example-domain.net.

  • access_key (str) – SambaStudio API authorization key.
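
For example, a minimal client can be constructed from the host URL and an access key (both values below are placeholders):

    from snsdk import SnSdk

    # Placeholder values; substitute your deployment's URL and key.
    sdk = SnSdk(host_url="https://example-domain.net",
                access_key="YOUR_ACCESS_KEY")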

add_dataset(dataset_name: str, apps: Dict[Literal['ids', 'names'], List[str]], dataset_metadata: any, description: str, file_type: str, url: str, language: str, job_type: List[str], source: any) Dict#

[V2 Version] Add a new dataset.

Important

The dataset_path values used in SambaStudio are relative paths to the storage root directory <NFS_root>.

Paths outside the storage root cannot be used because SambaStudio does not have access to those directories.

Parameters:
  • dataset_name (str) – The name of the new dataset.

  • apps (list) – The list of ML Apps that the new dataset will be associated with.

  • dataset_metadata (str) – The metadata for the dataset.

  • description (str) – Free-form text description of the dataset.

  • file_type (str) – Free-form text file types in the dataset.

  • url (str) – Free-form text including URL source of the dataset.

  • language (str) – Language of the NLP dataset.

  • job_type (List[str]) – The job type(s) for the dataset: ["train"], ["evaluation"], ["batch_predict"], or ["evaluation", "train"].

  • source (str) – The dataset source.

Returns:

The new dataset’s ID.

Return type:

dict
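
A sketch of a call, assuming the dataset files are already staged under the storage root (all values are placeholders; apps takes app IDs or names under the corresponding key):

    response = sdk.add_dataset(
        dataset_name="my_dataset",
        apps={"names": ["My ML App"]},        # placeholder app name
        dataset_metadata="",                  # free-form metadata
        description="Example dataset",
        file_type="hdf5",                     # placeholder file type
        url="https://example.com/source",     # placeholder source URL
        language="english",
        job_type=["train"],
        source="",                            # placeholder source
    )
    # The returned dict contains the new dataset's ID.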

add_endpoint_api_key(project_id: str, endpoint_id: str, api_key: str, description: str) Dict#

Adds an API key to an endpoint.

Parameters:
  • project_id (str) – The ID of the project.

  • endpoint_id (str) – The ID of the endpoint.

  • api_key (str) – The API key to be added.

  • description (str) – The API key description.

Returns:

The API keys for the endpoint.

Return type:

dict

add_model(project: str, model_checkpoint: str, model_checkpoint_name: str, job: str, description: str, checkpoint_type: str) Dict#

Add a model from an existing checkpoint.

Parameters:
  • project (str) – The project name or project_id that contains the checkpoint.

  • model_checkpoint (str) – The name of the checkpoint to use to create the new model.

  • model_checkpoint_name (str) – The name for the new model.

  • job (str) – The job name or job_id that contains the checkpoint.

  • description (str) – Model description.

  • checkpoint_type (str) – The type of checkpoint (pretrained/finetuned).

Returns:

The status code.

Return type:

dict

add_users(users: List[Dict[str, str]]) Dict#

Adds the specified users.

Parameters:

users (List[Dict[str, str]]) – The users to add.

Returns:

The response dict.

Return type:

dict

admin_job_list() Dict#

Returns the list of running jobs for all users.

Note

Use of this method requires the admin role.

Returns:

The list of running jobs for all users.

Return type:

dict

admin_resource_usage() Dict#

Returns the number of total and available RDUs.

Note

Use of this method requires the admin role.

Returns:

The number of total and available RDUs.

Return type:

dict

app_info(app: str) Dict#

Returns the app details for the ML App.

Parameters:

app (str) – The ML App name.

Returns:

The ML App details.

Return type:

dict

checkpoint_info(checkpoint_name: str)#

Returns the checkpoint information.

Parameters:

checkpoint_name (str) – The name of the checkpoint.

Returns:

Checkpoint information if it exists.

Return type:

dict

complete_folder_upload(dataset_id: str, folder_id: str, file_list: List[Dict[str, any]]) Dict#

Gets the folder upload status.

Parameters:
  • dataset_id (str) – The datasetId for the newly uploading dataset.

  • folder_id (str) – The ID of the folder being uploaded.

  • file_list (List[Dict[str, any]]) – The list of files the folder contains.

Return type:

dict

complete_multipart_file_upload(folder_id: str, file_path: str, chunk_size: int, upload_id: str, part_list: List[Dict[str, any]]) Dict#

Returns the file upload status.

Parameters:
  • folder_id (str) – The folderId for the newly uploaded dataset folder.

  • file_path (str) – The path of the new dataset.

  • chunk_size (int) – The size of the parts into which the dataset is divided.

  • upload_id (str) – The uploadId for this dataset upload.

  • part_list (List[Dict[str, any]]) – The list of chunks into which the file is divided.

Return type:

dict

create_endpoint(project: str, endpoint_name: str, description: str, model_checkpoint: str, instances: int, hyperparams: str, rdu_arch: str) Dict#

Creates a new endpoint.

Parameters:
  • project (str) – The project name or ID.

  • endpoint_name (str) – The name of the endpoint.

  • description (str) – The description of the endpoint.

  • model_checkpoint (str) – The model checkpoint to use for the endpoint.

  • instances (int) – Specifies how many instances will be deployed.

  • hyperparams (str) – The hyperparameters for the endpoint in JSON format.

  • rdu_arch (str) – The RDU architecture.

Returns:

The endpoint ID.

Return type:

dict
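
A hedged sketch of a call; hyperparams is a JSON string whose contents depend on the model, and the rdu_arch value below is a placeholder:

    endpoint = sdk.create_endpoint(
        project="my_project",
        endpoint_name="my_endpoint",
        description="Demo endpoint",
        model_checkpoint="my_model",
        instances=1,
        hyperparams="{}",      # model-specific JSON; placeholder
        rdu_arch="SN30",       # placeholder RDU architecture
    )
    # The returned dict contains the endpoint ID.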

create_folder_upload(dataset_id: str, file_list: List[str]) Dict#

Creates the folder upload.

Parameters:
  • dataset_id (str) – The datasetId for the newly uploading dataset.

  • file_list (List[str]) – The list of files the folder contains.

Returns:

The dataset path and folderUploadId.

Return type:

dict

create_job(job_type: str, project: str, model_checkpoint: str, job_name: str, description: str, dataset: str, hyperparams: str, load_state: bool, sub_path: str, parallel_instances: int, rdu_arch: str) Dict#

Create a new job.

Parameters:
  • job_type (str) – The type of job to create. Options are train or batch_predict.

  • project (str) – The project name or ID in which to create the job.

  • model_checkpoint (str) – The model checkpoint to use.

  • job_name (str) – The name of the job.

  • description (str) – The job description.

  • dataset (str) – The dataset to use for the job.

  • hyperparams (str) – The hyper-parameters for the job in JSON format.

  • load_state (bool) – Only load weights from the model checkpoint, if True.

  • sub_path (str) – The folder/file path.

  • parallel_instances (int) – The number of data parallel instances to run in a training job.

  • rdu_arch (str) – The RDU architecture.

Returns:

The created job’s details.

Raises:

ValueError – raised if the job_type is not train or batch_predict.

Return type:

dict
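
For example, a training job might be created as follows (names and hyperparameters are placeholders):

    job = sdk.create_job(
        job_type="train",
        project="my_project",
        model_checkpoint="my_model",
        job_name="my_training_job",
        description="Demo training job",
        dataset="my_dataset",
        hyperparams='{"batch_size": 16}',   # placeholder JSON
        load_state=True,
        sub_path="",
        parallel_instances=1,
        rdu_arch="SN30",                    # placeholder RDU architecture
    )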

create_multipart_file_upload(folder_id: str, file_path: str, chunk_size: int, hash_algo: str | None = None, hash_digest: str | None = None) Dict#

Creates a multipart upload for a new dataset file and starts the upload.

Parameters:
  • folder_id (str) – The folderId for the newly uploaded dataset folder.

  • file_path (str) – The path of the new dataset.

  • chunk_size (int) – The size of the parts into which the dataset is divided.

  • hash_algo (str) – The hashing algorithm used to compute the digest of the original file, e.g. 'crc32'.

  • hash_digest (str) – The hash digest of the original file, computed using hash_algo.

Returns:

The new CreateMultipartUploadResult object with Path and UploadId.

Return type:

dict
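
Taken together with generate_dataset_id(), create_folder_upload(), upload_part_file(), and the complete_* methods, the chunked-upload flow appears to be the following. The ordering, the response key names (datasetId, folderUploadId, UploadId), and the part/file descriptor shapes are assumptions inferred from the method descriptions, not a documented protocol:

    # Assumed flow for uploading one file ("data.hdf5") in chunks.
    chunk_size = 8 * 1024 * 1024  # 8 MiB parts; size is illustrative

    dataset_id = sdk.generate_dataset_id()["datasetId"]        # key name assumed
    folder = sdk.create_folder_upload(dataset_id, ["data.hdf5"])
    folder_id = folder["folderUploadId"]                       # key name assumed

    upload = sdk.create_multipart_file_upload(folder_id, "data.hdf5", chunk_size)
    upload_id = upload["UploadId"]                             # key name assumed

    part_list = []
    with open("data.hdf5", "rb") as f:
        part_number = 1
        while chunk := f.read(chunk_size):
            resp = sdk.upload_part_file(folder_id, "data.hdf5", part_number,
                                        upload_id, chunk_size, chunk)
            part_list.append(resp)                             # part descriptor shape assumed
            part_number += 1

    sdk.complete_multipart_file_upload(folder_id, "data.hdf5",
                                       chunk_size, upload_id, part_list)
    sdk.complete_folder_upload(dataset_id, folder_id,
                               [{"path": "data.hdf5"}])        # file descriptor shape assumed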

create_project(project_name: str, description: str) Dict#

Creates a new project.

Parameters:
  • project_name (str) – The name to be used for the new project.

  • description (str) – The project description.

Returns:

The project ID.

Return type:

dict
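
For example:

    project = sdk.create_project(project_name="my_project",
                                 description="Demo project")
    # The returned dict contains the project ID.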

create_tenant(tenant_name: str, display_name: str | None = None) Dict#

Create a new tenant.

Parameters:
  • tenant_name (str) – The name of the new tenant.

  • display_name (str) – The display name of the new tenant [Deprecated].

Returns:

The tenant ID.

Return type:

dict

dataset_info(dataset: str) Dict#

Returns the details for the dataset.

Parameters:

dataset (str) – The dataset name or ID.

Returns:

The dataset details.

Return type:

dict

delete_checkpoint(checkpoint: str) Dict#

Deletes a checkpoint.

Parameters:

checkpoint (str) – The checkpoint name.

Returns:

The status code.

Return type:

dict

delete_dataset(dataset: str) Dict#

Deletes the specified dataset.

Parameters:

dataset (str) – The dataset name or ID.

Returns:

The status code.

Return type:

dict

delete_endpoint(project: str, endpoint: str) Dict#

Removes the endpoint.

Parameters:
  • project (str) – The project name or ID associated with the endpoint.

  • endpoint (str) – The endpoint name or ID.

Returns:

The status code.

Return type:

dict

delete_exported_model(model_id: str, model_activity_id: str) Dict#

Delete an exported model.

Parameters:
  • model_id (str) – The ID of the exported model to be deleted.

  • model_activity_id (str) – Used to monitor the export status of the model. Because copying a model checkpoint from the NFS export is currently not supported, this attribute is not used.

Returns:

The status code.

Return type:

dict

delete_job(project: str, job: str) Dict#

Delete the specified job.

Parameters:
  • project (str) – The project name or ID in which the job exists.

  • job (str) – The job name or ID to delete.

Returns:

The status code.

Return type:

dict

delete_model(model: str) Dict#

Delete a model.

Parameters:

model (str) – The model name or ID that you wish to delete.

Returns:

The status code.

Return type:

dict

delete_project(project: str) Dict#

Delete the given project.

Parameters:

project (str) – The project name or ID to delete.

Returns:

The status code.

Return type:

dict

delete_tenant(tenant_name: str) Dict#

Deletes a tenant.

Parameters:

tenant_name (str) – The name of the tenant to be deleted.

Returns:

The tenant object.

Return type:

dict

delete_user(user_id: str, tenant: str | None = None) Dict#

Delete the user.

Parameters:
  • user_id (str) – The ID of the user to delete.

  • tenant (str) – The tenant name or ID. Optional.

Returns:

The response dict.

Return type:

dict

download_job_artifacts(project: str, job: str, dest_dir: str = './', artifact_type: str = 'results') Dict#

Downloads the requested job artifacts.

Parameters:
  • project (str) – The project name or ID in which the job exists.

  • job (str) – The job name or ID.

  • dest_dir (str) – The destination directory for the download.

  • artifact_type (str) – The artifact type; either results or logs.

Returns:

The status code.

Return type:

dict

download_logs(project: str, job: str, dest_dir: str = './') Dict#

Downloads the job logs.

Parameters:
  • project (str) – The project ID.

  • job (str) – The job ID.

  • dest_dir (str) – The destination directory for the download.

Returns:

The status code.

Return type:

dict

download_results(project: str, job: str, dest_dir: str = './') Dict#

Downloads the batch_predict job results.

Parameters:
  • project (str) – The project name or ID in which the job exists.

  • job (str) – The job name or ID.

  • dest_dir (str) – The destination directory for the download.

Returns:

The status code.

Return type:

dict

edit_endpoint_api_key(project_id: str, api_key: str, status: str, description: str) Dict#

Updates the status or description of an API key in the project.

Parameters:
  • project_id (str) – The ID of the project.

  • api_key (str) – The API key to be updated.

  • status (str) – The status of the API key.

  • description (str) – The API key description.

Return type:

dict

endpoint_info(project: str, endpoint: str) Dict#

Gets the endpoint details.

Parameters:
  • project (str) – The project name or ID associated with the endpoint.

  • endpoint (str) – The endpoint name or ID.

Returns:

The endpoint details if the endpoint exists; otherwise an error.

Return type:

dict

endpoint_info_by_id(endpoint: str) Dict#

Gets the endpoint details.

Parameters:

endpoint (str) – The endpoint name or ID.

Returns:

The endpoint details if the endpoint exists; otherwise an error.

Return type:

dict

export_model(model_id, storage_type) Dict#

Export a model. SambaNova owned models cannot be exported.

Parameters:
  • model_id (str) – The model ID.

  • storage_type (str) – Storage type (Local) for the export destination.

Returns:

The status code.

Return type:

dict

exported_model_list() Dict#

Returns the list of the exported models.

Returns:

The list of the exported models.

Return type:

dict

generate_dataset_id() Dict#

Generates a datasetId for a newly uploaded dataset.

Returns:

The datasetId for the new dataset upload.

Return type:

dict

get_feature_list()#

Returns the list of feature flags.

get_file_upload_status(dataset_id: str, upload_id: str) Dict#

Gets the file upload status.

Parameters:
  • dataset_id (str) – The datasetId for the newly uploading dataset.

  • upload_id (str) – The uploadId for this dataset upload.

Return type:

dict

get_metrics(project: str, job: str, random_sample_limit: int = -1) Dict#

Return the metrics for the specified job.

Parameters:
  • project (str) – The project name or ID in which the job exists.

  • job (str) – The job name or ID for which the metrics should be returned.

  • random_sample_limit (int) – Select metrics at random when the metric dataset exceeds the random_sample_limit.

Returns:

The training metrics in JSON format.

Return type:

dict

import_model(model_id: str, import_path: str, import_model_name: str, storage_type: str, steps: int) Dict#

Imports a model.

Parameters:
  • model_id (str) – Model ID.

  • import_path (str) – Source path to the model.

  • import_model_name (str) – Name for the imported model.

  • storage_type (str) – Storage type (Local) for the import source.

  • steps (int) – The step of the model checkpoint from which to start.

Returns:

The status code.

Return type:

dict

job_info(project: str, job: str, verbose: bool = False) Dict#

Return the details for the given job.

Parameters:
  • project (str) – The project name or ID in which the job exists.

  • job (str) – The job name or ID of the job to return.

  • verbose (bool) – If True, includes the full config in the output.

Returns:

The job details.

Return type:

dict
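
job_info() can be used to poll a job until it finishes; the status field name and values in the sketch below are assumptions, not documented response keys:

    import time

    # Poll every 30 seconds until the job leaves a running state.
    while True:
        info = sdk.job_info(project="my_project", job="my_training_job")
        status = info.get("status")                  # field name assumed
        if status not in ("TRAINING", "RUNNING"):    # status values assumed
            break
        time.sleep(30)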

job_log_list(project: str, job: str) Dict#

Returns the list of log file names for the given job.

Parameters:
  • project (str) – The project ID.

  • job (str) – The job ID.

Returns:

The list of log file names for the given job.

Return type:

dict

job_log_preview(project: str, job: str, log_file: str) Dict#

Returns a preview of the given log file.

Parameters:
  • project (str) – The project ID.

  • job (str) – The job ID.

  • log_file (str) – The log file name.

Return type:

dict

list_apps() Dict#

Returns the list of supported ML Apps.

Returns:

The list of ML Apps.

Return type:

dict

list_checkpoints(project: str, job: str) Dict#

Returns the list of checkpoints for the associated project and job.

Parameters:
  • project (str) – The project name or project_id.

  • job (str) – The job name or job_id.

Returns:

A list of checkpoints generated by the job.

Return type:

dict

list_datasets() Dict#

Returns the list of supported datasets.

Returns:

The list of supported datasets.

Return type:

dict

list_endpoints(project: str | None = None) Dict#

Returns all endpoints belonging to the user if the project is not specified. Returns the list of endpoints associated with the project when the project is specified.

Parameters:

project (str) – The project name or project_id.

Returns:

The list of the endpoints.

Return type:

dict

list_exports(project: str, job: str) Dict#

Returns the list of export results for batch predict jobs.

Parameters:
  • project (str) – The project name or ID.

  • job (str) – The job name or ID.

Returns:

List of exports.

Return type:

dict

list_jobs(project_id: str | None = None) Dict#

Return the list of jobs in the given project.

Return the list of all jobs for the user if the project_id is not set.

Parameters:

project_id (str) – The project ID.

Returns:

The job listing.

Return type:

dict

list_models(verbose: bool = False) Dict#

Returns the list of available models.

Parameters:

verbose (bool) – If True, provides detailed information about the models.

Returns:

Dict of models.

Return type:

dict

list_notifications(page: int | None = None, limit: int | None = None, levels: str | None = None, archived: bool | None = None, type: str | None = None, created_start: datetime | None = None, created_end: datetime | None = None, ordering: str | None = None, all: bool = False)#

Return a list of notifications for the current user.

Returns one page at a time with up to limit elements.

Raises ValueError if:

  • limit is <= 0, or

  • levels is not 10, 20, 30, 40, or 50, or

  • type is not ‘user’ or ‘admin’

list_projects() Dict#

Returns the list of projects.

Returns:

The list of projects.

Return type:

dict

list_roles() Dict#

Returns the list of roles.

Returns:

The list of roles.

Return type:

dict

list_tenants() Dict#

Return the list of tenants.

Returns:

List of tenants.

Return type:

dict

list_users() Dict#

Returns the list of users in the tenants and organizations.

Returns:

List of users.

Return type:

dict

login(server: str, username: str, password: str) Dict#

Login using username and password.

Parameters:
  • server (str) – Base URL of the SambaStudio API service. Example: https://example-domain.net.

  • username (str) – Username for login.

  • password (str) – Password for login.

Returns:

A dict containing the access key.

Return type:

dict
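
For example (placeholder credentials):

    creds = sdk.login(server="https://example-domain.net",
                      username="jane.doe",
                      password="...")
    # The returned dict contains the access key.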

model_info(model: str, job_type: str) Dict#

Returns the details of the model.

Parameters:
  • model (str) – The model name or ID to retrieve.

  • job_type (str) – The job type (train/batch_predict/deploy).

Returns:

The model details.

Return type:

dict

nlp_predict(project: str, endpoint: str, key: str, input: List[str] | str, params: str | None = None, trace_id: str | None = None) Dict#

NLP predict using inline input string.

Parameters:
  • project (str) – Project ID in which the endpoint exists.

  • endpoint (str) – Endpoint ID.

  • key (str) – API Key.

  • input (str) – Input string.

  • params (str) – Input params string.

  • trace_id (str) – The trace_id for telemetry.

Returns:

Prediction results.

Return type:

dict
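
A minimal sketch, assuming an already-deployed endpoint (IDs and key are placeholders):

    result = sdk.nlp_predict(
        project="project_id",
        endpoint="endpoint_id",
        key="ENDPOINT_API_KEY",
        input="Summarize this paragraph.",   # a single string or a list of strings
    )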

nlp_predict_file(project: str, endpoint: str, key: str, file_path: str) Dict#

NLP predict using file input.

Parameters:
  • project (str) – Project ID in which the endpoint exists.

  • endpoint (str) – Endpoint ID.

  • key (str) – API Key.

  • file_path (str) – Input file location.

Returns:

Prediction results.

Return type:

dict

project_info(project: str) Dict#

Returns the project details.

Parameters:

project (str) – The project name or ID.

Returns:

The project details.

Return type:

dict

resume_job(job_type: str, project: str, job_name: str, hyperparams: str, load_state: bool) Dict#

Resumes an existing job.

Parameters:
  • job_type (str) – The type of job to resume; the only option is train.

  • project (str) – The project name or ID in which the job exists.

  • job_name (str) – The name of the job.

  • hyperparams (str) – The hyper-parameters for the job in JSON format.

  • load_state (bool) – Only load weights from the model checkpoint, if True.

Returns:

The resumed job’s details.

Raises:

ValueError – raised if the job_type is not train or batch_predict.

Return type:

dict

search_dataset(dataset_name: str) Dict#

Searches by dataset name and returns its dataset_id, if it exists.

Parameters:

dataset_name (str) – The dataset name.

Returns:

The dataset ID (dataset_id).

Return type:

dict

search_job(project: str, job_name: str) Dict#

Searches by project (name or project_id) and job name, and returns the job_id, if it exists.

Parameters:
  • project (str) – The project name or project_id.

  • job_name (str) – The job name.

Returns:

The job ID (job_id).

Return type:

dict

search_model(model_name: str) Dict#

Searches by model name and returns its model_id, if it exists.

Parameters:

model_name (str) – The model name for which to search.

Returns:

The model’s ID.

Return type:

dict

search_project(project_name: str) Dict#

Searches by project name and returns its project_id, if it exists.

Parameters:

project_name (str) – The project name.

Returns:

The project ID (project_id).

Return type:

dict
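
The search_* methods resolve names to IDs, which is useful before calling methods that require IDs (such as download_logs()). The response key names below are assumptions:

    project_id = sdk.search_project("my_project")["project_id"]          # key name assumed
    job_id = sdk.search_job("my_project", "my_training_job")["job_id"]   # key name assumed
    sdk.download_logs(project=project_id, job=job_id, dest_dir="./logs")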

stop_endpoint(project: str, endpoint: str) Dict#

Stops the endpoint from running without deleting it.

Parameters:
  • project (str) – The project name or ID associated with the endpoint.

  • endpoint (str) – The endpoint name or ID.

Returns:

The status code.

Return type:

dict

stop_job(project: str, job: str) Dict#

Stop the specified job.

Parameters:
  • project (str) – The project name or ID in which the job exists.

  • job (str) – The job name or ID of the job to stop.

Returns:

The status code.

Return type:

dict

tenant_default_tenant() Dict#

Return the default tenant of a user.

Return type:

dict

tenant_info(tenant: str) Dict#

Return the details of a tenant.

The returned details include the tenant name, time the tenant was created, and time the tenant was last updated.

Parameters:

tenant (str) – The tenant name or ID.

Returns:

The tenant details.

Return type:

dict

update_endpoint(project: str, endpoint: str, description: str | None = None, instances: int | None = None, rdu_arch: str | None = None) Dict#

Update the endpoint.

Parameters:
  • project (str) – The project name or ID.

  • endpoint (str) – The endpoint name or ID.

  • description (str) – The endpoint Description.

  • instances (int) – Number of instances.

  • rdu_arch (str) – The RDU architecture.

Returns:

The status code.

Return type:

dict

update_job(project: str, job: str, name: str, description: str) Dict#

Updates the job name and/or description.

Parameters:
  • project (str) – The name or id of the project.

  • job (str) – The name or id of the job.

  • name (str) – New name.

  • description (str) – New description.

Returns:

Success/Error response

Return type:

dict

update_project(project: str, name: str, description: str) Dict#

Updates the project name and/or description.

Parameters:
  • project (str) – The name or id of the project.

  • name (str) – The new name to be used for the project.

  • description (str) – The new description to be used for the project.

Returns:

Success/Error response.

Return type:

dict

update_tenant(tenant_name: str, display_name: str, rdu_node_count: int, arch: str) Dict#

Updates an existing tenant's display_name and rdu_node_count.

Parameters:
  • tenant_name (str) – The name of the tenant to update.

  • display_name (str) – The display name of the tenant [Deprecated].

  • rdu_node_count (int) – The number of RDUs allocated to the tenant.

  • arch (str) – The RDU architecture.

Returns:

Updated tenant details.

Return type:

dict

upload_part_file(folder_id: str, file_path: str, part_number: int, upload_id: str, chunk_size: int, file: str) Dict#

Uploads part of a new dataset file.

Parameters:
  • folder_id (str) – The folderId for the newly uploaded dataset folder.

  • file_path (str) – The file path.

  • part_number (int) – The part number being uploaded.

  • upload_id (str) – The uploadId for this dataset upload.

  • chunk_size (int) – The size of the parts into which the dataset is divided.

  • file (str) – The partial file content.

Return type:

dict

upload_results_to_aws(project: str, job: str, bucket: str, folder: str, access_key_id: str, secret_access_key: str, session_token: str, region_name: str) Dict#

Uploads the batch predict results to AWS.

Parameters:
  • project (str) – The project name or ID.

  • job (str) – The job name or ID.

  • bucket (str) – AWS S3 bucket name.

  • folder (str) – AWS S3 object name.

  • access_key_id (str) – AWS access key.

  • secret_access_key (str) – AWS secret access key.

  • session_token (str) – AWS temporary token.

  • region_name (str) – AWS region name.

Returns:

The export results job ID.

Return type:

dict