SambaStack User Management - SambaNova Documentation

This section guides SambaStack administrators in managing user permissions, configuring admin access, and setting up service tiers and Quality of Service (QoS) levels. It explains how administrators control which users can access specific models, how much they can use, and the priority assigned to their requests.

Add an administrator

Users can be granted elevated access by adding their email to the SambaStack configuration. To add a user as an administrator:

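The steps differ between SambaStack on-prem and SambaStack hosted. The on-prem steps come first, followed by the hosted steps.

SambaStack on-prem: add the user’s email address under the db-admin section in the sambastack.yaml file. For example: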
```
db-admin:
  admins:
    - abc@example.com
```
To maintain security, add only the email addresses of authorized admins. The db-admin key must be at the same YAML level as bundles (a root-level key in sambastack.yaml).

After updating the .yaml file, apply the configuration with the following command:

helm upgrade \
  --install \
  --namespace sambastack \
  --version 0.3.528 \
  sambastack \
  -f sambastack.yaml \
  oci://<REGISTRY_URL>/sambastack/sambastack

SambaStack hosted: add the user’s email address under the db-admin section, nested under data in the sambastack.yaml file. For example:
```
data:
  sambastack.yaml: |
    db-admin:
      admins:
        - abc@example.com
```
To maintain security, add only the email addresses of authorized admins.

After updating the .yaml file, apply the configuration with the following command:
```
kubectl apply -f sambastack.yaml
```

Open a browser (Chrome is recommended) and navigate to the URL below or simply select the “Administration” tab on the left panel:

<UI Domain>/admin

The admin page becomes accessible once the administrator logs in.

Service tiers

Service tiers define what models users can access, their usage limits, and permissions.

Key concepts

Service tiers let administrators tailor user access, usage limits, and permissions:

Control access: Decide which models each user or group can use.
Set usage limits: Define how many requests a user can make, or how many tokens a user can consume, in a set period.

Service tiers are structured lists of model-group objects that define access controls and operational limits. Each model-group block sets parameters such as queue depth and per-user rate limits. The inherits attribute allows a tier to extend another base tier’s configuration. When inheriting, only specified fields in the overrides section are modified, enabling precise and maintainable customization.
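To make the per-user rate limit concrete, the sketch below models a single `rates` entry (`allowedRequests` per `periodSeconds`) as a sliding-window counter. It is only an illustration: the source does not say which algorithm SambaStack uses, and the class name `SlidingWindowLimiter` and its `allow` method are hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical per-user limiter for one `rates` entry (not SambaStack's code)."""

    def __init__(self, allowed_requests, period_seconds):
        self.allowed_requests = allowed_requests
        self.period_seconds = period_seconds
        self._timestamps = deque()  # arrival times of accepted requests

    def allow(self, now):
        """Return True and record the request if it fits in the current window."""
        # Drop accepted requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.period_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.allowed_requests:
            self._timestamps.append(now)
            return True
        return False


# Example: allowedRequests: 2, periodSeconds: 30, with requests at t = 0, 10, 20, 31 s.
limiter = SlidingWindowLimiter(allowed_requests=2, period_seconds=30)
decisions = [limiter.allow(t) for t in (0, 10, 20, 31)]
print(decisions)  # [True, True, False, True]
```

Under this model, a tier with `allowedRequests: 0` accepts no requests at all, because the window is always full.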

Configuration fields

The following table outlines the key fields used to define service tiers, along with descriptions and example values for each.

Field	Description	Example
`qos`	Quality of Service level assigned to requests from this tier. Usually matches the service tier name.	`enterprise-group-1`, `customer-demo`
`models`	List of models accessible to users within the tier. A model must be included in at least one tier for users to access it.	`[Llama-3.3-Swallow-70B-Instruct-v0.4]`
`queueDepth`	Maximum number of queries to queue before returning a busy response.	`100`
`rates`	Defines rate limits (allowed requests and period in seconds).	`{ allowedRequests: 10, periodSeconds: 60 }`
`inherits`	Allows a tier to inherit settings from a base tier and override specific fields through an accompanying `overrides` list.	`inherits: <name of a previously defined tier>`, `overrides: <fields to override>`
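As a rough aid for catching typos before applying a configuration, the fields in the table can be checked with a small validator. This is a sketch under assumptions: the source does not state which fields are mandatory or what value ranges are accepted, so the rules below (non-empty `models`, non-negative `queueDepth`, positive `periodSeconds`, optional `qos`) and the function name `validate_model_group` are hypothetical.

```python
def _is_int_at_least(value, minimum):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= minimum


def validate_model_group(group):
    """Return a list of problems found in one model-group block (empty if none)."""
    problems = []
    models = group.get("models")
    if not isinstance(models, list) or not models:
        problems.append("models must be a non-empty list")
    # Assumption: qos is optional, but must be a string when present.
    if "qos" in group and not isinstance(group["qos"], str):
        problems.append("qos must be a string")
    if not _is_int_at_least(group.get("queueDepth"), 0):
        problems.append("queueDepth must be a non-negative integer")
    for i, rate in enumerate(group.get("rates", [])):
        if not _is_int_at_least(rate.get("allowedRequests"), 0):
            problems.append(f"rates[{i}].allowedRequests must be a non-negative integer")
        if not _is_int_at_least(rate.get("periodSeconds"), 1):
            problems.append(f"rates[{i}].periodSeconds must be a positive integer")
    return problems


good = {
    "models": ["Meta-Llama-3.1-8B-Instruct"],
    "queueDepth": 100,
    "qos": "free",
    "rates": [{"allowedRequests": 20, "periodSeconds": 60}],
}
bad = {"models": [], "queueDepth": -1, "rates": [{"allowedRequests": 10}]}

print(validate_model_group(good))  # []
for problem in validate_model_group(bad):
    print(problem)
```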

System-managed tiers

Some tiers are pre-configured and system-managed. Do not remove or disable these tiers, because misconfiguring them can interrupt critical workflows.

Tier	Purpose	HTTP Response
`free` / `web`	Default baseline access tiers	Standard
`deprecated`	Models permanently removed	410 (Gone)
`maintenance`	Models temporarily unavailable	503 (Service Unavailable)
`restricted`	Models with limited access	403 (Forbidden)

If a tier is removed from sambastack.yaml, it reverts to SambaNova defaults.
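The mapping in the table can be expressed as a simple lookup. The sketch below assumes that listing a model under a system-managed tier is what triggers the corresponding response. The function name `status_for_model` and the lookup logic are hypothetical. Only the status codes come from the table.

```python
from http import HTTPStatus

# Status codes taken from the table above; `free` and `web` respond normally.
SYSTEM_TIER_STATUS = {
    "deprecated": HTTPStatus.GONE,                  # 410
    "maintenance": HTTPStatus.SERVICE_UNAVAILABLE,  # 503
    "restricted": HTTPStatus.FORBIDDEN,             # 403
}


def status_for_model(model, service_tiers):
    """Return the HTTP status a system-managed tier implies for `model`, or None."""
    for tier_name, status in SYSTEM_TIER_STATUS.items():
        for group in service_tiers.get(tier_name, []):
            if model in group.get("models", []):
                return status
    return None  # the model is not listed in any system-managed tier


tiers = {
    "deprecated": [{"models": ["Old-Model-v1"]}],
    "restricted": [{"models": ["Restricted-Model"]}],
}
print(int(status_for_model("Old-Model-v1", tiers)))          # 410
print(int(status_for_model("Restricted-Model", tiers)))      # 403
print(status_for_model("Meta-Llama-3.1-8B-Instruct", tiers)) # None
```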

Sample configuration

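The configuration differs between SambaStack on-prem and SambaStack hosted. The on-prem version comes first, followed by the hosted version.

SambaStack on-prem: add a serviceTiers section at the same YAML level as bundles (root-level key). See the setup example.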

serviceTiers:
  <Tier1>:                              # Custom service tier name
    - models:
        - Llama-4-Maverick-17B-128E-Instruct
      queueDepth: 25                    # Queries to queue before returning busy
      qos: "free"                       # Usually matches service tier name
      rates:
        - allowedRequests: 0
          periodSeconds: 30
  <Tier2>:                              # Tier that inherits from another
    inherits: <Tier1>
    overrides:
      - models:
          - Llama-4-Maverick-17B-128E-Instruct
        queueDepth: 25
        qos: "free"
        rates:
          - allowedRequests: 2
            periodSeconds: 30

Apply changes:

helm upgrade \
  --install \
  --namespace sambastack \
  --version 0.3.528 \
  sambastack \
  -f sambastack.yaml \
  oci://<REGISTRY_URL>/sambastack/sambastack

SambaStack hosted: add a serviceTiers section under data in sambastack.yaml. See the setup example.

data:
  sambastack.yaml: |
    serviceTiers:
      <Tier1>:                          # Custom service tier name
        - models:
            - Llama-4-Maverick-17B-128E-Instruct
          queueDepth: 25                # Queries to queue before returning busy
          qos: "free"                   # Usually matches service tier name
          rates:
            - allowedRequests: 0
              periodSeconds: 30
      <Tier2>:                          # Tier that inherits from another
        inherits: <Tier1>
        overrides:
          - models:
              - Llama-4-Maverick-17B-128E-Instruct
            queueDepth: 25
            qos: "free"
            rates:
              - allowedRequests: 2
                periodSeconds: 30

Apply changes:

kubectl apply -f sambastack.yaml

Each tier name defined here (<Tier1> and <Tier2> above; “Premium2” in the screenshot) appears as a Usage Plan on the Admin page and can be assigned to users.

Using inheritance

You can define base tiers and create derived tiers using inheritance for reuse and consistency. Base tier example:

serviceTiers:
  free:
    - models:
        - Llama-3.1-Swallow-8B-Instruct-v0.3
        - Meta-Llama-3.1-8B-Instruct
      queueDepth: 25
      qos: "free"
      rates:
        - allowedRequests: 20
          periodSeconds: 60

Derived tier example (defined under serviceTiers, alongside free):

premium2:
  inherits: free
  overrides:
    - models:
        - Meta-Llama-3.1-70B-Instruct
      batchSize: 2
      queueDepth: 50
      qos: "premium2"
      rates:
        - allowedRequests: 100
          periodSeconds: 60
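Because a derived tier only works if its base tier exists, it can help to check `inherits` references before applying the configuration. The sketch below follows each `inherits` link and reports missing tiers or cycles. It assumes the YAML has already been loaded into plain Python dictionaries and lists. The function name `inheritance_chain` is hypothetical, and the sketch deliberately does not merge `overrides`, since the source does not spell out the exact merge rules.

```python
def inheritance_chain(tier_name, service_tiers):
    """Return [tier, parent, grandparent, ...], raising on missing tiers or cycles."""
    chain = []
    current = tier_name
    while current is not None:
        if current in chain:
            raise ValueError(f"inheritance cycle: {' -> '.join(chain + [current])}")
        if current not in service_tiers:
            raise KeyError(f"tier '{current}' is not defined")
        chain.append(current)
        definition = service_tiers[current]
        # A base tier is a list of model-group blocks; a derived tier is a
        # mapping with `inherits` and `overrides` keys.
        current = definition.get("inherits") if isinstance(definition, dict) else None
    return chain


service_tiers = {
    "free": [
        {
            "models": ["Llama-3.1-Swallow-8B-Instruct-v0.3", "Meta-Llama-3.1-8B-Instruct"],
            "queueDepth": 25,
            "qos": "free",
            "rates": [{"allowedRequests": 20, "periodSeconds": 60}],
        }
    ],
    "premium2": {
        "inherits": "free",
        "overrides": [
            {
                "models": ["Meta-Llama-3.1-70B-Instruct"],
                "batchSize": 2,
                "queueDepth": 50,
                "qos": "premium2",
                "rates": [{"allowedRequests": 100, "periodSeconds": 60}],
            }
        ],
    },
}
print(inheritance_chain("premium2", service_tiers))  # ['premium2', 'free']
```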

After updating the configuration, apply the changes. The command depends on whether you run SambaStack on-prem or SambaStack hosted.

SambaStack on-prem:

helm upgrade \
  --install \
  --namespace sambastack \
  --version 0.3.528 \
  sambastack \
  -f sambastack.yaml \
  oci://<REGISTRY_URL>/sambastack/sambastack

SambaStack hosted:

kubectl apply -f sambastack.yaml

Best practices

When creating and managing service tiers, consider the following best practices to ensure stability, security, and flexibility:

1. Preserve system-managed and default tiers

Some tiers are pre-configured and system-managed and should not be removed or disabled. These tiers provide baseline access and enforce model lifecycle and access controls. Removing or misconfiguring them can interrupt critical workflows. This includes:

free / web – Default baseline access tiers.
deprecated – Models permanently removed (HTTP 410).
maintenance – Models temporarily unavailable (HTTP 503).
restricted – Models with limited access (HTTP 403).

These tiers are managed by the system and should be accounted for in service tier planning, not customized.

2. Modify with caution

You may adjust rate limits or model lists in default tiers, but changes should align with user needs. Overwriting a tier replaces its full configuration.

3. Use custom tiers for flexibility

Create custom tiers by inheriting from base tiers (such as free or web) to tailor access while preserving the underlying structure.

Complete example

The examples below show system-managed, required, and custom tiers together in sambastack.yaml, first for SambaStack on-prem and then for SambaStack hosted.

SambaStack on-prem: the serviceTiers key must be at the same YAML level as bundles (a root-level key in sambastack.yaml).

serviceTiers:
  # SYSTEM-MANAGED TIERS
  deprecated:
    - models:
        - Old-Model-v1
      queueDepth: 0
      rates:
        - allowedRequests: 0
          periodSeconds: 3600

  restricted:
    - models:
        - Restricted-Model
      queueDepth: 1
      rates:
        - allowedRequests: 0
          periodSeconds: 60

  # REQUIRED TIERS
  free:
    - models:
        - Meta-Llama-3.1-8B-Instruct
      queueDepth: 100
      qos: "free"
      rates:
        - allowedRequests: 20
          periodSeconds: 60

  web:
    - models:
        - Meta-Llama-3.1-8B-Instruct
      queueDepth: 100
      qos: "web"
      rates:
        - allowedRequests: 20
          periodSeconds: 60

  # CUSTOM TIERS
  developer:
    inherits: free
    overrides:
      - models:
          - Meta-Llama-3.1-8B-Instruct
        queueDepth: 100
        qos: "developer"
        rates:
          - allowedRequests: 60
            periodSeconds: 60

  enterprise:
    inherits: developer
    overrides:
      - models:
          - Meta-Llama-3.3-70B-Instruct
        queueDepth: 100
        qos: "enterprise"
        rates:
          - allowedRequests: 200
            periodSeconds: 60

SambaStack hosted: example of system-managed, required, and custom tiers nested under data in sambastack.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: sambastack
  labels:
    sambastack-installer: "true"
data:
  sambastack.yaml: |
    version: 0.3.528
    serviceTiers:
      # SYSTEM-MANAGED TIERS
      deprecated:
        - models:
            - Old-Model-v1
          queueDepth: 0
          rates:
            - allowedRequests: 0
              periodSeconds: 3600

      restricted:
        - models:
            - Restricted-Model
          queueDepth: 1
          rates:
            - allowedRequests: 0
              periodSeconds: 60

      # REQUIRED TIERS
      free:
        - models:
            - Meta-Llama-3.1-8B-Instruct
          queueDepth: 100
          qos: "free"
          rates:
            - allowedRequests: 20
              periodSeconds: 60

      web:
        - models:
            - Meta-Llama-3.1-8B-Instruct
          queueDepth: 100
          qos: "web"
          rates:
            - allowedRequests: 20
              periodSeconds: 60

      # CUSTOM TIERS
      developer:
        inherits: free
        overrides:
          - models:
              - Meta-Llama-3.1-8B-Instruct
            queueDepth: 100
            qos: "developer"
            rates:
              - allowedRequests: 60
                periodSeconds: 60

      enterprise:
        inherits: developer
        overrides:
          - models:
              - Meta-Llama-3.3-70B-Instruct
            queueDepth: 100
            qos: "enterprise"
            rates:
              - allowedRequests: 200
                periodSeconds: 60

Quality of Service (QoS)

Quality of Service (QoS) defines priority levels that determine how requests are processed across deployments when competing for resources. It ensures that higher-priority traffic receives precedence over lower-priority traffic, optimizing resource allocation during periods of contention.

How QoS works

Each service tier is assigned a qos label.
Deployments define the priority order using qosList in their specifications.
Requests are processed in priority order: the first QoS level in the list is served first.

Example configuration:

serviceTiers:
  <serviceTier Name>:
    - models:
        - <Model Name>
      qos: <Qos Name>
      queueDepth: <number>
      rates:
        - allowedRequests: <number>
          periodSeconds: <number>

bundleDeploymentSpecs:
  - name: <Model Name>
    groups:
      - name: <Group Name>
        minReplicas: <number>
        qosList:
          - "web"
          - "free"

In this example, the deployment serves web tier requests first, then free tier requests when no web traffic is queued.
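The priority rule described above (serve the first QoS level in the list, and fall back to the next only when nothing is queued) amounts to strict-priority selection. The following is a minimal sketch of that rule, not SambaStack’s scheduler. The function name `next_request`, the per-QoS queues, and the first-in-first-out order within a level are assumptions.

```python
from collections import deque


def next_request(qos_list, queues):
    """Pop the oldest request from the highest-priority non-empty QoS queue."""
    for qos in qos_list:  # earlier entries have higher priority
        queue = queues.get(qos)
        if queue:
            return qos, queue.popleft()
    return None  # nothing is waiting


queues = {"web": deque(["w1"]), "free": deque(["f1", "f2"])}
order = []
while (item := next_request(["web", "free"], queues)) is not None:
    order.append(item)
print(order)  # [('web', 'w1'), ('free', 'f1'), ('free', 'f2')]
```

Swapping the list to `["free", "web"]` would drain the free queue first, since only the position in `qosList` decides priority.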

Purpose of QoS

QoS prioritizes requests so that higher-tier traffic is served before lower-tier traffic, ensuring predictable and fair resource sharing.

Example: Priority follows the order of qosList, not the tier names. A deployment listing qosList: ["free", "web"] serves free tier requests first and falls back to web tier requests only when no free traffic is queued.

QoS vs. service tiers

Concept	Purpose	Defined In
Service Tier	Defines who can access what and how much (models, rate limits)	`sambastack.yaml` or Admin UI
QoS	Defines when requests are processed (priority order)	`qosList` in `bundleDeploymentSpecs`

This means that:

Service tiers define who can access what and how much.
QoS defines when requests are processed based on priority.

Important notes

The free tier is automatically assigned to all new users.
Deployments can support multiple QoS levels to handle different traffic types concurrently.

Request handling workflow

The following outlines the step-by-step processing of a user request, illustrating how service tiers and QoS priorities interact to manage and route traffic efficiently.

1. A user sends an API request using their credentials.
2. SambaStack identifies the user’s assigned service tier (usage plan).
3. The request is checked against that tier’s allowed models, batch size, rate limits, and associated QoS:
   If the request exceeds the user’s rate limit, it is rejected with a 429 Too Many Requests response.
   If the QoS queue for the request’s priority level is full, the system returns a busy response.
   Otherwise, the request is placed in the QoS queue to await processing.
4. The deployment selects queued requests to process according to its qosList priority.
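The admission checks in this workflow can be sketched as one function. This is an illustrative model, not SambaStack’s implementation. The function name `handle_request`, its parameters, and the returned strings are hypothetical. In particular, the source does not say what happens when a model is not in the user’s tier or which status code a busy response carries, so those outcomes are plain placeholders.

```python
def handle_request(model, tier_groups, requests_in_window, queue_length):
    """Return the outcome of the admission checks for one request (illustrative)."""
    # Find the model-group block of the user's tier that lists the model.
    group = next((g for g in tier_groups if model in g["models"]), None)
    if group is None:
        return "model not in tier"  # placeholder; the source does not define this case
    # Rate limit: reject once a `rates` entry is already used up.
    # Simplification: one request count is compared against every entry.
    for rate in group.get("rates", []):
        if requests_in_window >= rate["allowedRequests"]:
            return "429 Too Many Requests"
    # Queue depth: a full QoS queue yields a busy response.
    if queue_length >= group["queueDepth"]:
        return "busy"
    # Otherwise the request waits in the queue for its QoS level.
    return f"queued:{group['qos']}"


free_tier = [
    {
        "models": ["Meta-Llama-3.1-8B-Instruct"],
        "queueDepth": 100,
        "qos": "free",
        "rates": [{"allowedRequests": 20, "periodSeconds": 60}],
    }
]
model = "Meta-Llama-3.1-8B-Instruct"
print(handle_request(model, free_tier, requests_in_window=5, queue_length=3))    # queued:free
print(handle_request(model, free_tier, requests_in_window=20, queue_length=3))   # 429 Too Many Requests
print(handle_request(model, free_tier, requests_in_window=5, queue_length=100))  # busy
```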

Administrators can adjust service tiers or quotas via the admin UI. Any changes to tiers, rate limits, or QoS settings apply cluster-wide and should be made by editing and deploying the YAML configuration, preferably under version control.
