SambaStack User Management

This section guides SambaStack administrators in managing user permissions, configuring admin access, and setting up service tiers and Quality of Service (QoS) levels. It explains how administrators control which users can access specific models, how much they can use, and the priority assigned to their requests.

Add an administrator

Users can be granted elevated access by adding their email to the SambaStack configuration. To add a user as an administrator:

Add their email address under the db-admin section in the sambastack.yaml file. For example:
```
data:
 sambastack.yaml: |
   db-admin:
     admins:
       - abc@example.com
```
Only add email addresses of authorized admins to maintain security.
After updating the file, apply the configuration:
```
kubectl apply -f sambastack.yaml
```

Verify admin access

To confirm your setup is working correctly, start by verifying that you can access the SambaStack Admin UI.

Find the UI domain by inspecting installer logs:

```
kubectl -n sambastack-installer logs -l sambanova.ai/app=sambastack-installer -f
```

Open a browser (Chrome is recommended) and navigate to:
```
<UI Domain>/admin
```

The admin page becomes accessible once the administrator logs in.

Service tier configuration

Service tiers (known as Usage Plans in the UI) define which models users can access, their usage limits, and their permissions.

Sample service tier configuration

To define service tiers, add a serviceTiers section under data in sambastack.yaml, keeping the indentation shown in this example:

```
data:
  sambastack.yaml: |
    serviceTiers:
      example:
      - models:
        - Llama-4-Maverick-17B-128E-Instruct
        queueDepth: 25
        qos: "free"
        rates:
        - allowedRequests: 100
          periodSeconds: 30
```

Apply changes by running:

```
kubectl apply -f sambastack.yaml
```

The tier named example will appear as a Usage Plan on the Admin page and can be assigned to users.

Key fields in service tiers

The following table outlines the key fields used to define service tiers, along with descriptions and example values for each.

| Field | Description | Example value |
| --- | --- | --- |
| `qos` | Quality of Service level assigned to requests from this tier. Usually matches the service tier name. | `enterprise-group-1`, `customer-demo` |
| `models` | List of models accessible to users within the tier. A model must be included in at least one tier for users to access it. | `[Llama-3.3-Swallow-70B-Instruct-v0.4]` |
| `queueDepth` | Maximum number of queries to queue before returning a busy response. | `100` |
| `rates` | Per-user rate limits (allowed requests and period in seconds). | `{ allowedRequests: 10, periodSeconds: 60 }` |
| `inherits` | Name of a previously defined tier whose settings this tier inherits. The accompanying `overrides` block lists the properties to override. | `inherits: free` |
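
To make the `rates` fields concrete, the following is a minimal Python sketch of a per-user limiter driven by `allowedRequests` and `periodSeconds`. It assumes a fixed-window counter; this page does not say which algorithm SambaStack actually uses, and `FixedWindowLimiter` is a hypothetical name used only for illustration.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Illustrative per-user limiter; not SambaStack's implementation."""

    def __init__(self, allowed_requests: int, period_seconds: int):
        self.allowed_requests = allowed_requests
        self.period_seconds = period_seconds
        # (user, window index) -> number of requests accepted in that window
        self.counts = defaultdict(int)

    def allow(self, user: str, now: float) -> bool:
        # Assumption: time is split into back-to-back windows of period_seconds.
        window = int(now // self.period_seconds)
        key = (user, window)
        if self.counts[key] >= self.allowed_requests:
            return False  # over the limit for this window
        self.counts[key] += 1
        return True


# Mirrors rates: { allowedRequests: 10, periodSeconds: 60 }
limiter = FixedWindowLimiter(allowed_requests=10, period_seconds=60)
results = [limiter.allow("abc@example.com", now=5.0) for _ in range(11)]
print(results.count(True), results[-1])  # 10 False
```

With these values, the eleventh request inside the same 60-second window is refused, while a different user or a later window starts from a fresh count.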

Service tier features

Service tiers offer powerful controls to tailor user access, usage limits, and permissions, ensuring flexible and secure management of AI model resources.

- Control access: Decide which models each user or group can use.
- Set usage limits: Define how many requests or tokens a user can consume in a set period.

Service tier functionality

Service tiers are structured lists of model-group objects that define access controls and operational limits. Each model-group block sets parameters such as queue depth and per-user rate limits to control resource usage and request handling. The inherits attribute allows a tier to extend another base tier’s configuration, promoting reuse. When inheriting, only specified fields in the overrides section are modified, enabling precise and maintainable customization.

Service tier management

Creating and editing service tiers

You can define base tiers and create derived tiers by using inheritance to promote reuse and consistency.

Example - Base tier

```
serviceTiers:
  free:
  - models:
    - Llama-3.1-Swallow-8B-Instruct-v0.3
    - Meta-Llama-3.1-8B-Instruct
    queueDepth: 25
    qos: "free"
    rates:
    - allowedRequests: 20
      periodSeconds: 60
```

Example - Derived tier with inheritance

```
premium:
  inherits: free
  overrides:
  - models:
    - Meta-Llama-3.1-70B-Instruct
    batchSize: 2
    queueDepth: 50
    qos: "premium"
    rates:
    - allowedRequests: 100
      periodSeconds: 60
```

After updating the configuration, apply the changes by running:

```
kubectl apply -f sambastack.yaml
```

Service tier recommendations

When creating and managing service tiers, consider the following best practices to ensure stability, security, and flexibility:

- Maintain default tiers: Keep the pre-configured free and web service tiers unchanged whenever possible. These tiers provide essential baseline access and functionality for users. Removing or disabling these tiers can interrupt critical access methods.
- Modify with caution: You may adjust rate limits and model lists in the default tiers, but be careful to align changes with user needs to avoid unintentionally restricting access. Note that overwriting a tier replaces the entire tier configuration. If a tier is removed from the sambastack.yaml file, it will revert to the SambaNova default settings.
- Use custom tiers for flexibility: Create custom tiers by inheriting from base tiers like free or web. This inheritance model allows you to tailor access, rate limits, and models while preserving the foundational structure and avoiding duplication.
- Understand system-managed tiers: Certain tiers are reserved for system use to manage model lifecycle and access restrictions. These system-managed tiers include:
  - deprecated: For models permanently removed from use (returns HTTP 410). Use this tier to prevent requests from reaching specific models.
  - maintenance: For models temporarily unavailable (returns HTTP 503).
  - restricted: For models with restricted access (returns HTTP 403).

These tiers are managed by the system and should be factored into service tier planning.

Example of system-managed, required, and custom tiers in sambastack.yaml

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: sambastack
  labels:
    sambastack-installer: "true"
data:
  sambastack.yaml: |
    version: 0.3.297
    serviceTiers:
      ###################################
      #     SYSTEM-MANAGED TIERS        #
      ###################################
      deprecated:
      - models:
        - Old-Model-v1
        queueDepth: 0
        rates:
        - allowedRequests: 0
          periodSeconds: 3600

      restricted:
      - models:
        - Restricted-Model
        queueDepth: 1
        rates:
        - allowedRequests: 0
          periodSeconds: 60

      ###################################
      #        REQUIRED TIERS           #
      ###################################
      free:
      - models:
        - Meta-Llama-3.1-8B-Instruct
        queueDepth: 100
        qos: "free"
        rates:
        - allowedRequests: 20
          periodSeconds: 60

      web:
      - models:
        - Meta-Llama-3.1-8B-Instruct
        queueDepth: 100
        qos: "web"
        rates:
        - allowedRequests: 20
          periodSeconds: 60

      ###################################
      #        CUSTOM TIERS             #
      ###################################
      developer:
        inherits: free
        overrides:
        - models:
          - Meta-Llama-3.1-8B-Instruct
          queueDepth: 100
          qos: "developer"
          rates:
          - allowedRequests: 60
            periodSeconds: 60

      enterprise:
        inherits: developer
        overrides:
        - models:
          - Meta-Llama-3.3-70B-Instruct
          queueDepth: 100
          qos: "enterprise"
          rates:
          - allowedRequests: 200
            periodSeconds: 60
```
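
Because custom tiers can inherit from other custom tiers (enterprise inherits developer, which inherits free), it can help to check the chain before applying a configuration. The following Python sketch follows the `inherits` links of an already-parsed `serviceTiers` mapping. The function name `inheritance_chain` is hypothetical, the dictionary is a trimmed-down stand-in for the YAML above, and whether SambaStack itself rejects missing bases or cycles is not stated on this page.

```python
def inheritance_chain(tiers: dict, name: str) -> list:
    """Follow `inherits` links from `name` back to its base tier."""
    chain = []
    current = name
    while current is not None:
        if current in chain:
            raise ValueError(f"inheritance cycle at tier {current!r}")
        if current not in tiers:
            raise KeyError(f"tier {current!r} is not defined")
        chain.append(current)
        spec = tiers[current]
        # Base tiers are lists of model-groups; derived tiers are mappings
        # with `inherits` and `overrides` keys.
        current = spec.get("inherits") if isinstance(spec, dict) else None
    return chain


# Trimmed-down stand-in for the parsed serviceTiers section.
tiers = {
    "free": [{"models": ["Meta-Llama-3.1-8B-Instruct"], "qos": "free"}],
    "developer": {"inherits": "free", "overrides": [{"qos": "developer"}]},
    "enterprise": {"inherits": "developer", "overrides": [{"qos": "enterprise"}]},
}
print(inheritance_chain(tiers, "enterprise"))  # ['enterprise', 'developer', 'free']
```

The sketch only inspects the `inherits` key. It deliberately does not merge `overrides` into the base tier, since the exact merge rules are not described here.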

Quality of Service (QoS)

Quality of Service (QoS) defines priority levels that determine how requests are processed across deployments when competing for resources. It ensures that higher-priority traffic receives precedence over lower-priority traffic, optimizing resource allocation during periods of contention.

QoS configuration

- QoS levels are specified within service tier configurations by assigning a `qos` label to each tier.
- Deployments define the priority order of the QoS levels they support using the `qosList` in their deployment specifications.

Example configuration snippet from sambastack.yaml:

```
serviceTiers:
  - name: Standard
    qos: "free"

bundleDeploymentSpecs:
  - name: <Model Name>
    groups:
      - name: <Group Name>
        minReplicas: <number>
        qosList:
          - "web"
          - "free"
```

This example means the deployment prioritizes requests from the web tier first, then the free tier if no web requests are pending.

Purpose of QoS

QoS prioritizes requests so that higher-tier traffic is served before lower-tier traffic, ensuring predictable and fair resource sharing.

Example: A deployment listing `qosList: ["free", "web"]` serves free tier requests first, falling back to web tier requests only when no free traffic is queued.
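
The ordering rule in the example above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SambaStack's scheduler: it assumes strict priority in `qosList` order and first-in, first-out ordering within each level, and the names `next_request` and `queues` are hypothetical.

```python
from collections import deque


def next_request(qos_list, queues):
    """Return (qos, request) from the first non-empty queue in qos_list order."""
    for qos in qos_list:
        queue = queues.get(qos)
        if queue:  # skip levels that are missing or have nothing pending
            return qos, queue.popleft()
    return None  # nothing is waiting at any listed level


queues = {"web": deque(["w1"]), "free": deque(["f1", "f2"])}
order = []
while (picked := next_request(["free", "web"], queues)) is not None:
    order.append(picked)
print(order)  # [('free', 'f1'), ('free', 'f2'), ('web', 'w1')]
```

Reversing the list to `["web", "free"]` would serve `w1` first, which matches the earlier configuration snippet.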

Important considerations

- The Free tier is a default, hardcoded tier assigned to all new users automatically.
- Deployments can be configured to support multiple QoS levels to handle different traffic types concurrently.

Difference between QoS and service tiers

| Concept | Controls | Defined in |
| --- | --- | --- |
| Service Tier | Assigns users to models, rate limits, queue depth, and QoS labels. Determines access and usage restrictions. | `sambastack.yaml` (`serviceTiers` section) or Admin UI (Usage Plans) |
| QoS | Prioritizes real-time request dispatch among traffic from various tiers based on deployment capabilities. | Deployment specifications (`qosList` in `bundleDeploymentSpecs`) |

This means that:

- Service tiers define who can access what and how much.
- QoS defines when requests are processed based on priority.

User request handling workflow

The following outlines the step-by-step processing of a user request, illustrating how service tiers and QoS priorities interact to manage and route traffic efficiently.

1. A user sends an API request using their credentials.
2. SambaStack identifies the user’s assigned service tier (usage plan).
3. The request is checked against that tier’s allowed models, batch size, rate limits, and associated QoS.
4. If the request exceeds the user’s rate limit, it is rejected with a 429 Too Many Requests response.
5. If the QoS queue for the request’s priority level is full, the system returns a busy response.
6. Otherwise, the request is placed in the QoS queue awaiting processing.
7. The deployment selects queued requests to process according to its qosList priority.
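
The admission part of this workflow can be sketched in Python as follows. This is a simplified model, not SambaStack code: the `Tier` and `Gateway` classes are hypothetical, the rate limit is assumed to be a fixed-window counter, the "model not allowed" and "busy" strings are placeholders for responses this page does not specify, and the sketch assumes a busy response does not count against the rate limit.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Tier:
    models: set
    qos: str
    queue_depth: int
    allowed_requests: int
    period_seconds: int


@dataclass
class Gateway:
    tiers: dict                                 # user -> Tier (hypothetical lookup)
    queues: dict = field(default_factory=dict)  # qos label -> deque of requests
    seen: dict = field(default_factory=dict)    # (user, window) -> accepted count

    def handle(self, user: str, model: str, now: float) -> str:
        tier = self.tiers[user]                 # identify the user's tier
        if model not in tier.models:
            return "model not allowed"          # placeholder response
        window = int(now // tier.period_seconds)
        used = self.seen.get((user, window), 0)
        if used >= tier.allowed_requests:
            return "429 Too Many Requests"      # rate limit exceeded
        queue = self.queues.setdefault(tier.qos, deque())
        if len(queue) >= tier.queue_depth:
            return "busy"                       # QoS queue is full
        self.seen[(user, window)] = used + 1
        queue.append((user, model))             # wait for the deployment to pick it up
        return "queued"


free = Tier({"Meta-Llama-3.1-8B-Instruct"}, "free",
            queue_depth=25, allowed_requests=2, period_seconds=60)
gateway = Gateway(tiers={"abc@example.com": free})
outcomes = [gateway.handle("abc@example.com", "Meta-Llama-3.1-8B-Instruct", now=1.0)
            for _ in range(3)]
print(outcomes)  # ['queued', 'queued', '429 Too Many Requests']
```

Requests that reach the queue would then be drained by each deployment in its `qosList` order.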

Administrators can adjust service tiers or quotas via the admin UI. Any changes to tiers, rate limits, or QoS settings apply cluster-wide and should be made by editing and deploying the YAML configuration, preferably under version control.
