Add an administrator
Users can be granted elevated access by adding their email address to the SambaStack configuration. To add a user as an administrator:
- Add their email address under the `db-admin` section in the `sambastack.yaml` file. Only add email addresses of authorized admins to maintain security. The `db-admin` section must be at the same YAML level as `bundles` (a root-level key in `sambastack.yaml`).
- After updating the `sambastack.yaml` file, apply the updated configuration.

Service tier configuration
Service tiers (known as Usage Plans in the UI) define which models users can access, their usage limits, and their permissions.
Sample service tier configuration
To configure service tiers, add a `serviceTiers` section at the same YAML level as `bundles` (a root-level key in `sambastack.yaml`), using proper indentation.
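To illustrate the layout, the sketch below models a parsed `sambastack.yaml` as a Python dictionary (the kind of structure a YAML loader returns) and checks that `serviceTiers` sits at the root level next to `bundles`. The tier name, the shape of each model-group entry, and the `check_tier_placement` helper are hypothetical. They reuse the field names from the key-fields table but are not the exact SambaStack schema.

```python
from typing import Any

# Hypothetical parsed sambastack.yaml: serviceTiers is a root-level key,
# a sibling of bundles, not nested inside it.
CONFIG: dict[str, Any] = {
    "bundles": {},  # bundle definitions omitted
    "serviceTiers": {
        # Assumed shape: tier name -> list of model-group objects.
        "enterprise-group-1": [
            {
                "qos": "enterprise-group-1",
                "models": ["Llama-3.3-Swallow-70B-Instruct-v0.4"],
                "queueDepth": 100,
                "rates": {"allowedRequests": 10, "periodSeconds": 60},
            }
        ],
    },
}


def check_tier_placement(config: dict[str, Any]) -> list[str]:
    """Return a list of problems with where serviceTiers is defined."""
    problems = []
    if "serviceTiers" not in config:
        problems.append("serviceTiers is missing at the root level")
    bundles = config.get("bundles")
    if isinstance(bundles, dict) and "serviceTiers" in bundles:
        problems.append("serviceTiers is nested under bundles")
    return problems


print(check_tier_placement(CONFIG))  # []
```

An empty list means the placement looks right. A config that nests `serviceTiers` inside `bundles` reports both problems.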
Key fields in service tiers
The following table outlines the key fields used to define service tiers, with a description and example values for each.
| Field | Description | Example value |
|---|---|---|
| `qos` | Quality of Service level assigned to requests from this tier. Usually matches the service tier name. | `enterprise-group-1`, `customer-demo` |
| `models` | List of models accessible to users within the tier. A model must be included in at least one tier for users to access it. | `[Llama-3.3-Swallow-70B-Instruct-v0.4]` |
| `queueDepth` | Maximum number of queries to queue before returning a busy response. | `100` |
| `rates` | Rate limit, given as the number of allowed requests (`allowedRequests`) per period in seconds (`periodSeconds`). | `{ allowedRequests: 10, periodSeconds: 60 }` |
| `inherits` | Lets a tier inherit settings from a base tier and override specific fields. | `inherits`: name of a previously defined tier; `overrides`: the properties to override |
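A model must appear in at least one tier's `models` list before any user can reach it. A quick cross-check between deployed models and tier definitions can therefore catch gaps. The sketch below assumes a hypothetical dictionary shape in which each tier name maps to a list of model-group objects. The function name and the model name `example-model` are illustrative only.

```python
from typing import Any


def models_without_tier(deployed: list[str],
                        tiers: dict[str, list[dict[str, Any]]]) -> list[str]:
    """Return deployed models that no service tier grants access to."""
    # Collect every model listed by any model-group in any tier.
    covered = {
        model
        for groups in tiers.values()
        for group in groups
        for model in group.get("models", [])
    }
    return [m for m in deployed if m not in covered]


tiers = {
    "customer-demo": [
        {"qos": "customer-demo", "models": ["Llama-3.3-Swallow-70B-Instruct-v0.4"]}
    ]
}
print(models_without_tier(["Llama-3.3-Swallow-70B-Instruct-v0.4", "example-model"], tiers))
# ['example-model']
```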
Service tier features
Service tiers let you tailor user access, usage limits, and permissions so that AI model resources are managed flexibly and securely.
- Control access: Decide which models each user or group can use.
- Set usage limits: Define how many requests or tokens a user can consume in a given period.
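The `rates` field pairs `allowedRequests` with `periodSeconds`. This page does not describe SambaStack's actual limiting algorithm. The sketch below assumes a simple fixed-window counter per user to show how a limit such as 10 requests per 60 seconds could be enforced. It counts requests only, not tokens, and the class name is hypothetical.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Allow at most allowed_requests per user in each period_seconds window."""

    def __init__(self, allowed_requests: int, period_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.allowed = allowed_requests
        self.period = period_seconds
        self.clock = clock
        # user -> (window start time, requests counted in that window)
        self.windows: dict[str, tuple[float, int]] = {}

    def allow(self, user: str) -> bool:
        now = self.clock()
        start, count = self.windows.get(user, (now, 0))
        if now - start >= self.period:
            # The previous window has expired; start a fresh one.
            start, count = now, 0
        if count >= self.allowed:
            return False  # over the limit; the caller would return HTTP 429
        self.windows[user] = (start, count + 1)
        return True


# Deterministic demo with a fake clock instead of real time.
fake_now = [0.0]
limiter = FixedWindowLimiter(allowed_requests=10, period_seconds=60,
                             clock=lambda: fake_now[0])
results = [limiter.allow("alice") for _ in range(11)]
print(results.count(True), results[-1])  # 10 False
fake_now[0] = 60.0
print(limiter.allow("alice"))  # True
```

A fixed window is the simplest choice to read. It allows short bursts at window boundaries, which a sliding-window or token-bucket limiter would smooth out.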
Service tier functionality
Service tiers are structured lists of model-group objects that define access controls and operational limits. Each model-group block sets parameters such as queue depth and per-user rate limits to control resource usage and request handling. The `inherits` attribute lets a tier extend another base tier's configuration, promoting reuse. When a tier inherits, only the fields specified in its `overrides` section are modified, which keeps customization precise and maintainable.
Service tier management
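The inheritance rule described under Service tier functionality can be sketched as a recursive merge. The base tier is resolved first, and then only the fields listed under `overrides` are applied. This is a simplified model that treats each tier as a single flat dictionary rather than a list of model-group objects. The tier values and the `resolve_tier` helper are illustrative assumptions, not SambaStack's implementation.

```python
from typing import Any

Tier = dict[str, Any]


def resolve_tier(name: str, tiers: dict[str, Tier],
                 _seen: frozenset = frozenset()) -> Tier:
    """Return the effective settings of a tier, following its inherits chain."""
    if name in _seen:
        raise ValueError(f"inheritance cycle at tier {name!r}")
    tier = tiers[name]
    if "inherits" not in tier:
        # Base tier: its fields are used as-is.
        return {k: v for k, v in tier.items() if k != "overrides"}
    base = resolve_tier(tier["inherits"], tiers, _seen | {name})
    # Only the fields named in overrides change; everything else is inherited.
    return {**base, **tier.get("overrides", {})}


tiers = {
    "free": {"qos": "free", "queueDepth": 100,
             "rates": {"allowedRequests": 10, "periodSeconds": 60}},
    "customer-demo": {"inherits": "free",
                      "overrides": {"qos": "customer-demo", "queueDepth": 200}},
}
print(resolve_tier("customer-demo", tiers))
# {'qos': 'customer-demo', 'queueDepth': 200, 'rates': {'allowedRequests': 10, 'periodSeconds': 60}}
```

The overrides in this sketch are shallow, so overriding `rates` would replace the whole rates object. This page does not state whether SambaStack merges nested fields.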
Creating and editing service tiers
You can define base tiers and then create derived tiers that inherit from them, promoting reuse and consistency.
Service tier recommendations
When creating and managing service tiers, follow these best practices to maintain stability, security, and flexibility:
1. Preserve system-managed and default tiers
Some tiers are preconfigured and system-managed and should not be removed or disabled. These tiers provide baseline access and enforce model lifecycle and access controls, so removing or misconfiguring them can interrupt critical workflows. They include:
- free / web – Default baseline access tiers.
- deprecated – Models permanently removed (HTTP 410).
- maintenance – Models temporarily unavailable (HTTP 503).
- restricted – Models with limited access (HTTP 403).
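The three lifecycle tiers map a model's state to an HTTP status. The sketch below shows one way a gateway could apply that mapping. The check order (deprecated, then maintenance, then restricted), the point at which the check runs, the `lifecycle_status` helper, and the model names are all assumptions made for illustration.

```python
from typing import Optional

# Status codes taken from the system-managed tier list.
LIFECYCLE_STATUS = {
    "deprecated": 410,   # model permanently removed
    "maintenance": 503,  # model temporarily unavailable
    "restricted": 403,   # model with limited access
}


def lifecycle_status(model: str, tier_models: dict[str, list[str]]) -> Optional[int]:
    """Return the HTTP status for a model listed in a lifecycle tier, else None."""
    for tier, status in LIFECYCLE_STATUS.items():
        if model in tier_models.get(tier, []):
            return status
    return None


tier_models = {"deprecated": ["old-model"], "maintenance": ["busy-model"], "restricted": []}
print(lifecycle_status("old-model", tier_models))   # 410
print(lifecycle_status("busy-model", tier_models))  # 503
print(lifecycle_status("new-model", tier_models))   # None
```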
2. Adjust default tiers carefully
You may adjust rate limits or model lists in default tiers where supported, but make sure changes align with user needs to avoid unintended access restrictions. Overwriting a tier replaces its full configuration, and removing a tier from `sambastack.yaml` reverts it to the SambaNova defaults.
3. Use custom tiers for flexibility
Create custom tiers by inheriting from base tiers (such as `free` or `web`) to tailor access, rate limits, and models while preserving the underlying structure. System-managed tiers are maintained by SambaStack, so factor them into service tier planning.
System-managed, required, and custom tiers are all defined in `sambastack.yaml` at the same YAML level as `bundles` (a root-level key).
Quality of Service (QoS)
Quality of Service (QoS) defines priority levels that determine how requests are processed across deployments when they compete for resources. It ensures that higher-priority traffic takes precedence over lower-priority traffic, which optimizes resource allocation during periods of contention.
QoS configuration
- QoS levels are specified within service tier configurations by assigning a qos label to each tier.
- Deployments define the priority order of QoS levels they support using the qosList in their deployment specifications.
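The link between these two settings can be sketched as a simple match. A deployment serves a tier's traffic only if the tier's `qos` label appears in the deployment's `qosList`, and the position in that list sets the priority. Only the field names `qos` and `qosList` come from this page. The data shapes and the `servable_tiers` helper are hypothetical.

```python
def servable_tiers(tier_qos: dict[str, str], qos_list: list[str]) -> list[str]:
    """Return tiers whose qos label the deployment supports, in priority order."""
    priority = {qos: rank for rank, qos in enumerate(qos_list)}
    supported = [t for t, qos in tier_qos.items() if qos in priority]
    # Earlier entries in qosList have higher priority.
    return sorted(supported, key=lambda t: priority[tier_qos[t]])


tier_qos = {"web": "web", "free": "free", "enterprise-group-1": "enterprise-group-1"}
print(servable_tiers(tier_qos, ["free", "web"]))  # ['free', 'web']
```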
Purpose of QoS
QoS prioritizes requests so that higher-priority traffic is served before lower-priority traffic, ensuring predictable and fair resource sharing. Priority follows the order of QoS levels in the deployment's `qosList`.
- Example: A deployment with `qosList: ["free", "web"]` serves `free` tier requests first and falls back to `web` tier requests only when no `free` traffic is queued.
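The fallback behavior in this example amounts to strict-priority dispatch: the deployment always takes the next request from the first non-empty queue in `qosList` order. The minimal sketch below assumes one FIFO queue per QoS level. The `next_request` helper and the request IDs are illustrative.

```python
from collections import deque
from typing import Optional


def next_request(queues: dict[str, deque], qos_list: list[str]) -> Optional[str]:
    """Pop the oldest request from the highest-priority non-empty queue."""
    for qos in qos_list:
        queue = queues.get(qos)
        if queue:
            return queue.popleft()
    return None  # nothing queued for any supported QoS level


queues = {"free": deque(["f1"]), "web": deque(["w1", "w2"])}
order = [next_request(queues, ["free", "web"]) for _ in range(4)]
print(order)  # ['f1', 'w1', 'w2', None]
```

With strict priority, steady high-priority traffic can starve lower levels indefinitely. That is the trade-off implied by falling back only when no higher-priority traffic is queued.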
Important considerations
- The `free` tier is a default, hardcoded tier that is automatically assigned to all new users.
- Deployments can be configured to support multiple QoS levels to handle different traffic types concurrently.
Difference between QoS and service tiers
| Concept | Controls | Defined In |
|---|---|---|
| Service Tier | Assigns users to models, rate limits, queue depth, and QoS labels. Determines access and usage restrictions. | sambastack.yaml (serviceTiers section) or Admin UI (Usage Plans) |
| QoS | Prioritizes real-time request dispatch among traffic from various tiers based on deployment capabilities. | Deployment specifications (qosList in bundleDeploymentSpecs) |
- Service tiers define who can access what and how much.
- QoS defines when requests are processed based on priority.
User request handling workflow
The following steps show how a user request is processed and how service tiers and QoS priorities interact to manage and route traffic.
- A user sends an API request using their credentials.
- SambaStack identifies the user's assigned service tier (usage plan).
- The request is checked against that tier's allowed models, batch size, rate limits, and associated QoS.
- If the request exceeds the user's rate limit, it is rejected with a 429 Too Many Requests response.
- If the QoS queue for the request's priority level is full, the system returns a busy response.
- Otherwise, the request is placed in the QoS queue to await processing.
- The deployment selects queued requests to process according to its `qosList` priority order.
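The admission steps in this workflow can be sketched end to end as a simplified, single-process model. The sketch rests on several assumptions:
- Users without an explicit assignment fall back to the `free` tier.
- Rate limiting uses a fixed window.
- The queue-full case returns a generic "busy" result, because this page does not name its status code.
- Batch-size checks are omitted.
- Every class, field, and return value is hypothetical rather than a SambaStack API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TierPolicy:
    qos: str
    models: set[str]
    queue_depth: int
    allowed_requests: int
    period_seconds: float


@dataclass
class Gateway:
    tiers: dict[str, TierPolicy]
    user_tiers: dict[str, str]  # user -> assigned tier (usage plan)
    queues: dict[str, deque] = field(default_factory=dict)
    # user -> (window start, request count) for the fixed-window limiter
    windows: dict[str, tuple[float, int]] = field(default_factory=dict)

    def handle(self, user: str, model: str, now: float) -> str:
        # Identify the user's tier; users without an assignment default to "free".
        policy = self.tiers[self.user_tiers.get(user, "free")]
        # Check the model against the tier's allowed models.
        if model not in policy.models:
            return "rejected: model not in tier"
        # Check the per-user rate limit (fixed window).
        start, count = self.windows.get(user, (now, 0))
        if now - start >= policy.period_seconds:
            start, count = now, 0
        if count >= policy.allowed_requests:
            return "429 Too Many Requests"
        # Assumption: the request counts toward the limit even if the queue is full.
        self.windows[user] = (start, count + 1)
        # Check capacity of the queue for this tier's QoS level.
        queue = self.queues.setdefault(policy.qos, deque())
        if len(queue) >= policy.queue_depth:
            return "busy"
        queue.append((user, model))
        return "queued"


gw = Gateway(
    tiers={"free": TierPolicy("free", {"model-a"}, queue_depth=1,
                              allowed_requests=2, period_seconds=60)},
    user_tiers={},
)
print(gw.handle("alice", "model-b", now=0))  # rejected: model not in tier
print(gw.handle("alice", "model-a", now=0))  # queued
print(gw.handle("alice", "model-a", now=1))  # busy (queue depth 1 reached)
print(gw.handle("alice", "model-a", now=2))  # 429 Too Many Requests
```

The final step, where the deployment drains the queues in `qosList` order, is left out of this sketch. It would pop requests from `gw.queues` one priority level at a time.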
