gateway:
  replicas: 3
  auth:
    enabled: true
    secretName: <oidc-auth secret name>
  ingress:
    hosts:
      - host: <api url>
        tlsSecretName: <k8s api secret name>
    annotations:
      nginx.ingress.kubernetes.io/backend-protocol: HTTP
      nginx.ingress.kubernetes.io/force-ssl-redirect: 'true'
      nginx.ingress.kubernetes.io/proxy-read-timeout: '600'
      nginx.ingress.kubernetes.io/proxy-body-size: 25m
      nginx.ingress.kubernetes.io/enable-cors: 'true'
      nginx.ingress.kubernetes.io/configuration-snippet: |
        proxy_set_header Authorization $http_authorization;
        limit_req zone=240_req_min_header burst=360 nodelay;
        limit_req_status 429;

cloud-ui:
  ingress:
    hosts:
      - host: <ui url>
        tlsSecretName: <k8s ui secret name>
    annotations:
      nginx.ingress.kubernetes.io/backend-protocol: HTTP
      nginx.ingress.kubernetes.io/force-ssl-redirect: 'true'
      nginx.ingress.kubernetes.io/proxy-body-size: 21m
      nginx.ingress.kubernetes.io/configuration-snippet: |
        proxy_set_header Authorization $http_authorization;
        limit_req zone=120_req_min_ip burst=240 nodelay;
        limit_req_status 429;

db-admin:
  admins:
  - temp-admin@cluster.local
  - example@example.com                # Email of the permanent admin account

auth-and-billing:
  pgSecretName: pg-credentials         # Only needed with an external PostgreSQL database

cloudnative-pg:
  enabled: false                       # Set to false only when using an external PostgreSQL database

bundles:
  bundleSpecs:
    - name: gpt-oss-120b-8-32-64-128k
  bundleDeploymentSpecs:
    - name: gpt-oss-120b-8-32-64-128k
      groups:
        - name: default
          minReplicas: 1
          qosList:
            - web
            - free

serviceTiers:
  <Tier1>:                              # Custom service tier name
    - models:
        - gpt-oss-120b-8-32-64-128k
      queueDepth: 25                    # Queries to queue before returning busy
      qos: "free"                       # Usually matches service tier name
      rates:
        - allowedRequests: 50
          periodSeconds: 60
  <Tier2>:                              # Tier that inherits from another
    inherits: <Tier1>
    overrides:
      - models:
          - gpt-oss-120b-8-32-64-128k
        queueDepth: 25
        qos: "free"
        rates:
          - allowedRequests: 100
            periodSeconds: 60

This reference uses gpt-oss-120b-8-32-64-128k as the example model bundle. In practice, you can substitute any model bundle or set of bundles.
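
The limit_req directives in the ingress annotations above refer to two shared-memory zones, 240_req_min_header and 120_req_min_ip, which nginx expects to be declared at the http level. If your ingress-nginx controller does not already define them, a minimal sketch of the controller ConfigMap could look like the following. The ConfigMap name and namespace, the 10m zone size, and the choice of the Authorization header as the key for the "header" zone are all assumptions for illustration, and the rates are inferred from the zone names.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # assumption: default name from the ingress-nginx Helm chart
  namespace: ingress-nginx         # assumption: adjust to your controller's namespace
data:
  # http-snippet is injected into the nginx http block, where limit_req_zone must live
  http-snippet: |
    # assumption: key the "header" zone on the Authorization header, 240 requests per minute
    limit_req_zone $http_authorization zone=240_req_min_header:10m rate=240r/m;
    # assumption: key the "ip" zone on the client address, 120 requests per minute
    limit_req_zone $binary_remote_addr zone=120_req_min_ip:10m rate=120r/m;
```

Recent ingress-nginx releases also require snippet annotations to be explicitly allowed in the controller configuration before configuration-snippet takes effect.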

Configuration Parameters

gateway

| Parameter | Type | Description |
| --- | --- | --- |
| gateway.replicas | integer | API gateway replica count for high availability |
| gateway.auth.enabled | boolean | Enable built-in OIDC integration |
| gateway.auth.secretName | string | Name of Kubernetes Secret containing OIDC credentials. Leave empty for default auth mode |
| gateway.ingress.hosts[].host | string | Your API FQDN (e.g., api.example.com) |
| gateway.ingress.hosts[].tlsSecretName | string | Kubernetes TLS secret name for the API host |
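
The tlsSecretName values point at standard Kubernetes TLS Secrets. As a sketch, a Secret for the API host might look like this; the name api-tls is hypothetical, and the certificate and key values are left as placeholders. The Secret must live in the same namespace as the Ingress that references it.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: api-tls                  # hypothetical; set gateway.ingress.hosts[].tlsSecretName to this name
type: kubernetes.io/tls          # built-in Secret type that requires tls.crt and tls.key
data:
  tls.crt: <base64-encoded PEM certificate chain>
  tls.key: <base64-encoded PEM private key>
```

The same shape applies to the UI host's TLS Secret.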

cloud-ui (Web UI)

| Parameter | Type | Description |
| --- | --- | --- |
| cloud-ui.replicas | integer | UI replica count for high availability |
| cloud-ui.ingress.hosts[].host | string | Your UI FQDN (e.g., ui.example.com) |
| cloud-ui.ingress.hosts[].tlsSecretName | string | Kubernetes TLS secret name for the UI host |

db-admin

| Parameter | Type | Description |
| --- | --- | --- |
| db-admin.admins | list | Email addresses of users who can access the Admin UI |

auth-and-billing

| Parameter | Type | Description |
| --- | --- | --- |
| auth-and-billing.replicas | integer | Replica count for the core control-plane service |
| auth-and-billing.pgSecretName | string | Name of Kubernetes Secret containing external PostgreSQL connection details (DB_HOST, DB_DATABASE, DB_USER, DB_PASSWD) as base64-encoded data fields. Required when using external PostgreSQL |
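
A Secret carrying the four fields listed for pgSecretName could be sketched as follows. The name pg-credentials matches the example values above; the host, database, user, and password are made-up illustration values, shown decoded in the comments.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: pg-credentials           # referenced by auth-and-billing.pgSecretName
type: Opaque
data:                            # values under data must be base64-encoded
  DB_HOST: cGcuZXhhbXBsZS5jb20=  # pg.example.com (hypothetical)
  DB_DATABASE: YXBwZGI=          # appdb (hypothetical)
  DB_USER: YWRtaW4=              # admin (hypothetical)
  DB_PASSWD: Y2hhbmdlbWU=        # changeme (hypothetical; use a real password)
```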

cloudnative-pg

| Parameter | Type | Description |
| --- | --- | --- |
| cloudnative-pg.enabled | boolean | true = deploy in-cluster PostgreSQL; false = use external PostgreSQL via auth-and-billing.pgSecretName |
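
The two database modes described above can be sketched side by side. These are alternatives, not one file: pick one. Omitting pgSecretName in the in-cluster case follows from it being required only for external PostgreSQL.

```yaml
# Option A: in-cluster PostgreSQL
cloudnative-pg:
  enabled: true
# auth-and-billing.pgSecretName is left unset in this mode
---
# Option B: external PostgreSQL
cloudnative-pg:
  enabled: false
auth-and-billing:
  pgSecretName: pg-credentials   # Secret holding DB_HOST, DB_DATABASE, DB_USER, DB_PASSWD
```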

bundles

| Parameter | Type | Description |
| --- | --- | --- |
| bundles.bundleSpecs[] | list | Declares bundles (model assets) by name |
| bundles.bundleDeploymentSpecs[] | list | Deploys the declared bundles |
| bundleDeploymentSpecs[].name | string | Must match a declared bundleSpecs.name |
| bundleDeploymentSpecs[].groups[].name | string | Routing/capacity group name |
| bundleDeploymentSpecs[].groups[].minReplicas | integer | Minimum engines for the group |
| bundleDeploymentSpecs[].groups[].qosList[] | list | QoS tags (e.g., web, free, pro) |
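
To illustrate how these fields fit together, here is a sketch of one bundle deployed with two groups. The second group name (paid) and its replica count are hypothetical, and splitting QoS tags across groups this way is an assumption about intended usage rather than documented behavior.

```yaml
bundles:
  bundleSpecs:
    - name: gpt-oss-120b-8-32-64-128k
  bundleDeploymentSpecs:
    - name: gpt-oss-120b-8-32-64-128k   # must match a name declared in bundleSpecs
      groups:
        - name: default
          minReplicas: 1                # at least one engine for web and free traffic
          qosList:
            - web
            - free
        - name: paid                    # hypothetical group name
          minReplicas: 2                # hypothetical capacity for pro traffic
          qosList:
            - pro
```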