Deployment

Deploy Astra + K3s (Single Node and Cluster)

This guide shows two deployment patterns:

  • Single-node production profile: Astra and K3s on the same host with public ingress and dedicated local disk.
  • Cluster deployment: multi-node Astra backend with multi-node K3s control plane/workers.

Prerequisites

  • Linux host(s) with Docker and k3s, kubectl, etcdctl available.
  • Network reachability from all K3s server nodes to all Astra client endpoints.
  • Astra image published on Docker Hub. Stable semver tags (vX.Y.Z) are the default production channel; prerelease tags such as v0.1.1-rc1 should be selected explicitly when validating release candidates:
    • docker.io/halceon/astra:{tag}
    • docker.io/nudevco/astra:{tag}
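The tag-selection convention above can be sketched as a shell filter: keep only stable vX.Y.Z tags and take the highest by version sort. The tag values below are illustrative, not a real tag listing:

```shell
# Given a list of image tags, keep only stable semver tags (vX.Y.Z),
# dropping prereleases like v0.1.1-rc1, and print the highest one.
tags="v0.1.0
v0.1.1-rc1
v0.1.1
latest"
stable=$(printf '%s\n' "$tags" | grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n1)
echo "selected tag: ${stable}"
```

Pass the result as --astra-tag when you need to pin the resolved version explicitly.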

Local SSH orchestrator (single command from workstation)

refs/scripts/deploy/deploy-k3s-single-node-remote.sh \
  --host root@<host-public-ip> \
  --cluster-name <cluster-name> \
  --kubeconfig-server-ip <host-public-ip> \
  -- \
  --disk-device auto \
  --validation smoke

Run full readiness gate:

refs/scripts/deploy/deploy-k3s-single-node-remote.sh \
  --host root@<host-public-ip> \
  --cluster-name <cluster-name> \
  --kubeconfig-server-ip <host-public-ip> \
  --kubeconfig-server-ip <host-secondary-ip> \
  -- \
  --disk-device auto \
  --validation full

On-host deployment (run directly on VM)

cd /root/astra-lab/repo
refs/scripts/deploy/deploy-k3s-single-node.sh \
  --disk-device auto \
  --node-name <cluster-name> \
  --tls-san <host-public-ip> \
  --tls-san <host-secondary-ip> \
  --kubelet-arg fail-swap-on=false \
  --validation smoke

Default behavior of the deploy scripts:

  • Remote script bootstraps the host, syncs https://github.com/SparkAIUR/astra.git on main, runs the on-host deploy, and pulls kubeconfig locally.
  • Remote script can generate multiple local kubeconfigs with --kubeconfig-server-ip (for dual access via tailscale + LAN) and can rename kubeconfig cluster/context/user names with --cluster-name.
  • On-host deploy resolves the latest stable semver tag from docker.io/halceon/astra unless --astra-image or --astra-tag is provided. Use an explicit prerelease tag when validating an RC build.
  • Validation defaults to smoke (fast), with --validation full for the 720s readiness gate.
  • Disk setup is only executed when --disk-device is set (auto or explicit path).

Machine-readable outputs:

  • DEPLOY_SUMMARY_PATH=...
  • HOST_PUBLIC_IP=...
  • REMOTE_KUBECONFIG_PATH=...
  • VALIDATION_SUMMARY_PATH=...
  • LOCAL_KUBECONFIG_PATHS=... (comma-separated when multiple kubeconfigs are generated)
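These KEY=VALUE lines are intended to be consumed by tooling; a minimal sketch of extracting them from captured deploy output (the sample values below are illustrative, not real deploy output):

```shell
# Illustrative captured output from the deploy script.
deploy_output='DEPLOY_SUMMARY_PATH=/tmp/deploy-summary.json
HOST_PUBLIC_IP=203.0.113.10
LOCAL_KUBECONFIG_PATHS=/tmp/kc-tailscale.yaml,/tmp/kc-lan.yaml'

# Pull out a single key.
host_ip=$(printf '%s\n' "$deploy_output" | sed -n 's/^HOST_PUBLIC_IP=//p')

# Split the comma-separated kubeconfig list into one path per line.
paths=$(printf '%s\n' "$deploy_output" | sed -n 's/^LOCAL_KUBECONFIG_PATHS=//p' | tr ',' '\n')

echo "host: ${host_ip}"
printf '%s\n' "$paths"
```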

Topology A: Single-Node Production Profile (Manual Reference)

1. Bootstrap host tools

refs/scripts/validation/bootstrap-ubuntu24-remote.sh root@<host-ip>
ssh root@<host-ip> "astra-run 'cd \"\$ASTRA_REPO_DIR\" && refs/scripts/validation/bootstrap-phase6-host.sh'"

2. Prepare dedicated local storage disk for K3s PVCs

If your storage disk is blank (example: /dev/xvde1):

mkfs.ext4 -F /dev/xvde1
mkdir -p /var/lib/rancher/k3s/storage
uuid=$(blkid -s UUID -o value /dev/xvde1)
grep -q "${uuid}" /etc/fstab || echo "UUID=${uuid} /var/lib/rancher/k3s/storage ext4 defaults,nofail 0 2" >> /etc/fstab
mount -a
findmnt /var/lib/rancher/k3s/storage

If your storage partition already has an ext4 filesystem (for example /dev/nvme0n1p4), you can skip mkfs and just mount + persist it in /etc/fstab.

3. Start Astra locally (single-node override)

From repo root on the target host:

export ASTRA_IMAGE=docker.io/halceon/astra:v0.1.1-rc1
docker compose \
  -f refs/scripts/validation/docker-compose.image.yml \
  -f refs/scripts/validation/docker-compose.k3s-single-node.yml \
  up -d minio minio-init astra-node1 astra-node2 astra-node3

docker-compose.k3s-single-node.yml uses ports: !override so that Astra and MinIO bind only to localhost; if you customize the profile, verify the effective bindings with docker compose ... config.

Check datastore health:

etcdctl --endpoints=http://127.0.0.1:52379 endpoint status -w table

4. Install/start K3s with Astra datastore endpoint, Traefik, and public IP

Use Astra node client endpoints exposed by compose:

curl -sfL https://get.k3s.io | \
  INSTALL_K3S_EXEC="server \
    --write-kubeconfig-mode 644 \
    --node-external-ip <host-public-ip> \
    --tls-san <host-public-ip> \
    --tls-san <host-secondary-ip> \
    --node-name <cluster-name> \
    --kubelet-arg fail-swap-on=false \
    --default-local-storage-path /var/lib/rancher/k3s/storage \
    --datastore-endpoint 'http://127.0.0.1:52379,http://127.0.0.1:52391,http://127.0.0.1:52392'" \
  sh -

5. Validate cluster + datastore path

kubectl get nodes
kubectl create ns astra-k3s-smoke
kubectl -n astra-k3s-smoke create configmap smoke --from-literal=ok=true
kubectl -n astra-k3s-smoke get configmap smoke -o yaml

Optional direct datastore probe:

etcdctl --endpoints=http://127.0.0.1:52379 get /registry/configmaps/astra-k3s-smoke/smoke --prefix --keys-only
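The probe above relies on the Kubernetes key layout in the datastore, /registry/&lt;resource&gt;/&lt;namespace&gt;/&lt;name&gt;; a minimal sketch of building the key for the smoke ConfigMap created in the previous step:

```shell
# Build the datastore key for a namespaced object, following the
# /registry/<resource>/<namespace>/<name> convention used by Kubernetes.
ns=astra-k3s-smoke
name=smoke
key="/registry/configmaps/${ns}/${name}"
echo "$key"
```

The same pattern works for other resource types (e.g. /registry/deployments/&lt;ns&gt;/&lt;name&gt;) when spot-checking writes through the datastore path.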

6. Validate Traefik and local-path storage

Create ingress + PVC smoke workload:

kubectl -n astra-k3s-smoke apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
spec:
  replicas: 1
  selector:
    matchLabels: { app: whoami }
  template:
    metadata:
      labels: { app: whoami }
    spec:
      containers:
      - name: whoami
        image: traefik/whoami:v1.10.3
        ports: [{containerPort: 80}]
---
apiVersion: v1
kind: Service
metadata:
  name: whoami
spec:
  selector: { app: whoami }
  ports:
  - port: 80
    targetPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: whoami
spec:
  ingressClassName: traefik
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: whoami
            port:
              number: 80
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: smoke-pvc
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: local-path
  resources:
    requests:
      storage: 1Gi
EOF

Probe ingress from outside the host:

curl -fsS http://<host-public-ip>/

Topology B: Cluster Deployment

1. Deploy Astra nodes

Deploy 3 Astra nodes with stable raft/client addresses and persistent disks. Every K3s server must be able to reach all Astra client endpoints.

Example endpoint list used by K3s:

http://astra1.example.net:2379,http://astra2.example.net:2379,http://astra3.example.net:2379
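Before handing this list to --datastore-endpoint, it can help to sanity-check each entry; a minimal sketch that splits the comma-separated list and validates the URL shape (hostnames are the illustrative ones from this guide):

```shell
# Split the comma-separated datastore endpoint list that K3s consumes and
# check each entry looks like scheme://host:port before use.
ASTRA_DATASTORE="http://astra1.example.net:2379,http://astra2.example.net:2379,http://astra3.example.net:2379"
IFS=',' read -ra endpoints <<< "$ASTRA_DATASTORE"
for ep in "${endpoints[@]}"; do
  case "$ep" in
    http://*:*|https://*:*) echo "ok: ${ep}" ;;
    *)                      echo "malformed: ${ep}" ;;
  esac
done
```

In a live environment you would additionally probe each endpoint for reachability from every K3s server node.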

2. Bootstrap first K3s server node

export K3S_TOKEN="<shared-cluster-token>"
export ASTRA_DATASTORE="http://astra1.example.net:2379,http://astra2.example.net:2379,http://astra3.example.net:2379"

curl -sfL https://get.k3s.io | \
  INSTALL_K3S_EXEC="server \
    --token ${K3S_TOKEN} \
    --datastore-endpoint '${ASTRA_DATASTORE}' \
    --write-kubeconfig-mode 644" \
  sh -

3. Join additional K3s server nodes

Run on each additional control-plane node:

export K3S_TOKEN="<shared-cluster-token>"
export K3S_URL="https://<first-server-ip>:6443"
export ASTRA_DATASTORE="http://astra1.example.net:2379,http://astra2.example.net:2379,http://astra3.example.net:2379"

curl -sfL https://get.k3s.io | \
  INSTALL_K3S_EXEC="server \
    --server ${K3S_URL} \
    --token ${K3S_TOKEN} \
    --datastore-endpoint '${ASTRA_DATASTORE}'" \
  sh -

4. Join worker nodes

export K3S_TOKEN="<shared-cluster-token>"
export K3S_URL="https://<first-server-ip>:6443"

curl -sfL https://get.k3s.io | \
  INSTALL_K3S_EXEC="agent --server ${K3S_URL} --token ${K3S_TOKEN}" \
  sh -

5. Validate HA behavior

  • Confirm all server/agent nodes are Ready.
  • Run workload smoke tests (configmaps, deployments, service accounts).
  • Restart one Astra node and verify K3s API remains available.

Production Notes

  • Prefer dedicated Astra nodes; avoid co-locating heavy workloads with Astra in production.
  • Use TLS and auth controls for Astra client access paths.
  • Keep Astra profile/tuning explicit (ASTRAD_PROFILE, queue/quorum budgets) and track with Prometheus/Grafana.
  • Revalidate with the phase harness when changing datastore tuning:
    • refs/scripts/validation/phase6-k3s-benchmark.sh

Readiness Stress and Cleanup (Single Node)

Use the bundled stress harness:

export INGRESS_URL="http://<host-public-ip>/astra-ready"
refs/scripts/validation/k3s-single-node-readiness.sh

Outputs:

  • ${ASTRA_RESULTS_DIR}/single-node-readiness-<timestamp>/single-node-readiness-summary.json
  • ${ASTRA_RESULTS_DIR}/single-node-readiness-<timestamp>/

By default, the harness deletes the stress namespace at the end of the run. Set KEEP_RESOURCES=true to retain it.
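The cleanup toggle follows the usual environment-flag pattern; a minimal sketch of the decision logic, assuming the harness treats any value other than true as false (decision only, no kubectl calls here):

```shell
# Decide whether to keep or delete the stress namespace, mirroring the
# KEEP_RESOURCES convention: only an explicit "true" retains resources.
KEEP_RESOURCES="${KEEP_RESOURCES:-false}"
if [ "$KEEP_RESOURCES" = "true" ]; then
  action="keep"
else
  action="delete"
fi
echo "cleanup: will ${action} the stress namespace"
```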

Kubeconfig Export (Local Workstation)

Copy remote kubeconfig and rewrite server address:

mkdir -p ~/.kube
scp root@<host-public-ip>:/etc/rancher/k3s/k3s.yaml ~/.kube/astra-k3s-<host-public-ip>.yaml
sed -i.bak "s#https://127.0.0.1:6443#https://<host-public-ip>:6443#g" ~/.kube/astra-k3s-<host-public-ip>.yaml
KUBECONFIG=~/.kube/astra-k3s-<host-public-ip>.yaml kubectl get nodes

Rollback

  • Keep last-known-good datastore endpoint config and K3s token material.
  • Before large rollout waves, snapshot Astra and validate restore path.