
Cluster Capacity Estimates

Hardware: Dell OptiPlex 3080 Micro (3 nodes currently; 20 acquired)
OS: Talos Linux v1.12.5 · Kubernetes v1.35.2 · Cilium v1.19.1
Status: Estimates — CPU model TBD (confirm via talosctl read /proc/cpuinfo)


Confirmed Specs

| Spec | Value |
|---|---|
| RAM | 8GB DDR4 3200MHz per node |
| Storage | 256GB NVMe SSD per node (WD PC SN530, PCIe Gen3 x4) |
| Network | 1GbE per node |
| Power | ~65W max per node |

CPU Scenarios

The OptiPlex 3080 Micro shipped with 10th Gen Intel Core (Comet Lake). Two likely configs:

| Config | Cores | Threads | Base / Boost | TDP |
|---|---|---|---|---|
| i5-10500T (likely) | 6 | 12 | 2.3 / 3.8 GHz | 35W |
| i7-10700T (possible) | 8 | 16 | 2.0 / 4.5 GHz | 35W |

Action: Run talosctl read /proc/cpuinfo on any node to confirm. The estimates below cover both cases.
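
A quick way to check all three current nodes at once is to script that same command. This is a minimal sketch, assuming talosctl is installed and already configured with credentials for these nodes; the script name and node list are illustrative.

```python
# check_cpus.py - print the CPU model reported by each Talos node.
# Assumes talosctl is installed and authenticated against the cluster.
import subprocess

NODES = ["192.168.10.32", "192.168.10.11", "192.168.10.43"]

for node in NODES:
    out = subprocess.run(
        ["talosctl", "-n", node, "read", "/proc/cpuinfo"],
        capture_output=True, text=True, check=True,
    ).stdout
    # /proc/cpuinfo repeats "model name" once per logical CPU; take the first.
    model = next(line.split(":", 1)[1].strip()
                 for line in out.splitlines() if line.startswith("model name"))
    threads = sum(1 for line in out.splitlines() if line.startswith("processor"))
    print(f"{node}: {model} ({threads} threads)")
```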


Current Cluster (3 nodes)

| Node | Role | IP |
|---|---|---|
| talos-zzo-1sj | control-plane | 192.168.10.32 |
| talos-shv-9v1 | worker | 192.168.10.11 |
| talos-d33-vyt | worker | 192.168.10.43 |

| Metric | i5-10500T (3 nodes) | i7-10700T (3 nodes) |
|---|---|---|
| Total physical cores | 18 | 24 |
| Total threads | 36 | 48 |
| Total RAM | 24 GB | 24 GB |
| Total NVMe storage | 768 GB raw | 768 GB raw |
| Max power draw | ~195 W | ~195 W |

Projected Cluster (20 nodes)

These are projections for when all 20 acquired machines are deployed.

| Metric | i5-10500T (20 nodes) | i7-10700T (20 nodes) |
|---|---|---|
| Total physical cores | 120 | 160 |
| Total threads | 240 | 320 |
| Total RAM | 160 GB | 160 GB |
| Total NVMe storage | 5.12 TB raw | 5.12 TB raw |
| Max power draw | ~1.3 kW | ~1.3 kW |

Kubernetes Allocatable (estimated)

Talos Linux is a minimal, immutable OS purpose-built for Kubernetes. It has lower system overhead than general-purpose distributions: the OS runs entirely in RAM (~200MB) with no SSH daemon, package manager, or shell. Estimated system reservation is ~5% CPU + ~400MB RAM per node (kubelet, containerd, Cilium agent, Talos machined).
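
The per-node allocatable numbers in the tables below follow directly from that reservation. A minimal sketch of the arithmetic, assuming ~5% CPU and ~400 MB RAM reserved per node and counting logical threads as schedulable cores:

```python
# allocatable.py - rough allocatable-resource estimate per node and per cluster.
CPU_RESERVE_FRACTION = 0.05   # kubelet, containerd, Cilium agent, Talos machined
RAM_RESERVE_GB = 0.4
NODE_RAM_GB = 8.0

def allocatable(threads: int, nodes: int) -> dict:
    cpu_per_node = threads * (1 - CPU_RESERVE_FRACTION)
    ram_per_node = NODE_RAM_GB - RAM_RESERVE_GB
    return {
        "cpu_per_node": round(cpu_per_node, 1),
        "ram_per_node_gb": round(ram_per_node, 1),
        "cpu_cluster": round(cpu_per_node * nodes, 1),
        "ram_cluster_gb": round(ram_per_node * nodes, 1),
    }

print("i5-10500T, 3 nodes: ", allocatable(threads=12, nodes=3))   # ~11.4 cores / ~7.6 GB per node
print("i7-10700T, 20 nodes:", allocatable(threads=16, nodes=20))  # ~15.2 cores / ~7.6 GB per node
```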

Current cluster (3 nodes)

| Resource | Per Node | Cluster Total (3 nodes) |
|---|---|---|
| Allocatable CPU (i5) | ~11.4 cores | ~34.2 cores |
| Allocatable CPU (i7) | ~15.2 cores | ~45.6 cores |
| Allocatable RAM | ~7.6 GB | ~22.8 GB |
| Allocatable storage | ~230 GB (local) | ~690 GB (local, no replication) |

Projected cluster (20 nodes)

| Resource | Per Node | Cluster Total (20 nodes) |
|---|---|---|
| Allocatable CPU (i5) | ~11.4 cores | ~228 cores |
| Allocatable CPU (i7) | ~15.2 cores | ~304 cores |
| Allocatable RAM | ~7.6 GB | ~152 GB |
| Allocatable storage | ~230 GB (local) | ~4.6 TB (local, no replication) |

Storage replication depends on the distributed storage solution chosen (TBD). Usable capacity with 2-replica replication would be roughly half the raw total.


What You Can Run

General Workloads

| Workload | Resources | Per Node | Current Cluster (3) | Projected (20) | Notes |
|---|---|---|---|---|---|
| NGINX (reverse proxy / static) | 0.1 CPU, 64MB | ~50 instances | ~150 | ~1,000 | Trivial overhead; run as DaemonSet for ingress |
| FastAPI / Node.js services | 0.5 CPU, 256MB | ~12–15 replicas | ~36–45 | ~250–300 | Standard microservice sizing |
| PostgreSQL | 2 CPU, 2GB RAM | ~3 instances | ~9 | ~60 | Use PVC for data persistence (storage TBD) |
| Redis | 0.25 CPU, 512MB | ~10 instances | ~30 | ~200 | |
| Jupyter notebooks | 1 CPU, 2GB | ~3 per node | ~9 | ~60 | Good for member data science workloads |
| OpenClaw Gateway | 0.5 CPU, 512MB | ~6 instances | ~18 | ~120 | One per team/member; stateless, scales horizontally |
| OpenClaw Agent sessions | 1 CPU, 1GB | ~5 sessions | ~15 | ~100 | Each active agent session ~1 CPU; session count drives sizing |
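
The instance counts above are deliberately conservative round numbers. A quick way to sanity-check them is to compute the binding constraint (CPU or RAM) per node from the request sizes; the sketch below uses the i5 allocatable figures from the previous section and prints the theoretical ceilings, which the table stays below to leave scheduling headroom.

```python
# fit.py - how many replicas of a given request size fit per node and per cluster.
ALLOC_CPU = 11.4      # estimated allocatable cores per node (i5-10500T)
ALLOC_RAM_GB = 7.6    # estimated allocatable RAM per node

def fits_per_node(cpu_request: float, ram_request_gb: float) -> int:
    # Whichever resource runs out first is the binding constraint.
    return int(min(ALLOC_CPU / cpu_request, ALLOC_RAM_GB / ram_request_gb))

for name, cpu, ram in [("FastAPI", 0.5, 0.256), ("PostgreSQL", 2.0, 2.0), ("Redis", 0.25, 0.512)]:
    per_node = fits_per_node(cpu, ram)
    print(f"{name}: {per_node}/node, {per_node * 3} on 3 nodes, {per_node * 20} on 20 nodes")
```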

LLM Inference (CPU-only, no GPU)

CPU-only inference via llama.cpp or ollama. Performance scales with core count and memory bandwidth.

| Model | Quantization | RAM needed | Tok/sec per node (i5) | Tok/sec per node (i7) | Notes |
|---|---|---|---|---|---|
| Llama 3.2 3B | Q4_K_M | ~2 GB | 15–25 tok/sec | 20–35 tok/sec | Fast, lightweight |
| Llama 3.1 8B | Q4_K_M | ~5 GB | 6–12 tok/sec | 8–16 tok/sec | Good quality/speed tradeoff |
| Llama 3.1 8B | Q8 | ~9 GB | 3–6 tok/sec | 4–8 tok/sec | Exceeds 8GB RAM — requires swap or split |
| Mistral 7B | Q4_K_M | ~4.5 GB | 8–14 tok/sec | 10–18 tok/sec | Strong reasoning |
| Phi-3 Mini 3.8B | Q4_K_M | ~2.5 GB | 12–20 tok/sec | 15–28 tok/sec | Efficient, good for code |
| DeepSeek-R1 7B | Q4_K_M | ~5 GB | 5–10 tok/sec | 7–14 tok/sec | Strong reasoning, slower |

8GB RAM per node is the hard constraint. Models requiring >6 GB of weights in RAM leave little headroom for the OS and Kubernetes components. Stick to Q4 quantized models <=7B for reliable single-node inference. For larger models, consider splitting a model across 2–3 nodes (experimental, e.g. via llama.cpp's RPC backend).
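
A rough rule of thumb for whether a model fits on one node: quantized weight size in GB ≈ parameters (billions) × bits per weight / 8, plus working memory for the KV cache and runtime. The sketch below applies that rule against the ~7.6 GB allocatable per node; the 1 GB overhead figure and the effective bits-per-weight values are assumptions, not measurements.

```python
# fits_in_ram.py - rough check of whether a quantized model fits in one node's RAM.
ALLOC_RAM_GB = 7.6         # per-node allocatable (8 GB minus ~400 MB system reserve)
RUNTIME_OVERHEAD_GB = 1.0  # assumed headroom for KV cache, runtime, and the pod itself

def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8  # 1e9 params * bits / 8 = GB

def fits(params_billion: float, bits_per_weight: float) -> bool:
    return weight_size_gb(params_billion, bits_per_weight) + RUNTIME_OVERHEAD_GB <= ALLOC_RAM_GB

print(fits(8, 4.5))   # Llama 3.1 8B at ~4.5 bits/weight (Q4_K_M): ~4.5 GB weights -> True
print(fits(8, 8.5))   # Llama 3.1 8B at ~8.5 bits/weight (Q8):     ~8.5 GB weights -> False
print(fits(3, 4.5))   # Llama 3.2 3B at ~4.5 bits/weight (Q4_K_M): ~1.7 GB weights -> True
```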

Parallel Inference (2 worker nodes — current cluster)

With the 2 current worker nodes running independent replicas:

| Model | Config | Combined throughput |
|---|---|---|
| Llama 3.2 3B Q4 | 2 nodes x 1 instance | 30–50 tok/sec |
| Llama 3.1 8B Q4 | 2 nodes x 1 instance | 12–24 tok/sec |
| Mistral 7B Q4 | 2 nodes x 1 instance | 16–28 tok/sec |

These are independent replicas (not distributed inference) — each node handles separate requests. Useful for concurrent users, not for running a single large model faster.

Projected (3+ dedicated inference nodes): With dedicated inference workers at scale, combined throughput scales linearly (e.g., 3 nodes running Llama 3.1 8B Q4 = 18–36 tok/sec combined).
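
To make the "concurrent users, not faster single requests" distinction concrete: per-request latency is fixed by a single node's tok/sec, and replicas only raise how many requests are served at once. A small worked calculation, assuming a 500-token completion and the midpoint of the i5 Llama 3.1 8B Q4 estimate:

```python
# replica_math.py - independent replicas raise aggregate throughput, not single-request speed.
COMPLETION_TOKENS = 500        # assumed average completion length
PER_NODE_TOK_PER_SEC = 9.0     # midpoint of the 6-12 tok/sec estimate

for replicas in (1, 2, 3):
    latency_s = COMPLETION_TOKENS / PER_NODE_TOK_PER_SEC  # unchanged by replica count
    aggregate = replicas * PER_NODE_TOK_PER_SEC           # scales linearly with replicas
    print(f"{replicas} replica(s): ~{latency_s:.0f}s per reply, "
          f"~{aggregate:.0f} tok/sec aggregate, {replicas} request(s) in flight")
```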


Storage

Storage strategy is TBD. No distributed storage solution is deployed yet. All capacity below refers to local NVMe on each node.

| Use case | Capacity / Notes |
|---|---|
| Raw NVMe per node | 256 GB |
| Total raw (3 nodes) | 768 GB |
| Total raw (20 nodes, projected) | 5.12 TB |
| Model storage (local path) | Store model weights on local node NVMe — no replication needed, and local NVMe is fastest for sequential reads |

When a distributed storage solution is selected, usable capacity will depend on the replication factor (e.g., 2-replica halves raw capacity, 3-replica yields ~1/3).
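
As a back-of-envelope figure, usable capacity is roughly raw capacity divided by the replication factor, ignoring filesystem and metadata overhead:

```python
# usable_storage.py - rough usable capacity vs. replication factor.
def usable_tb(nodes: int, nvme_gb: int = 256, replicas: int = 2) -> float:
    return nodes * nvme_gb / replicas / 1000  # ignores filesystem/metadata overhead

print(usable_tb(3))               # current 3 nodes, 2-replica:    ~0.38 TB
print(usable_tb(20))              # projected 20 nodes, 2-replica: ~2.56 TB
print(usable_tb(20, replicas=3))  # projected 20 nodes, 3-replica: ~1.7 TB
```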


Example: AI Floor Member Stack

A typical member running an AI project on the cluster:

| Service | CPU | RAM | Notes |
|---|---|---|---|
| NGINX (ingress) | 0.1 | 64MB | Shared cluster-wide via DaemonSet — no per-member cost |
| OpenClaw Gateway | 0.5 | 512MB | One per member team |
| OpenClaw Agent (1 active session) | 1.0 | 1GB | Scales with concurrent agent activity |
| FastAPI backend | 0.5 | 256MB | Member's app |
| PostgreSQL | 2.0 | 2GB | With PVC (storage solution TBD) |
| Redis | 0.25 | 512MB | Cache / task queue |
| Total per member | ~4.35 CPU | ~4.3 GB | |

Current cluster (3 nodes): With the control-plane node reserved for system workloads, the 2 worker nodes provide ~15.2 GB allocatable RAM and ~22.8 allocatable cores (i5 estimate), enough for ~3 full member stacks before hitting RAM limits.

Projected (20 nodes): With ~152 GB allocatable RAM and ~228 allocatable cores (i5 estimate), the cluster can support ~35 full member stacks simultaneously before hitting RAM limits. That is an upper bound across all 20 nodes; with control-plane and dedicated inference nodes set aside as in the layout below, the 14 general workers support roughly 24 stacks.
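
The stack counts follow the same arithmetic as the workload table: divide allocatable RAM on the nodes available for member workloads (RAM is the binding constraint) by the per-stack footprint. A sketch, with the worker-node splits taken as assumptions:

```python
# member_stacks.py - how many ~4.3 GB / ~4.35-core member stacks fit on a given worker pool.
STACK_RAM_GB = 4.3
STACK_CPU = 4.35
ALLOC_RAM_GB = 7.6    # per node
ALLOC_CPU = 11.4      # per node, i5 estimate

def stacks(worker_nodes: int) -> int:
    by_ram = worker_nodes * ALLOC_RAM_GB / STACK_RAM_GB
    by_cpu = worker_nodes * ALLOC_CPU / STACK_CPU
    return int(min(by_ram, by_cpu))   # RAM binds first at these ratios

print(stacks(2))    # current: 2 workers (control plane reserved)  -> ~3
print(stacks(20))   # projected upper bound, all 20 nodes          -> ~35
print(stacks(14))   # projected layout: 14 general workers         -> ~24
```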


Realistic Day-One Cluster Layout (current: 3 nodes)

Control plane x 1  — Talos system, etcd, API server, Cilium
Worker x 2         — general workloads + inference

With 3 nodes, there is no dedicated inference tier. Workers handle both application workloads and inference requests. This is sufficient for PoC/development purposes.

Projected layout (20 nodes)

Control plane x 3     — Talos system, etcd, API server (HA)
Inference workers x 3 — 1x Llama 3.1 8B Q4 per node (18–36 tok/sec combined)
General workers x 14  — ~160+ allocatable cores, ~106 GB RAM for user workloads

This would give the cluster:

- ~200+ concurrent microservices on the general pool
- 3 independent LLM instances for member inference (Llama 8B or similar)
- Grafana, Prometheus, Loki with negligible overhead on the general pool
- Distributed storage capacity dependent on chosen solution and replication factor


Bottlenecks to Watch

| Bottleneck | Impact | Mitigation |
|---|---|---|
| 8GB RAM per node | Limits model size; no headroom for large quantizations | Stick to Q4 <=7B; avoid swap in production |
| 1GbE network | ~125 MB/s per node — fine for most workloads, slow for large model transfers | Pre-stage model weights locally; don't stream from NFS |
| No GPU | CPU inference only; 5–25 tok/sec vs 500+ tok/sec on A100 | Acceptable for PoC; Phase 2 requires different hardware |
| No IPMI/iDRAC | Can't remote-power or console into nodes | PiKVM per shelf or Wake-on-LAN + SSH |
| 3 nodes (current) | Single control-plane node is not HA; limited worker capacity | Scale to 3 CP + N workers as nodes are brought online |
| No distributed storage | Data is local-only; node failure loses local data | Select and deploy a storage solution (Rook-Ceph, Longhorn, etc.) |
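
The 1GbE line item mostly matters for moving model weights: at the ~125 MB/s wire ceiling, a ~5 GB Q4 8B model takes on the order of 40 seconds to pull over the network versus a few seconds from local NVMe. A back-of-envelope sketch, with the NVMe sequential read rate assumed rather than measured:

```python
# transfer_time.py - rough time to move model weights over 1GbE vs. reading from local NVMe.
GBE_MB_PER_S = 125      # theoretical 1GbE ceiling; real-world throughput is somewhat lower
NVME_MB_PER_S = 2000    # assumed sequential read for a PCIe Gen3 NVMe drive

def seconds(size_gb: float, rate_mb_per_s: float) -> float:
    return size_gb * 1000 / rate_mb_per_s

for model, size_gb in [("Llama 3.2 3B Q4", 2.0), ("Llama 3.1 8B Q4", 5.0)]:
    print(f"{model}: ~{seconds(size_gb, GBE_MB_PER_S):.0f}s over 1GbE, "
          f"~{seconds(size_gb, NVME_MB_PER_S):.1f}s from local NVMe")
```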

Estimates based on published Intel 10th Gen benchmarks and llama.cpp community benchmarks on similar hardware. Confirm CPU model before finalizing inference node allocation.