Building a Production Kubernetes Cluster on Bare Metal

I run a 3-node bare-metal Kubernetes cluster at home for production-grade workloads and learning. This post is the exact playbook I used to build it — from a blank Ubuntu 24.04 install to a fully working cluster with load balancing, ingress, and persistent storage.

Stack: Ubuntu 24.04 · Kubernetes v1.35.3 · containerd v2.2.2 · Calico VXLAN · MetalLB v0.14.9 · NGINX Ingress v1.12.1


Cluster Architecture

%%{init: {'theme': 'dark', 'themeVariables': {'fontSize': '20px', 'fontFamily': 'monospace', 'lineColor': '#94a3b8', 'clusterBkg': '#1e293b', 'titleColor': '#f1f5f9'}}}%%
graph TD
    subgraph CLUSTER["ON-PREMISE CLUSTER"]
        subgraph NODES["Control Plane and Workers"]
            M["master-01\n10.0.0.10\ncontrol-plane · v1.35.3"]
            W1["worker-01\n10.0.0.11\nworker · v1.35.3"]
            W2["worker-02\n10.0.0.12\nworker · v1.35.3"]
        end
        CNI["Calico VXLAN CNI\neth0 · Pod CIDR 10.244.0.0/16"]
        subgraph LB["Load Balancing and Ingress"]
            METALLB["MetalLB L2\nIP Pool 10.0.0.200-220\nautoAssign false"]
            INGRESS["NGINX Ingress\n10.0.0.200\nlocal-path StorageClass"]
        end
    end
    M --> CNI
    W1 --> CNI
    W2 --> CNI
    CNI --> METALLB
    METALLB --> INGRESS
    style M fill:#1d4ed8,stroke:#3b82f6,stroke-width:2px,color:#ffffff
    style W1 fill:#15803d,stroke:#22c55e,stroke-width:2px,color:#ffffff
    style W2 fill:#15803d,stroke:#22c55e,stroke-width:2px,color:#ffffff
    style CNI fill:#7c3aed,stroke:#a78bfa,stroke-width:2px,color:#ffffff
    style METALLB fill:#b45309,stroke:#f59e0b,stroke-width:2px,color:#ffffff
    style INGRESS fill:#0e7490,stroke:#22d3ee,stroke-width:2px,color:#ffffff

Replace 10.0.0.x and the 10.0.0.200–220 pool with addresses from your own LAN.

Component Versions

Component                Version       Purpose
Ubuntu                   24.04.3 LTS   Host OS
Kubernetes               v1.35.3       Cluster orchestration
containerd               v2.2.2        Container runtime
Calico                   v3.29.3       CNI — pod networking + NetworkPolicy
MetalLB                  v0.14.9       LoadBalancer IP allocation (L2)
NGINX Ingress            v1.12.1       HTTP/HTTPS routing
local-path-provisioner   v0.0.30       Default dynamic StorageClass

Network CIDRs

Network           CIDR             Purpose
Node network      10.0.0.0/24      Physical LAN
Pod network       10.244.0.0/16    Calico pod IPs
Service network   10.96.0.0/12     Kubernetes ClusterIP services
MetalLB pool      10.0.0.200–220   External LoadBalancer IPs
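Before committing to these ranges, it is worth a quick sanity check that none of them collide. A minimal sketch, assuming python3 is present on the node (the script and its OK message are illustrative, not part of the playbook):

```shell
python3 - <<'EOF'
import ipaddress, itertools

nets = {
    "node":    ipaddress.ip_network("10.0.0.0/24"),
    "pod":     ipaddress.ip_network("10.244.0.0/16"),
    "service": ipaddress.ip_network("10.96.0.0/12"),
}

# The three routing domains must be pairwise disjoint...
for (a, na), (b, nb) in itertools.combinations(nets.items(), 2):
    assert not na.overlaps(nb), f"{a} overlaps {b}"

# ...while the MetalLB pool must sit INSIDE the node LAN (same L2 segment)
for ip in ("10.0.0.200", "10.0.0.220"):
    assert ipaddress.ip_address(ip) in nets["node"], f"{ip} outside node LAN"

print("CIDR layout OK")
EOF
```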

Prerequisites

  • Three machines on the same L2 subnet
  • Ubuntu 24.04 LTS installed on all nodes
  • Root / sudo access on all nodes
  • Internet connectivity on all nodes
  • A free IP range (e.g. 10.0.0.200–220) not in use on LAN
  • Minimum per node: 2 vCPU, 2 GB RAM, 20 GB disk
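A quick preflight sketch for the hardware floor above; run it on each machine before starting Phase 1. The thresholds mirror the list (2 vCPU, 2 GB RAM, 20 GB disk), with the RAM cutoff at 1900 MB to allow for kernel-reserved memory:

```shell
# Preflight sketch — checks this machine against the minimums above.
cpus=$(nproc)
mem_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
disk_gb=$(df -BG --output=size / | tail -1 | tr -dc '0-9')

[ "$cpus"    -ge 2 ]    && echo "CPU  OK (${cpus} vCPU)"   || echo "CPU  LOW (${cpus} vCPU)"
[ "$mem_mb"  -ge 1900 ] && echo "RAM  OK (${mem_mb} MB)"   || echo "RAM  LOW (${mem_mb} MB)"
[ "$disk_gb" -ge 20 ]   && echo "DISK OK (${disk_gb} GB)"  || echo "DISK LOW (${disk_gb} GB)"
```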

Phase 1 — Node Preparation (All Nodes)

Run every command in this phase on all 3 nodes unless stated otherwise.

1.1 — Set Hostnames

# On the master node:
sudo hostnamectl set-hostname master-01

# On worker 1:
sudo hostnamectl set-hostname worker-01

# On worker 2:
sudo hostnamectl set-hostname worker-02

1.2 — Update /etc/hosts

sudo tee -a /etc/hosts <<EOF

# Kubernetes Cluster
10.0.0.10  master-01
10.0.0.11  worker-01
10.0.0.12  worker-02
EOF

grep -E "master|worker" /etc/hosts
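If you expect the cluster to grow, the hosts block can also be derived from a single node list, so adding a node means one extra entry and nothing else. A hypothetical variant (the /tmp path is a stand-in; append the result with sudo tee -a /etc/hosts):

```shell
# name:ip pairs — extend this list when a new node joins
NODES="master-01:10.0.0.10 worker-01:10.0.0.11 worker-02:10.0.0.12"

# Emit "IP  hostname" lines, one per node
for n in $NODES; do
  printf '%s  %s\n' "${n#*:}" "${n%%:*}"
done > /tmp/k8s-hosts

cat /tmp/k8s-hosts
# 10.0.0.10  master-01
# 10.0.0.11  worker-01
# 10.0.0.12  worker-02
```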

1.3 — Disable Swap

By default the kubelet refuses to run with swap enabled, so turn it off on every node.

sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Verify — Swap row must show: 0B 0B 0B
free -h
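If you want to see exactly what that sed does before touching the real file, here it is run against a throwaway fstab copy (contents invented for the demo):

```shell
# Sample fstab with one swap entry — the real command edits /etc/fstab
cat > /tmp/fstab.demo <<'EOF'
UUID=abcd-1234 /        ext4 defaults 0 1
/swap.img      none     swap sw       0 0
EOF

# Same sed as above: comment out any line containing " swap "
sed -i '/ swap / s/^\(.*\)$/#\1/g' /tmp/fstab.demo

grep '^#' /tmp/fstab.demo   # the swap line is now commented out
```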

1.4 — Load Kernel Modules

sudo tee /etc/modules-load.d/k8s.conf <<EOF
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

lsmod | grep -E "overlay|br_netfilter"

1.5 — Apply Sysctl Settings

sudo tee /etc/sysctl.d/99-k8s.conf <<EOF
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
vm.swappiness                       = 0
net.netfilter.nf_conntrack_max      = 1048576
net.core.somaxconn                  = 32768
EOF

sudo sysctl --system

# Verify
sysctl net.ipv4.ip_forward
# Expected: net.ipv4.ip_forward = 1

1.6 — Install containerd

sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg lsb-release apt-transport-https

sudo install -m 0755 -d /etc/apt/keyrings

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

sudo chmod a+r /etc/apt/keyrings/docker.gpg

echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list

sudo apt-get update
sudo apt-get install -y containerd.io

sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml

# CRITICAL: Enable SystemdCgroup — cluster will not form without this
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml

sudo sed -i 's|sandbox_image = ".*"|sandbox_image = "registry.k8s.io/pause:3.10"|' \
  /etc/containerd/config.toml

sudo systemctl restart containerd
sudo systemctl enable containerd

sudo systemctl status containerd --no-pager
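Both sed edits can be dry-run against a throwaway snippet first. The sample below shows only the two relevant keys, since the surrounding TOML table paths differ between containerd 1.x and 2.x:

```shell
# Minimal stand-in for /etc/containerd/config.toml (keys only)
cat > /tmp/containerd.demo <<'EOF'
sandbox_image = "registry.k8s.io/pause:3.8"
SystemdCgroup = false
EOF

# Same two substitutions as above
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /tmp/containerd.demo
sed -i 's|sandbox_image = ".*"|sandbox_image = "registry.k8s.io/pause:3.10"|' /tmp/containerd.demo

cat /tmp/containerd.demo
# sandbox_image = "registry.k8s.io/pause:3.10"
# SystemdCgroup = true
```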

1.7 — Install kubeadm, kubelet, kubectl (v1.35)

Critical: the echo that writes the apt source must produce a single line. A backslash continuation that survives into the .list file verbatim leaves a literal line break in the entry, causing E: Malformed entry on apt-get update.

curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.35/deb/Release.key | \
  sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

# Single line — no backslash continuation
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.35/deb/ /" | \
  sudo tee /etc/apt/sources.list.d/kubernetes.list

# Verify — must be exactly one line
cat /etc/apt/sources.list.d/kubernetes.list

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl

# Pin versions — prevent accidental upgrades
sudo apt-mark hold kubelet kubeadm kubectl

sudo systemctl enable kubelet

kubeadm version && kubectl version --client && kubelet --version

Fix for malformed entry error:

sudo rm /etc/apt/sources.list.d/kubernetes.list
# Re-run the single-line echo command above
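The failure is easy to reproduce with throwaway files: a backslash kept verbatim in the output splits the entry across two lines, which apt rejects:

```shell
# Broken: the backslash and line break land in the file as-is
printf 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] \\\nhttps://pkgs.k8s.io/core:/stable:/v1.35/deb/ /\n' > /tmp/k8s.list.bad

# Correct: one physical line
printf 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.35/deb/ /\n' > /tmp/k8s.list.good

wc -l < /tmp/k8s.list.bad    # 2 — apt reports "E: Malformed entry"
wc -l < /tmp/k8s.list.good   # 1 — parses cleanly
```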

Phase 2 — Master Initialization

Run on master node only.

2.1 — Pre-pull Control Plane Images

sudo kubeadm config images pull

2.2 — Initialize the Cluster

sudo kubeadm init \
  --apiserver-advertise-address=10.0.0.10 \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12 \
  --control-plane-endpoint=10.0.0.10:6443 \
  | tee ~/kubeadm-init.log

2.3 — Configure kubectl

mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# NotReady is expected here — CNI not installed yet
kubectl get nodes

2.4 — Enable kubectl Autocomplete

echo 'source <(kubectl completion bash)' >> ~/.bashrc
echo 'alias k=kubectl' >> ~/.bashrc
echo 'complete -o default -F __start_kubectl k' >> ~/.bashrc
source ~/.bashrc

2.5 — Generate Worker Join Command

kubeadm token create --print-join-command | tee ~/worker-join.sh
chmod +x ~/worker-join.sh
cat ~/worker-join.sh

Tokens expire in 24 hours. Regenerate anytime:

kubeadm token create --print-join-command

Phase 3 — Calico CNI

Run on master node only.

3.1 — Identify Your NIC Name

# Find the interface bound to your master node IP
ip a | grep -A2 "10.0.0.10"
# Common values: eth0, enp1s0, ens3
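To script the lookup instead of eyeballing ip a, a small helper can extract just the interface name. The helper and sample line are illustrative; field 4 of ip -o -4 addr show output is the address/CIDR pair:

```shell
# Hypothetical helper: print the NIC carrying a given IPv4 address
nic_for_ip() { awk -v ip="$1" '$4 ~ "^"ip"/" {print $2; exit}'; }

# Sample line standing in for real iproute2 output:
echo '2: enp1s0    inet 10.0.0.10/24 brd 10.0.0.255 scope global enp1s0' \
  | nic_for_ip 10.0.0.10
# Prints: enp1s0

# On a real node:  ip -o -4 addr show | nic_for_ip 10.0.0.10
```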

3.2 — Install Tigera Operator

kubectl create -f \
  https://raw.githubusercontent.com/projectcalico/calico/v3.29.3/manifests/tigera-operator.yaml

sleep 20
kubectl get pods -n tigera-operator

3.3 — Apply Calico Installation CR

Replace eth0 with your actual NIC name from 3.1.

kubectl apply -f - <<EOF
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  cni:
    type: Calico
  calicoNetwork:
    ipPools:
    - name: default-ipv4-pool
      cidr: 10.244.0.0/16
      encapsulation: VXLAN
      natOutgoing: Enabled
      nodeSelector: all()
    nodeAddressAutodetectionV4:
      interface: "eth0"
---
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
EOF

Why VXLAN? It works on any flat L2 network without a BGP-capable router. Calico tunnels pod traffic inside UDP packets over your existing network — no special switch config needed.

3.4 — Wait for Calico Ready

watch kubectl get pods -n calico-system
# Wait until ALL pods show Running (~90 seconds) then Ctrl+C

kubectl get nodes
# master-01   Ready   control-plane

Phase 4 — Worker Nodes Join

# On each worker node — use fresh token from ~/worker-join.sh on master
sudo kubeadm join 10.0.0.10:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>

After both workers join, label them on the master:

kubectl get nodes -o wide

kubectl label node worker-01 node-role.kubernetes.io/worker=worker
kubectl label node worker-02 node-role.kubernetes.io/worker=worker

kubectl get nodes

Expected output:

NAME        STATUS   ROLES           VERSION   INTERNAL-IP
master-01   Ready    control-plane   v1.35.3   10.0.0.10
worker-01   Ready    worker          v1.35.3   10.0.0.11
worker-02   Ready    worker          v1.35.3   10.0.0.12

Phase 5 — MetalLB Load Balancer

Run on master node only.

5.1 — Install MetalLB

kubectl apply -f \
  https://raw.githubusercontent.com/metallb/metallb/v0.14.9/config/manifests/metallb-native.yaml

kubectl wait --namespace metallb-system \
  --for=condition=ready pod \
  --selector=app=metallb \
  --timeout=120s

kubectl get pods -n metallb-system

You should see one speaker pod per node — they handle L2 ARP announcements.

5.2 — Configure IP Pool

kubectl apply -f - <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: primary-pool
  namespace: metallb-system
spec:
  addresses:
  - 10.0.0.200-10.0.0.220
  autoAssign: false
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-advert
  namespace: metallb-system
spec:
  ipAddressPools:
  - primary-pool
EOF

kubectl get ipaddresspool -n metallb-system
# AUTO ASSIGN must show: false

autoAssign: false gives you full control over which service gets which IP. Every LoadBalancer service must be annotated with its specific IP.

5.3 — Assign an IP to a Service

# Via annotation on an existing service
kubectl annotate svc <service-name> -n <namespace> \
  metallb.universe.tf/loadBalancerIPs=10.0.0.201

# Or inline in a Service manifest
metadata:
  annotations:
    metallb.universe.tf/loadBalancerIPs: "10.0.0.201"
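Put together, a full Service pinned to a pool IP might look like this (the name and selector are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-web            # hypothetical service name
  annotations:
    metallb.universe.tf/loadBalancerIPs: "10.0.0.201"
spec:
  type: LoadBalancer
  selector:
    app: demo-web
  ports:
  - port: 80
    targetPort: 80
```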

5.4 — Smoke Test

kubectl create deployment test-lb --image=nginx
kubectl expose deployment test-lb --port=80 --type=LoadBalancer
kubectl annotate svc test-lb metallb.universe.tf/loadBalancerIPs=10.0.0.210

watch kubectl get svc test-lb
# EXTERNAL-IP must show: 10.0.0.210

curl http://10.0.0.210
# Returns: Welcome to nginx!

kubectl delete deployment test-lb && kubectl delete svc test-lb

Phase 6 — NGINX Ingress + StorageClass

Run on master node only.

6.1 — Install NGINX Ingress Controller

kubectl apply -f \
  https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.12.1/deploy/static/provider/baremetal/deploy.yaml

# Patch to LoadBalancer and pin to 10.0.0.200
kubectl patch svc ingress-nginx-controller \
  -n ingress-nginx \
  -p '{"spec": {"type": "LoadBalancer"}, "metadata": {"annotations": {"metallb.universe.tf/loadBalancerIPs": "10.0.0.200"}}}'

kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=120s

kubectl get svc ingress-nginx-controller -n ingress-nginx
# EXTERNAL-IP: 10.0.0.200

MetalLB assigns 10.0.0.200 to the NGINX Ingress service, so all application traffic enters the cluster through this single IP. NGINX then routes each request to the correct backend based on its Host header: every app shares the one external IP, differentiated only by hostname.
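To illustrate the fan-out, a single hypothetical Ingress (Services app1 and app2 are placeholders) can route two hostnames through the same 10.0.0.200:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: two-apps            # hypothetical
  namespace: default
spec:
  ingressClassName: nginx
  rules:
  - host: app1.example.com  # curl -H "Host: app1.example.com" http://10.0.0.200
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app1
            port:
              number: 80
  - host: app2.example.com  # same external IP, different backend
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app2
            port:
              number: 80
```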

6.2 — Install local-path StorageClass

kubectl apply -f \
  https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.30/deploy/local-path-storage.yaml

kubectl patch storageclass local-path \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

kubectl get storageclass
# local-path (default) must appear

Without a default StorageClass, any app requesting a PVC will hang in Pending forever. local-path automatically creates a directory on the node's disk (/opt/local-path-provisioner/) — zero extra config.


Phase 7 — Final Verification

echo "=== NODES ===" && kubectl get nodes -o wide
echo "=== ALL PODS ===" && kubectl get pods -A
echo "=== METALLB ===" && kubectl get ipaddresspool,l2advertisement -n metallb-system
echo "=== INGRESS ===" && kubectl get svc ingress-nginx-controller -n ingress-nginx
echo "=== STORAGE ===" && kubectl get storageclass
echo "=== CLUSTER INFO ===" && kubectl cluster-info

Expected final state:

NODES:
master-01   Ready   control-plane   v1.35.3
worker-01   Ready   worker          v1.35.3
worker-02   Ready   worker          v1.35.3

METALLB:
primary-pool   false   ["10.0.0.200-10.0.0.220"]

INGRESS:
ingress-nginx-controller   LoadBalancer   10.0.0.200   80/TCP,443/TCP

STORAGE:
local-path (default)   rancher.io/local-path   true

Deploy Your First App

Stateless (web / API)

kubectl create namespace myapp
kubectl create deployment myapp --image=nginx -n myapp
kubectl expose deployment myapp --port=80 -n myapp

kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  namespace: myapp
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp
            port:
              number: 80
EOF

# Test — add to /etc/hosts: 10.0.0.200 myapp.example.com
curl -H "Host: myapp.example.com" http://10.0.0.200

App with database (PVC)

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-pvc
  namespace: myapp
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 5Gi
EOF

kubectl get pvc -n myapp
# STATUS: Bound — provisioned automatically by local-path

Adding a New Worker Node

  1. Run all of Phase 1 on the new node (hostname, hosts, swap, modules, sysctl, containerd, kubeadm)
  2. Generate a fresh token on master (tokens expire in 24h):
    kubeadm token create --print-join-command
  3. Run the join command on the new node with sudo
  4. Label on master:
    kubectl label node <new-node> node-role.kubernetes.io/worker=worker

Troubleshooting

Symptom                          Cause                     Fix
Node stuck NotReady              CNI not installed         Run Phase 3 Calico install
Pod stuck Pending                No resources / no nodes   kubectl describe pod <pod> → check Events
LoadBalancer stuck <pending>     Missing IP annotation     Add metallb.universe.tf/loadBalancerIPs annotation
PVC stuck Pending                No StorageClass           kubectl get sc → verify local-path is default
Pod CrashLoopBackOff             App error                 kubectl logs <pod> --previous
apt-get update malformed entry   Backslash in .list file   Delete file, rewrite as single-line echo
Worker join hangs at preflight   Missing sudo              Add sudo before kubeadm join
Worker join fails                Expired token             Run kubeadm token create --print-join-command on master
Calico pods stuck Init           Wrong NIC name            Edit Installation CR — set correct interface: value
DNS not resolving inside pods    CoreDNS unhealthy         kubectl get pods -n kube-system -l k8s-app=kube-dns

Quick Reference

# Cluster health
kubectl get nodes -o wide
kubectl get pods -A
kubectl top nodes && kubectl top pods -A

# Debug a stuck resource
kubectl describe pod <pod> -n <namespace>
kubectl describe node <node>
kubectl describe pvc <pvc> -n <namespace>

# Logs
kubectl logs <pod> -n <namespace>
kubectl logs <pod> -n <namespace> --previous   # crashed container
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller -f

# MetalLB
kubectl get ipaddresspool,l2advertisement -n metallb-system

# Storage
kubectl get pv,pvc -A

# Generate fresh join token
kubeadm token create --print-join-command

# Debug pod
kubectl run debug --image=busybox --rm -it --restart=Never -- sh

# Force delete stuck pod
kubectl delete pod <pod> -n <namespace> --grace-period=0 --force

Version Lifecycle

Version           Status             EOL
v1.35 ← current   ✅ Latest stable   Feb 2027
v1.34             ✅ Supported       Oct 2026
v1.33             ✅ Supported       Jun 2026
v1.32             ❌ End of Life     Feb 2026

Kubernetes releases a new minor version roughly every 4 months, and each minor is supported for about 14 months. Stay on one of the three newest supported versions.
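That cadence makes EOL dates easy to estimate. A rough sketch using GNU date, where the GA date is an assumption back-derived from the table's Feb 2027 EOL:

```shell
# ~14 months of maintenance after a minor release's GA date (GNU date)
eol_for() { date -d "$1 +14 months" +'%b %Y'; }

eol_for 2025-12-01   # assumed v1.35 GA → Feb 2027, matching the table
```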


Built and verified on a live 3-node bare-metal cluster · March 2026