public-edge VPS — Hetzner AX41-NVMe HEL1, LUKS + clevis-tang LAN-unlock, full replication peer
2026-06-30 02:27:44 +01:00
.forgejo/workflows chore: trigger deploy to verify forgejo-runner end-to-end 2026-05-26 21:48:11 +01:00
configs harden(runner): pin forge.azrak.io to LAN + auto-heal a wedged forgejo_runner 2026-06-30 02:26:48 +01:00
initramfs-tools chore(reip): remap .11->.10 (r2-d2), .12->.11 (c-3po), v4+v6 2026-06-14 11:16:08 +01:00
network feat(boot): durable cold-start recovery — tunnel routes/SNAT + boot-reconcile 2026-06-15 10:21:31 +01:00
runbook harden(runner): pin forge.azrak.io to LAN + auto-heal a wedged forgejo_runner 2026-06-30 02:26:48 +01:00
scripts init: README + runbook (steps 1-3) + tang LAN setup script + azrak backup handoff prompt 2026-05-26 11:27:15 +01:00
stack Merge branch 'fix/technitium-public-doh' into trunk 2026-06-16 14:03:08 +01:00
.gitignore feat(traefik): public-edge Traefik v3 on optimus, ACME via Cloudflare DNS-01 2026-05-26 13:19:38 +01:00
README.md init: README + runbook (steps 1-3) + tang LAN setup script + azrak backup handoff prompt 2026-05-26 11:27:15 +01:00

optimus — public-edge VPS

A single Hetzner AX41-NVMe dedicated server in HEL1. It is the public ingress for all azrak.io / opmail.io / ivoryghst.io traffic, runs with full-disk LUKS encryption, and acts as a replication peer to r2-d2, c-3po, and pi.

At-a-glance

Host: optimus (Hetzner AX41-NVMe @ HEL1)
Public IPv4: 65.108.206.224/26
Public IPv6: 2a01:4f9:1a:a2db::2/64
CPU: AMD Ryzen 5 3600 (6c/12t)
RAM: 64 GB DDR4 ECC
Storage: 2× 512 GB NVMe (RAID1 LUKS)
OS: Debian 12 bookworm
Encryption: LUKS2 full-disk + clevis-tang LAN-unlock + dropbear-initramfs fallback
VPN: UniFi WireGuard site-to-site to LAN

Role

  1. Public TLS edge: Traefik terminates TLS with Let's Encrypt certificates; all public hostnames point here.
  2. Full replication peer: Stalwart cluster member, Forgejo warm standby, AdGuard 4th replica, Postgres/Patroni 4th member, Redis 4th sentinel.
  3. Static SMTP IP: outbound mail leaves from this address, so sender reputation stays tied to one clean IP.
  4. DDoS absorber: the provider edge handles volumetric attacks before they reach the LAN.

Encryption story

LUKS2 full-disk encryption. The OS partition and everything else on disk is encrypted; only /boot (kernel + initramfs) is plaintext.

Boot flow:

  1. Power on / reboot
  2. BIOS → grub → kernel + initramfs (from /boot)
  3. initramfs brings up network (eth0 + static IPv4/IPv6 from rescue config)
  4. clevis-tang queries the 3 LAN tang servers (r2-d2, c-3po, pi) over the public internet. Tang's key-recovery exchange never sends the key itself, so it stays cryptographically secure over plain HTTP (see runbook/clevis-tang.md).
  5. Shamir Secret Sharing policy with t=1: any 1 of the 3 tangs being reachable is enough to reconstruct the unlock material.
  6. The LUKS volume unlocks, root mounts, and normal boot continues.
  7. Services start
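
Steps 4 and 5 correspond to a clevis `sss` pin wrapping three `tang` pins. The sketch below shows how such a policy could be assembled and bound. The tang URLs, the port, and the /dev/md1 device name are placeholders assumed for illustration, not values from this repo, and the bind command is only printed, never executed.

```shell
#!/bin/sh
# Build a clevis SSS policy: threshold t over a list of tang URLs.
# Usage: sss_policy THRESHOLD URL...
sss_policy() {
    t=$1
    shift
    pins=""
    for url in "$@"; do
        # Append a comma only when a pin has already been added.
        pins="${pins}${pins:+,}{\"url\":\"${url}\"}"
    done
    printf '{"t":%s,"pins":{"tang":[%s]}}\n' "$t" "$pins"
}

# Hypothetical tang endpoints; substitute the real ones from the runbook.
POLICY=$(sss_policy 1 \
    http://tang-r2-d2.example:7500 \
    http://tang-c-3po.example:7500 \
    http://tang-pi.example:7500)

# Dry run: print the bind command instead of executing it.
# /dev/md1 is an assumed name for the LUKS device on the RAID1 array.
echo "clevis luks bind -d /dev/md1 sss '$POLICY'"
```

After a real bind, `clevis luks list -d <device>` should show the sss pin and its threshold. With t=1 the policy favors availability: a single reachable tang is sufficient, at the cost of each tang being individually able to release the unlock material.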

Fallback: if all 3 LAN tangs are unreachable (e.g., during a prolonged LAN outage combined with a forced VPS reboot), dropbear-initramfs lets you SSH into the initramfs and type the LUKS passphrase manually.
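
A minimal sketch of the manual fallback, assuming a stock Debian setup where the initramfs ships `cryptroot-unlock` (from cryptsetup-initramfs) and dropbear accepts root logins by key. The helper name and the separate known_hosts path are hypothetical conveniences, and the helper only prints the command so it can be reviewed before use.

```shell
#!/bin/sh
# Print (not run) the SSH command for a manual unlock from the initramfs.
# Usage: initramfs_unlock_cmd HOST [PORT]
initramfs_unlock_cmd() {
    host=$1
    port=${2:-22}
    # A separate known_hosts file avoids host-key clashes: dropbear in the
    # initramfs usually presents a different host key than the booted sshd.
    known_hosts="$HOME/.ssh/known_hosts.initramfs"
    # -t allocates a TTY so cryptroot-unlock can prompt for the passphrase.
    printf 'ssh -t -p %s -o UserKnownHostsFile=%s root@%s cryptroot-unlock\n' \
        "$port" "$known_hosts" "$host"
}

# Example with the public IPv4 from the At-a-glance table.
initramfs_unlock_cmd 65.108.206.224
```

Once the passphrase is accepted, the initramfs resumes the boot and the dropbear session closes; the regular sshd takes over after services start.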

The provider cannot decrypt the disk at rest: no key material is stored on the VPS disk. This is about as close as commodity hosting gets to "they can't spy" without paying $200+/mo for confidential VMs.

Replication topology (TBD as we build)

Service | Primary | Replicas
Stalwart Mail | r2-d2 + c-3po (Swarm cluster, 2/2) | + optimus (3rd cluster member)
Forgejo | r2-d2 (live) | c-3po (lsyncd standby) + optimus (lsyncd standby)
AdGuard DNS | pi (HA addon) | r2-d2 + c-3po + optimus (4-way replica via adguardhome-sync)
Postgres (Patroni) | current leader varies | + optimus as 4th member (etcd-4 + patroni-4)
Redis | c-3po (Sentinel master) | + optimus as 4th sentinel + replica
TimescaleDB (monitoring) | r2-d2 | c-3po (streaming standby) + optimus (TBD)
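
One consequence of adding a 4th member is worth spelling out. Majority-based systems such as etcd need floor(n/2)+1 votes, so going from 3 to 4 members raises the quorum without raising the number of failures tolerated. The sketch below is only that arithmetic; it is not taken from this repo's tooling.

```shell
#!/bin/sh
# Majority quorum for an n-member cluster: floor(n/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Members that can fail while a majority still remains.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

For 3, 4, and 5 members this prints quorums of 2, 3, and 3 with 1, 1, and 2 tolerated failures. A 4-member etcd cluster therefore survives the loss of one member, the same as a 3-member one; the same majority rule applies when Redis Sentinel authorizes a failover.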

Repo layout

  • runbook/ — step-by-step bringup + ops procedures
  • configs/ — Traefik static config, clevis-tang policy, WireGuard, etc.
  • scripts/ — provisioning helpers + restore-from-backup scripts

Operations notes

  • Reboot during LAN outage: the VPS boots as far as the initramfs, but LUKS won't auto-unlock. SSH into the initramfs on port 22 of the public IP and type the passphrase manually (or wait for the LAN to recover).
  • Tang server outage on LAN: with the t=1 policy, unlock works as long as 1 of the 3 tangs is alive. Only all 3 being down at the same time as a VPS reboot is a problem.
  • Service replication state: services that need data keep serving from RAM-cached state during a LAN outage (no fresh writes until the LAN returns and the lookup store is reachable). See the per-service runbooks.
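
Before a planned reboot it can help to confirm that the t=1 condition currently holds. The sketch below counts reachable tang servers by fetching each server's advertisement at `/adv`, the endpoint tang serves over HTTP. The function names and the `PROBE` override are hypothetical, and the tang URLs are left to the caller because they are not listed in this README.

```shell
#!/bin/sh
# Probe one tang server: a healthy tang answers GET /adv with its
# signed key advertisement.
probe_tang() {
    curl -fsS --max-time 5 -o /dev/null "$1/adv"
}

# Count reachable tang servers and compare against the SSS threshold.
# Usage: unlock_possible THRESHOLD URL...
# Set PROBE to another command name to replace the curl-based probe.
unlock_possible() {
    t=$1
    shift
    alive=0
    for url in "$@"; do
        if "${PROBE:-probe_tang}" "$url"; then
            alive=$((alive + 1))
        fi
    done
    echo "$alive of $# tang servers reachable (need $t)"
    # Exit status 0 means auto-unlock should succeed on the next boot.
    [ "$alive" -ge "$t" ]
}
```

A call such as `unlock_possible 1 <tang-url-1> <tang-url-2> <tang-url-3>` returns non-zero when fewer than one server answers, which is the situation where a reboot would fall back to the manual passphrase. Run it from the VPS itself so the check follows the same network path the initramfs would use.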