2022-05-12

I think I found the problem

As best I can tell, when I run compute loads I hit power envelope and/or memory limits that cause OSD flapping in my Ceph cluster, which causes recovery events that overload my storage network bandwidth, which causes wait-time failures, which stall Corosync activity, leading to a loss of quorum for Proxmox, which may or may not be why my Pi-hole LXC containers keep locking up.

I think Kubernetes is next. Right after I figure out a logging solution.

#homelab #flex #TestingInProduction #masochism #hireme