High Availability
Configure and manage Proxmox VE HA: groups, resource policies, and failover behavior.
Proxmox VE HA automatically restarts VMs on a healthy node when a node failure is detected. Cloud-PVE pre-configures the HA stack (Corosync, fencing, watchdog) for you.
How HA works
- Corosync monitors cluster heartbeats between nodes.
- If a node misses heartbeats beyond the timeout, it is declared offline.
- Fencing (STONITH) isolates the failed node (power-off via IPMI/iDRAC) to prevent split-brain.
- HA Manager restarts the VMs that were running on the failed node on surviving nodes.
The entire process takes 20–60 seconds depending on your watchdog and fencing configuration.
Enabling HA for a VM
- Go to Datacenter → HA → Resources
- Click Add
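- Select the VM and set:
  - Max Restart: maximum number of attempts to restart the VM on its current node after a failed start (default: 1)
  - Max Relocate: maximum number of attempts to relocate the VM to another node once the restart attempts are exhausted (default: 1)
  - Group: the HA group to assign the VM to (optional)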
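The same settings can also be applied from a shell on a cluster node with `ha-manager add`. The sketch below is only an illustration: the helper `ha_add_cmd` is hypothetical, and the VM ID `100` and group name `production` are example values. It prints the command for review instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print the ha-manager command that registers a VM
# as an HA resource, so it can be reviewed before it is run.
# Arguments: VM ID, max restarts, max relocations, optional group name.
ha_add_cmd() {
    vmid=$1
    max_restart=${2:-1}
    max_relocate=${3:-1}
    group=${4:-}

    cmd="ha-manager add vm:${vmid} --state started --max_restart ${max_restart} --max_relocate ${max_relocate}"
    # The group is optional, so only append it when one was given.
    if [ -n "$group" ]; then
        cmd="${cmd} --group ${group}"
    fi
    printf '%s\n' "$cmd"
}

# VM ID 100 and group "production" are example values, not from a real cluster.
ha_add_cmd 100 1 1 production
```

Once the printed command looks right, it can be run as root on any cluster node.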
HA Groups
HA groups define node preferences for VM placement. Go to Datacenter → HA → Groups:
Group: production
Nodes: node1:3, node2:2, node3:1
Nodes with higher priority numbers are preferred: VMs in this group run on node1 when it is available, fall back to node2, and then to node3.
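To make the ordering rule concrete, the following sketch sorts a `node:priority` list from most to least preferred. The helper `preferred_order` is hypothetical and only mimics the rule described above; it does not query the cluster.

```shell
#!/bin/sh
# Hypothetical helper: list the nodes of an HA group from most to least
# preferred, given the same "node:priority" list used in the group definition.
# Steps: drop spaces, put one pair per line, sort by priority (highest
# first), then keep only the node name.
preferred_order() {
    printf '%s\n' "$1" |
        tr -d ' ' |
        tr ',' '\n' |
        sort -t: -k2,2nr |
        cut -d: -f1
}

# Prints node1, node2, node3, one per line.
preferred_order "node1:3, node2:2, node3:1"
```

Assuming your Proxmox VE release provides the `groupadd` subcommand, the group itself could be created from the CLI with `ha-manager groupadd production --nodes "node1:3,node2:2,node3:1"`.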
Resource states
| State | Meaning |
|---|---|
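| started | VM should be running; HA keeps it running and recovers it after a failure |
| stopped | VM should be stopped; HA does not start it, but still relocates it if its node fails |
| disabled | VM is stopped and HA does not relocate it if its node fails |
| ignored | HA does not manage this VM at all; it must be started and stopped manually |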
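The requested state of a resource can be changed with `ha-manager set`. The sketch below uses a hypothetical helper, `ha_set_state_cmd`, that accepts only the four states in the table and prints the resulting command for review; VM ID `100` is an example value.

```shell
#!/bin/sh
# Hypothetical helper: print the ha-manager command that changes the
# requested state of a VM, rejecting anything outside the four HA states.
ha_set_state_cmd() {
    vmid=$1
    state=$2
    case "$state" in
        started|stopped|disabled|ignored)
            printf 'ha-manager set vm:%s --state %s\n' "$vmid" "$state"
            ;;
        *)
            printf 'unknown HA state: %s\n' "$state" >&2
            return 1
            ;;
    esac
}

# Example: take VM 100 (hypothetical ID) out of service for maintenance.
ha_set_state_cmd 100 disabled
```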
Testing failover
To test HA without a real hardware failure:
# On the node to test (run as root)
systemctl stop pve-cluster corosync
Watch the Datacenter → HA view. Within about 30 seconds, your VMs should appear on another node.
Important: Only simulate failure on one node at a time. With a 3-node cluster, losing 2 nodes simultaneously breaks quorum.
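The quorum rule behind this warning is simple majority arithmetic. The sketch below assumes one vote per node; the function names are illustrative only.

```shell
#!/bin/sh
# Votes needed for quorum, assuming one vote per node (a strict majority).
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can fail while the remaining nodes still hold quorum.
tolerated_failures() {
    echo $(( $1 - $(quorum_votes "$1") ))
}

for nodes in 3 4 5; do
    echo "nodes=$nodes quorum=$(quorum_votes "$nodes") tolerated_failures=$(tolerated_failures "$nodes")"
done
```

For a 3-node cluster this gives a quorum of 2 votes and one tolerated failure, which is why taking down a second node stops HA from acting.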
Monitoring HA
Check the HA status:
ha-manager status
View HA logs:
journalctl -u pve-ha-lrm -n 50
journalctl -u pve-ha-crm -n 50
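For a quick health check, the status output can be filtered for resources that are not running. This is a rough sketch: the helper `ha_not_started` is hypothetical, and it assumes that each service line in the `ha-manager status` output has the form `service vm:100 (node1, started)`.

```shell
#!/bin/sh
# Hypothetical helper: read `ha-manager status` output on stdin and print
# every service line whose state is not "started".
# Assumes service lines look like: service vm:100 (node1, started)
ha_not_started() {
    awk '$1 == "service" && $NF != "started)" { print }'
}
```

Run it as `ha-manager status | ha_not_started`; no output means every HA resource reports the `started` state.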