It turned out to be BTRFS mis-managing free space, deciding there was none left, and going into read-only mode. The QEMU/KVM server blocked on disk IO and paused the virtual machines, which meant that they couldn't even respond to pings.
I've setup a cron job to run a weekly balance on the BTRFS filesystem which will prevent this happening again. I've seen similar things in the past but didn't expect them in this case because the filesystem is only 50% full.