I had a crash yesterday in my oVirt cluster, which is made up of 3 nodes.

All I did was try to add a new network, but the whole cluster crashed.

I added a new network to my cluster. While I was debugging the new switch, the switch was powered off; the nodes detected that the network card link was down and moved to the Non-Operational state.



At that point all 3 nodes had moved to the Non-Operational state.

All virtual machines started migrating automatically. By the time I received the alert email, all of the virtual machines were suspended.





About 15 minutes later my new switch was powered up again. The 3 oVirt nodes became active again, but many virtual machines had become unresponsive or suspended because of the forced migration, and only a few virtual machines came back up because their migration had been cancelled.

Even after I tried terminating the migration tasks and restarting the ovirt-engine service, I was still unable to restore most of the virtual machines, so I had to restart all 3 oVirt nodes to restore my virtual machines.

I didn't recover all the virtual machines until an hour later.


I then changed my migration policy to "Do Not Migrate Virtual Machines".

Which migration policy do you recommend?

I'm afraid to use the cluster...


zhouhao@vip.friendtimes.net