Hi,
if a network in the cluster is marked as required, any host on which it is
unavailable will become Non-Operational. You can set the logical network to
not be required while testing, or if it isn't essential.
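If you prefer to script it, a minimal sketch using the oVirt REST API (the
engine URL, cluster ID, and network ID are placeholders you would look up first):

    # Mark the network as not required in the cluster (IDs are hypothetical)
    curl -k -u admin@internal:PASSWORD \
      -X PUT -H 'Content-Type: application/xml' \
      -d '<network><required>false</required></network>' \
      https://engine.example.com/ovirt-engine/api/clusters/CLUSTER_ID/networks/NETWORK_ID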
Regards,
Paul S.
________________________________
From: Strahil <hunter86_bg(a)yahoo.com>
Sent: 27 December 2019 12:18
To: zhouhao <zhouhao(a)vip.friendtimes.net>
Cc: users <users(a)ovirt.org>
Subject: [ovirt-users] Re: null
I do use the automatic migration policy.
The main questions you have to solve are:
1. Why did the nodes become 'Non-Operational'? Usually this happens when the
management interface (in your case the HostedEngine VM) cannot reach the nodes over the
management network.
By default, management goes over the ovirtmgmt network. I guess you created the
new network, marked it as the management network, and then the switch went down,
causing the 'Non-Operational' state.
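To verify, you can check the management bridge and engine reachability on each
node, for example (the engine hostname is an example):

    # Is the management bridge up on this node?
    ip -br link show ovirtmgmt
    # Can the node reach the engine over the management network?
    ping -c 3 engine.example.com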
2. Migrating VMs is usually a safe approach, but this behavior is quite strange. If a node
is Non-Operational, there can be no successful migration.
3. Some of the VMs got paused due to a storage issue. Are you using GlusterFS, NFS, or
iSCSI? If so, you need to clarify why you lost your storage.
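A few quick checks, depending on the backend (the volume name is an example):

    # GlusterFS: volume and self-heal status
    gluster volume status data
    gluster volume heal data info
    # Any backend: look for storage errors around the outage in the VDSM log
    grep -iE 'storage|i/o error' /var/log/vdsm/vdsm.log | tail -n 50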
I guess for now you can mark each VM to be migrated only manually (VM -> Edit) and, if
they are critical VMs, enable High Availability in each VM's Edit options.
In that case, if a node fails, the VMs will be restarted on another node.
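If you have many VMs, the same settings can be scripted via the REST API; a
sketch (the VM ID and password are placeholders):

    # Allow only manual migration and enable High Availability for one VM
    curl -k -u admin@internal:PASSWORD \
      -X PUT -H 'Content-Type: application/xml' \
      -d '<vm><placement_policy><affinity>user_migratable</affinity></placement_policy><high_availability><enabled>true</enabled></high_availability></vm>' \
      https://engine.example.com/ovirt-engine/api/vms/VM_ID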
4. Have you set up node fencing? For example, APC, iLO, iDRAC and other fencing mechanisms
allow the HostedEngine to use another host as a fencing proxy and to reset the problematic
hypervisor.
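Fence agents are normally added per host in the Admin Portal (Compute > Hosts >
Edit > Power Management); as an API sketch, assuming an IPMI-capable BMC (all
values below are placeholders):

    # Register an IPMI fence agent for a host (address/credentials hypothetical)
    curl -k -u admin@internal:PASSWORD \
      -X POST -H 'Content-Type: application/xml' \
      -d '<agent><type>ipmilan</type><address>10.0.0.10</address><username>root</username><password>SECRET</password><order>1</order></agent>' \
      https://engine.example.com/ovirt-engine/api/hosts/HOST_ID/fenceagents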
P.S.: You can define the following alias in '~/.bashrc':

    alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'

Then you can verify your VMs even when the HostedEngine is down:

    virsh list --all
Best Regards,
Strahil Nikolov
On Dec 27, 2019 08:40, zhouhao(a)vip.friendtimes.net wrote:
I had a crash yesterday in my oVirt cluster, which is made up of 3 nodes.
I just tried to add a new network, but the whole cluster crashed.
I added a new network to my cluster, but while I was debugging the new switch, the
switch was powered off; the nodes detected the NIC status as down and then moved to the
Non-Operational state.
At this point all 3 nodes had moved to the Non-Operational state.
All virtual machines started automatic migration. By the time I received the alert email, all
virtual machines were suspended.
Within 15 minutes the new switch was powered up again. The 3 oVirt nodes became active again,
but many virtual machines became unresponsive or stayed suspended due to the forced migration,
and only a few virtual machines were brought up again after their migrations were cancelled.
After I tried to terminate the migration tasks and restart the ovirt-engine service, I was
still unable to restore most of the virtual machines, so I had to restart the 3 oVirt nodes to
restore my virtual machines.
I didn't recover all the virtual machines until an hour later.
Then I changed my migration policy to "Do Not Migrate Virtual Machines".
Which migration policy do you recommend?
I'm afraid to use the cluster...
________________________________
zhouhao(a)vip.friendtimes.net