[ovirt-users] master storage domain stuck in locked state

Bill Bill jax2568 at outlook.com
Sun Jan 22 20:07:09 UTC 2017


Hello,

It was 4.0.5. However, we’ve decided to pull the plug on oVirt for now, as it’s too risky: this issue could take down a large number of servers. I think oVirt should be a little less “picky”, if you will, about storage connections. For example, this specific issue prevented anything storage-related from being done. Because the “master” was locked, you cannot:

Add other storage
Activate hosts
Start VM’s
Reinitialize the datacenter
Remove storage

These points above are huge. While oVirt is indeed open source, upstream of RHEV, and doesn’t cost anything, I feel that scenarios like this could be the downfall of oVirt, because they make it too risky.

The logging with oVirt seems to be crazy though. We’ve been testing it now for about 2.5 years, maybe 3. Once oVirt gets into a state where it cannot connect to something, it just goes haywire. Many users likely don’t see this; however, every time these things happened, it was while we were testing failover scenarios to see how oVirt responds.

A few recommendations I would make are:

Drop the whole “master” storage thing, as it complicates setting storage up. Either connect, or don’t connect. If there are connectivity issues, oVirt gets hung up on switching to this “master” storage. If you have a single storage domain, you’ll likely have problems as we’ve experienced, because once oVirt cannot find the “master” it begins to go berserk, then spirals out of control from there. It might not on small setups with a few hypervisors, but on an install with a few hundred VMs, a large number of hypervisors, etc., it seems to get ugly real quick.

Stop trying to reconnect things; I think that’s what I’m looking for. When something fails, oVirt just goes in a loop over and over, which eventually causes dashboard issues, crazy amounts of logs, etc. It would be better if oVirt would just make a log entry and then quit, maybe after a few attempts.
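
As a rough illustration of the "try a few times, log it, then stop" behaviour suggested above, here is a minimal Python sketch. It is not oVirt or VDSM code: `connect_with_limit`, its parameters, and the `connect` callable are hypothetical names used only for this example, and the doubling delay between attempts is an assumption rather than anything oVirt does.

```python
import logging
import time

log = logging.getLogger("reconnect-sketch")


def connect_with_limit(connect, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call connect() at most max_attempts times, then give up.

    `connect` is any callable that raises OSError on failure (hypothetical
    stand-in for a storage connection attempt). Returns True on success and
    False once the attempt limit is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            connect()
            return True
        except OSError as exc:
            # One log entry per failed attempt, not an endless stream.
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt < max_attempts:
                # Wait longer before each retry: base, 2*base, 4*base, ...
                sleep(base_delay * 2 ** (attempt - 1))
    # Final entry, after which the caller is expected to stop retrying.
    log.error("giving up after %d attempts", max_attempts)
    return False
```

The point of the sketch is only that the loop is bounded: after the last attempt there is one final log entry and control returns to the caller instead of retrying forever.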

In my case, I could mount the storage manually to ALL hosts, I could even force start the VM’s with virsh. The oVirt dashboard just kept saying it was locked, and wouldn’t let you do anything at all with the entire datacenter.

At this time, we’ve pushed these servers back into production using our current hypervisor software which is stable but does not have the benefits of oVirt. It’ll be revisited later on and is still in use for non-production things.


From: Maor Lipchuk<mailto:mlipchuk at redhat.com>
Sent: Sunday, January 22, 2017 7:33 AM
To: Bill Bill<mailto:jax2568 at outlook.com>
Cc: users<mailto:users at ovirt.org>
Subject: Re: [ovirt-users] master storage domain stuck in locked state



On Sun, Jan 22, 2017 at 2:31 PM, Maor Lipchuk <mlipchuk at redhat.com<mailto:mlipchuk at redhat.com>> wrote:
Hi Bill,

Can you please attach the engine and VDSM logs?
Is the storage domain still stuck?

Also which oVirt version are you using?


Regards,
Maor

On Sat, Jan 21, 2017 at 3:11 AM, Bill Bill <jax2568 at outlook.com<mailto:jax2568 at outlook.com>> wrote:

Also cannot reinitialize the datacenter because the storage domain is locked.

From: Bill Bill<mailto:jax2568 at outlook.com>
Sent: Friday, January 20, 2017 8:08 PM
To: users<mailto:users at ovirt.org>
Subject: RE: master storage domain stuck in locked state

Spoke too soon. Some hosts came back up but the storage domain is still locked, so no VMs can be started. What is the proper way to force this to be unlocked? Each time we look to move into production after successful testing, something like this always seems to pop up at the last minute, rendering oVirt questionable in terms of reliability due to some unknown issue.



From: Bill Bill<mailto:jax2568 at outlook.com>
Sent: Friday, January 20, 2017 7:54 PM
To: users<mailto:users at ovirt.org>
Subject: RE: master storage domain stuck in locked state


So apparently something didn’t change the metadata to master before the connection was lost. I changed the metadata role to master and it came back up. Seems emailing in helped, because every time I can’t figure something out, I email in and find it shortly after.


From: Bill Bill<mailto:jax2568 at outlook.com>
Sent: Friday, January 20, 2017 7:43 PM
To: users<mailto:users at ovirt.org>
Subject: master storage domain stuck in locked state

No clue how to get this out. I can mount all storage manually on the hypervisors. It seems like after a reboot oVirt is now having some issue and the storage domain is stuck in locked state. Because of this, I can’t activate any other storage either, so the other domains are in maintenance and the master sits in locked state, and has been for hours.

This sticks out on a hypervisor:

StoragePoolWrongMaster: Wrong Master domain or its version: u'SD=d8a0172e-837f-4552-92c7-566dc4e548e4, pool=3fd2ad92-e1eb-49c2-906d-00ec233f610a'
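
For anyone searching their own logs for the same error, the storage domain and pool UUIDs can be pulled out of a line like the one above with a short script. This is only a sketch: the regular expression is based solely on the single message quoted here, and `parse_wrong_master` is a hypothetical helper name, not part of VDSM.

```python
import re

# Pattern assumed from the one StoragePoolWrongMaster line quoted above;
# other VDSM versions may format the message differently.
_WRONG_MASTER = re.compile(
    r"StoragePoolWrongMaster:.*?"
    r"SD=(?P<sd>[0-9a-f-]{36}),\s*pool=(?P<pool>[0-9a-f-]{36})"
)


def parse_wrong_master(line):
    """Return (storage_domain_uuid, pool_uuid), or None if the line does not match."""
    match = _WRONG_MASTER.search(line)
    if match is None:
        return None
    return match.group("sd"), match.group("pool")


if __name__ == "__main__":
    sample = (
        "StoragePoolWrongMaster: Wrong Master domain or its version: "
        "u'SD=d8a0172e-837f-4552-92c7-566dc4e548e4, "
        "pool=3fd2ad92-e1eb-49c2-906d-00ec233f610a'"
    )
    print(parse_wrong_master(sample))
```

The two UUIDs are the ones to compare against the storage domain and data center IDs the engine reports.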

Not sure, nothing changed other than a reboot of the storage.

Engine log shows:

[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (DefaultQuartzScheduler8) [5696732b] START, SetVdsStatusVDSCommand(HostName = U31U32NodeA, SetVdsStatusVDSCommandParameters:{runAsync='true', hostId='70e2b8e4-0752-47a8-884c-837a00013e79', status='NonOperational', nonOperationalReason='STORAGE_DOMAIN_UNREACHABLE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 6db9820a

No idea why it says unreachable; it certainly is reachable, because I can manually mount ALL storage on the hypervisor.





