[ovirt-users] master storage domain stuck in locked state

Yaniv Kaul ykaul at redhat.com
Sun Jan 22 21:45:45 UTC 2017


On Jan 22, 2017 10:13 PM, "Bill Bill" <jax2568 at outlook.com> wrote:

Hello,



It was 4.0.5. However, we’ve decided to pull the plug on oVirt for now, as
it’s too risky: this issue could take down a large number of servers.
I think oVirt should be a little less “picky”, if you will, about
storage connections. For example, this specific issue prevented anything
storage-related from being done. Because the “master” was locked, you cannot:



Add other storage

Activate hosts

Start VM’s

Reinitialize the datacenter

Remove storage



These points above are huge. While oVirt is indeed open source, upstream of
RHEV, and doesn’t cost anything, I feel that scenarios like this could
be the downfall of oVirt itself, making it too risky.



The logging with oVirt seems to be crazy, though. We’ve been testing it now
for about 2.5 years, maybe 3 years? Once oVirt gets into a state where it
cannot connect to something, it just goes haywire. Many likely don’t see
this; however, every time these things happened, it was when we were testing
failover scenarios to see how oVirt responds.



A few recommendations I would make are:


Thank you for your recommendations. I agree with some, wholly disagree with
others.
I'd still appreciate it if you could send us the requested logs.

TIA,
Y.



Drop the whole “master” storage thing; it complicates setting storage up.
Either connect, or don’t connect. If there are connectivity issues, oVirt
gets hung up on switching to this “master” storage. If you have a single
storage domain, you’ll likely have problems as we’ve experienced, because
once oVirt cannot find the “master” it begins to go berserk, then spirals
out of control from there. It might not on small setups with a few
hypervisors, but on an install with a few hundred VMs, a large number of
hypervisors, etc., it seems to get ugly real quick.



Stop trying to reconnect things; I think that’s what I’m looking for. When
something fails, oVirt just goes in a loop over and over, which eventually
causes dashboard issues, crazy amounts of logs, etc. It would be better if
oVirt would just stop, make a log entry and then quit, maybe after a few
tries.
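
As an illustration of the "try a few times, log, then quit" behaviour suggested above, here is a minimal Python sketch. It is not how oVirt or VDSM handle reconnection; `connect_with_retry`, `connect`, `max_attempts`, `base_delay` and `sleep` are hypothetical names made up for this example.

```python
import time


def connect_with_retry(connect, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call connect() up to max_attempts times, then give up.

    All names here are hypothetical; nothing in this sketch comes from
    oVirt or VDSM.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError as exc:
            last_error = exc
            print(f"attempt {attempt}/{max_attempts} failed: {exc}")
            if attempt < max_attempts:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** (attempt - 1))
    # Report once and stop, instead of looping forever.
    raise RuntimeError(f"giving up after {max_attempts} attempts") from last_error
```

The caller decides what "quit" means (for example, marking the domain inactive and waiting for an administrator), which keeps the retry policy separate from the recovery policy.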



In my case, I could mount the storage manually to ALL hosts, I could even
force start the VM’s with virsh. The oVirt dashboard just kept saying it
was locked, and wouldn’t let you do anything at all with the entire
datacenter.



At this time, we’ve pushed these servers back into production using our
current hypervisor software, which is stable but does not have the benefits
of oVirt. oVirt will be revisited later on and is still in use for
non-production things.





*From: *Maor Lipchuk <mlipchuk at redhat.com>
*Sent: *Sunday, January 22, 2017 7:33 AM
*To: *Bill Bill <jax2568 at outlook.com>
*Cc: *users <users at ovirt.org>
*Subject: *Re: [ovirt-users] master storage domain stuck in locked state




On Sun, Jan 22, 2017 at 2:31 PM, Maor Lipchuk <mlipchuk at redhat.com> wrote:

> Hi Bill,
>
> Can you please attach the engine and VDSM logs?
> Is the storage domain still stuck?
>

Also which oVirt version are you using?


>
> Regards,
> Maor
>
> On Sat, Jan 21, 2017 at 3:11 AM, Bill Bill <jax2568 at outlook.com> wrote:
>
>>
>>
>> Also cannot reinitialize the datacenter because the storage domain is
>> locked.
>>
>>
>>
>> *From: *Bill Bill <jax2568 at outlook.com>
>> *Sent: *Friday, January 20, 2017 8:08 PM
>> *To: *users <users at ovirt.org>
>> *Subject: *RE: master storage domain stuck in locked state
>>
>>
>>
>> Spoke too soon. Some hosts came back up but the storage domain is still
>> locked, so no VMs can be started. What is the proper way to force this to
>> be unlocked? Each time we look to move into production after successful
>> testing, something like this always seems to pop up at the last minute,
>> rendering oVirt questionable in terms of reliability due to some unknown
>> issue.
>>
>>
>>
>>
>>
>>
>>
>> *From: *Bill Bill <jax2568 at outlook.com>
>> *Sent: *Friday, January 20, 2017 7:54 PM
>> *To: *users <users at ovirt.org>
>> *Subject: *RE: master storage domain stuck in locked state
>>
>>
>>
>>
>>
>> So apparently something didn’t change the metadata to master before the
>> connection was lost. I changed the metadata role to master and it came
>> back up. Seems emailing in helped, because every time I can’t figure
>> something out, I email in and find it shortly after.
>>
>>
>>
>>
>>
>> *From: *Bill Bill <jax2568 at outlook.com>
>> *Sent: *Friday, January 20, 2017 7:43 PM
>> *To: *users <users at ovirt.org>
>> *Subject: *master storage domain stuck in locked state
>>
>>
>>
>> No clue how to get this out. I can mount all storage manually on the
>> hypervisors. It seems like after a reboot oVirt is now having some issue
>> and the storage domain is stuck in locked state. Because of this, can’t
>> activate any other storage either, so the other domains are in maintenance
>> and the master sits in locked state, has been for hours.
>>
>>
>>
>> This sticks out on a hypervisor:
>>
>>
>>
>> StoragePoolWrongMaster: Wrong Master domain or its version:
>> u'SD=d8a0172e-837f-4552-92c7-566dc4e548e4,
>> pool=3fd2ad92-e1eb-49c2-906d-00ec233f610a'
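
When comparing this error against what the engine and the domain metadata report, it can help to pull the two UUIDs out of the message programmatically. The following is a minimal Python sketch that assumes the message has exactly the shape quoted above, with the pool UUID that was wrapped across two lines rejoined; `parse_wrong_master` is a hypothetical helper, not part of VDSM.

```python
import re

# Message as reported in the thread, with the line-wrapped pool UUID rejoined.
MESSAGE = (
    "StoragePoolWrongMaster: Wrong Master domain or its version: "
    "u'SD=d8a0172e-837f-4552-92c7-566dc4e548e4, "
    "pool=3fd2ad92-e1eb-49c2-906d-00ec233f610a'"
)

# Canonical lowercase 8-4-4-4-12 UUID layout.
UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
PATTERN = re.compile(rf"SD=(?P<sd>{UUID}),\s*pool=(?P<pool>{UUID})")


def parse_wrong_master(message):
    """Return (storage_domain_uuid, pool_uuid), or None if nothing matches."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return match.group("sd"), match.group("pool")


if __name__ == "__main__":
    print(parse_wrong_master(MESSAGE))
```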
>>
>>
>>
>> Not sure, nothing changed other than a reboot of the storage.
>>
>>
>>
>> Engine log shows:
>>
>>
>>
>> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>> (DefaultQuartzScheduler8) [5696732b] START, SetVdsStatusVDSCommand(HostName
>> = U31U32NodeA, SetVdsStatusVDSCommandParameters:{runAsync='true',
>> hostId='70e2b8e4-0752-47a8-884c-837a00013e79', status='NonOperational',
>> nonOperationalReason='STORAGE_DOMAIN_UNREACHABLE',
>> stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 6db9820a
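
The command parameters in an engine log entry like this one are written as `key='value'` pairs, so they can be extracted with a short regular expression when sifting through a large engine.log. This is a rough Python sketch that assumes the layout of the entry quoted above; `parse_params` is a hypothetical helper and other engine log entries may be formatted differently.

```python
import re

# The engine log entry quoted in the thread, joined into a single line.
LOG_LINE = (
    "[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] "
    "(DefaultQuartzScheduler8) [5696732b] START, SetVdsStatusVDSCommand(HostName "
    "= U31U32NodeA, SetVdsStatusVDSCommandParameters:{runAsync='true', "
    "hostId='70e2b8e4-0752-47a8-884c-837a00013e79', status='NonOperational', "
    "nonOperationalReason='STORAGE_DOMAIN_UNREACHABLE', "
    "stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 6db9820a"
)


def parse_params(line):
    """Collect every key='value' pair on the line into a dict."""
    return dict(re.findall(r"(\w+)='([^']*)'", line))


if __name__ == "__main__":
    params = parse_params(LOG_LINE)
    print(params["hostId"], params["status"], params["nonOperationalReason"])
```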
>>
>>
>>
>> No idea why it says unreachable; it certainly is reachable, because I can
>> manually mount ALL storage to the hypervisor.
>>
>>
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>
