<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:#954F72;
        text-decoration:underline;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style>
<div class="WordSection1">
<p class="MsoNormal">Hello,</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">It was 4.0.5. However, we’ve decided to pull the plug on oVirt for now, as it’s too risky: this issue could take down a large number of servers. I think oVirt should be a little less “picky”, if you will, about storage connections.
For example, this specific issue prevented anything storage-related from being done. Because the “master” was locked, you cannot:</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Add other storage</p>
<p class="MsoNormal">Activate hosts</p>
<p class="MsoNormal">Start VMs</p>
<p class="MsoNormal">Reinitialize the datacenter</p>
<p class="MsoNormal">Remove storage</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The points above are huge. While oVirt is indeed open source, upstream of RHEV, and doesn’t cost anything, I feel that scenarios like this make it too risky, and that could be the downfall of oVirt itself.</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The logging with oVirt seems to be crazy, though. We’ve been testing it now for about 2.5 years, maybe 3. Once oVirt gets into a state where it cannot connect to something, it just goes haywire. Many people likely don’t see this; however,
every time these things happened, it was while we were testing failover scenarios to see how oVirt responds.</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">A few recommendations I would make are:</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Drop the whole “master” storage thing, since it complicates setting storage up. Either connect, or don’t connect. If there are connectivity issues, oVirt gets hung up on switching to this “master” storage. If you have a single storage domain,
you’ll likely have problems like the ones we’ve experienced, because once oVirt cannot find the “master” it begins to go berserk and spirals out of control from there. It might not happen on small setups with a few hypervisors, but on an install with a few hundred VMs, a large number
of hypervisors, etc., it seems to get ugly real quick.</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Stop trying to reconnect things endlessly; I think that’s what I’m looking for. When something fails, oVirt just retries in a loop over and over, which eventually causes dashboard issues, crazy amounts of logs, etc. It would be better if oVirt would
retry a few times, then make a log entry and stop.</p>
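<p class="MsoNormal">To illustrate the kind of behaviour I mean, here is a minimal Python sketch of a bounded retry. It is only an illustration and not how oVirt or VDSM is actually implemented; the names connect_with_limit, max_attempts and delay_seconds are hypothetical and made up for this example.</p>
<pre>
```python
import logging
import time

log = logging.getLogger("reconnect_sketch")


def connect_with_limit(connect, max_attempts=3, delay_seconds=0.0):
    """Call connect() up to max_attempts times, then give up.

    Returns True on the first success. After the last failure it logs a
    single error and returns False instead of looping forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            connect()
            return True
        except OSError as exc:
            log.warning("attempt %d of %d failed: %s", attempt, max_attempts, exc)
            # Only wait if another attempt is still to come.
            if attempt != max_attempts:
                time.sleep(delay_seconds)
    log.error("giving up after %d attempts", max_attempts)
    return False


# Hypothetical stand-in for a storage connection that never succeeds.
def always_down():
    raise OSError("storage unreachable")


result = connect_with_limit(always_down, max_attempts=3)
print(result)
```
</pre>
<p class="MsoNormal">With a connection that never succeeds, this tries three times, writes one final error and returns False, so whatever called it can mark the storage as failed and move on rather than retrying forever.</p>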
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">In my case, I could mount the storage manually on ALL hosts, and I could even force-start the VMs with virsh. The oVirt dashboard just kept saying the storage domain was locked, and wouldn’t let me do anything at all with the entire datacenter.</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">At this time, we’ve pushed these servers back into production using our current hypervisor software, which is stable but does not have the benefits of oVirt. We’ll revisit oVirt later on, and it is still in use for non-production things.</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="mso-element:para-border-div;border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="border:none;padding:0in"><b>From: </b><a href="mailto:mlipchuk@redhat.com">Maor Lipchuk</a><br>
<b>Sent: </b>Sunday, January 22, 2017 7:33 AM<br>
<b>To: </b><a href="mailto:jax2568@outlook.com">Bill Bill</a><br>
<b>Cc: </b><a href="mailto:users@ovirt.org">users</a><br>
<b>Subject: </b>Re: [ovirt-users] master storage domain stuck in locked state</p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Sun, Jan 22, 2017 at 2:31 PM, Maor Lipchuk <span dir="ltr">
<<a href="mailto:mlipchuk@redhat.com" target="_blank">mlipchuk@redhat.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Hi Bill,
<div><br>
</div>
<div>Can you please attach the engine and VDSM logs.</div>
<div>Is the storage domain still stuck?</div>
</div>
</blockquote>
<div><br>
</div>
<div>Also which oVirt version are you using?</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div><br>
</div>
<div>Regards,</div>
<div>Maor</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">
<div>
<div class="h5">On Sat, Jan 21, 2017 at 3:11 AM, Bill Bill <span dir="ltr"><<a href="mailto:jax2568@outlook.com" target="_blank">jax2568@outlook.com</a>></span> wrote:<br>
</div>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div class="h5">
<div lang="EN-US" link="blue" vlink="#954F72">
<div class="m_723276570252969701m_-2288590467547947270WordSection1">
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">Also cannot reinitialize the datacenter because the storage domain is locked.</p>
<p class="MsoNormal"><u></u> <u></u></p>
<div style="border:none;border-top:solid #e1e1e1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="border:none;padding:0in"><b>From: </b><a href="mailto:jax2568@outlook.com" target="_blank">Bill Bill</a><br>
<b>Sent: </b>Friday, January 20, 2017 8:08 PM<span><br>
<b>To: </b><a href="mailto:users@ovirt.org" target="_blank">users</a><br>
<b>Subject: </b>RE: master storage domain stuck in locked state</span></p>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<span>
<div>
<div class="m_723276570252969701m_-2288590467547947270WordSection1">
<p class="MsoNormal">Spoke too soon. Some hosts came back up, but the storage domain is still locked, so no VMs can be started. What is the proper way to force this to be unlocked? Each time we look to move into production after successful testing, something
like this always seems to pop up at the last minute, rendering oVirt questionable in terms of reliability because of some unknown issue.</p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<div style="border:none;border-top:solid #e1e1e1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="border:none;padding:0in"><b>From: </b><a href="mailto:jax2568@outlook.com" target="_blank">Bill Bill</a><br>
<b>Sent: </b>Friday, January 20, 2017 7:54 PM<br>
<b>To: </b><a href="mailto:users@ovirt.org" target="_blank">users</a><br>
<b>Subject: </b>RE: master storage domain stuck in locked state</p>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<div class="m_723276570252969701m_-2288590467547947270WordSection1">
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">So apparently something didn’t change the metadata to master before the connection was lost. I changed the metadata role to master and it came back up. Seems emailing in helped, because every time I can’t figure something out, I email in and find
it shortly after.</p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<div style="border:none;border-top:solid #e1e1e1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="border:none;padding:0in"><b>From: </b><a href="mailto:jax2568@outlook.com" target="_blank">Bill Bill</a><br>
<b>Sent: </b>Friday, January 20, 2017 7:43 PM<br>
<b>To: </b><a href="mailto:users@ovirt.org" target="_blank">users</a><br>
<b>Subject: </b>master storage domain stuck in locked state</p>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<div class="m_723276570252969701m_-2288590467547947270WordSection1">
<p class="MsoNormal">No clue how to get out of this. I can mount all storage manually on the hypervisors. It seems that after a reboot oVirt is now having some issue, and the storage domain is stuck in a locked state. Because of this, I can’t activate any other storage
either, so the other domains are in maintenance and the master sits in a locked state, and has for hours.</p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">This sticks out on a hypervisor:</p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">StoragePoolWrongMaster: Wrong Master domain or its version: u'SD=d8a0172e-837f-4552-92c7-5<wbr>66dc4e548e4, pool=3fd2ad92-e1eb-49c2-906d-0<wbr>0ec233f610a'</p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">Not sure, nothing changed other than a reboot of the storage.</p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">Engine log shows:</p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">[org.ovirt.engine.core.vdsbrok<wbr>er.SetVdsStatusVDSCommand] (DefaultQuartzScheduler8) [5696732b] START, SetVdsStatusVDSCommand(HostNam<wbr>e = U31U32NodeA, SetVdsStatusVDSCommandParamete<wbr>rs:{runAsync='true', hostId='70e2b8e4-0752-47a8-884<wbr>c-837a00013e79',
status='NonOperational', nonOperationalReason='STORAGE_<wbr>DOMAIN_UNREACHABLE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 6db9820a</p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">No idea why it says unreachable; the storage certainly is reachable, because I can manually mount ALL storage on the hypervisor.</p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
</div>
</div>
</span></div>
<br>
</div>
</div>
______________________________<wbr>_________________<br>
Users mailing list<br>
<a href="mailto:Users@ovirt.org" target="_blank">Users@ovirt.org</a><br>
<a href="http://lists.ovirt.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.ovirt.org/mailman<wbr>/listinfo/users</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</body>
</html>