
Hello,

It was 4.0.5. However, we’ve decided to pull the plug on oVirt for now, as it’s too risky: this issue could take down a large number of servers. I think oVirt should be a little less “picky”, if you will, about storage connections. For example, this specific issue prevented anything storage-related from being done. Because the “master” was locked, you cannot:

Add other storage
Activate hosts
Start VMs
Reinitialize the datacenter
Remove storage

These points are huge. While oVirt is indeed open source, upstream of RHEV, and doesn’t cost anything, I feel that in scenarios like this, being too risky could be the downfall of oVirt itself.

The logging with oVirt seems to be crazy, though. We’ve been testing it now for about 2.5 years, maybe 3 years. Once oVirt gets into a state where it cannot connect to something, it just goes haywire. Many likely don’t see this; however, every time these things happened, it was while we were testing failover scenarios to see how oVirt responds.

A few recommendations I would make are:

Drop the whole “master” storage thing; it complicates setting storage up. Either connect, or don’t connect. If there are connectivity issues, oVirt gets hung up on switching to this “master” storage. If you have a single storage domain, you’ll likely have problems as we’ve experienced, because once oVirt cannot find the “master” it begins to go berserk, then spirals out of control from there. It might not on small setups with a few hypervisors, but on an install with a few hundred VMs, a large number of hypervisors, etc., it seems to get ugly real quick.

Stop trying to reconnect things, I think that’s what I’m looking for.
When something fails, oVirt just goes in a loop over and over, which eventually causes dashboard issues, crazy amounts of logs, etc. It would be better if oVirt would just stop, make a log entry and then quit, maybe after a few tries.

In my case, I could mount the storage manually on ALL hosts; I could even force start the VMs with virsh. The oVirt dashboard just kept saying it was locked, and wouldn’t let you do anything at all with the entire datacenter.

At this time, we’ve pushed these servers back into production using our current hypervisor software, which is stable but does not have the benefits of oVirt. oVirt will be revisited later on and is still in use for non-production things.

From: Maor Lipchuk <mlipchuk@redhat.com>
Sent: Sunday, January 22, 2017 7:33 AM
To: Bill Bill <jax2568@outlook.com>
Cc: users <users@ovirt.org>
Subject: Re: [ovirt-users] master storage domain stuck in locked state

On Sun, Jan 22, 2017 at 2:31 PM, Maor Lipchuk <mlipchuk@redhat.com> wrote:

Hi Bill,

Can you please attach the engine and VDSM logs?
Is the storage domain still stuck?

Also, which oVirt version are you using?

Regards,
Maor

On Sat, Jan 21, 2017 at 3:11 AM, Bill Bill <jax2568@outlook.com> wrote:

Also cannot reinitialize the datacenter because the storage domain is locked.

From: Bill Bill <jax2568@outlook.com>
Sent: Friday, January 20, 2017 8:08 PM
To: users <users@ovirt.org>
Subject: RE: master storage domain stuck in locked state

Spoke too soon. Some hosts came back up but the storage domain is still locked, so no VMs can be started. What is the proper way to force this to be unlocked? Each time we look to move into production after successful testing, something like this always seems to pop up at the last minute, rendering oVirt questionable in terms of reliability for some unknown issue.

From: Bill Bill <jax2568@outlook.com>
Sent: Friday, January 20, 2017 7:54 PM
To: users <users@ovirt.org>
Subject: RE: master storage domain stuck in locked state

So apparently something didn’t change the metadata to master before the connection was lost. I changed the metadata role to master and it came back up. Seems emailing in helped, because every time I can’t figure something out, I email in and find it shortly after.

From: Bill Bill <jax2568@outlook.com>
Sent: Friday, January 20, 2017 7:43 PM
To: users <users@ovirt.org>
Subject: master storage domain stuck in locked state

No clue how to get this out. I can mount all storage manually on the hypervisors. It seems like after a reboot oVirt is now having some issue and the storage domain is stuck in locked state. Because of this, I can’t activate any other storage either, so the other domains are in maintenance and the master sits in locked state, and has been for hours.

This sticks out on a hypervisor:

StoragePoolWrongMaster: Wrong Master domain or its version: u'SD=d8a0172e-837f-4552-92c7-566dc4e548e4, pool=3fd2ad92-e1eb-49c2-906d-00ec233f610a'

Not sure, nothing changed other than a reboot of the storage.

Engine log shows:

[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (DefaultQuartzScheduler8) [5696732b] START, SetVdsStatusVDSCommand(HostName = U31U32NodeA, SetVdsStatusVDSCommandParameters:{runAsync='true', hostId='70e2b8e4-0752-47a8-884c-837a00013e79', status='NonOperational', nonOperationalReason='STORAGE_DOMAIN_UNREACHABLE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 6db9820a

No idea why it says unreachable; the storage certainly is reachable, because I can manually mount ALL storage on the hypervisor.
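
For anyone triaging a similar report, the two log lines quoted above can be pulled apart mechanically. What follows is only a minimal Python sketch, assuming the messages have exactly the shape quoted in this thread. The regular expressions and the helper names `parse_wrong_master` and `parse_engine_params` are made up for illustration and are not part of VDSM or ovirt-engine.

```python
import re

# Hypothetical helpers for illustration; not part of VDSM or ovirt-engine.
# The patterns assume log lines shaped exactly like the ones quoted in this thread.
UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
WRONG_MASTER_RE = re.compile(
    r"StoragePoolWrongMaster: .*?SD=(?P<sd>" + UUID + r"), pool=(?P<pool>" + UUID + r")"
)
PARAM_RE = re.compile(r"(\w+)='([^']*)'")


def parse_wrong_master(line):
    """Return (storage_domain_uuid, pool_uuid), or None if the line does not match."""
    match = WRONG_MASTER_RE.search(line)
    return (match.group("sd"), match.group("pool")) if match else None


def parse_engine_params(line):
    """Collect the key='value' pairs of an engine log line into a dict."""
    return dict(PARAM_RE.findall(line))


vdsm_line = (
    "StoragePoolWrongMaster: Wrong Master domain or its version: "
    "u'SD=d8a0172e-837f-4552-92c7-566dc4e548e4, "
    "pool=3fd2ad92-e1eb-49c2-906d-00ec233f610a'"
)
engine_line = (
    "START, SetVdsStatusVDSCommand(HostName = U31U32NodeA, "
    "SetVdsStatusVDSCommandParameters:{runAsync='true', "
    "hostId='70e2b8e4-0752-47a8-884c-837a00013e79', status='NonOperational', "
    "nonOperationalReason='STORAGE_DOMAIN_UNREACHABLE', "
    "stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 6db9820a"
)

sd_and_pool = parse_wrong_master(vdsm_line)
params = parse_engine_params(engine_line)
print(sd_and_pool)
print(params["status"], params["nonOperationalReason"])
```

Having the storage domain and pool UUIDs as separate values makes it easier to compare them against what the storage metadata itself reports.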

Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
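
The suggestion in the first message (stop, make a log entry and then quit after a few tries) amounts to a bounded retry. Below is a rough Python sketch of that idea only; it does not describe how oVirt or VDSM actually handle reconnection. The function `connect_with_limit` and its parameters `max_attempts`, `base_delay` and `sleep` are hypothetical names chosen for this example.

```python
import logging
import time

log = logging.getLogger("storage-retry-sketch")


def connect_with_limit(connect, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call connect() at most max_attempts times, then give up.

    Returns True on success. After the last failed attempt it writes a single
    error entry and returns False instead of looping forever. The wait between
    attempts doubles each time (base_delay, 2 * base_delay, ...).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            connect()
            return True
        except OSError as exc:
            if attempt == max_attempts:
                log.error("giving up after %d attempts: %s", attempt, exc)
                return False
            sleep(base_delay * 2 ** (attempt - 1))
    return False


# Usage: a stand-in for a storage connection that never succeeds.
calls = []


def always_fails():
    calls.append(1)
    raise OSError("storage unreachable")


ok = connect_with_limit(always_fails, max_attempts=3, sleep=lambda seconds: None)
print(ok, len(calls))
```

The `sleep` parameter is injectable so the sketch can be exercised without real delays; an actual caller would leave it at the default.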