Hi,
Thanks for the reply. Unfortunately, I don't have the logs anymore, as we've already pulled those servers out of the rack to be used for other services.
From: Yaniv Kaul <ykaul@redhat.com>
Sent: Sunday, January 22, 2017 4:45 PM
To: Bill Bill <jax2568@outlook.com>
Cc: Ovirt Users <users@ovirt.org>; Maor Lipchuk <mlipchuk@redhat.com>
Subject: Re: [ovirt-users] master storage domain stuck in locked state
On Jan 22, 2017 10:13 PM, "Bill Bill" <jax2568@outlook.com> wrote:
Hello,
It was 4.0.5. However, we've decided to pull the plug on oVirt for now, as it's too risky: this issue could take down a large number of servers. I think oVirt should be a little less "picky", if you will, about storage connections. For example, this specific issue prevented anything storage-related from being done. Because the "master" was locked, you cannot:
Add other storage
Activate hosts
Start VMs
Reinitialize the datacenter
Remove storage
The points above are huge. While oVirt is indeed open source, upstream of RHEV, and free, I feel that in scenarios like this the risk could be the downfall of oVirt itself.
The logging in oVirt seems crazy, though. We've been testing it for about 2.5, maybe 3 years. Once oVirt gets into a state where it cannot connect to something, it just goes haywire. Many likely don't see this; however, every time it happened, we were testing failover scenarios to see how oVirt responds.
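When a failure loop floods the logs, one quick way to see what is actually repeating is to count messages after stripping their timestamps. The sketch below assumes a leading timestamp such as `2017-01-20 19:43:01,123`; that prefix format and the helper name `top_repeated` are illustrative assumptions, not a documented oVirt log layout, so adjust the pattern to the files you have.

```python
import re
from collections import Counter

# Assumed timestamp prefix, e.g. "2017-01-20 19:43:01,123 " (optionally with a
# zone suffix). Illustrative only; oVirt's real layout may differ.
_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[,.]\d+)?\S*\s+")


def top_repeated(lines, n=5):
    """Return the n most frequent messages once the timestamp is stripped."""
    counts = Counter(_TIMESTAMP.sub("", line.rstrip("\n")) for line in lines)
    return counts.most_common(n)
```

For example, `top_repeated(open("engine.log"))` would list the messages that dominate a log file, which is usually the loop worth reporting.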
A few recommendations I would make are:
Thank you for your recommendations. I agree with some and wholly disagree with others.
I'd still appreciate it if you could send us the requested logs.
TIA,
Y.
Drop the whole "master" storage thing; it complicates setting storage up. Either connect or don't connect. If there are connectivity issues, oVirt gets hung up on switching to this "master" storage. If you have a single storage domain, you'll likely have problems, as we've experienced, because once oVirt cannot find the "master", it begins to go berserk and then spirals out of control. It might not on small setups with a few hypervisors, but on an install with a few hundred VMs, a large number of hypervisors, etc., it seems to get ugly real quick.
Stop trying to reconnect things; I think that's what I'm looking for. When something fails, oVirt just loops over and over, which eventually causes dashboard issues, huge amounts of logs, etc. It would be better if oVirt would just stop, make a log entry, and then quit, maybe after a few attempts.
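The "try a few times, log it, then stop" behavior suggested above can be sketched as a bounded retry with exponential backoff. This is not how oVirt or VDSM actually handles reconnection; `connect_with_limit` and the `connect` callable are hypothetical stand-ins for whatever operation is being retried, and `sleep`/`log` are injectable so the behavior can be checked without waiting.

```python
import time


def connect_with_limit(connect, max_attempts=3, base_delay=1.0,
                       sleep=time.sleep, log=print):
    """Call connect() at most max_attempts times, then give up.

    connect is any zero-argument callable that raises OSError on failure.
    The delay doubles after each failed attempt (exponential backoff).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError as exc:
            log(f"attempt {attempt}/{max_attempts} failed: {exc}")
            if attempt == max_attempts:
                # Stop looping: leave one clear log entry and surface the error.
                log("giving up; manual intervention required")
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The trade-off is that a transient outage longer than the total backoff window now needs a human to resume, which is exactly the behavior being requested here.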
In my case, I could mount the storage manually on ALL hosts, and I could even force-start the VMs with virsh. The oVirt dashboard just kept saying the storage domain was locked and wouldn't let me do anything at all with the entire datacenter.
At this time, we've pushed these servers back into production using our current hypervisor software, which is stable but lacks the benefits of oVirt. oVirt will be revisited later on and is still in use for non-production workloads.
From: Maor Lipchuk <mlipchuk@redhat.com>
Sent: Sunday, January 22, 2017 7:33 AM
To: Bill Bill <jax2568@outlook.com>
Cc: users <users@ovirt.org>
Subject: Re: [ovirt-users] master storage domain stuck in locked state
On Sun, Jan 22, 2017 at 2:31 PM, Maor Lipchuk <mlipchuk@redhat.com> wrote:
Hi Bill,
Can you please attach the engine and VDSM logs?
Is the storage domain still stuck?
Also, which oVirt version are you using?
Regards,
Maor
On Sat, Jan 21, 2017 at 3:11 AM, Bill Bill <jax2568@outlook.com> wrote:
I also cannot reinitialize the datacenter, because the storage domain is locked.
From: Bill Bill <jax2568@outlook.com>
Sent: Friday, January 20, 2017 8:08 PM
To: users <users@ovirt.org>
Subject: RE: master storage domain stuck in locked state
Spoke too soon. Some hosts came back up, but the storage domain is still locked, so no VMs can be started. What is the proper way to force it to unlock? Each time we look to move into production after successful testing, something like this always seems to pop up at the last minute, rendering oVirt questionable in terms of reliability for some unknown issue.
From: Bill Bill <jax2568@outlook.com>
Sent: Friday, January 20, 2017 7:54 PM
To: users <users@ovirt.org>
Subject: RE: master storage domain stuck in locked state
So apparently something didn't change the metadata role to master before the connection was lost. I changed the metadata role to master and it came back up. Emailing in seems to have helped: every time I can't figure something out, I email in and find the answer shortly after.
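Before hand-editing anything, it can help to read the domain metadata and compare it with what the pool expects. The sketch below is read-only and rests on assumptions: that the metadata is plain `KEY=VALUE` lines (on file-based domains, typically a `dom_md/metadata` file) and that the relevant keys are named `ROLE`, `SDUUID`, and `MASTER_VERSION`. The helper names are hypothetical, and the sample values in the usage below are made up apart from the domain UUID quoted later in this thread.

```python
def parse_metadata(text):
    """Parse KEY=VALUE lines into a dict; lines without '=' are skipped."""
    md = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            md[key.strip()] = value.strip()
    return md


def master_problems(md, expected_sd, expected_version):
    """List mismatches between assumed metadata keys and the expected master.

    ROLE, SDUUID and MASTER_VERSION are assumed key names. An empty list only
    means these three keys look consistent; it does not prove the domain is healthy.
    """
    problems = []
    if md.get("ROLE") != "Master":
        problems.append(f"ROLE={md.get('ROLE')!r}, expected 'Master'")
    if md.get("SDUUID") != expected_sd:
        problems.append(f"SDUUID={md.get('SDUUID')!r}, expected {expected_sd!r}")
    if md.get("MASTER_VERSION") != str(expected_version):
        problems.append(
            f"MASTER_VERSION={md.get('MASTER_VERSION')!r}, expected {expected_version!r}"
        )
    return problems
```

A domain still marked `ROLE=Regular`, as described above, would show up as a single `ROLE` mismatch.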
From: Bill Bill <jax2568@outlook.com>
Sent: Friday, January 20, 2017 7:43 PM
To: users <users@ovirt.org>
Subject: master storage domain stuck in locked state
No clue how to get it out of this state. I can mount all storage manually on the hypervisors. It seems that after a reboot, oVirt is having some issue and the storage domain is stuck in a locked state. Because of this, I can't activate any other storage either, so the other domains are in maintenance and the master has been sitting in a locked state for hours.
This sticks out on a hypervisor:
StoragePoolWrongMaster: Wrong Master domain or its version: u'SD=d8a0172e-837f-4552-92c7-566dc4e548e4, pool=3fd2ad92-e1eb-49c2-906d-00ec233f610a'
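When comparing hosts, it helps to pull the storage-domain and pool UUIDs out of this message so they can be matched against what the engine reports as master. A minimal sketch follows; the regular expression is written against the message text quoted above only, other VDSM versions may word it differently, and `parse_wrong_master` is a hypothetical helper.

```python
import re
import uuid

# Written against the message quoted above; other versions may differ.
_WRONG_MASTER = re.compile(r"SD=(?P<sd>[0-9a-fA-F-]{36}),\s*pool=(?P<pool>[0-9a-fA-F-]{36})")


def parse_wrong_master(line):
    """Return {'sd': ..., 'pool': ...} from a StoragePoolWrongMaster line, or None."""
    m = _WRONG_MASTER.search(line)
    if m is None:
        return None
    # uuid.UUID() rejects malformed values and normalizes to lowercase.
    return {"sd": str(uuid.UUID(m["sd"])), "pool": str(uuid.UUID(m["pool"]))}
```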
Not sure why; nothing changed other than a reboot of the storage.
Engine log shows:
[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (DefaultQuartzScheduler8) [5696732b] START, SetVdsStatusVDSCommand(HostName = U31U32NodeA, SetVdsStatusVDSCommandParameters:{runAsync='true', hostId='70e2b8e4-0752-47a8-884c-837a00013e79', status='NonOperational', nonOperationalReason='STORAGE_DOMAIN_UNREACHABLE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 6db9820a
No idea why it says unreachable; it certainly is reachable, because I can manually mount ALL storage on the hypervisor.
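To see which hosts the engine marked non-operational and why, the `key='value'` pairs inside the `{...}` block of the engine log line above can be pulled into a dict. This is a sketch against that single line format; `parse_vds_status` is a hypothetical helper, and all values come back as strings (including `'null'`).

```python
import re

# key='value' pairs as they appear inside the {...} parameter block.
_PAIR = re.compile(r"(\w+)='([^']*)'")


def parse_vds_status(line):
    """Return the parameters of a SetVdsStatusVDSCommand log line as a dict."""
    start, end = line.find("{"), line.rfind("}")
    if start == -1 or end < start:
        return {}
    return dict(_PAIR.findall(line[start + 1:end]))
```

Filtering an engine log for lines where `nonOperationalReason` is `STORAGE_DOMAIN_UNREACHABLE` then gives the affected `hostId` values to compare against the hosts that can in fact mount the storage.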
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users