oVirt not starting primary storage domain

Hi all,

I recently made a change to the gluster volume backing my primary storage domain (removed 1 of the 3 bricks, each w/ 3x replication), and now oVirt fails to activate the primary storage domain. After attempting to start the domain the engine goes through its various communications with VDSM, but then fails out with a "Sanlock resource read failure" - https://gist.githubusercontent.com/srepetsk/83ef13ddcf1e690a398e/raw/ada362ac43ae71984a90979a676f2738648ac4ac/gistfile1.txt

Is there a way to figure out more about what this SpmStatusVDS error is and what might be causing it?

Thanks,
Stephen

*Stephen Repetski*
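A rough starting point for digging into the error, assuming the stock oVirt 3.5 / EL6 log locations and run on the host that was trying to become SPM; the sanlock subcommands are standard, but the storage-domain path below is a placeholder, not a value from this thread:

    # vdsm's side of the SpmStatus / sanlock conversation
    grep -iE 'spmstatus|sanlock' /var/log/vdsm/vdsm.log | tail -n 50

    # sanlock's own view of the failed resource read
    tail -n 50 /var/log/sanlock.log
    sanlock client status

    # dump the on-disk lockspace area of the domain directly (path is a placeholder)
    sanlock direct dump /rhev/data-center/mnt/glusterSD/<server>:<volume>/<sd-uuid>/dom_md/ids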

<b>From: </b>"Stephen Repetski" <srepetsk@srepetsk.net><br><b>To: </= b>"users" <users@ovirt.org><br><b>Sent: </b>Thursday, July 23, 2015 1= 1:08:57 PM<br><b>Subject: </b>[ovirt-users] oVirt not starting primary stor= age domain<br><div><br></div><div dir=3D"ltr">Hi all,<div><br></div><div>I = recently made a change with the gluster volume backing my primary storage d= omain (removed 1 of the 3 bricks, each w/ 3x replication), and now oVirt fa= ils to activate the primary storage domain. After attempting to start the d= omain the engine goes through and does its various commications with VDSM, = but then fails out with a "Sanlock resource read failure" - <a href=3D= "https://gist.githubusercontent.com/srepetsk/83ef13ddcf1e690a398e/raw/ada36= 2ac43ae71984a90979a676f2738648ac4ac/gistfile1.txt" target=3D"_blank">https:= //gist.githubusercontent.com/srepetsk/83ef13ddcf1e690a398e/raw/ada362ac43ae= 71984a90979a676f2738648ac4ac/gistfile1.txt</a></div><div><br></div><div>Is =
------=_Part_3013702_1455000959.1437683300874 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Hi Stephen, 1) Can you please provide the vdsm and gluster versions? 2) How you removed the brick? 3) Can you please attach the glusterfs log located under /var/log ? * Just for info - there is no support for gluster if the volume is not a 3-way replica Thanks in advance, Raz Tamir ratamir@redhat.com RedHat Israel RHEV-M QE Storage team ----- Original Message ----- From: "Stephen Repetski" <srepetsk@srepetsk.net> To: "users" <users@ovirt.org> Sent: Thursday, July 23, 2015 11:08:57 PM Subject: [ovirt-users] oVirt not starting primary storage domain Hi all, I recently made a change with the gluster volume backing my primary storage domain (removed 1 of the 3 bricks, each w/ 3x replication), and now oVirt fails to activate the primary storage domain. After attempting to start the domain the engine goes through and does its various commications with VDSM, but then fails out with a "Sanlock resource read failure" - https://gist.githubusercontent.com/srepetsk/83ef13ddcf1e690a398e/raw/ada362a... Is there a way to figure out more on what this SpmStatusVDS error is and what might be causing it? Thanks, Stephen Stephen Repetski _______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users ------=_Part_3013702_1455000959.1437683300874 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: quoted-printable <html><body><div style=3D"font-family: trebuchet ms,sans-serif; font-size: = 12pt; color: #000000"><div>Hi <span style=3D"font-family: Helvetica, A= rial, sans-serif; font-size: 16.3636360168457px;" data-mce-style=3D"font-fa= mily: Helvetica, Arial, sans-serif; font-size: 16.3636360168457px;">Stephen= ,</span><br></div><div><span style=3D"font-family: Helvetica, Arial, sans-s= erif; font-size: 16.3636360168457px;" data-mce-style=3D"font-family: Helvet= ica, Arial, sans-serif; font-size: 16.3636360168457px;">1) Can you please p= rovide the vdsm and gluster versions?</span></div><div><span style=3D"font-= family: Helvetica, Arial, sans-serif; font-size: 16.3636360168457px;" data-= mce-style=3D"font-family: Helvetica, Arial, sans-serif; font-size: 16.36363= 60168457px;">2) How you removed the brick?</span></div><div><span style=3D"= font-family: Helvetica, Arial, sans-serif; font-size: 16.3636360168457px;" = data-mce-style=3D"font-family: Helvetica, Arial, sans-serif; font-size: 16.= 3636360168457px;">3) Can you please attach the glusterfs log located under = /var/log ?</span></div><div><span style=3D"font-family: Helvetica, Arial, s= ans-serif; font-size: 16.3636360168457px;" data-mce-style=3D"font-family: H= elvetica, Arial, sans-serif; font-size: 16.3636360168457px;"><br></span></d= iv><div><span style=3D"font-family: Helvetica, Arial, sans-serif; font-size= : 16.3636360168457px;" data-mce-style=3D"font-family: Helvetica, Arial, san= s-serif; font-size: 16.3636360168457px;">* Just for info - there is no supp= ort for gluster if the volume is not a 3-way replica</span></div><div><br><= /div><div><span name=3D"x"></span><div><br></div><div><br></div><div><br>Th= anks in advance,<br>Raz Tamir<br>ratamir@redhat.com<br>RedHat Israel</div><= div>RHEV-M QE Storage team<br></div><span name=3D"x"></span><br></div><hr i= d=3D"zwchr"><div style=3D"color:#000;font-weight:normal;font-style:normal;t= ext-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"= there a way to figure out more on what this 
SpmStatusVDS error is and what = might be causing it?</div><div><br></div><div>Thanks,</div><div>Stephen</di= v><div><br clear=3D"all"><div><div class=3D"gmail_signature"><div dir=3D"lt= r"><b style=3D"color:rgb(102,102,102)">Stephen Repetski</b><br></div></div>= </div> </div></div> <br>_______________________________________________<br>Users mailing list<b= r>Users@ovirt.org<br>http://lists.ovirt.org/mailman/listinfo/users<br></div=
<div><br></div></div></body></html> ------=_Part_3013702_1455000959.1437683300874--
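A quick, untested sketch of how the information Raz asks for above could be collected on one of the gluster/oVirt hosts (package names are the usual EL6 ones; adjust the volume name if yours differs):

    # 1) versions
    rpm -q vdsm glusterfs glusterfs-server

    # 2) how the volume looks after the brick removal
    gluster volume info store1

    # 3) glusterd and brick logs live under /var/log/glusterfs
    ls /var/log/glusterfs/
    tail -n 200 /var/log/glusterfs/etc-glusterfs-glusterd.vol.log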

Hi Raz:

I'm using vdsm-4.16.14-0.el6.x86_64 with glusterfs-3.6.2-1.el6.x86_64 on oVirt 3.5.2.

I removed the brick with: gluster remove-brick store1 replica 3 $1 $2 $3 start; gluster remove-brick store1 replica 3 $1 $2 $3 commit. Between the two commands I used the 'status' option to verify that all nodes were marked as 'completed' before running the 'commit' one.

Also, the two log files you requested are available here:
http://srepetsk.net/files/engine.log.20150723 && http://srepetsk.net/files/etc-glusterfs-glusterd.vol.log.20150723
The gluster log file is from one of the servers hosting a different brick in the primary (aka "store1") datacenter/gluster volume, so it was and still is in the volume.

Thanks,
Stephen

*Stephen Repetski*
Rochester Institute of Technology '13 | http://srepetsk.net

On Thu, Jul 23, 2015 at 4:28 PM, Raz Tamir <ratamir@redhat.com> wrote:
Hi Stephen, 1) Can you please provide the vdsm and gluster versions? 2) How did you remove the brick? 3) Can you please attach the glusterfs log located under /var/log ?
* Just for info - there is no support for gluster if the volume is not a 3-way replica
Thanks in advance, Raz Tamir ratamir@redhat.com RedHat Israel RHEV-M QE Storage team
------------------------------ *From: *"Stephen Repetski" <srepetsk@srepetsk.net> *To: *"users" <users@ovirt.org> *Sent: *Thursday, July 23, 2015 11:08:57 PM *Subject: *[ovirt-users] oVirt not starting primary storage domain
Hi all,
I recently made a change with the gluster volume backing my primary storage domain (removed 1 of the 3 bricks, each w/ 3x replication), and now oVirt fails to activate the primary storage domain. After attempting to start the domain the engine goes through and does its various communications with VDSM, but then fails out with a "Sanlock resource read failure" - https://gist.githubusercontent.com/srepetsk/83ef13ddcf1e690a398e/raw/ada362a...
Is there a way to figure out more on what this SpmStatusVDS error is and what might be causing it?
Thanks, Stephen
*Stephen Repetski*
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
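For reference, the shrink workflow Stephen describes above, written out as a sketch. This follows my reading of the gluster 3.6 CLI (which normally wants the "volume" keyword); B1/B2/B3 stand for the three host:/path bricks of the replica set being removed and are placeholders, not values from this thread:

    VOL=store1
    B1=server1:/bricks/store1   # placeholder brick paths
    B2=server2:/bricks/store1
    B3=server3:/bricks/store1

    gluster volume remove-brick $VOL replica 3 $B1 $B2 $B3 start
    # poll until every node reports 'completed' before committing
    gluster volume remove-brick $VOL replica 3 $B1 $B2 $B3 status
    gluster volume remove-brick $VOL replica 3 $B1 $B2 $B3 commit

The commit is the irreversible step; data migrated during 'start' should already be on the remaining bricks by the time every node shows 'completed'.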

As far as I can see from the logs you removed 3 bricks. Can you confirm?

Thanks in advance,
Raz Tamir
ratamir@redhat.com
RedHat Israel
RHEV-M QE Storage team

That is correct. The volume was 9 servers w/ 3x replication, and I wanted to move all data off of one of the sets of 3 servers; those are the bricks I removed w/ remove-brick start and commit. Per the RH documentation (https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.0/html/Administration_Guide/sect-User_Guide-Managing_Volumes-Shrinking.html), this should not be an issue assuming the remove-brick process completes before committing it.

*Stephen Repetski*

On Thu, Jul 23, 2015 at 5:17 PM, Raz Tamir <ratamir@redhat.com> wrote:
As far as I can see from the logs you removed 3 bricks. Can you confirm?
Thanks in advance, Raz Tamir ratamir@redhat.com RedHat Israel RHEV-M QE Storage team
------------------------------ *From: *"Stephen Repetski" <srepetsk@srepetsk.net> *To: *"Raz Tamir" <ratamir@redhat.com> *Cc: *"users" <users@ovirt.org> *Sent: *Friday, July 24, 2015 12:01:16 AM *Subject: *Re: [ovirt-users] oVirt not starting primary storage domain
Hi Raz:
I'm using vdsm-4.16.14-0.el6.x86_64 with glusterfs-3.6.2-1.el6.x86_64 on oVirt 3.5.2.
I removed the brick with: gluster remove-brick store1 replica 3 $1 $2 $3 start; gluster remove-brick store1 replica 3 $1 $2 $3 commit. Between the two commands I used the 'status' option to verify that all nodes were marked as 'completed' before running the 'commit' one.
Also, the two log files you requested are available here: http://srepetsk.net/files/engine.log.20150723 && http://srepetsk.net/files/etc-glusterfs-glusterd.vol.log.20150723 The gluster log file is from one of the servers from a different brick in the primary (aka "store1") datacenter/gluster volume, so it was and still is in the volume.
Thanks, Stephen
*Stephen Repetski* Rochester Institute of Technology '13 | http://srepetsk.net
On Thu, Jul 23, 2015 at 4:28 PM, Raz Tamir <ratamir@redhat.com> wrote:
Hi Stephen, 1) Can you please provide the vdsm and gluster versions? 2) How did you remove the brick? 3) Can you please attach the glusterfs log located under /var/log ?
* Just for info - there is no support for gluster if the volume is not a 3-way replica
Thanks in advance, Raz Tamir ratamir@redhat.com RedHat Israel RHEV-M QE Storage team
------------------------------ *From: *"Stephen Repetski" <srepetsk@srepetsk.net> *To: *"users" <users@ovirt.org> *Sent: *Thursday, July 23, 2015 11:08:57 PM *Subject: *[ovirt-users] oVirt not starting primary storage domain
Hi all,
I recently made a change with the gluster volume backing my primary storage domain (removed 1 of the 3 bricks, each w/ 3x replication), and now oVirt fails to activate the primary storage domain. After attempting to start the domain the engine goes through and does its various communications with VDSM, but then fails out with a "Sanlock resource read failure" - https://gist.githubusercontent.com/srepetsk/83ef13ddcf1e690a398e/raw/ada362a...
Is there a way to figure out more on what this SpmStatusVDS error is and what might be causing it?
Thanks, Stephen
*Stephen Repetski*
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
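A post-commit sanity pass that could be run at this point - read-only checks only; the expected brick count comes from the "9 servers, one replica set removed" description above, and the dom_md path assumes the gluster mount that shows up in the sanlock log later in the thread:

    # after removing one replica set from a 9-brick replica-3 volume this should
    # show Type: Distributed-Replicate with Number of Bricks: 2 x 3 = 6
    gluster volume info store1

    # any pending self-heals on the remaining bricks?
    gluster volume heal store1 info

    # the storage domain's metadata files should all still be present:
    # ids, inbox, leases, metadata, outbox
    ls -l /rhev/data-center/mnt/glusterSD/*/*/dom_md/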

"Raz Tamir" <ratamir@redhat.com><br><b>Cc: </b>"users" <users@ovi= rt.org><br><b>Sent: </b>Friday, July 24, 2015 12:23:07 AM<br><b>Subject:= </b>Re: [ovirt-users] oVirt not starting primary storage domain<br><div><b= r></div><div dir=3D"ltr">That is correct. The volume was 9 servers w/ 3x re=
<b style=3D"color:rgb(102,102,102)">Stephen Repetski</b><br></div></div></=
<br>RedHat Israel</div><div>RHEV-M QE Storage team<br></div><span></span><= br></div><hr></span><div style=3D"color:#000;font-weight:normal;font-style:= normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-siz= e:12pt"><span class=3D""><b>From: </b>"Stephen Repetski" <<a href=3D"mai= lto:srepetsk@srepetsk.net" target=3D"_blank">srepetsk@srepetsk.net</a>><= br></span><b>To: </b>"Raz Tamir" <<a href=3D"mailto:ratamir@redhat.com" = target=3D"_blank">ratamir@redhat.com</a>><br><b>Cc: </b>"users" <<a h= ref=3D"mailto:users@ovirt.org" target=3D"_blank">users@ovirt.org</a>><br= <b>Sent: </b>Friday, July 24, 2015 12:01:16 AM<br><b>Subject: </b>Re: [ovi= rt-users] oVirt not starting primary storage domain<div><div class=3D"h5"><= br><div><br></div><div dir=3D"ltr">Hi Raz:<div><br></div><div>I'm using&nbs=
</div> <br><div class=3D"gmail_quote">On Thu, Jul 23, 2015 at 4:28 PM, Raz Tamir <= span dir=3D"ltr"><<a href=3D"mailto:ratamir@redhat.com" target=3D"_blank= ">ratamir@redhat.com</a>></span> wrote:<br><blockquote class=3D"gmail_qu= ote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex= "><div><div style=3D"font-family:trebuchet ms,sans-serif;font-size:12pt;col= or:#000000"><div>Hi <span style=3D"font-family:Helvetica,Arial,sans-se= rif;font-size:16.3636360168457px">Stephen,</span><br></div><div><span style= =3D"font-family:Helvetica,Arial,sans-serif;font-size:16.3636360168457px">1)= Can you please provide the vdsm and gluster versions?</span></div><div><sp= an style=3D"font-family:Helvetica,Arial,sans-serif;font-size:16.36363601684= 57px">2) How you removed the brick?</span></div><div><span style=3D"font-fa= mily:Helvetica,Arial,sans-serif;font-size:16.3636360168457px">3) Can you pl= ease attach the glusterfs log located under /var/log ?</span></div><div><sp= an style=3D"font-family:Helvetica,Arial,sans-serif;font-size:16.36363601684= 57px"><br></span></div><div><span style=3D"font-family:Helvetica,Arial,sans= -serif;font-size:16.3636360168457px">* Just for info - there is no support = for gluster if the volume is not a 3-way replica</span></div><div><br></div= <div><span></span><div><br></div><div><br></div><div><br>Thanks in advance= ,<br>Raz Tamir<br><a href=3D"mailto:ratamir@redhat.com" target=3D"_blank">r= atamir@redhat.com</a><br>RedHat Israel</div><div>RHEV-M QE Storage team<br>= </div><span></span><br></div><hr><div style=3D"color:#000;font-weight:norma= l;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-s= erif;font-size:12pt"><b>From: </b>"Stephen Repetski" <<a href=3D"mailto:= srepetsk@srepetsk.net" target=3D"_blank">srepetsk@srepetsk.net</a>><br><= b>To: </b>"users" <<a href=3D"mailto:users@ovirt.org" target=3D"_blank">= users@ovirt.org</a>><br><b>Sent: </b>Thursday, July 23, 2015 11:08:57 PM= <br><b>Subject: </b>[ovirt-users] oVirt not starting primary storage domain= <div><div><br><div><br></div><div dir=3D"ltr">Hi all,<div><br></div><div>I = recently made a change with the gluster volume backing my primary storage d= omain (removed 1 of the 3 bricks, each w/ 3x replication), and now oVirt fa= ils to activate the primary storage domain. After attempting to start the d= omain the engine goes through and does its various commications with VDSM, = but then fails out with a "Sanlock resource read failure" - <a href=3D= "https://gist.githubusercontent.com/srepetsk/83ef13ddcf1e690a398e/raw/ada36= 2ac43ae71984a90979a676f2738648ac4ac/gistfile1.txt" target=3D"_blank">https:= //gist.githubusercontent.com/srepetsk/83ef13ddcf1e690a398e/raw/ada362ac43ae= 71984a90979a676f2738648ac4ac/gistfile1.txt</a></div><div><br></div><div>Is =
------=_Part_3061920_1886700399.1437687354517 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit thanks for the detailed answer. I will take a further look and update you when I will have news Thanks in advance, Raz Tamir ratamir@redhat.com RedHat Israel RHEV-M QE Storage team ----- Original Message ----- From: "Stephen Repetski" <srepetsk@srepetsk.net> To: "Raz Tamir" <ratamir@redhat.com> Cc: "users" <users@ovirt.org> Sent: Friday, July 24, 2015 12:23:07 AM Subject: Re: [ovirt-users] oVirt not starting primary storage domain That is correct. The volume was 9 servers w/ 3x replication, and I wanted to move all data off of one of the sets of 3 servers, and those were which I removed w/ remove-brick start and commit. Per the RH documentation ( https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.0/html/Admin... ), this should not be an issue assuming the remove-brick process completes before committing it. Stephen Repetski On Thu, Jul 23, 2015 at 5:17 PM, Raz Tamir < ratamir@redhat.com > wrote: As far as I can see from the logs you removed 3 bricks. Can you confirm? Thanks in advance, Raz Tamir ratamir@redhat.com RedHat Israel RHEV-M QE Storage team From: "Stephen Repetski" < srepetsk@srepetsk.net > To: "Raz Tamir" < ratamir@redhat.com > Cc: "users" < users@ovirt.org > Sent: Friday, July 24, 2015 12:01:16 AM Subject: Re: [ovirt-users] oVirt not starting primary storage domain Hi Raz: I'm using vdsm-4.16.14-0.el6.x86_64 with glusterfs-3.6.2-1.el6.x86_64 on oVirt 3.5.2. I removed the brick with: gluster remove-brick store1 replica 3 $1 $2 $3 start; gluster remove-brick store1 replica 3 $1 $2 $3 commit. Between the two commands I used the 'status' option to verify that all nodes were marked as 'completed' before running the 'commit' one. Also, the two log files you requested are available here: http://srepetsk.net/files/engine.log.20150723 && http://srepetsk.net/files/etc-glusterfs-glusterd.vol.log.20150723 The gluster log file is from one of the servers from a different brick in the primary (aka "store1") datacenter/gluster volume, so it was and still is in the volume. Thanks, Stephen Stephen Repetski Rochester Institute of Technology '13 | http://srepetsk.net On Thu, Jul 23, 2015 at 4:28 PM, Raz Tamir < ratamir@redhat.com > wrote: <blockquote> Hi Stephen, 1) Can you please provide the vdsm and gluster versions? 2) How you removed the brick? 3) Can you please attach the glusterfs log located under /var/log ? * Just for info - there is no support for gluster if the volume is not a 3-way replica Thanks in advance, Raz Tamir ratamir@redhat.com RedHat Israel RHEV-M QE Storage team From: "Stephen Repetski" < srepetsk@srepetsk.net > To: "users" < users@ovirt.org > Sent: Thursday, July 23, 2015 11:08:57 PM Subject: [ovirt-users] oVirt not starting primary storage domain Hi all, I recently made a change with the gluster volume backing my primary storage domain (removed 1 of the 3 bricks, each w/ 3x replication), and now oVirt fails to activate the primary storage domain. After attempting to start the domain the engine goes through and does its various commications with VDSM, but then fails out with a "Sanlock resource read failure" - https://gist.githubusercontent.com/srepetsk/83ef13ddcf1e690a398e/raw/ada362a... Is there a way to figure out more on what this SpmStatusVDS error is and what might be causing it? 
Thanks, Stephen Stephen Repetski _______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users </blockquote> ------=_Part_3061920_1886700399.1437687354517 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: quoted-printable <html><body><div style=3D"font-family: trebuchet ms,sans-serif; font-size: = 12pt; color: #000000"><div>thanks for the detailed answer.</div><div>I will= take a further look and update you when I will have news</div><div><br></d= iv><div><span name=3D"x"></span><div><br></div><div><br></div><div><br>Than= ks in advance,<br>Raz Tamir<br>ratamir@redhat.com<br>RedHat Israel</div><di= v>RHEV-M QE Storage team<br></div><span name=3D"x"></span><br></div><hr id= =3D"zwchr"><div style=3D"color:#000;font-weight:normal;font-style:normal;te= xt-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;">= <b>From: </b>"Stephen Repetski" <srepetsk@srepetsk.net><br><b>To: </b= plication, and I wanted to move all data off of one of the sets of 3 server= s, and those were which I removed w/ remove-brick start and commit. Per the= RH documentation (<a href=3D"https://access.redhat.com/documentation/en-US= /Red_Hat_Storage/2.0/html/Administration_Guide/sect-User_Guide-Managing_Vol= umes-Shrinking.html" target=3D"_blank">https://access.redhat.com/documentat= ion/en-US/Red_Hat_Storage/2.0/html/Administration_Guide/sect-User_Guide-Man= aging_Volumes-Shrinking.html</a>), this should not be an issue assuming the= remove-brick process completes before committing it.<div class=3D"gmail_ex= tra"><br clear=3D"all"><div><div class=3D"gmail_signature"><div dir=3D"ltr"= div> <br><div class=3D"gmail_quote">On Thu, Jul 23, 2015 at 5:17 PM, Raz Tamir <= span dir=3D"ltr"><<a href=3D"mailto:ratamir@redhat.com" target=3D"_blank= ">ratamir@redhat.com</a>></span> wrote:<br><blockquote class=3D"gmail_qu= ote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex= "><div><div style=3D"font-family:trebuchet ms,sans-serif;font-size:12pt;col= or:#000000"><div>As far as I can see from the logs you removed 3 bricks. Ca= n you confirm?<br></div><span class=3D""><div><br></div><div><span></span><= div><br></div><div><br></div><div><br>Thanks in advance,<br>Raz Tamir<br><a= href=3D"mailto:ratamir@redhat.com" target=3D"_blank">ratamir@redhat.com</a= p;vdsm-4.16.14-0.el6.x86_64 with glusterfs-3.6.2-1.el6.x86_64 on oVirt= 3.5.2.</div><div><br></div><div>I removed the brick with: gluster remove-b= rick store1 replica 3 $1 $2 $3 start; gluster remove-brick store1 replica 3= $1 $2 $3 commit. 
Between the two commands I used the 'status' option to ve= rify that all nodes were marked as 'completed' before running the 'commit' = one.</div><div><br></div><div>Also, the two log files you requested are ava= ilable here:</div><div><a href=3D"http://srepetsk.net/files/engine.log.2015= 0723" target=3D"_blank">http://srepetsk.net/files/engine.log.20150723</a> &= amp;& <a href=3D"http://srepetsk.net/files/etc-glusterfs-glusterd.= vol.log.20150723" target=3D"_blank">http://srepetsk.net/files/etc-glusterfs= -glusterd.vol.log.20150723</a><br></div><div>The gluster log file is from o= ne of the servers from a different brick in the primary (aka "store1") data= center/gluster volume, so it was and still is in the volume.</div><div><br>= </div><div><br></div><div>Thanks,</div><div>Stephen</div><div><br></div></d= iv><div class=3D"gmail_extra"><br clear=3D"all"><div><div><div dir=3D"ltr">= <b style=3D"color:rgb(102,102,102)">Stephen Repetski</b><br style=3D"color:= rgb(102,102,102)"><span style=3D"color:rgb(102,102,102)">Rochester Institut= e of Technology '13 | </span><a style=3D"color:rgb(102,102,102)" href=3D"ht= tp://srepetsk.net" target=3D"_blank">http://srepetsk.net</a><br></div></div= there a way to figure out more on what this SpmStatusVDS error is and what = might be causing it?</div><div><br></div><div>Thanks,</div><div>Stephen</di= v><div><br clear=3D"all"><div><div><div dir=3D"ltr"><b style=3D"color:rgb(1= 02,102,102)">Stephen Repetski</b><br></div></div></div> </div></div> <br></div></div>_______________________________________________<br>Users ma= iling list<br><a href=3D"mailto:Users@ovirt.org" target=3D"_blank">Users@ov= irt.org</a><br><a href=3D"http://lists.ovirt.org/mailman/listinfo/users" ta= rget=3D"_blank">http://lists.ovirt.org/mailman/listinfo/users</a><br></div>= <div><br></div></div></div></blockquote></div><br></div> </div></div></div><div><br></div></div></div></blockquote></div><br></div><= /div> </div><div><br></div></div></body></html> ------=_Part_3061920_1886700399.1437687354517--

Hello Raz,

I have been digging more into the issue today, and I found one likely reason why I am getting the sanlock error: the /path/to/storagedomain/dom_md/leases file is apparently missing.

/var/log/sanlock.log:
Jul 24 13:37:52 virt0 sanlock[3140]: 2015-07-24 13:37:52+0000 3012847 [9110]: open error -2 /rhev/data-center/mnt/glusterSD/virt-data.syseng.contoso.com:store1/30b39180-c50d-4464-a944-18c1bfbe4b22/dom_md/leases
Jul 24 13:37:53 virt0 sanlock[3140]: 2015-07-24 13:37:53+0000 3012848 [3140]: ci 2 fd 22 pid -1 recv errno 104

[root@virt2 30b39180-c50d-4464-a944-18c1bfbe4b22]# find dom_md/
dom_md/
dom_md/ids
dom_md/inbox
dom_md/outbox
dom_md/metadata

This is obviously a problem, but I do not know how to proceed. Is there a way to regenerate or repair the file in order to reattach the domain?

Thanks,
Stephen

On Thu, Jul 23, 2015 at 5:35 PM, Raz Tamir <ratamir@redhat.com> wrote:
Thanks for the detailed answer. I will take a further look and update you when I have news
Thanks in advance, Raz Tamir ratamir@redhat.com RedHat Israel RHEV-M QE Storage team
------------------------------ *From: *"Stephen Repetski" <srepetsk@srepetsk.net> *To: *"Raz Tamir" <ratamir@redhat.com> *Cc: *"users" <users@ovirt.org> *Sent: *Friday, July 24, 2015 12:23:07 AM
*Subject: *Re: [ovirt-users] oVirt not starting primary storage domain
That is correct. The volume was 9 servers w/ 3x replication, and I wanted to move all data off of one of the sets of 3 servers; those are the bricks I removed w/ remove-brick start and commit. Per the RH documentation ( https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.0/html/Admin...), this should not be an issue assuming the remove-brick process completes before committing it.
*Stephen Repetski*
On Thu, Jul 23, 2015 at 5:17 PM, Raz Tamir <ratamir@redhat.com> wrote:
As far as I can see from the logs you removed 3 bricks. Can you confirm?
Thanks in advance, Raz Tamir ratamir@redhat.com RedHat Israel RHEV-M QE Storage team
------------------------------ *From: *"Stephen Repetski" <srepetsk@srepetsk.net> *To: *"Raz Tamir" <ratamir@redhat.com> *Cc: *"users" <users@ovirt.org> *Sent: *Friday, July 24, 2015 12:01:16 AM *Subject: *Re: [ovirt-users] oVirt not starting primary storage domain
Hi Raz:
I'm using vdsm-4.16.14-0.el6.x86_64 with glusterfs-3.6.2-1.el6.x86_64 on oVirt 3.5.2.
I removed the brick with: gluster remove-brick store1 replica 3 $1 $2 $3 start; gluster remove-brick store1 replica 3 $1 $2 $3 commit. Between the two commands I used the 'status' option to verify that all nodes were marked as 'completed' before running the 'commit' one.
Also, the two log files you requested are available here: http://srepetsk.net/files/engine.log.20150723 && http://srepetsk.net/files/etc-glusterfs-glusterd.vol.log.20150723 The gluster log file is from one of the servers from a different brick in the primary (aka "store1") datacenter/gluster volume, so it was and still is in the volume.
Thanks, Stephen
*Stephen Repetski* Rochester Institute of Technology '13 | http://srepetsk.net
On Thu, Jul 23, 2015 at 4:28 PM, Raz Tamir <ratamir@redhat.com> wrote:
Hi Stephen, 1) Can you please provide the vdsm and gluster versions? 2) How did you remove the brick? 3) Can you please attach the glusterfs log located under /var/log ?
* Just for info - there is no support for gluster if the volume is not a 3-way replica
Thanks in advance, Raz Tamir ratamir@redhat.com RedHat Israel RHEV-M QE Storage team
------------------------------ *From: *"Stephen Repetski" <srepetsk@srepetsk.net> *To: *"users" <users@ovirt.org> *Sent: *Thursday, July 23, 2015 11:08:57 PM *Subject: *[ovirt-users] oVirt not starting primary storage domain
Hi all,
I recently made a change with the gluster volume backing my primary storage domain (removed 1 of the 3 bricks, each w/ 3x replication), and now oVirt fails to activate the primary storage domain. After attempting to start the domain the engine goes through and does its various communications with VDSM, but then fails out with a "Sanlock resource read failure" - https://gist.githubusercontent.com/srepetsk/83ef13ddcf1e690a398e/raw/ada362a...
Is there a way to figure out more on what this SpmStatusVDS error is and what might be causing it?
Thanks, Stephen
*Stephen Repetski*
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
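For anyone who lands on this thread with the same missing dom_md/leases symptom, here is the direction I would investigate - strictly a sketch, not a verified procedure. The domain UUID and mount path come from the sanlock log earlier in the thread; the 2 MiB file size and the 1 MiB offset of the SDM lease are assumptions about vdsm 4.16's file-domain layout that should be checked against vdsm's clusterlock.py (or confirmed on this list) before running anything, and the domain should stay detached, with dom_md/ backed up, while trying it:

    # back up what is left of the domain metadata first
    cd /rhev/data-center/mnt/glusterSD/virt-data.syseng.contoso.com:store1/30b39180-c50d-4464-a944-18c1bfbe4b22
    tar czf /root/dom_md-backup.tar.gz dom_md/

    # assumption: a file-based domain's leases file is 2 MiB, owned by vdsm:kvm, mode 0660
    dd if=/dev/zero of=dom_md/leases bs=1M count=2
    chown vdsm:kvm dom_md/leases
    chmod 0660 dom_md/leases

    # assumption: the SPM ("SDM") paxos lease lives at a 1 MiB offset inside that file;
    # sanlock's 'direct init -r lockspace:resource:path:offset' writes a fresh lease there
    sanlock direct init -r 30b39180-c50d-4464-a944-18c1bfbe4b22:SDM:$PWD/dom_md/leases:1048576

If the guess about the layout is wrong, initialising at the wrong offset would make things worse rather than better, which is why checking the constants in the vdsm source for this exact version matters before touching the file.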
participants (2)
- Raz Tamir
- Stephen Repetski