
Hi, it's us here again, and we are getting several VMs going into storage error in our 4-node cluster running on CentOS 7.4 with Gluster and oVirt 4.2.1.

Gluster version: 3.12.6

volume status
[root@ovirt3 ~]# gluster volume status
Status of volume: data
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ovirt0:/gluster/brick3/data           49152     0          Y       9102
Brick ovirt2:/gluster/brick3/data           49152     0          Y       28063
Brick ovirt3:/gluster/brick3/data           49152     0          Y       28379
Brick ovirt0:/gluster/brick4/data           49153     0          Y       9111
Brick ovirt2:/gluster/brick4/data           49153     0          Y       28069
Brick ovirt3:/gluster/brick4/data           49153     0          Y       28388
Brick ovirt0:/gluster/brick5/data           49154     0          Y       9120
Brick ovirt2:/gluster/brick5/data           49154     0          Y       28075
Brick ovirt3:/gluster/brick5/data           49154     0          Y       28397
Brick ovirt0:/gluster/brick6/data           49155     0          Y       9129
Brick ovirt2:/gluster/brick6_1/data         49155     0          Y       28081
Brick ovirt3:/gluster/brick6/data           49155     0          Y       28404
Brick ovirt0:/gluster/brick7/data           49156     0          Y       9138
Brick ovirt2:/gluster/brick7/data           49156     0          Y       28089
Brick ovirt3:/gluster/brick7/data           49156     0          Y       28411
Brick ovirt0:/gluster/brick8/data           49157     0          Y       9145
Brick ovirt2:/gluster/brick8/data           49157     0          Y       28095
Brick ovirt3:/gluster/brick8/data           49157     0          Y       28418
Brick ovirt1:/gluster/brick3/data           49152     0          Y       23139
Brick ovirt1:/gluster/brick4/data           49153     0          Y       23145
Brick ovirt1:/gluster/brick5/data           49154     0          Y       23152
Brick ovirt1:/gluster/brick6/data           49155     0          Y       23159
Brick ovirt1:/gluster/brick7/data           49156     0          Y       23166
Brick ovirt1:/gluster/brick8/data           49157     0          Y       23173
Self-heal Daemon on localhost               N/A       N/A        Y       7757
Bitrot Daemon on localhost                  N/A       N/A        Y       7766
Scrubber Daemon on localhost                N/A       N/A        Y       7785
Self-heal Daemon on ovirt2                  N/A       N/A        Y       8205
Bitrot Daemon on ovirt2                     N/A       N/A        Y       8216
Scrubber Daemon on ovirt2                   N/A       N/A        Y       8227
Self-heal Daemon on ovirt0                  N/A       N/A        Y       32665
Bitrot Daemon on ovirt0                     N/A       N/A        Y       32674
Scrubber Daemon on ovirt0                   N/A       N/A        Y       32712
Self-heal Daemon on ovirt1                  N/A       N/A        Y       31759
Bitrot Daemon on ovirt1                     N/A       N/A        Y       31768
Scrubber Daemon on ovirt1                   N/A       N/A        Y       31790

Task Status of Volume data
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 62942ba3-db9e-4604-aa03-4970767f4d67
Status               : completed

Status of volume: engine
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ovirt0:/gluster/brick1/engine         49158     0          Y       9155
Brick ovirt2:/gluster/brick1/engine         49158     0          Y       28107
Brick ovirt3:/gluster/brick1/engine         49158     0          Y       28427
Self-heal Daemon on localhost               N/A       N/A        Y       7757
Self-heal Daemon on ovirt1                  N/A       N/A        Y       31759
Self-heal Daemon on ovirt0                  N/A       N/A        Y       32665
Self-heal Daemon on ovirt2                  N/A       N/A        Y       8205

Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: iso
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ovirt0:/gluster/brick2/iso            49159     0          Y       9164
Brick ovirt2:/gluster/brick2/iso            49159     0          Y       28116
Brick ovirt3:/gluster/brick2/iso            49159     0          Y       28436
NFS Server on localhost                     2049      0          Y       7746
Self-heal Daemon on localhost               N/A       N/A        Y       7757
NFS Server on ovirt1                        2049      0          Y       31748
Self-heal Daemon on ovirt1                  N/A       N/A        Y       31759
NFS Server on ovirt0                        2049      0          Y       32656
Self-heal Daemon on ovirt0                  N/A       N/A        Y       32665
NFS Server on ovirt2                        2049      0          Y       8194
Self-heal Daemon on ovirt2                  N/A       N/A        Y       8205

Task Status of Volume iso
------------------------------------------------------------------------------
There are no active volume tasks
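
When VMs on a Gluster-backed storage domain start pausing with storage errors, a quick first check is whether the data volume has pending or failed heals. A minimal sketch, assuming the volume name from the status output above and a Gluster 3.12 CLI:

# pending and failed heals on the data volume
gluster volume heal data info
gluster volume heal data info split-brain

# bitrot/scrub daemons are enabled on this volume, so their status may be relevant too
gluster volume bitrot data scrub status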

Can you provide "gluster volume info" and the mount logs of the data volume (I assume that this hosts the vdisks for the VMs with storage error).

Also vdsm.log at the corresponding time.

On Fri, Mar 16, 2018 at 3:45 AM, Endre Karlson <endre.karlson@gmail.com> wrote:
Hi, it's us here again, and we are getting several VMs going into storage error in our 4-node cluster running on CentOS 7.4 with Gluster and oVirt 4.2.1.
Gluster version: 3.12.6
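
A minimal sketch of how the information requested above can be collected on one of the hosts; the paths assume the default oVirt and Gluster log locations and the volume name "data" from this thread, so adjust them for your environment:

# volume configuration
gluster volume info data

# FUSE mount log of the data volume on an oVirt host (the file name is derived
# from the mount point under /rhev/data-center/mnt/glusterSD/)
ls /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*data.log

# vdsm log covering the time the VM went into storage error
less /var/log/vdsm/vdsm.log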

From: Darrell Budic <budic@onholyground.com>
Subject: Re: [ovirt-users] Ovirt vm's paused due to storage error
Date: March 22, 2018 at 1:23:29 PM CDT
To: users

I've also encountered something similar on my setup, oVirt 3.1.9 with a Gluster 3.12.3 storage cluster. All the storage domains in question are set up as sharded Gluster volumes, and I've enabled libgfapi support in the engine. It's happened primarily to VMs that haven't been restarted to switch to gfapi yet (these still have FUSE mounts), but also to one or two VMs that have been switched to gfapi mounts.

I started updating the storage cluster to Gluster 3.12.6 yesterday and got more annoying/bad behavior as well. Many "high disk use" VMs experienced hangs, but not as storage-related pauses. Instead, they hung and their watchdogs eventually reported CPU hangs. All did eventually resume normal operation, but it was annoying, to be sure. The oVirt Engine also lost contact with all of my VMs (unknown status, "?" in the GUI), even though it still had contact with the hosts. My Gluster cluster reported no errors, volume status was normal, and all peers and bricks were connected. I didn't see anything in the Gluster logs that indicated problems, but there were reports of failed heals that eventually went away.

It seems like something in vdsm and/or libgfapi isn't handling the gfapi mounts well during healing and the related locks, but I can't tell what it is. I've got two more servers in the cluster to upgrade to 3.12.6 yet; I'll keep an eye on more logs while I'm doing it and report back after I get more info.

-Darrell
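
A rough way to check which access path a running VM is actually using, since the behaviour described above differs between FUSE and gfapi; the commands are standard oVirt/libvirt tools, <vm-name> is a placeholder, and the exact disk XML layout may vary by version:

# on the engine host: is libgfapi enabled?
engine-config -g LibgfApiSupported

# on a hypervisor: a gfapi disk appears as <disk type='network'> with
# <source protocol='gluster' ...>, while a FUSE-backed disk shows a plain
# file path under /rhev/data-center/mnt/glusterSD/
virsh -r dumpxml <vm-name> | grep -A3 '<disk'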

From: Darrell Budic <budic@onholyground.com>
Subject: Re: [ovirt-users] Ovirt vm's paused due to storage error

Found (and caused) my problem.

I'd been evaluating different settings for (default settings shown):

cluster.shd-max-threads                 1
cluster.shd-wait-qlength                1024

and had forgotten to reset them after testing. I had them at max-threads 8 and qlength 10000.

It worked in that the cluster healed in approximately half the time, and it was a total failure in that my cluster experienced IO pauses and at least one VM abnormal shutdown.

I have 6-core processors in these boxes, and it looks like I just overloaded them to the point that normal IO wasn't getting serviced because the self-heal was getting too much priority. I've reverted to the defaults for these, and things are now behaving normally, with no pauses during healing at all.

Moral of the story: don't forget to undo testing settings when done, and really don't test extreme settings in production!

Back to upgrading my test cluster so I can properly abuse things like this.

-Darrell
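
For reference, a sketch of how these self-heal daemon options can be inspected and put back to their defaults; the volume name is taken from this thread:

# current values
gluster volume get data cluster.shd-max-threads
gluster volume get data cluster.shd-wait-qlength

# revert to the built-in defaults instead of setting explicit values
gluster volume reset data cluster.shd-max-threads
gluster volume reset data cluster.shd-wait-qlength
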
participants (3)
- Darrell Budic
- Endre Karlson
- Sahina Bose