<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">I’ve also encountered something similar on my setup: oVirt 3.1.9 with a gluster 3.12.3 storage cluster. All the storage domains in question are set up as sharded gluster volumes, and I’ve enabled libgfapi support in the engine. It’s happened primarily to VMs that haven’t yet been restarted to switch to gfapi (these still have fuse mounts), but also to one or two VMs that have already been switched to gfapi mounts.<div class=""><br class=""></div><div class="">I started updating the storage cluster to gluster 3.12.6 yesterday and ran into more bad behavior as well. Many “high disk use” VMs hung, but not as storage-related pauses: they simply hung, and their watchdogs eventually reported CPU stalls. All of them did eventually resume normal operation, but it was annoying, to be sure. The oVirt engine also lost contact with all of my VMs (unknown status, ? in the GUI), even though it still had contact with the hosts. My gluster cluster reported no errors, volume status was normal, and all peers and bricks were connected. I didn’t see anything in the gluster logs that indicated problems, but there were reports of failed heals that eventually went away.</div><div class=""><br class=""></div><div class="">It seems like something in vdsm and/or libgfapi isn’t handling the gfapi mounts well during healing and the related locking, but I can’t tell what it is. 
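For reference, these are the sort of checks I was running to look for stuck or failed heals; the volume name "data" and the log path are from my setup, so substitute your own:

```shell
# List entries still pending heal on each brick of the "data" volume
gluster volume heal data info

# Check whether anything is actually in split-brain
gluster volume heal data info split-brain

# Per-brick count of entries needing heal
gluster volume heal data statistics heal-count

# Watch the self-heal daemon log for failures while the upgrade runs
tail -f /var/log/glusterfs/glustershd.log
```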
I still have two more servers in the cluster to upgrade to 3.12.6; I’ll keep an eye on the logs while I’m doing it and report back once I have more info.</div><div class=""><br class=""><div> -Darrell<br class=""><blockquote type="cite" class=""><hr style="border:none;border-top:solid #B5C4DF 1.0pt;padding:0 0 0 0;margin:10px 0 5px 0;" class=""><span style="margin: -1.3px 0.0px 0.0px 0.0px" id="RwhHeaderAttributes" class=""><font face="Helvetica" size="4" color="#000000" style="font: 13.0px Helvetica; color: #000000" class=""><b class="">From:</b> Sahina Bose <<a href="mailto:sabose@redhat.com" class="">sabose@redhat.com</a>></font></span><br class="">
<span style="margin: -1.3px 0.0px 0.0px 0.0px" class=""><font face="Helvetica" size="4" color="#000000" style="font: 13.0px Helvetica; color: #000000" class=""><b class="">Subject:</b> Re: [ovirt-users] Ovirt vm's paused due to storage error</font></span><br class="">
<span style="margin: -1.3px 0.0px 0.0px 0.0px" class=""><font face="Helvetica" size="4" color="#000000" style="font: 13.0px Helvetica; color: #000000" class=""><b class="">Date:</b> March 22, 2018 at 4:56:13 AM CDT</font></span><br class="">
<span style="margin: -1.3px 0.0px 0.0px 0.0px" class=""><font face="Helvetica" size="4" color="#000000" style="font: 13.0px Helvetica; color: #000000" class=""><b class="">To:</b> Endre Karlson</font></span><br class="">
<span style="margin: -1.3px 0.0px 0.0px 0.0px" class=""><font face="Helvetica" size="4" color="#000000" style="font: 13.0px Helvetica; color: #000000" class=""><b class="">Cc:</b> users</font></span><br class="">
<br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="">Can you provide "gluster volume info" and the mount logs of the data volume (I assume this hosts the vdisks for the VMs with storage errors)?<br class=""><br class=""></div>Also, vdsm.log from the corresponding time.<br class=""></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Fri, Mar 16, 2018 at 3:45 AM, Endre Karlson <span dir="ltr" class=""><<a href="mailto:endre.karlson@gmail.com" target="_blank" class="">endre.karlson@gmail.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="">Hi, this issue is here again: we are getting several VMs going into storage error in our 4-node cluster running on CentOS 7.4 with gluster and oVirt 4.2.1.<div class=""><br class=""></div><div class="">Gluster version: 3.12.6<br class=""></div><div class=""><br class=""></div><div class="">volume status</div><div class=""><div class="">[root@ovirt3 ~]# gluster volume status</div><div class="">Status of volume: data</div><div class="">Gluster process TCP Port RDMA Port Online Pid</div><div class="">------------------------------<wbr class="">------------------------------<wbr class="">------------------</div><div class="">Brick ovirt0:/gluster/brick3/data 49152 0 Y 9102 </div><div class="">Brick ovirt2:/gluster/brick3/data 49152 0 Y 28063</div><div class="">Brick ovirt3:/gluster/brick3/data 49152 0 Y 28379</div><div class="">Brick ovirt0:/gluster/brick4/data 49153 0 Y 9111 </div><div class="">Brick ovirt2:/gluster/brick4/data 49153 0 Y 28069</div><div class="">Brick ovirt3:/gluster/brick4/data 49153 0 Y 28388</div><div class="">Brick ovirt0:/gluster/brick5/data 49154 0 Y 9120 </div><div class="">Brick ovirt2:/gluster/brick5/data 49154 0 Y 28075</div><div class="">Brick ovirt3:/gluster/brick5/data 49154 0 Y 28397</div><div class="">Brick ovirt0:/gluster/brick6/data 
49155 0 Y 9129 </div><div class="">Brick ovirt2:/gluster/brick6_1/data 49155 0 Y 28081</div><div class="">Brick ovirt3:/gluster/brick6/data 49155 0 Y 28404</div><div class="">Brick ovirt0:/gluster/brick7/data 49156 0 Y 9138 </div><div class="">Brick ovirt2:/gluster/brick7/data 49156 0 Y 28089</div><div class="">Brick ovirt3:/gluster/brick7/data 49156 0 Y 28411</div><div class="">Brick ovirt0:/gluster/brick8/data 49157 0 Y 9145 </div><div class="">Brick ovirt2:/gluster/brick8/data 49157 0 Y 28095</div><div class="">Brick ovirt3:/gluster/brick8/data 49157 0 Y 28418</div><div class="">Brick ovirt1:/gluster/brick3/data 49152 0 Y 23139</div><div class="">Brick ovirt1:/gluster/brick4/data 49153 0 Y 23145</div><div class="">Brick ovirt1:/gluster/brick5/data 49154 0 Y 23152</div><div class="">Brick ovirt1:/gluster/brick6/data 49155 0 Y 23159</div><div class="">Brick ovirt1:/gluster/brick7/data 49156 0 Y 23166</div><div class="">Brick ovirt1:/gluster/brick8/data 49157 0 Y 23173</div><div class="">Self-heal Daemon on localhost N/A N/A Y 7757 </div><div class="">Bitrot Daemon on localhost N/A N/A Y 7766 </div><div class="">Scrubber Daemon on localhost N/A N/A Y 7785 </div><div class="">Self-heal Daemon on ovirt2 N/A N/A Y 8205 </div><div class="">Bitrot Daemon on ovirt2 N/A N/A Y 8216 </div><div class="">Scrubber Daemon on ovirt2 N/A N/A Y 8227 </div><div class="">Self-heal Daemon on ovirt0 N/A N/A Y 32665</div><div class="">Bitrot Daemon on ovirt0 N/A N/A Y 32674</div><div class="">Scrubber Daemon on ovirt0 N/A N/A Y 32712</div><div class="">Self-heal Daemon on ovirt1 N/A N/A Y 31759</div><div class="">Bitrot Daemon on ovirt1 N/A N/A Y 31768</div><div class="">Scrubber Daemon on ovirt1 N/A N/A Y 31790</div><div class=""> </div><div class="">Task Status of Volume data</div><div class="">------------------------------<wbr class="">------------------------------<wbr class="">------------------</div><div class="">Task : Rebalance </div><div class="">ID : 
62942ba3-db9e-4604-aa03-<wbr class="">4970767f4d67</div><div class="">Status : completed </div><div class=""> </div><div class="">Status of volume: engine</div><div class="">Gluster process TCP Port RDMA Port Online Pid</div><div class="">------------------------------<wbr class="">------------------------------<wbr class="">------------------</div><div class="">Brick ovirt0:/gluster/brick1/engine 49158 0 Y 9155 </div><div class="">Brick ovirt2:/gluster/brick1/engine 49158 0 Y 28107</div><div class="">Brick ovirt3:/gluster/brick1/engine 49158 0 Y 28427</div><div class="">Self-heal Daemon on localhost N/A N/A Y 7757 </div><div class="">Self-heal Daemon on ovirt1 N/A N/A Y 31759</div><div class="">Self-heal Daemon on ovirt0 N/A N/A Y 32665</div><div class="">Self-heal Daemon on ovirt2 N/A N/A Y 8205 </div><div class=""> </div><div class="">Task Status of Volume engine</div><div class="">------------------------------<wbr class="">------------------------------<wbr class="">------------------</div><div class="">There are no active volume tasks</div><div class=""> </div><div class="">Status of volume: iso</div><div class="">Gluster process TCP Port RDMA Port Online Pid</div><div class="">------------------------------<wbr class="">------------------------------<wbr class="">------------------</div><div class="">Brick ovirt0:/gluster/brick2/iso 49159 0 Y 9164 </div><div class="">Brick ovirt2:/gluster/brick2/iso 49159 0 Y 28116</div><div class="">Brick ovirt3:/gluster/brick2/iso 49159 0 Y 28436</div><div class="">NFS Server on localhost 2049 0 Y 7746 </div><div class="">Self-heal Daemon on localhost N/A N/A Y 7757 </div><div class="">NFS Server on ovirt1 2049 0 Y 31748</div><div class="">Self-heal Daemon on ovirt1 N/A N/A Y 31759</div><div class="">NFS Server on ovirt0 2049 0 Y 32656</div><div class="">Self-heal Daemon on ovirt0 N/A N/A Y 32665</div><div class="">NFS Server on ovirt2 2049 0 Y 8194 </div><div class="">Self-heal Daemon on ovirt2 N/A N/A Y 8205 </div><div 
class=""> </div><div class="">Task Status of Volume iso</div><div class="">------------------------------<wbr class="">------------------------------<wbr class="">------------------</div><div class="">There are no active volume tasks</div></div><div class=""><br class=""></div></div>
<br class="">______________________________<wbr class="">_________________<br class="">
Users mailing list<br class="">
<a href="mailto:Users@ovirt.org" class="">Users@ovirt.org</a><br class="">
<a href="http://lists.ovirt.org/mailman/listinfo/users" rel="noreferrer" target="_blank" class="">http://lists.ovirt.org/<wbr class="">mailman/listinfo/users</a><br class="">
<br class=""></blockquote></div><br class=""></div>
</div></blockquote></div><br class=""></div></body></html>