VMs freeze during vm-host reboot

Hi,

I have a 3-node CentOS 7-based oVirt + replica-3 Gluster environment with the engine on dedicated hardware.

After putting the first vm-host into maintenance mode to update it from vdsm-4.17.28-1 to vdsm-4.17.32-0 and from glusterfs-3.7.11-1 to glusterfs-3.7.12-2 (among others), random VMs froze (not paused; oVirt still showed them as 'up') until the update was done and the vm-host was rebooted and active again.

After all the vm-hosts were upgraded, the problem never came back. Could this be a bug that was fixed by the upgrade to glusterfs-3.7.12-2? Has anyone experienced the same problem?

Thanks in advance! (Next week I'm not able to check my e-mail, so my reply may be delayed.)

Regards,

Bertjan
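As a minimal sketch, assuming the gluster CLI is installed on the host and a volume named vmstore1 (the name that appears in the mount logs later in this thread), a post-upgrade check along these lines can confirm that the host's bricks are back online and no heals are pending before the next host is taken down:

#!/usr/bin/env python
# Hypothetical post-upgrade sanity check: run on a host after it leaves
# maintenance mode, before moving on to the next one. Assumes the gluster
# CLI is available and the volume is named "vmstore1" (an assumption taken
# from the mount logs quoted later in this thread).
import subprocess
import sys

VOLUME = "vmstore1"  # assumption; substitute your own volume name

def run(cmd):
    """Run a command and return its stdout, exiting loudly on failure."""
    try:
        return subprocess.check_output(cmd, stderr=subprocess.STDOUT).decode()
    except subprocess.CalledProcessError as err:
        sys.exit("command %s failed:\n%s" % (" ".join(cmd), err.output.decode()))

print(run(["gluster", "--version"]).splitlines()[0])       # installed gluster version
print(run(["gluster", "volume", "status", VOLUME]))         # are all bricks online?
print(run(["gluster", "volume", "heal", VOLUME, "info"]))   # any pending self-heals?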

On 08 Jul 2016, at 12:06, bertjan <b.j.goorkate@umcutrecht.nl> wrote:
> Hi,
> I have a 3-node CentOS7 based oVirt+replica-3 gluster environment with an engine on dedicated hardware.
> After putting the first vm-host into maintenance mode to update it from vdsm-4.17.28-1 to vdsm-4.17.32-0 and from glusterfs-3.7.11-1 to glusterfs-3.7.12-2 (among others), random VMs froze (not paused; oVirt showed them as 'up') until the update was done and the vm-host was rebooted and active again.
I suppose the host you were updating at that time had no running VMs, right? If so, then it could indeed be a gluster issue.
> After all the vm-hosts were upgraded, I never experienced the problem again. Can this be a bug, fixed with the upgrade to glusterfs-3.7.12-2?
> Has anyone experienced the same problem?
> Thanks in advance! (Next week I'm not able to check my e-mail, so my reply may be delayed.)
> Regards,
> Bertjan

Hi Michal,

That's right. I put it in maintenance mode, so there were no VMs on it. The frozen VMs were on the other hosts. That's what makes it strange and why it doesn't give me a good feeling. If someone could say "I know the issue and it is fixed in gluster 3.7.12", I would feel more reassured about it...

Regards,
Bertjan

On Fri, Jul 08, 2016 at 12:23:21PM +0200, Michal Skrivanek wrote:
> I suppose the host you were updating at that time had no running VMs, right? If so, then it could indeed be a gluster issue.

Did you see any errors in the gluster mount logs during the time the VMs were frozen (I assume I/O was not responding during that time)? There have been bugs around concurrent I/O on gluster volumes and VMs pausing that were fixed in 3.7.12 - the mount logs can tell us whether you ran into a similar issue.

On 07/08/2016 03:58 PM, bertjan wrote:
> Hi Michal,
> That's right. I put it in maintenance mode, so there were no VMs on it.
> The frozen VMs were on the other hosts. That's what makes it strange and why it doesn't give me a good feeling. If someone could say "I know the issue and it is fixed in gluster 3.7.12", I would feel more reassured about it...
> Regards,
> Bertjan
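A minimal sketch of the kind of scan Sahina is suggesting, assuming the FUSE mount log sits in the usual /var/log/glusterfs/ location and is named after the mount point (the same file Bertjan quotes in his follow-up below); it simply counts disconnect errors per minute so the window of broken I/O stands out:

#!/usr/bin/env python
# Hypothetical sketch: bucket gluster FUSE mount-log errors per minute so the
# window in which VMs saw stalled I/O stands out. The log path is an
# assumption based on the file name mentioned later in this thread.
from collections import Counter

LOG = "/var/log/glusterfs/rhev-data-center-mnt-glusterSD-vmhost1.local:_vmstore1.log"
PATTERN = "Transport endpoint is not connected"

per_minute = Counter()
with open(LOG) as fh:
    for line in fh:
        if PATTERN in line and line.startswith("["):
            # gluster timestamps look like: [2016-07-06 08:13:08.297203] ...
            per_minute[line[1:17]] += 1  # "2016-07-06 08:13"

for minute, count in sorted(per_minute.items()):
    print("%s  %6d" % (minute, count))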

Hi,

Sorry for my delayed response. I've been away on holiday.

Around the time of the frozen VMs I see lots and lots of "Transport endpoint is not connected" messages in the 'rhev-data-center-mnt-glusterSD-vmhost1.local:_vmstore1.log-20160711' log file, like:

[2016-07-06 08:13:08.297203] E [MSGID: 114031] [client-rpc-fops.c:972:client3_3_flush_cbk] 0-vmstore1-client-2: remote operation failed [Transport endpoint is not connected]
[2016-07-06 08:13:08.298066] W [MSGID: 114031] [client-rpc-fops.c:845:client3_3_statfs_cbk] 0-vmstore1-client-2: remote operation failed [Transport endpoint is not connected]
[2016-07-06 08:13:08.299478] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vmstore1-client-2: remote operation failed. Path: /9056e6a8-105f-4c63-bfc1-848f674a942a/images (96ff2aa1-6ade-48e5-933d-d18680c29913) [Transport endpoint is not connected]
[2016-07-06 08:13:08.300435] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vmstore1-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Transport endpoint is not connected]
[2016-07-06 08:13:08.300939] E [MSGID: 114031] [client-rpc-fops.c:1730:client3_3_entrylk_cbk] 0-vmstore1-client-2: remote operation failed [Transport endpoint is not connected]
[2016-07-06 08:13:08.301351] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vmstore1-client-2: remote operation failed. Path: <gfid:e38729d9-320e-4230-acab-f363cf48089e> (e38729d9-320e-4230-acab-f363cf48089e) [Transport endpoint is not connected]
[2016-07-06 08:13:08.303396] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vmstore1-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Transport endpoint is not connected]
[2016-07-06 08:13:08.303954] E [MSGID: 114031] [client-rpc-fops.c:2886:client3_3_opendir_cbk] 0-vmstore1-client-2: remote operation failed. Path: /9056e6a8-105f-4c63-bfc1-848f674a942a/images (96ff2aa1-6ade-48e5-933d-d18680c29913) [Transport endpoint is not connected]
[2016-07-06 08:13:08.306293] W [MSGID: 114031] [client-rpc-fops.c:845:client3_3_statfs_cbk] 0-vmstore1-client-2: remote operation failed [Transport endpoint is not connected]
[2016-07-06 08:13:08.314599] W [MSGID: 114031] [client-rpc-fops.c:845:client3_3_statfs_cbk] 0-vmstore1-client-2: remote operation failed [Transport endpoint is not connected]

(repeated thousands of times)

Together with:

[2016-07-06 08:16:49.233677] W [socket.c:589:__socket_rwv] 0-glusterfs: readv on 10.0.0.153:24007 failed (No data available)
[2016-07-06 08:16:50.326447] W [socket.c:589:__socket_rwv] 0-vmstore1-client-0: readv on 10.0.0.153:49152 failed (No data available)
[2016-07-06 08:16:50.326492] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-vmstore1-client-0: disconnected from vmstore1-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2016-07-06 08:16:51.523632] W [fuse-bridge.c:2302:fuse_writev_cbk] 0-glusterfs-fuse: 33314276: WRITE => -1 (Read-only file system)
[2016-07-06 08:16:51.523890] W [MSGID: 114061] [client-rpc-fops.c:4449:client3_3_flush] 0-vmstore1-client-2: (54da8812-7f5e-48e1-87a8-4f7f17e44918) remote_fd is -1. EBADFD [File descriptor in bad state]
[2016-07-06 08:16:59.575848] E [socket.c:2279:socket_connect_finish] 0-glusterfs: connection to 10.0.0.153:24007 failed (Connection refused)
[2016-07-06 08:17:00.578271] E [socket.c:2279:socket_connect_finish] 0-vmstore1-client-0: connection to 10.0.0.153:24007 failed (Connection refused)
[2016-07-06 08:20:42.236021] I [fuse-bridge.c:4997:fuse_thread_proc] 0-fuse: unmounting /rhev/data-center/mnt/glusterSD/vmhost1.local:_vmstore1
[2016-07-06 08:20:42.236230] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f94b8ba1dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f94ba20c8b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f94ba20c739] ) 0-: received signum (15), shutting down
[2016-07-06 08:20:42.236271] I [fuse-bridge.c:5704:fini] 0-fuse: Unmounting '/rhev/data-center/mnt/glusterSD/vmhost1.local:_vmstore1'.
[2016-07-06 08:30:09.594553] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.12 (args: /usr/sbin/glusterfs --volfile-server=vmhost1.local --volfile-server=10.0.0.153 --volfile-server=10.0.0.160 --volfile-server=10.0.0.198 --volfile-id=/vmstore1 /rhev/data-center/mnt/glusterSD/vmhost1.local:_vmstore1)

After the last line ("Started running /usr/sbin/glusterfs version 3.7.12") everything went back to normal. (I replaced the hostnames and IP addresses.)

Is this the information you need? Thanks in advance!

Regards,
Bertjan

On Mon, Jul 11, 2016 at 01:49:19PM +0530, Sahina Bose wrote:
> Did you see any errors in the gluster mount logs during the time the VMs were frozen (I assume I/O was not responding during that time)? There have been bugs around concurrent I/O on gluster volumes and VMs pausing that were fixed in 3.7.12 - the mount logs can tell us whether you ran into a similar issue.
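In the excerpt above, the errors start around 08:13:08 and the mount client is only restarted on 3.7.12 at 08:30:09, so the mount was broken for roughly 17 minutes. A small sketch, assuming the same timestamp format and log path as above, that extracts those bounds automatically:

#!/usr/bin/env python
# Hypothetical helper: print the first and last "Transport endpoint is not
# connected" entries plus the subsequent client restart, and the total span,
# from a gluster FUSE mount log with timestamps like
# [2016-07-06 08:13:08.297203]. Path and format are assumptions taken from
# the excerpt quoted in this thread.
from datetime import datetime

LOG = "/var/log/glusterfs/rhev-data-center-mnt-glusterSD-vmhost1.local:_vmstore1.log"

def ts(line):
    """Parse the leading bracketed timestamp of a gluster log line."""
    return datetime.strptime(line[1:27], "%Y-%m-%d %H:%M:%S.%f")

first = last = restart = None
with open(LOG) as fh:
    for line in fh:
        if "Transport endpoint is not connected" in line:
            first = first or ts(line)
            last = ts(line)
        elif "Started running /usr/sbin/glusterfs" in line and first and not restart:
            restart = ts(line)

if first:
    print("first disconnect error:", first)
    print("last disconnect error :", last)
    if restart:
        print("client restarted at   :", restart)
        print("total outage window   :", restart - first)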
participants (3):
- bertjan
- Michal Skrivanek
- Sahina Bose