[ovirt-users] VMs freeze during vm-host reboot

bertjan b.j.goorkate at umcutrecht.nl
Thu Jul 21 15:19:40 UTC 2016


Hi,

Sorry for the delayed response; I've been away on holiday.

Around the time the VMs froze, I see a very large number of "Transport endpoint is not connected"
messages in the 'rhev-data-center-mnt-glusterSD-vmhost1.local:_vmstore1.log-20160711' log file, for example:

[2016-07-06 08:13:08.297203] E [MSGID: 114031] [client-rpc-fops.c:972:client3_3_flush_cbk] 0-vmstore1-client-2: remote operation failed [Transport endpoint is not connected]
[2016-07-06 08:13:08.298066] W [MSGID: 114031] [client-rpc-fops.c:845:client3_3_statfs_cbk] 0-vmstore1-client-2: remote operation failed [Transport endpoint is not connected]
[2016-07-06 08:13:08.299478] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vmstore1-client-2: remote operation failed. Path: /9056e6a8-105f-4c63-bfc1-848f674a942a/images (96ff2aa1-6ade-48e5-933d-d18680c29913) [Transport endpoint is not connected]
[2016-07-06 08:13:08.300435] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vmstore1-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Transport endpoint is not connected]
[2016-07-06 08:13:08.300939] E [MSGID: 114031] [client-rpc-fops.c:1730:client3_3_entrylk_cbk] 0-vmstore1-client-2: remote operation failed [Transport endpoint is not connected]
[2016-07-06 08:13:08.301351] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vmstore1-client-2: remote operation failed. Path: <gfid:e38729d9-320e-4230-acab-f363cf48089e> (e38729d9-320e-4230-acab-f363cf48089e) [Transport endpoint is not connected]
[2016-07-06 08:13:08.303396] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vmstore1-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Transport endpoint is not connected]
[2016-07-06 08:13:08.303954] E [MSGID: 114031] [client-rpc-fops.c:2886:client3_3_opendir_cbk] 0-vmstore1-client-2: remote operation failed. Path: /9056e6a8-105f-4c63-bfc1-848f674a942a/images (96ff2aa1-6ade-48e5-933d-d18680c29913) [Transport endpoint is not connected]
[2016-07-06 08:13:08.306293] W [MSGID: 114031] [client-rpc-fops.c:845:client3_3_statfs_cbk] 0-vmstore1-client-2: remote operation failed [Transport endpoint is not connected]
[2016-07-06 08:13:08.314599] W [MSGID: 114031] [client-rpc-fops.c:845:client3_3_statfs_cbk] 0-vmstore1-client-2: remote operation failed [Transport endpoint is not connected]
(repeated thousands of times)

Together with:

[2016-07-06 08:16:49.233677] W [socket.c:589:__socket_rwv] 0-glusterfs: readv on 10.0.0.153:24007 failed (No data available)
[2016-07-06 08:16:50.326447] W [socket.c:589:__socket_rwv] 0-vmstore1-client-0: readv on 10.0.0.153:49152 failed (No data available)
[2016-07-06 08:16:50.326492] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-vmstore1-client-0: disconnected from vmstore1-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2016-07-06 08:16:51.523632] W [fuse-bridge.c:2302:fuse_writev_cbk] 0-glusterfs-fuse: 33314276: WRITE => -1 (Read-only file system)
[2016-07-06 08:16:51.523890] W [MSGID: 114061] [client-rpc-fops.c:4449:client3_3_flush] 0-vmstore1-client-2:  (54da8812-7f5e-48e1-87a8-4f7f17e44918) remote_fd is -1. EBADFD [File descriptor in bad state]
[2016-07-06 08:16:59.575848] E [socket.c:2279:socket_connect_finish] 0-glusterfs: connection to 10.0.0.153:24007 failed (Connection refused)
[2016-07-06 08:17:00.578271] E [socket.c:2279:socket_connect_finish] 0-vmstore1-client-0: connection to 10.0.0.153:24007 failed (Connection refused)
[2016-07-06 08:20:42.236021] I [fuse-bridge.c:4997:fuse_thread_proc] 0-fuse: unmounting /rhev/data-center/mnt/glusterSD/vmhost1.local:_vmstore1
[2016-07-06 08:20:42.236230] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f94b8ba1dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f94ba20c8b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f94ba20c739] ) 0-: received signum (15), shutting down
[2016-07-06 08:20:42.236271] I [fuse-bridge.c:5704:fini] 0-fuse: Unmounting '/rhev/data-center/mnt/glusterSD/vmhost1.local:_vmstore1'.
[2016-07-06 08:30:09.594553] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.12 (args: /usr/sbin/glusterfs --volfile-server=vmhost1.local --volfile-server=10.0.0.153 --volfile-server=10.0.0.160 --volfile-server=10.0.0.198 --volfile-id=/vmstore1 /rhev/data-center/mnt/glusterSD/vmhost1.local:_vmstore1)

After the last line ("Started running /usr/sbin/glusterfs version 3.7.12") everything went back to normal.
(I have replaced the hostnames and IP addresses.)
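In case it helps to quantify the two excerpts, here is a rough Python sketch that tallies the "Transport endpoint is not connected" failures per client translator and callback, and lists the timestamps of the disconnect, unmount and restart events. It assumes every line follows the `[timestamp] LEVEL [MSGID: n] [file.c:line:function] translator: message` layout seen above. The function names and the marker strings and labels are my own choice, not part of any gluster tooling, and the script only counts and orders lines; it does not diagnose a cause.

```python
import re
from collections import Counter
from datetime import datetime

# Full layout of the mount-log lines quoted above:
# [timestamp] LEVEL [MSGID: n] [file.c:line:function] translator: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[\d\- :.]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID: (?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+(?P<xlator>\S+):\s*(?P<msg>.*)$"
)
# Just the leading timestamp, for lines (like the SIGTERM backtrace) that
# do not fit the layout above.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")

# My own (hypothetical) choice of substrings to look for, with my own labels;
# not an official list of gluster messages.
MARKERS = [
    ("readv on", "socket read failed"),
    ("disconnected from", "brick disconnected"),
    ("Read-only file system", "write rejected (EROFS)"),
    ("Connection refused", "connection refused"),
    ("unmounting", "FUSE unmount"),
    ("received signum (15)", "client got SIGTERM"),
    ("Started running", "client started"),
]


def tally_disconnect_errors(lines):
    """Count 'Transport endpoint is not connected' failures per translator and callback."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and "Transport endpoint is not connected" in m["msg"]:
            callback = m["src"].rsplit(":", 1)[-1]  # e.g. client3_3_lookup_cbk
            counts[(m["xlator"], callback)] += 1
    return counts


def timeline(lines):
    """Return (timestamp, label) for each line containing one of MARKERS, in file order."""
    events = []
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue
        for needle, label in MARKERS:
            if needle in line:
                ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
                events.append((ts, label))
                break  # one label per line
    return events


# A few lines copied verbatim from the excerpts above; in practice, read the
# whole rotated mount log instead (e.g. open(path).read().splitlines()).
SAMPLE = """
[2016-07-06 08:13:08.297203] E [MSGID: 114031] [client-rpc-fops.c:972:client3_3_flush_cbk] 0-vmstore1-client-2: remote operation failed [Transport endpoint is not connected]
[2016-07-06 08:13:08.298066] W [MSGID: 114031] [client-rpc-fops.c:845:client3_3_statfs_cbk] 0-vmstore1-client-2: remote operation failed [Transport endpoint is not connected]
[2016-07-06 08:13:08.299478] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vmstore1-client-2: remote operation failed. Path: /9056e6a8-105f-4c63-bfc1-848f674a942a/images (96ff2aa1-6ade-48e5-933d-d18680c29913) [Transport endpoint is not connected]
[2016-07-06 08:13:08.300435] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vmstore1-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Transport endpoint is not connected]
[2016-07-06 08:16:49.233677] W [socket.c:589:__socket_rwv] 0-glusterfs: readv on 10.0.0.153:24007 failed (No data available)
[2016-07-06 08:16:50.326492] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-vmstore1-client-0: disconnected from vmstore1-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2016-07-06 08:16:51.523632] W [fuse-bridge.c:2302:fuse_writev_cbk] 0-glusterfs-fuse: 33314276: WRITE => -1 (Read-only file system)
[2016-07-06 08:16:59.575848] E [socket.c:2279:socket_connect_finish] 0-glusterfs: connection to 10.0.0.153:24007 failed (Connection refused)
[2016-07-06 08:20:42.236021] I [fuse-bridge.c:4997:fuse_thread_proc] 0-fuse: unmounting /rhev/data-center/mnt/glusterSD/vmhost1.local:_vmstore1
[2016-07-06 08:30:09.594553] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.12 (args: /usr/sbin/glusterfs --volfile-server=vmhost1.local --volfile-server=10.0.0.153 --volfile-server=10.0.0.160 --volfile-server=10.0.0.198 --volfile-id=/vmstore1 /rhev/data-center/mnt/glusterSD/vmhost1.local:_vmstore1)
""".strip().splitlines()

if __name__ == "__main__":
    for (xlator, callback), n in tally_disconnect_errors(SAMPLE).most_common():
        print(f"{n:6d}  {xlator}  {callback}")
    events = timeline(SAMPLE)
    for ts, label in events:
        print(ts.time(), label)
    if events:
        span = events[-1][0] - events[0][0]
        print("first to last event:", span.total_seconds(), "s")
```

On these sample lines all four failures come from `0-vmstore1-client-2`, and the first and last matched events (the readv failure at 08:16:49 and the client restart at 08:30:09) are about 13 minutes 20 seconds apart.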

Is this the information you need?

Thanks in advance!

Regards,

Bertjan


On Mon, Jul 11, 2016 at 01:49:19PM +0530, Sahina Bose wrote:
> Did you see any errors in the gluster mount logs while the
> VMs were frozen (I assume I/O was not responding during this time)? There
> have been bugs fixed in 3.7.12 around concurrent I/O on gluster volumes and VMs
> pausing; the mount logs can tell us if you ran into a similar
> issue.
> 
> On 07/08/2016 03:58 PM, bertjan wrote:
> > Hi Michal,
> > 
> > That's right. I put it in maintenance mode, so there were no VMs running on it.
> > 
> > The frozen VMs were on the other hosts. That's what makes it strange, and
> > why it doesn't give me a good feeling. If someone could say 'I know the
> > issue and it is fixed in gluster version 3.7.12', I would feel more reassured
> > about it...
> > 
> > Regards,
> > 
> > Bertjan
> > 
> > On Fri, Jul 08, 2016 at 12:23:21PM +0200, Michal Skrivanek wrote:
> > > > On 08 Jul 2016, at 12:06, bertjan <b.j.goorkate at umcutrecht.nl> wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > I have a 3-node, CentOS 7-based oVirt + replica-3 gluster environment, with the engine
> > > > on dedicated hardware.
> > > > 
> > > > After putting the first vm-host into maintenance mode to update it from vdsm-4.17.28-1
> > > > to vdsm-4.17.32-0 and from glusterfs-3.7.11-1 to glusterfs-3.7.12-2 (among others),
> > > > random VMs froze (not paused; oVirt showed them as 'up') until the update was done and
> > > > the vm-host was rebooted and active again.
> > > I suppose the host you were updating at that time had no running VMs, right?
> > > If so, then it is indeed perhaps a gluster issue.
> > > 
> > > > After all the vm-hosts were upgraded, I never experienced the problem again.
> > > > Could this be a bug that was fixed by the upgrade to glusterfs-3.7.12-2?
> > > > 
> > > > Has anyone experienced the same problem?
> > > > 
> > > > Thanks in advance! (I won't be able to check my e-mail next week, so my response may be delayed.)
> > > > 
> > > > Regards,
> > > > 
> > > > Bertjan
> > > > 
> > > > 
> > > > 
> > > > ------------------------------------------------------------------------------
> > > > 
> > > > This message may contain confidential information and is intended exclusively
> > > > for the addressee. If you receive this message unintentionally, please do not
> > > > use the contents but notify the sender immediately by return e-mail. University
> > > > Medical Center Utrecht is a legal person by public law and is registered at
> > > > the Chamber of Commerce for Midden-Nederland under no. 30244197.
> > > > 
> > > > Please consider the environment before printing this e-mail.
> > > > _______________________________________________
> > > > Users mailing list
> > > > Users at ovirt.org
> > > > http://lists.ovirt.org/mailman/listinfo/users


