[ovirt-users] urgent issue
Chris Liebman
chris.l at taboola.com
Wed Sep 9 15:31:07 UTC 2015
OK, I think I'm going to switch to local storage; I've had way too many
unexplainable issues with glusterfs :-(. Is there any reason I can't add
local storage to the existing shared-storage cluster? I see that the menu
item is greyed out...
On Tue, Sep 8, 2015 at 4:19 PM, Chris Liebman <chris.l at taboola.com> wrote:
> It's possible that this is specific to just one gluster volume... I've
> moved a few VM disks off of that volume and am able to start them fine. My
> recollection is that any VM started on the "bad" volume causes it to be
> disconnected and forces the ovirt node to be marked down until
> Maint->Activate.
>
> On Tue, Sep 8, 2015 at 3:52 PM, Chris Liebman <chris.l at taboola.com> wrote:
>
>> In attempting to put an ovirt cluster into production I'm running into some
>> odd errors, apparently with gluster. It's 12 hosts, each with one brick in a
>> distributed-replicate volume (actually 2 bricks, but they belong to separate volumes).
>>
>> [root@ovirt-node268 glusterfs]# rpm -qa | grep vdsm
>> vdsm-jsonrpc-4.16.20-0.el6.noarch
>> vdsm-gluster-4.16.20-0.el6.noarch
>> vdsm-xmlrpc-4.16.20-0.el6.noarch
>> vdsm-yajsonrpc-4.16.20-0.el6.noarch
>> vdsm-4.16.20-0.el6.x86_64
>> vdsm-python-zombiereaper-4.16.20-0.el6.noarch
>> vdsm-python-4.16.20-0.el6.noarch
>> vdsm-cli-4.16.20-0.el6.noarch
>>
>>
>> Everything was fine last week; however, today various clients in the
>> gluster cluster periodically get "client quorum not met". When they
>> get this, they take one of the bricks offline, which causes oVirt to try to
>> migrate the VMs, sometimes 20 at a time. That takes a long time :-(.
>> I've tried disabling automatic migration, and then the VMs get paused when this
>> happens. Resuming does nothing at that point, because the volume's mount on the
>> server hosting the VM is not connected:
>>
>> From rhev-data-center-mnt-glusterSD-ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02.log:
>>
>> [2015-09-08 21:18:42.920771] W [MSGID: 108001] [afr-common.c:4043:afr_notify] 2-LADC-TBX-V02-replicate-2: Client-quorum is not met
>> [2015-09-08 21:18:42.931751] I [fuse-bridge.c:4900:fuse_thread_proc] 0-fuse: unmounting /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02
>> [2015-09-08 21:18:42.931836] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7a51) [0x7f1bebc84a51] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received signum (15), shutting down
>> [2015-09-08 21:18:42.931858] I [fuse-bridge.c:5595:fini] 0-fuse: Unmounting '/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02'.
>>
>>
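The `Client-quorum is not met` warning above is logged by GlusterFS's AFR (replicate) translator for the replica set `LADC-TBX-V02-replicate-2`. As a rough illustration of when that check fails, here is a minimal Python sketch of the documented `cluster.quorum-type` rules (`none`, `fixed` together with `cluster.quorum-count`, and `auto`). It is a simplified model, not GlusterFS code; the function name and its arguments are hypothetical, and the thread does not say what replica count or quorum settings this volume uses.

```python
def client_quorum_met(brick_up, quorum_type="auto", quorum_count=None):
    """Simplified model of AFR client quorum for a single replica set.

    Hypothetical helper, not a GlusterFS API.
    brick_up: one boolean per brick, in the order the bricks are listed
    for that replica set (index 0 is the "first brick").
    """
    if not brick_up:
        raise ValueError("a replica set needs at least one brick")
    up = sum(1 for state in brick_up if state)
    total = len(brick_up)

    if quorum_type == "none":
        # Client quorum disabled: this check never fails.
        return True
    if quorum_type == "fixed":
        # Quorum requires at least cluster.quorum-count bricks to be up.
        if quorum_count is None:
            raise ValueError("quorum_type='fixed' requires quorum_count")
        return up >= quorum_count
    if quorum_type == "auto":
        # More than half of the bricks up always meets quorum...
        if 2 * up > total:
            return True
        # ...and exactly half meets it only if the first brick is among them.
        return 2 * up == total and bool(brick_up[0])
    raise ValueError(f"unknown quorum_type: {quorum_type!r}")


# Hypothetical replica-2 set: the outcome depends on which brick went away.
print(client_quorum_met([True, False]))  # first brick up   -> True
print(client_quorum_met([False, True]))  # first brick down -> False
```

Under `auto`, a replica-2 pair therefore loses client quorum whenever its first brick is unreachable from that client, even though the second brick is still up. Which quorum options are actually set on this volume could be checked with `gluster volume info LADC-TBX-V02`, where explicitly set options appear under "Options Reconfigured".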
>> And the mount is broken at that point:
>>
>> [root@ovirt-node267 ~]# df
>> df: `/rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V02': Transport endpoint is not connected
>> Filesystem    1K-blocks      Used  Available Use% Mounted on
>> /dev/sda3      51475068   1968452   46885176   5% /
>> tmpfs         132210244         0  132210244   0% /dev/shm
>> /dev/sda2        487652     32409     429643   8% /boot
>> /dev/sda1        204580       260     204320   1% /boot/efi
>> /dev/sda5    1849960960 156714056 1599267616   9% /data1
>> /dev/sdb1    1902274676  18714468 1786923588   2% /data2
>> ovirt-node268.la.taboolasyndication.com:/LADC-TBX-V01
>>              9249804800 727008640 8052899712   9% /rhev/data-center/mnt/glusterSD/ovirt-node268.la.taboolasyndication.com:_LADC-TBX-V01
>> ovirt-node251.la.taboolasyndication.com:/LADC-TBX-V03
>>              1849960960     73728 1755907968   1% /rhev/data-center/mnt/glusterSD/ovirt-node251.la.taboolasyndication.com:_LADC-TBX-V03
>>
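`Transport endpoint is not connected` is the message for errno `ENOTCONN`, which is what a FUSE mount typically returns once its client process has exited, as the glusterfs client did above after `received signum (15)`. Below is a minimal Python sketch for spotting such stale mounts, assuming a Linux host with `/proc/mounts` and the oVirt glusterSD mount prefix seen in the `df` output; all function names are hypothetical helpers, not part of vdsm or GlusterFS.

```python
import errno
import os

# Assumption: oVirt mounts gluster storage domains under this prefix,
# as in the df output above.
GLUSTERSD_PREFIX = "/rhev/data-center/mnt/glusterSD/"


def glustersd_mountpoints(mount_lines, prefix=GLUSTERSD_PREFIX):
    """Pick the mount points under `prefix` out of /proc/mounts-style lines."""
    points = []
    for line in mount_lines:
        fields = line.split()
        # Field 0 is the source (host:/volume), field 1 is the mount point.
        if len(fields) >= 2 and fields[1].startswith(prefix):
            points.append(fields[1])
    return points


def disconnected_mounts(paths):
    """Return the paths whose statvfs() fails with ENOTCONN.

    ENOTCONN is what df reports as "Transport endpoint is not connected".
    Any other OSError is re-raised rather than hidden.
    """
    broken = []
    for path in paths:
        try:
            os.statvfs(path)
        except OSError as exc:
            if exc.errno != errno.ENOTCONN:
                raise
            broken.append(path)
    return broken


def check_host(mounts_file="/proc/mounts"):
    """Read the live mount table and list the stale glusterSD mounts."""
    with open(mounts_file) as fh:
        return disconnected_mounts(glustersd_mountpoints(fh))
```

On an affected host, `check_host()` would be expected to report the `_LADC-TBX-V02` mount point. Parsing is kept separate from I/O so the mount-table filter can be exercised without a real gluster mount; detection only flags the problem and does not replace the maintenance/activate cycle described in the thread.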
>> The fix for that is to put the host into maintenance mode and then activate
>> it again, but all VMs need to be migrated or stopped for that to work.
>>
>> I'm not seeing any obvious network or disk errors...
>>
>> Are there configuration options I'm missing?
>>
>>
>