[Users] Stopping glusterfsd service shut down data center

Amedeo Salvati amedeo at oscert.net
Sun Jan 5 18:47:28 UTC 2014


Hi all,

I'm testing oVirt + GlusterFS with only two nodes for everything (engine, 
glusterfs, hypervisors), on CentOS 6.5 hosts, following the guides at:

http://community.redhat.com/blog/2013/09/up-and-running-with-ovirt-3-3/
http://www.gluster.org/2013/09/ovirt-3-3-glusterized/

but with some changes, such as setting the GlusterFS parameter 
cluster.server-quorum-ratio to 50% (to prevent GlusterFS from going down 
when one node goes down) and adding "option base-port 50152" to 
/etc/glusterfs/glusterd.vol (to avoid a port conflict with libvirt).
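
For reference, the two changes were roughly these (a sketch of what I 
applied; the exact command form and the rest of the glusterd.vol 
contents may differ slightly on your install):

   # quorum ratio, set cluster-wide from one node
   gluster volume set all cluster.server-quorum-ratio 50%

   # /etc/glusterfs/glusterd.vol on both nodes, then restart glusterd
   volume management
       type mgmt/glusterd
       ...
       option base-port 50152
   end-volume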

So, with the above parameters I was able to stop/reboot the node that is 
not used to directly mount GlusterFS (e.g. lovhm002). But when I 
stop/reboot the node that is used to mount GlusterFS (e.g. lovhm001), the 
whole data center goes down, in particular when I stop the glusterfsd 
service (not the glusterd service!). The GlusterFS volume stays alive and 
reachable on the surviving node lovhm002, but oVirt/libvirt marks the 
DC/storage as being in error.
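
(To be precise, I stop the daemons with the normal init scripts, i.e. 
something like:

   service glusterfsd stop
   service glusterd stop

and it is stopping glusterfsd on lovhm001 that takes the storage down.)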

Do you have any ideas on how to configure the DC/cluster in oVirt so 
that it stays up when the node used to mount GlusterFS goes down?

This is a sample from vdsm.log on the node that stays online (lovhm002), 
taken when I stopped the glusterd and glusterfsd services on node 
lovhm001:

Thread-294::DEBUG::2014-01-05 
19:12:32,475::task::1168::TaskManager.Task::(prepare) 
Task=`a003cde0-a11a-489e-94c2-611f3d096a81`::finished: {'info': 
{'spm_id': 2, 'master_uuid': '85ad5f7d-3b67-4618-a871-f9ec886020a4', 
'name': 'PROD', 'version': '3', 'domains': 
'85ad5f7d-3b67-4618-a871-f9ec886020a4:Active,6a9b4fa6-f393-4036-bd4e-0bc9dccb1594:Active', 
'pool_status': 'connected', 'isoprefix': 
'/rhev/data-center/mnt/lovhm001.fabber.it:_var_lib_exports_iso/6a9b4fa6-f393-4036-bd4e-0bc9dccb1594/images/11111111-1111-1111-1111-111111111111', 
'type': 'GLUSTERFS', 'master_ver': 2, 'lver': 3}, 'dominfo': 
{'85ad5f7d-3b67-4618-a871-f9ec886020a4': {'status': 'Active', 
'diskfree': '374350675968', 'alerts': [], 'version': 3, 'disktotal': 
'375626137600'}, '6a9b4fa6-f393-4036-bd4e-0bc9dccb1594': {'status': 
'Active', 'diskfree': '45249200128', 'alerts': [], 'version': 0, 
'disktotal': '51604619264'}}}
Thread-294::DEBUG::2014-01-05 
19:12:32,476::task::579::TaskManager.Task::(_updateState) 
Task=`a003cde0-a11a-489e-94c2-611f3d096a81`::moving from state preparing 
-> state finished
Thread-294::DEBUG::2014-01-05 
19:12:32,476::resourceManager::939::ResourceManager.Owner::(releaseAll) 
Owner.releaseAll requests {} resources 
{'Storage.2eceb484-73e0-464a-965b-69f067918080': < ResourceRef 
'Storage.2eceb484-73e0-464a-965b-69f067918080', isValid: 'True' obj: 
'None'>}
Thread-294::DEBUG::2014-01-05 
19:12:32,476::resourceManager::976::ResourceManager.Owner::(cancelAll) 
Owner.cancelAll requests {}
Thread-294::DEBUG::2014-01-05 
19:12:32,476::resourceManager::615::ResourceManager::(releaseResource) 
Trying to release resource 'Storage.2eceb484-73e0-464a-965b-69f067918080'
Thread-294::DEBUG::2014-01-05 
19:12:32,476::resourceManager::634::ResourceManager::(releaseResource) 
Released resource 'Storage.2eceb484-73e0-464a-965b-69f067918080' (0 
active users)
Thread-294::DEBUG::2014-01-05 
19:12:32,476::resourceManager::640::ResourceManager::(releaseResource) 
Resource 'Storage.2eceb484-73e0-464a-965b-69f067918080' is free, finding 
out if anyone is waiting for it.
Thread-294::DEBUG::2014-01-05 
19:12:32,476::resourceManager::648::ResourceManager::(releaseResource) 
No one is waiting for resource 
'Storage.2eceb484-73e0-464a-965b-69f067918080', Clearing records.
Thread-294::DEBUG::2014-01-05 
19:12:32,476::task::974::TaskManager.Task::(_decref) 
Task=`a003cde0-a11a-489e-94c2-611f3d096a81`::ref 0 aborting False
Thread-296::DEBUG::2014-01-05 
19:12:36,070::BindingXMLRPC::974::vds::(wrapper) client 
[5.39.66.85]::call volumesList with () {} flowID [76f294ea]
Thread-296::DEBUG::2014-01-05 
19:12:36,079::BindingXMLRPC::981::vds::(wrapper) return volumesList with 
{'status': {'message': 'Done', 'code': 0}, 'volumes': {'vmdata': 
{'transportType': ['TCP'], 'uuid': 
'e9b05f7a-f392-44f3-9d44-04761c36437d', 'bricks': 
['lovhm001.fabber.it:/vmdata', 'lovhm002.fabber.it:/vmdata'], 
'volumeName': 'vmdata', 'volumeType': 'REPLICATE', 'replicaCount': '2', 
'brickCount': '2', 'distCount': '2', 'volumeStatus': 'ONLINE', 
'stripeCount': '1', 'options': {'cluster.server-quorum-type': 'server', 
'cluster.eager-lock': 'enable', 'performance.stat-prefetch': 'off', 
'auth.allow': '*', 'cluster.quorum-type': 'auto', 
'performance.quick-read': 'off', 'network.remote-dio': 'enable', 
'nfs.disable': 'on', 'performance.io-cache': 'off', 
'server.allow-insecure': 'on', 'storage.owner-uid': '36', 'user.cifs': 
'disable', 'performance.read-ahead': 'off', 'storage.owner-gid': '36', 
'cluster.server-quorum-ratio': '50%'}}}}
libvirtEventLoop::INFO::2014-01-05 
19:12:38,856::vm::4266::vm.Vm::(_onAbnormalStop) 
vmId=`88997598-1db3-478a-bbe2-a7d234cfdc77`::abnormal vm stop device 
virtio-disk0 error eother
libvirtEventLoop::DEBUG::2014-01-05 
19:12:38,856::vm::4840::vm.Vm::(_onLibvirtLifecycleEvent) 
vmId=`88997598-1db3-478a-bbe2-a7d234cfdc77`::event Suspended detail 2 
opaque None
Thread-28::DEBUG::2014-01-05 
19:12:40,996::fileSD::239::Storage.Misc.excCmd::(getReadDelay) '/bin/dd 
iflag=direct 
if=/rhev/data-center/mnt/lovhm001.fabber.it:_var_lib_exports_iso/6a9b4fa6-f393-4036-bd4e-0bc9dccb1594/dom_md/metadata 
bs=4096 count=1' (cwd None)
Thread-28::DEBUG::2014-01-05 
19:12:41,001::fileSD::239::Storage.Misc.excCmd::(getReadDelay) SUCCESS: 
<err> = '0+1 records in\n0+1 records out\n361 bytes (361 B) copied, 
0.00017524 s, 2.1 MB/s\n'; <rc> = 0
Thread-298::DEBUG::2014-01-05 
19:12:41,088::BindingXMLRPC::974::vds::(wrapper) client 
[5.39.66.85]::call volumesList with () {}
Thread-298::DEBUG::2014-01-05 
19:12:41,097::BindingXMLRPC::981::vds::(wrapper) return volumesList with 
{'status': {'message': 'Done', 'code': 0}, 'volumes': {'vmdata': 
{'transportType': ['TCP'], 'uuid': 
'e9b05f7a-f392-44f3-9d44-04761c36437d', 'bricks': 
['lovhm001.fabber.it:/vmdata', 'lovhm002.fabber.it:/vmdata'], 
'volumeName': 'vmdata', 'volumeType': 'REPLICATE', 'replicaCount': '2', 
'brickCount': '2', 'distCount': '2', 'volumeStatus': 'ONLINE', 
'stripeCount': '1', 'options': {'cluster.server-quorum-type': 'server', 
'cluster.eager-lock': 'enable', 'performance.stat-prefetch': 'off', 
'auth.allow': '*', 'cluster.quorum-type': 'auto', 
'performance.quick-read': 'off', 'network.remote-dio': 'enable', 
'nfs.disable': 'on', 'performance.io-cache': 'off', 
'server.allow-insecure': 'on', 'storage.owner-uid': '36', 'user.cifs': 
'disable', 'performance.read-ahead': 'off', 'storage.owner-gid': '36', 
'cluster.server-quorum-ratio': '50%'}}}}
Thread-300::DEBUG::2014-01-05 
19:12:41,345::task::579::TaskManager.Task::(_updateState) 
Task=`d049a71c-70a4-4dc2-9d69-99f1561ab405`::moving from state init -> 
state preparing
Thread-300::INFO::2014-01-05 
19:12:41,345::logUtils::44::dispatcher::(wrapper) Run and protect: 
repoStats(options=None)
Thread-300::INFO::2014-01-05 
19:12:41,345::logUtils::47::dispatcher::(wrapper) Run and protect: 
repoStats, Return response: {'85ad5f7d-3b67-4618-a871-f9ec886020a4': 
{'delay': '0.000370968', 'lastCheck': '8.9', 'code': 0, 'valid': True, 
'version': 3}, '6a9b4fa6-f393-4036-bd4e-0bc9dccb1594': {'delay': 
'0.00017524', 'lastCheck': '0.3', 'code': 0, 'valid': True, 'version': 0}}
Thread-300::DEBUG::2014-01-05 
19:12:41,346::task::1168::TaskManager.Task::(prepare) 
Task=`d049a71c-70a4-4dc2-9d69-99f1561ab405`::finished: 
{'85ad5f7d-3b67-4618-a871-f9ec886020a4': {'delay': '0.000370968', 
'lastCheck': '8.9', 'code': 0, 'valid': True, 'version': 3}, 
'6a9b4fa6-f393-4036-bd4e-0bc9dccb1594': {'delay': '0.00017524', 
'lastCheck': '0.3', 'code': 0, 'valid': True, 'version': 0}}
Thread-300::DEBUG::2014-01-05 
19:12:41,346::task::579::TaskManager.Task::(_updateState) 
Task=`d049a71c-70a4-4dc2-9d69-99f1561ab405`::moving from state preparing 
-> state finished
Thread-300::DEBUG::2014-01-05 
19:12:41,346::resourceManager::939::ResourceManager.Owner::(releaseAll) 
Owner.releaseAll requests {} resources {}
Thread-300::DEBUG::2014-01-05 
19:12:41,346::resourceManager::976::ResourceManager.Owner::(cancelAll) 
Owner.cancelAll requests {}
Thread-300::DEBUG::2014-01-05 
19:12:41,346::task::974::TaskManager.Task::(_decref) 
Task=`d049a71c-70a4-4dc2-9d69-99f1561ab405`::ref 0 aborting False
Thread-27::DEBUG::2014-01-05 
19:12:42,459::fileSD::239::Storage.Misc.excCmd::(getReadDelay) '/bin/dd 
iflag=direct 
if=/rhev/data-center/mnt/glusterSD/lovhm001:_vmdata/85ad5f7d-3b67-4618-a871-f9ec886020a4/dom_md/metadata 
bs=4096 count=1' (cwd None)
Thread-27::DEBUG::2014-01-05 
19:12:42,464::fileSD::239::Storage.Misc.excCmd::(getReadDelay) SUCCESS: 
<err> = '0+1 records in\n0+1 records out\n495 bytes (495 B) copied, 
0.000345681 s, 1.4 MB/s\n'; <rc> = 0
Thread-302::DEBUG::2014-01-05 
19:12:42,484::BindingXMLRPC::177::vds::(wrapper) client [5.39.66.85]
Thread-302::DEBUG::2014-01-05 
19:12:42,485::task::579::TaskManager.Task::(_updateState) 
Task=`acd727ae-dcbf-4662-bd97-fbdbadf6968a`::moving from state init -> 
state preparing
Thread-302::INFO::2014-01-05 
19:12:42,485::logUtils::44::dispatcher::(wrapper) Run and protect: 
getSpmStatus(spUUID='2eceb484-73e0-464a-965b-69f067918080', options=None)
Thread-302::INFO::2014-01-05 
19:12:42,485::logUtils::47::dispatcher::(wrapper) Run and protect: 
getSpmStatus, Return response: {'spm_st': {'spmId': 2, 'spmStatus': 
'SPM', 'spmLver': 3}}

thanks in advance
a

-- 
Amedeo Salvati
RHC{DS,E,VA} - LPIC-3 - UCP - NCLA 11
m. +39 333 1264484
email: amedeo at oscert.net
email: amedeo at linux.com
http://plugcomputing.it/redhatcert.php
http://plugcomputing.it/lpicert.php
