[Users] Stopping glusterfsd service shut down data center
Amedeo Salvati
amedeo at oscert.net
Sun Jan 5 18:47:28 UTC 2014
Hi all,
I'm testing oVirt + GlusterFS with only two nodes for everything (engine,
glusterfs, hypervisors) on CentOS 6.5 hosts, following the guides at:
http://community.redhat.com/blog/2013/09/up-and-running-with-ovirt-3-3/
http://www.gluster.org/2013/09/ovirt-3-3-glusterized/
but with a few changes on the GlusterFS side: the parameter
cluster.server-quorum-ratio is set to 50% (to keep GlusterFS from going
down when one node goes down), and /etc/glusterfs/glusterd.vol has
"option base-port 50152" (to avoid a port conflict with libvirt).
With the above settings I can stop/reboot the node that is not used to
mount glusterfs directly (e.g. lovhm002), but when I stop/reboot the
node that is used to mount glusterfs (e.g. lovhm001), the whole data
center goes down. This happens in particular when I stop the glusterfsd
service (not the glusterd service!): the gluster volume stays alive and
reachable on the surviving node lovhm002, yet oVirt/libvirt marks the
DC/storage as in error.
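These are the kind of checks I run on lovhm002 to convince myself that
gluster itself is still fine while oVirt reports the error (just the
plain commands, nothing exotic):

  # on lovhm002, after stopping glusterfsd on lovhm001
  gluster peer status
  gluster volume status vmdata
  gluster volume info vmdata

  # the fuse mount used by vdsm is still browsable
  ls /rhev/data-center/mnt/glusterSD/lovhm001:_vmdata/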
Do you have any ideas on how to configure the DC/Cluster in oVirt so
that it stays up when the node used to mount glusterfs goes down?
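To be clear about what I mean by "the node used to mount glusterfs":
the data domain is defined in oVirt as lovhm001:/vmdata, so vdsm ends up
with a fuse mount roughly equivalent to this (a sketch, the real mount
is created by vdsm with its own options):

  mount -t glusterfs lovhm001:/vmdata \
      /rhev/data-center/mnt/glusterSD/lovhm001:_vmdata

i.e. lovhm001 is the server the client fetches the volfile from, even
though the volume itself is replica 2 across both nodes.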
Here is a sample from vdsmd.log on the node that stays online
(lovhm002), taken while I stopped the glusterd and glusterfsd services
on node lovhm001:
Thread-294::DEBUG::2014-01-05
19:12:32,475::task::1168::TaskManager.Task::(prepare)
Task=`a003cde0-a11a-489e-94c2-611f3d096a81`::finished: {'info':
{'spm_id': 2, 'master_uuid': '85ad5f7d-3b67-4618-a871-f9ec886020a4',
'name': 'PROD', 'version': '3', 'domains':
'85ad5f7d-3b67-4618-a871-f9ec886020a4:Active,6a9b4fa6-f393-4036-bd4e-0bc9dccb1594:Active',
'pool_status': 'connected', 'isoprefix':
'/rhev/data-center/mnt/lovhm001.fabber.it:_var_lib_exports_iso/6a9b4fa6-f393-4036-bd4e-0bc9dccb1594/images/11111111-1111-1111-1111-111111111111',
'type': 'GLUSTERFS', 'master_ver': 2, 'lver': 3}, 'dominfo':
{'85ad5f7d-3b67-4618-a871-f9ec886020a4': {'status': 'Active',
'diskfree': '374350675968', 'alerts': [], 'version': 3, 'disktotal':
'375626137600'}, '6a9b4fa6-f393-4036-bd4e-0bc9dccb1594': {'status':
'Active', 'diskfree': '45249200128', 'alerts': [], 'version': 0,
'disktotal': '51604619264'}}}
Thread-294::DEBUG::2014-01-05
19:12:32,476::task::579::TaskManager.Task::(_updateState)
Task=`a003cde0-a11a-489e-94c2-611f3d096a81`::moving from state preparing
-> state finished
Thread-294::DEBUG::2014-01-05
19:12:32,476::resourceManager::939::ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources
{'Storage.2eceb484-73e0-464a-965b-69f067918080': < ResourceRef
'Storage.2eceb484-73e0-464a-965b-69f067918080', isValid: 'True' obj:
'None'>}
Thread-294::DEBUG::2014-01-05
19:12:32,476::resourceManager::976::ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-294::DEBUG::2014-01-05
19:12:32,476::resourceManager::615::ResourceManager::(releaseResource)
Trying to release resource 'Storage.2eceb484-73e0-464a-965b-69f067918080'
Thread-294::DEBUG::2014-01-05
19:12:32,476::resourceManager::634::ResourceManager::(releaseResource)
Released resource 'Storage.2eceb484-73e0-464a-965b-69f067918080' (0
active users)
Thread-294::DEBUG::2014-01-05
19:12:32,476::resourceManager::640::ResourceManager::(releaseResource)
Resource 'Storage.2eceb484-73e0-464a-965b-69f067918080' is free, finding
out if anyone is waiting for it.
Thread-294::DEBUG::2014-01-05
19:12:32,476::resourceManager::648::ResourceManager::(releaseResource)
No one is waiting for resource
'Storage.2eceb484-73e0-464a-965b-69f067918080', Clearing records.
Thread-294::DEBUG::2014-01-05
19:12:32,476::task::974::TaskManager.Task::(_decref)
Task=`a003cde0-a11a-489e-94c2-611f3d096a81`::ref 0 aborting False
Thread-296::DEBUG::2014-01-05
19:12:36,070::BindingXMLRPC::974::vds::(wrapper) client
[5.39.66.85]::call volumesList with () {} flowID [76f294ea]
Thread-296::DEBUG::2014-01-05
19:12:36,079::BindingXMLRPC::981::vds::(wrapper) return volumesList with
{'status': {'message': 'Done', 'code': 0}, 'volumes': {'vmdata':
{'transportType': ['TCP'], 'uuid':
'e9b05f7a-f392-44f3-9d44-04761c36437d', 'bricks':
['lovhm001.fabber.it:/vmdata', 'lovhm002.fabber.it:/vmdata'],
'volumeName': 'vmdata', 'volumeType': 'REPLICATE', 'replicaCount': '2',
'brickCount': '2', 'distCount': '2', 'volumeStatus': 'ONLINE',
'stripeCount': '1', 'options': {'cluster.server-quorum-type': 'server',
'cluster.eager-lock': 'enable', 'performance.stat-prefetch': 'off',
'auth.allow': '*', 'cluster.quorum-type': 'auto',
'performance.quick-read': 'off', 'network.remote-dio': 'enable',
'nfs.disable': 'on', 'performance.io-cache': 'off',
'server.allow-insecure': 'on', 'storage.owner-uid': '36', 'user.cifs':
'disable', 'performance.read-ahead': 'off', 'storage.owner-gid': '36',
'cluster.server-quorum-ratio': '50%'}}}}
libvirtEventLoop::INFO::2014-01-05
19:12:38,856::vm::4266::vm.Vm::(_onAbnormalStop)
vmId=`88997598-1db3-478a-bbe2-a7d234cfdc77`::abnormal vm stop device
virtio-disk0 error eother
libvirtEventLoop::DEBUG::2014-01-05
19:12:38,856::vm::4840::vm.Vm::(_onLibvirtLifecycleEvent)
vmId=`88997598-1db3-478a-bbe2-a7d234cfdc77`::event Suspended detail 2
opaque None
Thread-28::DEBUG::2014-01-05
19:12:40,996::fileSD::239::Storage.Misc.excCmd::(getReadDelay) '/bin/dd
iflag=direct
if=/rhev/data-center/mnt/lovhm001.fabber.it:_var_lib_exports_iso/6a9b4fa6-f393-4036-bd4e-0bc9dccb1594/dom_md/metadata
bs=4096 count=1' (cwd None)
Thread-28::DEBUG::2014-01-05
19:12:41,001::fileSD::239::Storage.Misc.excCmd::(getReadDelay) SUCCESS:
<err> = '0+1 records in\n0+1 records out\n361 bytes (361 B) copied,
0.00017524 s, 2.1 MB/s\n'; <rc> = 0
Thread-298::DEBUG::2014-01-05
19:12:41,088::BindingXMLRPC::974::vds::(wrapper) client
[5.39.66.85]::call volumesList with () {}
Thread-298::DEBUG::2014-01-05
19:12:41,097::BindingXMLRPC::981::vds::(wrapper) return volumesList with
{'status': {'message': 'Done', 'code': 0}, 'volumes': {'vmdata':
{'transportType': ['TCP'], 'uuid':
'e9b05f7a-f392-44f3-9d44-04761c36437d', 'bricks':
['lovhm001.fabber.it:/vmdata', 'lovhm002.fabber.it:/vmdata'],
'volumeName': 'vmdata', 'volumeType': 'REPLICATE', 'replicaCount': '2',
'brickCount': '2', 'distCount': '2', 'volumeStatus': 'ONLINE',
'stripeCount': '1', 'options': {'cluster.server-quorum-type': 'server',
'cluster.eager-lock': 'enable', 'performance.stat-prefetch': 'off',
'auth.allow': '*', 'cluster.quorum-type': 'auto',
'performance.quick-read': 'off', 'network.remote-dio': 'enable',
'nfs.disable': 'on', 'performance.io-cache': 'off',
'server.allow-insecure': 'on', 'storage.owner-uid': '36', 'user.cifs':
'disable', 'performance.read-ahead': 'off', 'storage.owner-gid': '36',
'cluster.server-quorum-ratio': '50%'}}}}
Thread-300::DEBUG::2014-01-05
19:12:41,345::task::579::TaskManager.Task::(_updateState)
Task=`d049a71c-70a4-4dc2-9d69-99f1561ab405`::moving from state init ->
state preparing
Thread-300::INFO::2014-01-05
19:12:41,345::logUtils::44::dispatcher::(wrapper) Run and protect:
repoStats(options=None)
Thread-300::INFO::2014-01-05
19:12:41,345::logUtils::47::dispatcher::(wrapper) Run and protect:
repoStats, Return response: {'85ad5f7d-3b67-4618-a871-f9ec886020a4':
{'delay': '0.000370968', 'lastCheck': '8.9', 'code': 0, 'valid': True,
'version': 3}, '6a9b4fa6-f393-4036-bd4e-0bc9dccb1594': {'delay':
'0.00017524', 'lastCheck': '0.3', 'code': 0, 'valid': True, 'version': 0}}
Thread-300::DEBUG::2014-01-05
19:12:41,346::task::1168::TaskManager.Task::(prepare)
Task=`d049a71c-70a4-4dc2-9d69-99f1561ab405`::finished:
{'85ad5f7d-3b67-4618-a871-f9ec886020a4': {'delay': '0.000370968',
'lastCheck': '8.9', 'code': 0, 'valid': True, 'version': 3},
'6a9b4fa6-f393-4036-bd4e-0bc9dccb1594': {'delay': '0.00017524',
'lastCheck': '0.3', 'code': 0, 'valid': True, 'version': 0}}
Thread-300::DEBUG::2014-01-05
19:12:41,346::task::579::TaskManager.Task::(_updateState)
Task=`d049a71c-70a4-4dc2-9d69-99f1561ab405`::moving from state preparing
-> state finished
Thread-300::DEBUG::2014-01-05
19:12:41,346::resourceManager::939::ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
Thread-300::DEBUG::2014-01-05
19:12:41,346::resourceManager::976::ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-300::DEBUG::2014-01-05
19:12:41,346::task::974::TaskManager.Task::(_decref)
Task=`d049a71c-70a4-4dc2-9d69-99f1561ab405`::ref 0 aborting False
Thread-27::DEBUG::2014-01-05
19:12:42,459::fileSD::239::Storage.Misc.excCmd::(getReadDelay) '/bin/dd
iflag=direct
if=/rhev/data-center/mnt/glusterSD/lovhm001:_vmdata/85ad5f7d-3b67-4618-a871-f9ec886020a4/dom_md/metadata
bs=4096 count=1' (cwd None)
Thread-27::DEBUG::2014-01-05
19:12:42,464::fileSD::239::Storage.Misc.excCmd::(getReadDelay) SUCCESS:
<err> = '0+1 records in\n0+1 records out\n495 bytes (495 B) copied,
0.000345681 s, 1.4 MB/s\n'; <rc> = 0
Thread-302::DEBUG::2014-01-05
19:12:42,484::BindingXMLRPC::177::vds::(wrapper) client [5.39.66.85]
Thread-302::DEBUG::2014-01-05
19:12:42,485::task::579::TaskManager.Task::(_updateState)
Task=`acd727ae-dcbf-4662-bd97-fbdbadf6968a`::moving from state init ->
state preparing
Thread-302::INFO::2014-01-05
19:12:42,485::logUtils::44::dispatcher::(wrapper) Run and protect:
getSpmStatus(spUUID='2eceb484-73e0-464a-965b-69f067918080', options=None)
Thread-302::INFO::2014-01-05
19:12:42,485::logUtils::47::dispatcher::(wrapper) Run and protect:
getSpmStatus, Return response: {'spm_st': {'spmId': 2, 'spmStatus':
'SPM', 'spmLver': 3}}
thanks in advance
a
--
Amedeo Salvati
RHC{DS,E,VA} - LPIC-3 - UCP - NCLA 11
m. +39 333 1264484
email: amedeo at oscert.net
email: amedeo at linux.com
http://plugcomputing.it/redhatcert.php
http://plugcomputing.it/lpicert.php