Re: [ovirt-users] Importing existing (dirty) storage domains

Some interesting output from the vdsm log...

2017-02-09 15:16:24,051 INFO (jsonrpc/1) [storage.StorageDomain] Resource namespace 01_img_60455567-ad30-42e3-a9df-62fe86c7fd25 already registered (sd:731)
2017-02-09 15:16:24,051 INFO (jsonrpc/1) [storage.StorageDomain] Resource namespace 02_vol_60455567-ad30-42e3-a9df-62fe86c7fd25 already registered (sd:740)
2017-02-09 15:16:24,052 INFO (jsonrpc/1) [storage.SANLock] Acquiring Lease(name='SDM', path=u'/rhev/data-center/mnt/glusterSD/localhost:data2/60455567-ad30-42e3-a9df-62fe86c7fd25/dom_md/leases', offset=1048576) for host id 1 (clusterlock:343)
2017-02-09 15:16:24,057 INFO (jsonrpc/1) [storage.SANLock] Releasing host id for domain 60455567-ad30-42e3-a9df-62fe86c7fd25 (id: 1) (clusterlock:305)
2017-02-09 15:16:25,149 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call GlusterHost.list succeeded in 0.17 seconds (__init__:515)
2017-02-09 15:16:25,264 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::ffff:127.0.0.1:55060 (protocoldetector:72)
2017-02-09 15:16:25,270 INFO (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::ffff:127.0.0.1:55060 (protocoldetector:127)
2017-02-09 15:16:25,271 INFO (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompreactor:102)
2017-02-09 15:16:25,271 INFO (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompreactor:129)
2017-02-09 15:16:25,416 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.getHardwareInfo succeeded in 0.01 seconds (__init__:515)
2017-02-09 15:16:25,419 INFO (jsonrpc/6) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
2017-02-09 15:16:25,419 INFO (jsonrpc/6) [dispatcher] Run and protect: repoStats, Return response: {u'e8d04da7-ad3d-4227-a45d-b5a29b2f43e5': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000854128', 'lastCheck': '5.1', 'valid': True}, u'a77b8821-ff19-4d17-a3ce-a6c3a69436d5': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000966556', 'lastCheck': '2.6', 'valid': True}} (logUtils:52)
2017-02-09 15:16:25,447 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.getStats succeeded in 0.03 seconds (__init__:515)
2017-02-09 15:16:25,450 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL error receiving from <yajsonrpc.betterAsyncore.Dispatcher connected ('::ffff:127.0.0.1', 55060, 0, 0) at 0x7f69c0043cf8>: unexpected eof (betterAsyncore:113)
2017-02-09 15:16:25,812 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call GlusterVolume.list succeeded in 0.10 seconds (__init__:515)
2017-02-09 15:16:25,940 INFO (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::ffff:127.0.0.1:55062 (protocoldetector:72)
2017-02-09 15:16:25,946 INFO (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::ffff:127.0.0.1:55062 (protocoldetector:127)
2017-02-09 15:16:25,947 INFO (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompreactor:102)
2017-02-09 15:16:25,947 INFO (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompreactor:129)
2017-02-09 15:16:26,058 ERROR (jsonrpc/1) [storage.TaskManager.Task] (Task='02cad901-5fe8-4f2d-895b-14184f67feab') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 812, in forcedDetachStorageDomain
    self._deatchStorageDomainFromOldPools(sdUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 790, in _deatchStorageDomainFromOldPools
    dom.acquireClusterLock(host_id)
  File "/usr/share/vdsm/storage/sd.py", line 810, in acquireClusterLock
    self._manifest.acquireDomainLock(hostID)
  File "/usr/share/vdsm/storage/sd.py", line 499, in acquireDomainLock
    self._domainLock.acquire(hostID, self.getDomainLease())
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 362, in acquire
    "Cannot acquire %s" % (lease,), str(e))
AcquireLockFailure: Cannot obtain lock: u"id=60455567-ad30-42e3-a9df-62fe86c7fd25, rc=5, out=Cannot acquire Lease(name='SDM', path=u'/rhev/data-center/mnt/glusterSD/localhost:data2/60455567-ad30-42e3-a9df-62fe86c7fd25/dom_md/leases', offset=1048576), err=(5, 'Sanlock resource not acquired', 'Input/output error')"
2017-02-09 15:16:26,058 INFO (jsonrpc/1) [storage.TaskManager.Task] (Task='02cad901-5fe8-4f2d-895b-14184f67feab') aborting: Task is aborted: 'Cannot obtain lock' - code 651 (task:1175)
2017-02-09 15:16:26,059 ERROR (jsonrpc/1) [storage.Dispatcher] {'status': {'message': 'Cannot obtain lock: u"id=60455567-ad30-42e3-a9df-62fe86c7fd25, rc=5, out=Cannot acquire Lease(name=\'SDM\', path=u\'/rhev/data-center/mnt/glusterSD/localhost:data2/60455567-ad30-42e3-a9df-62fe86c7fd25/dom_md/leases\', offset=1048576), err=(5, \'Sanlock resource not acquired\', \'Input/output error\')"', 'code': 651}} (dispatcher:77)
2017-02-09 15:16:26,059 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call StorageDomain.detach failed (error 651) in 23.04 seconds (__init__:515)

-- Doug

Any errors in the gluster mount logs for this gluster volume?
How about "gluster vol heal <volname> info" - does it list any entries to heal?

On Thu, Feb 9, 2017 at 11:57 PM, Doug Ingham <dougti@gmail.com> wrote:
Some interesting output from the vdsm log...
<snip>

Hi Sahina,

On 13 February 2017 at 05:45, Sahina Bose <sabose@redhat.com> wrote:
Any errors in the gluster mount logs for this gluster volume?
How about "gluster vol heal <volname> info" - does it list any entries to heal?
After more investigating, I found out that there is a sanlock daemon that runs with VDSM, independently of the HE, so I'd basically have to bring the volume down & wait for the leases to expire/delete them* before I can import the domain.

*I understand removing /dom_md/leases/ should do the job?
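(For reference, the leases sanlock is actually holding on a host can be inspected read-only; the path below is just the one from the log above:

  sanlock client status
  sanlock direct dump /rhev/data-center/mnt/glusterSD/localhost:data2/60455567-ad30-42e3-a9df-62fe86c7fd25/dom_md/leases

"client status" lists the lockspaces and resources the local sanlock daemon currently holds, and "direct dump" prints what is recorded on disk.)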
On Thu, Feb 9, 2017 at 11:57 PM, Doug Ingham <dougti@gmail.com> wrote:
Some interesting output from the vdsm log...
<snip>
-- Doug

On Mon, Feb 13, 2017 at 3:35 PM, Doug Ingham <dougti@gmail.com> wrote:
Hi Sahina,
On 13 February 2017 at 05:45, Sahina Bose <sabose@redhat.com> wrote:
Any errors in the gluster mount logs for this gluster volume?
How about "gluster vol heal <volname> info" - does it list any entries to heal?
After more investigating, I found out that there is a sanlock daemon that runs with VDSM, independently of the HE, so I'd basically have to bring the volume down & wait for the leases to expire/delete them* before I can import the domain.
*I understand removing /dom_md/leases/ should do the job?
No, the issue is probably dom_md/ids, which is accessed by sanlock, but removing files accessed by sanlock will not help: an open file remains open until sanlock closes it.

Did you try to reboot the host before installing it again? If you did and you still have these issues, you probably need to remove the previous installation properly before installing again.

Adding Simone to help with uninstalling and reinstalling hosted engine.

Nir
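(To illustrate the point about open files: whether sanlock still holds the domain's ids file open can be checked on the host with lsof - the path is just the one from the log earlier in the thread:

  lsof /rhev/data-center/mnt/glusterSD/localhost:data2/60455567-ad30-42e3-a9df-62fe86c7fd25/dom_md/ids

If sanlock appears in the output, deleting the file will not release the lease; the open descriptor keeps it alive until the lockspace is removed or sanlock stops.)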
On Thu, Feb 9, 2017 at 11:57 PM, Doug Ingham <dougti@gmail.com> wrote:
Some interesting output from the vdsm log...
<snip>

Hi Nir,

On 16 February 2017 at 13:55, Nir Soffer <nsoffer@redhat.com> wrote:

On Mon, Feb 13, 2017 at 3:35 PM, Doug Ingham <dougti@gmail.com> wrote:

Hi Sahina,

On 13 February 2017 at 05:45, Sahina Bose <sabose@redhat.com> wrote:

Any errors in the gluster mount logs for this gluster volume?
How about "gluster vol heal <volname> info" - does it list any entries to heal?

After more investigating, I found out that there is a sanlock daemon that runs with VDSM, independently of the HE, so I'd basically have to bring the volume down & wait for the leases to expire/delete them* before I can import the domain.
*I understand removing /dom_md/leases/ should do the job?
No, the issue is probably dom_md/ids, which is accessed by sanlock, but removing files accessed by sanlock will not help: an open file remains open until sanlock closes it.
I'm testing this with volume snapshots at the moment, so there are no processes accessing the new volume.

Did you try to reboot the host before installing it again? If you did and you still have these issues, you probably need to remove the previous installation properly before installing again.
Adding Simone to help with uninstalling and reinstalling hosted engine.
The Hosted-Engine database had been corrupted and the restore wasn't running correctly, so I installed a new engine on a new server - no restores or old data. The aim is to import the old storage domain into the new Engine & then import the VMs into the new storage domain.

My only problem with this is that there appear to be some file-based leases somewhere that, unless I manage to locate & delete them, force me to wait for the leases to time out before I can import the old storage domain. To minimise downtime, I'm trying to avoid having to wait for the leases to time out.

Regards,
Doug
<snip>
-- Doug

Well that didn't go so well. I deleted both dom_md/ids & dom_md/leases in the cloned volume, and I still can't import the storage domain. The snapshot was also taken some 4 hours before the attempted import, so I'm surprised the locks haven't expired by themselves...

2017-02-16 21:58:24,630-03 INFO [org.ovirt.engine.core.bll.storage.connection.AddStorageServerConnectionCommand] (default task-45) [d59bc8c0-3c53-4a34-9d7c-8c982ee14e14] Lock Acquired to object 'EngineLock:{exclusiveLocks='[localhost:data-teste2=<STORAGE_CONNECTION, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-02-16 21:58:24,645-03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (default task-45) [d59bc8c0-3c53-4a34-9d7c-8c982ee14e14] START, ConnectStorageServerVDSCommand(HostName = v5.dc0.example.com, StorageServerConnectionManagementVDSParameters:{runAsync='true', hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='GLUSTERFS', connectionList='[StorageServerConnections:{id='null', connection='localhost:data-teste2', iqn='null', vfsType='glusterfs', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}]'}), log id: 726df65e
2017-02-16 21:58:26,046-03 INFO [org.ovirt.engine.core.bll.storage.connection.AddStorageServerConnectionCommand] (default task-45) [d59bc8c0-3c53-4a34-9d7c-8c982ee14e14] Lock freed to object 'EngineLock:{exclusiveLocks='[localhost:data-teste2=<STORAGE_CONNECTION, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-02-16 21:58:26,206-03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainsListVDSCommand] (default task-52) [85548427-713f-4ffb-a385-a97a7ee4109d] START, HSMGetStorageDomainsListVDSCommand(HostName = v5.dc0.example.com, HSMGetStorageDomainsListVDSCommandParameters:{runAsync='true', hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='null', storageDomainType='Data', path='localhost:data-teste2'}), log id: 79f6cc88
2017-02-16 21:58:27,899-03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainsListVDSCommand] (default task-50) [38e87311-a7a5-49a8-bf18-857dd969cd5f] START, HSMGetStorageDomainsListVDSCommand(HostName = v5.dc0.example.com, HSMGetStorageDomainsListVDSCommandParameters:{runAsync='true', hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='null', storageDomainType='Data', path='localhost:data-teste2'}), log id: 7280d13
2017-02-16 21:58:29,156-03 INFO [org.ovirt.engine.core.bll.storage.connection.RemoveStorageServerConnectionCommand] (default task-56) [1b3826e4-4890-43d4-8854-16f3c573a31f] Lock Acquired to object 'EngineLock:{exclusiveLocks='[localhost:data-teste2=<STORAGE_CONNECTION, ACTION_TYPE_FAILED_OBJECT_LOCKED>, 5e5f6610-c759-448b-a53d-9a456f513681=<STORAGE_CONNECTION, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-02-16 21:58:29,168-03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (default task-57) [5e4b20cf-60d2-4ae9-951b-c2693603aa6f] START, DisconnectStorageServerVDSCommand(HostName = v5.dc0.example.com, StorageServerConnectionManagementVDSParameters:{runAsync='true', hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='GLUSTERFS', connectionList='[StorageServerConnections:{id='5e5f6610-c759-448b-a53d-9a456f513681', connection='localhost:data-teste2', iqn='null', vfsType='glusterfs', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}]'}), log id: 6042b108
2017-02-16 21:58:29,193-03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (default task-56) [1b3826e4-4890-43d4-8854-16f3c573a31f] START, DisconnectStorageServerVDSCommand(HostName = v5.dc0.example.com, StorageServerConnectionManagementVDSParameters:{runAsync='true', hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='GLUSTERFS', connectionList='[StorageServerConnections:{id='5e5f6610-c759-448b-a53d-9a456f513681', connection='localhost:data-teste2', iqn='null', vfsType='glusterfs', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}]'}), log id: 4e9421cf
2017-02-16 21:58:31,398-03 INFO [org.ovirt.engine.core.bll.storage.connection.RemoveStorageServerConnectionCommand] (default task-56) [1b3826e4-4890-43d4-8854-16f3c573a31f] Lock freed to object 'EngineLock:{exclusiveLocks='[localhost:data-teste2=<STORAGE_CONNECTION, ACTION_TYPE_FAILED_OBJECT_LOCKED>, 5e5f6610-c759-448b-a53d-9a456f513681=<STORAGE_CONNECTION, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'

Again, many thanks!
Doug

On 16 February 2017 at 18:53, Doug Ingham <dougti@gmail.com> wrote:
<snip>
-- Doug

On Fri, Feb 17, 2017 at 3:16 AM, Doug Ingham <dougti@gmail.com> wrote:
Well that didn't go so well. I deleted both dom_md/ids & dom_md/leases in the cloned volume, and I still can't import the storage domain.
You cannot delete files from dom_md; this will invalidate the storage domain and you will not be able to use it without restoring these files.

The leases file on a file storage domain is unused, so creating an empty file is enough. The ids file must be created and initialized using sanlock - you should find instructions on how to do it in the archives.
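(A rough, untested sketch of what that usually looks like for a file-based domain - the lockspace name is the storage domain UUID, and the exact steps should be checked against the list archives:

  cd /rhev/data-center/mnt/glusterSD/localhost:data2/60455567-ad30-42e3-a9df-62fe86c7fd25/dom_md
  touch leases    # unused on file domains, an empty file is enough
  touch ids
  sanlock direct init -s 60455567-ad30-42e3-a9df-62fe86c7fd25:0:ids:0
  chown vdsm:kvm ids leases
  chmod 660 ids leases

The init must run while nothing is using the domain, otherwise sanlock will just keep its old open descriptors.)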
The snapshot was also taken some 4 hours before the attempted import, so I'm surprised the locks haven't expired by themselves...
Leases do not expire if vdsm is connected to storage and sanlock can access the storage. I'm not sure what you mean by volume snapshots.

To import a storage domain, you should first make sure that no other setup is using the storage domain. The best way to do that is to detach the storage domain from the other setup. If you are using hosted engine, you must also put the hosted engine agent in global maintenance mode. If your engine is broken, the best way to disconnect from storage is to reboot the hosts.

Nir
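(For the hosted engine part, global maintenance is set from any HE host with:

  hosted-engine --set-maintenance --mode=global

and reverted later with --mode=none. After that the old setup's hosts can be rebooted, or the domain detached there, before the import is attempted on the new engine.)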
<snip>

Hi Nir,

On 16 Feb 2017 22:41, "Nir Soffer" <nsoffer@redhat.com> wrote:

On Fri, Feb 17, 2017 at 3:16 AM, Doug Ingham <dougti@gmail.com> wrote:
Well that didn't go so well. I deleted both dom_md/ids & dom_md/leases in the cloned volume, and I still can't import the storage domain.
You cannot delete files from dom_md; this will invalidate the storage domain and you will not be able to use it without restoring these files. The leases file on a file storage domain is unused, so creating an empty file is enough. The ids file must be created and initialized using sanlock - you should find instructions on how to do it in the archives.
The snapshot was also taken some 4 hours before the attempted import, so I'm surprised the locks haven't expired by themselves...
Leases do not expire if vdsm is connected to storage and sanlock can access the storage. I'm not sure what you mean by volume snapshots.

A snapshot is like a save-point: the state of a storage volume from a specific point in time. In this case, it means I have created a copy/clone of my active data volume. It's a completely new, separate volume, and is not attached to any running services. I'm using this copy/clone to test the import process before doing it with my live volume. If I "break" something in the cloned volume, no worries - I can just delete it and recreate it from the snapshot. Hope that clears things up a bit! (A rough sketch of the snapshot/clone commands is at the end of this message.)

To import a storage domain, you should first make sure that no other setup is using the storage domain. The best way to do that is to detach the storage domain from the other setup. If you are using hosted engine, you must also put the hosted engine agent in global maintenance mode. If your engine is broken, the best way to disconnect from storage is to reboot the hosts.

So this is the issue. My current tests emulate exactly this; however, I'm still not able to import the domain into the new Engine. When I try to do so, I get the resulting logs I copied in my earlier email.

Doug
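(For anyone following along, the clone-from-snapshot workflow described above looks roughly like this on the gluster side - "data2" and "data-teste2" are the volume names from the logs, the snapshot name is made up:

  gluster snapshot create data-snap data2 no-timestamp
  gluster snapshot activate data-snap
  gluster snapshot clone data-teste2 data-snap
  gluster volume start data-teste2

The clone is an independent volume, so breaking it during an import test doesn't touch the original data2 volume.)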
2017-02-16 21:58:24,630-03 INFO [org.ovirt.engine.core.bll.storage.connection.AddStorageServerConnectionCommand] (default task-45) [d59bc8c0-3c53-4a34-9d7c-8c982ee14e14] Lock Acquired to object 'EngineLock:{exclusiveLocks='[localhost:data-teste2=<STORAGE_CONNECTION, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-02-16 21:58:24,645-03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (default task-45) [d59bc8c0-3c53-4a34-9d7c-8c982ee14e14] START, ConnectStorageServerVDSCommand(HostName = v5.dc0.example.com, StorageServerConnectionManagementVDSParameters:{runAsync='true', hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='GLUSTERFS', connectionList='[StorageServerConnections:{id='null', connection='localhost:data-teste2', iqn='null', vfsType='glusterfs', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}]'}), log id: 726df65e
2017-02-16 21:58:26,046-03 INFO [org.ovirt.engine.core.bll.storage.connection.AddStorageServerConnectionCommand] (default task-45) [d59bc8c0-3c53-4a34-9d7c-8c982ee14e14] Lock freed to object 'EngineLock:{exclusiveLocks='[localhost:data-teste2=<STORAGE_CONNECTION, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-02-16 21:58:26,206-03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainsListVDSCommand] (default task-52) [85548427-713f-4ffb-a385-a97a7ee4109d] START, HSMGetStorageDomainsListVDSCommand(HostName = v5.dc0.example.com, HSMGetStorageDomainsListVDSCommandParameters:{runAsync='true', hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='null', storageDomainType='Data', path='localhost:data-teste2'}), log id: 79f6cc88
2017-02-16 21:58:27,899-03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainsListVDSCommand] (default task-50) [38e87311-a7a5-49a8-bf18-857dd969cd5f] START, HSMGetStorageDomainsListVDSCommand(HostName = v5.dc0.example.com, HSMGetStorageDomainsListVDSCommandParameters:{runAsync='true', hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='null', storageDomainType='Data', path='localhost:data-teste2'}), log id: 7280d13
2017-02-16 21:58:29,156-03 INFO [org.ovirt.engine.core.bll.storage.connection.RemoveStorageServerConnectionCommand] (default task-56) [1b3826e4-4890-43d4-8854-16f3c573a31f] Lock Acquired to object 'EngineLock:{exclusiveLocks='[localhost:data-teste2=<STORAGE_CONNECTION, ACTION_TYPE_FAILED_OBJECT_LOCKED>, 5e5f6610-c759-448b-a53d-9a456f513681=<STORAGE_CONNECTION, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-02-16 21:58:29,168-03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (default task-57) [5e4b20cf-60d2-4ae9-951b-c2693603aa6f] START, DisconnectStorageServerVDSCommand(HostName = v5.dc0.example.com, StorageServerConnectionManagementVDSParameters:{runAsync='true', hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='GLUSTERFS', connectionList='[StorageServerConnections:{id='5e5f6610-c759-448b-a53d-9a456f513681', connection='localhost:data-teste2', iqn='null', vfsType='glusterfs', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}]'}), log id: 6042b108
2017-02-16 21:58:29,193-03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (default task-56) [1b3826e4-4890-43d4-8854-16f3c573a31f] START, DisconnectStorageServerVDSCommand(HostName = v5.dc0.example.com, StorageServerConnectionManagementVDSParameters:{runAsync='true', hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='GLUSTERFS', connectionList='[StorageServerConnections:{id='5e5f6610-c759-448b-a53d-9a456f513681', connection='localhost:data-teste2', iqn='null', vfsType='glusterfs', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}]'}), log id: 4e9421cf
2017-02-16 21:58:31,398-03 INFO [org.ovirt.engine.core.bll.storage.connection.RemoveStorageServerConnectionCommand] (default task-56) [1b3826e4-4890-43d4-8854-16f3c573a31f] Lock freed to object 'EngineLock:{exclusiveLocks='[localhost:data-teste2=<STORAGE_CONNECTION, ACTION_TYPE_FAILED_OBJECT_LOCKED>, 5e5f6610-c759-448b-a53d-9a456f513681=<STORAGE_CONNECTION, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
Again, many thanks! Doug
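Regarding the global maintenance step Nir mentions above: on a hosted-engine setup it is normally toggled from one of the hosted-engine hosts. A minimal sketch, not specific to this particular environment:

  # Stop the HA agents from monitoring/restarting the engine VM
  hosted-engine --set-maintenance --mode=global

  # Check the current state before detaching storage
  hosted-engine --vm-status

  # Return to normal operation afterwards
  hosted-engine --set-maintenance --mode=none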
On 16 February 2017 at 18:53, Doug Ingham <dougti@gmail.com> wrote:
Hi Nir,
On 16 February 2017 at 13:55, Nir Soffer <nsoffer@redhat.com> wrote:
On Mon, Feb 13, 2017 at 3:35 PM, Doug Ingham <dougti@gmail.com> wrote:
Hi Sahina,
On 13 February 2017 at 05:45, Sahina Bose <sabose@redhat.com> wrote:
Any errors in the gluster mount logs for this gluster volume?
How about "gluster vol heal <volname> info" - does it list any entries to heal?
After investigating further, I found that there is a sanlock daemon that runs with VDSM, independently of the HE, so I'd basically have to bring the volume down and either wait for the leases to expire or delete them* before I can import the domain.
*I understand removing /dom_md/leases/ should do the job?
No, the issue is probably dom_md/ids, which is accessed by sanlock, but removing files accessed by sanlock will not help: an open file remains open until sanlock closes it.
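To confirm whether the daemon is still holding the file, sanlock can be queried directly. A rough sketch, run as root on the host; the mount path below is a placeholder, not taken from this thread:

  # List the lockspaces and resources the local sanlock daemon currently holds
  sanlock client status

  # Confirm which process still has the ids file open
  lsof /rhev/data-center/mnt/<server:_volume>/<sd_uuid>/dom_md/ids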
I'm testing this with volume snapshots at the moment, so there are no processes accessing the new volume.
Did you try to reboot the host before installing it again? If you did and you still have these issues, you probably need to remove the previous installation properly before installing again.
Adding Simone to help with uninstalling and reinstalling hosted engine.
The Hosted-Engine database had been corrupted and the restore wasn't running correctly, so I installed a new engine on a new server, with no restores or old data. The aim is to import the old storage domain into the new Engine and then import the VMs into the new storage domain. My only problem with this is that there appear to be some file-based leases somewhere which, unless I manage to locate and delete them, force me to wait for them to time out before I can import the old storage domain. To minimise downtime, I'm trying to avoid that wait.
Regards, Doug
Nir
On Thu, Feb 9, 2017 at 11:57 PM, Doug Ingham <dougti@gmail.com> wrote:
Some interesting output from the vdsm log...
participants (3)
- Doug Ingham
- Nir Soffer
- Sahina Bose