What is the status of your Datacenter? Are both hosts operational? Are you experiencing any storage problems other than the inconsistent task state? Do you also see the KeyError: 'VERSION' message for domain b6730d64-2cf8-42a3-8f08-24b8cc2c0cd8 on Node02? Did you experience any disaster (power outage, FC storage outage, network failure, etc.) around the time this started happening?

On Wed, Oct 11, 2017 at 9:21 AM, yayo (j) <jaganz@gmail.com> wrote:
Hi all,

I'm running oVirt 4.1 with a hosted engine on a 2-node cluster with FC LUN storage.

I'm trying to clear some tasks that have been pending for months using vdsClient, but nothing I try works. Below are the steps (on node 1, the SPM):

1. Show all tasks:

# vdsClient -s 0 getAllTasksInfo
fd319af4-d160-48ce-b682-5a908333a5e1 :
         verb = createVolume
         id = fd319af4-d160-48ce-b682-5a908333a5e1
9bbc2bc4-3c73-4814-a785-6ea737904528 :
         verb = prepareMerge
         id = 9bbc2bc4-3c73-4814-a785-6ea737904528
e70feb21-964d-49d9-9b5a-8e3f70a92db1 :
         verb = prepareMerge
         id = e70feb21-964d-49d9-9b5a-8e3f70a92db1
cf064461-f0ab-4e44-a68f-b2d58fa83a21 :
         verb = prepareMerge
         id = cf064461-f0ab-4e44-a68f-b2d58fa83a21
85b7cf4e-d658-4785-94f0-391fe9616b41 :
         verb = prepareMerge
         id = 85b7cf4e-d658-4785-94f0-391fe9616b41
7416627a-fe50-4353-b129-e01bba066a66 :
         verb = prepareMerge
         id = 7416627a-fe50-4353-b129-e01bba066a66
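
A minimal sketch, assuming the getAllTasksInfo output keeps the layout shown above (a "<task id> :" line followed by indented "key = value" lines). It parses that text and groups the task IDs by verb. The helper names parse_tasks and group_by_verb are hypothetical, and SAMPLE is just the first two entries copied from the listing:

```python
from collections import defaultdict

# First two entries copied from the getAllTasksInfo listing above.
SAMPLE = """\
fd319af4-d160-48ce-b682-5a908333a5e1 :
         verb = createVolume
         id = fd319af4-d160-48ce-b682-5a908333a5e1
9bbc2bc4-3c73-4814-a785-6ea737904528 :
         verb = prepareMerge
         id = 9bbc2bc4-3c73-4814-a785-6ea737904528
"""


def parse_tasks(text):
    """Parse the listing into {task_id: {field: value}} (assumed layout)."""
    tasks = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if "=" in stripped:
            # Indented "key = value" line belonging to the current task.
            if current is not None:
                key, _, value = stripped.partition("=")
                tasks[current][key.strip()] = value.strip()
        elif stripped.endswith(":"):
            # Header line: "<task id> :"
            current = stripped[:-1].strip()
            tasks[current] = {}
    return tasks


def group_by_verb(tasks):
    """Return {verb: [task_id, ...]} so stuck tasks can be reviewed by type."""
    groups = defaultdict(list)
    for task_id, fields in tasks.items():
        groups[fields.get("verb", "unknown")].append(task_id)
    return dict(groups)
```

Feeding it the full listing would give one createVolume task and five prepareMerge tasks.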


2. Stop all tasks (repeated for every task):

# vdsClient -s 0 stopTask 7416627a-fe50-4353-b129-e01bba066a66 
Task is aborted: u'7416627a-fe50-4353-b129-e01bba066a66' - code 411

3. Try to clear the tasks:

# vdsClient -s 0 clearTask 7416627a-fe50-4353-b129-e01bba066a66
Operation is not allowed in this task state: ("can't clean in state running",)
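
Steps 2 and 3 were repeated by hand for each task ID. As a rough sketch only, the same vdsClient invocations could be generated per task like this. The helper names build_commands and run_all are hypothetical, the command line mirrors the ones used above, and nothing here changes the outcome: clearTask is still refused while a task reports the running state.

```python
import subprocess


def build_commands(task_ids, verb):
    """Return one argv list per task, e.g. vdsClient -s 0 stopTask <id>."""
    return [["vdsClient", "-s", "0", verb, task_id] for task_id in task_ids]


def run_all(commands, runner=subprocess.call):
    """Run each command and return {task_id: exit status}.

    The runner is injectable so the loop can be dry-run without vdsClient
    installed; by default it is subprocess.call.
    """
    return {cmd[-1]: runner(cmd) for cmd in commands}
```

For example, run_all(build_commands(ids, "stopTask")) followed by run_all(build_commands(ids, "clearTask")) reproduces the two manual steps for a list of IDs.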



On Node 01 (the SPM) I have multiple errors in /var/log/vdsm/vdsm.log like this:

2017-10-11 15:09:53,719+0200 INFO  (jsonrpc/3) [storage.TaskManager.Task] (Task='9519d4db-2960-4b88-82f2-e4c1094eac54') aborting: Task is aborted: u'Operation is not allowed in this task state: ("can\'t clean in state running",)' - code 100 (task:1175)
2017-10-11 15:09:53,719+0200 ERROR (jsonrpc/3) [storage.Dispatcher] FINISH clearTask error=Operation is not allowed in this task state: ("can't clean in state running",) (dispatcher:78)
2017-10-11 15:09:53,720+0200 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Task.clear failed (error 410) in 0.01 seconds (__init__:539)
2017-10-11 15:09:53,743+0200 INFO  (jsonrpc/6) [vdsm.api] START clearTask(taskID=u'7416627a-fe50-4353-b129-e01bba066a66', spUUID=None, options=None) from=::ffff:192.168.0.226,36724, flow_id=7cd340ec (api:46)
2017-10-11 15:09:53,743+0200 INFO  (jsonrpc/6) [vdsm.api] FINISH clearTask error=Operation is not allowed in this task state: ("can't clean in state running",) from=::ffff:192.168.0.226,36724, flow_id=7cd340ec (api:50)
2017-10-11 15:09:53,743+0200 ERROR (jsonrpc/6) [storage.TaskManager.Task] (Task='0e12e052-2aca-480d-b50f-5de01ddebe35') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in clearTask
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2258, in clearTask
    return self.taskMng.clearTask(taskID=taskID)
  File "/usr/share/vdsm/storage/taskManager.py", line 175, in clearTask
    t.clean()
  File "/usr/share/vdsm/storage/task.py", line 1047, in clean
    raise se.TaskStateError("can't clean in state %s" % self.state)
TaskStateError: Operation is not allowed in this task state: ("can't clean in state running",)


On Node 02 (it is a 2-node cluster) I have other errors (I don't know if they are related):

2017-10-11 15:11:57,083+0200 INFO  (jsonrpc/7) [storage.LVM] Refreshing lvs: vg=b50c1f5c-aa2c-4a53-9f89-83517fa70d3b lvs=['leases'] (lvm:1291)
2017-10-11 15:11:57,084+0200 INFO  (jsonrpc/7) [storage.LVM] Refreshing LVs (vg=b50c1f5c-aa2c-4a53-9f89-83517fa70d3b, lvs=['leases']) (lvm:1319)
2017-10-11 15:11:57,124+0200 INFO  (jsonrpc/7) [storage.VolumeManifest] b50c1f5c-aa2c-4a53-9f89-83517fa70d3b/d42f671e-1745-46c1-9e1c-2833245675fc/c86afaa5-6ca8-4fcb-a27e-ffbe0133fe23 info is {'status': 'OK', 'domain': 'b50c1f5c-aa2c-4a53-9f89-83517fa70d3b', 'voltype': 'LEAF', 'description': 'hosted-engine.metadata', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': 'd42f671e-1745-46c1-9e1c-2833245675fc', 'ctime': '1499437345', 'disktype': '2', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '134217728', 'children': [], 'pool': '', 'capacity': '134217728', 'uuid': u'c86afaa5-6ca8-4fcb-a27e-ffbe0133fe23', 'truesize': '134217728', 'type': 'PREALLOCATED', 'lease': {'owners': [], 'version': None}} (volume:272)
2017-10-11 15:11:57,125+0200 INFO  (jsonrpc/7) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': 'b50c1f5c-aa2c-4a53-9f89-83517fa70d3b', 'voltype': 'LEAF', 'description': 'hosted-engine.metadata', 'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW', 'generation': 0, 'image': 'd42f671e-1745-46c1-9e1c-2833245675fc', 'ctime': '1499437345', 'disktype': '2', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '134217728', 'children': [], 'pool': '', 'capacity': '134217728', 'uuid': u'c86afaa5-6ca8-4fcb-a27e-ffbe0133fe23', 'truesize': '134217728', 'type': 'PREALLOCATED', 'lease': {'owners': [], 'version': None}}} from=::1,56906 (api:52)
2017-10-11 15:11:57,126+0200 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Volume.getInfo succeeded in 0.05 seconds (__init__:539)
2017-10-11 15:11:57,758+0200 INFO  (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:56908 (protocoldetector:72)
2017-10-11 15:11:57,764+0200 INFO  (Reactor thread) [ProtocolDetector.Detector] Detected protocol stomp from ::1:56908 (protocoldetector:127)
2017-10-11 15:11:57,765+0200 INFO  (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompreactor:103)
2017-10-11 15:11:57,765+0200 INFO  (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompreactor:130)
2017-10-11 15:11:57,930+0200 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.getHardwareInfo succeeded in 0.01 seconds (__init__:539)
2017-10-11 15:11:57,933+0200 INFO  (jsonrpc/1) [vdsm.api] START repoStats(options=None) from=::1,56908 (api:46)
2017-10-11 15:11:57,933+0200 INFO  (jsonrpc/1) [vdsm.api] FINISH repoStats return={u'b50c1f5c-aa2c-4a53-9f89-83517fa70d3b': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000138003', 'lastCheck': '4.9', 'valid': True}, u'b6730d64-2cf8-42a3-8f08-24b8cc2c0cd8': {'code': 200, 'actual': True, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '9.7', 'valid': False}, u'c7d32f1b-f32c-4a21-995b-2e3b415aae4e': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000618471', 'lastCheck': '1.4', 'valid': True}, u'05ab1dd9-24bc-409b-80b8-6c5b00c52aa9': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.00027591', 'lastCheck': '5.2', 'valid': True}} from=::1,56908 (api:52)
2017-10-11 15:11:57,998+0200 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.getStats succeeded in 0.06 seconds (__init__:539)
2017-10-11 15:11:58,253+0200 ERROR (monitor/b6730d6) [storage.Monitor] Setting up monitor for b6730d64-2cf8-42a3-8f08-24b8cc2c0cd8 failed (monitor:329)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 326, in _setupLoop
    self._setupMonitor()
  File "/usr/share/vdsm/storage/monitor.py", line 349, in _setupMonitor
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 401, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/share/vdsm/storage/monitor.py", line 367, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 112, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 53, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 136, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 153, in _findDomain
    return findMethod(sdUUID)
  File "/usr/share/vdsm/storage/nfsSD.py", line 126, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/fileSD.py", line 359, in __init__
    manifest = self.manifestClass(domainPath)
  File "/usr/share/vdsm/storage/fileSD.py", line 171, in __init__
    sd.StorageDomainManifest.__init__(self, sdUUID, domaindir, metadata)
  File "/usr/share/vdsm/storage/sd.py", line 332, in __init__
    self._domainLock = self._makeDomainLock()
  File "/usr/share/vdsm/storage/sd.py", line 526, in _makeDomainLock
    domVersion = self.getVersion()
  File "/usr/share/vdsm/storage/sd.py", line 403, in getVersion
    return self.getMetaParam(DMDK_VERSION)
  File "/usr/share/vdsm/storage/sd.py", line 400, in getMetaParam
    return self._metadata[key]
  File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 91, in __getitem__
    return dec(self._dict[key])
  File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 203, in __getitem__
    raise KeyError(key)
KeyError: 'VERSION'
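
The repoStats line in the log above already hints at which domain is unhealthy: b6730d64-2cf8-42a3-8f08-24b8cc2c0cd8 reports code 200, valid False and version -1, and the traceback reaches it through nfsSD.py and fileSD.py, which suggests a file-based (NFS) domain rather than one of the FC LUNs. A minimal sketch for picking such domains out of a repoStats result, using a trimmed copy of the values logged above (the helper name invalid_domains is hypothetical):

```python
# Trimmed copy of the repoStats return value from the Node 02 log above.
REPO_STATS = {
    "b50c1f5c-aa2c-4a53-9f89-83517fa70d3b":
        {"code": 0, "version": 4, "acquired": True, "valid": True},
    "b6730d64-2cf8-42a3-8f08-24b8cc2c0cd8":
        {"code": 200, "version": -1, "acquired": False, "valid": False},
    "c7d32f1b-f32c-4a21-995b-2e3b415aae4e":
        {"code": 0, "version": 0, "acquired": True, "valid": True},
    "05ab1dd9-24bc-409b-80b8-6c5b00c52aa9":
        {"code": 0, "version": 4, "acquired": True, "valid": True},
}


def invalid_domains(repo_stats):
    """Return the sorted UUIDs of domains with a non-zero code or valid=False."""
    return sorted(
        sd_uuid
        for sd_uuid, stats in repo_stats.items()
        if stats.get("code") != 0 or not stats.get("valid")
    )
```

With the values above, only b6730d64-2cf8-42a3-8f08-24b8cc2c0cd8 is returned, which is the same domain whose monitor fails with KeyError: 'VERSION'.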


Can you help me? 

Restarting the hosted engine doesn't solve the problem.

Thank you


P.S. Related question: are the tasks above the same as (or related to) the ones reported by the engine in this screenshot? https://snag.gy/XDmoUt.jpg How can I also clear these tasks from the engine?

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users




--
Adam Litke