Hi,

It seems VM pools are completely broken since our upgrade to 4.3. Is anybody else also experiencing this issue?

Only a single instance from a pool can be used. Afterwards the pool becomes unusable due to a lock not being released. Once ovirt-engine is restarted, another (single) VM from a pool can be used.

I've added my findings to bug 1462236, but I'm no longer sure the issue is the same as the one initially reported.

When the first VM of a pool is started:

2019-05-14 13:26:46,058+02 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{vmId='d8a99676-d520-425e-9974-1b1efe6da8a5'}), log id: 2fb4f7f5
2019-05-14 13:26:46,058+02 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 2fb4f7f5
2019-05-14 13:26:46,208+02 INFO  [org.ovirt.engine.core.bll.VmPoolHandler] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Lock Acquired to object 'EngineLock:{exclusiveLocks='[d8a99676-d520-425e-9974-1b1efe6da8a5=VM]', sharedLocks=''}'

-> it has acquired a lock (lock1)

2019-05-14 13:26:46,247+02 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Lock Acquired to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'

-> it has acquired another lock (lock2)

2019-05-14 13:26:46,352+02 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Running command: AttachUserToVmFromPoolAndRunCommand internal: false. Entities affected :  ID: 4c622213-e5f4-4032-8639-643174b698cc Type: VmPoolAction group VM_POOL_BASIC_OPERATIONS with role type USER
2019-05-14 13:26:46,393+02 INFO  [org.ovirt.engine.core.bll.AddPermissionCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Running command: AddPermissionCommand internal: true. Entities affected :  ID: d8a99676-d520-425e-9974-1b1efe6da8a5 Type: VMAction group MANIPULATE_PERMISSIONS with role type USER
2019-05-14 13:26:46,433+02 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Succeeded giving user 'a5bed59c-d2fe-4fe4-bff7-52efe089ebd6' permission to Vm 'd8a99676-d520-425e-9974-1b1efe6da8a5'
2019-05-14 13:26:46,608+02 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{vmId='d8a99676-d520-425e-9974-1b1efe6da8a5'}), log id: 67acc561
2019-05-14 13:26:46,608+02 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 67acc561
2019-05-14 13:26:46,719+02 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Running command:RunVmCommand internal: true. Entities affected :  ID: d8a99676-d520-425e-9974-1b1efe6da8a5 Type: VMAction group RUN_VM with role type USER
2019-05-14 13:26:46,791+02 INFO  [org.ovirt.engine.core.vdsbroker.UpdateVmDynamicDataVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START, UpdateVmDynamicDataVDSCommand( UpdateVmDynamicDataVDSCommandParameters:{hostId='null', vmId='d8a99676-d520-425e-9974-1b1efe6da8a5', vmDynamic='org.ovirt.engine.core.common.businessentities.VmDynamic@6db8c94d'}), log id: 2c110e4
2019-05-14 13:26:46,795+02 INFO  [org.ovirt.engine.core.vdsbroker.UpdateVmDynamicDataVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, UpdateVmDynamicDataVDSCommand, return: , log id: 2c110e4
2019-05-14 13:26:46,804+02 INFO  [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START,CreateVDSCommand( CreateVDSCommandParameters:{hostId='eec7ec2b-cae1-4bb9-b933-4dff47a70bdb', vmId='d8a99676-d520-425e-9974-1b1efe6da8a5', vm='VM [stud-c7-1]'}), log id: 71d599f2
2019-05-14 13:26:46,809+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] START, CreateBrokerVDSCommand(HostName = studvirt1, CreateVDSCommandParameters:{hostId='eec7ec2b-cae1-4bb9-b933-4dff47a70bdb', vmId='d8a99676-d520-425e-9974-1b1efe6da8a5', vm='VM [stud-c7-1]'}), log id: 3aa6b5ff
2019-05-14 13:26:46,836+02 INFO  [org.ovirt.engine.core.vdsbroker.builder.vminfo.VmInfoBuildUtils] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Kernel FIPS - Guid: eec7ec2b-cae1-4bb9-b933-4dff47a70bdb fips: false
2019-05-14 13:26:46,903+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] VM <?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
[domain xml stripped]
2019-05-14 13:26:46,928+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, CreateBrokerVDSCommand, return: , log id: 3aa6b5ff
2019-05-14 13:26:46,932+02 INFO  [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] FINISH, CreateVDSCommand, return: WaitForLaunch, log id: 71d599f2
2019-05-14 13:26:46,932+02 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Lock freed to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'

-> it has released lock2

2019-05-14 13:26:47,004+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM stud-c7-1 on Host studvirt1
2019-05-14 13:26:47,094+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] EVENT_ID: USER_ATTACH_USER_TO_VM_FROM_POOL(316), Attaching User u0045469 to VM stud-c7-1 in VM Pool stud-c7-? was initiated by u0045469@esat.kuleuven.be-authz.
2019-05-14 13:26:47,098+02 WARN  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Trying to release exclusive lock which does not exist, lock key: 'a5bed59c-d2fe-4fe4-bff7-52efe089ebd6USER_VM_POOL'

-> it's trying to release the same lock2 as above and failing

2019-05-14 13:26:47,098+02 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-6) [e3c5745c-e593-4aed-ba67-b173808140e8] Lock freed to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'

-> now it's indicating that it released/freed the lock (lock2)

2019-05-14 13:26:48,518+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-3) [e3c5745c-e593-4aed-ba67-b173808140e8] Command 'AttachUserToVmFromPoolAndRun' id: '0148c91d-b053-4dc9-960c-f10bf0d3f343' child commands '[0470802d-09fa-4579-b95b-3d9603feff7b, 47dbfc58-3bae-4229-96eb-d1fc08911237]' executions were completed, status 'SUCCEEDED'
2019-05-14 13:26:49,584+02 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [e3c5745c-e593-4aed-ba67-b173808140e8] Ending command 'org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand' successfully.
2019-05-14 13:26:49,650+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [e3c5745c-e593-4aed-ba67-b173808140e8] EVENT_ID: USER_ATTACH_USER_TO_VM_FROM_POOL_FINISHED_SUCCESS(318), User u0045469 successfully attached to VM stud-c7-1 in VM Pool stud-c7-?.
2019-05-14 13:26:50,584+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_CONSOLE_DISCONNECTED(168), User <UNKNOWN> got disconnected from VM stud-c7-1.
2019-05-14 13:26:50,585+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-2) [] VM 'd8a99676-d520-425e-9974-1b1efe6da8a5'(stud-c7-1) moved from 'WaitForLaunch' --> 'PoweringUp'


The first lock (on the pool itself) is never released. It seems this lock should also be released? Maybe it's this lock that should be released the second time (instead of the second lock twice)?



When we start to launch another instance of the pool it fails:

2019-05-14 13:49:32,656+02 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{vmId='d8a99676-d520-425e-9974-1b1efe6da8a5'}), log id: 7db2f4fc
2019-05-14 13:49:32,656+02 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 7db2f4fc
2019-05-14 13:49:32,688+02 INFO  [org.ovirt.engine.core.bll.VmPoolHandler] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[d8a99676-d520-425e-9974-1b1efe6da8a5=VM]', sharedLocks=''}'
2019-05-14 13:49:32,700+02 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] Lock Acquired to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'
2019-05-14 13:49:32,700+02 WARN  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] Validation of action 'AttachUserToVmFromPoolAndRun' failed for user u0045469@esat.kuleuven.be-authz. Reasons: VAR__ACTION__ALLOCATE_AND_RUN,VAR__TYPE__VM_FROM_VM_POOL,ACTION_TYPE_FAILED_NO_AVAILABLE_POOL_VMS
2019-05-14 13:49:32,700+02 INFO  [org.ovirt.engine.core.bll.AttachUserToVmFromPoolAndRunCommand] (default task-11) [55cc0796-4f53-49cd-8739-3b7e7dd2d95b] Lock freed to object 'EngineLock:{exclusiveLocks='[a5bed59c-d2fe-4fe4-bff7-52efe089ebd6=USER_VM_POOL]', sharedLocks=''}'
2019-05-14 13:49:32,706+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-11) [] Operation Failed: [Cannot allocate and run VM from VM-Pool. There are no available VMs in the VM-Pool.]


Regards,
Rik


-- 
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440  - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>