Thanks, Andrej.
I will follow the patch and send an update.
Dafna
On Thu, May 9, 2019 at 11:23 AM Andrej Krejcir <akrejcir(a)redhat.com>
wrote:
> Hi,
>
> Ok, I have posted the reverting patch:
>
> https://gerrit.ovirt.org/#/c/99845/
>
> I'm still investigating what the problem is. Sorry for the delay; we had
> a public holiday yesterday.
>
>
> Andrej
>
> On Thu, 9 May 2019 at 11:20, Dafna Ron <dron(a)redhat.com> wrote:
>
>> Hi,
>>
>> I have not heard back on this issue and ovirt-engine has been broken for
>> the past 3 days.
>>
>> As this does not look like a simple debug-and-fix, I suggest reverting
>> the patch and investigating later.
>>
>> thanks,
>> Dafna
>>
>>
>>
>> On Wed, May 8, 2019 at 9:42 AM Dafna Ron <dron(a)redhat.com> wrote:
>>
>>> Any news?
>>>
>>> Thanks,
>>> Dafna
>>>
>>>
>>> On Tue, May 7, 2019 at 4:57 PM Dafna Ron <dron(a)redhat.com> wrote:
>>>
>>>> Thanks for the quick reply and investigation.
>>>> Please let me know if I can help any further, and update me once you
>>>> find the cause and have a patch.
>>>> Note that the ovirt-engine project is broken, and if we cannot find the
>>>> cause relatively fast we should consider reverting the patch to allow a
>>>> new package to be built in CQ with the other changes that were submitted.
>>>>
>>>> Thanks,
>>>> Dafna
>>>>
>>>>
>>>> On Tue, May 7, 2019 at 4:42 PM Andrej Krejcir <akrejcir(a)redhat.com>
>>>> wrote:
>>>>
>>>>> After running a few OSTs manually, it seems that the patch is the
>>>>> cause. Investigating...
>>>>>
>>>>> On Tue, 7 May 2019 at 14:58, Andrej Krejcir <akrejcir(a)redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The issue is probably not caused by the patch.
>>>>>>
>>>>>> This log line means that the VM does not exist in the DB:
>>>>>>
>>>>>> 2019-05-07 06:02:04,215-04 WARN [org.ovirt.engine.core.bll.MigrateMultipleVmsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [33485140] Validation of action 'MigrateMultipleVms' failed for user admin@internal-authz. Reasons: ACTION_TYPE_FAILED_VMS_NOT_FOUND
>>>>>>
>>>>>> I will investigate further why the VM is missing.
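>>>>>>
>>>>>> Roughly, the validation does the following (a minimal sketch; the
>>>>>> DAO call and helper names here are assumptions, not the actual
>>>>>> engine code). It only illustrates where
>>>>>> ACTION_TYPE_FAILED_VMS_NOT_FOUND comes from:
>>>>>>
>>>>>>     // Sketch: fail validation when any requested VM id has no DB row.
>>>>>>     private ValidationResult validateVmsExist() {
>>>>>>         List<VM> vms = vmDao.getVmsByIds(getParameters().getVmIds());
>>>>>>         return vms.size() == getParameters().getVmIds().size()
>>>>>>                 ? ValidationResult.VALID
>>>>>>                 : new ValidationResult(EngineMessage.ACTION_TYPE_FAILED_VMS_NOT_FOUND);
>>>>>>     }
>>>>>>
>>>>>> So the VM row must already be gone from the DB by the time the
>>>>>> maintenance flow asks to migrate it.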
>>>>>>
>>>>>> On Tue, 7 May 2019 at 14:07, Dafna Ron <dron(a)redhat.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We are failing the test upgrade_hosts on
>>>>>>> upgrade-from-release-suite-master.
>>>>>>> From the logs I can see that we are calling migrate VM when we have
>>>>>>> only one host, and the VM seems to have been shut down before the
>>>>>>> maintenance call was issued.
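>>>>>>>
>>>>>>> In other words, the suspected sequence is something like this (a
>>>>>>> sketch only; the method and parameter names below are assumptions,
>>>>>>> not the actual engine code):
>>>>>>>
>>>>>>>     // 1. The test shuts the VM down and its DB row is removed.
>>>>>>>     // 2. Host maintenance then runs with a stale list of VM ids:
>>>>>>>     List<Guid> vmIds = getVmIdsRunningOnHost(hostId); // stale snapshot
>>>>>>>     runInternalAction(ActionType.MigrateMultipleVms,
>>>>>>>             new MigrateMultipleVmsParameters(clusterId, vmIds));
>>>>>>>     // 3. MigrateMultipleVmsCommand validation no longer finds the VM,
>>>>>>>     //    fails with ACTION_TYPE_FAILED_VMS_NOT_FOUND, and the host
>>>>>>>     //    never reaches Maintenance.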
>>>>>>>
>>>>>>> Can you please look into this?
>>>>>>>
>>>>>>> The suspected patch reported as root cause by CQ is:
>>>>>>>
>>>>>>> https://gerrit.ovirt.org/#/c/98920/ - core: Add MigrateMultipleVms
>>>>>>> command and use it for host maintenance
>>>>>>>
>>>>>>>
>>>>>>> Logs can be found here:
>>>>>>>
>>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/14021/artif...
>>>>>>>
>>>>>>>
>>>>>>> I can see the issue is VM migration when putting the host into
>>>>>>> maintenance:
>>>>>>>
>>>>>>>
>>>>>>> 2019-05-07 06:02:04,170-04 INFO [org.ovirt.engine.core.bll.MaintenanceVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [05592db2-f859-487b-b779-4b32eec5bab3] Running command: MaintenanceVdsCommand internal: true. Entities affected : ID: 38e1379b-c3b6-4a2e-91df-d1f346e414a9 Type: VDS
>>>>>>> 2019-05-07 06:02:04,215-04 WARN [org.ovirt.engine.core.bll.MigrateMultipleVmsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [33485140] Validation of action 'MigrateMultipleVms' failed for user admin@internal-authz. Reasons: ACTION_TYPE_FAILED_VMS_NOT_FOUND
>>>>>>> 2019-05-07 06:02:04,221-04 ERROR [org.ovirt.engine.core.bll.MaintenanceVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [33485140] Failed to migrate one or more VMs.
>>>>>>> 2019-05-07 06:02:04,227-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [33485140] EVENT_ID: VDS_MAINTENANCE_FAILED(17), Failed to switch Host lago-upgrade-from-release-suite-master-host-0 to Maintenance mode.
>>>>>>> 2019-05-07 06:02:04,239-04 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] Lock Acquired to object 'EngineLock:{exclusiveLocks='[38e1379b-c3b6-4a2e-91df-d1f346e414a9=VDS]', sharedLocks=''}'
>>>>>>> 2019-05-07 06:02:04,242-04 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] Running command: ActivateVdsCommand internal: true. Entities affected : ID: 38e1379b-c3b6-4a2e-91df-d1f346e414a9 Type: VDSAction group MANIPULATE_HOST with role type ADMIN
>>>>>>> 2019-05-07 06:02:04,243-04 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] Before acquiring lock in order to prevent monitoring for host 'lago-upgrade-from-release-suite-master-host-0' from data-center 'test-dc'
>>>>>>> 2019-05-07 06:02:04,243-04 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] Lock acquired, from now a monitoring of host will be skipped for host 'lago-upgrade-from-release-suite-master-host-0' from data-center 'test-dc'
>>>>>>> 2019-05-07 06:02:04,252-04 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] START, SetVdsStatusVDSCommand(HostName = lago-upgrade-from-release-suite-master-host-0, SetVdsStatusVDSCommandParameters:{hostId='38e1379b-c3b6-4a2e-91df-d1f346e414a9', status='Unassigned', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 2c8aa211
>>>>>>> 2019-05-07 06:02:04,256-04 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] FINISH, SetVdsStatusVDSCommand, return: , log id: 2c8aa211
>>>>>>> 2019-05-07 06:02:04,261-04 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] Activate host finished. Lock released. Monitoring can run now for host 'lago-upgrade-from-release-suite-master-host-0' from data-center 'test-dc'
>>>>>>> 2019-05-07 06:02:04,265-04 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] EVENT_ID: VDS_ACTIVATE(16), Activation of host lago-upgrade-from-release-suite-master-host-0 initiated by admin@internal-authz.
>>>>>>> 2019-05-07 06:02:04,266-04 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] Lock freed to object 'EngineLock:{exclusiveLocks='[38e1379b-c3b6-4a2e-91df-d1f346e414a9=VDS]', sharedLocks=''}'
>>>>>>> 2019-05-07 06:02:04,484-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.HostUpgradeCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [05592db2-f859-487b-b779-4b32eec5bab3] Host 'lago-upgrade-from-release-suite-master-host-0' failed to move to maintenance mode. Upgrade process is terminated.
>>>>>>>
>>>>>>> I can see there was only one VM running:
>>>>>>>
>>>>>>>
>>>>>>> drwxrwxr-x. 2 dron dron 1024 May 7 11:49 qemu
>>>>>>> [dron@dron post-004_basic_sanity.py]$ ls -l lago-upgrade-from-release-suite-master-host-0/_var_log/libvirt/qemu/
>>>>>>> total 6
>>>>>>> -rw-rw-r--. 1 dron dron 4466 May 7 10:12 vm-with-iface.log
>>>>>>>
>>>>>>> and I can see that there was an attempt to terminate it with an
>>>>>>> error that it does not exist:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> stroyVmVDSCommandParameters:{hostId='38e1379b-c3b6-4a2e-91df-d1f346e414a9', vmId='dfbd75e2-a9cb-4fca-8788-a16954db4abf', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'}), log id: 24278e9b
>>>>>>> 2019-05-07 06:01:41,082-04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-1) [105f7555-517b-4bf9-b86e-6eb42375de20] START, DestroyVDSCommand(HostName = lago-upgrade-from-release-suite-master-host-0, DestroyVmVDSCommandParameters:{hostId='38e1379b-c3b6-4a2e-91df-d1f346e414a9', vmId='dfbd75e2-a9cb-4fca-8788-a16954db4abf', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'}), log id: 78bba2f8
>>>>>>> 2019-05-07 06:01:42,090-04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-1) [105f7555-517b-4bf9-b86e-6eb42375de20] FINISH, DestroyVDSCommand, return: , log id: 78bba2f8
>>>>>>> 2019-05-07 06:01:42,090-04 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (default task-1) [105f7555-517b-4bf9-b86e-6eb42375de20] FINISH, DestroyVmVDSCommand, return: , log id: 24278e9b
>>>>>>> 2019-05-07 06:01:42,094-04 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-4) [] VM 'dfbd75e2-a9cb-4fca-8788-a16954db4abf' was reported as Down on VDS '38e1379b-c3b6-4a2e-91df-d1f346e414a9'(lago-upgrade-from-release-suite-master-host-0)
>>>>>>> 2019-05-07 06:01:42,096-04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-4) [] START, DestroyVDSCommand(HostName = lago-upgrade-from-release-suite-master-host-0, DestroyVmVDSCommandParameters:{hostId='38e1379b-c3b6-4a2e-91df-d1f346e414a9', vmId='dfbd75e2-a9cb-4fca-8788-a16954db4abf', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 1dbd31eb
>>>>>>> 2019-05-07 06:01:42,114-04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-4) [] Failed to destroy VM 'dfbd75e2-a9cb-4fca-8788-a16954db4abf' because VM does not exist, ignoring
>>>>>>>