I'm still investigating what the problem is. Sorry for the delay, we had a
public holiday yesterday.
Andrej
On Thu, 9 May 2019 at 11:20, Dafna Ron <dron(a)redhat.com> wrote:
Hi,
I have not heard back on this issue, and ovirt-engine has been broken for
the past 3 days.
As this does not seem to be a simple debug and fix, I suggest reverting the
patch and investigating later.
thanks,
Dafna
On Wed, May 8, 2019 at 9:42 AM Dafna Ron <dron(a)redhat.com> wrote:
> Any news?
>
> Thanks,
> Dafna
>
>
> On Tue, May 7, 2019 at 4:57 PM Dafna Ron <dron(a)redhat.com> wrote:
>
>> Thanks for the quick reply and investigation.
>> Please update me if I can help any further, and let me know if you find the
>> cause and have a patch.
>> Note that the ovirt-engine project is broken, and if we cannot find the cause
>> relatively fast we should consider reverting the patch to allow a new
>> package to be built in CQ with the other changes that were submitted.
>>
>> Thanks,
>> Dafna
>>
>>
>> On Tue, May 7, 2019 at 4:42 PM Andrej Krejcir <akrejcir(a)redhat.com>
>> wrote:
>>
>>> After running a few OSTs manually, it seems that the patch is the
>>> cause. Investigating...
>>>
>>> On Tue, 7 May 2019 at 14:58, Andrej Krejcir <akrejcir(a)redhat.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> The issue is probably not caused by the patch.
>>>>
>>>> This log line means that the VM does not exist in the DB:
>>>>
>>>> 2019-05-07 06:02:04,215-04 WARN [org.ovirt.engine.core.bll.MigrateMultipleVmsCommand]
>>>> (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [33485140] Validation of action
>>>> 'MigrateMultipleVms' failed for user admin@internal-authz. Reasons: ACTION_TYPE_FAILED_VMS_NOT_FOUND
>>>>
>>>> I will investigate further why the VM is missing.
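>>>>
>>>> A quick way to double-check this, assuming the default 'engine' PostgreSQL database
>>>> and the vm_static table (both assumptions, adjust to your setup), is to query for the
>>>> VM id that appears in the logs below:
>>>>
>>>>   # Run on the engine machine; an empty result would confirm the VM row is gone from the DB.
>>>>   sudo -u postgres psql engine -c \
>>>>     "SELECT vm_guid, vm_name FROM vm_static WHERE vm_guid = 'dfbd75e2-a9cb-4fca-8788-a16954db4abf';"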
>>>>
>>>> On Tue, 7 May 2019 at 14:07, Dafna Ron <dron(a)redhat.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We are failing the upgrade_hosts test on upgrade-from-release-suite-master.
>>>>> From the logs I can see that we are calling migrate VM when we have only one
>>>>> host, and the VM seems to have been shut down before the maintenance call is
>>>>> issued.
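>>>>>
>>>>> To see that ordering at a glance, one can grep the relevant commands out of the
>>>>> engine log from the linked artifacts (the exact engine.log path under the artifacts
>>>>> directory is an assumption here):
>>>>>
>>>>>   # Destroy/maintenance/migration entries in timestamp order; the DestroyVmVDSCommand
>>>>>   # entries appear before the MigrateMultipleVms validation fails.
>>>>>   grep -E 'DestroyVmVDSCommand|MaintenanceVdsCommand|MigrateMultipleVms' engine.log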
>>>>>
>>>>> Can you please look into this?
>>>>>
>>>>> The suspected patch reported as root cause by CQ is:
>>>>>
>>>>>
>>>>> https://gerrit.ovirt.org/#/c/98920/ - core: Add MigrateMultipleVms command and use it for host maintenance
>>>>>
>>>>>
>>>>> Logs can be found here:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/14021/artif...
>>>>>
>>>>>
>>>>> I can see the issue is VM migration when putting the host into maintenance:
>>>>>
>>>>>
>>>>> 2019-05-07 06:02:04,170-04 INFO [org.ovirt.engine.core.bll.MaintenanceVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [05592db2-f859-487b-b779-4b32eec5bab3] Running command: MaintenanceVdsCommand internal: true. Entities affected : ID: 38e1379b-c3b6-4a2e-91df-d1f346e414a9 Type: VDS
>>>>> 2019-05-07 06:02:04,215-04 WARN [org.ovirt.engine.core.bll.MigrateMultipleVmsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [33485140] Validation of action 'MigrateMultipleVms' failed for user admin@internal-authz. Reasons: ACTION_TYPE_FAILED_VMS_NOT_FOUND
>>>>> 2019-05-07 06:02:04,221-04 ERROR [org.ovirt.engine.core.bll.MaintenanceVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [33485140] Failed to migrate one or more VMs.
>>>>> 2019-05-07 06:02:04,227-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [33485140] EVENT_ID: VDS_MAINTENANCE_FAILED(17), Failed to switch Host lago-upgrade-from-release-suite-master-host-0 to Maintenance mode.
>>>>> 2019-05-07 06:02:04,239-04 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] Lock Acquired to object 'EngineLock:{exclusiveLocks='[38e1379b-c3b6-4a2e-91df-d1f346e414a9=VDS]', sharedLocks=''}'
>>>>> 2019-05-07 06:02:04,242-04 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] Running command: ActivateVdsCommand internal: true. Entities affected : ID: 38e1379b-c3b6-4a2e-91df-d1f346e414a9 Type: VDSAction group MANIPULATE_HOST with role type ADMIN
>>>>> 2019-05-07 06:02:04,243-04 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] Before acquiring lock in order to prevent monitoring for host 'lago-upgrade-from-release-suite-master-host-0' from data-center 'test-dc'
>>>>> 2019-05-07 06:02:04,243-04 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] Lock acquired, from now a monitoring of host will be skipped for host 'lago-upgrade-from-release-suite-master-host-0' from data-center 'test-dc'
>>>>> 2019-05-07 06:02:04,252-04 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] START, SetVdsStatusVDSCommand(HostName = lago-upgrade-from-release-suite-master-host-0, SetVdsStatusVDSCommandParameters:{hostId='38e1379b-c3b6-4a2e-91df-d1f346e414a9', status='Unassigned', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 2c8aa211
>>>>> 2019-05-07 06:02:04,256-04 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] FINISH, SetVdsStatusVDSCommand, return: , log id: 2c8aa211
>>>>> 2019-05-07 06:02:04,261-04 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] Activate host finished. Lock released. Monitoring can run now for host 'lago-upgrade-from-release-suite-master-host-0' from data-center 'test-dc'
>>>>> 2019-05-07 06:02:04,265-04 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] EVENT_ID: VDS_ACTIVATE(16), Activation of host lago-upgrade-from-release-suite-master-host-0 initiated by admin@internal-authz.
>>>>> 2019-05-07 06:02:04,266-04 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [70840477] Lock freed to object 'EngineLock:{exclusiveLocks='[38e1379b-c3b6-4a2e-91df-d1f346e414a9=VDS]', sharedLocks=''}'
>>>>> 2019-05-07 06:02:04,484-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.HostUpgradeCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [05592db2-f859-487b-b779-4b32eec5bab3] Host 'lago-upgrade-from-release-suite-master-host-0' failed to move to maintenance mode. Upgrade process is terminated.
>>>>>
>>>>> I can see there was only one VM running:
>>>>>
>>>>>
>>>>> drwxrwxr-x. 2 dron dron 1024 May 7 11:49 qemu
>>>>> [dron@dron post-004_basic_sanity.py]$ ls -l lago-upgrade-from-release-suite-master-host-0/_var_log/libvirt/qemu/
>>>>> total 6
>>>>> -rw-rw-r--. 1 dron dron 4466 May 7 10:12 vm-with-iface.log
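>>>>>
>>>>> To see when and why it went down, the per-VM libvirt log can be checked (libvirt
>>>>> usually records a 'shutting down, reason=...' line there; that wording is an
>>>>> assumption about the log format, not taken from this run):
>>>>>
>>>>>   grep -i 'shutting down' \
>>>>>     lago-upgrade-from-release-suite-master-host-0/_var_log/libvirt/qemu/vm-with-iface.log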
>>>>>
>>>>> and I can see that there was an attempt to terminate it with an error that
>>>>> it does not exist:
>>>>>
>>>>>
>>>>>
>>>>> stroyVmVDSCommandParameters:{hostId='38e1379b-c3b6-4a2e-91df-d1f346e414a9', vmId='dfbd75e2-a9cb-4fca-8788-a16954db4abf', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'}), log id: 24278e9b
>>>>> 2019-05-07 06:01:41,082-04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-1) [105f7555-517b-4bf9-b86e-6eb42375de20] START, DestroyVDSCommand(HostName = lago-upgrade-from-release-suite-master-host-0, DestroyVmVDSCommandParameters:{hostId='38e1379b-c3b6-4a2e-91df-d1f346e414a9', vmId='dfbd75e2-a9cb-4fca-8788-a16954db4abf', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'}), log id: 78bba2f8
>>>>> 2019-05-07 06:01:42,090-04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-1) [105f7555-517b-4bf9-b86e-6eb42375de20] FINISH, DestroyVDSCommand, return: , log id: 78bba2f8
>>>>> 2019-05-07 06:01:42,090-04 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (default task-1) [105f7555-517b-4bf9-b86e-6eb42375de20] FINISH, DestroyVmVDSCommand, return: , log id: 24278e9b
>>>>> 2019-05-07 06:01:42,094-04 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-4) [] VM 'dfbd75e2-a9cb-4fca-8788-a16954db4abf' was reported as Down on VDS '38e1379b-c3b6-4a2e-91df-d1f346e414a9'(lago-upgrade-from-release-suite-master-host-0)
>>>>> 2019-05-07 06:01:42,096-04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-4) [] START, DestroyVDSCommand(HostName = lago-upgrade-from-release-suite-master-host-0, DestroyVmVDSCommandParameters:{hostId='38e1379b-c3b6-4a2e-91df-d1f346e414a9', vmId='dfbd75e2-a9cb-4fca-8788-a16954db4abf', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 1dbd31eb
>>>>> 2019-05-07 06:01:42,114-04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-4) [] Failed to destroy VM 'dfbd75e2-a9cb-4fca-8788-a16954db4abf' because VM does not exist, ignoring
>>>>>
>>>>>
>>>>>
>>>>>