Milan,
log [1], VM b3962e5c-08e3-444e-910e-ea675fa1a5c7
migration away finished at 2017-12-05 06:26:24,515-0500
incoming migration of the same VM back, at 2017-12-05 06:26:46,614-0500
It seems to me the migration away didn’t properly clean up the VM. Milan,
can you check whether the logs match that? Perhaps an orphaned libvirt XML was left behind?
It seems that, for unknown reasons, Engine didn’t call Destroy on the
source after the first migration. So while the VM was no longer running
there, it was still present in Vdsm (in Down status), and thus the
subsequent create call was rejected by the API.
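To illustrate the rejection, here is a minimal sketch of the container check (a hypothetical simplification; vdsm’s actual API.create does considerably more, and the class and method names here are assumptions, not the real code):

```python
class VMExists(Exception):
    """Raised when a create call targets a VM id already tracked by vdsm."""


class VmContainer:
    """Toy model of vdsm's per-host VM registry."""

    def __init__(self):
        self._vms = {}  # vmId -> status ('WaitForLaunch', 'Up', 'Down', ...)

    def create(self, vm_id, status='WaitForLaunch'):
        # A VM left behind in Down status still occupies its slot, so an
        # incoming migrationCreate for the same id is rejected up front.
        if vm_id in self._vms:
            raise VMExists('Virtual machine already exists')
        self._vms[vm_id] = status

    def destroy(self, vm_id):
        # The cleanup that Engine is expected to trigger on the source
        # host once the migration away has completed.
        self._vms.pop(vm_id, None)
```

In this run, the first migration left the id in the source container in Down status, so the second create failed with VMExists exactly as shown in the dst log quoted below.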
There are weird messages in engine.log during the first migration, such
as the following ones; I don’t know whether they are harmless or not:
2017-12-05 06:26:19,799-05 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
(EE-ManagedThreadFactory-engineScheduled-Thread-89) [] VM
'b3962e5c-08e3-444e-910e-ea675fa1a5c7'(vm0) was unexpectedly detected as
'MigratingTo' on VDS
'56bf5bdd-8a95-4b7a-9c72-7206d6b59e38'(lago-basic-suite-master-host-1) (expected
on '104bd4da-24bb-4368-a5ca-21a465aca70e')
2017-12-05 06:26:19,799-05 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
(EE-ManagedThreadFactory-engineScheduled-Thread-89) [] VM
'b3962e5c-08e3-444e-910e-ea675fa1a5c7' is migrating to VDS
'56bf5bdd-8a95-4b7a-9c72-7206d6b59e38'(lago-basic-suite-master-host-1) ignoring it
in the refresh until migration is done
The migration completion event was successfully received by Engine:
2017-12-05 06:26:24,542-05 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(ForkJoinPool-1-worker-2) [] EVENT_ID: VM_MIGRATION_DONE(63), Migration completed (VM:
vm0, Source: lago-basic-suite-master-host-0, Destination: lago-basic-suite-master-host-1,
Duration: 6 seconds, Total: 6 seconds, Actual downtime: 73ms)
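For context, the expected hand-off after that event can be sketched like this (a hypothetical simplification in Python; the real Engine logic is Java and far more involved, and the function and parameter names are assumptions):

```python
def handle_migration_done(vm_id, source, destination, send_destroy):
    """Sketch of the post-migration hand-off on the Engine side.

    After VM_MIGRATION_DONE, Engine records the VM as running on the
    destination and must tell the source vdsm to drop its leftover copy;
    skipping the Destroy leaves the id occupied on the source, which is
    what rejected the migration back in this run.
    """
    placements = {vm_id: destination}
    send_destroy(source, vm_id)  # the step that apparently never happened here
    return placements
```

The sketch only shows the ordering constraint: the Destroy on the source must follow the completion event, or the next create of the same id on that host will collide.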
Otherwise this would need to be reproduced in CI with some more logs.
[1]
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/artifa...
> On 5 Dec 2017, at 13:26, Dafna Ron <dron(a)redhat.com> wrote:
>
> Hi,
> We had a failure for test 006_migrations.migrate_vm on master.
>
> There was a libvirt disruption in the migration src and after that vdsm reported the
migration as failed because the vm already exists, which makes me suspect a split brain
case.
> The patch reported has nothing to do with this issue and I think we simply stumbled
on a race condition which can cause a split brain.
> Please note that there are several metrics-related issues reported in the vdsm logs as
well.
>
> Link and headline of suspected patches:
>
> Not related
>
> Link to Job:
>
>
>
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/
>
> Link to all logs:
>
>
>
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/artifa...
> (Relevant) error snippet from the log:
> <error>
>
> Engine:
> 2017-12-05 06:26:48,546-05 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-385) [] EVENT_ID:
VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed (VM: vm0, Source: lago-
> basic-suite-master-host-1, Destination: lago-basic-suite-master-host-0).
>
>
> dst:
>
> 2017-12-05 06:26:46,615-0500 WARN (jsonrpc/6) [vds] vm
b3962e5c-08e3-444e-910e-ea675fa1a5c7 already exists (API:179)
> 2017-12-05 06:26:46,615-0500 ERROR (jsonrpc/6) [api] FINISH create error=Virtual
machine already exists (api:124)
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 117, in
method
> ret = func(*args, **kwargs)
> File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 180, in create
> raise exception.VMExists()
> VMExists: Virtual machine already exists
> 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [api.virt] FINISH create
return={'status': {'message': 'Virtual machine already exists',
'code': 4}} from=::ffff:192.168.201.3,50394 (api:52)
> 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [api.virt] FINISH migrationCreate
return={'status': {'message': 'Virtual machine already exists',
'code': 4}} from=::ffff:192.168.201.3,50394 (api:52)
> 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call
VM.migrationCreate failed (error 4) in 0.03 seconds (__init__:573)
> 2017-12-05 06:26:46,624-0500 INFO (jsonrpc/2) [api.virt] START
destroy(gracefulAttempts=1) from=::ffff:192.168.201.3,50394 (api:46)
> 2017-12-05 06:26:46,624-0500 INFO (jsonrpc/2) [virt.vm]
(vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Release VM resources (vm:4967)
> 2017-12-05 06:26:46,625-0500 WARN (jsonrpc/2) [virt.vm]
(vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') trying to set state to Powering down
when already Down (vm:575)
> 2017-12-05 06:26:46,625-0500 INFO (jsonrpc/2) [virt.vm]
(vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Stopping connection
(guestagent:435)
> 2017-12-05 06:26:46,625-0500 INFO (jsonrpc/2) [virt.vm]
(vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') _destroyVmGraceful attempt #0
(vm:5004)
> 2017-12-05 06:26:46,626-0500 WARN (jsonrpc/2) [virt.vm]
(vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') VM
'b3962e5c-08e3-444e-910e-ea675fa1a5c7' couldn't be destroyed in libvirt:
Requested operation is not valid: domain is not running (vm
> :5025)
> 2017-12-05 06:26:46,627-0500 INFO (jsonrpc/2) [vdsm.api] START
teardownImage(sdUUID='952bb427-b88c-4fbe-99ef-49970d3aaf70',
spUUID='9dcfeaaf-96b7-4e26-a327-5570e0e39261',
imgUUID='e6eadbae-ec7a-48f4-8832-64a622a12bef', volUUID=None) from
> =::ffff:192.168.201.3,50394, task_id=2da93725-5533-4354-a369-751eb44f9ef2 (api:46)
>
> src:
>
> 2017-12-05 06:26:46,623-0500 ERROR (migsrc/b3962e5c) [virt.vm]
(vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') migration destination error: Virtual
machine already exists (migration:290)
>
> disruption on dst:
>
> 2017-12-05 06:20:04,662-0500 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call
VM.shutdown succeeded in 0.00 seconds (__init__:573)
> 2017-12-05 06:20:04,676-0500 ERROR (Thread-1) [root] Shutdown by QEMU Guest Agent
failed (vm:5097)
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5088, in
qemuGuestAgentShutdown
> self._dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)
> File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98,
in f
> ret = attr(*args, **kwargs)
> File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line
126, in wrapper
> ret = f(*args, **kwargs)
> File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 512, in
wrapper
> return func(inst, *args, **kwargs)
> File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2403, in
shutdownFlags
> if ret == -1: raise libvirtError ('virDomainShutdownFlags() failed',
dom=self)
> libvirtError: Guest agent is not responding: QEMU guest agent is not connected
> 2017-12-05 06:20:04,697-0500 INFO (libvirt/events) [virt.vm]
(vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') block threshold 1 exceeded on
'vda'
(/rhev/data-center/mnt/blockSD/952bb427-b88c-4fbe-99ef-49970d3aaf70/images/e6eadbae-ec7a-48f4-
> 8832-64a622a12bef/1f063c73-f5cd-44e8-a84f-7810857f82df) (drivemonitor:162)
> 2017-12-05 06:20:04,698-0500 INFO (libvirt/events) [virt.vm]
(vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') drive 'vda' threshold
exceeded (storage:872)
> 2017-12-05 06:20:05,889-0500 INFO (jsonrpc/7) [api.host] START getAllVmStats()
from=::1,41118 (api:46)
> 2017-12-05 06:20:05,891-0500 INFO (jsonrpc/7) [api.host] FINISH getAllVmStats
return={'status': {'message': 'Done', 'code': 0},
'statsList': (suppressed)} from=::1,41118 (api:52)
> 2017-12-05 06:20:05,892-0500 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call
Host.getAllVmStats succeeded in 0.00 seconds (__init__:573)
> 2017-12-05 06:20:07,466-0500 INFO (libvirt/events) [virt.vm]
(vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') underlying process disconnected
(vm:1024)
> 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [virt.vm]
(vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Release VM resources (vm:4967)
> 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [virt.vm]
(vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Stopping connection
(guestagent:435)
> 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [vdsm.api] START
teardownImage(sdUUID='952bb427-b88c-4fbe-99ef-49970d3aaf70',
spUUID='9dcfeaaf-96b7-4e26-a327-5570e0e39261',
imgUUID='e6eadbae-ec7a-48f4-8832-64a622a12bef', volUUID=None)
> from=internal, task_id=9efedf46-d3be-4e41-b7f9-a074ed6344f6 (api:46)
>
>
> </error>
>
>
>
>
> _______________________________________________
> Devel mailing list
> Devel(a)ovirt.org
>
http://lists.ovirt.org/mailman/listinfo/devel