[ovirt-devel] [ OST Failure Report ] [ oVirt Master ] [ 05-12-2017 ] [ 006_migrations.migrate_vm ]

Arik Hadas ahadas at redhat.com
Mon Dec 11 16:54:05 UTC 2017


On Mon, Dec 11, 2017 at 6:44 PM, Milan Zamazal <mzamazal at redhat.com> wrote:

> Michal Skrivanek <michal.skrivanek at redhat.com> writes:
>
> > Milan,
> >
> > log [1], VM b3962e5c-08e3-444e-910e-ea675fa1a5c7
> > migration away finished at 2017-12-05 06:26:24,515-0500
> > incoming migration of the same VM back, at 2017-12-05 06:26:46,614-0500
> >
> > seems to me the migration away didn’t really properly clean up the VM.
> > Milane, can you check if the logs match? Perhaps an orphaned libvirt XML?
>
> It seems that, for unknown reasons, Engine didn’t call Destroy on the
> source after the first migration.  So while the VM was no longer running
> there, it was still present in Vdsm (in Down status) and thus the
> following create call was rejected in API.
>
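For context, the check that rejects that second create is on the Vdsm side.
A trimmed-down sketch of it, with made-up names rather than the actual Vdsm
source, looks roughly like this:

    class VMExists(Exception):
        """Stand-in for vdsm.common.exception.VMExists."""

    class HypotheticalVmContainer(object):
        def __init__(self):
            # vmId -> VM object; VMs left in Down status are still kept here
            self._vms = {}

        def create(self, vm_params):
            vm_id = vm_params['vmId']
            if vm_id in self._vms:
                # A Down VM that was never destroyed still counts, which is
                # what produces "Virtual machine already exists" below.
                raise VMExists()
            self._vms[vm_id] = vm_params
            return self._vms[vm_id]

        def destroy(self, vm_id):
            # The Destroy call from Engine is what removes the Down VM and
            # would have let the later migrationCreate succeed.
            self._vms.pop(vm_id, None)

So as long as Destroy is not sent, the stale Down VM keeps the vmId occupied
on that host and the migrationCreate back to it is rejected.
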
> There are weird messages in engine.log during the first migration such
> as the following ones, I don’t know whether they are harmless or not:
>
>   2017-12-05 06:26:19,799-05 INFO  [org.ovirt.engine.core.
> vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-89)
> [] VM 'b3962e5c-08e3-444e-910e-ea675fa1a5c7'(vm0) was unexpectedly
> detected as 'MigratingTo' on VDS '56bf5bdd-8a95-4b7a-9c72-
> 7206d6b59e38'(lago-basic-suite-master-host-1) (expected on
> '104bd4da-24bb-4368-a5ca-21a465aca70e')
>   2017-12-05 06:26:19,799-05 INFO  [org.ovirt.engine.core.
> vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-89)
> [] VM 'b3962e5c-08e3-444e-910e-ea675fa1a5c7' is migrating to VDS
> '56bf5bdd-8a95-4b7a-9c72-7206d6b59e38'(lago-basic-suite-master-host-1)
> ignoring it in the refresh until migration is done
>

These log messages are completely harmless - they only indicate that the VM's
run_on_vds still points to the source host and the hand-over has not been done
yet.
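
To make that concrete, the decision the analyzer takes during a refresh is
roughly the following (illustrative Python with made-up names; the real
VmAnalyzer is Java code in the engine):

    from collections import namedtuple

    Vm = namedtuple('Vm', 'id run_on_vds')

    def log(msg):
        print('INFO  [VmAnalyzer-like] ' + msg)

    def analyze_vm_report(vm, reporting_host, reported_status):
        """Handle one VM report coming in from one host."""
        if reporting_host == vm.run_on_vds:
            return 'process normally'
        # Reported by a host other than the one run_on_vds points at:
        log("VM '%s' was unexpectedly detected as '%s' on VDS '%s' "
            "(expected on '%s')"
            % (vm.id, reported_status, reporting_host, vm.run_on_vds))
        if reported_status == 'MigratingTo':
            # Hand-over has not happened yet, so the refresh just skips it.
            log("VM '%s' is migrating to VDS '%s', ignoring it in the "
                "refresh until migration is done" % (vm.id, reporting_host))
            return 'skip until hand-over'
        return 'needs further handling'

Once the migration completes and hand-over flips run_on_vds to the
destination, the reports are processed normally again.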


>
> The migration completion event was successfully obtained by Engine:
>
>   2017-12-05 06:26:24,542-05 INFO  [org.ovirt.engine.core.dal.
> dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) []
> EVENT_ID: VM_MIGRATION_DONE(63), Migration completed (VM: vm0, Source:
> lago-basic-suite-master-host-0, Destination: lago-basic-suite-master-host-1,
> Duration: 6 seconds, Total: 6 seconds, Actual downtime: 73ms)
>
> > Otherwise it would need reproduction in CI and some more logs….
> >
> > [1]
> > http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
> >
> >> On 5 Dec 2017, at 13:26, Dafna Ron <dron at redhat.com> wrote:
> >>
> >> Hi,
> >> We had a failure for test 006_migrations.migrate_vm on master.
> >>
> >> There was a libvirt disruption on the migration source, and after that
> vdsm reported the migration as failed because the vm already exists, which
> makes me suspect a split-brain case.
> >> The reported patch has nothing to do with this issue; I think we
> simply stumbled on a race condition which can cause a split brain.
> >> Please note that there are several metrics-related issues reported in
> the vdsm logs as well.
> >>
> >> Link and headline of suspected patches:
> >>
> >> Not related
> >>
> >> Link to Job:
> >>
> >>
> >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/
> >>
> >> Link to all logs:
> >>
> >>
> >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/
> >>
> >> (Relevant) error snippet from the log:
> >> <error>
> >>
> >> Engine:
> >> 2017-12-05 06:26:48,546-05 ERROR [org.ovirt.engine.core.dal.
> dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-385)
> [] EVENT_ID: VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed  (VM:
> vm0, Source: lago-
> >> basic-suite-master-host-1, Destination: lago-basic-suite-master-host-
> 0).
> >>
> >>
> >> dst:
> >>
> >> 2017-12-05 06:26:46,615-0500 WARN  (jsonrpc/6) [vds] vm
> b3962e5c-08e3-444e-910e-ea675fa1a5c7 already exists (API:179)
> >> 2017-12-05 06:26:46,615-0500 ERROR (jsonrpc/6) [api] FINISH create
> error=Virtual machine already exists (api:124)
> >> Traceback (most recent call last):
> >>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line
> 117, in method
> >>     ret = func(*args, **kwargs)
> >>   File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 180, in
> create
> >>     raise exception.VMExists()
> >> VMExists: Virtual machine already exists
> >> 2017-12-05 06:26:46,620-0500 INFO  (jsonrpc/6) [api.virt] FINISH create
> return={'status': {'message': 'Virtual machine already exists', 'code': 4}}
> from=::ffff:192.168.201.3,50394 (api:52)
> >> 2017-12-05 06:26:46,620-0500 INFO  (jsonrpc/6) [api.virt] FINISH
> migrationCreate return={'status': {'message': 'Virtual machine already
> exists', 'code': 4}} from=::ffff:192.168.201.3,50394 (api:52)
> >> 2017-12-05 06:26:46,620-0500 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer]
> RPC call VM.migrationCreate failed (error 4) in 0.03 seconds (__init__:573)
> >> 2017-12-05 06:26:46,624-0500 INFO  (jsonrpc/2) [api.virt] START
> destroy(gracefulAttempts=1) from=::ffff:192.168.201.3,50394 (api:46)
> >> 2017-12-05 06:26:46,624-0500 INFO  (jsonrpc/2) [virt.vm]
> (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Release VM resources
> (vm:4967)
> >> 2017-12-05 06:26:46,625-0500 WARN  (jsonrpc/2) [virt.vm]
> (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') trying to set state to
> Powering down when already Down (vm:575)
> >> 2017-12-05 06:26:46,625-0500 INFO  (jsonrpc/2) [virt.vm]
> (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Stopping connection
> (guestagent:435)
> >> 2017-12-05 06:26:46,625-0500 INFO  (jsonrpc/2) [virt.vm]
> (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') _destroyVmGraceful attempt
> #0 (vm:5004)
> >> 2017-12-05 06:26:46,626-0500 WARN  (jsonrpc/2) [virt.vm]
> (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') VM 'b3962e5c-08e3-444e-910e-ea675fa1a5c7'
> couldn't be destroyed in libvirt: Requested operation is not valid: domain
> is not running (vm
> >> :5025)
> >> 2017-12-05 06:26:46,627-0500 INFO  (jsonrpc/2) [vdsm.api] START
> teardownImage(sdUUID='952bb427-b88c-4fbe-99ef-49970d3aaf70',
> spUUID='9dcfeaaf-96b7-4e26-a327-5570e0e39261',
> imgUUID='e6eadbae-ec7a-48f4-8832-64a622a12bef', volUUID=None) from
> >> =::ffff:192.168.201.3,50394, task_id=2da93725-5533-4354-a369-751eb44f9ef2
> (api:46)
> >>
> >> src:
> >>
> >> 2017-12-05 06:26:46,623-0500 ERROR (migsrc/b3962e5c) [virt.vm]
> (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') migration destination
> error: Virtual machine already exists (migration:290)
> >>
> >> disruption on dst:
> >>
> >> 2017-12-05 06:20:04,662-0500 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer]
> RPC call VM.shutdown succeeded in 0.00 seconds (__init__:573)
> >> 2017-12-05 06:20:04,676-0500 ERROR (Thread-1) [root] Shutdown by QEMU
> Guest Agent failed (vm:5097)
> >> Traceback (most recent call last):
> >>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5088,
> in qemuGuestAgentShutdown
> >>     self._dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)
> >>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line
> 98, in f
> >>     ret = attr(*args, **kwargs)
> >>   File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py",
> line 126, in wrapper
> >>     ret = f(*args, **kwargs)
> >>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 512, in
> wrapper
> >>     return func(inst, *args, **kwargs)
> >>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2403, in
> shutdownFlags
> >>     if ret == -1: raise libvirtError ('virDomainShutdownFlags()
> failed', dom=self)
> >> libvirtError: Guest agent is not responding: QEMU guest agent is not
> connected
> >> 2017-12-05 06:20:04,697-0500 INFO  (libvirt/events) [virt.vm]
> (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') block threshold 1 exceeded
> on 'vda' (/rhev/data-center/mnt/blockSD/952bb427-b88c-4fbe-
> 99ef-49970d3aaf70/images/e6eadbae-ec7a-48f4-
> >> 8832-64a622a12bef/1f063c73-f5cd-44e8-a84f-7810857f82df)
> (drivemonitor:162)
> >> 2017-12-05 06:20:04,698-0500 INFO  (libvirt/events) [virt.vm]
> (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') drive 'vda' threshold
> exceeded (storage:872)
> >> 2017-12-05 06:20:05,889-0500 INFO  (jsonrpc/7) [api.host] START
> getAllVmStats() from=::1,41118 (api:46)
> >> 2017-12-05 06:20:05,891-0500 INFO  (jsonrpc/7) [api.host] FINISH
> getAllVmStats return={'status': {'message': 'Done', 'code': 0},
> 'statsList': (suppressed)} from=::1,41118 (api:52)
> >> 2017-12-05 06:20:05,892-0500 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer]
> RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:573)
> >> 2017-12-05 06:20:07,466-0500 INFO  (libvirt/events) [virt.vm]
> (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') underlying process
> disconnected (vm:1024)
> >> 2017-12-05 06:20:07,469-0500 INFO  (libvirt/events) [virt.vm]
> (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Release VM resources
> (vm:4967)
> >> 2017-12-05 06:20:07,469-0500 INFO  (libvirt/events) [virt.vm]
> (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Stopping connection
> (guestagent:435)
> >> 2017-12-05 06:20:07,469-0500 INFO  (libvirt/events) [vdsm.api] START
> teardownImage(sdUUID='952bb427-b88c-4fbe-99ef-49970d3aaf70',
> spUUID='9dcfeaaf-96b7-4e26-a327-5570e0e39261',
> imgUUID='e6eadbae-ec7a-48f4-8832-64a622a12bef', volUUID=None)
> >>  from=internal, task_id=9efedf46-d3be-4e41-b7f9-a074ed6344f6 (api:46)
> >>
> >>
> >> </error>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Devel mailing list
> >> Devel at ovirt.org
> >> http://lists.ovirt.org/mailman/listinfo/devel
> _______________________________________________
> Devel mailing list
> Devel at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel
>