<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Dec 11, 2017 at 6:44 PM, Milan Zamazal <span dir="ltr"><<a href="mailto:mzamazal@redhat.com" target="_blank">mzamazal@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">Michal Skrivanek <<a href="mailto:michal.skrivanek@redhat.com">michal.skrivanek@redhat.com</a>> writes:<br>
>
>> Milan,
>>
>> log [1], VM b3962e5c-08e3-444e-910e-ea675fa1a5c7
>> migration away finished at 2017-12-05 06:26:24,515-0500
>> incoming migration of the same VM back at 2017-12-05 06:26:46,614-0500
>>
>> It seems to me the migration away didn't properly clean up the VM. Milan,
>> can you check whether the logs match? Perhaps an orphaned libvirt XML?
>
> It seems that, for unknown reasons, Engine didn't call Destroy on the
> source after the first migration. So while the VM was no longer running
> there, it was still present in Vdsm (in Down status) and thus the
> following create call was rejected in API.
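
That matches the "already exists" traceback further down: the create verb
refuses to start a VM whose ID is still registered in the container, even
when that VM is only sitting there in Down state waiting for a Destroy. A
minimal sketch of that guard, with illustrative names rather than the exact
Vdsm code (the real check is around vdsm/API.py line 180):

    # Sketch of the duplicate-VM guard behind "Virtual machine already
    # exists". Illustrative only, not the actual Vdsm implementation.
    class VMExists(Exception):
        pass

    class VmContainer(object):
        def __init__(self):
            # vmId -> VM object; a VM in Down state stays in the
            # container until Engine calls destroy() on it.
            self._vms = {}

        def create(self, params):
            vm_id = params['vmId']
            if vm_id in self._vms:
                # This is the failure seen here: the source never got
                # Destroy, so the incoming migrationCreate finds a
                # stale Down VM under the same ID and is rejected.
                raise VMExists('Virtual machine already exists')
            self._vms[vm_id] = params
            return self._vms[vm_id]
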
>
> There are weird messages in engine.log during the first migration, such
> as the following ones; I don't know whether they are harmless or not:
>
> 2017-12-05 06:26:19,799-05 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-89) [] VM 'b3962e5c-08e3-444e-910e-ea675fa1a5c7'(vm0) was unexpectedly detected as 'MigratingTo' on VDS '56bf5bdd-8a95-4b7a-9c72-7206d6b59e38'(lago-basic-suite-master-host-1) (expected on '104bd4da-24bb-4368-a5ca-21a465aca70e')
> 2017-12-05 06:26:19,799-05 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-89) [] VM 'b3962e5c-08e3-444e-910e-ea675fa1a5c7' is migrating to VDS '56bf5bdd-8a95-4b7a-9c72-7206d6b59e38'(lago-basic-suite-master-host-1) ignoring it in the refresh until migration is done

These logs are completely harmless; they only indicate that the VM's
run_on_vds still points to the source host and the hand-over has not been
done yet.
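
In other words, during the migration the analyzer sees the VM reported by
the destination host while run_on_vds still points to the source, so it
simply skips the VM for that refresh cycle. A rough sketch of that decision
with hypothetical names (the actual VmAnalyzer is Java code in
org.ovirt.engine.core.vdsbroker.monitoring, not this Python):

    import logging

    log = logging.getLogger('VmAnalyzer')

    def analyze(vm_record, reporting_host, reported_status):
        # run_on_vds still points to the source host until hand-over.
        expected_host = vm_record['run_on_vds']
        if reported_status == 'MigratingTo' and reporting_host != expected_host:
            # The destination already reports the VM, but the hand-over
            # has not happened yet: log it and skip this refresh cycle.
            log.info("VM '%s' is migrating to VDS '%s' ignoring it in "
                     "the refresh until migration is done",
                     vm_record['id'], reporting_host)
            return None  # no state change this cycle
        return reported_status
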
>
> The migration completion event was successfully obtained by Engine:
>
> 2017-12-05 06:26:24,542-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_MIGRATION_DONE(63), Migration completed (VM: vm0, Source: lago-basic-suite-master-host-0, Destination: lago-basic-suite-master-host-1, Duration: 6 seconds, Total: 6 seconds, Actual downtime: 73ms)
>
>> Otherwise it would need reproduction in CI and some more logs…
>>
>> [1]
>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
>>
>>> On 5 Dec 2017, at 13:26, Dafna Ron <dron@redhat.com> wrote:
>>>
>>> Hi,
>>> We had a failure for test 006_migrations.migrate_vm on master.
>>>
>>> There was a libvirt disruption on the migration source, and after that Vdsm reported the migration as failed because the VM already exists, which makes me suspect a split-brain case.
>>> The patch reported has nothing to do with this issue; I think we simply stumbled on a race condition which can cause a split brain.
>>> Please note that there are several metrics-related issues reported in the vdsm logs as well.
>>>
>>> Link and headline of suspected patches:
>>>
>>> Not related
>>>
>>> Link to Job:
>>>
</span>>> <a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>ovirt-master_change-queue-<wbr>tester/4278/</a> <<a href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>ovirt-master_change-queue-<wbr>tester/4278/</a>><br>
>>>
>>> Link to all logs:
>>>
>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/
<div class="HOEnZb"><div class="h5">>><br>
>> (Relevant) error snippet from the log:<br>
>> <error><br>
>><br>
>> Engine:<br>
>>> 2017-12-05 06:26:48,546-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-385) [] EVENT_ID: VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed (VM: vm0, Source: lago-basic-suite-master-host-1, Destination: lago-basic-suite-master-host-0).
>>>
>>> dst:
>>>
>>> 2017-12-05 06:26:46,615-0500 WARN (jsonrpc/6) [vds] vm b3962e5c-08e3-444e-910e-ea675fa1a5c7 already exists (API:179)
>>> 2017-12-05 06:26:46,615-0500 ERROR (jsonrpc/6) [api] FINISH create error=Virtual machine already exists (api:124)
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 117, in method
>>>     ret = func(*args, **kwargs)
>>>   File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 180, in create
>>>     raise exception.VMExists()
>>> VMExists: Virtual machine already exists
>>> 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [api.virt] FINISH create return={'status': {'message': 'Virtual machine already exists', 'code': 4}} from=::ffff:192.168.201.3,50394 (api:52)
>>> 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [api.virt] FINISH migrationCreate return={'status': {'message': 'Virtual machine already exists', 'code': 4}} from=::ffff:192.168.201.3,50394 (api:52)
>>> 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call VM.migrationCreate failed (error 4) in 0.03 seconds (__init__:573)
>>> 2017-12-05 06:26:46,624-0500 INFO (jsonrpc/2) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:192.168.201.3,50394 (api:46)
>>> 2017-12-05 06:26:46,624-0500 INFO (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Release VM resources (vm:4967)
>>> 2017-12-05 06:26:46,625-0500 WARN (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') trying to set state to Powering down when already Down (vm:575)
>>> 2017-12-05 06:26:46,625-0500 INFO (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Stopping connection (guestagent:435)
>>> 2017-12-05 06:26:46,625-0500 INFO (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') _destroyVmGraceful attempt #0 (vm:5004)
>>> 2017-12-05 06:26:46,626-0500 WARN (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') VM 'b3962e5c-08e3-444e-910e-ea675fa1a5c7' couldn't be destroyed in libvirt: Requested operation is not valid: domain is not running (vm:5025)
>>> 2017-12-05 06:26:46,627-0500 INFO (jsonrpc/2) [vdsm.api] START teardownImage(sdUUID='952bb427-b88c-4fbe-99ef-49970d3aaf70', spUUID='9dcfeaaf-96b7-4e26-a327-5570e0e39261', imgUUID='e6eadbae-ec7a-48f4-8832-64a622a12bef', volUUID=None) from=::ffff:192.168.201.3,50394, task_id=2da93725-5533-4354-a369-751eb44f9ef2 (api:46)
>>>
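
Note the benign-looking pair above: destroy carries on even though libvirt
answers "domain is not running", because the graceful-destroy step treats a
missing domain as already gone and proceeds to resource teardown. A
simplified sketch of that pattern, with illustrative names rather than the
exact _destroyVmGraceful code:

    import libvirt

    def destroy_gracefully(dom, log):
        # dom is a libvirt.virDomain; returns True once the domain is gone.
        try:
            dom.destroyFlags(libvirt.VIR_DOMAIN_DESTROY_GRACEFUL)
        except libvirt.libvirtError as e:
            if e.get_error_code() == libvirt.VIR_ERR_OPERATION_INVALID:
                # "Requested operation is not valid: domain is not
                # running" -- nothing left to kill, continue teardown.
                log.warning("VM couldn't be destroyed in libvirt: %s", e)
                return True
            raise
        return True
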
>>> src:
>>>
>>> 2017-12-05 06:26:46,623-0500 ERROR (migsrc/b3962e5c) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') migration destination error: Virtual machine already exists (migration:290)
>>>
>>> disruption on dst:
>>>
>>> 2017-12-05 06:20:04,662-0500 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call VM.shutdown succeeded in 0.00 seconds (__init__:573)
>>> 2017-12-05 06:20:04,676-0500 ERROR (Thread-1) [root] Shutdown by QEMU Guest Agent failed (vm:5097)
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5088, in qemuGuestAgentShutdown
>>>     self._dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)
>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
>>>     ret = attr(*args, **kwargs)
>>>   File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 126, in wrapper
>>>     ret = f(*args, **kwargs)
>>>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 512, in wrapper
>>>     return func(inst, *args, **kwargs)
>>>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2403, in shutdownFlags
>>>     if ret == -1: raise libvirtError ('virDomainShutdownFlags() failed', dom=self)
>>> libvirtError: Guest agent is not responding: QEMU guest agent is not connected
>>> 2017-12-05 06:20:04,697-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') block threshold 1 exceeded on 'vda' (/rhev/data-center/mnt/blockSD/952bb427-b88c-4fbe-99ef-49970d3aaf70/images/e6eadbae-ec7a-48f4-8832-64a622a12bef/1f063c73-f5cd-44e8-a84f-7810857f82df) (drivemonitor:162)
>>> 2017-12-05 06:20:04,698-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') drive 'vda' threshold exceeded (storage:872)
>>> 2017-12-05 06:20:05,889-0500 INFO (jsonrpc/7) [api.host] START getAllVmStats() from=::1,41118 (api:46)
>>> 2017-12-05 06:20:05,891-0500 INFO (jsonrpc/7) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,41118 (api:52)
>>> 2017-12-05 06:20:05,892-0500 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:573)
>>> 2017-12-05 06:20:07,466-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') underlying process disconnected (vm:1024)
>>> 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Release VM resources (vm:4967)
>>> 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Stopping connection (guestagent:435)
>>> 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [vdsm.api] START teardownImage(sdUUID='952bb427-b88c-4fbe-99ef-49970d3aaf70', spUUID='9dcfeaaf-96b7-4e26-a327-5570e0e39261', imgUUID='e6eadbae-ec7a-48f4-8832-64a622a12bef', volUUID=None) from=internal, task_id=9efedf46-d3be-4e41-b7f9-a074ed6344f6 (api:46)
>>>
>>> </error>
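
The disruption starts with the guest-agent shutdown failing because the
agent is not connected, so Vdsm has to fall back to another shutdown
method. A simplified sketch of that call and its failure mode, assuming a
plain libvirt domain handle; the fall-back-to-ACPI step is my illustration
of the usual pattern, not a quote of the Vdsm code:

    import libvirt

    def shutdown_via_guest_agent(dom, log):
        # dom is a libvirt.virDomain, as in the traceback above.
        try:
            dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)
            return True
        except libvirt.libvirtError as e:
            # Raised here as "Guest agent is not responding: QEMU guest
            # agent is not connected".
            log.error("Shutdown by QEMU Guest Agent failed: %s", e)
            # Fall back to an ACPI power-button event instead.
            dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_ACPI_POWER_BTN)
            return False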