[ OST Failure Report ] [ oVirt Master ] [ 05-12-2017 ] [ 006_migrations.migrate_vm ]

This is a multi-part message in MIME format. --------------D695925BEA0466FA018AF9F5 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Hi, We had a failure for test 006_migrations.migrate_vm on master. There was a libvirt disruption in the migration src and after that vdsm reported the migration as failed because the vm already exists which makes me suspect a split bran case. The patch reported has nothing to do with this issue and I think we simply stumbled on a race condition which can cause a split brain. Please note that there are several metrics related issues reported in vdsm logs as well. ** *Link and headline of suspected patches: * * *Not related * Link to Job: * **http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/** * * *Link to all logs:* * * **http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/artifact/... * * *(Relevant) error snippet from the log: * * *<error>* ** **Engine: * * *2017-12-05 06:26:48,546-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-385) [] EVENT_ID: VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed (VM: vm0, Source: lago- basic-suite-master-host-1, Destination: lago-basic-suite-master-host-0). * *dst:** ** *2017-12-05 06:26:46,615-0500 WARN (jsonrpc/6) [vds] vm b3962e5c-08e3-444e-910e-ea675fa1a5c7 already exists (API:179) 2017-12-05 06:26:46,615-0500 ERROR (jsonrpc/6) [api] FINISH create error=Virtual machine already exists (api:124) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 117, in method ret = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 180, in create raise exception.VMExists() VMExists: Virtual machine already exists 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [api.virt] FINISH create return={'status': {'message': 'Virtual machine already exists', 'code': 4}} from=::ffff:192.168.201.3,50394 (api:52) 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [api.virt] FINISH migrationCreate return={'status': {'message': 'Virtual machine already exists', 'code': 4}} from=::ffff:192.168.201.3,50394 (api:52) 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call VM.migrationCreate failed (error 4) in 0.03 seconds (__init__:573) 2017-12-05 06:26:46,624-0500 INFO (jsonrpc/2) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:192.168.201.3,50394 (api:46) 2017-12-05 06:26:46,624-0500 INFO (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Release VM resources (vm:4967) 2017-12-05 06:26:46,625-0500 WARN (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') trying to set state to Powering down when already Down (vm:575) 2017-12-05 06:26:46,625-0500 INFO (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Stopping connection (guestagent:435) 2017-12-05 06:26:46,625-0500 INFO (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') _destroyVmGraceful attempt #0 (vm:5004) 2017-12-05 06:26:46,626-0500 WARN (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') VM 'b3962e5c-08e3-444e-910e-ea675fa1a5c7' couldn't be destroyed in libvirt: Requested operation is not valid: domain is not running (vm :5025) 2017-12-05 06:26:46,627-0500 INFO (jsonrpc/2) [vdsm.api] START teardownImage(sdUUID='952bb427-b88c-4fbe-99ef-49970d3aaf70', spUUID='9dcfeaaf-96b7-4e26-a327-5570e0e39261', imgUUID='e6eadbae-ec7a-48f4-8832-64a622a12bef', volUUID=None) from =::ffff:192.168.201.3,50394, task_id=2da93725-5533-4354-a369-751eb44f9ef2 (api:46) * **scr * * *2017-12-05 06:26:46,623-0500 ERROR (migsrc/b3962e5c) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') migration destination error: Virtual machine already exists (migration:290) * **disruption on **dst**: * * **2017-12-05 06:20:04,662-0500 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call VM.shutdown succeeded in 0.00 seconds (__init__:573) 2017-12-05 06:20:04,676-0500 ERROR (Thread-1) [root] Shutdown by QEMU Guest Agent failed (vm:5097) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5088, in qemuGuestAgentShutdown self._dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT) File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f ret = attr(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 126, in wrapper ret = f(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 512, in wrapper return func(inst, *args, **kwargs) File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2403, in shutdownFlags if ret == -1: raise libvirtError ('virDomainShutdownFlags() failed', dom=self) libvirtError: Guest agent is not responding: QEMU guest agent is not connected 2017-12-05 06:20:04,697-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') block threshold 1 exceeded on 'vda' (/rhev/data-center/mnt/blockSD/952bb427-b88c-4fbe-99ef-49970d3aaf70/images/e6eadbae-ec7a-48f4- 8832-64a622a12bef/1f063c73-f5cd-44e8-a84f-7810857f82df) (drivemonitor:162) 2017-12-05 06:20:04,698-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') drive 'vda' threshold exceeded (storage:872) 2017-12-05 06:20:05,889-0500 INFO (jsonrpc/7) [api.host] START getAllVmStats() from=::1,41118 (api:46) 2017-12-05 06:20:05,891-0500 INFO (jsonrpc/7) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,41118 (api:52) 2017-12-05 06:20:05,892-0500 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:573) 2017-12-05 06:20:07,466-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') underlying process disconnected (vm:1024) 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Release VM resources (vm:4967) 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Stopping connection (guestagent:435) 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [vdsm.api] START teardownImage(sdUUID='952bb427-b88c-4fbe-99ef-49970d3aaf70', spUUID='9dcfeaaf-96b7-4e26-a327-5570e0e39261', imgUUID='e6eadbae-ec7a-48f4-8832-64a622a12bef', volUUID=None) from=internal, task_id=9efedf46-d3be-4e41-b7f9-a074ed6344f6 (api:46) * * *</error>* ** --------------D695925BEA0466FA018AF9F5 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 8bit <html> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8"> </head> <body text="#000000" bgcolor="#FFFFFF"> <p>Hi, <br> </p> <p>We had a failure for test 006_migrations.migrate_vm on master. </p> <p>There was a libvirt disruption in the migration src and after that vdsm reported the migration as failed because the vm already exists which makes me suspect a split bran case. <br> </p> <p>The patch reported has nothing to do with this issue and I think we simply stumbled on a race condition which can cause a split brain. <br> </p> <p>Please note that there are several metrics related issues reported in vdsm logs as well. <br> </p> <p><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-269d-3fbb-6886-821a181fd550"> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">Link and headline of suspected patches: </span></p> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"> </span></p> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"><b>Not related </b> </span></p> <br> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">Link to Job:</span></p> <br> </b></p> <p><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-269d-3fbb-6886-821a181fd550"><b><a class="moz-txt-link-freetext" href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/</a></b></b></p> <p><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-269d-3fbb-6886-821a181fd550"><br> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">Link to all logs:</span></p> <br> </b></p> <p><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-269d-3fbb-6886-821a181fd550"><b><a class="moz-txt-link-freetext" href="http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/">http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/</a></b></b></p> <p><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-269d-3fbb-6886-821a181fd550"><br> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">(Relevant) error snippet from the log: </span></p> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><b><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"><error></span></b></p> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><b><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"> </span></b></p> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><b><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"><b>Engine: </b> </span></b></p> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><b><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">2017-12-05 06:26:48,546-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-385) [] EVENT_ID: VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed (VM: vm0, Source: lago- basic-suite-master-host-1, Destination: lago-basic-suite-master-host-0). </span></b></p> <br> <b>dst:</b><b><br> </b></b></p> <p><b><span style="font-weight:normal;">2017-12-05 06:26:46,615-0500 WARN (jsonrpc/6) [vds] vm b3962e5c-08e3-444e-910e-ea675fa1a5c7 already exists (API:179)<br> 2017-12-05 06:26:46,615-0500 ERROR (jsonrpc/6) [api] FINISH create error=Virtual machine already exists (api:124)<br> Traceback (most recent call last):<br> File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 117, in method<br> ret = func(*args, **kwargs)<br> File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 180, in create<br> raise exception.VMExists()<br> VMExists: Virtual machine already exists<br> 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [api.virt] FINISH create return={'status': {'message': 'Virtual machine already exists', 'code': 4}} from=::ffff:192.168.201.3,50394 (api:52)<br> 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [api.virt] FINISH migrationCreate return={'status': {'message': 'Virtual machine already exists', 'code': 4}} from=::ffff:192.168.201.3,50394 (api:52)<br> 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call VM.migrationCreate failed (error 4) in 0.03 seconds (__init__:573)<br> 2017-12-05 06:26:46,624-0500 INFO (jsonrpc/2) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:192.168.201.3,50394 (api:46)<br> 2017-12-05 06:26:46,624-0500 INFO (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Release VM resources (vm:4967)<br> 2017-12-05 06:26:46,625-0500 WARN (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') trying to set state to Powering down when already Down (vm:575)<br> 2017-12-05 06:26:46,625-0500 INFO (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Stopping connection (guestagent:435)<br> 2017-12-05 06:26:46,625-0500 INFO (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') _destroyVmGraceful attempt #0 (vm:5004)<br> 2017-12-05 06:26:46,626-0500 WARN (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') VM 'b3962e5c-08e3-444e-910e-ea675fa1a5c7' couldn't be destroyed in libvirt: Requested operation is not valid: domain is not running (vm<br> :5025)<br> 2017-12-05 06:26:46,627-0500 INFO (jsonrpc/2) [vdsm.api] START teardownImage(sdUUID='952bb427-b88c-4fbe-99ef-49970d3aaf70', spUUID='9dcfeaaf-96b7-4e26-a327-5570e0e39261', imgUUID='e6eadbae-ec7a-48f4-8832-64a622a12bef', volUUID=None) from<br> =::ffff:192.168.201.3,50394, task_id=2da93725-5533-4354-a369-751eb44f9ef2 (api:46)<br> </span></b></p> <p><b><span style="font-weight:normal;"><b>scr </b><br> </span></b></p> <p><b><span style="font-weight:normal;">2017-12-05 06:26:46,623-0500 ERROR (migsrc/b3962e5c) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') migration destination error: Virtual machine already exists (migration:290)<br> </span></b></p> <p><b><span style="font-weight:normal;"><b>disruption on </b><b>dst</b><b>: </b><br> </span></b></p> <p><b style="font-weight:normal;" id="docs-internal-guid-5859b7a1-269d-3fbb-6886-821a181fd550"><b>2017-12-05 06:20:04,662-0500 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call VM.shutdown succeeded in 0.00 seconds (__init__:573)<br> 2017-12-05 06:20:04,676-0500 ERROR (Thread-1) [root] Shutdown by QEMU Guest Agent failed (vm:5097)<br> Traceback (most recent call last):<br> File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5088, in qemuGuestAgentShutdown<br> self._dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)<br> File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f<br> ret = attr(*args, **kwargs)<br> File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 126, in wrapper<br> ret = f(*args, **kwargs)<br> File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 512, in wrapper<br> return func(inst, *args, **kwargs)<br> File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2403, in shutdownFlags<br> if ret == -1: raise libvirtError ('virDomainShutdownFlags() failed', dom=self)<br> libvirtError: Guest agent is not responding: QEMU guest agent is not connected<br> 2017-12-05 06:20:04,697-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') block threshold 1 exceeded on 'vda' (/rhev/data-center/mnt/blockSD/952bb427-b88c-4fbe-99ef-49970d3aaf70/images/e6eadbae-ec7a-48f4-<br> 8832-64a622a12bef/1f063c73-f5cd-44e8-a84f-7810857f82df) (drivemonitor:162)<br> 2017-12-05 06:20:04,698-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') drive 'vda' threshold exceeded (storage:872)<br> 2017-12-05 06:20:05,889-0500 INFO (jsonrpc/7) [api.host] START getAllVmStats() from=::1,41118 (api:46)<br> 2017-12-05 06:20:05,891-0500 INFO (jsonrpc/7) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,41118 (api:52)<br> 2017-12-05 06:20:05,892-0500 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:573)<br> 2017-12-05 06:20:07,466-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') underlying process disconnected (vm:1024)<br> 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Release VM resources (vm:4967)<br> 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Stopping connection (guestagent:435)<br> 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [vdsm.api] START teardownImage(sdUUID='952bb427-b88c-4fbe-99ef-49970d3aaf70', spUUID='9dcfeaaf-96b7-4e26-a327-5570e0e39261', imgUUID='e6eadbae-ec7a-48f4-8832-64a622a12bef', volUUID=None)<br> from=internal, task_id=9efedf46-d3be-4e41-b7f9-a074ed6344f6 (api:46)<br> </b><br> <p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;"></error></span></p> </b><br class="Apple-interchange-newline"> </p> <p><br> </p> <p><br> </p> </body> </html> --------------D695925BEA0466FA018AF9F5--
participants (1)
-
Dafna Ron