Subject: [ OST Failure Report ] [ oVirt Master ] [ Jan 15th 2018 ] [ 006_migrations.migrate_vm ]

Hi,
We had a failure in test 006_migrations.migrate_vm <http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4842/testReport/junit/%28root%29/006_migrations/migrate_vm/>.
The migration failed with reason "VMExists". It seems to be an issue caused by connectivity between the engine and the hosts. I remember this issue happening a few weeks ago - is there a solution/bug for this issue?

Link and headline of suspected patches:
https://gerrit.ovirt.org/#/c/86114/4 - net tests: Fix vlan creation name length in nettestlib
Link to Job:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4842/
Link to all logs:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4842/artifact/

(Relevant) error snippet from the log:
<error>
vdsm dst:
2018-01-15 06:47:03,355-0500 ERROR (jsonrpc/0) [api] FINISH create error=Virtual machine already exists (api:124)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 117, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 180, in create
    raise exception.VMExists()
VMExists: Virtual machine already exists

vdsm src:
2018-01-15 06:47:03,359-0500 ERROR (migsrc/d17a2482) [virt.vm] (vmId='d17a2482-4904-4cbc-8d13-3a3b7840782d') migration destination error: Virtual machine already exists (migration:290)

Engine:
2018-01-15 06:45:30,169-05 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-34) [] Failure to refresh host 'lago-basic-suite-master-host-0' runtime info: java.net.ConnectException: Connection refused
2018-01-15 06:45:30,169-05 DEBUG [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-34) [] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.ConnectException: Connection refused at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createNetworkException(VdsBrokerCommand.java:159) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:122) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:73) [vdsbroker.jar:] at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.DefaultVdsCommandExecutor.execute(DefaultVdsCommandExecutor.java:14) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:387) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand$$super(Unknown Source) [vdsbroker.jar:] at sun.reflect.GeneratedMethodAccessor234.invoke(Unknown Source) [:1.8.0_151] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_151] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_151] at org.jboss.weld.interceptor.proxy.TerminalAroundInvokeInvocationContext.proceedInternal(TerminalAroundInvokeInvocationContext.java:49) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.interceptor.proxy.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:77) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at
org.ovirt.engine.core.common.di.interceptor.LoggingInterceptor.apply(LoggingInterceptor.java:12) [common.jar:] at sun.reflect.GeneratedMethodAccessor69.invoke(Unknown Source) [:1.8.0_151] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_151] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_151] at org.jboss.weld.interceptor.reader.SimpleInterceptorInvocation$SimpleMethodInvocation.invoke(SimpleInterceptorInvocation.java:73) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.executeAroundInvoke(InterceptorMethodHandler.java:84) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.executeInterception(InterceptorMethodHandler.java:72) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.interceptor.proxy.InterceptorMethodHandler.invoke(InterceptorMethodHandler.java:56) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68) [weld-core-impl-2.4.3.Final.jar:2.4.3.Final] at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:647) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:118) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refresh(HostMonitoring.java:85) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.VdsManager.refresh(VdsManager.java:267) [vdsbroker.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_151] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_151] at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent-1.0.jar:] at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) [javax.enterprise.concurrent-1.0.jar:] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_151] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_151] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_151] at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:] at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) [rt.jar:1.8.0_151] at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) [rt.jar:1.8.0_151] at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.connect(ReactorClient.java:118) [vdsm-jsonrpc-java-client.jar:] at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.getClient(JsonRpcClient.java:160) [vdsm-jsonrpc-java-client.jar:] at 
org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.call(JsonRpcClient.java:93) [vdsm-jsonrpc-java-client.jar:] at org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:70) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcVdsServer.getCapabilities(JsonRpcVdsServer.java:314) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:22) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:112) [vdsbroker.jar:] ... 34 more
2018-01-15 06:45:30,170-05 DEBUG [org.ovirt.engine.core.vdsbroker.VdsManager] (EE-ManagedThreadFactory-engineScheduled-Thread-34) [] Failed to refresh VDS, network error, continuing, vds='lago-basic-suite-master-host-0'(31f09289-ec6c-42ff-a745-e82e8ac8e6b9): java.net.ConnectException: Connection refused
</error>
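For context, the test drives the migration through the engine's REST API. Below is a minimal sketch of that flow with the Python SDK, assuming made-up connection details, VM name and host name - this is not the actual OST code:

import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Hypothetical connection details - not the OST/lago environment.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=vm0')[0]
vm_service = vms_service.vm_service(vm.id)

# Migrate to a specific host; as noted later in the thread, the test pins
# the destination, which is why the engine runs MigrateVmToServer rather
# than plain MigrateVm.
vm_service.migrate(host=types.Host(name='host-0'))

# Wait until the VM is reported Up again on the new host.
for _ in range(120):
    if vm_service.get().status == types.VmStatus.UP:
        break
    time.sleep(1)

connection.close()

As discussed later in the thread, the test then performs a second, reverse migration, and that second migration is the step that fails here with "VMExists".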

On Mon, Jan 15, 2018 at 5:13 PM, Dafna Ron <dron@redhat.com> wrote:
Hi,
We had a failure in test 006_migrations.migrate_vm <http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4842/testReport/junit/%28root%29/006_migrations/migrate_vm/>.
the migration failed with reason "VMExists"
Seems to be an issue which is caused by connectivity between engine and hosts. I remember this issue happening before a few weeks ago - is there a solution/bug for this issue?
Link and headline of suspected patches: https://gerrit.ovirt.org/#/c/86114/4 - net tests: Fix vlan creation name length in nettestlib
This touched tests, not production code, so I do not think it is relevant.

Dafna Ron <dron@redhat.com> writes:
We had a failure in test 006_migrations.migrate_vm <http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4842/testReport/junit/%28root%29/006_migrations/migrate_vm/>.
the migration failed with reason "VMExists"
There are two migrations in 006_migrations.migrate_vm. The first one succeeded, but if I'm looking correctly into the logs, Engine didn't send Destroy to the source host after the migration had finished. Then the second migration gets rejected by Vdsm, because Vdsm still keeps the former Vm object instance in Down status.
Since the test succeeds most of the time, it looks like some timing issue or border case. Arik, is it a known problem? If not, would you like to look into the logs, whether you can see what's happening?
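To make the failure mode concrete, here is a small self-contained sketch in plain Python - not actual Vdsm code, the container class and names are illustrative only. The host that is the destination of the second migration still holds a Vm object with the same id, left in Down state because Destroy was never sent, so the incoming create is rejected, matching the vdsm dst traceback in the report:

class VMExists(Exception):
    pass

class FakeHostVmContainer:
    """Toy stand-in for the per-host VM container kept by Vdsm."""
    def __init__(self):
        self.vms = {}  # vmId -> vm state

    def create(self, vm_id):
        if vm_id in self.vms:
            # Mirrors the "Virtual machine already exists" error in the
            # vdsm dst log: the stale Vm object was never destroyed.
            raise VMExists('Virtual machine already exists')
        self.vms[vm_id] = 'WaitForLaunch'

vm_id = 'd17a2482-4904-4cbc-8d13-3a3b7840782d'
host1 = FakeHostVmContainer()

host1.create(vm_id)          # VM originally running on this host
host1.vms[vm_id] = 'Down'    # first migration finished, but no Destroy arrived

try:
    host1.create(vm_id)      # incoming second migration back to this host
except VMExists as exc:
    print(exc)               # -> Virtual machine already exists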
Seems to be an issue which is caused by connectivity between engine and hosts. I remember this issue happening before a few weeks ago - is there a solution/bug for this issue?
None I'm aware of.
Link and headline of suspected patches: https://gerrit.ovirt.org/#/c/86114/4 - net tests: Fix vlan creation name length in nettestlib
It's just coincidence that it failed on that patch, so I'm excluding Edward from the discussion, he is innocent :-).

On Wed, Jan 17, 2018 at 9:41 PM, Milan Zamazal <mzamazal@redhat.com> wrote:
Dafna Ron <dron@redhat.com> writes:
We had a failure in test 006_migrations.migrate_vm <http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4842/testReport/junit/%28root%29/006_migrations/migrate_vm/>.
the migration failed with reason "VMExists"
There are two migrations in 006_migrations.migrate_vm. The first one succeeded, but if I'm looking correctly into the logs, Engine didn't send Destroy to the source host after the migration had finished. Then the second migration gets rejected by Vdsm, because Vdsm still keeps the former Vm object instance in Down status.
Since the test succeeds most of the time, it looks like some timing issue or border case. Arik, is it a known problem? If not, would you like to look into the logs, whether you can see what's happening?
Your analysis is correct. That's a nice one actually!
The statistics monitoring cycles of both hosts host-0 and host-1 were scheduled in a way that they are executed almost at the same time [1].
Now, at 6:46:34 the VM was migrated from host-1 to host-0.
At 6:46:42 the migration succeeded - we got events from both hosts, but only processed the one from the destination so the VM switched to Up.
The next statistics monitoring cycle was triggered at 6:46:44 - again, the report of that VM from the source host was skipped because we processed the one from the destination.
At 6:46:59, in the next statistics monitoring cycle, it happened again - the report of the VM from the source host was skipped.
The next migration was triggered at 6:47:05 - the engine didn't manage to process any report from the source host, so the VM remained Down there.
The probability of this to happen is extremely low. However, I think we can make a little tweak to the monitoring code to avoid this:
"If we get the VM as Down on an unexpected host (that is, not the host we expect the VM to run on), do not lock the VM"
It should be safe since we don't update anything in this scenario.
[1] For instance:
2018-01-15 06:46:44,905-05 ... GetAllVmStatsVDSCommand ... VdsIdVDSCommandParametersBase:{hostId='873a4d36-55fe-4be1-acb7-8de9c9123eb2'})
2018-01-15 06:46:44,932-05 ... GetAllVmStatsVDSCommand ... VdsIdVDSCommandParametersBase:{hostId='31f09289-ec6c-42ff-a745-e82e8ac8e6b9'})
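To make the proposed tweak concrete, here is a small illustrative sketch. The real monitoring code lives in the Java engine; this only restates the rule in executable form, and the field names (run_on_host, the report fields) are assumptions, not actual engine APIs:

def should_lock_vm(db_vm, reported_status, reporting_host):
    """Decide whether a VM report needs the per-VM monitoring lock."""
    if reported_status == 'Down' and reporting_host != db_vm['run_on_host']:
        # The VM is reported Down by a host it is not expected to run on
        # (e.g. the source host of an already-completed migration).  Nothing
        # is updated in this case, so the lock can be skipped and the report
        # from that host no longer loses the race with the destination's report.
        return False
    return True

# After the first migration the engine expects the VM on host-0,
# while host-1 (the old source) still reports it as Down.
db_vm = {'id': 'd17a2482-4904-4cbc-8d13-3a3b7840782d', 'run_on_host': 'host-0'}
print(should_lock_vm(db_vm, 'Down', 'host-1'))  # False - skip the lock
print(should_lock_vm(db_vm, 'Up', 'host-0'))    # True  - normal processing

As stated above, skipping the lock should be safe here because nothing is updated for a Down report coming from a host the VM is not expected to run on.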

On 18 Jan 2018, at 17:36, Arik Hadas <ahadas@redhat.com> wrote:
The probability of this to happen is extremely low.
Why wasn't the migration rerun?

On Fri, Jan 19, 2018 at 12:46 PM, Michal Skrivanek <michal.skrivanek@redhat.com> wrote:
On 18 Jan 2018, at 17:36, Arik Hadas <ahadas@redhat.com> wrote:
On Wed, Jan 17, 2018 at 9:41 PM, Milan Zamazal <mzamazal@redhat.com> wrote:
Dafna Ron <dron@redhat.com> writes:
We had a failure in test 006_migrations.migrate_vm <http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4842/testReport/junit/%28root%29/006_migrations/migrate_vm/>.
the migration failed with reason "VMExists"
There are two migrations in 006_migrations.migrate_vm. The first one succeeded, but if I'm looking correctly into the logs, Engine didn't send Destroy to the source host after the migration had finished. Then the second migration gets rejected by Vdsm, because Vdsm still keeps the former Vm object instance in Down status.
Since the test succeeds most of the time, it looks like some timing issue or border case. Arik, is it a known problem? If not, would you like to look into the logs, whether you can see what's happening?
Your analysis is correct. That's a nice one actually!
The statistics monitoring cycles of both hosts host-0 and host-1 were scheduled in a way that they are executed almost at the same time [1].
Now, at 6:46:34 the VM was migrated from host-1 to host-0. At 6:46:42 the migration succeeded - we got events from both hosts, but only processed the one from the destination so the VM switched to Up. The next statistics monitoring cycle was triggered at 6:46:44 - again, the report of that VM from the source host was skipped because we processed the one from the destination. At 6:46:59, in the next statistics monitoring cycle, it happened again - the report of the VM from the source host was skipped. The next migration was triggered at 6:47:05 - the engine didn't manage to process any report from the source host, so the VM remained Down there.
The probability of this to happen is extremely low.
Why wasn't the migration rerun?
Good question, because a migration to a particular host (MigrateVmToServer) was requested. In this particular case, it seems that there are only two hosts defined so changing it to MigrateVm wouldn't make any difference though.
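A toy illustration of that point (the helper and host names are made up, not engine code): with a fixed destination there is no alternative host to rerun on, and with only two hosts a scheduler-chosen rerun has no other candidate either:

def candidate_hosts(all_hosts, source_host, requested_host=None, failed_hosts=()):
    """Hosts a (re)run of the migration could still target."""
    if requested_host is not None:
        candidates = [requested_host]          # MigrateVmToServer: destination is fixed
    else:
        candidates = [h for h in all_hosts if h != source_host]  # MigrateVm: scheduler picks
    return [h for h in candidates if h not in failed_hosts]

all_hosts = ['host-0', 'host-1']
# Second migration: host-0 -> host-1, and host-1 just rejected the VM.
print(candidate_hosts(all_hosts, 'host-0', requested_host='host-1',
                      failed_hosts=['host-1']))   # [] - nowhere to rerun
print(candidate_hosts(all_hosts, 'host-0',
                      failed_hosts=['host-1']))   # [] - still nowhere with only two hosts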
However, I think we can make a little tweak to the monitoring code to avoid this: "If we get the VM as Down on an unexpected host (that is, not the host we expect the VM to run on), do not lock the VM" It should be safe since we don't update anything in this scenario.
[1] For instance:
2018-01-15 06:46:44,905-05 ... GetAllVmStatsVDSCommand ... VdsIdVDSCommandParametersBase:{hostId='873a4d36-55fe-4be1-acb7-8de9c9123eb2'})
2018-01-15 06:46:44,932-05 ... GetAllVmStatsVDSCommand ... VdsIdVDSCommandParametersBase:{hostId='31f09289-ec6c-42ff-a745-e82e8ac8e6b9'})
participants (5)
- Arik Hadas
- Dafna Ron
- Edward Haas
- Michal Skrivanek
- Milan Zamazal