<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Mar 5, 2017 at 8:29 AM, Dan Kenigsberg <span dir="ltr"><<a href="mailto:danken@redhat.com" target="_blank">danken@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Piotr, could you provide more information?<br>
<br>
Which setupNetworks action triggers this problem? Any idea which lock<br>
did we use to take and when did we drop it?<br></blockquote><div><br></div><div>I thought that this [1] would make sure that setupNetworks is an exclusive operation on a host, which seems not to be the case.<br></div><div>In the logs I saw the following message sent:<br><br>{"jsonrpc":"2.0","method":"Host.setupNetworks","params":{"networks":{"VLAN200_Network":{"vlan":"200","netmask":"255.255.255.0","ipv6autoconf":false,"nic":"eth0","bridged":"false","ipaddr":"192.0.3.1","dhcpv6":false,"mtu":1500,"switch":"legacy"}},"bondings":{},"options":{"connectivityTimeout":120,"connectivityCheck":"true"}},"id":"3f7f74ea-fc39-4815-831b-5e3b1c22131d"}<br><br></div><div>A few seconds later there was:<br><br>{"jsonrpc":"2.0","method":"Host.getAllVmStats","params":{},"id":"67d510eb-6dfc-4f67-97b6-a4e63c670ff2"}<br><br></div><div>and while we were still calling pings there was:<br><br>{"jsonrpc":"2.0","method":"StoragePool.getSpmStatus","params":{"storagepoolID":"8cc227da-70e7-4557-aa01-6d8ddee6f847"},"id":"d4d04c7c-47b8-44db-867b-770e1e19361c"}<br><br></div><div>My assumption was that those calls should not happen while setupNetworks is running, and that the calls themselves or their responses could be corrupted.<br></div><div>What do you think?</div><div><br>[1] <a href="https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/network/host/HostSetupNetworksCommand.java#L285">https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/network/host/HostSetupNetworksCommand.java#L285</a> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Is it related to your<br>
<a href="https://gerrit.ovirt.org/#/q/Idaa54767bb7e54bf13e89887ca34fa8e01ade420" rel="noreferrer" target="_blank">https://gerrit.ovirt.org/#/q/<wbr>Idaa54767bb7e54bf13e89887ca34f<wbr>a8e01ade420</a><br>
(jsonrpc incomplete message)?<br>
<div><div class="gmail-h5"><br></div></div></blockquote><div><br></div><div>Seems not to be related.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5">
<br>
On Fri, Mar 3, 2017 at 4:10 PM, Piotr Kliczewski <<a href="mailto:pkliczew@redhat.com">pkliczew@redhat.com</a>> wrote:<br>
> We have seen this one already. The cause is that during network setup the engine<br>
> sends messages which may fail or arrive only partially.<br>
> We used to have a host-level lock to protect against this kind of situation, but it<br>
> seems like we do not have it anymore.<br>
><br>
> The previous failure was triggered by host monitoring; this time it was SpmStatus.<br>
><br>
> On Fri, Mar 3, 2017 at 2:54 PM, Pavel Zhukov <<a href="mailto:pzhukov@redhat.com">pzhukov@redhat.com</a>> wrote:<br>
>><br>
>><br>
>> Hi,<br>
>><br>
>> Migration failed because the host is in Connecting state.<br>
>> Seems like another jsonrpc-related issue (Unrecognized message received).<br>
>><br>
>> Job:<br>
>> <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5654/" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/5654/</a><br>
>> Logs:<br>
>> <a href="http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5654/artifact/exported-artifacts/*zip*/exported-artifacts.zip" rel="noreferrer" target="_blank">http://jenkins.ovirt.org/job/<wbr>test-repo_ovirt_experimental_<wbr>master/5654/artifact/exported-<wbr>artifacts/*zip*/exported-<wbr>artifacts.zip</a><br>
>><br>
>> --<br>
>> Pavel<br>
>><br>
>> [LOGS SNIPPET]<br>
>> 2017-03-03 06:00:40,882-05 DEBUG<br>
>> [org.ovirt.vdsm.jsonrpc.<wbr>client.reactors.Reactor] (SSL Stomp Reactor)<br>
>> [3155acab] Unable to process messages Unrecognized message received :<br>
>> org.ovirt.vdsm.jsonrpc.client.<wbr>ClientConnectionException: Unrecognized<br>
>> message received<br>
>> 42965:2017-03-03 06:00:40,889-05 ERROR<br>
>> [org.ovirt.engine.core.dal.<wbr>dbbroker.auditloghandling.<wbr>AuditLogDirector]<br>
>> (DefaultQuartzScheduler4) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,<wbr>802),<br>
>> Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM<br>
>> lago-basic-suite-master-host1 command SpmStatusVDS failed: Unrecognized<br>
>> message received<br>
>> 42966:2017-03-03 06:00:40,889-05 ERROR<br>
>> [org.ovirt.engine.core.<wbr>vdsbroker.vdsbroker.<wbr>SpmStatusVDSCommand]<br>
>> (DefaultQuartzScheduler4) [] Command 'SpmStatusVDSCommand(HostName =<br>
>> lago-basic-suite-master-host1,<br>
>> SpmStatusVDSCommandParameters:<wbr>{runAsync='true',<br>
>> hostId='bba0ec26-4856-4389-<wbr>982d-2ad68cb3f682',<br>
>> storagePoolId='8cc227da-70e7-<wbr>4557-aa01-6d8ddee6f847'})' execution failed:<br>
>> VDSGenericException: VDSNetworkException: Unrecognized message received<br>
>> 42999:2017-03-03 06:00:40,925-05 WARN<br>
>> [org.ovirt.engine.core.dal.<wbr>dbbroker.auditloghandling.<wbr>AuditLogDirector]<br>
>> (DefaultQuartzScheduler4) [6c7fb340] EVENT_ID:<br>
>> SYSTEM_CHANGE_STORAGE_POOL_<wbr>STATUS_PROBLEMATIC_WITH_ERROR(<wbr>987), Correlation<br>
>> ID: 6c7fb340, Call Stack: null, Custom Event ID: -1, Message: Invalid status<br>
>> on Data Center test-dc. Setting Data Center status to Non Responsive (On<br>
>> host lago-basic-suite-master-host1, Error: Network error during<br>
>> communication with the Host.).<br>
>> 43457:2017-03-03 06:00:44,466-05 ERROR<br>
>> [org.ovirt.engine.core.bll.<wbr>network.host.HostValidator] (default task-17)<br>
>> [14be0e43-97b8-4882-bbcc-<wbr>27392543fae6] Unable to setup network: operation<br>
>> can only be done when Host status is one of: Maintenance, Up,<br>
>> NonOperational; current status is Connecting<br>
>> 43460:2017-03-03 06:00:44,476-05 ERROR<br>
>> [org.ovirt.engine.api.restapi.<wbr>resource.<wbr>AbstractBackendResource] (default<br>
>> task-17) [] Operation Failed: [Cannot setup Networks. Operation can be<br>
>> performed only when Host status is Maintenance, Up, NonOperational.]<br>
><br>
><br>
><br>
</div></div>> ______________________________<wbr>_________________<br>
> Devel mailing list<br>
> <a href="mailto:Devel@ovirt.org">Devel@ovirt.org</a><br>
> <a href="http://lists.ovirt.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.ovirt.org/<wbr>mailman/listinfo/devel</a><br>
</blockquote></div><br></div></div>
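<div>The host-level lock discussed in this thread could look roughly like the Python sketch below. It is only an illustration: <code>HostLocks</code>, <code>run_exclusive</code> and <code>try_run</code> are hypothetical names, not oVirt engine or VDSM APIs, and the sketch assumes that setupNetworks holds a per-host lock while monitoring calls are skipped (rather than queued) as long as the lock is held.</div>

```python
import threading


class HostLocks:
    """Hypothetical registry holding one lock per host id (not oVirt code)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the dictionary below
        self._locks = {}

    def _lock_for(self, host_id):
        # Create the per-host lock lazily, exactly once per host id.
        with self._guard:
            return self._locks.setdefault(host_id, threading.Lock())

    def run_exclusive(self, host_id, action):
        """Run action while holding the host lock; waits if it is taken."""
        with self._lock_for(host_id):
            return action()

    def try_run(self, host_id, action):
        """Run action only if the host lock is free.

        Returns True if the action ran, False if it was skipped.
        """
        lock = self._lock_for(host_id)
        if not lock.acquire(blocking=False):
            return False
        try:
            action()
            return True
        finally:
            lock.release()


# Illustrative use: a poll that arrives during setupNetworks is skipped.
locks = HostLocks()
sent = []


def setup_networks():
    sent.append("Host.setupNetworks")
    # Simulates a monitoring poll arriving while the setup is in progress.
    return locks.try_run(
        "host1", lambda: sent.append("StoragePool.getSpmStatus"))


poll_ran_during_setup = locks.run_exclusive("host1", setup_networks)
poll_ran_afterwards = locks.try_run(
    "host1", lambda: sent.append("Host.getAllVmStats"))
```

<div>Skipping instead of blocking is a design choice made for this sketch: periodic calls such as getAllVmStats or getSpmStatus can simply be retried on the next monitoring cycle, so nothing is sent to the host while its network is being reconfigured.</div>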