On Sun, Mar 5, 2017 at 8:29 AM, Dan Kenigsberg <danken@redhat.com> wrote:
Piotr, could you provide more information?

Which setupNetworks action triggers this problem? Any idea which lock
did we use to take and when did we drop it?

I though that this [1] would make sure that setupNetworks is exclusive operation on a host which seems not to be the case.
In the logs I saw following message sent:

{"jsonrpc":"2.0","method":"Host.setupNetworks","params":{"networks":{"VLAN200_Network":{"vlan":"200","netmask":"255.255.255.0","ipv6autoconf":false,"nic":"eth0","bridged":"false","ipaddr":"192.0.3.1","dhcpv6":false,"mtu":1500,"switch":"legacy"}},"bondings":{},"options":{"connectivityTimeout":120,"connectivityCheck":"true"}},"id":"3f7f74ea-fc39-4815-831b-5e3b1c22131d"}

Few seconds later there was:

{"jsonrpc":"2.0","method":"Host.getAllVmStats","params":{},"id":"67d510eb-6dfc-4f67-97b6-a4e63c670ff2"}

and still while we were calling pings there was:

{"jsonrpc":"2.0","method":"StoragePool.getSpmStatus","params":{"storagepoolID":"8cc227da-70e7-4557-aa01-6d8ddee6f847"},"id":"d4d04c7c-47b8-44db-867b-770e1e19361c"}

My assumption was that those calls should not happen and calls them selves could be corrupted or their responses.
What do you think?

[1] https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/network/host/HostSetupNetworksCommand.java#L285

Is it related to your
https://gerrit.ovirt.org/#/q/Idaa54767bb7e54bf13e89887ca34fa8e01ade420
(jsonrpc incomplete message)?


Seems not to be related.
 

On Fri, Mar 3, 2017 at 4:10 PM, Piotr Kliczewski <pkliczew@redhat.com> wrote:
> This one we say already. The cause of it that during network setup engine
> sends messages which may fail or partially arrive.
> We used to have host level lock to protect this kind of situation but it
> seems like we do not have it anymore.
>
> Previous failure was triggered by host monitoring now it was SpmStatus.
>
> On Fri, Mar 3, 2017 at 2:54 PM, Pavel Zhukov <pzhukov@redhat.com> wrote:
>>
>>
>> Hi,
>>
>> Migration is failed because host is in Connecting state.
>> Seems like another jsonrpc related issue (Unrecognized message received).
>>
>> Job:
>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5654/
>> Logs:
>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5654/artifact/exported-artifacts/*zip*/exported-artifacts.zip
>>
>> --
>> Pavel
>>
>> [LOGS SNIPPET]
>> 2017-03-03 06:00:40,882-05 DEBUG
>> [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor)
>> [3155acab] Unable to process messages Unrecognized message received :
>> org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Unrecognized
>> message received
>>   42965:2017-03-03 06:00:40,889-05 ERROR
>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> (DefaultQuartzScheduler4) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802),
>> Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM
>> lago-basic-suite-master-host1 command SpmStatusVDS failed: Unrecognized
>> message received
>>   42966:2017-03-03 06:00:40,889-05 ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand]
>> (DefaultQuartzScheduler4) [] Command 'SpmStatusVDSCommand(HostName =
>> lago-basic-suite-master-host1,
>> SpmStatusVDSCommandParameters:{runAsync='true',
>> hostId='bba0ec26-4856-4389-982d-2ad68cb3f682',
>> storagePoolId='8cc227da-70e7-4557-aa01-6d8ddee6f847'})' execution failed:
>> VDSGenericException: VDSNetworkException: Unrecognized message received
>>   42999:2017-03-03 06:00:40,925-05 WARN
>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> (DefaultQuartzScheduler4) [6c7fb340] EVENT_ID:
>> SYSTEM_CHANGE_STORAGE_POOL_STATUS_PROBLEMATIC_WITH_ERROR(987), Correlation
>> ID: 6c7fb340, Call Stack: null, Custom Event ID: -1, Message: Invalid status
>> on Data Center test-dc. Setting Data Center status to Non Responsive (On
>> host lago-basic-suite-master-host1, Error: Network error during
>> communication with the Host.).
>>   43457:2017-03-03 06:00:44,466-05 ERROR
>> [org.ovirt.engine.core.bll.network.host.HostValidator] (default task-17)
>> [14be0e43-97b8-4882-bbcc-27392543fae6] Unable to setup network: operation
>> can only be done when Host status is one of: Maintenance, Up,
>> NonOperational; current status is Connecting
>>   43460:2017-03-03 06:00:44,476-05 ERROR
>> [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default
>> task-17) [] Operation Failed: [Cannot setup Networks. Operation can be
>> performed only when Host status is  Maintenance, Up, NonOperational.]
>
>
>
> _______________________________________________
> Devel mailing list
> Devel@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel