On Mon, Mar 6, 2017 at 8:23 AM, Dan Kenigsberg <danken(a)redhat.com> wrote:
On Sun, Mar 5, 2017 at 9:50 PM, Piotr Kliczewski <pkliczew(a)redhat.com> wrote:
>
>
> On Sun, Mar 5, 2017 at 8:29 AM, Dan Kenigsberg <danken(a)redhat.com> wrote:
>>
>> Piotr, could you provide more information?
>>
>> Which setupNetworks action triggers this problem? Any idea which lock
>> we used to take, and when we dropped it?
>
>
> I thought that this [1] would make sure that setupNetworks is an exclusive
> operation on a host, which seems not to be the case.
> In the logs I saw the following message sent:
>
>
> {"jsonrpc":"2.0","method":"Host.setupNetworks","params":{"networks":{"VLAN200_Network":{"vlan":"200","netmask":"255.255.255.0","ipv6autoconf":false,"nic":"eth0","bridged":"false","ipaddr":"192.0.3.1","dhcpv6":false,"mtu":1500,"switch":"legacy"}},"bondings":{},"options":{"connectivityTimeout":120,"connectivityCheck":"true"}},"id":"3f7f74ea-fc39-4815-831b-5e3b1c22131d"}
>
> A few seconds later there was:
>
>
> {"jsonrpc":"2.0","method":"Host.getAllVmStats","params":{},"id":"67d510eb-6dfc-4f67-97b6-a4e63c670ff2"}
>
> and, while we were still calling pings, there was:
>
>
> {"jsonrpc":"2.0","method":"StoragePool.getSpmStatus","params":{"storagepoolID":"8cc227da-70e7-4557-aa01-6d8ddee6f847"},"id":"d4d04c7c-47b8-44db-867b-770e1e19361c"}
>
> My assumption was that those calls should not happen, and that the calls
> themselves or their responses could be corrupted.
> What do you think?
>
> [1]
>
> https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/network/host/HostSetupNetworksCommand.java#L285
I suspect that getAllVmStats and getSpmStatus simply do not take the
hostmonitoring lock, and I don't see anything wrong in that.
Note that during 006_migration, we set up only a migration network,
not the management network. This operation should not interfere with
Engine-Vdsm communication in any way; I don't yet understand why you
suspect that it does.
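
To illustrate the point about locking: a lock only excludes callers that
also take it. The sketch below is a toy model in Python, not engine code.
The lock object and the function names are hypothetical stand-ins, and the
two threading.Event objects exist only to make the interleaving
deterministic.

```python
import threading

def run_demo():
    # Hypothetical stand-in for the lock held during setupNetworks.
    host_monitoring_lock = threading.Lock()
    events = []
    events_guard = threading.Lock()  # protects the events list only

    def record(name):
        with events_guard:
            events.append(name)

    started = threading.Event()
    release = threading.Event()

    def setup_networks():
        # Holds the lock for the whole operation.
        with host_monitoring_lock:
            record("setupNetworks:start")
            started.set()
            release.wait(timeout=5)
            record("setupNetworks:end")

    def get_all_vm_stats():
        # Never acquires host_monitoring_lock, so it is not blocked by it.
        record("getAllVmStats")

    worker = threading.Thread(target=setup_networks)
    worker.start()
    started.wait(timeout=5)   # setupNetworks now holds the lock
    get_all_vm_stats()        # runs anyway, because it skips the lock
    release.set()
    worker.join()
    return events
```

In this model the stats call lands between the start and end of
setupNetworks, which matches the ordering seen in the logs and does not by
itself indicate a locking bug.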
My assumption is based on having seen this failure twice, both times during
setupNetworks.
The pattern is that the failing call is always one that "should not" occur
during such an operation.
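
One way to check this pattern across more runs would be to scan the
captured JSON-RPC traffic for requests sent while a setupNetworks request
is still unanswered. The following is a minimal Python sketch under some
assumptions: the function name is hypothetical, the input is one JSON
message per line in the order it was sent or received, and a response is
matched to its request by the "id" field as in JSON-RPC 2.0. The sample
lines reuse the ids from the messages above, with the setupNetworks params
trimmed.

```python
import json

def methods_during(lines, marker="Host.setupNetworks"):
    """Return methods requested after `marker` and before its response."""
    marker_id = None
    found = []
    for line in lines:
        try:
            msg = json.loads(line)
        except ValueError:
            continue  # skip anything that is not valid JSON
        if not isinstance(msg, dict):
            continue
        method = msg.get("method")
        if marker_id is None:
            if method == marker:
                marker_id = msg.get("id")
            continue
        if method is None:
            # A message without "method" is a response; stop at ours.
            if msg.get("id") == marker_id:
                break
            continue
        found.append(method)
    return found

# Sample input: params of setupNetworks are trimmed for brevity.
SAMPLE = [
    '{"jsonrpc":"2.0","method":"Host.setupNetworks","params":{},'
    '"id":"3f7f74ea-fc39-4815-831b-5e3b1c22131d"}',
    '{"jsonrpc":"2.0","method":"Host.getAllVmStats","params":{},'
    '"id":"67d510eb-6dfc-4f67-97b6-a4e63c670ff2"}',
    '{"jsonrpc":"2.0","method":"StoragePool.getSpmStatus","params":'
    '{"storagepoolID":"8cc227da-70e7-4557-aa01-6d8ddee6f847"},'
    '"id":"d4d04c7c-47b8-44db-867b-770e1e19361c"}',
]

print(methods_during(SAMPLE))
```

This only reports ordering; it says nothing about whether the overlapping
calls or their responses were actually corrupted.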