On Mon, Mar 6, 2017 at 10:11 AM, Piotr Kliczewski <pkliczew(a)redhat.com> wrote:
On Mon, Mar 6, 2017 at 8:23 AM, Dan Kenigsberg <danken(a)redhat.com> wrote:
>
> On Sun, Mar 5, 2017 at 9:50 PM, Piotr Kliczewski <pkliczew(a)redhat.com>
> wrote:
> >
> >
> > On Sun, Mar 5, 2017 at 8:29 AM, Dan Kenigsberg <danken(a)redhat.com>
> > wrote:
> >>
> >> Piotr, could you provide more information?
> >>
> >> Which setupNetworks action triggers this problem? Any idea which lock
> >> we used to take and when we dropped it?
> >
> >
> > I thought that this [1] would make setupNetworks an exclusive
> > operation on a host, but that seems not to be the case.
> > In the logs I saw the following message being sent:
> >
> > {"jsonrpc":"2.0","method":"Host.setupNetworks","params":{"networks":{"VLAN200_Network":{"vlan":"200","netmask":"255.255.255.0","ipv6autoconf":false,"nic":"eth0","bridged":"false","ipaddr":"192.0.3.1","dhcpv6":false,"mtu":1500,"switch":"legacy"}},"bondings":{},"options":{"connectivityTimeout":120,"connectivityCheck":"true"}},"id":"3f7f74ea-fc39-4815-831b-5e3b1c22131d"}
> >
> > A few seconds later there was:
> >
> > {"jsonrpc":"2.0","method":"Host.getAllVmStats","params":{},"id":"67d510eb-6dfc-4f67-97b6-a4e63c670ff2"}
> >
> > and, while we were still sending pings, there was:
> >
> > {"jsonrpc":"2.0","method":"StoragePool.getSpmStatus","params":{"storagepoolID":"8cc227da-70e7-4557-aa01-6d8ddee6f847"},"id":"d4d04c7c-47b8-44db-867b-770e1e19361c"}
> >
> > My assumption was that those calls should not happen, and that either
> > the calls themselves or their responses could be corrupted.
> > What do you think?
> >
> > [1]
> > https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules...
>
> I suspect that getVmStats and getSpmStatus simply do not take the
> hostmonitoring lock, and I don't see anything wrong in that.
>
> Note that during 006_migration, we set up only a migration network,
> not the management network. This operation should not interfere with
> Engine-Vdsm communication in any way; I don't yet understand why you
> suspect that it does.
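Dan's point, that a lock taken around setupNetworks serializes only the operations that also take it, can be illustrated with a small Python sketch. It assumes a single hypothetical per-host lock (`host_lock`) and hypothetical functions standing in for setupNetworks, getAllVmStats and some other call that does acquire the lock; it is not the actual ovirt-engine locking code referenced in [1].

```python
import threading

# Hypothetical stand-in for a per-host lock taken around setupNetworks.
# None of these names are oVirt APIs; they only illustrate the semantics.
host_lock = threading.Lock()


def run():
    events = []
    events_guard = threading.Lock()
    started = threading.Event()
    release = threading.Event()

    def record(msg):
        with events_guard:
            events.append(msg)

    def setup_networks():
        with host_lock:                 # exclusive only against other holders
            record("setupNetworks: start")
            started.set()
            release.wait()              # simulate a slow network change
            record("setupNetworks: end")

    def get_all_vm_stats():
        record("getAllVmStats")         # takes no lock, so it is never blocked

    def other_locked_call():
        with host_lock:                 # waits until setupNetworks releases
            record("otherLockedCall")

    t_setup = threading.Thread(target=setup_networks)
    t_setup.start()
    started.wait()                      # setupNetworks now holds host_lock
    get_all_vm_stats()                  # runs while the lock is still held
    t_other = threading.Thread(target=other_locked_call)
    t_other.start()
    release.set()                       # let setupNetworks finish
    t_setup.join()
    t_other.join()
    return events


if __name__ == "__main__":
    print(run())
```

The unlocked call is recorded in the middle of setupNetworks, while the locked one always lands after it, so seeing getAllVmStats or getSpmStatus during setupNetworks is expected unless those verbs also take the lock.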
My assumption is based on having seen this failure twice, both times during
setupNetworks.
The pattern is that the failing call is always one that "should not" occur
during such an operation.

It is fair to suspect an interaction with setupNetworks, but let us back
that up with evidence.
What is the failure mode of the other command?
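One way to answer the question about the failure mode is to pair each JSON-RPC request in the log with its response by `id`: in JSON-RPC 2.0 a response carries the `id` of the request it answers, plus either `result` or `error`. The sketch below assumes the messages have already been extracted one per line; `classify_calls` is a hypothetical helper, not part of Vdsm or the engine. It separates calls that got an error, calls that got no response at all, and frames that could not be parsed (which would point to corruption).

```python
import json


def classify_calls(lines):
    """Pair JSON-RPC 2.0 requests with responses by "id".

    Returns ({id: (method, outcome)}, number_of_unparseable_lines), where
    outcome is "ok", "error: <message>" or "no response".
    """
    requests = {}      # id -> method name
    responses = {}     # id -> outcome string
    unparseable = 0
    for line in lines:
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            unparseable += 1           # possibly a corrupted frame
            continue
        if not isinstance(msg, dict):
            unparseable += 1
            continue
        msg_id = msg.get("id")
        if "method" in msg:
            if "id" in msg:            # notifications carry no id; skip them
                requests[msg_id] = msg["method"]
        elif "error" in msg:
            err = msg["error"]
            text = err.get("message", "") if isinstance(err, dict) else str(err)
            responses[msg_id] = "error: " + text
        elif "result" in msg:
            responses[msg_id] = "ok"
    report = {
        req_id: (method, responses.get(req_id, "no response"))
        for req_id, method in requests.items()
    }
    return report, unparseable
```

Example with the request ids shortened and hypothetical responses:

```python
log = [
    '{"jsonrpc":"2.0","method":"Host.getAllVmStats","params":{},"id":"67d510eb"}',
    '{"jsonrpc":"2.0","method":"StoragePool.getSpmStatus","params":{},"id":"d4d04c7c"}',
    '{"jsonrpc":"2.0","result":[],"id":"67d510eb"}',
    '{"jsonrpc":"2.0","resu',
]
report, bad = classify_calls(log)
```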