On Sun, Mar 5, 2017 at 8:29 AM, Dan Kenigsberg <danken(a)redhat.com> wrote:
Piotr, could you provide more information?
Which setupNetworks action triggers this problem? Any idea which lock
we used to take, and when we dropped it?
I thought that this [1] would make sure that setupNetworks is an exclusive
operation on a host, which seems not to be the case.
In the logs I saw the following message being sent:
{"jsonrpc":"2.0","method":"Host.setupNetworks","params":{"networks":{"VLAN200_Network":{"vlan":"200","netmask":"255.255.255.0","ipv6autoconf":false,"nic":"eth0","bridged":"false","ipaddr":"192.0.3.1","dhcpv6":false,"mtu":1500,"switch":"legacy"}},"bondings":{},"options":{"connectivityTimeout":120,"connectivityCheck":"true"}},"id":"3f7f74ea-fc39-4815-831b-5e3b1c22131d"}
A few seconds later there was:
{"jsonrpc":"2.0","method":"Host.getAllVmStats","params":{},"id":"67d510eb-6dfc-4f67-97b6-a4e63c670ff2"}
and, while we were still sending pings, there was:
{"jsonrpc":"2.0","method":"StoragePool.getSpmStatus","params":{"storagepoolID":"8cc227da-70e7-4557-aa01-6d8ddee6f847"},"id":"d4d04c7c-47b8-44db-867b-770e1e19361c"}
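To make the ordering concrete, here is a minimal Python sketch (not engine code) that takes the three requests above, in the order they appear in the log, and lists the methods sent after Host.setupNetworks. The helper name calls_after_setup_networks is hypothetical, the setupNetworks params are trimmed for brevity, and the sketch assumes no setupNetworks response arrived in between.

```python
import json

# The three requests from the log, in send order.
# The setupNetworks params are trimmed here; methods and ids are as logged.
SENT = [
    '{"jsonrpc":"2.0","method":"Host.setupNetworks","params":{},'
    '"id":"3f7f74ea-fc39-4815-831b-5e3b1c22131d"}',
    '{"jsonrpc":"2.0","method":"Host.getAllVmStats","params":{},'
    '"id":"67d510eb-6dfc-4f67-97b6-a4e63c670ff2"}',
    '{"jsonrpc":"2.0","method":"StoragePool.getSpmStatus",'
    '"params":{"storagepoolID":"8cc227da-70e7-4557-aa01-6d8ddee6f847"},'
    '"id":"d4d04c7c-47b8-44db-867b-770e1e19361c"}',
]


def calls_after_setup_networks(raw_messages):
    """Return the methods sent after the first Host.setupNetworks request.

    Hypothetical helper: assumes raw_messages is in send order and that
    the setupNetworks response has not been received yet.
    """
    methods = [json.loads(raw)["method"] for raw in raw_messages]
    if "Host.setupNetworks" not in methods:
        return []
    start = methods.index("Host.setupNetworks")
    return methods[start + 1:]


print(calls_after_setup_networks(SENT))
```

With the messages above, this reports Host.getAllVmStats and StoragePool.getSpmStatus as the calls issued while setupNetworks was presumably still in progress.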
My assumption was that those calls should not happen, and that the calls
themselves, or their responses, could be corrupted.
What do you think?
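For illustration only, the kind of host-level exclusivity discussed here could be sketched in Python as below. This is not how the engine implements it: the HostLocks class and its methods are hypothetical names, and the choice to skip (rather than queue) a monitoring call while the lock is held is an assumption made for the example.

```python
import threading
from collections import defaultdict


class HostLocks:
    """Hypothetical per-host lock registry (not the engine's implementation)."""

    def __init__(self):
        self._guard = threading.Lock()              # protects the registry itself
        self._locks = defaultdict(threading.Lock)   # one lock per host id

    def _lock_for(self, host_id):
        with self._guard:
            return self._locks[host_id]

    def run_exclusive(self, host_id, func):
        """Run func while holding the host's lock (e.g. setupNetworks)."""
        with self._lock_for(host_id):
            return func()

    def try_run(self, host_id, func):
        """Run func only if the host is not locked; otherwise skip it.

        Returns None when skipped (e.g. a monitoring call during setup).
        """
        lock = self._lock_for(host_id)
        if not lock.acquire(blocking=False):
            return None
        try:
            return func()
        finally:
            lock.release()


locks = HostLocks()
host = "bba0ec26-4856-4389-982d-2ad68cb3f682"  # hostId taken from the log


def setup_networks():
    # A monitoring call attempted while setupNetworks holds the lock is skipped.
    return locks.try_run(host, lambda: "getSpmStatus sent")


skipped = locks.run_exclusive(host, setup_networks)
sent = locks.try_run(host, lambda: "getSpmStatus sent")
print(skipped, sent)
```

In this sketch the getSpmStatus attempt made during setupNetworks is dropped, and the same call goes through once the lock is released.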
[1]
https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules...
Seems not to be related.
On Fri, Mar 3, 2017 at 4:10 PM, Piotr Kliczewski <pkliczew(a)redhat.com> wrote:
> This one we saw already. The cause is that during network setup the engine
> sends messages which may fail or arrive only partially.
> We used to have a host-level lock to protect against this kind of situation,
> but it seems like we do not have it anymore.
>
> The previous failure was triggered by host monitoring; this time it was SpmStatus.
>
> On Fri, Mar 3, 2017 at 2:54 PM, Pavel Zhukov <pzhukov(a)redhat.com> wrote:
>>
>>
>> Hi,
>>
>> Migration failed because the host is in Connecting state.
>> Seems like another jsonrpc-related issue (Unrecognized message received).
>>
>> Job:
>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5654/
>> Logs:
>> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/5654/artifact/exported-artifacts/*zip*/exported-artifacts.zip
>>
>> --
>> Pavel
>>
>> [LOGS SNIPPET]
>> 2017-03-03 06:00:40,882-05 DEBUG
>> [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor)
>> [3155acab] Unable to process messages Unrecognized message received :
>> org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Unrecognized
>> message received
>> 42965:2017-03-03 06:00:40,889-05 ERROR
>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> (DefaultQuartzScheduler4) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802),
>> Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM
>> lago-basic-suite-master-host1 command SpmStatusVDS failed: Unrecognized
>> message received
>> 42966:2017-03-03 06:00:40,889-05 ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand]
>> (DefaultQuartzScheduler4) [] Command 'SpmStatusVDSCommand(HostName =
>> lago-basic-suite-master-host1,
>> SpmStatusVDSCommandParameters:{runAsync='true',
>> hostId='bba0ec26-4856-4389-982d-2ad68cb3f682',
>> storagePoolId='8cc227da-70e7-4557-aa01-6d8ddee6f847'})' execution failed:
>> VDSGenericException: VDSNetworkException: Unrecognized message received
>> 42999:2017-03-03 06:00:40,925-05 WARN
>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> (DefaultQuartzScheduler4) [6c7fb340] EVENT_ID:
>> SYSTEM_CHANGE_STORAGE_POOL_STATUS_PROBLEMATIC_WITH_ERROR(987), Correlation
>> ID: 6c7fb340, Call Stack: null, Custom Event ID: -1, Message: Invalid status
>> on Data Center test-dc. Setting Data Center status to Non Responsive (On
>> host lago-basic-suite-master-host1, Error: Network error during
>> communication with the Host.).
>> 43457:2017-03-03 06:00:44,466-05 ERROR
>> [org.ovirt.engine.core.bll.network.host.HostValidator] (default task-17)
>> [14be0e43-97b8-4882-bbcc-27392543fae6] Unable to setup network: operation
>> can only be done when Host status is one of: Maintenance, Up,
>> NonOperational; current status is Connecting
>> 43460:2017-03-03 06:00:44,476-05 ERROR
>> [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default
>> task-17) [] Operation Failed: [Cannot setup Networks. Operation can be
>> performed only when Host status is Maintenance, Up, NonOperational.]
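The last two log entries state the rule that rejected the follow-up request: network setup is only allowed when the host status is Maintenance, Up, or NonOperational. A minimal Python sketch of that check, assuming nothing about how HostValidator is actually written (the names ALLOWED_STATUSES and can_setup_networks are hypothetical):

```python
# Statuses listed in the HostValidator error message above.
ALLOWED_STATUSES = {"Maintenance", "Up", "NonOperational"}


def can_setup_networks(host_status):
    """Hypothetical check mirroring the rule quoted in the log message."""
    return host_status in ALLOWED_STATUSES


# The host was in "Connecting" state when the request arrived.
print(can_setup_networks("Connecting"))
```

Since the failed getSpmStatus call had already moved the host to Connecting, the check fails for this request even though the request itself was valid.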