On Wed, Feb 27, 2019 at 6:26 PM Dominik Holler <dholler(a)redhat.com> wrote:
> This is a valid question.
> > > > I noticed that one node had the ovirtmgmt
> > > > network unsynchronized. I tried
> oVirt detected a difference between the expected configuration and the
> applied configuration. This might happen if the interface configuration
> is changed directly on the host instead of through oVirt Engine.
> > > > to resynchronize it.
> If you have the vdsm.log, the relevant lines start at the pattern
> Calling 'Host.setupNetworks'
> and end at the pattern
> FINISH getCapabilities
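For anyone following along at home, cutting that span out of vdsm.log is easy to script. A throwaway sketch (the log path and the two patterns above are the only assumptions; it grabs the first matching span):

import sys

START_PAT = "Calling 'Host.setupNetworks'"
END_PAT = "FINISH getCapabilities"

def extract_span(path="/var/log/vdsm/vdsm.log"):
    # Emit everything from the setupNetworks call down to the
    # "FINISH getCapabilities" line that closes the span.
    inside = False
    with open(path) as log:
        for line in log:
            if START_PAT in line:
                inside = True
            if inside:
                sys.stdout.write(line)
                if END_PAT in line:
                    break

extract_span()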
This gave some clues; see the log below. IMHO it points to the engine
getting something wrong, because it asks for a DHCP setup in the query.
The call probably fails because the address change succeeds and the
network connection is then torn down.
2019-02-25 12:41:38,478+0200 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/1 running <Task <JsonRpcTask {'params': {u'bondings': {}, u'networks': {u'ovirtmgmt': {u'ipv6autoconf': False, u'nic': u'eno1', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'STP': u'no', u'bridged': u'true', u'defaultRoute': True, u'bootproto': u'dhcp'}}, u'options': {u'connectivityCheck': u'true', u'connectivityTimeout': 120, u'commitOnSuccess': True}}, 'jsonrpc': '2.0', 'method': u'Host.setupNetworks', 'id': u'2ca75cf3-6410-43b4-aebf-cdc3f262e5c2'} at 0x7fc9c95ef350 timeout=60, duration=60.01 at 0x7fc9c961e090> task#=230106 at 0x7fc9a43a1fd0>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py",
line 195, in run
ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315,
in _execute_task
task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in
__call__
self._callable()
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
262, in __call__
self._handler(self._ctx, self._req)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
305, in _serveRequest
response = self._handle_request(req, ctx)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
345, in _handle_request
res = method(**params)
File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 194,
in _dynamicMethod
result = fn(*methodArgs)
File: "<string>", line 2, in setupNetworks
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1562, in
setupNetworks
supervdsm.getProxy().setupNetworks(networks, bondings, options)
File: "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py",
line 56, in __call__
return callMethod()
File: "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py",
line 54, in <lambda
**kwargs)
File: "<string>", line 2, in setupNetworks
File: "/usr/lib64/python2.7/multiprocessing/managers.py", line 759, in
_callmethod
kind, result = conn.recv() (executor:363)
2019-02-25 12:41:38,532+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer]
RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-25 12:41:38,538+0200 INFO (jsonrpc/5) [api.host] START
getCapabilities() from=::1,41016 (api:48)
2019-02-25 12:41:38,834+0200 INFO (jsonrpc/7) [api.host] START
getAllVmStats() from=::1,41018 (api:48)
2019-02-25 12:41:38,845+0200 WARN (jsonrpc/7) [virt.vm] (vmId='66394fff-6207-4277-ab47-bffd39075eef') monitor became unresponsive (command timeout, age=70.0700000003) (vm:6014)
2019-02-25 12:41:38,850+0200 INFO (jsonrpc/7) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,41018 (api:54)
2019-02-25 12:41:38,854+0200 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer]
RPC call Host.getAllVmStats succeeded in 0.02 seconds (__init__:312)
2019-02-25 12:41:38,867+0200 INFO (jsonrpc/2) [api.host] START
getAllVmIoTunePolicies() from=::1,41018 (api:48)
2019-02-25 12:41:38,868+0200 INFO (jsonrpc/2) [api.host] FINISH getAllVmIoTunePolicies return={'status': {'message': 'Done', 'code': 0}, 'io_tune_policies_dict': {'5cb297ee-81c0-4fc6-a2ba-87192723e6ab': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2/images/a4b8b705-d999-4578-99eb-c80aa1cbbec6/19e9b0b0-6485-43ed-93e5-c2ef40d2e388', 'name': 'vda'}]},
'16a6d21b-3d69-4967-982b-dfe05534f153': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2/images/67c41187-6510-429e-b88d-93a907bbbf38/b1f627d3-13c2-4800-8f29-4504d35bba5f', 'name': 'sda'}]},
'59103e4b-863c-4c16-8f45-226712a9ab08': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': u'/rhev/data-center/mnt/nfs.virt.local:_iso/3fd616f2-a1dd-4a31-ad35-02d039bb30b1/images/6a5c7e95-da88-4702-9595-2762e4870196/45ef1052-26ba-4c3b-90d8-9039e085419a', 'name': 'sda'}, {'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/48619c98-840f-4d3b-948d-47d081d2c55a/images/5163a9d9-8881-40a3-a993-acaa5ae3218d/b1942e61-9d1f-4b2c-868c-1bde862a6107', 'name': 'sdb'}]},
'66394fff-6207-4277-ab47-bffd39075eef': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': u'/rhev/data-center/mnt/nfs.virt.local:_iso/3fd616f2-a1dd-4a31-ad35-02d039bb30b1/images/2f0acad7-74e1-4775-8fb1-fc47b1cf5ee7/a7b9a10c-34eb-4273-a9dd-daebf1099a39', 'name': 'sda'}]}}} from=::1,41018 (api:54)
2019-02-25 12:41:38,869+0200 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer]
RPC call Host.getAllVmIoTunePolicies succeeded in 0.01 seconds
(__init__:312)
2019-02-25 12:41:40,451+0200 INFO (monitor/5106a44) [IOProcessClient]
(3fd616f2-a1dd-4a31-ad35-02d039bb30b1) Closing client (__init__:584)
2019-02-25 12:41:40,452+0200 INFO (monitor/5106a44) [IOProcessClient]
(75efef0a-ccd4-4799-a6ed-8a4072edb698) Closing client (__init__:584)
2019-02-25 12:41:40,534+0200 WARN (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/nfs.virt.local:_ovirtexport/75efef0a-ccd4-4799-a6ed-8a4072edb698/dom_md/metadata' is blocked for 60.00 seconds (check:282)
2019-02-25 12:41:40,545+0200 WARN (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/nfs.virt.local:_iso/3fd616f2-a1dd-4a31-ad35-02d039bb30b1/dom_md/metadata' is blocked for 59.99 seconds (check:282)
2019-02-25 12:41:40,885+0200 INFO (jsonrpc/5) [root] managedvolume
not supported: Managed Volume Not Supported. Missing package
os-brick.: ('Cannot import os_brick.initiator',) (caps:147)
2019-02-25 12:41:41,582+0200 INFO (jsonrpc/5) [root]
/usr/libexec/vdsm/hooks/after_get_caps/50_openstacknet: rc=0 err=
(hooks:114)
2019-02-25 12:41:41,754+0200 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer]
RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-25 12:41:41,854+0200 INFO (jsonrpc/5) [root]
/usr/libexec/vdsm/hooks/after_get_caps/openstacknet_utils.py: rc=0
err= (hooks:114)
2019-02-25 12:41:41,891+0200 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer]
RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-25 12:41:41,899+0200 INFO (jsonrpc/4) [vdsm.api] START
repoStats(domains=[u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2'])
from=::1,41016, task_id=6273a6a4-efbf-4780-9676-26b99502d712 (api:48)
2019-02-25 12:41:41,899+0200 INFO (jsonrpc/4) [vdsm.api] FINISH repoStats return={u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000473384', 'lastCheck': '1.4', 'valid': True}} from=::1,41016, task_id=6273a6a4-efbf-4780-9676-26b99502d712 (api:54)
2019-02-25 12:41:41,900+0200 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer]
RPC call Host.getStorageRepoStats succeeded in 0.01 seconds
(__init__:312)
2019-02-25 12:41:41,904+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer]
RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-25 12:41:41,915+0200 INFO (jsonrpc/7) [vdsm.api] START
repoStats(domains=[u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2'])
from=::1,41016, task_id=171a65d3-8249-463d-bfe2-646bf1be9472 (api:48)
2019-02-25 12:41:41,916+0200 INFO (jsonrpc/7) [vdsm.api] FINISH repoStats return={u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000473384', 'lastCheck': '1.4', 'valid': True}} from=::1,41016, task_id=171a65d3-8249-463d-bfe2-646bf1be9472 (api:54)
2019-02-25 12:41:41,916+0200 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer]
RPC call Host.getStorageRepoStats succeeded in 0.00 seconds
(__init__:312)
2019-02-25 12:41:42,049+0200 WARN (qgapoller/0)
[virt.periodic.VmDispatcher] could not run <function <lambda> at
0x7fc9e8092aa0> on ['59103e4b-863c-4c16-8f45-226712a9ab08',
'66394fff-6207-4277-ab47-bffd39075eef'] (periodic:289)
2019-02-25 12:41:42,200+0200 INFO (jsonrpc/5) [root]
/usr/libexec/vdsm/hooks/after_get_caps/ovirt_provider_ovn_hook: rc=0
err= (hooks:114)
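As a side note, the "Worker blocked" entry at the top of that excerpt is painful to read. The embedded request is a plain Python dict literal, so a small sketch like this (not oVirt tooling) can pretty-print it, which makes the u'bootproto': u'dhcp' easy to spot:

import ast
import pprint
import re

def show_blocked_request(line):
    # The payload is the {...} literal between "JsonRpcTask " and " at 0x".
    match = re.search(r"JsonRpcTask (\{.*\}) at 0x", line)
    if match:
        request = ast.literal_eval(match.group(1))
        print(request["method"])                      # Host.setupNetworks
        pprint.pprint(request["params"]["networks"])  # ... u'bootproto': u'dhcp' ...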
Here is the relevant call from engine.log:
2019-02-25 12:40:38,461+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (default task-17) [1f31b2f5] START, HostSetupNetworksVDSCommand(HostName = ovirt01.virt.local, HostSetupNetworksVdsCommandParameters:{hostId='7722c972-c25a-411e-81ba-af6b0801e9ba', vds='Host[ovirt01.virt.local,7722c972-c25a-411e-81ba-af6b0801e9ba]', rollbackOnFailure='true', commitOnSuccess='true', connectivityTimeout='120', networks='[HostNetwork:{defaultRoute='true', bonding='false', networkName='ovirtmgmt', vdsmName='ovirtmgmt', nicName='eno1', vlan='null', vmNetwork='true', stp='false', properties='[]', ipv4BootProtocol='DHCP', ipv4Address='null', ipv4Netmask='null', ipv4Gateway='null', ipv6BootProtocol='NONE', ipv6Address='null', ipv6Prefix='null', ipv6Gateway='null', nameServers='null'}]', removedNetworks='[]', bonds='[]', removedBonds='[]', clusterSwitchType='LEGACY', managementNetworkChanged='true'}), log id: 7be20d99
2019-02-25 12:40:38,463+02 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand]
(default task-17) [1f31b2f5] FINISH, HostSetupNetworksVDSCommand,
return: , log id: 7be20d99
2019-02-25 12:40:51,529+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] No interaction with host 'ovirt01.virt.local' for 20000 ms.
2019-02-25 12:40:54,081+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connection timeout for host 'ovirt01.virt.local', last response arrived 23360 ms ago.
2019-02-25 12:40:54,100+02 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (EE-ManagedThreadFactory-engine-Thread-6299) [] Host 'ovirt01.virt.local' is not responding. It will stay in Connecting state for a grace period of 82 seconds and after that an attempt to fence the host will be issued.
You can see the DHCP request there too, and the results of it getting
things wrong. Maybe there should be an option so that when you try to
resync, the engine first shows what it wants to change and you can
approve it. I also noticed that the clocks are not in sync; I have
enabled ntpd now. Is ntpd not installed in the HE?
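On the clock question, the two logs above can be cross-checked directly. A quick sketch (both stamps are +0200, so the zone can be dropped; note the "Worker blocked" line is logged duration=60.01 s after the call arrived on the host):

from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S,%f"

# Engine sent setupNetworks at 12:40:38,461 (engine.log); vdsm logged
# "Worker blocked ... duration=60.01" at 12:41:38,478, so subtracting
# the duration gives the host-side arrival time.
engine_sent = datetime.strptime("2019-02-25 12:40:38,461", FMT)
worker_blocked = datetime.strptime("2019-02-25 12:41:38,478", FMT)
host_arrival = worker_blocked - timedelta(seconds=60.01)

print(host_arrival - engine_sent)  # approximate clock offset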
I have engine backups going back to Feb 7th, and this is the row in the
vds_interface table in every backup until the failure (10.168.8.111 is
the correct address). So if this is the table the engine uses for node
management, the question is how it came to the conclusion that it
should use DHCP?
aef5b4fd-5760-44dc-950f-58363de7afcf eno1 ovirtmgmt
7722c972-c25a-411e-81ba-af6b0801e9ba 54:80:28:4a:2b:18 f
\N \N \N \N 1000 10.168.8.111 255.255.255.0
10.168.8.1 2 2 2018-12-05 12:51:58.241569+02
2019-02-06 11:54:57.027254+02 1500 t \N f \N
2 \N \N :: \N legacy \N \N t
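For anyone repeating the query: psql's expanded output prints each column name next to its value, which makes rows like the one above much easier to read. A sketch (the database name, user, and the interface-name column are my assumptions, not verified):

import subprocess

def dump_vds_interface(nic="eno1"):
    # Dump the vds_interface row for a given NIC from the engine DB.
    # Assumes the default "engine" database and user, and that the
    # interface-name column is "name" -- check \d vds_interface if not.
    query = "SELECT * FROM vds_interface WHERE name = '%s';" % nic
    subprocess.check_call([
        "psql", "-U", "engine", "-d", "engine",
        "-x",  # expanded output: one "column | value" pair per line
        "-c", query,
    ])

dump_vds_interface()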
> > > > This led to the node being rebooted by HP iLO. After
> > > > the reboot the node came up with a DHCP address.
> I have no idea why this happened; maybe there is a trace
> in /var/log/messages?
The reboot was due to fencing; see the engine.log excerpt above.
> VDSM would revert a change in the management network configuration
> if, after the change, communication to the engine is broken.
> But if the host is rebooted during the change, there might be a
> problem.
> If you think you triggered
> Bug 1664106 - [Network] Host reboot while setup networks leads to host stuck on status 'connecting'
> your logfiles would be very helpful, if they are attached to the bug.
I don't think this is connected to that bug, because the change succeeded.
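For reference, the revert behaviour described above boils down to something like this (a conceptual sketch only, not actual VDSM code; all names are made up):

import time

def setup_networks(apply_config, restore_config, engine_is_connected,
                   connectivity_timeout=120):
    # Connectivity-check rollback in outline: apply the new network
    # config, wait up to connectivityTimeout for the engine to get back
    # in touch, and restore the old config if it never does. A reboot
    # in the middle of this window is exactly where things can go wrong.
    apply_config()                      # e.g. move ovirtmgmt to DHCP
    deadline = time.time() + connectivity_timeout
    while time.time() < deadline:
        if engine_is_connected():       # engine reconnected: keep config
            return True
        time.sleep(1)
    restore_config()                    # no contact in time: roll back
    return False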
Thanks,
-Juhani