On Thu, Feb 28, 2019 at 3:01 PM Edward Haas <edwardh@redhat.com> wrote:
>
> On boot, VDSM attempts to apply its persisted network desired configuration through ifcfg.
> Even though you may have fixed the ifcfg files, the persisted VDSM config may have been set to DHCP, so your changes would get overwritten on boot or whenever a synchronization was issued from Engine.
This is what I thought when I saw it coming up with DHCP.
> When you get an unsync warning, fixes on the host should match the persisted config known to VDSM.
How can you see the persisted config? The webadmin is perhaps too
user friendly: it just offers the resync and doesn't tell you what it
is going to do when it resyncs. I would not have started the
operation if I had known that it was going to switch to DHCP.
Engine will pass a command to VDSM per the configuration you see in the network attachment window.
If there was something else on the host, it will be changed based on this command.
Even if the parameters are grayed out, it should still show you the current desired state.
You could identify that something is not correct, uncheck the `resync` checkbox, edit the values and then apply.
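On the host itself, VDSM keeps that persisted desired state as JSON files, one per network, under /var/lib/vdsm/persistence/netconf/nets/ (path as seen on EL7 hosts; treat it as an assumption, it can vary between versions). A minimal sketch of checking whether the persisted ovirtmgmt config still asks for DHCP:

```python
import json
import os

# Path as on EL7/oVirt 4.x hosts -- an assumption, verify on your version.
NETCONF_NETS = "/var/lib/vdsm/persistence/netconf/nets"

def persisted_net_config(net, base=NETCONF_NETS):
    """Load the JSON blob VDSM persisted for one logical network."""
    with open(os.path.join(base, net)) as f:
        return json.load(f)

def uses_dhcp(cfg):
    """True if the persisted desired state requests DHCPv4."""
    return cfg.get("bootproto") == "dhcp"
```

If `uses_dhcp(persisted_net_config("ovirtmgmt"))` is True, a static ifcfg edit made by hand will show up as unsynchronized and be replaced by DHCP on the next resync or boot, which matches what happened here.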
> Usually the persisted state in VDSM and the one in Engine are the same, unless something very strange has happened... like moving hosts between Engine servers or editing one side directly without the change reaching the other.
I had to move hosts to a different cluster (same HE). That was
recommended here because of the EPYC migration problems when upgrading
to 4.3. Discussions are in the list archives and in bug 1672859.
Maybe this was a side effect of that.
Thanks,
-Juhani
>
> On Thu, Feb 28, 2019 at 9:34 AM Juhani Rautiainen <juhani.rautiainen@gmail.com> wrote:
>>
>> On Wed, Feb 27, 2019 at 6:26 PM Dominik Holler <dholler@redhat.com> wrote:
>> >
>>
>> >
>> > This is a valid question.
>> >
>> > > > > I noticed that one node had ovirtmgmt network unsynchronized. I tried
>> >
>> > oVirt detected a difference between the expected configuration and the
>> > applied configuration. This can happen if the interface configuration is
>> > changed directly on the host instead of through oVirt Engine.
>> >
>> > > > > to resynchronize it.
>> >
>> > If you have the vdsm.log, the relevant lines start at the pattern
>> > Calling 'Host.setupNetworks'
>> > and ends at the pattern
>> > FINISH getCapabilities
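Those two markers make the transaction easy to carve out of the log programmatically; a small sketch (on a host the file is /var/log/vdsm/vdsm.log):

```python
START_MARK = "Calling 'Host.setupNetworks'"
END_MARK = "FINISH getCapabilities"

def extract_setup_networks(lines, start=START_MARK, end=END_MARK):
    """Yield log lines from the first start marker through the end marker."""
    inside = False
    for line in lines:
        if not inside and start in line:
            inside = True        # found the start of the transaction
        if inside:
            yield line
            if end in line:      # stop once the end marker is emitted
                return
```

Usage: `with open("/var/log/vdsm/vdsm.log") as f: print("".join(extract_setup_networks(f)))`.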
>>
>> This gave some clues. See the log below. IMHO it points to Engine
>> getting something wrong, because it asks for a DHCP setup in the
>> request. The operation probably fails because the address change
>> succeeds and the network connection is torn down.
>>
>> 2019-02-25 12:41:38,478+0200 WARN (vdsm.Scheduler) [Executor] Worker
>> blocked: <Worker name=jsonrpc/1 running <Task <JsonRpcTask {'params':
>> {u'bondings': {}, u'networks': {u'ovirtmgmt': {u'ipv6autoconf': False,
>> u'nic': u'eno1', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False,
>> u'STP': u'no', u'bridged': u'true', u'defaultRoute': True,
>> u'bootproto': u'dhcp'}}, u'options': {u'connectivityCheck': u'true',
>> u'connectivityTimeout': 120, u'commitOnSuccess': True}}, 'jsonrpc':
>> '2.0', 'method': u'Host.setupNetworks', 'id':
>> u'2ca75cf3-6410-43b4-aebf-cdc3f262e5c2'} at 0x7fc9c95ef350>
>> timeout=60, duration=60.01 at 0x7fc9c961e090> task#=230106 at
>> 0x7fc9a43a1fd0>, traceback:
>> File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
>> self.__bootstrap_inner()
>> File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
>> self.run()
>> File: "/usr/lib64/python2.7/threading.py", line 765, in run
>> self.__target(*self.__args, **self.__kwargs)
>> File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py",
>> line 195, in run
>> ret = func(*args, **kwargs)
>> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
>> self._execute_task()
>> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315,
>> in _execute_task
>> task()
>> File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
>> self._callable()
>> File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
>> 262, in __call__
>> self._handler(self._ctx, self._req)
>> File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
>> 305, in _serveRequest
>> response = self._handle_request(req, ctx)
>> File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
>> 345, in _handle_request
>> res = method(**params)
>> File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 194,
>> in _dynamicMethod
>> result = fn(*methodArgs)
>> File: "<string>", line 2, in setupNetworks
>> File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
>> ret = func(*args, **kwargs)
>> File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1562, in
>> setupNetworks
>> supervdsm.getProxy().setupNetworks(networks, bondings, options)
>> File: "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py",
>> line 56, in __call__
>> return callMethod()
>> File: "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py",
>> line 54, in <lambda>
>> **kwargs)
>> File: "<string>", line 2, in setupNetworks
>> File: "/usr/lib64/python2.7/multiprocessing/managers.py", line 759, in
>> _callmethod
>> kind, result = conn.recv() (executor:363)
>> 2019-02-25 12:41:38,532+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer]
>> RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
>> 2019-02-25 12:41:38,538+0200 INFO (jsonrpc/5) [api.host] START
>> getCapabilities() from=::1,41016 (api:48)
>> 2019-02-25 12:41:38,834+0200 INFO (jsonrpc/7) [api.host] START
>> getAllVmStats() from=::1,41018 (api:48)
>> 2019-02-25 12:41:38,845+0200 WARN (jsonrpc/7) [virt.vm]
>> (vmId='66394fff-6207-4277-ab47-bffd39075eef') monitor became
>> unresponsive (command timeout, age=70.0700000003) (vm:6014)
>> 2019-02-25 12:41:38,850+0200 INFO (jsonrpc/7) [api.host] FINISH
>> getAllVmStats return={'status': {'message': 'Done', 'code': 0},
>> 'statsList': (suppressed)} from=::1,41018 (api:54)
>> 2019-02-25 12:41:38,854+0200 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer]
>> RPC call Host.getAllVmStats succeeded in 0.02 seconds (__init__:312)
>> 2019-02-25 12:41:38,867+0200 INFO (jsonrpc/2) [api.host] START
>> getAllVmIoTunePolicies() from=::1,41018 (api:48)
>> 2019-02-25 12:41:38,868+0200 INFO (jsonrpc/2) [api.host] FINISH
>> getAllVmIoTunePolicies return={'status': {'message': 'Done', 'code':
>> 0}, 'io_tune_policies_dict': {'5cb297ee-81c0-4fc6-a2ba-87192723e6ab':
>> {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L,
>> 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L,
>> 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path':
>> '/rhev/data-center/mnt/blockSD/5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2/images/a4b8b705-d999-4578-99eb-c80aa1cbbec6/19e9b0b0-6485-43ed-93e5-c2ef40d2e388',
>> 'name': 'vda'}]}, '16a6d21b-3d69-4967-982b-dfe05534f153': {'policy':
>> [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L,
>> 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L,
>> 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path':
>> '/rhev/data-center/mnt/blockSD/5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2/images/67c41187-6510-429e-b88d-93a907bbbf38/b1f627d3-13c2-4800-8f29-4504d35bba5f',
>> 'name': 'sda'}]}, '59103e4b-863c-4c16-8f45-226712a9ab08': {'policy':
>> [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L,
>> 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L,
>> 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path':
>> u'/rhev/data-center/mnt/nfs.virt.local:_iso/3fd616f2-a1dd-4a31-ad35-02d039bb30b1/images/6a5c7e95-da88-4702-9595-2762e4870196/45ef1052-26ba-4c3b-90d8-9039e085419a',
>> 'name': 'sda'}, {'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec':
>> 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L,
>> 'total_bytes_sec': 0L}, 'path':
>> '/rhev/data-center/mnt/blockSD/48619c98-840f-4d3b-948d-47d081d2c55a/images/5163a9d9-8881-40a3-a993-acaa5ae3218d/b1942e61-9d1f-4b2c-868c-1bde862a6107',
>> 'name': 'sdb'}]}, '66394fff-6207-4277-ab47-bffd39075eef': {'policy':
>> [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L,
>> 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L,
>> 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path':
>> u'/rhev/data-center/mnt/nfs.virt.local:_iso/3fd616f2-a1dd-4a31-ad35-02d039bb30b1/images/2f0acad7-74e1-4775-8fb1-fc47b1cf5ee7/a7b9a10c-34eb-4273-a9dd-daebf1099a39',
>> 'name': 'sda'}]}}} from=::1,41018 (api:54)
>> 2019-02-25 12:41:38,869+0200 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer]
>> RPC call Host.getAllVmIoTunePolicies succeeded in 0.01 seconds
>> (__init__:312)
>> 2019-02-25 12:41:40,451+0200 INFO (monitor/5106a44) [IOProcessClient]
>> (3fd616f2-a1dd-4a31-ad35-02d039bb30b1) Closing client (__init__:584)
>> 2019-02-25 12:41:40,452+0200 INFO (monitor/5106a44) [IOProcessClient]
>> (75efef0a-ccd4-4799-a6ed-8a4072edb698) Closing client (__init__:584)
>> 2019-02-25 12:41:40,534+0200 WARN (check/loop) [storage.check]
>> Checker u'/rhev/data-center/mnt/nfs.virt.local:_ovirtexport/75efef0a-ccd4-4799-a6ed-8a4072edb698/dom_md/metadata'
>> is blocked for 60.00 seconds (check:282)
>> 2019-02-25 12:41:40,545+0200 WARN (check/loop) [storage.check]
>> Checker u'/rhev/data-center/mnt/nfs.virt.local:_iso/3fd616f2-a1dd-4a31-ad35-02d039bb30b1/dom_md/metadata'
>> is blocked for 59.99 seconds (check:282)
>> 2019-02-25 12:41:40,885+0200 INFO (jsonrpc/5) [root] managedvolume
>> not supported: Managed Volume Not Supported. Missing package
>> os-brick.: ('Cannot import os_brick.initiator',) (caps:147)
>> 2019-02-25 12:41:41,582+0200 INFO (jsonrpc/5) [root]
>> /usr/libexec/vdsm/hooks/after_get_caps/50_openstacknet: rc=0 err=
>> (hooks:114)
>> 2019-02-25 12:41:41,754+0200 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer]
>> RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
>> 2019-02-25 12:41:41,854+0200 INFO (jsonrpc/5) [root]
>> /usr/libexec/vdsm/hooks/after_get_caps/openstacknet_utils.py: rc=0
>> err= (hooks:114)
>> 2019-02-25 12:41:41,891+0200 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer]
>> RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
>> 2019-02-25 12:41:41,899+0200 INFO (jsonrpc/4) [vdsm.api] START
>> repoStats(domains=[u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2'])
>> from=::1,41016, task_id=6273a6a4-efbf-4780-9676-26b99502d712 (api:48)
>> 2019-02-25 12:41:41,899+0200 INFO (jsonrpc/4) [vdsm.api] FINISH
>> repoStats return={u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2': {'code': 0,
>> 'actual': True, 'version': 4, 'acquired': True, 'delay':
>> '0.000473384', 'lastCheck': '1.4', 'valid': True}} from=::1,41016,
>> task_id=6273a6a4-efbf-4780-9676-26b99502d712 (api:54)
>> 2019-02-25 12:41:41,900+0200 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer]
>> RPC call Host.getStorageRepoStats succeeded in 0.01 seconds
>> (__init__:312)
>> 2019-02-25 12:41:41,904+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer]
>> RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
>> 2019-02-25 12:41:41,915+0200 INFO (jsonrpc/7) [vdsm.api] START
>> repoStats(domains=[u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2'])
>> from=::1,41016, task_id=171a65d3-8249-463d-bfe2-646bf1be9472 (api:48)
>> 2019-02-25 12:41:41,916+0200 INFO (jsonrpc/7) [vdsm.api] FINISH
>> repoStats return={u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2': {'code': 0,
>> 'actual': True, 'version': 4, 'acquired': True, 'delay':
>> '0.000473384', 'lastCheck': '1.4', 'valid': True}} from=::1,41016,
>> task_id=171a65d3-8249-463d-bfe2-646bf1be9472 (api:54)
>> 2019-02-25 12:41:41,916+0200 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer]
>> RPC call Host.getStorageRepoStats succeeded in 0.00 seconds
>> (__init__:312)
>> 2019-02-25 12:41:42,049+0200 WARN (qgapoller/0)
>> [virt.periodic.VmDispatcher] could not run <function <lambda> at
>> 0x7fc9e8092aa0> on ['59103e4b-863c-4c16-8f45-226712a9ab08',
>> '66394fff-6207-4277-ab47-bffd39075eef'] (periodic:289)
>> 2019-02-25 12:41:42,200+0200 INFO (jsonrpc/5) [root]
>> /usr/libexec/vdsm/hooks/after_get_caps/ovirt_provider_ovn_hook: rc=0
>> err= (hooks:114)
>>
>> This is from engine.log with the relevant call:
>>
>> 2019-02-25 12:40:38,461+02 INFO
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand]
>> (default task-17) [1f31b2f5] START,
>> HostSetupNetworksVDSCommand(HostName = ovirt01.virt.local,
>> HostSetupNetworksVdsCommandParameters:{hostId='7722c972-c25a-411e-81ba-af6b0801e9ba',
>> vds='Host[ovirt01.virt.local,7722c972-c25a-411e-81ba-af6b0801e9ba]',
>> rollbackOnFailure='true', commitOnSuccess='true',
>> connectivityTimeout='120',
>> networks='[HostNetwork:{defaultRoute='true', bonding='false',
>> networkName='ovirtmgmt', vdsmName='ovirtmgmt', nicName='eno1',
>> vlan='null', vmNetwork='true', stp='false', properties='[]',
>> ipv4BootProtocol='DHCP', ipv4Address='null', ipv4Netmask='null',
>> ipv4Gateway='null', ipv6BootProtocol='NONE', ipv6Address='null',
>> ipv6Prefix='null', ipv6Gateway='null', nameServers='null'}]',
>> removedNetworks='[]', bonds='[]', removedBonds='[]',
>> clusterSwitchType='LEGACY', managementNetworkChanged='true'}), log id:
>> 7be20d99
>> 2019-02-25 12:40:38,463+02 INFO
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand]
>> (default task-17) [1f31b2f5] FINISH, HostSetupNetworksVDSCommand,
>> return: , log id: 7be20d99
>> 2019-02-25 12:40:51,529+02 INFO
>> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp
>> Reactor) [] No interaction with host 'ovirt01.virt.local' for 20000
>> ms.
>> 2019-02-25 12:40:54,081+02 ERROR
>> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp
>> Reactor) [] Connection timeout for host 'ovirt01.virt.local', last
>> response arrived 23360 ms ago.
>> 2019-02-25 12:40:54,100+02 WARN
>> [org.ovirt.engine.core.vdsbroker.VdsManager]
>> (EE-ManagedThreadFactory-engine-Thread-6299) [] Host
>> 'ovirt01.virt.local' is not responding. It will stay in Connecting
>> state for a grace period of 82 seconds and after that an attempt to
>> fence the host will be issued.
>>
>> You can see DHCP in there too, and the results of Engine getting things
>> wrong. Maybe there should be an option, when you try to resync, for
>> Engine to show what it wants to change so you can approve it first. I
>> also noticed that the clocks are not in sync; I have turned ntpd on now.
>> Is ntpd not installed in HE by default?
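The preview Juhani is asking for would essentially be a diff between the host's current settings and Engine's desired settings, shown before anything is applied. Purely illustrative (the keys and values mirror the setupNetworks parameters and the host's static address from the logs above, not any real oVirt API):

```python
def resync_preview(current, desired):
    """Map each setting a resync would change to its (current, desired) pair."""
    keys = set(current) | set(desired)
    return {k: (current.get(k), desired.get(k))
            for k in sorted(keys) if current.get(k) != desired.get(k)}

# What the host actually had vs. what Engine sent on 2019-02-25:
current = {"bootproto": "none", "ipaddr": "10.168.8.111",
           "netmask": "255.255.255.0", "gateway": "10.168.8.1", "mtu": 1500}
desired = {"bootproto": "dhcp", "mtu": 1500}
```

Here `resync_preview(current, desired)` would flag bootproto flipping from none to dhcp and the static address, netmask and gateway being dropped, which is exactly the approval step the UI currently lacks.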
>>
>> I have engine backups from Feb 7th, and this is the row in the
>> vds_interface table in every backup until the failure (10.168.8.111 is
>> the correct address). So if this is the table Engine uses for host
>> management, the question is how it came to the conclusion that it
>> should use DHCP?
>>
>> aef5b4fd-5760-44dc-950f-58363de7afcf eno1 ovirtmgmt
>> 7722c972-c25a-411e-81ba-af6b0801e9ba 54:80:28:4a:2b:18 f
>> \N \N \N \N 1000 10.168.8.111 255.255.255.0
>> 10.168.8.1 2 2 2018-12-05 12:51:58.241569+02
>> 2019-02-06 11:54:57.027254+02 1500 t \N f \N
>> 2 \N \N :: \N legacy \N \N t
>>
>> > > > > This led to the node being rebooted by HP iLO. After
>> > > > > reboot the node came up with a DHCP address.
>> >
>> > I have no idea why this happened, maybe there is a trace
>> > in /var/log/messages ?
>>
>> Reboot was because of fencing. See the log above.
>>
>> > VDSM would revert a change in management networking configuration,
>> > if after the change communication to engine is broken.
>> > But if the host is rebooted during the change, there might be a
>> > problem.
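The revert behaviour Dominik describes is the classic commit-or-rollback pattern guarded by a connectivity check (the connectivityCheck/connectivityTimeout options visible in the setupNetworks call earlier in this thread). A generic sketch of the idea, not VDSM's actual code:

```python
def apply_with_rollback(apply_fn, revert_fn, engine_reachable, timeout=120):
    """Apply a network change, then revert it unless Engine becomes
    reachable again within the timeout (simplified connectivityCheck)."""
    apply_fn()
    if engine_reachable(timeout):
        return True          # commit: connectivity survived the change
    revert_fn()              # rollback: the change cut us off from Engine
    return False
```

A reboot landing in the window between apply and the check is exactly where this pattern breaks down, hence the bug mentioned below.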
>> > If you think you triggered
>> > Bug 1664106 - [Network] Host reboot while setup networks leads to host stuck on status 'connecting'
>> > your logfiles would be very helpful, if they are attached to the bug.
>>
>> I don't think this is connected to that, because the change succeeded.
>>
>> Thanks,
>> -Juhani
>> _______________________________________________
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-leave@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/LV3DX4WH5PSE7ZXYTOWK46C3B5ZSNTHR/