Node losing management network address?

Hi! I had a weird occurrence in my two-node oVirt cluster today (I have HE). I noticed that one node had the ovirtmgmt network unsynchronized. I tried to resynchronize it. This led to the node being rebooted by HP iLO. After the reboot the node came up with a DHCP address. I tried to change it back by fixing ifcfg-ovirtmgmt to the original static address. It reverted back to DHCP whenever I tried to resync the network. I decided to remove HE from the node so that I could remove the node and add it back. After I started the HE removal, the address popped back to the static address. I did an upgrade which was pending on the node, and after the reboot it came back with a DHCP address again. After this I removed the node from the cluster, added it back, and now it seems to work. I'm just wondering how I can prevent this from happening again? How does this unsync situation happen in the first place, and why does it decide that DHCP is the way to go? -Juhani

On Mon, 25 Feb 2019 13:46:59 +0200 Juhani Rautiainen <juhani.rautiainen@gmail.com> wrote:
Hi!
I had a weird occurrence in my two-node oVirt cluster today (I have HE). I noticed that one node had the ovirtmgmt network unsynchronized. I tried to resynchronize it. This led to the node being rebooted by HP iLO. After the reboot the node came up with a DHCP address. I tried to change it back by fixing ifcfg-ovirtmgmt to the original static address.
How did you fix it? Via ovirt-engine's web UI, the REST API, by modifying a config file on the host, or via Cockpit?
It reverted back to DHCP whenever I tried to resync the network. I decided to remove HE from the node so that I could remove the node and add it back. After I started the HE removal, the address popped back to the static address. I did an upgrade which was pending on the node, and after the reboot it came back with a DHCP address again. After this I removed the node from the cluster, added it back, and now it seems to work. I'm just wondering how I can prevent this from happening again?
If you could share the vdsm.log files containing the relevant flow, this would help in understanding what happened.
How does this unsync situation happen in the first place, and why does it decide that DHCP is the way to go?
-Juhani

On Tue, Feb 26, 2019 at 12:05 PM Dominik Holler <dholler@redhat.com> wrote:
On Mon, 25 Feb 2019 13:46:59 +0200 Juhani Rautiainen <juhani.rautiainen@gmail.com> wrote:
Hi!
I had a weird occurrence in my two-node oVirt cluster today (I have HE). I noticed that one node had the ovirtmgmt network unsynchronized. I tried to resynchronize it. This led to the node being rebooted by HP iLO. After the reboot the node came up with a DHCP address. I tried to change it back by fixing ifcfg-ovirtmgmt to the original static address.
How did you fix it? Via ovirt-engine's web UI, the REST API, by modifying a config file on the host, or via Cockpit?
I just copied the ifcfg-ovirtmgmt file from the second node and fixed the IP address to the correct one before doing ifdown/ifup. That file had changed to DHCP, so my instinct was to correct that one.
If you could share the vdsm.log files containing the relevant flow, this would help in understanding what happened.
Can I upload these somewhere? I can find the vdsm logs from the failure time frame. From the engine logs I can see that EVENT_ID: VDS_NETWORKS_OUT_OF_SYNC(1,110) started weeks earlier (February 6th). The problem only really flared up when I noticed it and tried to resync. There are no vdsm logs old enough to see what happened back then. This event continues daily, so is there anything in the vdsm logs connected to that event that I could dig for? I just noticed that this is pretty much the date I upgraded the cluster from 4.2.8 to 4.3. Thanks, Juhani

On Wed, 27 Feb 2019 08:52:24 +0200 Juhani Rautiainen <juhani.rautiainen@gmail.com> wrote:
On Tue, Feb 26, 2019 at 12:05 PM Dominik Holler <dholler@redhat.com> wrote:
On Mon, 25 Feb 2019 13:46:59 +0200 Juhani Rautiainen <juhani.rautiainen@gmail.com> wrote:
Hi!
I had a weird occurrence in my two-node oVirt cluster today (I have HE). I noticed that one node had the ovirtmgmt network unsynchronized. I tried to resynchronize it. This led to the node being rebooted by HP iLO. After the reboot the node came up with a DHCP address. I tried to change it back by fixing ifcfg-ovirtmgmt to the original static address.
How did you fix it? Via ovirt-engine's web UI, the REST API, by modifying a config file on the host, or via Cockpit?
I just copied the ifcfg-ovirtmgmt file from the second node and fixed the IP address to the correct one before doing ifdown/ifup. That file had changed to DHCP, so my instinct was to correct that one.
Please let oVirt do the work for you. If your interface to oVirt is the web UI, please use the dialog "Edit Management Network: ovirtmgmt", which opens by clicking on the pencil symbol next to ovirtmgmt in Compute > Hosts > xxx > Network Interfaces > Setup Host Networks. This will enable oVirt to recognize this change as intended.
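(If you prefer scripting over the UI, roughly the same change can be pushed through the REST API with the ovirtsdk4 Python SDK. The sketch below is untested; the URL, credentials, host name and addresses are placeholders, and setup_networks followed by commit_net_config is the flow the SDK examples use.)

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholders: engine URL, credentials, host name and the static address.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='***',
    ca_file='ca.pem',
)
hosts_service = connection.system_service().hosts_service()
host = hosts_service.list(search='name=ovirt01')[0]
host_service = hosts_service.host_service(host.id)
# Re-attach ovirtmgmt with a static IPv4 assignment so Engine records the
# change as intended instead of seeing it as out of sync later.
host_service.setup_networks(
    modified_network_attachments=[
        types.NetworkAttachment(
            network=types.Network(name='ovirtmgmt'),
            ip_address_assignments=[
                types.IpAddressAssignment(
                    assignment_method=types.BootProtocol.STATIC,
                    ip=types.Ip(
                        address='10.168.8.111',
                        netmask='255.255.255.0',
                        gateway='10.168.8.1',
                        version=types.IpVersion.V4,
                    ),
                ),
            ],
        ),
    ],
    check_connectivity=True,
)
# Persist the applied configuration on the host.
host_service.commit_net_config()
connection.close()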
If you could share the vdsm.log files containing the relevant flow, this would help in understanding what happened.
Can I upload these somewhere? I can find the vdsm logs from the failure time frame. From the engine logs I can see that EVENT_ID: VDS_NETWORKS_OUT_OF_SYNC(1,110) started weeks earlier (February 6th). The problem only really flared up when I noticed it and tried to resync. There are no vdsm logs old enough to see what happened back then. This event continues daily, so is there anything in the vdsm logs connected to that event that I could dig for? I just noticed that this is pretty much the date I upgraded the cluster from 4.2.8 to 4.3.
Maybe not required anymore, since you described very precisely what you did.
Thanks, Juhani

On Wed, Feb 27, 2019 at 10:49 AM Dominik Holler <dholler@redhat.com> wrote:
I just copied the ifcfg-ovirtmgmt file from the second node and fixed the IP address to the correct one before doing ifdown/ifup. That file had changed to DHCP, so my instinct was to correct that one.
Please let oVirt do the work for you. If your interface to oVirt is the web UI, please use the dialog "Edit Management Network: ovirtmgmt", which opens by clicking on the pencil symbol next to ovirtmgmt in Compute > Hosts > xxx > Network Interfaces > Setup Host Networks. This will enable oVirt to recognize this change as intended.
The problem was that oVirt couldn't do the work anymore. For some reason it had switched that node to using DHCP addresses. DHCP gave the node a totally different address, which was not known by the oVirt engine. This is why I tried the change above: I had lost the connection to the node after the resync. I had to use the HP iLO console to see what was going on, and found out that it had switched to DHCP and had the wrong address. And it used iLO fencing to reboot the server because it couldn't reach it (which took many active VMs down). After the boot it still couldn't connect because the address was still given by DHCP. What I'm wondering is why it switched to DHCP when it had been static from the first minute?
Maybe not required anymore, since you described very precisely what you did.
Or not clearly enough. Thanks, -Juhani

On Wed, 27 Feb 2019 13:24:14 +0200 Juhani Rautiainen <juhani.rautiainen@gmail.com> wrote:
On Wed, Feb 27, 2019 at 10:49 AM Dominik Holler <dholler@redhat.com> wrote:
I just copied the ifcfg-ovirtmgmt file from the second node and fixed the IP address to the correct one before doing ifdown/ifup. That file had changed to DHCP, so my instinct was to correct that one.
Please let oVirt do the work for you. If your interface to oVirt is the web UI, please use the dialog "Edit Management Network: ovirtmgmt", which opens by clicking on the pencil symbol next to ovirtmgmt in Compute > Hosts > xxx > Network Interfaces > Setup Host Networks. This will enable oVirt to recognize this change as intended.
The problem was that oVirt couldn't do the work anymore. For some reason it had switched that node to using DHCP addresses. DHCP gave the node a totally different address, which was not known by the oVirt engine. This is why I tried the change above: I had lost the connection to the node after the resync. I had to use the HP iLO console to see what was going on, and found out that it had switched to DHCP and had the wrong address. And it used iLO fencing to reboot the server because it couldn't reach it (which took many active VMs down). After the boot it still couldn't connect because the address was still given by DHCP.
You did the right thing to establish a temporary connection between host and engine. As soon as the engine is able to communicate with the host, oVirt's web UI (or REST API) should be used to configure the management network address and to ensure that the host is in sync with oVirt's config.
What I'm wondering is why it switched to DHCP when it had been static from the first minute?
This is a valid question.
I noticed that one node had ovirtmgmt network unsynchronized. I tried
oVirt detected a difference between the expected configuration and the applied configuration. This might happen if the interface configuration is changed directly on the host instead of using oVirt Engine.
to resynchronize it.
If you have the vdsm.log, the relevant lines start at the pattern Calling 'Host.setupNetworks' and end at the pattern FINISH getCapabilities.
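(If it helps with extracting that part, here is a trivial Python sketch to cut the slice out of a vdsm.log. The marker strings are the ones mentioned above; the default log path is an assumption.)

import sys

START_MARKER = "Calling 'Host.setupNetworks'"
END_MARKER = "FINISH getCapabilities"

def extract_flow(path='/var/log/vdsm/vdsm.log'):
    """Print everything from the setupNetworks call up to and including
    the next 'FINISH getCapabilities' line."""
    inside = False
    with open(path, errors='replace') as log:
        for line in log:
            if not inside and START_MARKER in line:
                inside = True
            if inside:
                sys.stdout.write(line)
                if END_MARKER in line:
                    break

if __name__ == '__main__':
    extract_flow(sys.argv[1] if len(sys.argv) > 1 else '/var/log/vdsm/vdsm.log')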
This led to the node being rebooted by HP iLO. After the reboot the node came up with a DHCP address.
I have no idea why this happened; maybe there is a trace in /var/log/messages? VDSM would revert a change in the management network configuration if, after the change, communication to the engine is broken. But if the host is rebooted during the change, there might be a problem. If you think you triggered Bug 1664106 - [Network] Host reboot while setup networks leads to host stuck on status 'connecting', your log files would be very helpful if they are attached to the bug. Thanks!
Maybe not required anymore, since you described very precisely what you did.
Or not clearly enough.
Thanks, -Juhani

On Wed, Feb 27, 2019 at 6:26 PM Dominik Holler <dholler@redhat.com> wrote:
This is a valid question.
I noticed that one node had ovirtmgmt network unsynchronized. I tried
oVirt detected a difference between the expected configuration and the applied configuration. This might happen if the interface configuration is changed directly on the host instead of using oVirt Engine.
to resynchronize it.
If you have the vdsm.log, the relevant lines start at the pattern Calling 'Host.setupNetworks' and end at the pattern FINISH getCapabilities.
This gave some clues. See the log below. IMHO it points to the engine getting something wrong, because it seems to ask for a DHCP setup in the query. It probably fails because it succeeds in the address change and the network connection is torn down.

2019-02-25 12:41:38,478+0200 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/1 running <Task <JsonRpcTask {'params': {u'bondings': {}, u'networks': {u'ovirtmgmt': {u'ipv6autoconf': False, u'nic': u'eno1', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'STP': u'no', u'bridged': u'true', u'defaultRoute': True, u'bootproto': u'dhcp'}}, u'options': {u'connectivityCheck': u'true', u'connectivityTimeout': 120, u'commitOnSuccess': True}}, 'jsonrpc': '2.0', 'method': u'Host.setupNetworks', 'id': u'2ca75cf3-6410-43b4-aebf-cdc3f262e5c2'} at 0x7fc9c95ef350> timeout=60, duration=60.01 at 0x7fc9c961e090> task#=230106 at 0x7fc9a43a1fd0>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__ self._callable()
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 262, in __call__ self._handler(self._ctx, self._req)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 305, in _serveRequest response = self._handle_request(req, ctx)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345, in _handle_request res = method(**params)
File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 194, in _dynamicMethod result = fn(*methodArgs)
File: "<string>", line 2, in setupNetworks
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1562, in setupNetworks supervdsm.getProxy().setupNetworks(networks, bondings, options)
File: "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 56, in __call__ return callMethod()
File: "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda> **kwargs)
File: "<string>", line 2, in setupNetworks
File: "/usr/lib64/python2.7/multiprocessing/managers.py", line 759, in _callmethod kind, result = conn.recv() (executor:363)
2019-02-25 12:41:38,532+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-25 12:41:38,538+0200 INFO (jsonrpc/5) [api.host] START getCapabilities() from=::1,41016 (api:48)
2019-02-25 12:41:38,834+0200 INFO (jsonrpc/7) [api.host] START getAllVmStats() from=::1,41018 (api:48)
2019-02-25 12:41:38,845+0200 WARN (jsonrpc/7) [virt.vm] (vmId='66394fff-6207-4277-ab47-bffd39075eef') monitor became unresponsive (command timeout, age=70.0700000003) (vm:6014)
2019-02-25 12:41:38,850+0200 INFO (jsonrpc/7) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,41018 (api:54)
2019-02-25 12:41:38,854+0200 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.02 seconds (__init__:312)
2019-02-25 12:41:38,867+0200 INFO (jsonrpc/2) [api.host] START getAllVmIoTunePolicies() from=::1,41018 (api:48)
2019-02-25 12:41:38,868+0200 INFO (jsonrpc/2) [api.host] FINISH getAllVmIoTunePolicies return={'status': {'message': 'Done', 'code': 0}, 'io_tune_policies_dict': {'5cb297ee-81c0-4fc6-a2ba-87192723e6ab': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2/images/a4b8b705-d999-4578-99eb-c80aa1cbbec6/19e9b0b0-6485-43ed-93e5-c2ef40d2e388', 'name': 'vda'}]}, '16a6d21b-3d69-4967-982b-dfe05534f153': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2/images/67c41187-6510-429e-b88d-93a907bbbf38/b1f627d3-13c2-4800-8f29-4504d35bba5f', 'name': 'sda'}]}, '59103e4b-863c-4c16-8f45-226712a9ab08': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': u'/rhev/data-center/mnt/nfs.virt.local:_iso/3fd616f2-a1dd-4a31-ad35-02d039bb30b1/images/6a5c7e95-da88-4702-9595-2762e4870196/45ef1052-26ba-4c3b-90d8-9039e085419a', 'name': 'sda'}, {'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/48619c98-840f-4d3b-948d-47d081d2c55a/images/5163a9d9-8881-40a3-a993-acaa5ae3218d/b1942e61-9d1f-4b2c-868c-1bde862a6107', 'name': 'sdb'}]}, '66394fff-6207-4277-ab47-bffd39075eef': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': u'/rhev/data-center/mnt/nfs.virt.local:_iso/3fd616f2-a1dd-4a31-ad35-02d039bb30b1/images/2f0acad7-74e1-4775-8fb1-fc47b1cf5ee7/a7b9a10c-34eb-4273-a9dd-daebf1099a39', 'name': 'sda'}]}}} from=::1,41018 (api:54)
2019-02-25 12:41:38,869+0200 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmIoTunePolicies succeeded in 0.01 seconds (__init__:312)
2019-02-25 12:41:40,451+0200 INFO (monitor/5106a44) [IOProcessClient] (3fd616f2-a1dd-4a31-ad35-02d039bb30b1) Closing client (__init__:584)
2019-02-25 12:41:40,452+0200 INFO (monitor/5106a44) [IOProcessClient] (75efef0a-ccd4-4799-a6ed-8a4072edb698) Closing client (__init__:584)
2019-02-25 12:41:40,534+0200 WARN (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/nfs.virt.local:_ovirtexport/75efef0a-ccd4-4799-a6ed-8a4072edb698/dom_md/metadata' is blocked for 60.00 seconds (check:282)
2019-02-25 12:41:40,545+0200 WARN (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/nfs.virt.local:_iso/3fd616f2-a1dd-4a31-ad35-02d039bb30b1/dom_md/metadata' is blocked for 59.99 seconds (check:282)
2019-02-25 12:41:40,885+0200 INFO (jsonrpc/5) [root] managedvolume not supported: Managed Volume Not Supported. Missing package os-brick.: ('Cannot import os_brick.initiator',) (caps:147)
2019-02-25 12:41:41,582+0200 INFO (jsonrpc/5) [root] /usr/libexec/vdsm/hooks/after_get_caps/50_openstacknet: rc=0 err= (hooks:114)
2019-02-25 12:41:41,754+0200 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-25 12:41:41,854+0200 INFO (jsonrpc/5) [root] /usr/libexec/vdsm/hooks/after_get_caps/openstacknet_utils.py: rc=0 err= (hooks:114)
2019-02-25 12:41:41,891+0200 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-25 12:41:41,899+0200 INFO (jsonrpc/4) [vdsm.api] START repoStats(domains=[u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2']) from=::1,41016, task_id=6273a6a4-efbf-4780-9676-26b99502d712 (api:48)
2019-02-25 12:41:41,899+0200 INFO (jsonrpc/4) [vdsm.api] FINISH repoStats return={u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000473384', 'lastCheck': '1.4', 'valid': True}} from=::1,41016, task_id=6273a6a4-efbf-4780-9676-26b99502d712 (api:54)
2019-02-25 12:41:41,900+0200 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call Host.getStorageRepoStats succeeded in 0.01 seconds (__init__:312)
2019-02-25 12:41:41,904+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-25 12:41:41,915+0200 INFO (jsonrpc/7) [vdsm.api] START repoStats(domains=[u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2']) from=::1,41016, task_id=171a65d3-8249-463d-bfe2-646bf1be9472 (api:48)
2019-02-25 12:41:41,916+0200 INFO (jsonrpc/7) [vdsm.api] FINISH repoStats return={u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000473384', 'lastCheck': '1.4', 'valid': True}} from=::1,41016, task_id=171a65d3-8249-463d-bfe2-646bf1be9472 (api:54)
2019-02-25 12:41:41,916+0200 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getStorageRepoStats succeeded in 0.00 seconds (__init__:312)
2019-02-25 12:41:42,049+0200 WARN (qgapoller/0) [virt.periodic.VmDispatcher] could not run <function <lambda> at 0x7fc9e8092aa0> on ['59103e4b-863c-4c16-8f45-226712a9ab08', '66394fff-6207-4277-ab47-bffd39075eef'] (periodic:289)
2019-02-25 12:41:42,200+0200 INFO (jsonrpc/5) [root] /usr/libexec/vdsm/hooks/after_get_caps/ovirt_provider_ovn_hook: rc=0 err= (hooks:114)

This is from engine.log with the relevant call:

2019-02-25 12:40:38,461+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (default task-17) [1f31b2f5] START, HostSetupNetworksVDSCommand(HostName = ovirt01.virt.local, HostSetupNetworksVdsCommandParameters:{hostId='7722c972-c25a-411e-81ba-af6b0801e9ba', vds='Host[ovirt01.virt.local,7722c972-c25a-411e-81ba-af6b0801e9ba]', rollbackOnFailure='true', commitOnSuccess='true', connectivityTimeout='120', networks='[HostNetwork:{defaultRoute='true', bonding='false', networkName='ovirtmgmt', vdsmName='ovirtmgmt', nicName='eno1', vlan='null', vmNetwork='true', stp='false', properties='[]', ipv4BootProtocol='DHCP', ipv4Address='null', ipv4Netmask='null', ipv4Gateway='null', ipv6BootProtocol='NONE', ipv6Address='null', ipv6Prefix='null', ipv6Gateway='null', nameServers='null'}]', removedNetworks='[]', bonds='[]', removedBonds='[]', clusterSwitchType='LEGACY', managementNetworkChanged='true'}), log id: 7be20d99
2019-02-25 12:40:38,463+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (default task-17) [1f31b2f5] FINISH, HostSetupNetworksVDSCommand, return: , log id: 7be20d99
2019-02-25 12:40:51,529+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] No interaction with host 'ovirt01.virt.local' for 20000 ms.
2019-02-25 12:40:54,081+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connection timeout for host 'ovirt01.virt.local', last response arrived 23360 ms ago.
2019-02-25 12:40:54,100+02 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (EE-ManagedThreadFactory-engine-Thread-6299) [] Host 'ovirt01.virt.local' is not responding. It will stay in Connecting state for a grace period of 82 seconds and after that an attempt to fence the host will be issued.

You can see DHCP there too, and the results of it getting things wrong. Maybe there should be an option so that when you try to resync, the engine shows what it wants to change and you can approve it. I also noticed that the clocks are not in sync; I have turned ntpd on now. Is ntpd not installed in HE?

I have engine backups from Feb 7th, and this is the line in the vds_interface table in every backup until the failure (10.168.8.111 is the correct address). So if this is the table the engine uses for node management, the question is how it came to the conclusion that it should use DHCP?

aef5b4fd-5760-44dc-950f-58363de7afcf eno1 ovirtmgmt 7722c972-c25a-411e-81ba-af6b0801e9ba 54:80:28:4a:2b:18 f \N \N \N \N 1000 10.168.8.111 255.255.255.0 10.168.8.1 2 2 2018-12-05 12:51:58.241569+02 2019-02-06 11:54:57.027254+02 1500 t \N f \N 2 \N \N :: \N legacy \N \N t
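(For anyone who wants to check the same thing on a live engine rather than a backup dump: a rough, untested psycopg2 sketch. The column names below are guesses based on the row above and would need verifying against your engine's vds_interface schema; the DB credentials normally come from /etc/ovirt-engine/engine.conf.d/10-setup-database.conf.)

import psycopg2

# Column names are assumptions inferred from the pasted row; verify them
# against the actual vds_interface schema before relying on this.
QUERY = """
    SELECT name, network_name, boot_protocol, addr, subnet, gateway
    FROM vds_interface
    WHERE network_name = 'ovirtmgmt'
"""

conn = psycopg2.connect(dbname='engine', user='engine',
                        password='***',  # see 10-setup-database.conf
                        host='localhost')
with conn, conn.cursor() as cur:
    cur.execute(QUERY)
    for row in cur.fetchall():
        print(row)
conn.close()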
This led to the node being rebooted by HP iLO. After the reboot the node came up with a DHCP address.
I have no idea why this happened; maybe there is a trace in /var/log/messages?
The reboot was because of fencing. See the log above.
VDSM would revert a change in the management network configuration if, after the change, communication to the engine is broken. But if the host is rebooted during the change, there might be a problem. If you think you triggered Bug 1664106 - [Network] Host reboot while setup networks leads to host stuck on status 'connecting', your log files would be very helpful if they are attached to the bug.
I don't think this is connected to that, because the change succeeded. Thanks, -Juhani

On boot, VDSM attempts to apply its persisted desired network configuration through ifcfg. Even though you may have fixed the ifcfg files, the persisted VDSM config may have been set with DHCP, therefore your changes would get overwritten on boot or when the synchronization was issued from Engine. When you get an unsync warning, fixes on the host should match with the persisted config known to VDSM. Usually the persisted state in VDSM and the one in Engine are the same, unless something very strange has happened... like moving hosts between Engine servers or self editing on one side without it reaching the other side.
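(A quick way to peek at that persisted state: on the hosts I have looked at it lives under /var/lib/vdsm/persistence/netconf/nets/, one JSON file per network, but treat that path as an assumption for your VDSM version. A small sketch:)

import json
import os

# Assumed location of VDSM's persisted ("safe") network config; verify on
# your host and VDSM version before relying on it.
PERSISTED_NETS = '/var/lib/vdsm/persistence/netconf/nets'

for net in sorted(os.listdir(PERSISTED_NETS)):
    with open(os.path.join(PERSISTED_NETS, net)) as f:
        cfg = json.load(f)
    # The bootproto / address keys are the interesting bits for DHCP vs. static.
    print(net)
    print(json.dumps(cfg, indent=2, sort_keys=True))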

On Thu, Feb 28, 2019 at 3:01 PM Edward Haas <edwardh@redhat.com> wrote:
On boot, VDSM attempts to apply its persisted desired network configuration through ifcfg. Even though you may have fixed the ifcfg files, the persisted VDSM config may have been set with DHCP, therefore your changes would get overwritten on boot or when the synchronization was issued from Engine.
This is what I thought when I saw it coming up with DHCP.
When you get an unsync warning, fixes on the host should match with the persisted config known to VDSM.
How can you see the persisted config? The webadmin is maybe too user-friendly here and just offers the resync. It doesn't tell you what it is going to do when it resyncs. I would not have started the operation if I had known that it was going to switch to DHCP.
Usually the persisted state in VDSM and the one in Engine are the same, unless something very strange has happened... like moving hosts between Engine servers or self editing on one side without it reaching the other side.
I had to move the hosts to a different cluster (same HE). That was recommended here because of the EPYC migration problems when upgrading to 4.3; the discussion is in the list archives and in bug 1672859. Maybe this was a side effect of that. Thanks, -Juhani

On Thu, Feb 28, 2019 at 6:27 PM Juhani Rautiainen <juhani.rautiainen@gmail.com> wrote:
On Thu, Feb 28, 2019 at 3:01 PM Edward Haas <edwardh@redhat.com> wrote:
On boot, VDSM attempts to apply its persisted desired network configuration through ifcfg. Even though you may have fixed the ifcfg files, the persisted VDSM config may have been set with DHCP, therefore your changes would get overwritten on boot or when the synchronization was issued from Engine.
This is what I thought when I saw it coming up with DHCP.
When you get an unsync warning, fixes on the host should match with the persisted config known to VDSM.
How can you see the persisted config? The webadmin is maybe too user-friendly here and just offers the resync. It doesn't tell you what it is going to do when it resyncs. I would not have started the operation if I had known that it was going to switch to DHCP.
Engine will pass a command to VDSM per the configuration you see in the network attachment window. If there was something else on the host, it will be changed based on this command. Even if the parameters are grayed out, it should still show you the current desired state. You could identify that something is not correct, uncheck the `resync` checkbox, edit the values and then apply.
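(That desired state can also be read out over the REST API before touching anything. A rough ovirtsdk4 sketch, untested and with placeholder URL, credentials and host name, that prints the ovirtmgmt attachment Engine would push:)

import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='***',
    ca_file='ca.pem',
)
system = connection.system_service()
host = system.hosts_service().list(search='name=ovirt01')[0]
host_service = system.hosts_service().host_service(host.id)
# Network attachments carry the desired (Engine-side) IP configuration.
for att in host_service.network_attachments_service().list():
    network = connection.follow_link(att.network)
    if network.name != 'ovirtmgmt':
        continue
    for assignment in att.ip_address_assignments or []:
        print(assignment.assignment_method,
              assignment.ip.address if assignment.ip else None,
              assignment.ip.netmask if assignment.ip else None,
              assignment.ip.gateway if assignment.ip else None)
connection.close()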
Usually the persisted state in VDSM and the one in Engine are the same, unless something very strange has happened... like moving hosts between Engine servers or self editing on one side without it reaching the other side.
I had to move the hosts to a different cluster (same HE). That was recommended here because of the EPYC migration problems when upgrading to 4.3; the discussion is in the list archives and in bug 1672859. Maybe this was a side effect of that.
Thanks, -Juhani
On Thu, Feb 28, 2019 at 9:34 AM Juhani Rautiainen <
On Wed, Feb 27, 2019 at 6:26 PM Dominik Holler <dholler@redhat.com>
wrote:
This is a valid question.
> I noticed that one node had ovirtmgmt network unsynchronized. I
juhani.rautiainen@gmail.com> wrote: tried
oVirt detected a difference between the expected configuration and
applied
configuration. This might happen if the interface configuration is change directly on the host instead of using oVirt Engine.
> to resynchronize it.
If you have the vdsm.log, the relevant lines start at the pattern Calling 'Host.setupNetworks' and ends at the pattern FINISH getCapabilities
This gave some clues. See the log below. IMHO it points engine getting something wrong because it seems to ask for DHCP setup in query. Probably fails because it succeeds in address change and network connection is torn down.
2019-02-25 12:41:38,478+0200 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/1 running <Task <JsonRpcTask {'params': {u'bondings': {}, u'networks': {u'ovirtmgmt': {u'ipv6autoconf': False, u'nic': u'eno1', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'STP': u'no', u'bridged': u'true', u'defaultRoute': True, u'bootproto': u'dhcp'}}, u'options': {u'connectivityCheck': u'true', u'connectivityTimeout': 120, u'commitOnSuccess': True}}, 'jsonrpc': '2.0', 'method': u'Host.setupNetworks', 'id': u'2ca75cf3-6410-43b4-aebf-cdc3f262e5c2'} at 0x7fc9c95ef350> timeout=60, duration=60.01 at 0x7fc9c961e090> task#=230106 at 0x7fc9a43a1fd0>, traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 195, in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 262, in __call__
  self._handler(self._ctx, self._req)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 305, in _serveRequest
  response = self._handle_request(req, ctx)
File: "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345, in _handle_request
  res = method(**params)
File: "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 194, in _dynamicMethod
  result = fn(*methodArgs)
File: "<string>", line 2, in setupNetworks
File: "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1562, in setupNetworks
  supervdsm.getProxy().setupNetworks(networks, bondings, options)
File: "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
  return callMethod()
File: "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
  **kwargs)
File: "<string>", line 2, in setupNetworks
File: "/usr/lib64/python2.7/multiprocessing/managers.py", line 759, in _callmethod
  kind, result = conn.recv() (executor:363)
2019-02-25 12:41:38,532+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-25 12:41:38,538+0200 INFO (jsonrpc/5) [api.host] START getCapabilities() from=::1,41016 (api:48)
2019-02-25 12:41:38,834+0200 INFO (jsonrpc/7) [api.host] START getAllVmStats() from=::1,41018 (api:48)
2019-02-25 12:41:38,845+0200 WARN (jsonrpc/7) [virt.vm] (vmId='66394fff-6207-4277-ab47-bffd39075eef') monitor became unresponsive (command timeout, age=70.0700000003) (vm:6014)
2019-02-25 12:41:38,850+0200 INFO (jsonrpc/7) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,41018 (api:54)
2019-02-25 12:41:38,854+0200 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.02 seconds (__init__:312)
2019-02-25 12:41:38,867+0200 INFO (jsonrpc/2) [api.host] START getAllVmIoTunePolicies() from=::1,41018 (api:48)
2019-02-25 12:41:38,868+0200 INFO (jsonrpc/2) [api.host] FINISH getAllVmIoTunePolicies return={'status': {'message': 'Done', 'code': 0}, 'io_tune_policies_dict': {'5cb297ee-81c0-4fc6-a2ba-87192723e6ab': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2/images/a4b8b705-d999-4578-99eb-c80aa1cbbec6/19e9b0b0-6485-43ed-93e5-c2ef40d2e388', 'name': 'vda'}]}, '16a6d21b-3d69-4967-982b-dfe05534f153': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2/images/67c41187-6510-429e-b88d-93a907bbbf38/b1f627d3-13c2-4800-8f29-4504d35bba5f', 'name': 'sda'}]}, '59103e4b-863c-4c16-8f45-226712a9ab08': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': u'/rhev/data-center/mnt/nfs.virt.local:_iso/3fd616f2-a1dd-4a31-ad35-02d039bb30b1/images/6a5c7e95-da88-4702-9595-2762e4870196/45ef1052-26ba-4c3b-90d8-9039e085419a', 'name': 'sda'}, {'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': '/rhev/data-center/mnt/blockSD/48619c98-840f-4d3b-948d-47d081d2c55a/images/5163a9d9-8881-40a3-a993-acaa5ae3218d/b1942e61-9d1f-4b2c-868c-1bde862a6107', 'name': 'sdb'}]}, '66394fff-6207-4277-ab47-bffd39075eef': {'policy': [], 'current_values': [{'ioTune': {'write_bytes_sec': 0L, 'total_iops_sec': 0L, 'read_iops_sec': 0L, 'read_bytes_sec': 0L, 'write_iops_sec': 0L, 'total_bytes_sec': 0L}, 'path': u'/rhev/data-center/mnt/nfs.virt.local:_iso/3fd616f2-a1dd-4a31-ad35-02d039bb30b1/images/2f0acad7-74e1-4775-8fb1-fc47b1cf5ee7/a7b9a10c-34eb-4273-a9dd-daebf1099a39', 'name': 'sda'}]}}} from=::1,41018 (api:54)
2019-02-25 12:41:38,869+0200 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmIoTunePolicies succeeded in 0.01 seconds (__init__:312)
2019-02-25 12:41:40,451+0200 INFO (monitor/5106a44) [IOProcessClient] (3fd616f2-a1dd-4a31-ad35-02d039bb30b1) Closing client (__init__:584)
2019-02-25 12:41:40,452+0200 INFO (monitor/5106a44) [IOProcessClient] (75efef0a-ccd4-4799-a6ed-8a4072edb698) Closing client (__init__:584)
2019-02-25 12:41:40,534+0200 WARN (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/nfs.virt.local:_ovirtexport/75efef0a-ccd4-4799-a6ed-8a4072edb698/dom_md/metadata' is blocked for 60.00 seconds (check:282)
2019-02-25 12:41:40,545+0200 WARN (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/nfs.virt.local:_iso/3fd616f2-a1dd-4a31-ad35-02d039bb30b1/dom_md/metadata' is blocked for 59.99 seconds (check:282)
2019-02-25 12:41:40,885+0200 INFO (jsonrpc/5) [root] managedvolume not supported: Managed Volume Not Supported. Missing package os-brick.: ('Cannot import os_brick.initiator',) (caps:147)
2019-02-25 12:41:41,582+0200 INFO (jsonrpc/5) [root] /usr/libexec/vdsm/hooks/after_get_caps/50_openstacknet: rc=0 err= (hooks:114)
2019-02-25 12:41:41,754+0200 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-25 12:41:41,854+0200 INFO (jsonrpc/5) [root] /usr/libexec/vdsm/hooks/after_get_caps/openstacknet_utils.py: rc=0 err= (hooks:114)
2019-02-25 12:41:41,891+0200 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-25 12:41:41,899+0200 INFO (jsonrpc/4) [vdsm.api] START repoStats(domains=[u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2']) from=::1,41016, task_id=6273a6a4-efbf-4780-9676-26b99502d712 (api:48)
2019-02-25 12:41:41,899+0200 INFO (jsonrpc/4) [vdsm.api] FINISH repoStats return={u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000473384', 'lastCheck': '1.4', 'valid': True}} from=::1,41016, task_id=6273a6a4-efbf-4780-9676-26b99502d712 (api:54)
2019-02-25 12:41:41,900+0200 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call Host.getStorageRepoStats succeeded in 0.01 seconds (__init__:312)
2019-02-25 12:41:41,904+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:312)
2019-02-25 12:41:41,915+0200 INFO (jsonrpc/7) [vdsm.api] START repoStats(domains=[u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2']) from=::1,41016, task_id=171a65d3-8249-463d-bfe2-646bf1be9472 (api:48)
2019-02-25 12:41:41,916+0200 INFO (jsonrpc/7) [vdsm.api] FINISH repoStats return={u'5106a440-3a0a-4a79-a8ee-0ffc5fa8e3a2': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000473384', 'lastCheck': '1.4', 'valid': True}} from=::1,41016, task_id=171a65d3-8249-463d-bfe2-646bf1be9472 (api:54)
2019-02-25 12:41:41,916+0200 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getStorageRepoStats succeeded in 0.00 seconds (__init__:312)
2019-02-25 12:41:42,049+0200 WARN (qgapoller/0) [virt.periodic.VmDispatcher] could not run <function <lambda> at 0x7fc9e8092aa0> on ['59103e4b-863c-4c16-8f45-226712a9ab08', '66394fff-6207-4277-ab47-bffd39075eef'] (periodic:289)
2019-02-25 12:41:42,200+0200 INFO (jsonrpc/5) [root] /usr/libexec/vdsm/hooks/after_get_caps/ovirt_provider_ovn_hook: rc=0 err= (hooks:114)
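For digging through older vdsm logs for the same pattern, something like the quick script below works for me. It is only a sketch that greps for the request parameters seen in the excerpt above; the log path and the exact message format are assumptions based on this excerpt, not a supported interface.

#!/usr/bin/env python
# Sketch: scan a vdsm.log for Host.setupNetworks requests and flag any that
# ask for bootproto=dhcp on ovirtmgmt. Path and line format are assumptions.
import sys

LOG = sys.argv[1] if len(sys.argv) > 1 else "/var/log/vdsm/vdsm.log"

with open(LOG) as f:
    for line in f:
        if "Host.setupNetworks" not in line:
            continue
        timestamp = " ".join(line.split(" ", 2)[:2])  # date + time fields
        wants_dhcp = ("u'bootproto': u'dhcp'" in line
                      or "'bootproto': 'dhcp'" in line)
        if "ovirtmgmt" in line and wants_dhcp:
            print("%s DHCP requested for ovirtmgmt" % timestamp)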
This is from engine.log with the relevant call:
2019-02-25 12:40:38,461+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (default task-17) [1f31b2f5] START, HostSetupNetworksVDSCommand(HostName = ovirt01.virt.local, HostSetupNetworksVdsCommandParameters:{hostId='7722c972-c25a-411e-81ba-af6b0801e9ba', vds='Host[ovirt01.virt.local,7722c972-c25a-411e-81ba-af6b0801e9ba]', rollbackOnFailure='true', commitOnSuccess='true', connectivityTimeout='120', networks='[HostNetwork:{defaultRoute='true', bonding='false', networkName='ovirtmgmt', vdsmName='ovirtmgmt', nicName='eno1', vlan='null', vmNetwork='true', stp='false', properties='[]', ipv4BootProtocol='DHCP', ipv4Address='null', ipv4Netmask='null', ipv4Gateway='null', ipv6BootProtocol='NONE', ipv6Address='null', ipv6Prefix='null', ipv6Gateway='null', nameServers='null'}]', removedNetworks='[]', bonds='[]', removedBonds='[]', clusterSwitchType='LEGACY', managementNetworkChanged='true'}), log id: 7be20d99
2019-02-25 12:40:38,463+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (default task-17) [1f31b2f5] FINISH, HostSetupNetworksVDSCommand, return: , log id: 7be20d99
2019-02-25 12:40:51,529+02 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] No interaction with host 'ovirt01.virt.local' for 20000 ms.
2019-02-25 12:40:54,081+02 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connection timeout for host 'ovirt01.virt.local', last response arrived 23360 ms ago.
2019-02-25 12:40:54,100+02 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (EE-ManagedThreadFactory-engine-Thread-6299) [] Host 'ovirt01.virt.local' is not responding. It will stay in Connecting state for a grace period of 82 seconds and after that an attempt to fence the host will be issued.
You can see DHCP there too, and the results of the engine getting it wrong. Maybe there should be an option so that when you try to resync, the engine first shows what it wants to change and you can approve the change (see the sketch below for a way to peek at this from the API). I also noticed that the clocks are not in sync; I have enabled ntpd now. Is ntpd not installed in HE by default?
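On the "show me what it wants to change" idea: until something like that exists in the UI, you can at least ask the engine what it intends to push for the host's network attachments before hitting resync. Below is a rough, untested sketch with the Python SDK (ovirtsdk4); the URL, credentials and CA file path are placeholders you would need to adapt.

import ovirtsdk4 as sdk

# Placeholders: adjust URL, credentials and CA file for your engine.
conn = sdk.Connection(
    url='https://engine.virt.local/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='/etc/pki/ovirt-engine/ca.pem',
)
hosts_svc = conn.system_service().hosts_service()
host = hosts_svc.list(search='name=ovirt01.virt.local')[0]
attachments = hosts_svc.host_service(host.id).network_attachments_service().list()
for att in attachments:
    for assignment in (att.ip_address_assignments or []):
        # assignment_method is the boot protocol the engine intends to apply
        # (static or DHCP) when the attachment is synchronized.
        addr = assignment.ip.address if assignment.ip else None
        print('%s %s %s' % (att.network.id, assignment.assignment_method, addr))
conn.close()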
I have engine backups from Feb 7th, and this is the line in the vds_interface table in every backup until the failure (10.168.8.111 is the correct address). So if this is the table the engine uses for node management, the question is how it came to the conclusion that it should use DHCP?
aef5b4fd-5760-44dc-950f-58363de7afcf eno1 ovirtmgmt 7722c972-c25a-411e-81ba-af6b0801e9ba 54:80:28:4a:2b:18 f \N \N \N \N 1000 10.168.8.111 255.255.255.0 10.168.8.1 2 2 2018-12-05 12:51:58.241569+02 2019-02-06 11:54:57.027254+02 1500 t \N f \N 2 \N \N :: \N legacy \N \N t
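If anyone else wants to check the same thing, the boot protocol the engine has stored can be queried from that table directly on the engine host. A minimal sketch follows; the connection details are placeholders and the column names are my reading of the dump above, so treat both as assumptions and verify against your schema.

import psycopg2

# Assumption: a local 'engine' database reachable as the 'engine' user;
# adjust dbname/user/password to match your engine configuration.
conn = psycopg2.connect(dbname='engine', user='engine',
                        host='localhost', password='changeme')
cur = conn.cursor()
cur.execute("""
    SELECT name, network_name, addr, subnet, gateway, boot_protocol
    FROM vds_interface
    WHERE network_name = 'ovirtmgmt'
""")
for row in cur.fetchall():
    # boot_protocol is a small integer enum in the engine schema; compare the
    # value against a host you know is static vs DHCP to decode it.
    print(row)
cur.close()
conn.close()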
> This led the node being rebooted by HP ILO. After
> reboot the node came up with DHCP address.
I have no idea why this happened; maybe there is a trace in /var/log/messages?
Reboot was because of fencing. See the log above.
VDSM would revert a change in the management network configuration if, after the change, communication to the engine is broken. But if the host is rebooted during the change, there might be a problem. If you think you triggered Bug 1664106 - [Network] Host reboot while setup networks leads to host stuck on status 'connecting' - your logfiles would be very helpful if they are attached to the bug.
I don't think this is connected to that, because the change succeeded.
Thanks, -Juhani

I had this issue. I believe that when I tried to fix the network manually so that oVirt could sync the correct config, vdsm was kicking in and overwriting my changes with what it had stored in /var/lib/vdsm/persistence/netconf/ before the sync took place. For whatever reason this was DHCP.
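In case it helps others, this is roughly how you can see what vdsm would restore from that directory before touching anything. It is only a small sketch, assuming the persisted networks are JSON files keyed by network name (that is what they looked like on my host).

import json
import os

NETS_DIR = '/var/lib/vdsm/persistence/netconf/nets'

# Print the boot protocol vdsm has persisted for each network; this is the
# configuration it falls back to, e.g. after a reboot or a failed setup.
for name in sorted(os.listdir(NETS_DIR)):
    with open(os.path.join(NETS_DIR, name)) as f:
        cfg = json.load(f)
    print('%s: bootproto=%s nic=%s' % (name, cfg.get('bootproto'),
                                       cfg.get('nic')))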

Thanks for the tip. I upgraded my cluster and one node (the one which started this thread) again picked up a DHCP address after reboot. I checked /var/lib/vdsm/persistence/netconf/nets/ovirtmgmt and it _still_ had DHCP settings in it, even though I had previously removed the node from the cluster and added it back. It seems that VDSM doesn't compare the settings in the engine DB with those on the node. I don't know if that is because it "knows" they are correct.
-Juhani

On Sun, Mar 3, 2019 at 11:02 AM <jajadating@gmail.com> wrote:
I had this issue, I believe that when I tried to fix the network manually so that ovirt could sync the correct config, vdsm was kicking in and overwriting my changes with what it had stored in /var/lib/vdsm/persistence/netconf/ before the sync took place. For whatever reason this was dhcp.
-- Juhani Rautiainen jrauti@iki.fi