Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue

Adding a screenshot representing the state: [screenshot]

-----
kind regards/met vriendelijke groeten
Marko Vrgotic Sr. System Engineer @ System Administration
ActiveVideo
o: +31 (35) 6774131
m: +31 (65) 5734174
e: m.vrgotic@activevideo.com w: www.activevideo.com
ActiveVideo Networks BV. Mediacentrum 3745 Joop van den Endeplein 1.1217 WJ Hilversum, The Netherlands. The information contained in this message may be legally privileged and confidential. It is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are on notice that any distribution of this message, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or ActiveVideo Networks, LLC by telephone at +1 408.931.9200 and delete or destroy any copy of this message.

From: Marko Vrgotic <M.Vrgotic@activevideo.com>
Date: Thursday, 15 April 2021 at 23:00
To: users@ovirt.org <users@ovirt.org>
Cc: Yedidyah Bar David <didi@redhat.com>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue

Looking further into the storage part, checking the host which I am unable to re-add to the HE host pool:

    [root@ovirt-sj-02 10.210.13.64:_hosted__engine]# ls -la
    total 8
    drwxr-xr-x. 3 nobody nobody 4096 Apr 15 20:47 .
    drwxr-xr-x. 3 vdsm   kvm      42 Apr 15 14:30 ..
    drwxr-xr-x. 6 nobody nobody 4096 Aug 20 2019 054c43fc-1924-4106-9f80-0f2ac62b9886
    -rwxr-xr-x. 1 nobody nobody    0 Feb 18 2020 __DIRECT_IO_TEST__
    [root@ovirt-sj-02 10.210.13.64:_hosted__engine]# cd 054c43fc-1924-4106-9f80-0f2ac62b9886/
    [root@ovirt-sj-02 054c43fc-1924-4106-9f80-0f2ac62b9886]# ls
    dom_md ha_agent images master
    [root@ovirt-sj-02 054c43fc-1924-4106-9f80-0f2ac62b9886]# cd ha_agent/
    [root@ovirt-sj-02 ha_agent]# ls
    hosted-engine.lockspace hosted-engine.metadata
    [root@ovirt-sj-02 ha_agent]# cat hosted-engine.lockspace
    cat: hosted-engine.lockspace: No such file or directory
    [root@ovirt-sj-02 ha_agent]# ls -la
    total 16
    drwxr-xr-x. 2 nobody nobody 4096 Mar 31 10:30 .
    drwxr-xr-x. 6 nobody nobody 4096 Aug 20 2019 ..
    lrwxrwxrwx. 1 nobody nobody 132 Mar 31 10:30 hosted-engine.lockspace -> /var/run/vdsm/storage/054c43fc-1924-4106-9f80-0f2ac62b9886/e08188be-f733-4d5c-9222-a4b4e2228955/081f81c5-b2b2-46d5-9f82-9d9041ccc108
    lrwxrwxrwx. 1 nobody nobody 132 Mar 31 10:30 hosted-engine.metadata -> /var/run/vdsm/storage/054c43fc-1924-4106-9f80-0f2ac62b9886/16b3e5ac-e70b-46e3-bf81-322954fe0b44/b6326e48-a7d2-4cba-af91-441db9f353c2
    [root@ovirt-sj-02 ha_agent]# cat ^C
    [root@ovirt-sj-02 ha_agent]# cat /var/run/vdsm/storage/054c43fc-1924-4106-9f80-0f2ac62b9886/e08188be-f733-4d5c-9222-a4b4e2228955/081f81c5-b2b2-46d5-9f82-9d9041ccc108
    cat: /var/run/vdsm/storage/054c43fc-1924-4106-9f80-0f2ac62b9886/e08188be-f733-4d5c-9222-a4b4e2228955/081f81c5-b2b2-46d5-9f82-9d9041ccc108: No such file or directory

It looks like the lockspace and metadata links are still there, but they point to a location that no longer exists; the links are shown in red.

The broker.log is showing the following:

    MainThread WARNING storage_broker ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker Can't connect vdsm storage: 'metadata_image_UUID can't be 'None'

I am starting to think that I need to run the lockspace reinitialization:

1. On each HE host:
       systemctl stop ovirt-ha-agent ovirt-ha-broker
       sanlock client shutdown -f 1   # carefully, it could trigger the watchdog and reboot
2. On a single host:
       hosted-engine --reinitialize-lockspace
3. On each HE host:
       systemctl start ovirt-ha-agent ovirt-ha-broker

Does action 2 need to be executed only on the host with the issue, or will the action itself reinitialize the lockspace for all HE hosts?

From: Marko Vrgotic <M.Vrgotic@activevideo.com>
Date: Thursday, 15 April 2021 at 16:57
To: Yedidyah Bar David <didi@redhat.com>
Cc: users@ovirt.org <users@ovirt.org>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue

Hi Didi,

I compared the hosted-engine.conf on all three machines and indeed, hosts 1 and 3 have identical ones, except for the host ID. The hosted-engine.conf on host 2, which I am trying to add back, contains only the host ID and the CA path:

    ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
    host_id=2

Can someone help me check whether there is DB or storage corruption? Would it be destructive or risky to try to populate the hosted-engine.conf of host 2 with the missing values? Any advice?

From: Marko Vrgotic <M.Vrgotic@activevideo.com>
Date: Wednesday, 14 April 2021 at 16:16
To: Yedidyah Bar David <didi@redhat.com>
Cc: users@ovirt.org <users@ovirt.org>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue

Hi Didi,

It looks like the issue was that the hosted-engine undeploy was incomplete: the other HE hosts still had the entries of the host I was trying to remove, so any following HE deploy on that host was failing.

I was able to get the other hosts to forget about this one by running:

    hosted-engine --clean-metadata --host-id=2

Now I would like to try to add the host back to the HE pool, but I have a question: "Is there a time I should wait between cleaning the metadata and re-adding the host?"

Kindly awaiting your reply.

From: Yedidyah Bar David <didi@redhat.com>
Date: Thursday, 18 March 2021 at 15:09
To: Marko Vrgotic <M.Vrgotic@activevideo.com>
Cc: users@ovirt.org <users@ovirt.org>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue

***CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender!!!***

Hi,

On Mon, Mar 8, 2021 at 4:55 PM Marko Vrgotic <M.Vrgotic@activevideo.com> wrote:
The broker log, these lines are pretty much repeating:
MainThread::WARNING::2021-03-03 09:19:12,086::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: 'metadata_image_UUID can't be 'None'
Please compare the content of /etc/ovirt-hosted-engine/hosted-engine.conf between all your hosts. The host id should be unique per host, but otherwise the files should be identical. If they are not, most likely there is some corruption somewhere, in the engine db or in the shared storage.

You might want to skim this for a general, rather low-level overview: https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ovirt....

Do you see no errors on your other hosts? In the -ha logs?

Please also note that 4.3 is EOL. The deploy process was completely rewritten in 4.4 (in ansible, previously it was python), although it should in principle behave similarly, so if your data is corrupted, an upgrade to 4.4 probably won't fix it.

Good luck and best regards,
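The comparison suggested above could be sketched roughly as follows. This is only an illustration in Python, assuming the files are plain key=value text and that each host's copy has already been fetched somehow; `parse_conf`, `diff_confs`, and `example_key` are hypothetical names made up for this sketch and are not part of any oVirt tool.

```python
def parse_conf(text):
    """Parse key=value lines into a dict, skipping blanks and comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def diff_confs(confs, ignore=("host_id",)):
    """Return {key: {host: value or None}} for keys that differ between hosts.

    Keys listed in `ignore` are expected to differ and are left out.
    A value of None means the key is missing on that host.
    """
    keys = set()
    for conf in confs.values():
        keys.update(conf)
    diffs = {}
    for key in sorted(keys - set(ignore)):
        values = {host: conf.get(key) for host, conf in confs.items()}
        if len(set(values.values())) > 1:
            diffs[key] = values
    return diffs


# Illustrative contents only. "example_key" is a placeholder, not a real
# hosted-engine.conf setting; a real file has many more keys.
host1_text = (
    "ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem\n"
    "host_id=1\n"
    "example_key=example-value\n"
)
host2_text = (
    "ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem\n"
    "host_id=2\n"
)

confs = {"host1": parse_conf(host1_text), "host2": parse_conf(host2_text)}
for key, values in diff_confs(confs).items():
    print(key, values)
```

Any key reported with None for one host is missing from that host's file, which is the kind of mismatch the advice above is meant to surface.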
MainThread::INFO::2021-03-03 09:19:12,829::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2021-03-03 09:19:12,829::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2021-03-03 09:19:12,829::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2021-03-03 09:19:12,832::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2021-03-03 09:19:12,832::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2021-03-03 09:19:12,832::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2021-03-03 09:19:12,833::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2021-03-03 09:19:12,833::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2021-03-03 09:19:12,833::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2021-03-03 09:19:12,833::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2021-03-03 09:19:12,834::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2021-03-03 09:19:12,835::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2021-03-03 09:19:12,835::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2021-03-03 09:19:12,835::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2021-03-03 09:19:12,835::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2021-03-03 09:19:12,836::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2021-03-03 09:19:12,836::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::WARNING::2021-03-03 09:19:12,836::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: 'metadata_image_UUID can't be 'None'
MainThread::INFO::2021-03-03 09:19:13,574::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2021-03-03 09:19:13,575::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2021-03-03 09:19:13,575::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2021-03-03 09:19:13,577::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2021-03-03 09:19:13,578::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
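Since the excerpt above mostly repeats the same start-up sequence, it may help to count only the WARNING and ERROR messages. A minimal Python sketch, assuming every line follows the "::"-separated layout seen in this excerpt (thread, level, timestamp, module, line number, logger, then "(function) message"); `parse_broker_line` and `count_problems` are hypothetical helper names, and the sample lines are copied from the log quoted in this thread.

```python
from collections import Counter


def parse_broker_line(line):
    """Split one '::'-separated log line into named fields.

    Returns None for lines that do not match the layout, for example
    the tail of a line that was wrapped by the mail client.
    """
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7 or not parts[4].isdigit():
        return None
    thread, level, timestamp, module, lineno, logger, rest = parts
    func, _, message = rest.partition(" ")
    return {
        "thread": thread,
        "level": level,
        "timestamp": timestamp,
        "module": module,
        "line": int(lineno),
        "logger": logger,
        "func": func.strip("()"),
        "message": message,
    }


def count_problems(lines, levels=("WARNING", "ERROR")):
    """Count how often each (level, message) pair occurs for the given levels."""
    counts = Counter()
    for line in lines:
        entry = parse_broker_line(line)
        if entry and entry["level"] in levels:
            counts[(entry["level"], entry["message"])] += 1
    return counts


SAMPLE = [
    "MainThread::WARNING::2021-03-03 09:19:12,086::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: 'metadata_image_UUID can't be 'None'",
    "MainThread::INFO::2021-03-03 09:19:12,829::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started",
    "MainThread::WARNING::2021-03-03 09:19:12,836::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: 'metadata_image_UUID can't be 'None'",
]

for (level, message), n in count_problems(SAMPLE).most_common():
    print(n, level, message)
```

Grouping by message rather than by timestamp makes it easier to see that a single warning accounts for the repetition.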
From: Marko Vrgotic <M.Vrgotic@activevideo.com>
Date: Monday, 8 March 2021 at 15:34
To: Yedidyah Bar David <didi@redhat.com>
Cc: users@ovirt.org <users@ovirt.org>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue
Hi Didi,
Please find the attached logs from Host and Engine.
Host ovirt-sj-02 HE Undeploy 2021-03-08 14:15:52 till 2021-03-08 14:18:24
Host ovirt-sj-02 HE Deploy 2021-03-08 14:20:51 till 2021-03-08 14:23:22
I do see errors in the agent and broker and vdsm, but I do not see why it happened.
Thank you for helping, let me know if any additional files are needed.
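To narrow a log down to the undeploy and deploy windows quoted above, something like the following Python sketch might be used. It assumes the "::"-separated broker log layout quoted elsewhere in this thread and that the log timestamps are in the same timezone as the quoted windows; vdsm and engine logs use other layouts and would need their own parsing. The helper names are hypothetical, and the two sample lines are taken from the broker log in this thread, which is from a different day, so they are filtered with a demo window instead.

```python
from datetime import datetime

# Timestamp layout seen in the broker log, e.g. "2021-03-03 09:19:12,086".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# Windows quoted in the message above (assumed to be host-local time).
UNDEPLOY_WINDOW = (datetime(2021, 3, 8, 14, 15, 52), datetime(2021, 3, 8, 14, 18, 24))
DEPLOY_WINDOW = (datetime(2021, 3, 8, 14, 20, 51), datetime(2021, 3, 8, 14, 23, 22))


def line_timestamp(line):
    """Return the timestamp of a '::'-separated log line, or None."""
    parts = line.split("::", 3)
    if len(parts) < 4:
        return None
    try:
        return datetime.strptime(parts[2], TS_FORMAT)
    except ValueError:
        return None


def lines_in_window(lines, start, end):
    """Keep only the lines whose timestamp falls within [start, end]."""
    selected = []
    for line in lines:
        ts = line_timestamp(line)
        if ts is not None and start <= ts <= end:
            selected.append(line)
    return selected


SAMPLE = [
    "MainThread::WARNING::2021-03-03 09:19:12,086::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: 'metadata_image_UUID can't be 'None'",
    "MainThread::INFO::2021-03-03 09:19:13,574::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started",
]

# Demo window chosen to match the sample lines, not the deploy itself.
demo_start = datetime(2021, 3, 3, 9, 19, 12)
demo_end = datetime(2021, 3, 3, 9, 19, 13)
for line in lines_in_window(SAMPLE, demo_start, demo_end):
    print(line)
```

Against the real logs, `lines_in_window(lines, *UNDEPLOY_WINDOW)` and `lines_in_window(lines, *DEPLOY_WINDOW)` would be the calls of interest.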
From: Yedidyah Bar David <didi@redhat.com>
Date: Monday, 8 March 2021 at 09:25
To: Marko Vrgotic <M.Vrgotic@activevideo.com>
Cc: users@ovirt.org <users@ovirt.org>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue
Hi,
On Mon, Mar 8, 2021 at 10:13 AM Marko Vrgotic <M.Vrgotic@activevideo.com> wrote:
I cannot find the reason why the redeployment on this host fails, as it was already deployed on it before.
No errors were found in the deployment, but it seems half done, based on the messages I sent in the previous email.
Please check/share all relevant logs. Thanks. Can be all of /var/log from engine and hosts, and at least:
/var/log/ovirt-engine/engine.log
/var/log/vdsm/*
/var/log/ovirt-hosted-engine-ha/*
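Collecting the paths listed above into a single archive for sharing could look roughly like this. It is a minimal Python sketch using only the standard library; `bundle_logs` is a hypothetical helper, not an oVirt tool. Patterns that match nothing are skipped, and reading files under /var/log usually needs root.

```python
import glob
import tarfile

# Paths taken from the list above; engine.log lives on the engine VM,
# the other two on each host.
LOG_PATTERNS = [
    "/var/log/ovirt-engine/engine.log",
    "/var/log/vdsm/*",
    "/var/log/ovirt-hosted-engine-ha/*",
]


def bundle_logs(patterns, archive_path):
    """Add every existing path matching the patterns to a gzip tarball.

    Returns the list of paths that were added. Patterns without a match
    are skipped silently; unreadable files raise the usual OSError.
    """
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for pattern in patterns:
            for path in sorted(glob.glob(pattern)):
                tar.add(path)  # directories are added recursively
                added.append(path)
    return added
```

On a host, a call such as `bundle_logs(LOG_PATTERNS, "/tmp/he-logs.tar.gz")` (the output path is only an example) would produce one file to attach. It is worth reviewing the contents before posting them to a public list.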
Best regards,
-- Didi