Looking further into the storage side, I checked the host that I am unable to re-add to the
HE host pool:
[root@ovirt-sj-02 10.210.13.64:_hosted__engine]# ls -la
total 8
drwxr-xr-x. 3 nobody nobody 4096 Apr 15 20:47 .
drwxr-xr-x. 3 vdsm kvm 42 Apr 15 14:30 ..
drwxr-xr-x. 6 nobody nobody 4096 Aug 20 2019 054c43fc-1924-4106-9f80-0f2ac62b9886
-rwxr-xr-x. 1 nobody nobody 0 Feb 18 2020 __DIRECT_IO_TEST__
[root@ovirt-sj-02 10.210.13.64:_hosted__engine]# cd 054c43fc-1924-4106-9f80-0f2ac62b9886/
[root@ovirt-sj-02 054c43fc-1924-4106-9f80-0f2ac62b9886]# ls
dom_md ha_agent images master
[root@ovirt-sj-02 054c43fc-1924-4106-9f80-0f2ac62b9886]# cd ha_agent/
[root@ovirt-sj-02 ha_agent]# ls
hosted-engine.lockspace hosted-engine.metadata
[root@ovirt-sj-02 ha_agent]# cat hosted-engine.lockspace
cat: hosted-engine.lockspace: No such file or directory
[root@ovirt-sj-02 ha_agent]# ls -la
total 16
drwxr-xr-x. 2 nobody nobody 4096 Mar 31 10:30 .
drwxr-xr-x. 6 nobody nobody 4096 Aug 20 2019 ..
lrwxrwxrwx. 1 nobody nobody 132 Mar 31 10:30 hosted-engine.lockspace -> /var/run/vdsm/storage/054c43fc-1924-4106-9f80-0f2ac62b9886/e08188be-f733-4d5c-9222-a4b4e2228955/081f81c5-b2b2-46d5-9f82-9d9041ccc108
lrwxrwxrwx. 1 nobody nobody 132 Mar 31 10:30 hosted-engine.metadata -> /var/run/vdsm/storage/054c43fc-1924-4106-9f80-0f2ac62b9886/16b3e5ac-e70b-46e3-bf81-322954fe0b44/b6326e48-a7d2-4cba-af91-441db9f353c2
[root@ovirt-sj-02 ha_agent]# cat
^C
[root@ovirt-sj-02 ha_agent]# cat /var/run/vdsm/storage/054c43fc-1924-4106-9f80-0f2ac62b9886/e08188be-f733-4d5c-9222-a4b4e2228955/081f81c5-b2b2-46d5-9f82-9d9041ccc108
cat: /var/run/vdsm/storage/054c43fc-1924-4106-9f80-0f2ac62b9886/e08188be-f733-4d5c-9222-a4b4e2228955/081f81c5-b2b2-46d5-9f82-9d9041ccc108: No such file or directory
It looks like the lockspace and metadata symlinks are still there, but they point to a
location that no longer exists (ls shows the links in red).
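The same check can be scripted instead of reading the colours in the ls output. This is only a minimal sketch, assuming Python is available on the host; the function name find_dangling_links is illustrative and not part of any oVirt tooling.

```python
import os


def find_dangling_links(directory):
    """Return (link_name, target) pairs for symlinks whose target is missing."""
    dangling = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # islink() looks at the link itself, exists() follows it to the target,
        # so a link that is present but resolves to nothing is dangling.
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append((name, os.readlink(path)))
    return dangling
```

Calling find_dangling_links("ha_agent") from inside the storage domain directory would list every link in ha_agent whose target under /var/run/vdsm/storage is absent.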
The broker.log is showing the following:
MainThread WARNING storage_broker ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker Can't connect vdsm storage: 'metadata_image_UUID can't be 'None'
I am starting to think that I need to run the lockspace reinitialization:
1. on each HE host:
   systemctl stop ovirt-ha-agent ovirt-ha-broker
   sanlock client shutdown -f 1 # carefully, it could trigger the watchdog and reboot
2. on a single host:
   hosted-engine --reinitialize-lockspace
3. on each HE host:
   systemctl start ovirt-ha-agent ovirt-ha-broker
Does step 2 need to be executed only on the host with the issue, or will the action itself
reinitialize the lockspace for all HE hosts?
-----
kind regards/met vriendelijke groeten
Marko Vrgotic
Sr. System Engineer @ System Administration
ActiveVideo
o: +31 (35) 6774131
m: +31 (65) 5734174
e: m.vrgotic@activevideo.com
w: www.activevideo.com
ActiveVideo Networks BV. Mediacentrum 3745 Joop van den Endeplein 1.1217 WJ Hilversum, The
Netherlands. The information contained in this message may be legally privileged and
confidential. It is intended to be read only by the individual or entity to whom it is
addressed or by their designee. If the reader of this message is not the intended
recipient, you are on notice that any distribution of this message, in any form, is
strictly prohibited. If you have received this message in error, please immediately
notify the sender and/or ActiveVideo Networks, LLC by telephone at +1 408.931.9200 and
delete or destroy any copy of this message.
From: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Date: Thursday, 15 April 2021 at 16:57
To: Yedidyah Bar David <didi(a)redhat.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue
Hi Didi,
I compared hosted-engine.conf on all three machines and, indeed, hosts 1 and 3 have
identical files, except for the host ID.
The hosted-engine.conf on host 2, which I am trying to add back, contains only the host ID
and the CA path:
ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
host_id=2
Can someone help me check whether there is DB or storage corruption?
Would it be destructive or risky to try to populate the hosted-engine.conf of host 2 with
the missing values?
Any advice?
From: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Date: Wednesday, 14 April 2021 at 16:16
To: Yedidyah Bar David <didi(a)redhat.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue
Hi Didi,
It looks like the issue was an incomplete hosted-engine undeploy: the other HE hosts still
had entries for the host I was trying to remove, so every subsequent HE deploy on that host
failed.
I was able to get the other hosts to forget about this one by running
hosted-engine --clean-metadata --host-id=2
Now I would like to try to add the host back to the HE pool, but I have a question: is there
a period I should wait between cleaning the metadata and re-adding the host?
Kindly awaiting your reply.
From: Yedidyah Bar David <didi(a)redhat.com>
Date: Thursday, 18 March 2021 at 15:09
To: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue
***CAUTION: This email originated from outside of the organization. Do not click links or
open attachments unless you recognize the sender!!!***
Hi,
On Mon, Mar 8, 2021 at 4:55 PM Marko Vrgotic <M.Vrgotic(a)activevideo.com> wrote:
The broker log, these lines are pretty much repeating:
MainThread::WARNING::2021-03-03 09:19:12,086::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: 'metadata_image_UUID can't be 'None'
Please compare the content of
/etc/ovirt-hosted-engine/hosted-engine.conf between all your hosts.
host id should be unique per host, but otherwise they should be
identical. If they are not, most likely there is some corruption
somewhere - in the engine db or shared storage.
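
One way to do this comparison, as a rough sketch only: copy the file from each host to one place and diff the key=value pairs while ignoring host_id. The helper names parse_conf and conf_differences are made up for illustration, and the sketch assumes the file consists of plain key=value lines, as the excerpt quoted in this thread suggests.

```python
def parse_conf(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def conf_differences(reference, other, ignore=("host_id",)):
    """Return {key: (reference_value, other_value)}; None marks a missing key."""
    diffs = {}
    for key in sorted(set(reference) | set(other)):
        if key in ignore:
            continue  # host_id is expected to differ per host
        if reference.get(key) != other.get(key):
            diffs[key] = (reference.get(key), other.get(key))
    return diffs
```

Feeding it the contents of /etc/ovirt-hosted-engine/hosted-engine.conf from a healthy host as the reference and from the suspect host as the other side shows which keys are missing or different. An empty result means the two files agree apart from host_id.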
You might want to skim this for a general rather-low-level overview:
https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ovi...
Do you see no errors on your other hosts? In -ha logs?
Please also note that 4.3 is EOL. The deploy process was completely
rewritten in 4.4 (in ansible, previous was python), although should in
principle behave similarly - so if your data is corrupted, upgrade to
4.4 probably won't fix it.
Good luck and best regards,
MainThread::INFO::2021-03-03 09:19:12,829::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2021-03-03 09:19:12,829::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2021-03-03 09:19:12,829::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2021-03-03 09:19:12,832::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2021-03-03 09:19:12,832::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2021-03-03 09:19:12,832::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2021-03-03 09:19:12,833::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2021-03-03 09:19:12,833::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2021-03-03 09:19:12,833::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2021-03-03 09:19:12,833::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2021-03-03 09:19:12,834::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2021-03-03 09:19:12,835::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2021-03-03 09:19:12,835::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2021-03-03 09:19:12,835::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2021-03-03 09:19:12,835::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2021-03-03 09:19:12,836::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2021-03-03 09:19:12,836::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::WARNING::2021-03-03 09:19:12,836::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: 'metadata_image_UUID can't be 'None'
MainThread::INFO::2021-03-03 09:19:13,574::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2021-03-03 09:19:13,575::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2021-03-03 09:19:13,575::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2021-03-03 09:19:13,577::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2021-03-03 09:19:13,578::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
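
Since the broker restarts in a loop and most of the output is INFO noise, it can help to count only the distinct WARNING messages. The following is a rough sketch, assuming each entry occupies a single line in the file on disk and uses the "::"-separated layout seen in the excerpt above; parse_line and count_messages are illustrative names, not part of ovirt-hosted-engine-ha.

```python
from collections import Counter


def parse_line(line):
    """Split one broker.log entry into its fields, or return None if it does not match."""
    # Assumed layout: thread::LEVEL::timestamp::module::lineno::logger::(function) message
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None
    thread, level, timestamp, module, lineno, logger, rest = parts
    function, _, message = rest.partition(" ")
    return {
        "level": level,
        "timestamp": timestamp,
        "module": module,
        "function": function.strip("()"),
        "message": message,
    }


def count_messages(lines, level="WARNING"):
    """Count how often each distinct message of the given level occurs."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == level:
            counts[entry["message"]] += 1
    return counts
```

With the log opened as a file object, count_messages(log_file).most_common() lists the repeating warnings first, which makes it easy to confirm whether the metadata_image_UUID message is the only one.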
From: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Date: Monday, 8 March 2021 at 15:34
To: Yedidyah Bar David <didi(a)redhat.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue
Hi Didi,
Please find the attached logs from Host and Engine.
Host ovirt-sj-02 HE Undeploy 2021-03-08 14:15:52 till 2021-03-08 14:18:24
Host ovirt-sj-02 HE Deploy 2021-03-08 14:20:51 till 2021-03-08 14:23:22
I do see errors in the agent and broker and vdsm, but I do not see why it happened.
Thank you for helping, let me know if any additional files are needed.
From: Yedidyah Bar David <didi(a)redhat.com>
Date: Monday, 8 March 2021 at 09:25
To: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue
Hi,
On Mon, Mar 8, 2021 at 10:13 AM Marko Vrgotic <M.Vrgotic(a)activevideo.com> wrote:
>
> I cannot find the reason why the re-deployment on this host fails, as it was
> already deployed on it before.
>
> No errors found in the deployment, but it seems half done, based on the messages I
> sent in my previous email.
Please check/share all relevant logs. Thanks. Can be all of /var/log
from engine and hosts, and at least:
/var/log/ovirt-engine/engine.log
/var/log/vdsm/*
/var/log/ovirt-hosted-engine-ha/*
Best regards,
--
Didi