Adding screenshot representing the state:

[Screenshot: graphical user interface, text]

 

-----

kind regards/met vriendelijke groeten

 

Marko Vrgotic
Sr. System Engineer @ System Administration


ActiveVideo

o: +31 (35) 6774131

m: +31 (65) 5734174

e: m.vrgotic@activevideo.com
w: www.activevideo.com

 

ActiveVideo Networks BV. Mediacentrum 3745 Joop van den Endeplein 1.1217 WJ Hilversum, The Netherlands. The information contained in this message may be legally privileged and confidential. It is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are on notice that any distribution of this message, in any form, is strictly prohibited.  If you have received this message in error, please immediately notify the sender and/or ActiveVideo Networks, LLC by telephone at +1 408.931.9200 and delete or destroy any copy of this message.

 

 

 

From: Marko Vrgotic <M.Vrgotic@activevideo.com>
Date: Thursday, 15 April 2021 at 23:00
To: users@ovirt.org <users@ovirt.org>
Cc: Yedidyah Bar David <didi@redhat.com>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue

Looking further into the storage side, I checked the host that I am unable to re-add to the HE host pool:

 

[root@ovirt-sj-02 10.210.13.64:_hosted__engine]# ls -la
total 8
drwxr-xr-x. 3 nobody nobody 4096 Apr 15 20:47 .
drwxr-xr-x. 3 vdsm   kvm      42 Apr 15 14:30 ..
drwxr-xr-x. 6 nobody nobody 4096 Aug 20  2019 054c43fc-1924-4106-9f80-0f2ac62b9886
-rwxr-xr-x. 1 nobody nobody    0 Feb 18  2020 __DIRECT_IO_TEST__
[root@ovirt-sj-02 10.210.13.64:_hosted__engine]# cd 054c43fc-1924-4106-9f80-0f2ac62b9886/
[root@ovirt-sj-02 054c43fc-1924-4106-9f80-0f2ac62b9886]# ls
dom_md  ha_agent  images  master
[root@ovirt-sj-02 054c43fc-1924-4106-9f80-0f2ac62b9886]# cd ha_agent/
[root@ovirt-sj-02 ha_agent]# ls
hosted-engine.lockspace  hosted-engine.metadata
[root@ovirt-sj-02 ha_agent]# cat hosted-engine.lockspace
cat: hosted-engine.lockspace: No such file or directory
[root@ovirt-sj-02 ha_agent]# ls -la
total 16
drwxr-xr-x. 2 nobody nobody 4096 Mar 31 10:30 .
drwxr-xr-x. 6 nobody nobody 4096 Aug 20  2019 ..
lrwxrwxrwx. 1 nobody nobody  132 Mar 31 10:30 hosted-engine.lockspace -> /var/run/vdsm/storage/054c43fc-1924-4106-9f80-0f2ac62b9886/e08188be-f733-4d5c-9222-a4b4e2228955/081f81c5-b2b2-46d5-9f82-9d9041ccc108
lrwxrwxrwx. 1 nobody nobody  132 Mar 31 10:30 hosted-engine.metadata -> /var/run/vdsm/storage/054c43fc-1924-4106-9f80-0f2ac62b9886/16b3e5ac-e70b-46e3-bf81-322954fe0b44/b6326e48-a7d2-4cba-af91-441db9f353c2
[root@ovirt-sj-02 ha_agent]# cat
^C
[root@ovirt-sj-02 ha_agent]# cat /var/run/vdsm/storage/054c43fc-1924-4106-9f80-0f2ac62b9886/e08188be-f733-4d5c-9222-a4b4e2228955/081f81c5-b2b2-46d5-9f82-9d9041ccc108
cat: /var/run/vdsm/storage/054c43fc-1924-4106-9f80-0f2ac62b9886/e08188be-f733-4d5c-9222-a4b4e2228955/081f81c5-b2b2-46d5-9f82-9d9041ccc108: No such file or directory

 

It looks like the lockspace and metadata symlinks are still there, but they point to a location that no longer exists (the links are shown in red in the terminal).
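A quick way to confirm which links are dangling, as a minimal sketch. The helper name `list_dangling_links` is hypothetical and not part of any oVirt tooling; it relies only on the standard `test` and `readlink` utilities.

```shell
# list_dangling_links DIR
# Print every symlink directly inside DIR whose target does not exist.
# Hypothetical helper for illustration; not part of oVirt.
list_dangling_links() {
    dir="$1"
    for entry in "$dir"/*; do
        # -L: entry is a symlink; ! -e: following the link finds nothing
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            printf '%s -> %s\n' "$entry" "$(readlink "$entry")"
        fi
    done
}
```

Run from inside the ha_agent directory shown above, `list_dangling_links .` would print one line per broken link together with its target.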

 

The broker.log is showing the following:

MainThread WARNING storage_broker ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker Can't connect vdsm storage: 'metadata_image_UUID can't be 'None'
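To see how often this warning repeats, a small sketch can count the matching lines. The function name `count_metadata_warnings` is hypothetical; the log path in the usage note assumes the broker log sits in /var/log/ovirt-hosted-engine-ha/ under the name broker.log.

```shell
# count_metadata_warnings LOGFILE
# Count log lines that report the missing metadata image UUID.
# Hypothetical helper for illustration; not part of oVirt.
count_metadata_warnings() {
    # -c prints the number of matching lines, -F matches the text literally;
    # "|| true" keeps a zero count from being treated as a failure
    grep -cF "metadata_image_UUID can't be 'None'" "$1" || true
}
```

Example call: `count_metadata_warnings /var/log/ovirt-hosted-engine-ha/broker.log`.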

 

I am starting to think that I need to run the lockspace reinitialization:

 

1. On each HE host:

systemctl stop ovirt-ha-agent ovirt-ha-broker
sanlock client shutdown -f 1 # carefully, it could trigger the watchdog and reboot

2. On a single host:

hosted-engine --reinitialize-lockspace

3. On each HE host:

systemctl start ovirt-ha-agent ovirt-ha-broker

 

Does action 2 need to be run only on the host with the issue, or will it reinitialize the lockspace for all HE hosts?

 

 

 


From: Marko Vrgotic <M.Vrgotic@activevideo.com>
Date: Thursday, 15 April 2021 at 16:57
To: Yedidyah Bar David <didi@redhat.com>
Cc: users@ovirt.org <users@ovirt.org>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue

Hi Didi,

 

I compared hosted-engine.conf on all three machines and, indeed, hosts 1 and 3 have identical files, except for the host id.

 

The hosted-engine.conf on host 2, which I am trying to add back, contains only the host id and the CA path:
ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
host_id=2

 

Can someone help me check whether there is DB or storage corruption?

Would it be destructive or risky to try to populate the hosted-engine.conf of host 2 with the missing values?

 

Any advice?

 


From: Marko Vrgotic <M.Vrgotic@activevideo.com>
Date: Wednesday, 14 April 2021 at 16:16
To: Yedidyah Bar David <didi@redhat.com>
Cc: users@ovirt.org <users@ovirt.org>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue

Hi Didi,

 

It looks like the issue was that the hosted-engine undeploy was incomplete: the other HE hosts still had entries for the host I was trying to remove, so every subsequent HE deploy on that host failed.

 

I was able to get the other hosts to forget about this one by running hosted-engine --clean-metadata --host-id=2

 

Now I would like to try adding the host back to the HE pool, but I have a question: is there a period I should wait between cleaning the metadata and re-adding the host?

 

Kindly awaiting your reply.

 


From: Yedidyah Bar David <didi@redhat.com>
Date: Thursday, 18 March 2021 at 15:09
To: Marko Vrgotic <M.Vrgotic@activevideo.com>
Cc: users@ovirt.org <users@ovirt.org>
Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue

Hi,

On Mon, Mar 8, 2021 at 4:55 PM Marko Vrgotic <M.Vrgotic@activevideo.com> wrote:
>
> The broker log, these lines are pretty much repeating:
>
>
>
> MainThread::WARNING::2021-03-03 09:19:12,086::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: 'metadata_image_UUID can't be 'None'

Please compare the content of
/etc/ovirt-hosted-engine/hosted-engine.conf between all your hosts.
host id should be unique per host, but otherwise they should be
identical. If they are not, most likely there is some corruption
somewhere - in the engine db or shared storage.
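One way to do that comparison, as a minimal sketch. It assumes the file from each host has been copied to a single machine; the function name `conf_diff_ignoring_host_id` and the file names in the usage note are hypothetical.

```shell
# conf_diff_ignoring_host_id FILE_A FILE_B
# Show differences between two copies of hosted-engine.conf, ignoring the
# host_id line (expected to differ per host) and the order of the lines.
# Hypothetical helper; returns 0 when the remaining settings match.
conf_diff_ignoring_host_id() {
    a=$(mktemp) || return 2
    b=$(mktemp) || return 2
    # Drop host_id and sort so only real differences remain
    grep -v '^host_id=' "$1" | sort > "$a"
    grep -v '^host_id=' "$2" | sort > "$b"
    status=0
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

For example, `conf_diff_ignoring_host_id host1.conf host2.conf` prints nothing when the two files agree, and otherwise lists the settings present on one host but missing or different on the other.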

You might want to skim this for a general rather-low-level overview:

https://www.ovirt.org/images/Hosted-Engine-4.3-deep-dive.pdf

Do you see no errors on your other hosts? In -ha logs?

Please also note that 4.3 is EOL. The deploy process was completely
rewritten in 4.4 (in ansible, previous was python), although should in
principle behave similarly - so if your data is corrupted, upgrade to
4.4 probably won't fix it.

Good luck and best regards,

>
> MainThread::INFO::2021-03-03 09:19:12,829::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
>
> MainThread::INFO::2021-03-03 09:19:12,829::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>
> MainThread::INFO::2021-03-03 09:19:12,829::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
>
> MainThread::INFO::2021-03-03 09:19:12,832::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
>
> MainThread::INFO::2021-03-03 09:19:12,832::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
>
> MainThread::INFO::2021-03-03 09:19:12,832::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
>
> MainThread::INFO::2021-03-03 09:19:12,833::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
>
> MainThread::INFO::2021-03-03 09:19:12,833::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
>
> MainThread::INFO::2021-03-03 09:19:12,833::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
>
> MainThread::INFO::2021-03-03 09:19:12,833::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
>
> MainThread::INFO::2021-03-03 09:19:12,834::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
>
> MainThread::INFO::2021-03-03 09:19:12,835::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
>
> MainThread::INFO::2021-03-03 09:19:12,835::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
>
> MainThread::INFO::2021-03-03 09:19:12,835::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
>
> MainThread::INFO::2021-03-03 09:19:12,835::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
>
> MainThread::INFO::2021-03-03 09:19:12,836::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
>
> MainThread::INFO::2021-03-03 09:19:12,836::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
>
> MainThread::WARNING::2021-03-03 09:19:12,836::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: 'metadata_image_UUID can't be 'None'
>
> MainThread::INFO::2021-03-03 09:19:13,574::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
>
> MainThread::INFO::2021-03-03 09:19:13,575::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>
> MainThread::INFO::2021-03-03 09:19:13,575::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
>
> MainThread::INFO::2021-03-03 09:19:13,577::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
>
> MainThread::INFO::2021-03-03 09:19:13,578::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
>
>
>
>
> From: Marko Vrgotic <M.Vrgotic@activevideo.com>
> Date: Monday, 8 March 2021 at 15:34
> To: Yedidyah Bar David <didi@redhat.com>
> Cc: users@ovirt.org <users@ovirt.org>
> Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue
>
> Hi Didi,
>
>
>
> Please find the attached logs from Host and Engine.
>
>
>
> Host ovirt-sj-02 HE Undeploy 2021-03-08 14:15:52  till  2021-03-08 14:18:24
>
>
>
>
>
> Host ovirt-sj-02 HE Deploy 2021-03-08 14:20:51 till 2021-03-08 14:23:22
>
>
>
> I do see errors in the agent, broker, and vdsm logs, but I do not see why they happened.
>
>
>
> Thank you for helping, let me know if any additional files are needed.
>
>
>
>
> From: Yedidyah Bar David <didi@redhat.com>
> Date: Monday, 8 March 2021 at 09:25
> To: Marko Vrgotic <M.Vrgotic@activevideo.com>
> Cc: users@ovirt.org <users@ovirt.org>
> Subject: Re: [ovirt-users] Re: Upgrade from 4.3.5 to 4.3.10 HE Host issue
>
> Hi,
>
> On Mon, Mar 8, 2021 at 10:13 AM Marko Vrgotic <M.Vrgotic@activevideo.com> wrote:
> >
> > I cannot find the reason why the redeployment on this host fails, as HE was already deployed on it before.
> >
> > No errors were found in the deployment, but it seems half done, based on the messages I sent in my previous email.
>
> Please check/share all relevant logs. Thanks. Can be all of /var/log
> from engine and hosts, and at least:
>
> /var/log/ovirt-engine/engine.log
>
> /var/log/vdsm/*
>
> /var/log/ovirt-hosted-engine-ha/*
>
> Best regards,
> --
> Didi



--
Didi