Hi Strahil and Yedidyah,

 

First of all I want to thank you for helping me.

 

So, this is what I did:

[root@ovirt-sj-03 ~]# hosted-engine --vm-status | grep -B2 "Host ID"
Status up-to-date                  : True
Hostname                           : ovirt-sj-01.ictv.com
Host ID                            : 1
--
Status up-to-date                  : True
Hostname                           : ovirt-sj-03.ictv.com
Host ID                            : 3
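
For anyone who wants to script this check, here is a minimal Python sketch. It only assumes the "Name : value" line layout visible in the output above. The function names and the expected ID list [1, 2, 3] are my own illustration and not part of any oVirt tool.

```python
# Sample text copied from the filtered `hosted-engine --vm-status` output above.
SAMPLE = """\
Status up-to-date                  : True
Hostname                           : ovirt-sj-01.ictv.com
Host ID                            : 1
--
Status up-to-date                  : True
Hostname                           : ovirt-sj-03.ictv.com
Host ID                            : 3
"""


def hosts_in_status(text):
    """Return {host_id: hostname} for every host block found in the text."""
    hosts = {}
    hostname = None
    for line in text.splitlines():
        # Each field looks like "Name   : value"; split on the first colon only.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator lines such as "--" carry no field
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            hostname = value
        elif key == "Host ID":
            hosts[int(value)] = hostname
            hostname = None
    return hosts


def missing_hosts(text, expected_ids):
    """Return the expected host IDs that do not appear in the status text."""
    return sorted(set(expected_ids) - set(hosts_in_status(text)))


if __name__ == "__main__":
    print(hosts_in_status(SAMPLE))
    print(missing_hosts(SAMPLE, [1, 2, 3]))
```

With the sample above, the only ID reported as missing is 2, which matches what I see: Host2 does not show up in the HA status at all.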

 

From the start of the deployment, I could see the following ERROR messages on Host2:

 

VDSM:

2021-05-03 07:58:21,205+0000 ERROR (periodic/3) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
2021-05-03 07:58:36,257+0000 ERROR (periodic/3) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
2021-05-03 07:58:46,093+0000 ERROR (jsonrpc/7) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)

 

OVIRT-HA-AGENT|BROKER:

MainThread::ERROR::2021-05-03 08:12:57,690::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2021-05-03 08:12:57,691::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2021-05-03 08:13:08,005::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2021-05-03 08:13:08,061::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-sj-02.ictv.com
MainThread::INFO::2021-05-03 08:13:08,187::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2021-05-03 08:13:08,189::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}
MainThread::ERROR::2021-05-03 08:13:08,190::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2021-05-03 08:13:08,190::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}]

MainThread::ERROR::2021-05-03 08:13:08,191::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2021-05-03 08:13:08,191::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
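
Since the agent keeps restarting and the log fills up quickly, a small filter can help pull out just the ERROR entries. This is a rough sketch that assumes the "::"-separated layout seen in the agent log lines above (thread, level, timestamp, module, line number, logger, then "(function) message"). The helper names are mine, and traceback lines that do not follow this layout are simply skipped.

```python
# One line copied from the agent log above, split across string literals.
SAMPLE_LINE = (
    "MainThread::ERROR::2021-05-03 08:13:08,190::hosted_engine::559::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
    "(_initialize_broker) Failed to start necessary monitors"
)


def parse_agent_line(line):
    """Split one agent log line into its fields, or return None if it does not fit."""
    parts = line.rstrip("\n").split("::", 6)
    # Traceback and exception lines have fewer fields or no numeric line number.
    if len(parts) != 7 or not parts[4].isdigit():
        return None
    thread, level, timestamp, module, lineno, logger, rest = parts
    # The last field looks like "(function) free-form message".
    function, _, message = rest.partition(") ")
    return {
        "thread": thread,
        "level": level,
        "timestamp": timestamp,
        "module": module,
        "line": int(lineno),
        "logger": logger,
        "function": function.lstrip("("),
        "message": message,
    }


def error_entries(lines):
    """Return the parsed entries whose level is ERROR."""
    parsed = (parse_agent_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "ERROR"]


if __name__ == "__main__":
    for entry in error_entries([SAMPLE_LINE]):
        print(entry["timestamp"], entry["function"], entry["message"])
```

Feeding it the lines of the agent log file (for example via open(path).readlines(), with the path depending on the installation) would list each error with the function that logged it.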

 

And the hosted-engine.conf file on Host2 looks like this:

[root@ovirt-sj-02 ~]# cat /etc/ovirt-hosted-engine/hosted-engine.conf
ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
host_id=2

 

The oVirt hosted_engine network path is mounted.

 

What could be the reason hosted-engine.conf is missing all other entries?

 

Should I copy the hosted-engine.conf from Host1 or Host3, making sure host_id is unique (host_id=2) on Host2, and try to re-deploy?

How does hosted-engine.conf get loaded, and from where? Am I missing a mount point? Is the deployment failing because hosted-engine.conf is nearly empty, or is it the other way around?
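
To see exactly which entries Host2 lacks before copying anything, the two files could be compared key by key. Below is a minimal Python sketch, assuming the file is plain key=value lines as in the excerpt above. The reference content is a placeholder (the example_setting_* names are invented for illustration and are not real hosted-engine.conf keys), so a real run would use the file from Host1 or Host3 instead.

```python
# Placeholder reference: the example_setting_* entries are invented for
# illustration only and do NOT come from a real hosted-engine.conf.
REFERENCE_CONF = """\
ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
host_id=1
example_setting_a=placeholder
example_setting_b=placeholder
"""

# Content of hosted-engine.conf on Host2, as shown above.
HOST2_CONF = """\
ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
host_id=2
"""


def parse_conf(text):
    """Parse simple key=value lines into a dict, skipping blanks and comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


def missing_keys(reference_text, candidate_text):
    """Return keys present in the reference file but absent from the candidate."""
    reference = parse_conf(reference_text)
    candidate = parse_conf(candidate_text)
    return sorted(set(reference) - set(candidate))


if __name__ == "__main__":
    for key in missing_keys(REFERENCE_CONF, HOST2_CONF):
        print("missing on Host2:", key)
```

The comparison looks at key names only, so host_id differing between hosts is not flagged, which is what is wanted here since host_id must stay unique per host.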

 

 

 

-----

kind regards/met vriendelijke groeten

 

Marko Vrgotic
Sr. System Engineer @ System Administration


ActiveVideo

o: +31 (35) 6774131

m: +31 (65) 5734174

e: m.vrgotic@activevideo.com
w: www.activevideo.com

 

ActiveVideo Networks BV. Mediacentrum 3745 Joop van den Endeplein 1.1217 WJ Hilversum, The Netherlands. The information contained in this message may be legally privileged and confidential. It is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are on notice that any distribution of this message, in any form, is strictly prohibited.  If you have received this message in error, please immediately notify the sender and/or ActiveVideo Networks, LLC by telephone at +1 408.931.9200 and delete or destroy any copy of this message.

 

 

 

From: Yedidyah Bar David <didi@redhat.com>
Date: Sunday, 2 May 2021 at 08:44
To: Strahil Nikolov <hunter86_bg@yahoo.com>
Cc: Marko Vrgotic <M.Vrgotic@activevideo.com>, users@ovirt.org <users@ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host


On Sat, May 1, 2021 at 6:46 PM Strahil Nikolov via Users
<users@ovirt.org> wrote:
>
> As we have tested off the list, it seems that the symbolic link in /var/lib/vdsm that ovirt-ha-agent/broker creates was missing.
> Yet migration succeeds, but the donor host loses score as the VM 'died unexpectedly'.
>
> Try to clean up host2's metadata and then provision it, so you can proceed with the fix of host1 & host3.
>
>
> I have no clue if engine-cleanup will affect the shared storage, but it's possible, so use it as a last resort.

'engine-cleanup' is a utility you run on the engine machine, which
does basically the opposite of 'engine-setup', and in any case cleans
up the _engine_, not any particular host. If you run it, your engine
is gone, likely forever (unless you kept backups).

What you might have been looking for is 'ovirt-hosted-engine-cleanup'. This
one cleans up a host from a hosted-engine deployment. It's generally
intended to be used after a failed deployment attempt. I think it can
also work well in your case, but I would first try to fix this by other
means.

>
> If you fail to add host2, you can always reinstall it as host4 and try to add it fresh.
>
> Best Regards,
> Strahil Nikolov
>
> On Fri, Apr 30, 2021 at 16:27, Marko Vrgotic
> <M.Vrgotic@activevideo.com> wrote:
>
> Dear oVirt,
>
>
>
> I have already reached out twice regarding issues that were caused by a power outage but that I noticed only when upgrading the engine to the latest 4.3 version.
>
>
>
> I am unable to redeploy the engine on Host2: the hosted-engine file stays empty, and VDSM on Host1 and Host3 reports the following error, even though I cleared the metadata for Host2 on Host1 and Host3:
>
>
>
> 2021-04-30 05:57:58,454-0700 ERROR (jsonrpc/7) [ovirt_hosted_engine_ha.client.client.HAClient] Malformed metadata for host 2: received 0 of 512 expected bytes (client:137)
>
>
>
> Today I tried to migrate HE from Host3 to Host1, and it fails each time with the following message:
>
>
>
> On Engine:
>
> 2021-04-30 12:57:56,961Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1233892) [] EVENT_ID: VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed  (VM: HostedEngine, Source: ovirt-sj-03.ictv.com, Destination: ovirt-sj-01.ictv.com).
>
>
>
> On source Host:
>
> 2021-04-30 05:57:56,705-0700 ERROR (migsrc/66b6d489) [virt.vm] (vmId='66b6d489-ceb8-486a-951a-355e21f13627') Failed to migrate (migration:450)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 431, in _regular_run
>     time.time(), migrationParams, machineParams
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 505, in _startUnderlyingMigration
>     self._perform_with_conv_schedule(duri, muri)
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 591, in _perform_with_conv_schedule
>     self._perform_migration(duri, muri)
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 525, in _perform_migration
>     self._migration_flags)
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 100, in f
>     ret = attr(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
>     ret = f(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
>     return func(inst, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1781, in migrateToURI3
>     if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
> libvirtError: operation aborted: migration out job: canceled by client
>
>
>
>
>
> I know that this version is end of life, but I would very much appreciate it if someone could help me assess whether this indicates corruption in the DB, and how extensive the overall damage is, simply so that I know how to plan further actions.
>
> My impression was that I still had two functional HE hosts in the pool, but after seeing the migration failure, it's pretty much down to a single host.
>
>
>
> This is a production system, so I cannot just move on to upgrading/deploying to 4.4.
>
>
>
> Additionally:
>
> Is the effect of engine-cleanup on an HE host local, or does it affect all HE hosts? Could it help bring the host back to a state where HE can be re-deployed?
> What is the effect of reinitialize-lockspace?
>
>
>
> Kindly awaiting your reply. Happy to provide any additional information needed.
>
>
>
>
>
>
>
>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-leave@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/FRTRNTSGPLQOS72JDPSE766WWROFGTBA/



--
Didi