oVirt 4.4.7 development
by Michal Skrivanek
Hi all,
we are starting to work on oVirt 4.4.7 and, as you probably noticed, we are slowly transitioning away from CentOS. We added a CentOS Stream option for regular hosts, and the current 4.4.6 Node and Appliance are already built from Stream. We also have some initial patches for alternative RHEL clones other than CentOS; however, we can’t really support them until they provide the required dependencies.
We are not ditching CentOS just yet, but we can no longer accommodate the long delay the CentOS team takes to produce the next minor version. Hence, for oVirt 4.4.7 we will require the operating system to be at the RHEL 8.4 level. That means one of:
* CentOS 8.4, if that gets released in time
* the actual RHEL 8.4, or any 8.4 clone like Alma, Rocky, OL with the right dependencies (probably the ones from Stream)
* CentOS Stream
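As a rough illustration of the "RHEL 8.4 level" requirement, a host's release string could be checked along these lines. This is only a sketch: the function name `meets_rhel_84_level` is hypothetical, it assumes the usual `/etc/redhat-release` wording ("... release 8.4 ..." for RHEL and its clones, "CentOS Stream release 8" for Stream), and it is not something oVirt itself runs.

```python
import re


def meets_rhel_84_level(release_line: str) -> bool:
    """Return True if a /etc/redhat-release style line looks like
    EL 8.4 or newer within EL8, or like CentOS Stream 8.

    Assumption: the line contains 'release <major>[.<minor>]'.
    """
    line = release_line.strip()
    if "Stream" in line:
        # Stream has no minor version; accept major version 8 as-is.
        match = re.search(r"release\s+(\d+)", line)
        return bool(match) and int(match.group(1)) == 8
    match = re.search(r"release\s+(\d+)\.(\d+)", line)
    if not match:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return major == 8 and minor >= 4
```

On a real host the input would come from something like `open("/etc/redhat-release").read()`; whether the required dependencies are available is a separate question this check does not answer.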
In the meantime we're going to be moving the existing CI resources to test with CentOS Stream to unblock development of new features.
If you want to help with testing other OSes, patches adding support for them are welcome.
Thanks,
michal
3 years, 7 months
Re: Unable to migrate Engine to another HE Host
by Marko Vrgotic
Ok –
These are the symlinks in place:
1. /var/run/vdsm/storage/<hosted_storage_id>/<path_to_conf_image>
2. /var/run/vdsm/storage/<hosted_storage_id>/<path_to_metadata_image>
3. /var/run/vdsm/storage/<hosted_storage_id>/<path_to_lockspace_image>
4. /var/run/vdsm/storage/<hosted_storage_id>/<path_to_vmdisk_id_image> - this one exists only on the current Host
5. /var/run/vdsm/storage/<hosted_storage_id>/<one more_link_I_have_no_clue what_it_is_for>
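For anyone wanting to compare hosts, a minimal sketch of how the links in that directory could be audited is below. The function name `check_symlinks` and the `expected` parameter are made up for illustration; the directory and the expected link names have to be filled in from your own environment, since the real names are image UUIDs.

```python
import os


def check_symlinks(directory, expected=()):
    """Classify every entry in `directory` and flag expected names that are absent.

    Returns a dict mapping entry name to one of:
    'ok' (symlink with an existing target), 'broken' (dangling symlink),
    'not-a-symlink', or 'missing' (listed in `expected` but not present).
    """
    report = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.islink(path):
            report[name] = "not-a-symlink"
        elif os.path.exists(path):  # os.path.exists follows the link
            report[name] = "ok"
        else:
            report[name] = "broken"
    for name in expected:
        if name not in report:
            report[name] = "missing"
    return report
```

Running it on each host with the same `expected` list would show directly which host lacks which link. It only reads the directory and changes nothing.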
Migration issue - seems to be solved as of Part 3:
Part1:
* Tried to migrate HE from Host1 to Host3 - migration failed
* Analysis showed that Host3 was missing the symlink num4, which should be created during migration
* I created symlink num4 manually and the migration then worked, but HE on the donor host had status “down_unexpected” instead of “down”
Part2:
* Next thing I noticed was that symlink num1 was missing on both Hosts
* Added symlink num1 to both Hosts and reran the migration
* Migration done successfully from Host1 to Host3 and HE on donor host had status “down”
Part3:
* Migrating back from Host3 to Host1
* Noticed that at beginning of migration the symlink num4 gets created on destination Host (Host1 in this case)
* This time the symlink num4 on destination was not deleted and migration went through
Part4:
* Repeated migration from Host1 to Host3
* All went as expected.
New HE Host deploy issue:
* Current conclusion is that the new HE Host deploy was failing because symlink num1 did not exist on the running Hosts (1 and 3), so it was reported as missing/non-existent
* Since the link has now been added and it persists, I will try to deploy the new HA Host
I will keep you posted.
-----
kind regards/met vriendelijke groeten
Marko Vrgotic
Sr. System Engineer @ System Administration
ActiveVideo
o: +31 (35) 6774131
m: +31 (65) 5734174
e: m.vrgotic(a)activevideo.com
w: www.activevideo.com
ActiveVideo Networks BV. Mediacentrum 3745 Joop van den Endeplein 1.1217 WJ Hilversum, The Netherlands. The information contained in this message may be legally privileged and confidential. It is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are on notice that any distribution of this message, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or ActiveVideo Networks, LLC by telephone at +1 408.931.9200 and delete or destroy any copy of this message.
From: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Date: Tuesday, 4 May 2021 at 13:47
To: Strahil Nikolov <hunter86_bg(a)yahoo.com>, Yedidyah Bar David <didi(a)redhat.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host
Hi Strahil,
Creating the link to conf_image on shared storage did help: the HE now migrates with the donor ending up simply down, not down_unexpected.
Regarding “Can you paste the contents of the ha daemons' config where the shared storage should be defined?”, are you referring to /etc/ovirt-hosted-engine-ha/agent.conf, /var/lib/ovirt-hosted-engine-ha/broker.conf and ha.conf?
If not, please give me specific files, so I can provide configs.
From: Strahil Nikolov <hunter86_bg(a)yahoo.com>
Date: Tuesday, 4 May 2021 at 09:21
To: Marko Vrgotic <M.Vrgotic(a)activevideo.com>, Yedidyah Bar David <didi(a)redhat.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host
***CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender!!!***
It's quite interesting why the link is not created when it should ...
Can you paste the contents of the ha daemons' config where the shared storage should be defined?
You can create the link if needed, but it won't solve the problem.
Best Regards,
Strahil Nikolov
Re: Unable to migrate Engine to another HE Host
by Strahil Nikolov
It's quite interesting why the link is not created when it should ...
Can you paste the contents of the ha daemons' config where the shared storage should be defined?
You can create the link if needed, but it won't solve the problem.
Best Regards,
Strahil Nikolov
OLVM: error deleting snapshot live
by Renevey Christian
Hi,
With the latest OLVM update installed, we are unable to delete a snapshot without stopping the VM (live merge error).
Vdsm.log:
2021-05-04 09:44:28,389+0200 ERROR (jsonrpc/6) [virt.vm] (vmId='dba27e34-f8f3-4d9b-a6fd-95ba92c2970e') Live merge failed (job: 36f515b5-fb9e-4e29-b5f8-5b7c062cef5d) (vm:5957)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5955, in merge
    bandwidth, flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 100, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 719, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirtError: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /rhev/data-center/mnt/blockSD/47c2336e-1936-4b28-b759-efcfe8a19b58/images/269f76c4-2a63-4cdf-a2da-0f56f4acc464/c317c53b-ad79-421f-9263-f1f19c81a9a7 which is already in use
2021-05-04 09:44:28,412+0200 INFO (jsonrpc/6) [api.virt] FINISH merge return={'status': {'message': 'Merge failed', 'code': 52}} from=::ffff:192.168.1.110,38834, flow_id=390f0b25-f364-4adc-b7b3-b732ee0e0213, vmId=dba27e34-f8f3-4d9b-a6fd-95ba92c2970e (api:54)
2021-05-04 09:44:28,413+0200 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call VM.merge failed (error 52) in 0.58 seconds (__init__:312)
2021-05-04 09:44:29,124+0200 INFO (jsonrpc/3) [api.host] START getAllVmStats() from=::ffff:192.168.1.110,38834 (api:48)
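When a vdsm.log is long, it can help to pull out just the ERROR entries before reading the tracebacks. The sketch below assumes every entry follows the layout seen in the lines above (timestamp, level, thread in parentheses, logger in brackets, message, then `(module:line)`). The names `VDSM_LINE`, `parse_vdsm_line` and `errors_only` are made up for this example and are not part of VDSM.

```python
import re

# Layout assumed from the log excerpt above:
# <timestamp> <LEVEL> (<thread>) [<logger>] <message> (<module>:<line>)
VDSM_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\((?P<thread>[^)]*)\)\s+"
    r"\[(?P<logger>[^\]]*)\]\s+"
    r"(?P<message>.*)\s+"
    r"\((?P<source>[\w.]+:\d+)\)$"
)


def parse_vdsm_line(line):
    """Return the fields of one log entry as a dict, or None for
    lines that do not match (for example traceback continuation lines)."""
    match = VDSM_LINE.match(line.strip())
    return match.groupdict() if match else None


def errors_only(lines):
    """Keep only the parsed entries whose level is ERROR."""
    parsed = (parse_vdsm_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "ERROR"]
```

Traceback lines are skipped on purpose here, so the result is a short index of failing calls (logger and source location) rather than a full report.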
Any idea what's wrong?
Rgds
How to use Ovirt node 4.4.5.0 with local storage
by danilosevilla@gmail.com
Hi, I'm totally new to oVirt.
I have a production server on which I plan to install oVirt.
I need to use its local storage for virtualization, but it seems the only options are NAS or other network or Fibre Channel storage.
Can I use the local storage for virtualization? If yes, how can I do that?
Thanks.
error id 980
by ozmen62@hotmail.com
Hi,
First of all, thanks for this great list and the people who try to help each other.
This time my problem is with the engine OVF file or the engine storage settings.
I came to this idea because every 2-3 hours the engine tries to migrate to another host (from host1 to host2), with this message:
"Invalid status on Data Center B300. Setting status to Non Responsive."
When I check the hosts' logs, I see these:
ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
ovirt-ha-agent ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore ERROR Unable to extract HEVM OVF
I believe there is some kind of FC storage path or config problem. My storage settings look good; there are no errors about them.
Has anyone experienced this and knows how to solve it?
Thanks
Re: Unable to migrate Engine to another HE Host
by Marko Vrgotic
Investigation so far has shown that the symlink for conf_volume/conf_image is missing on Hosts 1 and 3.
According to the ovirt-engine deep dive, connection/access to conf_volume is required for adding new HE Hosts, as it holds their configuration template. This is most likely the reason why I could not add a new Host and why Host1 and Host3 were occasionally complaining that the conf_image_uuid is missing.
I also checked the DB table images: the conf_image and conf_volume entries exist.
Question:
Considering the DB entries seem correct, am I “safe” to manually create the symlink on Host1 and Host3 from global maintenance mode, set the correct permissions and then exit maintenance mode?
From: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Date: Monday, 3 May 2021 at 12:44
To: Yedidyah Bar David <didi(a)redhat.com>, Strahil Nikolov <hunter86_bg(a)yahoo.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host
Dear Strahil and Yedidyah,
Is it possible that the HostedEngine deploy on Host2 is failing due to an “inability” to access the conf_volume_UUID “c518f937-60fe-4fed-a54c-db11328bb507”?
I do not see that error message on Host2, but I see it on Host1 and Host3:
2021-05-03 02:43:32,438-0700 ERROR (jsonrpc/5) [storage.TaskManager.Task] (Task='b4bba796-b62b-41cb-8d91-6637c5fb9705') Unexpected error (task:875)
2021-05-03 02:43:32,439-0700 ERROR (jsonrpc/5) [storage.Dispatcher] FINISH prepareImage error=Volume does not exist: (u'c518f937-60fe-4fed-a54c-db11328bb507',) (dispatcher:83)
Could this be the result of hosted-engine.conf staying empty and the deployment not working?
When I check, the /var/log/ovirt-hosted-engine-setup/ folder is empty.
From: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Date: Monday, 3 May 2021 at 11:38
To: Yedidyah Bar David <didi(a)redhat.com>, Strahil Nikolov <hunter86_bg(a)yahoo.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host
This is the ERROR I caught on Host2 during HE deploy:
2021-05-03 02:36:07,982-0700 ERROR (periodic/0) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
2021-05-03 02:36:07,985-0700 INFO (MainThread) [vds] Received signal 15, shutting down (vdsmd:71)
2021-05-03 02:36:07,987-0700 ERROR (ioprocess/4934) [IOProcessClient] (054c43fc-1924-4106-9f80-0f2ac62b9886) Communication thread failed (__init__:160)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 114, in _communicate
    raise Exception("FD closed")
Exception: FD closed
2021-05-03 02:36:07,992-0700 ERROR (ioprocess/4948) [IOProcessClient] (ec0dc13d-8721-42a1-a5c3-762a3e235f76) Communication thread failed (__init__:160)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 114, in _communicate
    raise Exception("FD closed")
Exception: FD closed
2021-05-03 02:36:08,009-0700 INFO (MainThread) [jsonrpc.JsonRpcServer] Stopping JsonRPC Server (__init__:442)
From: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Date: Monday, 3 May 2021 at 11:34
To: Yedidyah Bar David <didi(a)redhat.com>, Strahil Nikolov <hunter86_bg(a)yahoo.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host
I executed ovirt-hosted-engine-cleanup on Host2 and it seems it went OK:
[root@ovirt-sj-02 log]# ovirt-hosted-engine-cleanup
This will de-configure the host to run ovirt-hosted-engine-setup from scratch.
Caution, this operation should be used with care.Are you sure you want to proceed? [y/n]
y
-=== Destroy hosted-engine VM ===-
You must run deploy first
error: failed to get domain 'HostedEngine'
error: Domain not found: no domain with matching name 'HostedEngine'
-=== Stop HA services ===-
-=== Shutdown sanlock ===-
shutdown force 1 wait 0
shutdown done 0
-=== Disconnecting the hosted-engine storage domain ===-
You must run deploy first
-=== De-configure VDSM networks ===-
ovirtmgmt
ovirtmgmt
A previously configured management bridge has been found on the system, this will try to de-configure it. Under certain circumstances you can loose network connection.
Caution, this operation should be used with care.Are you sure you want to proceed? [y/n]
y
-=== Stop other services ===-
-=== De-configure external daemons ===-
Removing database file /var/lib/vdsm/storage/managedvolume.db
-=== Removing configuration files ===-
? /etc/init/libvirtd.conf already missing
- removing /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml
? /etc/ovirt-hosted-engine/answers.conf already missing
- removing /etc/ovirt-hosted-engine/hosted-engine.conf
- removing /etc/vdsm/vdsm.conf
- removing /etc/pki/vdsm/certs/cacert.pem
- removing /etc/pki/vdsm/certs/vdsmcert.pem
- removing /etc/pki/vdsm/keys/vdsmkey.pem
- removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-key.pem
- removing /etc/pki/vdsm/libvirt-vnc/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-vnc/server-cert.pem
- removing /etc/pki/vdsm/libvirt-vnc/server-key.pem
- removing /etc/pki/CA/cacert.pem
- removing /etc/pki/libvirt/clientcert.pem
- removing /etc/pki/libvirt/private/clientkey.pem
? /etc/pki/ovirt-vmconsole/*.pem already missing
- removing /var/cache/libvirt/qemu
? /var/run/ovirt-hosted-engine-ha/* already missing
-=== Removing IP Rules ===-
I reran the HE deploy on Host2, and from the logs I see exactly the same situation as in my previous email.
These are the AGENT|BROKER logs with DEBUG enabled:
MainThread::INFO::2021-05-03 02:32:47,007::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::DEBUG::2021-05-03 02:32:47,007::agent::72::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Running agent
MainThread::DEBUG::2021-05-03 02:32:47,007::hosted_engine::220::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Executing: openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -noout -subject
MainThread::DEBUG::2021-05-03 02:32:47,064::hosted_engine::230::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Certificate subject: subject= /O=ictv.com/CN=ovirt-sj-02.ictv.com
MainThread::INFO::2021-05-03 02:32:47,065::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-sj-02.ictv.com
MainThread::DEBUG::2021-05-03 02:32:47,066::hosted_engine::568::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::DEBUG::2021-05-03 02:32:47,066::hosted_engine::600::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_check_service) Checking vdsmd status
MainThread::DEBUG::2021-05-03 02:32:47,130::hosted_engine::605::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_check_service) vdsmd running
MainThread::DEBUG::2021-05-03 02:32:47,130::util::384::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(__log_debug) Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2021-05-03 02:32:47,144::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 140520355350272)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7fcd71fe9590>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2021-05-03 02:32:47,151::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
MainThread::DEBUG::2021-05-03 02:32:47,161::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2021-05-03 02:32:47,192::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2021-05-03 02:32:47,195::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}
MainThread::ERROR::2021-05-03 02:32:47,195::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2021-05-03 02:32:47,196::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}]
MainThread::ERROR::2021-05-03 02:32:47,196::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2021-05-03 02:32:47,196::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
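The agent log uses `::` as a field separator, so the restart loop seen above can be counted with a few lines of code. This is a sketch under the assumption that every entry has the seven fields visible in the excerpt (thread, level, timestamp, module, line number, logger, message). The names `parse_agent_line` and `count_restarts` are hypothetical helpers, not part of ovirt-hosted-engine-ha.

```python
def parse_agent_line(line):
    """Split one agent.log entry into its fields.

    Returns None for lines that are not log entries,
    such as traceback continuation lines.
    """
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None
    thread, level, timestamp, module, lineno, logger, message = parts
    if not lineno.isdigit():
        return None
    return {
        "thread": thread,
        "level": level,
        "timestamp": timestamp,
        "module": module,
        "lineno": int(lineno),
        "logger": logger,
        "message": message,
    }


def count_restarts(lines):
    """Count ERROR entries announcing that the agent is restarting."""
    total = 0
    for line in lines:
        entry = parse_agent_line(line)
        if entry and entry["level"] == "ERROR" and "Trying to restart agent" in entry["message"]:
            total += 1
    return total
```

A count that keeps growing roughly every ten seconds would match the pattern in the logs here: the agent starts, fails to start the network monitor via the broker, and shuts down again.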
From: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Date: Monday, 3 May 2021 at 10:19
To: Yedidyah Bar David <didi(a)redhat.com>, Strahil Nikolov <hunter86_bg(a)yahoo.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host
Hi Strahil and Yedidyah,
First of all I want to thank you for helping me.
So, this is what I did:
* Executed steps as in https://access.redhat.com/solutions/2212601 and https://access.redhat.com/solutions/3319891 on working Hosts (Host1 and Host3) – to clean up failed deployment
* Verified on Host1 and Host3 that metadata is cleaned by running :
[root@ovirt-sj-03 ~]# hosted-engine --vm-status | grep -B2 "Host ID"
Status up-to-date : True
Hostname : ovirt-sj-01.ictv.com
Host ID : 1
--
Status up-to-date : True
Hostname : ovirt-sj-03.ictv.com
Host ID : 3
* Reran hosted-engine deploy from UI on Host2
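The metadata check in the second step could also be scripted. The sketch below parses output shaped like the `hosted-engine --vm-status | grep -B2 "Host ID"` excerpt above. It assumes the three fields appear in that order, with "Host ID" last in each group, and `parse_vm_status` is a made-up helper name.

```python
def parse_vm_status(text):
    """Turn 'Key : value' lines into one dict per host.

    Assumption: each group ends with a 'Host ID' line, as in the
    grep -B2 output shown above.
    """
    hosts = []
    current = {}
    for raw in text.splitlines():
        if ":" not in raw:
            continue  # separator lines such as '--'
        key, value = (part.strip() for part in raw.split(":", 1))
        current[key] = value
        if key == "Host ID":
            hosts.append({
                "host_id": int(value),
                "hostname": current.get("Hostname"),
                "up_to_date": current.get("Status up-to-date") == "True",
            })
            current = {}
    return hosts
```

With the output above, the set of returned `host_id` values is {1, 3}, which confirms that no stale entry for host 2 is left in the metadata.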
From the start of the deployment, I could see the following ERROR messages on Host2:
VDSM:
2021-05-03 07:58:21,205+0000 ERROR (periodic/3) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
2021-05-03 07:58:36,257+0000 ERROR (periodic/3) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
2021-05-03 07:58:46,093+0000 ERROR (jsonrpc/7) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
OVIRT-HA-AGENT|BROKER:
MainThread::ERROR::2021-05-03 08:12:57,690::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2021-05-03 08:12:57,691::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2021-05-03 08:13:08,005::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2021-05-03 08:13:08,061::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-sj-02.ictv.com
MainThread::INFO::2021-05-03 08:13:08,187::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2021-05-03 08:13:08,189::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}
MainThread::ERROR::2021-05-03 08:13:08,190::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2021-05-03 08:13:08,190::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}]
MainThread::ERROR::2021-05-03 08:13:08,191::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2021-05-03 08:13:08,191::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
And the Hosted-Engine.conf file looks like this:
[root@ovirt-sj-02 ~]# cat /etc/ovirt-hosted-engine/hosted-engine.conf
ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
host_id=2
The oVirt hosted_engine network path is mounted.
What could be the reason hosted-engine.conf is missing all other entries?
Should I copy hosted-engine.conf from Host1 or Host3, making sure host_id is unique (host_id=2) on Host2, and try to re-deploy?
How does hosted-engine.conf get loaded, and from where? Am I missing a mount point? Is the deployment failing because hosted-engine.conf is empty, or is it the other way around?
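Before copying anything, it may be worth listing exactly which keys Host2 lacks compared with a working host. Below is a minimal sketch that only reads the two files' contents and assumes plain `key=value` lines. `parse_conf` and `missing_keys` are hypothetical helper names, and the key `some_other_key` in the usage note is a placeholder, not a real hosted-engine.conf entry.

```python
def parse_conf(text):
    """Parse simple key=value lines, ignoring blanks and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def missing_keys(reference_text, target_text):
    """Return the sorted keys present in the reference but absent in the target."""
    reference = parse_conf(reference_text)
    target = parse_conf(target_text)
    return sorted(set(reference) - set(target))
```

For example, with a reference containing `ca_cert`, `host_id` and `some_other_key`, and the two-line file shown above as the target, `missing_keys` returns `['some_other_key']`. Values that are expected to differ per host, such as `host_id`, are not compared.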
From: Yedidyah Bar David <didi(a)redhat.com>
Date: Sunday, 2 May 2021 at 08:44
To: Strahil Nikolov <hunter86_bg(a)yahoo.com>
Cc: Marko Vrgotic <M.Vrgotic(a)activevideo.com>, users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host
On Sat, May 1, 2021 at 6:46 PM Strahil Nikolov via Users
<users(a)ovirt.org> wrote:
>
> As we have tested off the list, it seems that the symbolic link in /var/lib/vdsm that ovirt-ha-agent/broker create was missing.
> Yet migration succeeds, but the donor host loses score as the vm 'died unexpectedly'.
>
> Try to cleanup the host2's metadata and try to provision it , so you can proceed with the fix of host1 & host3.
>
>
> I have no clue if engine-cleanup will affect the shared storage, but it's possible - so use as last resort.
'engine-cleanup' is a utility you run on the engine machine, which
does basically the opposite of 'engine-setup', and in any case cleans
up the _engine_, not any particular host. If you run it, your engine
is gone, likely forever (unless you kept backups).
What you might have been looking for is 'ovirt-hosted-engine-cleanup'. This
one cleans up a host from a hosted-engine deployment. It's generally
intended to be used after a failed deployment attempt. I think it can
also work well in your case, but I would first try to fix this using other
means.
>
> If you fail to add host2 , you can always reinstall it as host4 and try to add it fresh.
>
> Best Regards,
> Strahil Nikolov
>
> On Fri, Apr 30, 2021 at 16:27, Marko Vrgotic
> <M.Vrgotic(a)activevideo.com> wrote:
>
> Dear oVirt,
>
>
>
> I have already reached out twice regarding the issues that occurred due to a power outage but were noticed only when upgrading the engine to the latest 4.3 version.
>
>
>
> I am unable to redeploy engine on Host2, the hosted-engine file stays empty and VDSM on Hosts1 and 3 is reporting, even though I cleared the metadata for the Host2, on Host 1 and Host3:
>
>
>
> 2021-04-30 05:57:58,454-0700 ERROR (jsonrpc/7) [ovirt_hosted_engine_ha.client.client.HAClient] Malformed metadata for host 2: received 0 of 512 expected bytes (client:137)
>
>
>
> Today I tried to migrate HE from Host 3 to Host 1 and it fails each time with following message:
>
>
>
> On Engine:
>
> 2021-04-30 12:57:56,961Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1233892) [] EVENT_ID: VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed (VM: HostedEngine, Source: ovirt-sj-03.ictv.com, Destination: ovirt-sj-01.ictv.com).
>
>
>
> On source Host:
>
> 2021-04-30 05:57:56,705-0700 ERROR (migsrc/66b6d489) [virt.vm] (vmId='66b6d489-ceb8-486a-951a-355e21f13627') Failed to migrate (migration:450)
>
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 431, in _regular_run
>     time.time(), migrationParams, machineParams
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 505, in _startUnderlyingMigration
>     self._perform_with_conv_schedule(duri, muri)
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 591, in _perform_with_conv_schedule
>     self._perform_migration(duri, muri)
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 525, in _perform_migration
>     self._migration_flags)
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 100, in f
>     ret = attr(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
>     ret = f(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
>     return func(inst, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1781, in migrateToURI3
>     if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
> libvirtError: operation aborted: migration out job: canceled by client
>
>
>
>
>
> I know that this version is end of life, but I would very much appreciate it if someone could help me assess whether this means corruption in the DB or broader damage, simply to know how to plan further actions.
>
> My impression was that I still had two functional HE Hosts in the pool, but after seeing the migration failure, it is pretty much down to a single host.
>
>
>
> This is a production system, so I cannot just move on to upgrading/deploying to 4.4.
>
>
>
> Additionally:
>
> Is the effect of engine-cleanup on an HE Host local, or does it affect all HE Hosts? Could it help bring the Host back to a state in which HE can be re-deployed?
> What is the effect of reinitialize-lockspace?
>
>
>
> Kindly awaiting your reply. Happy to provide any additional information needed.
>
>
>
>
>
>
>
> -----
>
> kind regards/met vriendelijke groeten
>
>
>
> Marko Vrgotic
> Sr. System Engineer @ System Administration
>
>
> ActiveVideo
>
> o: +31 (35) 6774131
>
> m: +31 (65) 5734174
>
> e: m.vrgotic(a)activevideo.com
> w: www.activevideo.com<http://www.activevideo.com>
>
>
>
> ActiveVideo Networks BV. Mediacentrum 3745 Joop van den Endeplein 1.1217 WJ Hilversum, The Netherlands. The information contained in this message may be legally privileged and confidential. It is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are on notice that any distribution of this message, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or ActiveVideo Networks, LLC by telephone at +1 408.931.9200 and delete or destroy any copy of this message.
>
>
>
>
>
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ovi...
> oVirt Code of Conduct: https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ovi...
> List Archives: https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.o...
--
Didi
3 years, 7 months
Re: Unable to migrate Engine to another HE Host
by Marko Vrgotic
Dear Strahil and Yedidyah,
Is it possible that the HostedEngine deploy on Host2 is failing due to an “inability” to access the conf_volume_UUID “c518f937-60fe-4fed-a54c-db11328bb507”?
I do not see that error message on Host2, but I do see it on Host1 and Host3:
2021-05-03 02:43:32,438-0700 ERROR (jsonrpc/5) [storage.TaskManager.Task] (Task='b4bba796-b62b-41cb-8d91-6637c5fb9705') Unexpected error (task:875)
2021-05-03 02:43:32,439-0700 ERROR (jsonrpc/5) [storage.Dispatcher] FINISH prepareImage error=Volume does not exist: (u'c518f937-60fe-4fed-a54c-db11328bb507',) (dispatcher:83)
Could this be the result of hosted-engine.conf staying empty and the deployment not working?
When I check, the /var/log/ovirt-hosted-engine-setup/ folder is empty.
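To cross-check which volume VDSM is complaining about against the conf_volume_UUID, the UUIDs can be pulled out of the error lines mechanically. What follows is only a rough Python sketch, not anything shipped with oVirt; the function name missing_volume_ids is made up here, and it assumes the error text keeps the "Volume does not exist" wording seen in the lines above.

```python
import re

# Standard 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def missing_volume_ids(log_text):
    """Return UUIDs named in 'Volume does not exist' lines, in order, no repeats."""
    found = []
    for line in log_text.splitlines():
        # Other ERROR lines (for example the Task line) also carry UUIDs,
        # so only lines with the specific error text are considered.
        if "Volume does not exist" not in line:
            continue
        for uuid in UUID_RE.findall(line):
            if uuid not in found:
                found.append(uuid)
    return found


# The two VDSM lines quoted in this message.
sample = "\n".join([
    "2021-05-03 02:43:32,438-0700 ERROR (jsonrpc/5) [storage.TaskManager.Task] "
    "(Task='b4bba796-b62b-41cb-8d91-6637c5fb9705') Unexpected error (task:875)",
    "2021-05-03 02:43:32,439-0700 ERROR (jsonrpc/5) [storage.Dispatcher] FINISH "
    "prepareImage error=Volume does not exist: "
    "(u'c518f937-60fe-4fed-a54c-db11328bb507',) (dispatcher:83)",
])

conf_volume_uuid = "c518f937-60fe-4fed-a54c-db11328bb507"
ids = missing_volume_ids(sample)
print(ids, conf_volume_uuid in ids)
```

A match only confirms that the missing volume is the configuration volume; it does not say why VDSM cannot find it.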
-----
kind regards/met vriendelijke groeten
Marko Vrgotic
From: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Date: Monday, 3 May 2021 at 11:38
To: Yedidyah Bar David <didi(a)redhat.com>, Strahil Nikolov <hunter86_bg(a)yahoo.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host
This is the ERROR I caught on Host2 during HE deploy:
2021-05-03 02:36:07,982-0700 ERROR (periodic/0) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
2021-05-03 02:36:07,985-0700 INFO (MainThread) [vds] Received signal 15, shutting down (vdsmd:71)
2021-05-03 02:36:07,987-0700 ERROR (ioprocess/4934) [IOProcessClient] (054c43fc-1924-4106-9f80-0f2ac62b9886) Communication thread failed (__init__:160)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 114, in _communicate
    raise Exception("FD closed")
Exception: FD closed
2021-05-03 02:36:07,992-0700 ERROR (ioprocess/4948) [IOProcessClient] (ec0dc13d-8721-42a1-a5c3-762a3e235f76) Communication thread failed (__init__:160)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 114, in _communicate
    raise Exception("FD closed")
Exception: FD closed
2021-05-03 02:36:08,009-0700 INFO (MainThread) [jsonrpc.JsonRpcServer] Stopping JsonRPC Server (__init__:442)
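When VDSM entries end up fused into one long line, as can happen when a mail client reflows pasted text, they can be separated again by looking for the timestamp that starts each entry. The following is a minimal Python sketch and not part of VDSM; the name split_entries is made up here, and it assumes every entry begins with a timestamp in the form seen in this thread.

```python
import re

# A VDSM entry starts with a timestamp such as "2021-05-03 02:36:07,982-0700".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4} ")


def split_entries(fused):
    """Split fused log text into one string per entry.

    Traceback text stays attached to the entry that precedes it.
    Any text before the first timestamp is dropped.
    """
    starts = [match.start() for match in TIMESTAMP.finditer(fused)]
    bounds = starts + [len(fused)]
    return [fused[a:b].strip() for a, b in zip(bounds, bounds[1:])]


# Three of the entries from the message above, fused the way they were pasted.
fused = (
    "2021-05-03 02:36:07,982-0700 ERROR (periodic/0) [root] failed to retrieve "
    "Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted "
    "Engine setup finished? (api:196) "
    "2021-05-03 02:36:07,985-0700 INFO (MainThread) [vds] Received signal 15, "
    "shutting down (vdsmd:71) "
    "2021-05-03 02:36:08,009-0700 INFO (MainThread) [jsonrpc.JsonRpcServer] "
    "Stopping JsonRPC Server (__init__:442)"
)

entries = split_entries(fused)
for entry in entries:
    # Fields 0 and 1 are the date and time; field 2 is the log level.
    print(entry.split()[2], "|", entry[:60])
```

Read this way, the "FD closed" errors come right after "Received signal 15", so they look like a consequence of vdsmd being stopped rather than an independent failure, although the log alone does not prove that.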
-----
kind regards/met vriendelijke groeten
Marko Vrgotic
From: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Date: Monday, 3 May 2021 at 11:34
To: Yedidyah Bar David <didi(a)redhat.com>, Strahil Nikolov <hunter86_bg(a)yahoo.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host
I executed ovirt-hosted-engine-cleanup on Host2 and it seems to have gone OK:
[root@ovirt-sj-02 log]# ovirt-hosted-engine-cleanup
This will de-configure the host to run ovirt-hosted-engine-setup from scratch.
Caution, this operation should be used with care.
Are you sure you want to proceed? [y/n]
y
-=== Destroy hosted-engine VM ===-
You must run deploy first
error: failed to get domain 'HostedEngine'
error: Domain not found: no domain with matching name 'HostedEngine'
-=== Stop HA services ===-
-=== Shutdown sanlock ===-
shutdown force 1 wait 0
shutdown done 0
-=== Disconnecting the hosted-engine storage domain ===-
You must run deploy first
-=== De-configure VDSM networks ===-
ovirtmgmt
ovirtmgmt
A previously configured management bridge has been found on the system, this will try to de-configure it. Under certain circumstances you can loose network connection.
Caution, this operation should be used with care.
Are you sure you want to proceed? [y/n]
y
-=== Stop other services ===-
-=== De-configure external daemons ===-
Removing database file /var/lib/vdsm/storage/managedvolume.db
-=== Removing configuration files ===-
? /etc/init/libvirtd.conf already missing
- removing /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml
? /etc/ovirt-hosted-engine/answers.conf already missing
- removing /etc/ovirt-hosted-engine/hosted-engine.conf
- removing /etc/vdsm/vdsm.conf
- removing /etc/pki/vdsm/certs/cacert.pem
- removing /etc/pki/vdsm/certs/vdsmcert.pem
- removing /etc/pki/vdsm/keys/vdsmkey.pem
- removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-key.pem
- removing /etc/pki/vdsm/libvirt-vnc/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-vnc/server-cert.pem
- removing /etc/pki/vdsm/libvirt-vnc/server-key.pem
- removing /etc/pki/CA/cacert.pem
- removing /etc/pki/libvirt/clientcert.pem
- removing /etc/pki/libvirt/private/clientkey.pem
? /etc/pki/ovirt-vmconsole/*.pem already missing
- removing /var/cache/libvirt/qemu
? /var/run/ovirt-hosted-engine-ha/* already missing
-=== Removing IP Rules ===-
I reran the HE deploy on Host2, and the logs show exactly the same situation as in my previous email.
These are the agent/broker logs with DEBUG enabled:
MainThread::INFO::2021-05-03 02:32:47,007::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::DEBUG::2021-05-03 02:32:47,007::agent::72::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Running agent
MainThread::DEBUG::2021-05-03 02:32:47,007::hosted_engine::220::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Executing: openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -noout -subject
MainThread::DEBUG::2021-05-03 02:32:47,064::hosted_engine::230::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Certificate subject: subject= /O=ictv.com/CN=ovirt-sj-02.ictv.com
MainThread::INFO::2021-05-03 02:32:47,065::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-sj-02.ictv.com
MainThread::DEBUG::2021-05-03 02:32:47,066::hosted_engine::568::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::DEBUG::2021-05-03 02:32:47,066::hosted_engine::600::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_check_service) Checking vdsmd status
MainThread::DEBUG::2021-05-03 02:32:47,130::hosted_engine::605::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_check_service) vdsmd running
MainThread::DEBUG::2021-05-03 02:32:47,130::util::384::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(__log_debug) Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2021-05-03 02:32:47,144::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 140520355350272)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7fcd71fe9590>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2021-05-03 02:32:47,151::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
MainThread::DEBUG::2021-05-03 02:32:47,161::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2021-05-03 02:32:47,192::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2021-05-03 02:32:47,195::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}
MainThread::ERROR::2021-05-03 02:32:47,195::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2021-05-03 02:32:47,196::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}]
MainThread::ERROR::2021-05-03 02:32:47,196::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2021-05-03 02:32:47,196::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
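With DEBUG enabled the agent log gets long, so it can help to filter it down to the ERROR records and whatever follows them. The lines above appear to use "::" as a field separator (thread, level, timestamp, module, line number, logger, message). Below is a minimal Python sketch built on that assumption; parse_agent_log is a made-up helper, not part of ovirt-hosted-engine-ha.

```python
LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def parse_agent_log(text):
    """Parse agent log text into records; continuation lines go into 'extra'."""
    records = []
    for line in text.splitlines():
        # At most 7 fields, so any "::" inside the message itself is preserved.
        parts = line.split("::", 6)
        if len(parts) == 7 and parts[1] in LEVELS:
            thread, level, timestamp, module, lineno, logger, message = parts
            records.append({
                "thread": thread,
                "level": level,
                "timestamp": timestamp,
                "where": "{}:{}".format(module, lineno),
                "logger": logger,
                "message": message,
                "extra": [],
            })
        elif records and line.strip():
            # Traceback lines carry no header, so attach them to the last record.
            records[-1]["extra"].append(line.strip())
    return records


# A few lines taken from the log above.
sample = "\n".join([
    "MainThread::INFO::2021-05-03 02:32:47,192::hosted_engine::543::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
    "(_initialize_broker) Initializing ha-broker connection",
    "MainThread::ERROR::2021-05-03 02:32:47,195::hosted_engine::559::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
    "(_initialize_broker) Failed to start necessary monitors",
    "MainThread::ERROR::2021-05-03 02:32:47,196::agent::144::"
    "ovirt_hosted_engine_ha.agent.agent.Agent::"
    "(_run_agent) Traceback (most recent call last):",
    "RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: "
    "[Errno 2] No such file or directory, [monitor: 'network', options: "
    "{'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}]",
])

records = parse_agent_log(sample)
errors = [r for r in records if r["level"] == "ERROR"]
for record in errors:
    print(record["timestamp"], record["where"], record["message"])
    for extra in record["extra"]:
        print("    " + extra)
```

In this excerpt the interesting detail is in the continuation line: the network monitor is started with every option set to None, which would be consistent with hosted-engine.conf missing most of its entries, though that remains a guess.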
-----
kind regards/met vriendelijke groeten
Marko Vrgotic
From: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Date: Monday, 3 May 2021 at 10:19
To: Yedidyah Bar David <didi(a)redhat.com>, Strahil Nikolov <hunter86_bg(a)yahoo.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host
Hi Strahil and Yedidyah,
First of all I want to thank you for helping me.
So, this is what I did:
* Executed the steps in https://access.redhat.com/solutions/2212601 and https://access.redhat.com/solutions/3319891 on the working Hosts (Host1 and Host3) to clean up the failed deployment
* Verified on Host1 and Host3 that the metadata is cleaned by running:
[root@ovirt-sj-03 ~]# hosted-engine --vm-status | grep -B2 "Host ID"
Status up-to-date : True
Hostname : ovirt-sj-01.ictv.com
Host ID : 1
--
Status up-to-date : True
Hostname : ovirt-sj-03.ictv.com
Host ID : 3
* Reran the hosted-engine deploy from the UI on Host2
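The metadata check in the second step can also be scripted, so that a leftover entry for host 2 is flagged without reading the output by eye. This is a minimal Python sketch only; parse_status is a made-up helper, and it assumes the "key : value" layout of the grep output shown above, with "Host ID" as the last line of each block.

```python
def parse_status(text):
    """Map host ID -> hostname and freshness, from 'key : value' status lines."""
    hosts = {}
    current = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skips the "--" separators that grep prints between blocks
        key, value = (part.strip() for part in line.split(":", 1))
        current[key] = value
        if key == "Host ID":
            hosts[int(value)] = {
                "hostname": current.get("Hostname"),
                "up_to_date": current.get("Status up-to-date") == "True",
            }
            current = {}
    return hosts


# The output quoted above.
sample = "\n".join([
    "Status up-to-date : True",
    "Hostname : ovirt-sj-01.ictv.com",
    "Host ID : 1",
    "--",
    "Status up-to-date : True",
    "Hostname : ovirt-sj-03.ictv.com",
    "Host ID : 3",
])

hosts = parse_status(sample)
print(sorted(hosts))   # host IDs that still have metadata
print(2 in hosts)      # False means no entry for host 2 is reported
```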
From the start of the deployment, I could see the following ERROR messages on Host2:
VDSM:
2021-05-03 07:58:21,205+0000 ERROR (periodic/3) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
2021-05-03 07:58:36,257+0000 ERROR (periodic/3) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
2021-05-03 07:58:46,093+0000 ERROR (jsonrpc/7) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
OVIRT-HA-AGENT|BROKER:
MainThread::ERROR::2021-05-03 08:12:57,690::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2021-05-03 08:12:57,691::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2021-05-03 08:13:08,005::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2021-05-03 08:13:08,061::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-sj-02.ictv.com
MainThread::INFO::2021-05-03 08:13:08,187::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2021-05-03 08:13:08,189::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}
MainThread::ERROR::2021-05-03 08:13:08,190::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2021-05-03 08:13:08,190::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}]
MainThread::ERROR::2021-05-03 08:13:08,191::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2021-05-03 08:13:08,191::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
And the hosted-engine.conf file looks like this:
[root@ovirt-sj-02 ~]# cat /etc/ovirt-hosted-engine/hosted-engine.conf
ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
host_id=2
The oVirt hosted_engine network path is mounted.
What could be the reason that hosted-engine.conf is missing all the other entries?
Should I copy hosted-engine.conf from Host1 or Host3, making sure host_id is unique (host_id=2) on Host2, and try to re-deploy?
How does hosted-engine.conf get loaded, and from where? Am I missing a mount point? Is the deployment failing because hosted-engine.conf is empty, or is it the other way around?
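Before copying anything between hosts, it may help to see exactly which keys the Host2 file lacks compared with a working host. Below is a minimal Python sketch that compares two key=value files; parse_conf and missing_keys are made-up helper names, and the "working host" content is a shortened, illustrative stand-in rather than a real hosted-engine.conf.

```python
def parse_conf(text):
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(reference, candidate):
    """Keys present in the reference settings but absent from the candidate."""
    return sorted(set(reference) - set(candidate))


# Host2 content as shown above.
host2_conf = "\n".join([
    "ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem",
    "host_id=2",
])

# Illustrative stand-in for a working host (assumption: a real file has many
# more entries; only keys mentioned in this thread are used here).
working_conf = "\n".join([
    "ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem",
    "host_id=1",
    "conf_volume_UUID=c518f937-60fe-4fed-a54c-db11328bb507",
])

missing = missing_keys(parse_conf(working_conf), parse_conf(host2_conf))
print(missing)
```

In practice the two inputs would be read from the files on each host. The comparison looks at keys only, on purpose, because host_id has to differ between hosts anyway.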
-----
kind regards/met vriendelijke groeten
Marko Vrgotic
From: Yedidyah Bar David <didi(a)redhat.com>
Date: Sunday, 2 May 2021 at 08:44
To: Strahil Nikolov <hunter86_bg(a)yahoo.com>
Cc: Marko Vrgotic <M.Vrgotic(a)activevideo.com>, users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host
On Sat, May 1, 2021 at 6:46 PM Strahil Nikolov via Users
<users(a)ovirt.org> wrote:
>
> As we have tested off the list, it seems that the symbolic link in /var/lib/vdsm that ovirt-ha-agent/broker creates was missing.
> Yet migration succeeds, but the donor host loses score as the VM 'died unexpectedly'.
>
> Try to clean up host2's metadata and then provision it, so you can proceed with the fix of host1 & host3.
>
>
> I have no clue if engine-cleanup will affect the shared storage, but it's possible, so use it as a last resort.
'engine-cleanup' is a utility you run on the engine machine, which
does basically the opposite of 'engine-setup', and in any case cleans
up the _engine_, not any particular host. If you run it, your engine
is gone, likely forever (unless you kept backups).
What you might have been looking for is 'ovirt-hosted-engine-cleanup'. This
one cleans up a host from a hosted-engine deployment. It's generally
intended to be used after a failed deployment attempt. I think it can
work well also in your case, but I would first try to fix things using other
means.
--
Didi
3 years, 7 months
Cluster level 4.5
by Don Dupuis
What version of libvirt is required for a host to be put in this cluster
level? I am using CentOS 8.3 and the CPU is Cascade Lake Server. It says that
my host is only compatible with cluster versions 4.2, 4.3 and 4.4. I am doing
a new install of oVirt 4.4.5. I have tried to update the libvirt version but
have run into issues. The currently installed libvirt
is libvirt-6.0.0-28.module_el8.3.0+555+a55c8938.x86_64.
3 years, 7 months
Re: Unable to migrate Engine to another HE Host
by Marko Vrgotic
This is the ERROR I caught on Host2 during HE deploy:
2021-05-03 02:36:07,982-0700 ERROR (periodic/0) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
2021-05-03 02:36:07,985-0700 INFO (MainThread) [vds] Received signal 15, shutting down (vdsmd:71)
2021-05-03 02:36:07,987-0700 ERROR (ioprocess/4934) [IOProcessClient] (054c43fc-1924-4106-9f80-0f2ac62b9886) Communication thread failed (__init__:160)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 114, in _communicate
    raise Exception("FD closed")
Exception: FD closed
2021-05-03 02:36:07,992-0700 ERROR (ioprocess/4948) [IOProcessClient] (ec0dc13d-8721-42a1-a5c3-762a3e235f76) Communication thread failed (__init__:160)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 114, in _communicate
    raise Exception("FD closed")
Exception: FD closed
2021-05-03 02:36:08,009-0700 INFO (MainThread) [jsonrpc.JsonRpcServer] Stopping JsonRPC Server (__init__:442)
-----
kind regards/met vriendelijke groeten
Marko Vrgotic
From: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Date: Monday, 3 May 2021 at 11:34
To: Yedidyah Bar David <didi(a)redhat.com>, Strahil Nikolov <hunter86_bg(a)yahoo.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host
I executed ovirt-hosted-engine-cleanup on Host2 and it seems to have gone OK:
[root@ovirt-sj-02 log]# ovirt-hosted-engine-cleanup
This will de-configure the host to run ovirt-hosted-engine-setup from scratch.
Caution, this operation should be used with care.
Are you sure you want to proceed? [y/n]
y
-=== Destroy hosted-engine VM ===-
You must run deploy first
error: failed to get domain 'HostedEngine'
error: Domain not found: no domain with matching name 'HostedEngine'
-=== Stop HA services ===-
-=== Shutdown sanlock ===-
shutdown force 1 wait 0
shutdown done 0
-=== Disconnecting the hosted-engine storage domain ===-
You must run deploy first
-=== De-configure VDSM networks ===-
ovirtmgmt
ovirtmgmt
A previously configured management bridge has been found on the system, this will try to de-configure it. Under certain circumstances you can loose network connection.
Caution, this operation should be used with care.
Are you sure you want to proceed? [y/n]
y
-=== Stop other services ===-
-=== De-configure external daemons ===-
Removing database file /var/lib/vdsm/storage/managedvolume.db
-=== Removing configuration files ===-
? /etc/init/libvirtd.conf already missing
- removing /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml
? /etc/ovirt-hosted-engine/answers.conf already missing
- removing /etc/ovirt-hosted-engine/hosted-engine.conf
- removing /etc/vdsm/vdsm.conf
- removing /etc/pki/vdsm/certs/cacert.pem
- removing /etc/pki/vdsm/certs/vdsmcert.pem
- removing /etc/pki/vdsm/keys/vdsmkey.pem
- removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-key.pem
- removing /etc/pki/vdsm/libvirt-vnc/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-vnc/server-cert.pem
- removing /etc/pki/vdsm/libvirt-vnc/server-key.pem
- removing /etc/pki/CA/cacert.pem
- removing /etc/pki/libvirt/clientcert.pem
- removing /etc/pki/libvirt/private/clientkey.pem
? /etc/pki/ovirt-vmconsole/*.pem already missing
- removing /var/cache/libvirt/qemu
? /var/run/ovirt-hosted-engine-ha/* already missing
-=== Removing IP Rules ===-
I reran the HE deploy on Host2, and the logs show exactly the same situation as in my previous email.
These are the agent/broker logs with DEBUG enabled:
MainThread::INFO::2021-05-03 02:32:47,007::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::DEBUG::2021-05-03 02:32:47,007::agent::72::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Running agent
MainThread::DEBUG::2021-05-03 02:32:47,007::hosted_engine::220::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Executing: openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -noout -subject
MainThread::DEBUG::2021-05-03 02:32:47,064::hosted_engine::230::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Certificate subject: subject= /O=ictv.com/CN=ovirt-sj-02.ictv.com
MainThread::INFO::2021-05-03 02:32:47,065::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-sj-02.ictv.com
MainThread::DEBUG::2021-05-03 02:32:47,066::hosted_engine::568::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Initializing VDSM
MainThread::DEBUG::2021-05-03 02:32:47,066::hosted_engine::600::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_check_service) Checking vdsmd status
MainThread::DEBUG::2021-05-03 02:32:47,130::hosted_engine::605::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_check_service) vdsmd running
MainThread::DEBUG::2021-05-03 02:32:47,130::util::384::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(__log_debug) Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2021-05-03 02:32:47,144::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 140520355350272)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7fcd71fe9590>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2021-05-03 02:32:47,151::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
MainThread::DEBUG::2021-05-03 02:32:47,161::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2021-05-03 02:32:47,192::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2021-05-03 02:32:47,195::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}
MainThread::ERROR::2021-05-03 02:32:47,195::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2021-05-03 02:32:47,196::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
return action(he)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
return he.start_monitoring()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
self._initialize_broker()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
m.get('options', {}))
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}]
MainThread::ERROR::2021-05-03 02:32:47,196::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2021-05-03 02:32:47,196::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
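To pick the ERROR entries out of a long agent or broker log like the one above, a small filter can help. This is a minimal sketch, assuming the "::"-delimited field layout seen in these lines (thread, level, timestamp, module, line number, logger, message); the names AgentLogRecord, parse_agent_line and errors_only are made up for illustration and are not part of ovirt-hosted-engine-ha.

```python
from typing import NamedTuple, Optional


class AgentLogRecord(NamedTuple):
    thread: str
    level: str
    timestamp: str
    module: str
    lineno: int
    logger: str
    message: str  # "(function) text", as printed in the log


def parse_agent_line(line: str) -> Optional[AgentLogRecord]:
    """Split one '::'-delimited log line into its fields.

    Returns None for lines that do not match the assumed layout,
    such as traceback frames.
    """
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7 or not parts[4].isdigit():
        return None
    thread, level, timestamp, module, lineno, logger, message = parts
    return AgentLogRecord(thread, level, timestamp, module,
                          int(lineno), logger, message)


def errors_only(lines):
    """Return the parsed records whose level is ERROR."""
    records = (parse_agent_line(line) for line in lines)
    return [r for r in records if r is not None and r.level == "ERROR"]


# Three lines copied from the log above.
sample = [
    "MainThread::INFO::2021-05-03 02:32:47,192::hosted_engine::543::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
    "(_initialize_broker) Initializing ha-broker connection",
    "MainThread::ERROR::2021-05-03 02:32:47,195::hosted_engine::559::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
    "(_initialize_broker) Failed to start necessary monitors",
    'File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/'
    'agent/agent.py", line 131, in _run_agent',
]

for rec in errors_only(sample):
    print(rec.timestamp, rec.module, rec.lineno, rec.message)
```

Traceback frames are skipped rather than attached to the preceding record, which keeps the sketch short; a fuller tool would probably want to keep them together.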
-----
kind regards/met vriendelijke groeten
Marko Vrgotic
Sr. System Engineer @ System Administration
ActiveVideo
o: +31 (35) 6774131
m: +31 (65) 5734174
e: m.vrgotic(a)activevideo.com
w: www.activevideo.com
ActiveVideo Networks BV. Mediacentrum 3745 Joop van den Endeplein 1.1217 WJ Hilversum, The Netherlands. The information contained in this message may be legally privileged and confidential. It is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are on notice that any distribution of this message, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or ActiveVideo Networks, LLC by telephone at +1 408.931.9200 and delete or destroy any copy of this message.
From: Marko Vrgotic <M.Vrgotic(a)activevideo.com>
Date: Monday, 3 May 2021 at 10:19
To: Yedidyah Bar David <didi(a)redhat.com>, Strahil Nikolov <hunter86_bg(a)yahoo.com>
Cc: users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host
Hi Strahil and Yedidyah,
First of all I want to thank you for helping me.
So, this is what I did:
* Executed the steps in https://access.redhat.com/solutions/2212601 and https://access.redhat.com/solutions/3319891 on the working hosts (Host1 and Host3) to clean up the failed deployment
* Verified on Host1 and Host3 that the metadata is cleaned by running:
[root@ovirt-sj-03 ~]# hosted-engine --vm-status | grep -B2 "Host ID"
Status up-to-date : True
Hostname : ovirt-sj-01.ictv.com
Host ID : 1
--
Status up-to-date : True
Hostname : ovirt-sj-03.ictv.com
Host ID : 3
* Reran the hosted-engine deploy from the UI on Host2
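The metadata check in the list above can also be scripted. The following is a rough sketch that parses the grep output shown there; it assumes "Host ID" is the last line of each group (which holds for the grep -B2 output above), and parse_vm_status is a hypothetical helper, not part of the hosted-engine tooling.

```python
def parse_vm_status(text):
    """Group 'key : value' lines into one dict per host.

    Assumes 'Host ID' closes each group, as in the grep -B2 output.
    """
    hosts, current = [], {}
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # separator lines such as '--'
        current[key.strip()] = value.strip()
        if key.strip() == "Host ID":
            hosts.append(current)
            current = {}
    return hosts


# Output copied from the verification step above.
sample = """\
Status up-to-date : True
Hostname : ovirt-sj-01.ictv.com
Host ID : 1
--
Status up-to-date : True
Hostname : ovirt-sj-03.ictv.com
Host ID : 3
"""

hosts = parse_vm_status(sample)
host_ids = [h["Host ID"] for h in hosts]
stale = [h["Hostname"] for h in hosts if h["Status up-to-date"] != "True"]
print("host ids:", host_ids)
print("stale hosts:", stale)
```

With the output above, host ID 2 is absent from the list, which is what the cleanup was meant to achieve.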
From the start of the deployment, I could see the following ERROR messages on Host2:
VDSM:
2021-05-03 07:58:21,205+0000 ERROR (periodic/3) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
2021-05-03 07:58:36,257+0000 ERROR (periodic/3) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
2021-05-03 07:58:46,093+0000 ERROR (jsonrpc/7) [root] failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory'Is the Hosted Engine setup finished? (api:196)
OVIRT-HA-AGENT|BROKER:
MainThread::ERROR::2021-05-03 08:12:57,690::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2021-05-03 08:12:57,691::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2021-05-03 08:13:08,005::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2021-05-03 08:13:08,061::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-sj-02.ictv.com
MainThread::INFO::2021-05-03 08:13:08,187::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2021-05-03 08:13:08,189::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}
MainThread::ERROR::2021-05-03 08:13:08,190::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2021-05-03 08:13:08,190::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
return action(he)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
return he.start_monitoring()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
self._initialize_broker()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
m.get('options', {}))
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': None, 'network_test': None, 'tcp_t_port': None, 'addr': None}]
MainThread::ERROR::2021-05-03 08:13:08,191::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2021-05-03 08:13:08,191::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
And the hosted-engine.conf file looks like this:
[root@ovirt-sj-02 ~]# cat /etc/ovirt-hosted-engine/hosted-engine.conf
ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
host_id=2
The oVirt hosted_engine network path is mounted.
What could be the reason that hosted-engine.conf is missing all the other entries?
Should I copy hosted-engine.conf from Host1 or Host3, make sure host_id is unique (host_id=2) on Host2, and try to redeploy?
How does hosted-engine.conf get loaded, and from where? Am I missing a mount point? Is the deployment failing because hosted-engine.conf is empty, or is it the other way around?
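Before copying a file between hosts, it may help to see exactly which entries Host2 lacks compared with a working host. Below is a minimal sketch, assuming hosted-engine.conf is a flat key=value file as the two lines above suggest. The reference content is hypothetical and only stands in for the file from Host1 or Host3; read_conf and missing_keys are illustrative names.

```python
def read_conf(text):
    """Parse simple key=value lines; blank lines and '#' comments are skipped."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def missing_keys(target, reference):
    """Keys present in the reference file but absent from the target file."""
    return sorted(set(reference) - set(target))


# Content of Host2's file as shown above.
host2_text = "ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem\nhost_id=2\n"

# Hypothetical content standing in for a working host's file; the
# network_test entry is only an example key, taken from the option
# names that appear in the agent log.
reference_text = (
    "ca_cert=/etc/pki/vdsm/libvirt-spice/ca-cert.pem\n"
    "host_id=1\n"
    "network_test=dns\n"
)

host2 = read_conf(host2_text)
reference = read_conf(reference_text)
print("missing on Host2:", missing_keys(host2, reference))
```

In practice the two texts would be read from the real files on each host, and host_id would be expected to differ between them.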
From: Yedidyah Bar David <didi(a)redhat.com>
Date: Sunday, 2 May 2021 at 08:44
To: Strahil Nikolov <hunter86_bg(a)yahoo.com>
Cc: Marko Vrgotic <M.Vrgotic(a)activevideo.com>, users(a)ovirt.org <users(a)ovirt.org>
Subject: Re: [ovirt-users] Re: Unable to migrate Engine to another HE Host
On Sat, May 1, 2021 at 6:46 PM Strahil Nikolov via Users
<users(a)ovirt.org> wrote:
>
> As we have tested off the list, it seems that the symbolic link in /var/lib/vdsm that ovirt-ha-agent/broker create was missing.
> Yet migration succeeds, but the donor host loses score as the VM 'died unexpectedly'.
>
> Try to clean up host2's metadata and try to provision it, so you can proceed with the fix of host1 & host3.
>
>
> I have no clue if engine-cleanup will affect the shared storage, but it's possible - so use it as a last resort.
'engine-cleanup' is a utility you run on the engine machine, which
does basically the opposite of 'engine-setup', and in any case cleans
up the _engine_, not any particular host. If you run it, your engine
is gone, likely forever (unless you kept backups).
What you might have been looking for is 'ovirt-hosted-engine-cleanup'. This
one cleans up a host from a hosted-engine deployment. It's generally
intended to be used after a failed deployment attempt. I think it can
work well in your case too, but I would first try to fix this by other
means.
>
> If you fail to add host2, you can always reinstall it as host4 and try to add it fresh.
>
> Best Regards,
> Strahil Nikolov
>
> On Fri, Apr 30, 2021 at 16:27, Marko Vrgotic
> <M.Vrgotic(a)activevideo.com> wrote:
>
> Dear oVirt,
>
>
>
> I have already reached out twice regarding the issues that occurred due to a power outage but were noticed only when upgrading the engine to the latest 4.3 version.
>
>
>
> I am unable to redeploy the engine on Host2: the hosted-engine file stays empty, and VDSM on Host1 and Host3 reports the following, even though I cleared the metadata for Host2 on Host1 and Host3:
>
>
>
> 2021-04-30 05:57:58,454-0700 ERROR (jsonrpc/7) [ovirt_hosted_engine_ha.client.client.HAClient] Malformed metadata for host 2: received 0 of 512 expected bytes (client:137)
>
>
>
> Today I tried to migrate HE from Host 3 to Host 1, and it fails each time with the following message:
>
>
>
> On Engine:
>
> 2021-04-30 12:57:56,961Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1233892) [] EVENT_ID: VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed (VM: HostedEngine, Source: ovirt-sj-03.ictv.com, Destination: ovirt-sj-01.ictv.com).
>
>
>
> On source Host:
>
> 2021-04-30 05:57:56,705-0700 ERROR (migsrc/66b6d489) [virt.vm] (vmId='66b6d489-ceb8-486a-951a-355e21f13627') Failed to migrate (migration:450)
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 431, in _regular_run
> time.time(), migrationParams, machineParams
> File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 505, in _startUnderlyingMigration
> self._perform_with_conv_schedule(duri, muri)
> File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 591, in _perform_with_conv_schedule
> self._perform_migration(duri, muri)
> File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 525, in _perform_migration
> self._migration_flags)
> File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 100, in f
> ret = attr(*args, **kwargs)
> File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
> ret = f(*args, **kwargs)
> File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
> return func(inst, *args, **kwargs)
> File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1781, in migrateToURI3
> if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
> libvirtError: operation aborted: migration out job: canceled by client
>
> I know that this version is end of life, but I would very much appreciate it if someone could help me assess whether this means corruption in the DB or broader damage, simply to know how to plan further actions.
>
> My impression was that I still had two functional HE hosts in the pool, but after seeing the migration failure, it's pretty much down to a single host.
>
>
>
> This is a production system, so I cannot just move on to upgrading/deploying to 4.4.
>
>
>
> Additionally:
>
> Is the effect of engine-cleanup local to the HE host, or does it affect all HE hosts? Could it help bring the host back to a state in which HE can be redeployed?
> What is the effect of reinitialize-lockspace?
>
>
>
> Kindly awaiting your reply. Happy to provide any additional information needed.
>
>
>
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ovi...
> oVirt Code of Conduct: https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ovi...
> List Archives: https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.o...
--
Didi
3 years, 7 months