On Wed, Mar 10, 2021 at 10:16 PM penguin pages <jeremey.wise(a)gmail.com> wrote:
well.. figured the package remove was a means to get rid of "upgrade pending"
which would then allow me to get engine failover to start working.... but... ya..
don't do that.
If you refer to "Use --allowerasing without fully understanding what's
going to be erased", then I definitely agree - don't do that.
How to destroy engine:
1) yum update --allowerasing
What did it remove? If this includes vdsm, it will definitely prevent
starting the engine vm.
2) reboot
3) no more engine starting.
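To answer the "what did it remove?" question, something along these lines might help. It is only a sketch: missing_packages is a hypothetical helper, and the package names in the usage comment are an assumption about what a hosted-engine host normally carries, so adjust them to your setup.

```shell
#!/bin/sh
# Hypothetical helper: print each given package that is no longer installed.
missing_packages() {
    for pkg in "$@"; do
        # rpm -q exits non-zero when the package is not installed
        if ! rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
        fi
    done
}

# Usage on the host (package list is an assumption):
#   yum history info last
#   missing_packages vdsm ovirt-hosted-engine-ha ovirt-hosted-engine-setup
```

"yum history info last" should list what the last transaction installed and erased; the helper then only confirms whether the packages you care about are still there.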
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/...
Validated services look ok
[root@thor ~]# systemctl status ovirt-ha-proxy
Unit ovirt-ha-proxy.service could not be found.
[root@thor ~]# systemctl status ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor
preset: disabled)
Active: active (running) since Wed 2021-03-10 14:55:17 EST; 14min ago
Main PID: 6390 (ovirt-ha-agent)
Tasks: 2 (limit: 1080501)
Memory: 25.8M
CGroup: /system.slice/ovirt-ha-agent.service
└─6390 /usr/libexec/platform-python
/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
Mar 10 14:55:17 thor.penguinpages.local systemd[1]: Started oVirt Hosted Engine High
Availability Monitoring Agent.
[root@thor ~]# systemctl status -l ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor
preset: disabled)
Active: active (running) since Wed 2021-03-10 14:55:17 EST; 16min ago
Main PID: 6390 (ovirt-ha-agent)
Tasks: 2 (limit: 1080501)
Memory: 25.6M
CGroup: /system.slice/ovirt-ha-agent.service
└─6390 /usr/libexec/platform-python
/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
Mar 10 14:55:17 thor.penguinpages.local systemd[1]: Started oVirt Hosted Engine High
Availability Monitoring Agent.
[root@thor ~]# journalctl -u ovirt-ha-agent
-- Logs begin at Wed 2021-03-10 14:47:34 EST, end at Wed 2021-03-10 15:12:12 EST. --
Mar 10 14:48:35 thor.penguinpages.local systemd[1]: Started oVirt Hosted Engine High
Availability Monitoring Agent.
Mar 10 14:48:37 thor.penguinpages.local ovirt-ha-agent[3463]: ovirt-ha-agent
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to start necessary
monitors
Mar 10 14:48:37 thor.penguinpages.local ovirt-ha-agent[3463]: ovirt-ha-agent
ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 85, in start_monitor
I think this is while trying to connect to ovirt-ha-broker, you might
want to check the status of that one.
    response = self._proxy.start_monitor(type, options)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
    verbose=self.__verbose
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_request
    http_conn = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python3.6/http/client.py", line 1264, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 978, in send
    self.connect()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 74, in connect
    self.sock.connect(base64.b16decode(self.host))
FileNotFoundError: [Errno 2] No such file or directory
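The FileNotFoundError comes from connecting to a UNIX socket that does not exist, which fits the broker not running. A rough way to check, assuming the broker's socket lives at the path shown in the usage comment (that path is an assumption, not taken from the output above; socket_present is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given path exists and is a UNIX socket.
socket_present() {
    [ -S "$1" ]
}

# Usage on the host (socket path is an assumption):
#   systemctl status ovirt-ha-broker
#   journalctl -u ovirt-ha-broker
#   socket_present /run/ovirt-hosted-engine-ha/broker.socket || echo "broker socket missing"
```

If the broker service is missing or failed, the agent has nothing to connect to and will keep logging this traceback.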
[root@thor ~]# tail /var/log/messages
Errors rotating in /var/log/messages, but I think this is just some form of "engine is
fubar".
"/usr/lib64/python3.6/smtplib.py", line 336, in connect#012 self.sock =
self._get_socket(host, port, self.timeout)#012 File
"/usr/lib64/python3.6/smtplib.py", line 307, in _get_socket#012
self.source_address)#012 File "/usr/lib64/python3.6/socket.py", line 724, in
create_connection#012 raise err#012 File "/usr/lib64/python3.6/socket.py",
line 713, in create_connection#012 sock.connect(sa)#012ConnectionRefusedError: [Errno
111] Connection refused
Mar 10 15:08:59 thor journal[1454]: ovirt-ha-broker
ovirt_hosted_engine_ha.broker.notifications.Notifications ERROR [Errno 111] Connection
refused#012Traceback (most recent call last):#012 File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/notifications.py",
line 29, in send_email#012 timeout=float(cfg["smtp-timeout"]))#012 File
"/usr/lib64/python3.6/smtplib.py", line 251, in __init__#012 (code, msg) =
self.connect(host, port)#012 File "/usr/lib64/python3.6/smtplib.py", line 336,
in connect#012 self.sock = self._get_socket(host, port, self.timeout)#012 File
"/usr/lib64/python3.6/smtplib.py", line 307, in _get_socket#012
self.source_address)#012 File "/usr/lib64/python3.6/socket.py", line 724, in
create_connection#012 raise err#012 File "/usr/lib64/python3.6/socket.py",
line 713, in create_connection#012 sock.connect(sa)#012ConnectionRefusedError: [Errno
111] Connection refused
Mar 10 15:08:59 thor journal[1454]: ovirt-ha-broker
ovirt_hosted_engine_ha.broker.notifications.Notifications ERROR [Errno 111] Connection
refused#012Traceback (most recent call last):#012 File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/notifications.py",
line 29, in send_email#012 timeout=float(cfg["smtp-timeout"]))#012 File
"/usr/lib64/python3.6/smtplib.py", line 251, in __init__#012 (code, msg) =
self.connect(host, port)#012 File "/usr/lib64/python3.6/smtplib.py", line 336,
in connect#012 self.sock = self._get_socket(host, port, self.timeout)#012 File
"/usr/lib64/python3.6/smtplib.py", line 307, in _get_socket#012
self.source_address)#012 File "/usr/
I think this is while it's trying to email a notification (about the
failure?). Can be ignored, in itself - probably your sendmail is down.
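The "#012" sequences in those messages are how syslog writes the newlines embedded in the traceback. A small filter like the one below might make such entries easier to read; decode_syslog is a hypothetical name, and this is just a sketch.

```shell
#!/bin/sh
# Hypothetical helper: turn the "#012" escapes (octal 012 = newline) back into
# real line breaks so a logged Python traceback reads as multiple lines again.
decode_syslog() {
    awk '{ gsub(/#012/, "\n"); print }'
}

# Usage on the host:
#   grep ovirt-ha-broker /var/log/messages | tail -n 5 | decode_syslog
```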
I guess I get to re-deploy.. again.
Good luck and best regards,
--
Didi