
On Wed, Mar 10, 2021 at 10:16 PM penguin pages <jeremey.wise@gmail.com> wrote:
Well... I figured removing the packages was a means to get rid of "upgrade pending", which would then let me get engine failover working... but... yeah, don't do that.
If you refer to "Use --allowerasing without fully understanding what's going to be erased", then I definitely agree - don't do that.
How to destroy the engine: 1) yum update --allowerasing
What did it remove? If this includes vdsm, it will definitely prevent the engine VM from starting.
2) reboot 3) no more engine starting. https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/htm...
Validated services look OK:
[root@thor ~]# systemctl status ovirt-ha-proxy
Unit ovirt-ha-proxy.service could not be found.
[root@thor ~]# systemctl status ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-03-10 14:55:17 EST; 14min ago
 Main PID: 6390 (ovirt-ha-agent)
    Tasks: 2 (limit: 1080501)
   Memory: 25.8M
   CGroup: /system.slice/ovirt-ha-agent.service
           └─6390 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

Mar 10 14:55:17 thor.penguinpages.local systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
[root@thor ~]# systemctl status -l ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-03-10 14:55:17 EST; 16min ago
 Main PID: 6390 (ovirt-ha-agent)
    Tasks: 2 (limit: 1080501)
   Memory: 25.6M
   CGroup: /system.slice/ovirt-ha-agent.service
           └─6390 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

Mar 10 14:55:17 thor.penguinpages.local systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
[root@thor ~]# journalctl -u ovirt-ha-agent
-- Logs begin at Wed 2021-03-10 14:47:34 EST, end at Wed 2021-03-10 15:12:12 EST. --
Mar 10 14:48:35 thor.penguinpages.local systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Mar 10 14:48:37 thor.penguinpages.local ovirt-ha-agent[3463]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to start necessary monitors
Mar 10 14:48:37 thor.penguinpages.local ovirt-ha-agent[3463]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 85, in start_monitor
I think this happens while the agent is trying to connect to ovirt-ha-broker; you might want to check the status of that one.
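A minimal Python sketch of why the agent's traceback ends in `FileNotFoundError` rather than a connection error. Judging by the quoted `unixrpc.py` frame, the agent reaches the broker over a UNIX socket whose path is stored hex-encoded and passed through `base64.b16decode` before `connect()`; if the broker never created that socket file, `connect()` fails with ENOENT before any RPC is sent. The helper `try_unix_connect` and the socket path below are hypothetical, for illustration only; they are not part of ovirt-hosted-engine-ha and not the broker's real socket location.

```python
import base64
import os
import socket
import tempfile


def try_unix_connect(encoded_host: str) -> str:
    """Decode a hex (base16) encoded UNIX socket path and try to connect.

    Mirrors the failing frame in unixrpc.py,
    `self.sock.connect(base64.b16decode(self.host))`, but classifies the
    outcome instead of raising.
    """
    path = base64.b16decode(encoded_host)  # bytes path; AF_UNIX accepts bytes
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return "connected"
    except FileNotFoundError:
        # ENOENT: no socket file at that path, i.e. no listener ever created it.
        return "missing socket file"
    except ConnectionRefusedError:
        # ECONNREFUSED: the file exists but nothing is accepting (stale socket).
        return "refused"
    finally:
        sock.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical path for illustration only, not the broker's real socket.
        path = os.path.join(tmp, "broker.socket")
        encoded = base64.b16encode(path.encode()).decode("ascii")
        print(try_unix_connect(encoded))  # nothing listening yet
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        listener.bind(path)
        listener.listen(1)
        print(try_unix_connect(encoded))  # a listener now owns the path
        listener.close()
        print(try_unix_connect(encoded))  # file left behind, nobody listening
```

On the real host, a "missing socket file" outcome would point at the broker not running (or listening elsewhere), which `systemctl status ovirt-ha-broker` and `journalctl -u ovirt-ha-broker` should help confirm.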
    response = self._proxy.start_monitor(type, options)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
    verbose=self.__verbose
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_request
    http_conn = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python3.6/http/client.py", line 1264, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 978, in send
    self.connect()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 74, in connect
    self.sock.connect(base64.b16decode(self.host))
FileNotFoundError: [Errno 2] No such file or directory
[root@thor ~]# tail /var/log/messages
Errors rotating in /var/log/messages, but I think this is just some form of "engine is fubar."
"/usr/lib64/python3.6/smtplib.py", line 336, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/usr/lib64/python3.6/smtplib.py", line 307, in _get_socket
    self.source_address)
  File "/usr/lib64/python3.6/socket.py", line 724, in create_connection
    raise err
  File "/usr/lib64/python3.6/socket.py", line 713, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
Mar 10 15:08:59 thor journal[1454]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.notifications.Notifications ERROR [Errno 111] Connection refused
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 29, in send_email
    timeout=float(cfg["smtp-timeout"]))
  File "/usr/lib64/python3.6/smtplib.py", line 251, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/lib64/python3.6/smtplib.py", line 336, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/usr/lib64/python3.6/smtplib.py", line 307, in _get_socket
    self.source_address)
  File "/usr/lib64/python3.6/socket.py", line 724, in create_connection
    raise err
  File "/usr/lib64/python3.6/socket.py", line 713, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
Mar 10 15:08:59 thor journal[1454]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.notifications.Notifications ERROR [Errno 111] Connection refused
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 29, in send_email
    timeout=float(cfg["smtp-timeout"]))
  File "/usr/lib64/python3.6/smtplib.py", line 251, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/lib64/python3.6/smtplib.py", line 336, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/usr/lib64/python3.6/smtplib.py", line 307, in _get_socket
    self.source_address)
  File "/usr/
I think this happens while it's trying to email a notification (about the failure?). That in itself can be ignored; your sendmail is probably down.
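To check the guess that the local mail server is down, here is a small Python sketch that makes the same kind of `smtplib.SMTP(host, port, timeout=...)` call the broker's `send_email` appears to make (per the traceback) and reports whether anything answers. The helper `smtp_reachable` is hypothetical, and the defaults `localhost` and port 25 are assumptions; substitute whatever server and port your hosted-engine notifications are configured to use.

```python
import smtplib


def smtp_reachable(host: str = "localhost", port: int = 25, timeout: float = 5.0) -> bool:
    """Return True if an SMTP server accepts a connection and answers NOOP.

    ConnectionRefusedError (the error in the broker log), socket timeouts,
    name-resolution errors and smtplib.SMTPException are all subclasses of
    OSError, so a single except clause covers them.
    """
    try:
        # Connecting reads the server's 220 greeting; NOOP proves it responds.
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            smtp.noop()
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed defaults: a local MTA on port 25.
    print("SMTP reachable:", smtp_reachable())
```

A `False` here with the default arguments would match the `[Errno 111] Connection refused` entries above, and is harmless to hosted-engine HA itself: only the notification emails are lost.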
I guess I get to redeploy... again.
Good luck and best regards,

--
Didi