This is the result of "hosted-engine --vm-status" on the first node, which
currently runs the hosted-engine:
--== Host ipc1.dc (id: 1) status ==--
Host ID : 1
Host timestamp : 89980
Score : 3400
Engine status : {"vm": "up", "health":
"good", "detail": "Up"}
Hostname : ipc1.dc
Local maintenance : False
stopped : False
crc32 : 256cb440
conf_on_shared_storage : True
local_conf_timestamp : 89980
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=89980 (Tue Sep 15 16:17:00 2020)
host-id=1
score=3400
vm_conf_refresh_time=89980 (Tue Sep 15 16:17:00 2020)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False
--== Host ipc3.dc (id: 2) status ==--
Host ID : 2
Host timestamp : 65213
Score : 3400
Engine status : unknown stale-data
Hostname : ipc3.dc
Local maintenance : False
stopped : False
crc32 : c4f62c8b
conf_on_shared_storage : True
local_conf_timestamp : 65213
Status up-to-date : False
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=65213 (Wed Sep 9 11:01:18 2020)
host-id=2
score=3400
vm_conf_refresh_time=65213 (Wed Sep 9 11:01:18 2020)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host ipc2.dc (id: 3) status ==--
Host ID : 3
Host timestamp : 93167
Score : 3400
Engine status : {"vm": "down",
"health": "bad", "detail": "unknown",
"reason": "vm not running on this host"}
Hostname : ipc2.dc
Local maintenance : False
stopped : False
crc32 : f02f19b0
conf_on_shared_storage : True
local_conf_timestamp : 93167
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=93167 (Tue Sep 15 16:16:58 2020)
host-id=3
score=3400
vm_conf_refresh_time=93167 (Tue Sep 15 16:16:58 2020)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
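Note that host ipc3.dc (id: 2) reports "unknown stale-data" and its timestamp is from Sep 9, i.e. its agent has not updated the shared metadata since then. A minimal first step, assuming nothing beyond the services already shown further down, would be to restart the HA services on that host and re-check the status:

# on ipc3.dc (the host with stale data)
systemctl restart ovirt-ha-broker ovirt-ha-agent
# give the agent a minute to publish fresh metadata, then from any host:
hosted-engine --vm-status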
For the newly added node it is:
"The hosted engine configuration has not been retrieved from shared storage. Please
ensure that ovirt-ha-agent is running and the storage server is reachable."
The status of the mentioned services seems to be OK, too, but I've actually noticed them
restarting from time to time.
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2020-09-15 10:13:11 EDT; 2min 11s ago
Main PID: 23971 (ovirt-ha-broker)
Tasks: 11 (limit: 100744)
Memory: 29.3M
CGroup: /system.slice/ovirt-ha-broker.service
└─23971 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
Sep 15 10:13:11 ipc3.dc systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2020-09-15 10:13:22 EDT; 2min 1s ago
Main PID: 24165 (ovirt-ha-agent)
Tasks: 2 (limit: 100744)
Memory: 27.2M
CGroup: /system.slice/ovirt-ha-agent.service
└─24165 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
Sometimes it says:
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Tue 2020-09-15 10:23:15 EDT; 4s ago
Process: 28372 ExecStart=/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent (code=exited, status=157)
Main PID: 28372 (code=exited, status=157)
And sometimes it's:
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2020-09-15 10:23:14 EDT; 5min ago
Main PID: 28370 (ovirt-ha-broker)
Tasks: 11 (limit: 100744)
Memory: 29.7M
CGroup: /system.slice/ovirt-ha-broker.service
└─28370 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
Sep 15 10:23:14 ipc3.dc systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Sep 15 10:27:31 ipc3.dc ovirt-ha-broker[28370]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR Failed to start monitoring domain (sd_uuid=e83f0c32-bb91-4909-8e80-6fa974b61968, >
Sep 15 10:27:31 ipc3.dc ovirt-ha-broker[28370]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.Action.start_domain_monitor ERROR Error in RPC call: Failed to start monitoring domain (sd_uuid=e83f0c32-bb>
Sep 15 10:28:02 ipc3.dc ovirt-ha-broker[28370]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.notifications.Notifications ERROR [Errno 111] Connection refused
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 29, in send_email
    timeout=float(cfg["smtp-timeout"]))
  File "/usr/lib64/python3.6/smtplib.py", line 251, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/lib64/python3.6/smtplib.py", line 336, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/usr/lib64/python3.6/smtplib.py", line 307, in _get_socket
    self.source_address)
  File "/usr/lib64/python3.6/socket.py", line 724, in create_connection
    raise err
  File "/usr/lib64/python3.6/socket.py", line 713, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
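The "[Errno 111] Connection refused" traceback above is just the broker failing to reach an SMTP server for its notification e-mails; it is separate from the storage problem. A hedged way to inspect the configured mail settings, assuming this version supports these shared-config keys:

hosted-engine --get-shared-config smtp-server --type=broker
hosted-engine --get-shared-config smtp-port --type=broker
# check that whatever is configured there is actually listening, e.g. a local MTA:
systemctl status postfix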
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2020-09-15 10:23:25 EDT; 5min ago
Main PID: 28520 (ovirt-ha-agent)
Tasks: 2 (limit: 100744)
Memory: 27.8M
CGroup: /system.slice/ovirt-ha-agent.service
└─28520 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
Sep 15 10:23:25 ipc3.dc systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
Sep 15 10:28:02 ipc3.dc ovirt-ha-agent[28520]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed scanning for OVF_STORE due to Command Volume.getInfo with args {'volu>
(code=100, message=Cannot inquire Lease(name='66b004b7-504c-4376-acc1-27890b17213b', path='/rhev/data-center/mnt/glusterSD/ipc1.dc:_engine/e83f0c32-bb91-4909-8e80->
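The errors that look relevant here are the broker's "Failed to start monitoring domain (sd_uuid=e83f0c32-...)" and the agent's "Cannot inquire Lease(... path='/rhev/data-center/mnt/glusterSD/ipc1.dc:_engine/...')", i.e. the hosted-engine storage domain is not reachable or not healthy from ipc3.dc. A rough check of the storage side, assuming the gluster volume is named "engine" as in the mount path:

# is the engine volume up and fully healed?
gluster volume status engine
gluster volume heal engine info
# can this host read the hosted-engine storage domain at all?
ls -l /rhev/data-center/mnt/glusterSD/ipc1.dc:_engine/
# hosted-engine helper that (re)connects the hosted-engine storage domain on this host
hosted-engine --connect-storage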
I think at this point we've even managed to make it worse. Now we have several
different problems on all 3 nodes, like:
- HSMGetTaskStatusVDS failed
- SpmStopVDS failed
- HSMGetAllTasksStatusesVDS failed
- Sync errors
We're going to reinstall the whole cluster from scratch.
But I think the initial issue/scenario of replacing (adding) a host and making it able to run
the hosted engine is still not solved at this point.
Thanks and greetings
Marcus
-----Original Message-----
From: Yedidyah Bar David <didi(a)redhat.com>
Sent: Tuesday, September 15, 2020 14:04
To: Rapsilber, Marcus <Marcus.Rapsilber(a)isotravel.com>
Cc: users <users(a)ovirt.org>
Subject: Re: [ovirt-users] Enable a cluster node to run the hosted engine
On Tue, Sep 15, 2020 at 2:40 PM Rapsilber, Marcus <Marcus.Rapsilber(a)isotravel.com>
wrote:
I'm not sure if this log file tells anything about why the node
"ipc3.dc" isn't capable of running the hosted engine.
Today we tried the whole procedure again, but this time we didn't install the new
node via a single-node cluster setup; it was a manual setup of the cluster storage. When
we added the host ("New Host") we made sure that "Hosted engine
deployment action" was set to "deploy". Nevertheless, we're still not
able to allow the new node to run the hosted engine. The grey crown is missing.
What's the output of 'hosted-engine --vm-status' on this host, and on other
hosts (that are ok)?
What are the criteria for a host to be able to run the hosted engine? Is some special
service required?
Do we have to install another package? Or is there an Ansible script that does the
required setup?
Generally speaking, it should be fully automatic, if you mark the checkbox in "Add
host", and AFAICT, the log you attached looks ok.
Also:
- The host needs to be in the same DC/cluster, needs to have access to the shared storage, etc.
You can try to start the services manually, if they are not up:
systemctl status ovirt-ha-broker ovirt-ha-agent
systemctl start ovirt-ha-broker ovirt-ha-agent
- and/or check their logs in /var/log/ovirt-hosted-engine-ha (see the sketch below).
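Assuming the default file names agent.log and broker.log in that directory:

tail -n 50 /var/log/ovirt-hosted-engine-ha/agent.log /var/log/ovirt-hosted-engine-ha/broker.log
# or via journald:
journalctl -u ovirt-ha-agent -u ovirt-ha-broker --since "1 hour ago"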
Best regards,
Thanks and greetings
Marcus
-----Original Message-----
From: Yedidyah Bar David <didi(a)redhat.com>
Sent: Tuesday, September 15, 2020 09:33
To: Rapsilber, Marcus <Marcus.Rapsilber(a)isotravel.com>
Cc: users <users(a)ovirt.org>
Subject: Re: [ovirt-users] Enable a cluster node to run the hosted engine
On Tue, Sep 15, 2020 at 10:10 AM Rapsilber, Marcus <Marcus.Rapsilber(a)isotravel.com>
wrote:
>
> Hello again,
>
> to answer your question about how I made a clean install and reintegrated the node into
> the cluster: maybe my approach was a bit awkward/inconvenient, but this is what I did:
> - Install CentOS 8
> - Install oVirt Repository and packages: cockpit-ovirt-dashboard,
> vdsm-gluster, ovirt-host
> - Remove the Gluster bricks of the old node from the
> data/engine/vmstore volumes
> - Process a single cluster node installation on the new node via the
> oVirt Dashboard, in order to setup Gluster and the bricks
> (hosted-engine setup was skipped)
> - On the new node: Delete the vmstore/engine/data volumes and the
> file metadata in the bricks folder
> - Added the bricks to the volumes of the existing cluster again
> - Added the host to the cluster
>
> Would you suggest a better approach to set up a new node for an existing cluster?
Sorry, I have no experience with gluster, so I can't comment on your particular steps,
although they sound reasonable.
The main missing thing is enabling hosted-engine when adding the host to the engine.
>
> At this point I'm not sure if I just overlooked the "hosted engine
deployment action" when I've added the new host. Unfortunately I cannot try to
edit the host anymore since my colleague did another reinstall of the node.
Very well.
If this happens again, please tell us.
Best regards,
>
> Thanks so far and greetings,
> Marcus
>
> -----Original Message-----
> From: Yedidyah Bar David <didi(a)redhat.com>
> Sent: Monday, September 14, 2020 10:56
> To: Rapsilber, Marcus <Marcus.Rapsilber(a)isotravel.com>
> Cc: users <users(a)ovirt.org>
> Subject: Re: [ovirt-users] Enable a cluster node to run the hosted engine
>
> On Mon, Sep 14, 2020 at 11:18 AM <rap(a)isogmbh.de> wrote:
> >
> > Hi there,
> >
> > currently my team is evaluating oVirt and we're also testing several failure
> > scenarios, backups and so on.
> > One scenario was:
> > - hyperconverged oVirt cluster with 3 nodes
> > - self-hosted engine
> > - simulate the break down of one of the nodes by power off
> > - to replace it make a clean install of a new node and reintegrate
> > it in the cluster
>
> How exactly did you do that?
>
> >
> > Actually everything worked out fine. The newly installed node and related bricks
> > (vmstore, data, engine) were added to the existing Gluster storage and it was added to
> > the oVirt cluster (as host).
> >
> > But there's one remaining problem: the new host doesn't have the grey
> > crown, which means it's unable to run the hosted engine. How can I achieve that?
> > I also found out that ovirt-ha-agent and ovirt-ha-broker aren't
> > started/enabled on that node. The reason is that
> > /etc/ovirt-hosted-engine/hosted-engine.conf doesn't exist. I guess this is not only a
> > problem concerning the hosted engine, but also for HA VMs.
>
> When you add a host to the engine, one of the options in the dialog is to deploy it
> as a hosted-engine.
> If you don't, you won't get this crown, nor these services, nor its status
> in 'hosted-engine --vm-status'.
>
> If you didn't, perhaps try to move to maintenance and reinstall, adding this
> option.
>
> If you did choose it, that's perhaps a bug - please check/share relevant logs
> (e.g. in /var/log/ovirt-engine, including host-deploy/).
>
> Best regards,
>
> >
> > Thank you for any advice and greetings, Marcus
>
>
>
> --
> Didi
>
--
Didi