Hi,
On Thu, Aug 22, 2019 at 3:29 PM <thomas(a)hoberg.net> wrote:
Hi Didi, thanks for the attention!
I created a clean slate test using four identical (Atom J5005 32GB RAM, 1TB SSD) boxes
and virgin SSDs.
Three-node hosted-engine using oVirt node 4.3.5 (August 5th) image.
Tested adding an n+1 compute node with the same current oVirt image, which worked fine
(after a long series of troubles with existing CentOS machines).
Since EL7 seems to have equal standing as oVirt nodes, I couldn't quite believe that
it fails always. I tried to isolate if there was something specific in my way of setting
up the CentOS nodes that caused the failure.
So I started with a clean sheet CentOS 7 and another clean SSD.
CentOS 1810 ISOs, developer workstation install variant, storage layout identical to the
oVirt recommended, updated to the latest patches.
Added oVirt repo, cockpit oVirt dashboard, ssl key etc. with all the proper reboots
Test 1: Add 4th node as host to three-node cluster. Works like a charm, migrate a running
VM back and forth, activate maintenance and remove 4th host
Experiment 1: Add epel "yum install epel-release", yum install yumex, pick
Cinnamon in Yumex (or basically yum groupinstall cinnamon)
I assume this was successful. Did you check what packages were
actually installed? Especially which were updated?
Test 2: Try to add 4th node again: Immediate failure with that message in the management
engine's deploy log.
Before doing that, did you try disabling/removing full epel repo (only
leaving enabled the parts enabled by ovirt-release* package)?
Experiment 2: yum erase group cinnamon; yum autoremove; reboot
Test 3: Add 4th node again: Success! (Adding Cinnamon afterwards, no problem, re-install
seems likely to fail)
Yes, I saw that bug report. Noticed it was RHEL8 and... wasn't sure it would be
related.
And, like you, I can't really fathom why this would happen: I compared package
versions on both sides, oVirt nodes and CentOS for everyone involved and they seemed
identical for everything.
After installing Cinnamon?
I had installed the yum priorities plugin and made sure that the "full" epel
repository had prio 99 while the oVirt repos had 1 to ensure that oVirt would win out in
case of any potential version conflict...
This helps if there is a *conflict*, not sure it does much if epel has
a newer version.
As I understand it, this log on the engine lists actions performed via an ssh connection
on the to-be host, and that OTOPI might actually be gathering packages through some time
of y miniyum to then deploy on the to-be host in the next step.
Didn't understand "through some time of y miniyum". ovirt-host-deploy,
which is what is run on the host at that point, is based on otopi, and
otopi has a yum plugin, and a miniyum module that it uses, and these
indeed try to install/update packages. This is optional - if you want
to prevent that, check "OFFLINE" section in:
https://github.com/oVirt/ovirt-host-deploy/blob/master/README
So I don't know if the Python context of this is the normal Python 2 context of the
to-be host, or a temporary one that was actually created on-the-fly etc. Somehow I never
found the 200 page oVirt insider bible .-)
Such a thing does not exist, and if it did, it would quickly become
out-of-date, and get worse over time. If you search around,
you actually can find parts of it scattered around, as blog posts,
'deep dive' videos, conference presentations, etc. Some of these are
indeed out-of-date :-(, but at least you can rather easily see when
they were posted and which version was documented. And of course, you
have the source! :-)
If you can help me with some context, perhaps I can dig deeper and help filing a better
bug report. At the moment it seems so odd and vague, I didn't feel comfortable.
I'd start with:
1. Check host-deploy logs. You can find them on the engine machine
(copied from the host) in /var/log/ovirt-engine/host-deploy. Compare
failed and successful ones, especially around 'yum' - it should log
the list of packages it's going to update etc.
2. Compare 'rpm -qa' between a failed and a working setup. Also 'yum list
all'.
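For step 2, a minimal sketch of the comparison in Python, assuming you
saved the output of 'rpm -qa' from each setup to a text file. The function
names and the file names in the usage note are made up for illustration,
they are not part of oVirt or of any existing tool:

```python
import sys
from typing import Iterable, List, Set, Tuple


def read_package_set(lines: Iterable[str]) -> Set[str]:
    """Turn 'rpm -qa' output (one package per line) into a set, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def compare_package_sets(failed: Set[str], working: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (packages only on the failed setup, packages only on the working setup)."""
    return sorted(failed - working), sorted(working - failed)


# Hypothetical usage: python compare_rpms.py failed-rpm-qa.txt working-rpm-qa.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f:
        failed_set = read_package_set(f)
    with open(sys.argv[2]) as f:
        working_set = read_package_set(f)
    only_failed, only_working = compare_package_sets(failed_set, working_set)
    print("Only on the failed setup:")
    for pkg in only_failed:
        print("  " + pkg)
    print("Only on the working setup:")
    for pkg in only_working:
        print("  " + pkg)
```

Since 'rpm -qa' prints the full name-version-release string, a package that
was updated shows up in both lists with different versions, which is
probably the interesting part here.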
Can't see any attach buttons, so I guess I may have to fiddle with my browser's
privacy settings..
You mean attach to your email to the mailing list? Not sure you should
see any, but it's anyway considered better these days to upload somewhere
(dropbox, google drive, some pastebin if it's just log snippets) and
share a link. This applies to mails to the list. If you open a bug in
bugzilla, please do attach everything directly.
Thanks and best regards,
--
Didi