Hi Didi, thanks for the attention!
I set up a clean-slate test using four identical boxes (Atom J5005, 32GB RAM, 1TB SSD) and
blank SSDs.
Three-node hosted-engine using the oVirt node 4.3.5 (August 5th) image.
Tested adding an n+1 compute node with the same current oVirt image, which worked fine
(after a long series of troubles with existing CentOS machines).
Since EL7 hosts seem to have equal standing with oVirt nodes, I couldn't quite believe that
it always fails. I tried to isolate whether something specific in my way of setting
up the CentOS nodes caused the failure.
So I started with a clean-sheet CentOS 7 and another clean SSD.
CentOS 7 (1810) ISOs, developer workstation install variant, storage layout identical to the
one oVirt recommends, updated to the latest patches.
Added oVirt repo, cockpit oVirt dashboard, ssl key etc. with all the proper reboots.
Test 1: Add 4th node as host to three-node cluster. Works like a charm, migrate a running
VM back and forth, activate maintenance and remove 4th host
Experiment 1: Add EPEL ("yum install epel-release"), yum install yumex, pick
Cinnamon in Yumex (or basically yum groupinstall cinnamon)
Test 2: Try to add 4th node again: Immediate failure with that message in the management
engine's deploy log.
Experiment 2: yum erase group cinnamon; yum autoremove; reboot
Test 3: Add 4th node again: Success! (Adding Cinnamon afterwards is no problem, but a
re-install of the host seems likely to fail.)
Yes, I saw that bug report. I noticed it was for RHEL8 and... wasn't sure it would be
related.
And, like you, I can't really fathom why this would happen: I compared package versions
on both sides, oVirt nodes and CentOS, for all packages involved, and they seemed identical
for everything.
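For anyone who wants to repeat that comparison, here is a rough Python sketch of how it could be scripted. It is not what I actually ran, just an illustration. It assumes each host's package list was dumped to a text file with `rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}.%{ARCH}\n'`; the function names `parse_rpm_list` and `diff_packages` are made up for this example.

```python
def parse_rpm_list(text):
    """Parse 'name version-release.arch' lines into {name: set of versions}.

    A set is used because some packages (e.g. kernels) can be installed
    in several versions at once.
    """
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, version = line.partition(" ")
        packages.setdefault(name, set()).add(version)
    return packages


def diff_packages(a, b):
    """Return (only_in_a, only_in_b, changed) for two parsed package lists.

    'changed' holds (name, versions_in_a, versions_in_b) tuples for
    packages present on both hosts with differing version sets.
    """
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(
        (name, sorted(a[name]), sorted(b[name]))
        for name in set(a) & set(b)
        if a[name] != b[name]
    )
    return only_a, only_b, changed
```

Feeding it the dump from an oVirt node and the dump from the CentOS host gives three lists: packages only on the node, packages only on CentOS, and packages whose versions differ. The "only on CentOS" list taken before and after installing Cinnamon would probably be the most interesting one here.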
I had installed the yum priorities plugin and made sure that the "full" epel
repository had prio 99 while the oVirt repos had 1 to ensure that oVirt would win out in
case of any potential version conflict...
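To double-check that the priorities really are what I think they are, something like the following Python sketch could list them per repository. It is only an illustration under a few assumptions: the repo definitions are ordinary INI-style `.repo` files (normally under /etc/yum.repos.d), the priorities plugin reads a `priority=` key per repo section, and a repo without that key gets the plugin default of 99. The function name `repo_priorities` is made up.

```python
import configparser

# Assumption: yum-plugin-priorities treats repos without an explicit
# priority as 99 (lowest precedence).
DEFAULT_PRIORITY = 99


def repo_priorities(repo_text):
    """Return {repo_id: priority} for the enabled repos in a .repo file."""
    # interpolation=None keeps '%' and '$basearch'-style values untouched;
    # strict=False tolerates duplicated sections or keys.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(repo_text)
    priorities = {}
    for repo_id in parser.sections():
        if parser.get(repo_id, "enabled", fallback="1").strip() == "0":
            continue  # skip disabled repos
        priorities[repo_id] = parser.getint(
            repo_id, "priority", fallback=DEFAULT_PRIORITY
        )
    return priorities
```

Running this over the contents of each `.repo` file and sorting by the resulting number should show the oVirt repos at 1 and EPEL at 99 if the setup is as described.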
As I understand it, this log on the engine lists actions performed via an ssh connection
on the to-be host, and OTOPI might actually be gathering packages through some type
of miniyum to then deploy on the to-be host in the next step.
So I don't know if the Python context of this is the normal Python 2 context of the
to-be host, or a temporary one that was actually created on-the-fly etc. Somehow I never
found the 200 page oVirt insider bible .-)
If you can help me with some context, perhaps I can dig deeper and help file a better
bug report. At the moment it all seems so odd and vague that I didn't feel comfortable
filing one.
Can't see any attach buttons, so I guess I may have to fiddle with my browser's
privacy settings.