Hello,
This is a simple account of what happened yesterday in one of our DCs.
This DC runs on a dedicated bare-metal engine, oversized for our needs,
so I added an NFS service on it to host a small storage domain and the
ISO storage domain.
Yesterday, after receiving the colorful announcement of the 4.2.5
version, I decided to upgrade.
As our engine was still on CentOS 7.4, I first upgraded its OS to 7.5,
then rebooted. Smooth.
Then I followed the very usual oVirt engine upgrade path. Smooth.
Finally, I upgraded the hosts with ovirt-ansible-cluster-upgrade as
usual.
The result was frightening: the hosts were put in maintenance,
upgraded, brought back to life, then seen as unavailable, unreachable,
connecting, alive, rebooted, then back for another round, in a loop...
Meanwhile, the SPM role was obviously jumping around, and that did not
help the debugging.
In the end, it turned out that something during one of the upgrades had
stopped and disabled the NFS service. My hosts partially rely on it, so
once I restarted the NFS service, everything came back to life.
The NFS service being disabled may come from the CentOS upgrade, unless
someone tells me it could come from something on the oVirt side?
I'm sure the RH people will advise me not to run NFS on the engine, but
apart from this event, I have had no trouble doing this for years.
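For anyone in a similar setup, here is a minimal sketch of a check that
could be run on the engine before and after an upgrade, to spot an NFS
service that was stopped or disabled along the way. It only wraps
"systemctl is-active" and "systemctl is-enabled". The unit name
nfs-server is an assumption (adjust it to your system), and the
function names query and nfs_problems are purely illustrative.

```python
import subprocess


def query(unit, verb, run=subprocess.run):
    """Return the one-word answer of `systemctl <verb> <unit>`."""
    # is-active / is-enabled exit non-zero for "inactive" / "disabled",
    # so the return code is deliberately not checked here.
    result = run(
        ["systemctl", verb, unit],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
    )
    return result.stdout.strip() or "unknown"


def nfs_problems(unit="nfs-server", run=subprocess.run):
    """List what is wrong with the unit; an empty list means all is fine.

    The default unit name is an assumption, not taken from the report.
    """
    problems = []
    active = query(unit, "is-active", run)
    enabled = query(unit, "is-enabled", run)
    if active != "active":
        problems.append("%s is %s (expected active)" % (unit, active))
    if enabled != "enabled":
        problems.append("%s is %s (expected enabled)" % (unit, enabled))
    return problems
```

Calling nfs_problems() right after the OS upgrade and again after the
engine upgrade would at least show which of the two steps changed the
state of the service.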
Regards,
--
Nicolas ECARNOT