On Wed, Aug 1, 2018 at 10:02 AM, Nicolas Ecarnot <nicolas(a)ecarnot.net> wrote:
Hello,
This is a simple testimony about what happened yesterday in one of our DCs.
This DC runs on a dedicated bare-metal engine, oversized compared to the
need, thus I've added an NFS service on it to host a small storage domain and
the ISO storage domain.
Yesterday, after having received the colorful announcement about the 4.2.5
version, I decided to upgrade.
As our engine was still on a CentOS 7.4, I first upgraded its OS version to
7.5, then rebooted. Smooth.
Then I followed the very usual oVirt engine upgrade path. Smooth.
Eventually, I upgraded the hosts with ovirt-ansible-cluster-upgrade as
usual.
The result was frightening because the hosts were put in maintenance,
upgraded, back to life, seen unavailable, unreachable, connecting, alive,
rebooted, then back to another turn and looping...
During this, the SPM role was obviously jumping around, and that did not
help with debugging.
In the end, it appeared that something during an upgrade stopped and
disabled the NFS service. My hosts partially relied on it, so after having
restarted the NFS service, all came back to life.
The NFS disabling may come from the CentOS upgrade, unless someone tells
me it could come from something on the oVirt side?
Well, it *could*, you know - it's a bit hard to guess without any logs...
But I do not think so.
Was the upgrade from 4.2.4, or something older?
If it was disabled directly by engine-setup, it should definitely be
mentioned in the log. Otherwise, it's hard to say. Seems like 'systemctl
disable someservice' does not log this anywhere. You might get a clue by
checking the exact time you started getting errors from your hosts (nfs
clients) and try to correlate that with engine-side logs.
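For what it's worth, that correlation could be sketched roughly as below. This is only an illustration: the function name lines_near is made up, and it assumes each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, which may not match every log you need to look at.

```python
from datetime import datetime, timedelta


def lines_near(log_lines, first_error, window_minutes=10, keyword=None):
    """Return the log lines stamped within window_minutes of first_error.

    Assumption: each relevant line begins with 'YYYY-MM-DD HH:MM:SS'.
    Lines without such a prefix are skipped.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in log_lines:
        try:
            stamp = datetime.strptime(line[:19], fmt)
        except ValueError:
            continue  # no leading timestamp in the assumed format
        if keyword and keyword.lower() not in line.lower():
            continue  # optional case-insensitive filter, e.g. "nfs"
        if abs(stamp - first_error) <= window:
            hits.append(line)
    return hits
```

You would feed it the lines of an engine-side log plus the time of the first client error, optionally with keyword="nfs", and read whatever comes back.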
I'm sure the RH people will advise me not to run NFS on the engine, but
apart from this event, I have had no trouble doing this in years.
Assuming you do not want to run hosted-engine, if I were you, I'd run the
engine on a KVM VM on that host (using virt-manager/virsh/whatever) and
the NFS server on another VM.
Best regards,
--
Didi