Greetings,
My setup is a complete Red Hat install.
Manager OS: RHEL 7.5
Hypervisor OS: RHEL 7.5
Running Red Hat CephFS (with their Ceph repos on all of the systems)
with Red Hat Virtualization (aka oVirt).
Everything is fully patched and updated as of yesterday morning.
Yes, I have valid Red Hat support, but I figured this was an odd enough
problem that the community (and the Red-Hat-ers who hang out on this
list) might have a better idea of where to start. (I might open a
ticket anyway, since that is what support is for, right? :)
Quick background:
When you mount an NFS share, your /etc/fstab entry should look something
like this:
<your.nfs.ip.addr>:/path/ /mount/point nfs <various options> 0 0
Only one IP is needed. Because part of Ceph's redundancy comes from its
multiple monitors, a CephFS fstab entry lists several monitor addresses
and should look something like this:
<your.ceph.ip.addr1>,<your.ceph.ip.addr2>,<your.ceph.ip.addrX>:/path/ /mount/point ceph <various options> 0 0
Both the Ceph community and Red Hat recommend separating multiple CephFS
monitor addresses with commas (see section 4.2, point 3):
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html...
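As a concrete illustration of the comma-separated format, here is a
minimal Python sketch that assembles such an fstab entry from a list of
monitor addresses. The helper name `cephfs_fstab_line`, the 192.0.2.x
documentation addresses, and the `name=`/`secretfile=` options are
assumptions chosen for the example, not values from my setup.

```python
# Hypothetical helper: builds a CephFS /etc/fstab entry from a list of
# monitor addresses. Addresses, paths, and options are placeholders.
def cephfs_fstab_line(monitors, remote_path, mount_point, options="defaults"):
    """Return one fstab line listing every monitor, comma-separated."""
    if not monitors:
        raise ValueError("at least one monitor address is required")
    # Monitors are joined with commas; ':' separates them from the path.
    source = ",".join(monitors) + ":" + remote_path
    # fstab fields: device, mount point, fs type, options, dump, pass
    return f"{source} {mount_point} ceph {options} 0 0"


line = cephfs_fstab_line(
    ["192.0.2.11", "192.0.2.12", "192.0.2.13"],
    "/path/",
    "/mount/point",
    "name=admin,secretfile=/etc/ceph/admin.secret",
)
print(line)
```

With a single monitor the same helper produces the one-IP form, which is
the only difference between my working and failing Data Domain paths.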
Now to oVirt/RHV.
When I mount my Data Domain as a POSIX compliant file system with the path
"<your.ceph.ip.addr1>:/path/", it works splendidly (especially after
the last Red Hat kernel update!). I've put it through a lot of testing and
it has worked every time. However, I don't get the redundancy of
multiple Ceph monitors.
When I mount my Data Domain as a POSIX compliant file system with the path
"<your.ceph.ip.addr1>,<your.ceph.ip.addr2>,<your.ceph.ip.addrX>:/path/",
most things seem to work, but I noticed a higher rate of failures. The
only failure I can trigger 100% of the time is this: mount a second
import data domain, then try to copy a VM disk from the import domain
into the CephFS Data Domain. I then get an error like this:
VDSM ovirt01 command HSMGetAllTasksStatusesVDS failed:
low level Image copy failed:
(u'Destination volume 7c1bb510-9f35-4456-8d51-0955f788ac3e error:
ParamsList: sep , in
/rhev/data-center/mnt/<your.ceph.ip.addr1>,<your.ceph.ip.addr2>,<your.ceph.ip.addr3>:_ovirt_data/70fb34ad-e66d-43e6-8412-5e020baa34df/images/23991a68-0c43-433f-b1f9-48b1533da54a',)
Uh-oh. It seems that the commas in the mount path are causing the
problem. I searched the logs for "sep , in" and found many more hits,
which makes me think this is the message that points to the real
problem.
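To make the suspicion concrete, the Python sketch below shows what would
happen if some component treats "," as a list separator, which is what
"ParamsList: sep ," hints at. This is a hypothetical illustration, not
VDSM's actual code: the `naive_params_list` helper and the 192.0.2.x
placeholder addresses are made up for the example. The three-monitor
mount directory breaks into three fragments, none of them a valid path,
while the single-monitor form stays whole, which would be consistent with
what I see.

```python
# Hypothetical illustration only: NOT taken from VDSM. It assumes some
# component splits a value on "," as the "ParamsList: sep ," error hints.
def naive_params_list(value, sep=","):
    """Split value on sep and drop empty items, like a simple list parser."""
    return [item for item in value.split(sep) if item]


# Placeholder monitor addresses (192.0.2.x) stand in for the real ones.
multi_mon_dir = (
    "/rhev/data-center/mnt/192.0.2.11,192.0.2.12,192.0.2.13:_ovirt_data"
    "/70fb34ad-e66d-43e6-8412-5e020baa34df/images"
)
single_mon_dir = (
    "/rhev/data-center/mnt/192.0.2.11:_ovirt_data"
    "/70fb34ad-e66d-43e6-8412-5e020baa34df/images"
)

# Three fragments: the path is broken apart at every monitor separator.
for fragment in naive_params_list(multi_mon_dir):
    print(fragment)

# One item: with a single monitor there is no comma to split on.
print(naive_params_list(single_mon_dir))
```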
I've switched back to a single IP address for the time being, but I
obviously want the Ceph redundancy back. While running on one IP, the
VM disk that refused to copy before copied without any problem. The
_only_ change I made was dropping two of the three IPs from the Data
Domain path option.
Is this a bug, or did I do something wrong?
Does anyone have a suggestion for me to try?
Thank you!
~Stack~