Greetings,
My setup is a complete Red Hat install.
Manager OS: RHEL 7.5
Hypervisors OS: RHEL 7.5
Running Red Hat CephFS (with their Ceph repos on all of the systems)
with Red Hat Virtualization (aka oVirt).
Everything is fully patched and updated as of yesterday morning.
Yes, I have valid Red Hat support, but I figured this was an odd enough
problem that the community (and the Red-Hat-ers who hang out on this
list) might have a better idea of where to start. (Although I might
open a ticket anyway just because that is what support is for,
right? :)
Quick background:
When you mount an NFS share, your /etc/fstab entry should probably look
something like this:
<your.nfs.ip.addr>:/path/ /mount/point nfs <various options> 0 0
Just one IP is needed. Since part of the redundancy for Ceph is in the
monitors, to mount CephFS the fstab should look something like this:
<your.ceph.ip.addr1>,<your.ceph.ip.addr2>,<your.ceph.ip.addrX>:/path/ /mount/point ceph <various options> 0 0
Both the Ceph community and Red Hat recommend the comma separator for
mounting multiple CephFS monitor nodes. (See section 4.2 point 3)
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/ceph_file_system_guide_technology_preview/mounting_and_unmounting_ceph_file_systems
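To make the shape of that mount source concrete, here is a minimal
Python sketch of how a "<mon1>,<mon2>,<monX>:/path/" string breaks down
into a monitor list and a path. The function name split_ceph_source and
the example addresses are made up for illustration; this is not code
from Ceph or vdsm.

```python
def split_ceph_source(source):
    """Split 'mon1,mon2,monX:/path/' into (list of monitors, path).

    Hypothetical helper for illustration only.
    """
    # The monitor list ends at the first ':/'; everything after is the path.
    monitors, sep, path = source.partition(":/")
    if not sep or not monitors:
        raise ValueError("expected '<monitors>:/<path>', got %r" % source)
    return monitors.split(","), "/" + path


if __name__ == "__main__":
    # Example addresses from the documentation range, not a real cluster.
    mons, path = split_ceph_source("192.0.2.1,192.0.2.2,192.0.2.3:/ovirt/data")
    print(mons, path)
```

The point is only that the commas belong to the monitor list, so any
tool that later handles the whole string has to treat them as data and
not as its own separator.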
Now to oVirt/RHV.
When I mount my Data Domain path as a Posix file system with a path of
"<your.ceph.ip.addr1>:/path/" it works splendidly well (especially
after the last Red Hat kernel update!). I've done a bunch of stuff to
it and it seems to work every time. However, I don't have the
redundancy of multiple Ceph Monitors.
When I mount my Data Domain path as a Posix file system with a path of
"<your.ceph.ip.addr1>,<your.ceph.ip.addr2>,<your.ceph.ip.addrX>:/path/"
most things seem to work, but I noticed a higher rate of failures. The
only failure that I can trigger 100% of the time, though, is to mount a
second data import domain and attempt to copy a VM disk from the import
domain into the CephFS Data Domain. Then I get an error like this:
VDSM ovirt01 command HSMGetAllTasksStatusesVDS failed:
low level Image copy failed:
(u'Destination volume 7c1bb510-9f35-4456-8d51-0955f788ac3e
error:
ParamsList: sep , in
/rhev/data-center/mnt/<your.ceph.ip.addr1>,<your.ceph.ip.addr2>,<your.ceph.ip.addr3>:_ovirt_data/70fb34ad-e66d-43e6-8412-5e020baa34df/images/23991a68-0c43-433f-b1f9-48b1533da54a',)
Uh, oh. It seems that the commas in the mount path are causing the
problems. So I went looking through the logs for "sep , in" and found a
bunch more hits, which makes me think that this message really does
point at the underlying problem.
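For anyone trying to picture the failure, here is a small Python sketch
of the kind of check that would produce a "sep , in" error: a helper
that packs values into a comma-separated list and refuses any value
that already contains the separator. This is an assumption about the
mechanism, not vdsm's actual code; pack_params and the sample paths are
hypothetical, and the error text only imitates the log line above.

```python
SEP = ","


def pack_params(values, sep=SEP):
    """Join values with sep, rejecting values that contain sep.

    Hypothetical stand-in for whatever raises 'ParamsList: sep , in ...'.
    """
    for value in values:
        if sep in value:
            # A value containing the separator could not be split back out.
            raise ValueError("ParamsList: sep %s in %s" % (sep, value))
    return sep.join(values)


if __name__ == "__main__":
    single = "/rhev/data-center/mnt/192.0.2.1:_ovirt_data/images/disk"
    multi = "/rhev/data-center/mnt/192.0.2.1,192.0.2.2:_ovirt_data/images/disk"

    print(pack_params([single]))  # one monitor: no comma in the path

    try:
        pack_params([multi])      # several monitors: comma in the path
    except ValueError as err:
        print("rejected:", err)
```

If something like this is going on, a single-monitor path goes through
untouched and a multi-monitor path is rejected purely because of its
commas, which would match what I am seeing.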
I've switched back to just one IP address for the time being, but I
obviously want the Ceph redundancy back. While running on just one IP,
the VM disk that refused to copy before had no problem copying. The
_only_ change I made was dropping two of the three IPs from the Data
Domain path option.
Is this a bug, or did I do something wrong?
Looks like a bug, maybe vdsm is not parsing the mount spec correctly.
Please file a vdsm bug and attach vdsm logs showing the entire flow.
Regardless, I'm not sure how well oVirt with CephFS is tested, or
whether it is recommended.
Adding Yaniv to add more info on this.
Nir