Well, after a very stressful weekend, I think I have things largely working. It turns out that most of the above issues were caused by the Linux permissions on the exports for all three volumes (they had been reset to 600; setting them to 774 or 770 fixed many of the issues). Of course, I didn't find that until a much more harrowing outage and hours and hours of work, including starting to look at rebuilding my cluster.
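For the record, the fix itself was simple once found. The check and the chmod looked roughly like this (brick export paths from my setup; the exact mode you need may differ):

  # the exports had been reset to 600; confirm what's actually there
  stat -c '%a %U:%G %n' /gluster/brick1 /gluster/brick2 /gluster/brick4
  # 770 (or 774) on each export got everything working again
  chmod 770 /gluster/brick1 /gluster/brick2 /gluster/brick4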
So, now my cluster is operating again, and everything looks good EXCEPT for one major Gluster issue/question that I haven't found any references or info on.
My host ovirt2, one of the replica Gluster servers, is the one that lost its storage and had to reinitialize it from the cluster. The iso volume is perfectly fine and complete, but the engine and data volumes are smaller on disk on this node than on the other node (and than on this node before the crash). For the engine volume, the entire cluster reports the smaller utilization on the mounted Gluster filesystems; for the data volume, it reports the larger one (the rest of the cluster's). Here are some df statements to help clarify:
View from ovirt1 (brick1 = engine; brick2 = data; brick4 = iso):
Filesystem                   Size  Used  Avail  Use%  Mounted on
/dev/mapper/gluster-engine    25G   12G    14G   47%  /gluster/brick1
/dev/mapper/gluster-data     136G  125G    12G   92%  /gluster/brick2
/dev/mapper/gluster-iso       25G  7.3G    18G   29%  /gluster/brick4
192.168.8.11:/engine          15G  9.7G   5.4G   65%  /rhev/data-center/mnt/glusterSD/192.168.8.11:_engine
192.168.8.11:/data           136G  125G    12G   92%  /rhev/data-center/mnt/glusterSD/192.168.8.11:_data
192.168.8.11:/iso             13G  7.3G   5.8G   56%  /rhev/data-center/mnt/glusterSD/192.168.8.11:_iso
View from ovirt2:
Filesystem                   Size  Used  Avail  Use%  Mounted on
/dev/mapper/gluster-engine    15G  9.7G   5.4G   65%  /gluster/brick1
/dev/mapper/gluster-data     174G  119G    56G   69%  /gluster/brick2
/dev/mapper/gluster-iso       13G  7.3G   5.8G   56%  /gluster/brick4
192.168.8.11:/engine          15G  9.7G   5.4G   65%  /rhev/data-center/mnt/glusterSD/192.168.8.11:_engine
192.168.8.11:/data           136G  125G    12G   92%  /rhev/data-center/mnt/glusterSD/192.168.8.11:_data
192.168.8.11:/iso             13G  7.3G   5.8G   56%  /rhev/data-center/mnt/glusterSD/192.168.8.11:_iso
As you can see, in the process of rebuilding the hard drive for ovirt2 I resized some things to give more space to data, where I desperately need it. If this goes well and the storage is given a clean bill of health, I will then take ovirt1 down and resize it to match ovirt2, and thus score a decent increase in storage for data. I fully realize that right now the Gluster-mounted volumes report the capacity of the smallest brick in the replica set.
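By a clean bill of health I mean roughly the following checks (a sketch; exact output varies by Gluster version):

  # pending heal entries should be zero on every brick
  gluster volume heal engine info
  gluster volume heal data info
  # brick-level size, free space, and inode counts as Gluster sees them
  gluster volume status data detail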
So, is this size reduction expected? A big part of me thinks data is missing, but I even went through and shut down ovirt2's Gluster daemons, wiped all the Gluster data, and restarted Gluster to allow it a fresh heal attempt, and it again came back to exactly the same size. This cluster was originally built around the time oVirt 4.0 came out and has been upgraded to current, so perhaps some newer Gluster features are making more efficient use of space (dedupe or something)?
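One sanity check I can think of (a sketch, not conclusive) to tell whether the difference is just block allocation, e.g. files laid down sparsely by the heal, rather than missing contents, is to compare logical size, on-disk usage, and file counts on each replica's brick:

  # run on each node and compare: logical bytes vs. blocks actually allocated
  du -sh --apparent-size /gluster/brick2
  du -sh /gluster/brick2
  # file counts should match across replicas (skip Gluster's internal .glusterfs)
  find /gluster/brick2 -path '*/.glusterfs' -prune -o -type f -print | wc -l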
Thank you for your assistance!
--Jim