Hello:

I've been involved in virtualization since its very early days, and I've been running Linux virtualization solutions off and on for a decade.  Previously, I was always frustrated that many Linux virtualization systems offered a long feature list but no reasonable way to manage it.  It seemed that I had to spend an inordinate amount of time doing everything by hand.  So when I found oVirt, I was ecstatic!  Unfortunately, around that time I changed employment (or rather, left employment and became self-employed), and didn't have any reason to build my own virt cluster... until now!

So I'm back with oVirt, and actually deploying a small 3-node cluster.  I intend to run on it:
VoIP Server
Web Server
Business backend server
UniFi management server
Monitoring server (zabbix)

Not a heavy load, and 3 servers is probably overkill, but I need this to work, and it sounds like 3 is the magic entry level for all the cluster/failover stuff to work.  For now, my intent is to use a single SSD on each node, with Gluster as the storage backend.  I figure that if all the failover stuff actually works and I lose a node to a disk failure, it's not the end of the world: I can rebuild it, reconnect Gluster, and restart everything.  As this is for a startup business, funds are thin at the moment, so I'm trying to cut a couple of corners that don't affect overall reliability.  If this side of the business grows, I would likely invest in some dedicated servers.

So far, I've based my efforts around this guide on oVirt's website:
http://www.ovirt.org/blog/2016/08/up-and-running-with-ovirt-4-0-and-gluster-storage/

My cluster is currently functioning, but not entirely correctly.  Some of it is gut feel, some of it is specific test cases (more to follow).  First, some areas that lacked clarity and the choices I made in them:

Early on, Jason talks about using a dedicated Gluster network for syncing the Gluster storage.  I liked that idea, and as I had 4 NICs on each machine, I thought dedicating one or two to Gluster would be fine.  So, on my clean, bare machines, I set up another network with private NICs and put it on a standalone switch.  On all three nodes, I added /etc/hosts entries for every node's private IP under a hostname with a designator (-g on the end).  Now each node can resolve itself and the other nodes both by the -g name (private IP) and by their main hostname (the "more public", but not public, IP).
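To make the naming scheme concrete, here is a minimal Python sketch that generates the kind of /etc/hosts lines I mean.  The node names, IP addresses, and the `hosts_entries` helper are hypothetical placeholders, not my actual values; only the -g suffix convention comes from my setup.

```python
# Hypothetical node names and addresses (placeholders, not real values):
# each node has a main IP and a private Gluster-network IP.
NODES = {
    "node1": ("192.168.1.11", "10.0.0.11"),
    "node2": ("192.168.1.12", "10.0.0.12"),
    "node3": ("192.168.1.13", "10.0.0.13"),
}


def hosts_entries(nodes, suffix="-g"):
    """Return /etc/hosts lines: the main hostname on the main IP and
    hostname + suffix on the private Gluster IP, for every node."""
    lines = []
    for name in sorted(nodes):
        main_ip, gluster_ip = nodes[name]
        lines.append(f"{main_ip}\t{name}")
        lines.append(f"{gluster_ip}\t{name}{suffix}")
    return lines


if __name__ == "__main__":
    # The same block goes into /etc/hosts on all three nodes.
    print("\n".join(hosts_entries(NODES)))
```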

Then, for gdeploy, I entered the -g hostnames, as I didn't see anywhere to tell Gluster to use the private network.  I think this is where I went wrong, but I didn't realize it until the end...

I set up the gdeploy script (it took a few attempts, and a few OS rebuilds, to get it just right...), ran it, and it was successful!  When it finished, I had a working Gluster cluster and the right software installed on each node!

I set up the engine on node1, that worked, and I was able to log in to the web GUI.  I mistakenly skipped enabling the Gluster service in the web GUI before rebooting the engine VM to complete the engine setup process, but I went back and did that after the reboot.  After doing that, the GUI notified me that there were additional nodes and asked whether I wanted to add them.  Initially, I skipped that and went back to the command line, as Jason suggests.  Unfortunately, his method could not find any other nodes, and it didn't work.  Combined with the warnings that I should not be using the command-line method and that it would be removed in the next release, that sent me back to the GUI to add the nodes there.

Here's where things appeared to go wrong... It showed me the two additional nodes, but ONLY by their -g (private Gluster) hostnames, and the SSH fingerprints were not populated, so it would not let me proceed.  After messing with this for a bit, I realized that the engine cannot reach the nodes via the Gluster interface (and as far as I knew, it shouldn't).  Working late at night, I let myself "hack it up" a bit: on the engine VM, I added /etc/hosts entries for the -g hostnames pointing to the nodes' main IPs.  It then populated the SSH host keys and let me add the nodes.  OK, so things appeared to be working... kinda.  At this point I noticed that ALL aspects of the GUI had become VERY slow.  Clicking into and typing in any field felt like ssh over a satellite link.  Everything felt a bit worse than the early days of vSphere... painfully slow.  But it was still working, so I pressed on.

I configured Gluster storage.  Eventually I was successful, but initially it would only let me add a "Data" storage domain; the drop-down menu did NOT contain ISO, Export, or anything else...  Somehow, after I left and re-entered that tab a few times, ISO and Export materialized in the menu on their own, so I was able to finish that setup.

OK, all looks good.  I wanted to try out his little tip on adding a VM, too.  I saw "ovirt-image-repository" in the "External Providers" section, but he mentioned it in the storage section.  It wasn't there on mine, and in External Providers I couldn't find any way to do anything useful with it.  I tried and fumbled with this, and I still have not figured out how to use this feature.  It would be nice...

Anyway, I moved on for now.  As I was skeptical that things were set up correctly, I tried putting node1 (which was running my engine, and was NOT set up with the -g hostname) into maintenance mode, to see whether it really did fail over smoothly.  It never went into maintenance mode (and I left it for 12 hours, too!).  I suspect it's because of the hostnames/networks in use.

Oh, I forgot to mention... I did follow the instructions in Jason's guide to set up the Gluster network in oVirt and map it to the right physical interface on all 3 nodes.  I also moved migration from the main network to the Gluster network, as Jason suggested.

So... how badly did I do?  How do I fix the issues?  (I'm not opposed to starting from scratch again, either... I already did that 3-4 times in the early phases of getting the gdeploy script down, and I already have kickstart files set up with a network environment... I was rebuilding that often!  I just need to know how to fix my setup this time...)

I greatly appreciate others' help and insight.  I'm also currently in the IRC channel as kusznir.

--Jim