On Thu, Dec 29, 2016 at 10:53 AM, Jim Kusznir <jim(a)palousetech.com> wrote:
I've been involved in virtualization from its very early days, and have been
running Linux virtualization solutions off and on for a decade.
Previously, I was always frustrated that many Linux virtualization systems
offered a long feature list but no reasonable way to manage it; it seemed
I had to spend an inordinate amount of time doing everything by hand.
Thus, when I found oVirt, I was ecstatic! Unfortunately, at that time I
changed employment (or rather, left employment and became self-employed),
and didn't have any reason to build my own virtualization infrastructure at the time.
So I'm back with oVirt, and actually deploying a small 3-node cluster. I
intend to run on it:
Business backend server
UniFi management server
Monitoring server (zabbix)
Not a heavy load, and 3 servers is probably overkill, but I need this to
work, and it sounds like 3 is the magic entry level for all the
cluster/failover stuff to work. For now, my intent is to use a single SSD
on each node with gluster for the storage backend. I figure that if all the
failover stuff is actually working, then losing a node to a disk failure isn't
the end of the world: I can rebuild it, reconnect gluster, and restart
everything. As this is for a startup business, funds are thin at the
moment, so I'm trying to cut a couple corners that don't affect overall
reliability. If this side of the business grows more, I would likely
invest in some dedicated servers.
Welcome back to oVirt :)
So far, I've based my efforts around this guide on oVirt's website:
My cluster is currently functioning, but not entirely correctly. Some of
it is gut feel, some of it is specific test cases (more to follow). First,
some areas that lacked clarity and the choices I made in them:
Early on, Jason talks about using a dedicated gluster network for the
gluster storage syncing. I liked that idea, and as I had 4 NICs on each
machine, I thought dedicating one or two to gluster would be fine. So, on
my clean, bare machines, I set up another network with private NICs and put
it on a standalone switch. I added hostnames with a designator (-g on the
end) for the private IPs of all three nodes into /etc/hosts on all three nodes, so
now each node can resolve itself and the other nodes on the -g name (and
private IP) as well as their main hostname and "more public" (but not
actually public) IP.
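Concretely, the /etc/hosts entries on each node look something like this
(names and addresses below are placeholders, not my real ones):

    # management network
    192.168.1.11   node1.example.com   node1
    192.168.1.12   node2.example.com   node2
    192.168.1.13   node3.example.com   node3
    # private gluster network (standalone switch)
    10.10.10.11    node1-g
    10.10.10.12    node2-g
    10.10.10.13    node3-g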
Then, for gdeploy, I put the hostnames in as the -g hostnames, as I didn't
see anywhere to tell gluster to use the private network. I think this is a
place I went wrong, but didn't realize it until the end....
The -g hostnames are the right ones to put in for gdeploy. gdeploy peer-probes
the cluster and creates the gluster volumes, so it needs the gluster-specific
IP addresses.
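So with a dedicated gluster network, the hosts section of the gdeploy config
just carries the gluster-side names, roughly like this (placeholder names,
rest of the config omitted):

    [hosts]
    node1-g
    node2-g
    node3-g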
I set up the gdeploy script (it took a few times, and a few OS rebuilds to
get it just right...), and ran it, and it was successful! When complete, I
had a working gluster cluster and the right software installed on each node!
Were these errors specific to the gdeploy configuration? With the latest
release of gdeploy, there's an option "skip_<section-name>_errors" that
could help avoid the OS rebuilds, I think.
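I don't remember the exact syntax off-hand, but it goes inside the section
whose failures you want to tolerate, so something along these lines for the
volume section (treat this as a sketch, not verified):

    [volume]
    action=create
    volname=engine
    skip_volume_errors=yes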
I set up the engine on node1, and that worked, and I was able to log in to
the web GUI. I mistakenly skipped enabling the gluster service in the web GUI
before rebooting the engine VM to complete the engine setup process, but
I did go back in after the reboot and do that. After doing that, I was
notified in the GUI that there were additional nodes and asked whether I
wanted to add them. Initially, I skipped that and went back to the command
line as Jason suggests. Unfortunately, his method could not find any other
nodes, so it didn't work. Combined with the warnings that I should not be
using the command-line method and that it would be removed in the next
release, I went back to the GUI and attempted to add the nodes that way.
Here's where things appeared to go wrong... It showed me two additional
nodes, but ONLY by their -g (private gluster) hostname, and the SSH
fingerprints were not populated, so it would not let me proceed. After
messing with this for a bit, I realized that the engine cannot reach the
nodes via the gluster interface (and as far as I knew, it shouldn't).
Working late at night, I let myself "hack it up" a bit, and on the engine
VM I added /etc/hosts entries for the -g hostnames pointing to the main
IPs. It then populated the SSH host keys and let me add the nodes. OK, so
things appear to be working... kinda. I noticed at this point that ALL
aspects of the GUI became VERY slow: clicking in and typing in any field
felt like I was on SSH over a satellite link. Everything felt a bit worse
than the early days of vSphere... painfully slow. But it was still
working, so I pressed on.
The Import Host flow lists the peers as gluster understands them, hence the
-g (private gluster) hostnames. Rather than importing the hosts, you should
add the additional hosts using the Add Host flow, and specify the non "-g"
hostname. This ensures that oVirt knows each host by its non-private
hostname. Once the hosts are added, mark the gluster network on the gluster
interface so that the bricks are correctly identified via the -g hostname.
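Once that's done, a quick way to verify is to check from any node that the
peers and bricks are indeed listed by the -g names (volume name and brick
paths below are just an example; output trimmed):

    # gluster peer status
    Hostname: node2-g
    State: Peer in Cluster (Connected)

    # gluster volume info engine
    ...
    Brick1: node1-g:/gluster_bricks/engine/engine
    Brick2: node2-g:/gluster_bricks/engine/engine
    Brick3: node3-g:/gluster_bricks/engine/engine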
I configured gluster storage. Eventually I was successful, but
it would only let me add a "Data" storage domain; the drop-down menu did
NOT contain ISO, Export, or anything else... Somehow, after leaving and
re-entering that tab a few times, ISO and Export materialized in the menu
on their own, so I was able to finish that setup.
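For context, the path for a GlusterFS data domain takes the form
<server>:/<volume>. A sketch with placeholder names, plus the optional
backup-volfile-servers mount option so the mount can fall back to another
node if the first one is down:

    Path:          node1.example.com:/data
    Mount options: backup-volfile-servers=node2.example.com:node3.example.com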
OK, all looks good. I wanted to try out his little tip on adding a VM,
too. I saw "ovirt-image-repository" under "External Providers",
but he mentioned it in the storage section. It wasn't there on mine, and
under External Providers I couldn't find any way to do anything useful. I
tried and fumbled with this, and still I have not figured out how to use
this feature. It would be nice...
Anyway, I moved on for now. As I was skeptical that things were set up
correctly, I tried putting node 1 (which was running my engine, and was NOT
set up with the -g hostname) into maintenance mode, to see if things really
did fail over smoothly. It failed to go into maintenance mode (I left it for
12 hours, too!). I suspect it's because of the hostnames/networks in use.
Oh, I forgot to mention... I did follow the instructions in Jason's guide
to set up the gluster network in oVirt and map it to the right physical
interface on all 3 nodes. I also moved migration from the main network to
the gluster network, as Jason had suggested.
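A hedged aside for anyone hitting the same wall: the host running the hosted
engine can't finish entering maintenance until the engine VM migrates off it,
and with migration moved onto the gluster network that path is worth
checking. The engine/agent state can be inspected from any node, e.g.:

    # hosted-engine state as seen by each HA agent
    hosted-engine --vm-status
    # confirm the gluster peers are healthy over the private network
    gluster peer status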
So...How badly did I do? How do I fix the issues? (I'm not opposed to
starting from scratch again, either...I've already done that 3-4 times in
the early phases of getting the gdeploy script down, and I already have
kickstart files set up with a network environment...I was rebuilding that
often! I just need to know how to fix my setup this time....)
I do greatly appreciate others' help and insight. I am in the IRC channel
under kusznir currently, too.