[Users] New user to oVirt, and I haz a sad so far...
Will Dennis (Live.com)
willarddennis at live.com
Thu Jan 16 23:08:12 EST 2014
Hi all, ready for a story? (well, more of a rant, but hopefully it will be a
good UX tale, and may even be entertaining.)
Had one of the groups come to me at work this week and request an OpenStack
setup. When I sat down and discussed their needs, it turns out that they
really only need a multi-hypervisor setup where they can spin up VMs for
their research projects. The VMs should be fairly long-lived, and will have
persistent storage. Their other request is that the storage should be local
on the hypervisor nodes (they plan to use Intel servers with 8-10 2TB drives
for VM storage on each node.) They desire this in order to keep the VM I/O
local - they do not have a SAN of any sort anyhow, and they do not care
about live migration, etc.
In any case, knowing that they did not want to pay for a VMware setup (which
is what I'm used to using), I proposed using oVirt to fill their needs,
having heard and read up on it a bit (It's "open-source VMware", right?)
even though I had not used it before (I have however made single-node KVM
hypervisors for their group before, utilizing Open vSwitch, libvirt,
virt-manager etc., so I'm not completely ignorant of KVM/libvirt etc.)
In any case, I took one of their older servers which was already running
CentOS 6.5, installed the requisite packages on it, and in short order had
an engine server up and running (oVirt 3.3.2). That seems to have been the
easy part :-/ Now came the installation of a hypervisor node. I downloaded
and burned an ISO of the latest oVirt node installer
(ovirt-node-iso-3.0.3-1.1.vdsm.fc19.iso) and tried to install it on one of
their target Intel servers. On the 1st try I got to the end of the setup
TUI, invoked the Install link, and was promptly thrown an error (sorry, but
forgot what it was, something like "press X for a command prompt, or
Reboot".) No problem, I rebooted, selected booting off the CD again, waited
until the TUI came up, and when I tried to move past the first screen, it
threw me out to a login prompt. OK, enough of that (the server takes a long
time to reboot, and then boot off the CD) - I then thought I would try it on
a VMware Workstation VM (yes, I get the irony, but VMware wkstn can handle
nested virt, so it's a great testbed platform for OpenStack, etc.) because
that would install a heck of a lot faster. That went a lot better - got the
oVirt node 3.0.3 installed on the first try.
More pain was soon to follow, however. I logged in and started configuring
the node. The TUI was easy enough - much like an ESXi node ;) I set the NIC
to IPv4 static, entered in the correct IP info, registered a DNS name for
the IP I had assigned, and then tested pinging the engine, all was good. I
then moved on to the section where you define the engine. I entered in the
FQDN of the engine, verified the key fingerprint, and clicked the "Save and
Register" link at the bottom. That seemed to work, so I completed the rest
of the TUI, and then looked at the oVirt engine web UI. There was my new
node, ready for authorization. I clicked the link to authorize it, and after
a while, the UI came back with "Install Failed" status. Hmmm. So I went back
to the node's TUI, and now some of the screens said that the IP addr was
unconfigured? I went then to the Network screen, and sure enough, the NIC at
the bottom showed "Unconfigured". WTF? So I went and entered in the correct
info back in the IPv4 section, and then arrowed down to the Save link and
clicked it - and the next screen said something like "No info needing
changes, nothing to do." Whaaaa? Went back to the network setup screen, NIC
still showing "Unconfigured" even though the IPv4 info still was there. I
did a ping test at this point from the Ping link on the network setup page,
and what do you know - I could still ping IPs (the engine, the default gw,
etc.) But as I moved around the TUI, other screens still said that the
network was unconfigured. Went back to the Web UI of the engine, put the
host in Maint, then tried to Activate it, still no go - Install Failed. Even
though I had configured the node to allow remote access and set a password,
and also verified via nmap that TCP port 22 on the node was indeed
listening, when I tried to SSH into the node as admin, I immediately got a
"connection closed" message, so that failed as well. Went back to the node's
network setup page, set the IPv4 to "Disabled", saved it, then went back and
set it back to "Static" then re-entered the IPv4 info. Clicked the Save
link, it went thru the setup again, came back with a success, verified with
ping etc. that networking was working on the node. The engine web UI still
said that it could not connect to the node however. So I put the node in
Maint, and then removed it. I went back to the node, went to the Engine
setup page, and re-did the screen to define the engine on the node. I noticed,
however, that after I did this the node screens went back to saying
that the network was unconfigured. Grrrrrr. But the node was back in the
engine's Web UI, however no joy this time either - "Install failed" again.
Well, the hell with this, said I - I removed the node again from the engine,
and went and installed Fedora 19 minimal install on the VM, so I could use
the directions found in
http://www.ovirt.org/Quick_Start_Guide#Install_Fedora_Host and give that a
try. (At least I can see what's going on with the node's OS using F19.)
Installing F19 on the VM was a breeze, then logged in as root and did the
"yum localinstall http://ovirt.org/releases/ovirt-release-fedora.noarch.rpm"
which ran fine. I then stopped firewalld, and set it not to run at boot
(which is what we typically do anyhow for internal research servers.) Then
went over to the engine UI, and manually added the node. Oh happy day - the
node seemed to install OK - it had the status of "Installing" for quite some
time, and I looked at the processes on the F19 node, and could see python
installer programs running via an SSH session from the engine. HOWEVER, at
the end of the process, the Web UI reported a status of "Non Responsive",
even though the F19 node looked OK (it had sanlock, supervdsmServer, vdsm
processes running.) So thinking that it may take an after-install reboot, I
did that, waited until the node came back up again, then clicked in the
engine web UI and executed the "Confirm 'Host has been rebooted'" command,
but still no good - the node remains in "Non Responsive" status.
So I have no idea on how to proceed now, and what methods I can use to try
and debug the connectivity problem between the engine and node. And there are
many miles to go in setting up the whole environment. Maaaaaybe OpenStack
would be easier ;-P No, I will press on and try to get this thing
working... It seems it works for others, and looks like the right fit for
the job. Just wish it was easier to get up and running.
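One possible first step for this kind of engine-to-node debugging is a plain
TCP reachability check run from the engine box. The sketch below is only
that, a sketch: it assumes VDSM is listening on its default management port
(TCP 54321) alongside SSH on port 22, and the function names are made up for
illustration. It says nothing about whether the SSL handshake or VDSM itself
is healthy, only whether the ports accept a connection.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

def check_node(host, ports=(22, 54321)):
    """Map each port to whether it accepted a TCP connection.

    The defaults are an assumption: 22 for SSH, 54321 for VDSM.
    """
    return {port: port_open(host, port) for port in ports}
```

Calling check_node() with the node's FQDN from the engine host returns a
dict such as {22: True, 54321: False}, which at least narrows down whether
the "Non Responsive" status is a network/firewall issue or something further
up the stack.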
Oh yes, I tried reading the docs - go to
http://www.ovirt.org/Quick_Start_Guide#Install_Hosts and click on the link
for "see the oVirt Node deployment documentation." Not too helpful... (Bug
report has been opened.)
Thanks for reading, and any clues on getting a node up and running
gratefully accepted...
- Will