Thanks for the response!

On Sat, Sep 29, 2012 at 8:21 AM, Dave Neary <dneary@redhat.com> wrote:
Hi,


On 09/29/2012 01:37 PM, Hans Lellelid wrote:
I apologize in advance that this email is less about a specific
problem and more a general inquiry as to the most recommended /
likely-to-be-successful path.

Having just gone through the process, I hope I can help a little! You might want to check (and add to) the Troubleshooting page where I documented the various hiccups I had, and how I addressed them:

http://wiki.ovirt.org/wiki/Troubleshooting

There's also "Node Troubleshooting" and "Troubleshooting NFS Storage Issues" which might help you: http://wiki.ovirt.org/wiki/Node_Troubleshooting and http://wiki.ovirt.org/wiki/Troubleshooting_NFS_Storage_Issues

Also Jason Brooks's "Up and running with oVirt 3.1" article is useful I think: http://blog.jebpages.com/archives/up-and-running-with-ovirt-3-1-edition/



I have read a few of those resources, but not the main "Troubleshooting" page, so I will scour the wiki to see if something might help me out.
 
2nd attempt: I re-installed the nodes as Fedora 17 boxes and
downgraded the kernels to 3.4.6-2.  Then I connected these from the
Engine (specifying the root pw) and watched the logs while things
installed.  After reboot neither of the servers was reachable.
Sitting in front of the console, I realized that networking was
refusing to start; several errors printed to the console looked like:

When you say that they are not reachable, what do you mean? By default, installing F17 as a node sets the iptables settings to:
<snip>

I mean that the network interfaces cannot be brought up; it is not an iptables issue.  Sitting (well, standing, they're rack-mounted) in front of the servers yields the multipath errors I mentioned when trying to start networking.  What I started doing (and will likely continue to pursue) is putting /etc under source control and combing through the changes that are introduced when I do the remote setup from the engine, to see if I can pick apart where it's going south.
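
A minimal sketch of that before/after comparison, in case it helps anyone else.  It is not real source control: it only hashes every regular file under a directory and reports what was added, removed or modified between two snapshots.  The function names are my own, and running it against /etc assumes root (unreadable files are simply skipped).

```python
import hashlib
import os


def snapshot(root):
    """Map each regular file under root to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets and other special files
            try:
                with open(path, "rb") as handle:
                    digest = hashlib.sha256(handle.read()).hexdigest()
            except OSError:
                continue  # unreadable file; ignored in this sketch
            digests[os.path.relpath(path, root)] = digest
    return digests


def diff_snapshots(before, after):
    """Return sorted lists of added, removed and modified relative paths."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(
        path for path in set(before) & set(after) if before[path] != after[path]
    )
    return added, removed, modified
```

The idea would be to call snapshot("/etc") once before adding the host from the engine and once after the install finishes (from the console, if networking is down), then pass both results to diff_snapshots() to get the list of files worth reading.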
 
So if you're trying to ping the nodes, you should see nothing, but ssh, snmp and vdsm should be available. If you have local console access to the nodes, you should check the iptables config.
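
Since ping is not a useful test here, one way to check from another machine is to try a plain TCP connection to each service.  This is only a sketch: the helper name is made up, 22 is the standard ssh port, and 54321 for vdsm is an assumption that should be checked against the node's own configuration.  snmp is normally UDP, so a TCP connect test does not cover it.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Ports to probe on a node; the vdsm port is an assumption, see above.
NODE_PORTS = {"ssh": 22, "vdsm": 54321}


def check_node(host, ports=NODE_PORTS):
    """Return a dict mapping each service name to True/False reachability."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling check_node() with the node's address returns something like a dict of service name to True or False.  If both come back False while the node is up, that points at the network itself rather than at the firewall rules.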

I don't understand why you would lose your network connection entirely, though. I don't think that the network config for the nodes is changed by the installer.


Yeah, it's definitely changed by the installer.  The installer sets up the ovirt-bridge (I think that is what it was called) and changes the primary interfaces to reference the bridge, etc.  I didn't see anything obviously wrong with the setup, but clearly it was not working.  (I also didn't know exactly what I was starting from, so that is my mistake and I should be able to approach the next time with more confidence.)  I did the bridge setup manually myself for attempt #3 and didn't have any problems.


3rd attempt: I re-installed the nodes with Fedora 17 and attempted to
install VDSM manually by RPM.  Despite following the instructions to
turn off ssl (ssl=false in /etc/vdsm/vdsm.conf), I am seeing SSL
"unknown cert" errors from the python socket server with every attempt
of the engine to talk to the node.
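
One sanity check that might be worth doing here is confirming that the ssl setting is actually being read from the place it was written.  The sketch below uses Python's standard configparser and rests on two assumptions: that the option lives in a [vars] section of vdsm.conf, and that ssl is on when the option is absent.  The function name is hypothetical and this is not how vdsm itself loads its config.

```python
import configparser


def vdsm_ssl_enabled(path="/etc/vdsm/vdsm.conf", section="vars", default=True):
    """Return the boolean value of the ssl option, or default if it is absent.

    The section name and the default are assumptions, not taken from vdsm.
    """
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently skipped by configparser
    return parser.getboolean(section, "ssl", fallback=default)
```

If this returns True after editing the file, the line probably ended up in a different section than the one being read.  With configparser, an option line that sits before any section header raises MissingSectionHeaderError, which is also a useful hint.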

Hopefully the "Node Troubleshooting" page (or somebody else) can help you here, I'm afraid I can't.


The Fedora-17-installed-by-engine sounds good, but there's a lot of
magic there & it obviously completely broke my systems.  Is that where I
should focus my efforts?  Should I ditch NFS storage and just try to
get something working with local-only storage on the nodes?  (Shared
storage would be a primary motivation for moving to ovirt, though.)

I would focus on this approach, and would continue to aim to use NFS storage. It works fine as long as you are on the 3.4.x kernels.


I am very excited for this to work for me someday.  I think it has
been frustrating to have such sparse (or outdated?) documentation and
such fundamental problems/bugs/configuration challenges.  I'm using
pretty standard (Dell) commodity servers (SATA drives, simple RAID
setups, etc.).

The "Quick Start Guide" was useful to me, as long as everything went well: http://wiki.ovirt.org/wiki/Quick_Start_Guide

Hope some of that is helpful!


I will take a look at that guide -- not sure if I've read that one yet.  I will follow back up with what I learn / what works so it might help others.  If there's a way that I can update the wiki to help those in my specific predicament, I will do that too.  (It's possible there is something about my [Dell] hardware that is not compatible with oVirt's default installer, etc.)

Thanks,
Hans