[ovirt-users] New user intro & some questions
George Skorup
george at mwcomm.com
Fri Jan 30 02:13:30 UTC 2015
Hello oVirt Users Community,
I've been working with Red Hat, RHEL and its clones for about 11 years,
though I still consider myself an amateur, mostly because I'm more of a
networking guy. :) One-man IT department, so I get very little time to
tinker.
I'm evaluating oVirt (because the boss said no to VMware) and will
likely begin implementation soon to virtualize our datacenter. So I have
a SuperMicro Twin2 (4 nodes) system and a cheap managed L2+ switch to
use for now. Dual 6-core Xeons and 24GB per node. The two on-board
82574Ls are bonded with 802.3ad, no issues there (so far). I currently have
two 1TB WD RE4 SATA drives configured as RAID1 using the Intel RAID BIOS
in each node. I understand this is software RAID. That's all working
fine and I did this so that if a drive dies then I can still boot the
machine(s). I have a 500MB partition formatted as ext4 for /boot. A 48GB
ext4 for the root. 24GB for swap. And finally the rest (800-something
GB) is LVM and XFS for Gluster.
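For reference, the Gluster brick on each node was carved out roughly
like this (device, VG and mount names here are illustrative, not my
exact ones):

  # PV/VG/LV on the leftover partition of the Intel software RAID set
  pvcreate /dev/md126p4
  vgcreate vg_gluster /dev/md126p4
  lvcreate -l 100%FREE -n lv_brick1 vg_gluster
  # XFS with 512-byte inodes, per the usual Gluster brick recommendation
  mkfs.xfs -i size=512 /dev/vg_gluster/lv_brick1
  mkdir -p /gluster/brick1
  echo '/dev/vg_gluster/lv_brick1 /gluster/brick1 xfs defaults 0 0' >> /etc/fstab
  mount /gluster/brick1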
I've been following Jason Brooks' "Up and Running with oVirt" guides
(which are great, BTW!). I have the cluster up and running with CentOS 7
and oVirt 3.5, hosted-engine on CentOS 6.6 and CTDB to host a virtual IP
for the engine NFS mount. There are a couple test VMs running along with
the engine on various nodes. I found it interesting that I was able to
upload a ripped ISO of Win 2k3 Enterprise (pre-SP2) and boot it
successfully, after which I promptly installed SP2 and the oVirt guest
tools. I do very little with Windows, but there's always that one
remaining customer that needs IIS, and we're not about to buy a new
Windows Server 2012 license just for them.
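As for the CTDB piece mentioned above, the config is nothing fancy;
roughly this (the IPs and lock path are illustrative, not my real ones):

  # /etc/ctdb/nodes -- one internal IP per host, identical on all four nodes
  10.10.10.1
  10.10.10.2
  10.10.10.3
  10.10.10.4

  # /etc/ctdb/public_addresses -- the floating IP the engine NFS mount uses
  10.10.10.100/24 ovirtmgmt

  # /etc/sysconfig/ctdb -- lock file lives on the replicated Gluster volume
  CTDB_RECOVERY_LOCK=/gluster/lock/lockfile
  CTDB_NODES=/etc/ctdb/nodes
  CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses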
So anyway, I'm having a problem with node reboots. They simply will not
shut down and reboot cleanly. Instead, it looks like they hang after all
processes are shut down, or at least attempted to be shut down. Then
after a couple of minutes, the hardware watchdog resets the system. I've
come to the conclusion that sanlock and/or wdmd is causing the hang.
I'm guessing an active but non-responsive NFS mount is the culprit,
possibly the ISO domain NFS mount which is on the engine? I've tried
manually shutting down all oVirt, VDSM, etc. processes, unmounting all
NFS shares, but it seems sanlock still has a hold on something in
/rhev/.. I've Googled a bit and have come across posts about this as
well. Any tips here?
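For what it's worth, this is roughly the sequence I've been trying by
hand before rebooting a node (CentOS 7 service names; paths from memory,
so treat it as a sketch rather than my exact commands):

  # see which lockspaces/resources sanlock still thinks it holds
  sanlock client status
  # stop the oVirt/VDSM side first
  systemctl stop ovirt-ha-agent ovirt-ha-broker vdsmd supervdsmd
  # lazy-unmount the storage domain mounts
  umount -l /rhev/data-center/mnt/*
  # ask sanlock to drop everything and exit, then stop the watchdog daemon
  sanlock client shutdown -f 1
  systemctl stop wdmd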
Then I experienced something else odd yesterday. I did a yum update for
the glibc vulnerability stuff. Gluster was updated as well which really
threw a wrench into things because I wasn't paying attention and quorum
broke, etc. I got that fixed. Rebooted all nodes (which is when I found
the sanlock/watchdog problem). Nodes 2, 3 and 4 came back up, but node1
did not. I logged into the IPKVM console and found that it had no
network configuration. All /etc/sysconfig/network-scripts/ifcfg-* files
were gone. I was able to manually reconfigure the physical interfaces,
set the bonding back up and add the ovirtmgmt bridge. But then the
engine reported the host as non-operational due to '..does not comply
with cluster default networks... ovirtmgmt missing', which I was able to
resolve by reconfiguring the host's network config in the engine GUI,
and all is now well. I'm just curious how/why the ifcfg files were wiped
out? I haven't touched the network config on any hosts since running
hosted-engine --deploy.
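For the record, what I put back by hand on node1 looked roughly like
this (NIC names, IPs and the gateway are illustrative):

  # /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 the same, apart from DEVICE)
  DEVICE=eth0
  ONBOOT=yes
  SLAVE=yes
  MASTER=bond0
  NM_CONTROLLED=no

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  ONBOOT=yes
  BONDING_OPTS="mode=802.3ad miimon=100"
  BRIDGE=ovirtmgmt
  NM_CONTROLLED=no

  # /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
  DEVICE=ovirtmgmt
  TYPE=Bridge
  ONBOOT=yes
  BOOTPROTO=none
  IPADDR=10.10.10.1
  NETMASK=255.255.255.0
  GATEWAY=10.10.10.254
  NM_CONTROLLED=no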
Please forgive my ignorance and point me to the correct place if these
issues have been discussed and/or resolved already.
And overall I'm very much liking oVirt, especially as a viable and
cost-effective alternative to vSphere.
Thanks,
George