It looks like oVirt 4.0 Node and ovirt-hosted-engine-setup still need some
work before they are ready for general consumption. I have spent days trying
to get this to work, and only got it running (on one host) after encountering
8 serious issues (7 below and the initial glusterfs one). I have not been able
to successfully deploy a second host (see issue 7 below). I will be moving
back to deploying hosts using CentOS (with either oVirt 4.0 or oVirt 3.6), as
I need a working oVirt deployment up and running.
In case anyone is interested in reproducing the issues, I used the Node ISO
here [1] and the latest (7/2/2016) engine appliance OVA here [2]. Those seem
to be the "official" files as far as I can tell (which is hard to confirm, as
the documentation is not clear).
List of issues:
1. The error I mentioned seems to be a problem with the code. I bypassed it by
deleting /usr/libexec/vdsm/hooks/before_network_setup/50_fcoe (a hedged guess
at the underlying import problem is sketched after this list).
2. ovirt-hosted-engine-setup is unable to connect to the vdsm service if the
FQDN of the node is not resolvable (i.e. if a DNS server is not entered in the
initial setup). This should be checked in either the initial oVirt Node setup
process or at the beginning of ovirt-hosted-engine-setup (a minimal version of
such a check is sketched after this list).
3. The management bridge does not get created properly when the server is set
up with a manually configured DNS server and NetworkManager running (the
default on Node). It looks like a bug was filed for this back in 2014. [3]
4. Using cloud-init with default values to customize the engine appliance can
fail on the line "Creating/refreshing DWH database schema" if it takes longer
than 600 seconds to return output. This may apply to any other step that takes
a long time to complete. The VM no longer appears to exist after the setup
exits, so I am unable to debug further.
5. Without using cloud-init, the setup creates an engine VM that I cannot log
into (neither the engine admin password nor a blank password works).
6. Destroying the VM (option 4) leaves the files intact on the shared
storage so I cannot restart setup without deleting those first. This may
be intentional, but the use of kvm terminology (destroy for power off)
is not common, not to mention that "virsh -r list --all" does not list
the VM anymore.
7. Unable to deploy a second host, either through the web UI (error: "Failed
to configure management network on host node2 due to setup networks failure.")
or using ovirt-hosted-engine-setup (it looks like it either cannot connect to
the broker service or does not start it).
8. Random errors to stderr: "vcpu0 unhandled rdmsr" (this seems to be an
upstream bug) and "multipath: error getting device" (this has been an
issue for years with oVirt and seems to be due to multipathing being on
by default even for systems where that does not apply).
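Regarding issue 1, the hook failure (full log below) is an ImportError on
vdsm.netconfpersistence. My unverified guess is that the module moved to
vdsm.network.netconfpersistence in the vdsm shipped with oVirt 4.0 while the
50_fcoe hook still imports the old path. A minimal sketch of a tolerant
import, under that assumption (not a tested fix):

# Sketch only: assumes vdsm.netconfpersistence was relocated to
# vdsm.network.netconfpersistence in the vdsm packaged for oVirt 4.0.
try:
    # old path, which the shipped 50_fcoe hook tries (and fails) to import
    from vdsm.netconfpersistence import RunningConfig
except ImportError:
    # assumed new location in oVirt 4.0's vdsm
    from vdsm.network.netconfpersistence import RunningConfig

# presumably how the hook reads the running network configuration
running_config = RunningConfig()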
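Regarding issue 2, this is the kind of pre-flight check I have in mind, as a
minimal sketch using only the standard library; check_fqdn_resolves is a
hypothetical helper, not something that exists in Node or
ovirt-hosted-engine-setup:

import socket

def check_fqdn_resolves():
    # Abort early if the host's FQDN does not resolve, e.g. when no DNS
    # server was entered during the initial Node setup.
    fqdn = socket.getfqdn()
    try:
        socket.getaddrinfo(fqdn, None)
    except socket.gaierror as err:
        raise SystemExit(
            "FQDN %r does not resolve (%s); fix DNS or /etc/hosts before "
            "running ovirt-hosted-engine-setup" % (fqdn, err))
    return fqdn

if __name__ == "__main__":
    print("FQDN %s resolves, continuing" % check_fqdn_resolves())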
[1]
It looks like I'm now getting an error when the deployment tries to configure
the management bridge.
Setup log:
2016-07-01 20:29:47 INFO otopi.plugins.gr_he_common.network.bridge bridge._misc:372 Configuring the management bridge
2016-07-01 20:29:48 DEBUG otopi.plugins.gr_he_common.network.bridge bridge._misc:384 networks: {'ovirtmgmt': {'nic': 'eno1', 'ipaddr': u'192.168.1.211', 'netmask': u'255.255.255.0', 'bootproto': u'none', 'gateway': u'192.168.1.1', 'defaultRoute': True}}
2016-07-01 20:29:48 DEBUG otopi.plugins.gr_he_common.network.bridge bridge._misc:385 bonds: {}
2016-07-01 20:29:48 DEBUG otopi.plugins.gr_he_common.network.bridge bridge._misc:386 options: {'connectivityCheck': False}
2016-07-01 20:29:48 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/network/bridge.py", line 387, in _misc
    _setupNetworks(conn, networks, bonds, options)
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/network/bridge.py", line 405, in _setupNetworks
    'message: "%s"' % (networks, code, message))
RuntimeError: Failed to setup networks {'ovirtmgmt': {'nic': 'eno1', 'ipaddr': u'192.168.1.211', 'netmask': u'255.255.255.0', 'bootproto': u'none', 'gateway': u'192.168.1.1', 'defaultRoute': True}}. Error code: "78" message: "Hook error: Hook Error: ('Traceback (most recent call last):\n File "/usr/libexec/vdsm/hooks/before_network_setup/50_fcoe", line 18, in <module>\n from vdsm.netconfpersistence import RunningConfig\nImportError: No module named netconfpersistence\n',)"
2016-07-01 20:29:48 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Misc configuration': Failed to setup networks {'ovirtmgmt': {'nic': 'eno1', 'ipaddr': u'192.168.1.211', 'netmask': u'255.255.255.0', 'bootproto': u'none', 'gateway': u'192.168.1.1', 'defaultRoute': True}}. Error code: "78" message: "Hook error: Hook Error: ('Traceback (most recent call last):\n File "/usr/libexec/vdsm/hooks/before_network_setup/50_fcoe", line 18, in <module>\n from vdsm.netconfpersistence import RunningConfig\nImportError: No module named netconfpersistence\n',)"
On 7/1/2016 5:21 PM, Kevin Hung wrote:
> Thank you Sahina, that was the issue. I upgraded my glusterfs server
> to 3.7.11 and I was able to continue with the deployment. I am seeing
> other issues with deployment, but I will look into those myself
> first. Bug has been logged [1].
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1352165
>