
On Sun, Jul 3, 2016 at 5:57 AM, Kevin Hung <khung@nullaxiom.com> wrote:
Looks like there still needs to be some work done on oVirt 4.0 Node and ovirt-hosted-engine-setup before it's ready for general consumption. I have spent days trying to get this to work, and only got it running (on one host) after encountering 8 serious issues (7 below and the initial glusterfs one). I have not been able to successfully deploy a second host (see issue 7 below). I will be moving back to deploying hosts using CentOS (with either oVirt 4.0 or oVirt 3.6) as I need a working oVirt deployment up and running.
In case anyone is interested in reproducing the issues, I used the Node ISO here [1] and the latest (7/2/2016) engine appliance OVA here [2]. Those seem to be the "official" files as far as I can tell (which is difficult as the documentation is not clear).
List of issues:
1. The error I mentioned seems to be a problem with the code. I bypassed it by deleting /usr/libexec/vdsm/hooks/before_network_setup/50_fcoe.
2. ovirt-hosted-engine-setup is unable to connect to the vdsm service if the FQDN of the node is not resolvable (i.e. if a DNS server is not entered in the initial setup). This should be checked in either the initial oVirt Node setup process or the beginning of ovirt-hosted-engine-setup.
3. The management bridge does not get created properly when the server is set up with a manually configured DNS server and running NetworkManager (the default on Node). It seems a bug was filed for this back in 2014. [3]
4. Using cloud-init with default values to customize the engine appliance can fail on the line "Creating/refreshing DWH database schema" if it takes longer than 600 seconds to return output. This may apply to any other step that takes a long time to complete. The VM no longer appears to exist after the setup exits, so I am unable to debug.
600 seconds seems more than a reasonable time to create an empty DB; if a simple, short operation needs more than 10 minutes, there is probably something strange with the storage.
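One rough way to check whether the storage is the slow part is to time a handful of small synchronous writes on the mounted storage path. The sketch below is only an illustration: the function name, block size and write count are my own choices, and treating fsync latency as a stand-in for database schema creation is an assumption rather than something measured here.

```python
import os
import tempfile
import time


def time_synced_writes(directory, count=50, block_size=4096):
    """Write `count` blocks to a temp file in `directory`, calling fsync
    after each one, and return the elapsed time in seconds."""
    block = b"\0" * block_size
    # mkstemp creates the file inside the directory we want to measure.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.time()
        for _ in range(count):
            os.write(fd, block)
            os.fsync(fd)  # force each block out to the backing storage
        return time.time() - start
    finally:
        # Always clean up the scratch file, even if a write fails.
        os.close(fd)
        os.remove(path)
```

Calling time_synced_writes() on the directory where the hosted-engine storage is mounted, and again on a local disk, gives two numbers to compare. A large gap between them would point at the storage rather than at the setup itself.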
5. Without using cloud-init, the setup creates an engine VM that I cannot log into (it does not seem to use the engine admin password or a blank password).
Yes, the engine VM host name and its root password are configured via cloud-init and there is no default password. If you want to avoid using cloud-init, you have to reset the root password of the engine VM as you would for any el7 machine.
6. Destroying the VM (option 4) leaves the files intact on the shared storage so I cannot restart setup without deleting those first. This may be intentional, but the use of kvm terminology (destroy for power off) is not common, not to mention that "virsh -r list --all" does not list the VM anymore.
On failure, there is not just the engine VM disk but a whole hosted-engine storage domain, which also contains ancillary disks. Re-deploying over dirty storage is not supported, so please clean up the whole storage domain after a failure.
7. Unable to deploy second host through web UI (error "Failed to configure management network on host node2 due to setup networks failure.") or using
This is not hosted-engine specific: https://bugzilla.redhat.com/show_bug.cgi?id=1350763
ovirt-hosted-engine-setup (it looks like it can't connect to or doesn't start the broker service).
8. Random errors to stderr: "vcpu0 unhandled rdmsr" (this seems to be an
Are you running in a nested env?
upstream bug) and "multipath: error getting device" (this has been an issue for years with oVirt and seems to be due to multipathing being on by default even for systems where that does not apply).
[1] http://resources.ovirt.org/pub/ovirt-4.0/iso/ovirt-node-ng-installer/ovirt-n...
[2] http://jenkins.ovirt.org/view/All/job/ovirt-appliance_ovirt-4.0_build-artifa...
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1160423
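For issue 2 in the list above, a pre-flight check along the following lines could catch an unresolvable FQDN before setup tries to reach vdsm. This is only a sketch: fqdn_resolves() is a hypothetical helper, not something ovirt-hosted-engine-setup provides, and it uses nothing beyond the standard socket module.

```python
import socket


def fqdn_resolves(hostname):
    """Return True if `hostname` resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        # Raised when the resolver cannot find the name.
        return False
```

On the node, something like fqdn_resolves(socket.getfqdn()) would report whether the host's own name resolves with the DNS settings currently in place.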
On 7/1/2016 8:37 PM, Kevin Hung wrote:
It looks like I'm now getting an error when the deployment tries to configure the management bridge.
Setup log:
2016-07-01 20:29:47 INFO otopi.plugins.gr_he_common.network.bridge bridge._misc:372 Configuring the management bridge
2016-07-01 20:29:48 DEBUG otopi.plugins.gr_he_common.network.bridge bridge._misc:384 networks: {'ovirtmgmt': {'nic': 'eno1', 'ipaddr': u'192.168.1.211', 'netmask': u'255.255.255.0', 'bootproto': u'none', 'gateway': u'192.168.1.1', 'defaultRoute': True}}
2016-07-01 20:29:48 DEBUG otopi.plugins.gr_he_common.network.bridge bridge._misc:385 bonds: {}
2016-07-01 20:29:48 DEBUG otopi.plugins.gr_he_common.network.bridge bridge._misc:386 options: {'connectivityCheck': False}
2016-07-01 20:29:48 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/network/bridge.py", line 387, in _misc
    _setupNetworks(conn, networks, bonds, options)
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/network/bridge.py", line 405, in _setupNetworks
    'message: "%s"' % (networks, code, message))
RuntimeError: Failed to setup networks {'ovirtmgmt': {'nic': 'eno1', 'ipaddr': u'192.168.1.211', 'netmask': u'255.255.255.0', 'bootproto': u'none', 'gateway': u'192.168.1.1', 'defaultRoute': True}}. Error code: "78" message: "Hook error: Hook Error: ('Traceback (most recent call last):\n File "/usr/libexec/vdsm/hooks/before_network_setup/50_fcoe", line 18, in <module>\n from vdsm.netconfpersistence import RunningConfig\nImportError: No module named netconfpersistence\n',)"
2016-07-01 20:29:48 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Misc configuration': Failed to setup networks {'ovirtmgmt': {'nic': 'eno1', 'ipaddr': u'192.168.1.211', 'netmask': u'255.255.255.0', 'bootproto': u'none', 'gateway': u'192.168.1.1', 'defaultRoute': True}}. Error code: "78" message: "Hook error: Hook Error: ('Traceback (most recent call last):\n File "/usr/libexec/vdsm/hooks/before_network_setup/50_fcoe", line 18, in <module>\n from vdsm.netconfpersistence import RunningConfig\nImportError: No module named netconfpersistence\n',)"
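The traceback shows the 50_fcoe hook failing on "from vdsm.netconfpersistence import RunningConfig". A quick way to confirm that from the host's own Python is a small import probe like the one below. module_available() is a hypothetical helper written for illustration. It really imports the module, so any import-time side effects will run, and it should work on the python2.7 shown in the log as well as on newer versions.

```python
import importlib


def module_available(name):
    """Return True if the dotted module `name` can be imported."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        # Covers a missing package as well as a missing submodule.
        return False
```

Running module_available("vdsm.netconfpersistence") on the node would show whether the installed vdsm still ships the module the hook expects.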
On 7/1/2016 5:21 PM, Kevin Hung wrote:
Thank you Sahina, that was the issue. I upgraded my glusterfs server to 3.7.11 and I was able to continue with the deployment. I am seeing other issues with deployment, but I will look into those myself first. Bug has been logged [1].