On Thu, Jan 10, 2019 at 4:00 PM Sandro Bonazzola <sbonazzo@redhat.com> wrote:

The oVirt Project is pleased to announce the availability of the First Release Candidate of oVirt 4.3.0, as of January 10th, 2019


This is pre-release software. This pre-release should not be used in production.



Hi Sandro, thanks for this release!

I'm testing the deployment of an HCI single host using the oVirt Node NG CentOS 7 iso.

I was able to complete the gluster setup via cockpit with these modifications:

1) I wanted to check via ssh and found that the *key files under /etc/ssh/ had permissions that were too open, so the ssh daemon didn't start after installing the node from the iso.
Changing them to 600 and restarting the service fixed it.
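
A minimal sketch of the fix in step 1, assuming the affected files are exactly those matching /etc/ssh/*key (so the .pub files are left alone). The function name is hypothetical; it needs root on a real host and the ssh service still has to be restarted afterwards.

```python
import glob
import os
import stat


def tighten_key_permissions(directory="/etc/ssh", mode=0o600):
    """Set `mode` on every file in `directory` whose name ends in 'key'.

    Returns the list of paths whose permissions were changed.
    """
    changed = []
    for path in sorted(glob.glob(os.path.join(directory, "*key"))):
        current = stat.S_IMODE(os.stat(path).st_mode)
        if current != mode:
            os.chmod(path, mode)
            changed.append(path)
    return changed

# Usage on the host (as root): tighten_key_permissions()
```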

2) I used a single disk configured as JBOD, so I chose that option instead of the proposed default, RAID6.

But the playbook failed with
. . .
PLAY [gluster_servers] *********************************************************

TASK [Create LVs with specified size for the VGs] ******************************
changed: [192.168.124.211] => (item={u'lv': u'gluster_thinpool_sdb', u'size': u'50GB', u'extent': u'100%FREE', u'vg': u'gluster_vg_sdb'})

PLAY RECAP *********************************************************************
192.168.124.211            : ok=1    changed=1    unreachable=0    failed=0   
Ignoring errors...
Error: Section diskcount not found in the configuration file

Reading inside the playbooks involved here:
/usr/share/gdeploy/playbooks/auto_lvcreate_for_gluster.yml
/usr/share/gdeploy/playbooks/vgcreate.yml

and the snippet

  - name: Convert the logical volume
    lv: action=convert thinpool={{ item.vg }}/{{item.pool }}
        poolmetadata={{ item.vg }}/'metadata' poolmetadataspare=n
        vgname={{ item.vg }} disktype="{{disktype}}"
        diskcount="{{ diskcount }}"
        stripesize="{{stripesize}}"
        chunksize="{{ chunksize | default('') }}"
        snapshot_reserve="{{ snapshot_reserve }}"
    with_items: "{{ lvpools }}"
    ignore_errors: yes

I simply edited gdeploy.conf via the GUI button, adding this section under the [disktype] one:
"

[diskcount]
1

"
then cleaned up the lv/vg/pv, and the gdeploy step completed successfully.
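
The same edit can be sketched as plain text manipulation, assuming gdeploy.conf is made of [section] headers followed by bare value lines, as in the fragment above. The helper name and the sample file content in the usage note are hypothetical, not taken from the generated file.

```python
def add_section_after(conf_text, after, name, value):
    """Insert a '[name]' section holding `value` right after section `after`.

    Returns the text unchanged if `name` is already present and raises
    ValueError if `after` is not found.
    """
    lines = conf_text.splitlines()
    headers = [i for i, line in enumerate(lines)
               if line.strip().startswith("[") and line.strip().endswith("]")]
    names = [lines[i].strip()[1:-1] for i in headers]
    if name in names:
        return conf_text
    if after not in names:
        raise ValueError("section [%s] not found" % after)
    pos = names.index(after)
    # The section ends where the next header starts (or at end of file).
    end = headers[pos + 1] if pos + 1 < len(headers) else len(lines)
    body_end = end
    while body_end > headers[pos] + 1 and not lines[body_end - 1].strip():
        body_end -= 1  # skip blank lines that trail the section body
    new_lines = lines[:body_end] + ["", "[%s]" % name, value, ""] + lines[end:]
    return "\n".join(new_lines).rstrip("\n") + "\n"
```

For example, add_section_after(text, "disktype", "diskcount", "1") places the new section between [disktype] and whatever section follows it.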

3) At the first stage of the ansible deploy I get this failed command, which doesn't seem to prevent completion but which I haven't understood:

PLAY [gluster_servers] *********************************************************

TASK [Run a command in the shell] **********************************************
failed: [192.168.124.211] (item=vdsm-tool configure --force) => {"changed": true, "cmd": "vdsm-tool configure --force", "delta": "0:00:01.475528", "end": "2019-01-11 10:59:55.147601", "item": "vdsm-tool configure --force", "msg": "non-zero return code", "rc": 1, "start": "2019-01-11 10:59:53.672073", "stderr": "Traceback (most recent call last):\n  File \"/usr/bin/vdsm-tool\", line 220, in main\n    return tool_command[cmd][\"command\"](*args)\n  File \"/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py\", line 40, in wrapper\n    func(*args, **kwargs)\n  File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py\", line 143, in configure\n    _configure(c)\n  File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py\", line 90, in _configure\n    getattr(module, 'configure', lambda: None)()\n  File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurators/bond_defaults.py\", line 39, in configure\n    sysfs_options_mapper.dump_bonding_options()\n  File \"/usr/lib/python2.7/site-packages/vdsm/network/link/bond/sysfs_options_mapper.py\", line 48, in dump_bonding_options\n    with open(sysfs_options.BONDING_DEFAULTS, 'w') as f:\nIOError: [Errno 2] No such file or directory: '/var/run/vdsm/bonding-defaults.json'", "stderr_lines": ["Traceback (most recent call last):", "  File \"/usr/bin/vdsm-tool\", line 220, in main", "    return tool_command[cmd][\"command\"](*args)", "  File \"/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py\", line 40, in wrapper", "    func(*args, **kwargs)", "  File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py\", line 143, in configure", "    _configure(c)", "  File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py\", line 90, in _configure", "    getattr(module, 'configure', lambda: None)()", "  File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurators/bond_defaults.py\", line 39, in configure", "    sysfs_options_mapper.dump_bonding_options()", "  File 
\"/usr/lib/python2.7/site-packages/vdsm/network/link/bond/sysfs_options_mapper.py\", line 48, in dump_bonding_options", "    with open(sysfs_options.BONDING_DEFAULTS, 'w') as f:", "IOError: [Errno 2] No such file or directory: '/var/run/vdsm/bonding-defaults.json'"], "stdout": "\nChecking configuration status...\n\nabrt is already configured for vdsm\nlvm is configured for vdsm\nlibvirt is already configured for vdsm\nSUCCESS: ssl configured to true. No conflicts\nManual override for multipath.conf detected - preserving current configuration\nThis manual override for multipath.conf was based on downrevved template. You are strongly advised to contact your support representatives\n\nRunning configure...\nReconfiguration of abrt is done.\nReconfiguration of passwd is done.\nReconfiguration of libvirt is done.", "stdout_lines": ["", "Checking configuration status...", "", "abrt is already configured for vdsm", "lvm is configured for vdsm", "libvirt is already configured for vdsm", "SUCCESS: ssl configured to true. No conflicts", "Manual override for multipath.conf detected - preserving current configuration", "This manual override for multipath.conf was based on downrevved template. You are strongly advised to contact your support representatives", "", "Running configure...", "Reconfiguration of abrt is done.", "Reconfiguration of passwd is done.", "Reconfiguration of libvirt is done."]}
to retry, use: --limit @/tmp/tmpQXe2el/shell_cmd.retry

PLAY RECAP *********************************************************************
192.168.124.211            : ok=0    changed=0    unreachable=0    failed=1   

Would it be possible to save the ansible playbook log in some way even when the run completes ok, instead of going directly to the "successful" page?
Or is it stored anyway in some location on the host's disk?

I then proceeded with Hosted Engine install/setup and

4) it fails here, at the final stages of the local VM deploy:

[ INFO ] TASK [oVirt.hosted-engine-setup : Set Engine public key as authorized key without validating the TLS/SSL certificates]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Obtain SSO token using username/password credentials]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Ensure that the target datacenter is present]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Ensure that the target cluster is present in the target datacenter]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Enable GlusterFS at cluster level]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Set VLAN ID at datacenter level]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Force host-deploy in offline mode]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Add host]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Wait for the host to be up]

Looking at the log /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-20190111113227-ov4301.localdomain.local-5d387e0d.log
it seems the error is about ovirt-imageio-daemon:

2019-01-11 11:32:26,893+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service'), rc=1
2019-01-11 11:32:26,894+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service') stdout:
2019-01-11 11:32:26,895+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service') stderr:
Job for ovirt-imageio-daemon.service failed because the control process exited with error code. See "systemctl status ovirt-imageio-daemon.service" and "journalctl -xe" for details.
2019-01-11 11:32:26,896+0100 DEBUG otopi.context context._executeMethod:143 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-PBFI2dyoDO/pythonlib/otopi/context.py", line 133, in _executeMethod
    method['method']()
  File "/tmp/ovirt-PBFI2dyoDO/otopi-plugins/ovirt-host-deploy/vdsm/packages.py", line 175, in _start
    self.services.state('ovirt-imageio-daemon', True)
  File "/tmp/ovirt-PBFI2dyoDO/otopi-plugins/otopi/services/systemd.py", line 141, in state
    service=name,
RuntimeError: Failed to start service 'ovirt-imageio-daemon'
2019-01-11 11:32:26,898+0100 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Closing up': Failed to start service 'ovirt-imageio-daemon'
2019-01-11 11:32:26,899+0100 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND       **%EventEnd STAGE closeup METHOD otopi.plugins.ovirt_host_deploy.vdsm.packages.Plugin._start (odeploycons.packages.vdsm.started)

The reason:

[root@ov4301 ~]# systemctl status ovirt-imageio-daemon -l
● ovirt-imageio-daemon.service - oVirt ImageIO Daemon
   Loaded: loaded (/usr/lib/systemd/system/ovirt-imageio-daemon.service; disabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Fri 2019-01-11 11:32:29 CET; 27min ago
  Process: 11625 ExecStart=/usr/bin/ovirt-imageio-daemon (code=exited, status=1/FAILURE)
 Main PID: 11625 (code=exited, status=1/FAILURE)

Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: ovirt-imageio-daemon.service: main process exited, code=exited, status=1/FAILURE
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Failed to start oVirt ImageIO Daemon.
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Unit ovirt-imageio-daemon.service entered failed state.
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: ovirt-imageio-daemon.service failed.
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: ovirt-imageio-daemon.service holdoff time over, scheduling restart.
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Stopped oVirt ImageIO Daemon.
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: start request repeated too quickly for ovirt-imageio-daemon.service
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Failed to start oVirt ImageIO Daemon.
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Unit ovirt-imageio-daemon.service entered failed state.
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: ovirt-imageio-daemon.service failed.
[root@ov4301 ~]# 

The file /var/log/ovirt-imageio-daemon/daemon.log contains

2019-01-11 10:28:30,191 INFO    (MainThread) [server] Starting (pid=3702, version=1.4.6)
2019-01-11 10:28:30,229 ERROR   (MainThread) [server] Service failed (remote_service=<ovirt_imageio_daemon.server.RemoteService object at 0x7fea9dc88050>, local_service=<ovirt_imageio_daemon.server.LocalService object at 0x7fea9ca24850>, control_service=None, running=True)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 58, in main
    start(config)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 99, in start
    control_service = ControlService(config)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 206, in __init__
    config.tickets.socket, uhttp.UnixWSGIRequestHandler)
  File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
    self.server_bind()
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/uhttp.py", line 79, in server_bind
    self.socket.bind(self.server_address)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
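
Both this traceback and the vdsm-tool one in step 3 end with Errno 2 (ENOENT). When binding a Unix socket or opening a file for writing, ENOENT typically means the parent directory of the path is missing. The sketch below reproduces that behaviour in isolation; it is written for Python 3 (the host runs 2.7), the helper names are hypothetical, and the actual socket path is an assumption since the log does not show it.

```python
import errno
import os
import socket


def missing_parent(path):
    """Return the parent directory of `path` if it does not exist, else None."""
    parent = os.path.dirname(os.path.abspath(path))
    return None if os.path.isdir(parent) else parent


def try_bind(path):
    """Bind a Unix socket at `path`; return 0 on success or the errno on failure."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()
```

Calling missing_parent() on /var/run/vdsm/bonding-defaults.json, and on the ticket socket path from the daemon configuration, would show whether a missing directory explains both failures.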


One potential problem I noticed: on this host I set up eth0 with 192.168.122.x (for ovirtmgmt) and eth1 with 192.168.124.y (for gluster; there is only one host for now, but I aim to add 2 more hosts in a second step), and the libvirt network temporarily created for the local engine VM is also on the 192.168.124.0 network:

4: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:b8:6b:3c brd ff:ff:ff:ff:ff:ff
    inet 192.168.124.1/24 brd 192.168.124.255 scope global virbr0
       valid_lft forever preferred_lft forever
5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:b8:6b:3c brd ff:ff:ff:ff:ff:ff

I can change the gluster network of this env and re-test, but would it be possible to make the libvirt network configurable? It seems risky to have a fixed one...
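
The overlap described above can be checked with Python's standard ipaddress module. This is only a sketch: the function name is hypothetical, and in the usage note the eth0 address and the /24 prefix on eth1 are assumptions, since only 192.168.122.x and 192.168.124.y are given.

```python
import ipaddress


def overlapping(host_ifaces, candidate):
    """Return the names of interfaces whose network overlaps `candidate`.

    `host_ifaces` maps interface name to an 'address/prefix' string and
    `candidate` is the network (CIDR string) about to be created.
    """
    net = ipaddress.ip_network(candidate, strict=False)
    return [name for name, cidr in host_ifaces.items()
            if ipaddress.ip_interface(cidr).network.overlaps(net)]
```

With {"eth0": "192.168.122.10/24", "eth1": "192.168.124.211/24"} and the virbr0 address 192.168.124.1/24 as candidate, only eth1 is reported.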

Can I go ahead from this failed hosted engine deploy once I understand the reason for the ovirt-imageio-daemon failure, or am I forced to start from scratch?
Supposing I power this host down and then on again, how can I retry without starting from scratch?

Thanks,
Gianluca