On Thu, Jan 10, 2019 at 4:00 PM Sandro Bonazzola <sbonazzo(a)redhat.com> wrote:
The oVirt Project is pleased to announce the availability of the First Release Candidate of oVirt 4.3.0, as of January 10th, 2019.
This is pre-release software. This pre-release should not be used in production.
Hi Sandro, thanks for this release!
I'm testing a single-host HCI deployment using the oVirt Node NG CentOS 7 ISO.
I was able to complete the Gluster setup via Cockpit with these modifications:
1) I wanted to check things via ssh and found that the *key files under /etc/ssh/ had overly permissive modes, so the ssh daemon didn't start after installing the node from the ISO.
Changing them to 600 and restarting the service fixed it.
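For anyone hitting the same sshd problem, the permission fix can be sketched in Python. This is only a sketch: it assumes the affected files are the private host keys matched by /etc/ssh/*key (as described above) and that 600 is the wanted mode; the function name is hypothetical. It would have to run as root, followed by a restart of sshd (e.g. systemctl restart sshd).

```python
import glob
import os
import stat


def tighten_host_key_modes(ssh_dir="/etc/ssh", mode=0o600):
    """Set private host keys (files matching *key) in ssh_dir to mode.

    Returns the paths whose mode was actually changed. Public keys
    (*.pub) do not match the *key pattern, so they are left alone.
    """
    changed = []
    for path in sorted(glob.glob(os.path.join(ssh_dir, "*key"))):
        current = stat.S_IMODE(os.stat(path).st_mode)
        if current != mode:
            os.chmod(path, mode)
            changed.append(path)
    return changed
```

Returning only the changed paths makes a second run a no-op, so it is safe to repeat before retrying the deploy.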
2) I used a single disk configured as JBOD, so I chose that option instead of the proposed default, RAID6.
But the playbook failed with:
. . .
PLAY [gluster_servers] *********************************************************
TASK [Create LVs with specified size for the VGs] ******************************
changed: [192.168.124.211] => (item={u'lv': u'gluster_thinpool_sdb', u'size': u'50GB', u'extent': u'100%FREE', u'vg': u'gluster_vg_sdb'})
PLAY RECAP *********************************************************************
192.168.124.211 : ok=1 changed=1 unreachable=0 failed=0
Ignoring errors...
Error: Section diskcount not found in the configuration file
Reading through the playbooks involved here:
/usr/share/gdeploy/playbooks/auto_lvcreate_for_gluster.yml
/usr/share/gdeploy/playbooks/vgcreate.yml
I found this snippet, where diskcount is passed with no default (unlike chunksize):
- name: Convert the logical volume
  lv: action=convert thinpool={{ item.vg }}/{{item.pool }}
      poolmetadata={{ item.vg }}/'metadata' poolmetadataspare=n
      vgname={{ item.vg }} disktype="{{disktype}}"
      diskcount="{{ diskcount }}"
      stripesize="{{stripesize}}"
      chunksize="{{ chunksize | default('') }}"
      snapshot_reserve="{{ snapshot_reserve }}"
  with_items: "{{ lvpools }}"
  ignore_errors: yes
I simply edited gdeploy.conf via the GUI button, adding this section under the [disktype] one:
[diskcount]
1
Then I cleaned up the LV/VG/PV and the gdeploy step completed successfully.
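The same workaround can be applied to the generated config text with a small helper. This is a hypothetical sketch, not part of gdeploy: it works on the config as plain text, assumes the bare-value section format shown above, and inserts [diskcount] only when it is missing, right after the [disktype] section (or at the end if there is no following section).

```python
import re

HEADER = re.compile(r"^\[.+\]\s*$")


def ensure_diskcount(conf_text, count=1, after_section="disktype"):
    """Return conf_text with a [diskcount] section added if it is missing."""
    if re.search(r"^\[diskcount\]\s*$", conf_text, re.MULTILINE):
        return conf_text  # already present: leave the config untouched
    block = "[diskcount]\n{}\n\n".format(count)
    out, in_target, inserted = [], False, False
    for line in conf_text.splitlines(keepends=True):
        is_header = HEADER.match(line) is not None
        # The next header after [after_section] marks where to insert.
        if is_header and in_target and not inserted:
            out.append(block)
            inserted = True
        if is_header:
            in_target = line.strip() == "[{}]".format(after_section)
        out.append(line)
    if not inserted:
        # Target was the last section (or absent): append at the end.
        if out and not out[-1].endswith("\n"):
            out.append("\n")
        out.append(block)
    return "".join(out)
```

Working on text rather than through configparser keeps the bare "1" value and the original layout exactly as gdeploy wrote them.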
3) At the first stage of the Ansible deploy I got this failed command. It doesn't seem to prevent completion, but I don't understand it:
PLAY [gluster_servers] *********************************************************
TASK [Run a command in the shell] **********************************************
failed: [192.168.124.211] (item=vdsm-tool configure --force) => {"changed": true, "cmd": "vdsm-tool configure --force", "delta": "0:00:01.475528", "end": "2019-01-11 10:59:55.147601", "item": "vdsm-tool configure --force", "msg": "non-zero return code", "rc": 1, "start": "2019-01-11 10:59:53.672073",
"stderr": "Traceback (most recent call last):\n File \"/usr/bin/vdsm-tool\", line 220, in main\n return tool_command[cmd][\"command\"](*args)\n File \"/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py\", line 40, in wrapper\n func(*args, **kwargs)\n File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py\", line 143, in configure\n _configure(c)\n File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py\", line 90, in _configure\n getattr(module, 'configure', lambda: None)()\n File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurators/bond_defaults.py\", line 39, in configure\n sysfs_options_mapper.dump_bonding_options()\n File \"/usr/lib/python2.7/site-packages/vdsm/network/link/bond/sysfs_options_mapper.py\", line 48, in dump_bonding_options\n with open(sysfs_options.BONDING_DEFAULTS, 'w') as f:\nIOError: [Errno 2] No such file or directory: '/var/run/vdsm/bonding-defaults.json'",
"stderr_lines": ["Traceback (most recent call last):", " File \"/usr/bin/vdsm-tool\", line 220, in main", " return tool_command[cmd][\"command\"](*args)", " File \"/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py\", line 40, in wrapper", " func(*args, **kwargs)", " File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py\", line 143, in configure", " _configure(c)", " File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py\", line 90, in _configure", " getattr(module, 'configure', lambda: None)()", " File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurators/bond_defaults.py\", line 39, in configure", " sysfs_options_mapper.dump_bonding_options()", " File \"/usr/lib/python2.7/site-packages/vdsm/network/link/bond/sysfs_options_mapper.py\", line 48, in dump_bonding_options", " with open(sysfs_options.BONDING_DEFAULTS, 'w') as f:", "IOError: [Errno 2] No such file or directory: '/var/run/vdsm/bonding-defaults.json'"],
"stdout": "\nChecking configuration status...\n\nabrt is already configured for vdsm\nlvm is configured for vdsm\nlibvirt is already configured for vdsm\nSUCCESS: ssl configured to true. No conflicts\nManual override for multipath.conf detected - preserving current configuration\nThis manual override for multipath.conf was based on downrevved template. You are strongly advised to contact your support representatives\n\nRunning configure...\nReconfiguration of abrt is done.\nReconfiguration of passwd is done.\nReconfiguration of libvirt is done.",
"stdout_lines": ["", "Checking configuration status...", "", "abrt is already configured for vdsm", "lvm is configured for vdsm", "libvirt is already configured for vdsm", "SUCCESS: ssl configured to true. No conflicts", "Manual override for multipath.conf detected - preserving current configuration", "This manual override for multipath.conf was based on downrevved template. You are strongly advised to contact your support representatives", "", "Running configure...", "Reconfiguration of abrt is done.", "Reconfiguration of passwd is done.", "Reconfiguration of libvirt is done."]}
to retry, use: --limit @/tmp/tmpQXe2el/shell_cmd.retry
PLAY RECAP *********************************************************************
192.168.124.211 : ok=0 changed=0 unreachable=0 failed=1
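One possible reading of this traceback, assuming the cause is simply that /var/run/vdsm did not exist yet when bond_defaults ran: opening a file in 'w' mode creates the file but not its directory, so a missing parent directory produces exactly [Errno 2]. The sketch below reproduces that behaviour and shows that creating the directory first avoids it. It is Python 3 (where the error surfaces as FileNotFoundError instead of Python 2.7's IOError), and the helper name is hypothetical, not a vdsm function.

```python
import os


def write_text_file(path, payload, create_parent=False):
    """Write payload to path, optionally creating its directory first.

    With create_parent=False a missing parent directory makes open()
    raise FileNotFoundError (errno 2), even though 'w' mode creates the
    file itself.
    """
    if create_parent:
        os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(payload)
```

If that assumption holds, checking that /var/run/vdsm exists before re-running vdsm-tool configure --force would confirm or rule it out.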
Would it be possible to save the Ansible playbook log somehow, even when it completes OK, instead of going directly to the "successful" page?
Or is it stored somewhere on the host's disk anyway?
4) I then proceeded with the Hosted Engine install/setup, and it fails here, in the final stages of the local VM deploy:
[ INFO ] TASK [oVirt.hosted-engine-setup : Set Engine public key as authorized key without validating the TLS/SSL certificates]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Obtain SSO token using username/password credentials]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Ensure that the target datacenter is present]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Ensure that the target cluster is present in the target datacenter]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Enable GlusterFS at cluster level]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Set VLAN ID at datacenter level]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Force host-deploy in offline mode]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Add host]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Wait for the host to be up]
Looking at the log
/var/log/ovirt-engine/host-deploy/ovirt-host-deploy-20190111113227-ov4301.localdomain.local-5d387e0d.log
the error seems to be about ovirt-imageio-daemon:
2019-01-11 11:32:26,893+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service'), rc=1
2019-01-11 11:32:26,894+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service') stdout:
2019-01-11 11:32:26,895+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service') stderr:
Job for ovirt-imageio-daemon.service failed because the control process exited with error code. See "systemctl status ovirt-imageio-daemon.service" and "journalctl -xe" for details.
2019-01-11 11:32:26,896+0100 DEBUG otopi.context context._executeMethod:143 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-PBFI2dyoDO/pythonlib/otopi/context.py", line 133, in _executeMethod
    method['method']()
  File "/tmp/ovirt-PBFI2dyoDO/otopi-plugins/ovirt-host-deploy/vdsm/packages.py", line 175, in _start
    self.services.state('ovirt-imageio-daemon', True)
  File "/tmp/ovirt-PBFI2dyoDO/otopi-plugins/otopi/services/systemd.py", line 141, in state
    service=name,
RuntimeError: Failed to start service 'ovirt-imageio-daemon'
2019-01-11 11:32:26,898+0100 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Closing up': Failed to start service 'ovirt-imageio-daemon'
2019-01-11 11:32:26,899+0100 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%EventEnd STAGE closeup METHOD otopi.plugins.ovirt_host_deploy.vdsm.packages.Plugin._start (odeploycons.packages.vdsm.started)
The reason:
[root@ov4301 ~]# systemctl status ovirt-imageio-daemon -l
● ovirt-imageio-daemon.service - oVirt ImageIO Daemon
   Loaded: loaded (/usr/lib/systemd/system/ovirt-imageio-daemon.service; disabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Fri 2019-01-11 11:32:29 CET; 27min ago
  Process: 11625 ExecStart=/usr/bin/ovirt-imageio-daemon (code=exited, status=1/FAILURE)
 Main PID: 11625 (code=exited, status=1/FAILURE)
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: ovirt-imageio-daemon.service: main process exited, code=exited, status=1/FAILURE
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Failed to start oVirt ImageIO Daemon.
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Unit ovirt-imageio-daemon.service entered failed state.
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: ovirt-imageio-daemon.service failed.
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: ovirt-imageio-daemon.service holdoff time over, scheduling restart.
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Stopped oVirt ImageIO Daemon.
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: start request repeated too quickly for ovirt-imageio-daemon.service
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Failed to start oVirt ImageIO Daemon.
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Unit ovirt-imageio-daemon.service entered failed state.
Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: ovirt-imageio-daemon.service failed.
[root@ov4301 ~]#
The file /var/log/ovirt-imageio-daemon/daemon.log contains:
2019-01-11 10:28:30,191 INFO (MainThread) [server] Starting (pid=3702, version=1.4.6)
2019-01-11 10:28:30,229 ERROR (MainThread) [server] Service failed (remote_service=<ovirt_imageio_daemon.server.RemoteService object at 0x7fea9dc88050>, local_service=<ovirt_imageio_daemon.server.LocalService object at 0x7fea9ca24850>, control_service=None, running=True)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 58, in main
    start(config)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 99, in start
    control_service = ControlService(config)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 206, in __init__
    config.tickets.socket, uhttp.UnixWSGIRequestHandler)
  File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
    self.server_bind()
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/uhttp.py", line 79, in server_bind
    self.socket.bind(self.server_address)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory
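The last frame is a bind() on a Unix domain socket (config.tickets.socket). Assuming it is the socket's parent directory that is missing (the actual socket path does not appear in the log), this is the same pattern as the vdsm-tool error in point 3: bind() creates the socket file itself but not the directory that should contain it. A minimal Python 3 sketch reproducing that, with a hypothetical helper name:

```python
import socket


def bind_unix_socket(path):
    """Create an AF_UNIX stream socket bound to path and return it.

    If the directory part of path does not exist, bind() raises
    FileNotFoundError (errno 2), the "[Errno 2] No such file or
    directory" seen in daemon.log.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError:
        sock.close()  # don't leak the descriptor on failure
        raise
    return sock
```

Python 2.7's socket module reports the same condition as socket.error, which is the bare "error:" on the last line of the log.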
One potential problem I noticed: on this host I set up eth0 with 192.168.122.x (for ovirtmgmt) and eth1 with 192.168.124.y (for Gluster; there is only one host for now, but I aim to add two more hosts in a second step), and the libvirt network temporarily created for the local engine VM is also on the 192.168.124.0 network:
4: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:b8:6b:3c brd ff:ff:ff:ff:ff:ff
    inet 192.168.124.1/24 brd 192.168.124.255 scope global virbr0
       valid_lft forever preferred_lft forever
5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:b8:6b:3c brd ff:ff:ff:ff:ff:ff
I can change the Gluster network in this environment and re-test, but would it be possible to make the libvirt network configurable? Having a fixed one seems risky...
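This kind of clash can be detected up front with Python's ipaddress module. A sketch, assuming /24 prefixes for eth0 and eth1 (the mail only gives 192.168.122.x and 192.168.124.y) and using the virbr0 address from the ip output above as the candidate network; the helper name is hypothetical:

```python
import ipaddress


def overlapping_networks(host_interfaces, candidate):
    """Return the names of host interfaces whose subnet overlaps candidate.

    host_interfaces maps an interface name to an address or network with
    prefix (e.g. "192.168.124.0/24"); candidate is the network a temporary
    bridge such as virbr0 would use (host bits are allowed).
    """
    cand = ipaddress.ip_network(candidate, strict=False)
    return sorted(
        name
        for name, addr in host_interfaces.items()
        if ipaddress.ip_interface(addr).network.overlaps(cand)
    )
```

With the assumed prefixes, overlapping_networks({"eth0": "192.168.122.0/24", "eth1": "192.168.124.0/24"}, "192.168.124.1/24") returns ["eth1"], i.e. exactly the Gluster interface.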
Can I continue from this failed hosted engine deployment once I understand the reason for the ovirt-imageio-daemon failure, or am I forced to start from scratch?
If I power this host down and then back on, how can I retry without starting from scratch?
Thanks,
Gianluca