oVirt Node NG 4.3.0 RC1 and HCI single host problems

Let's start a new thread more focused on the subject. I'm testing deployment of an HCI single host using the oVirt Node NG CentOS 7 iso. I was able to complete the gluster setup via cockpit with these modifications:

1) I wanted to check via ssh and found that the *key files under /etc/ssh/ had permissions that were too weak, so the ssh daemon didn't start after installing the node from the iso; changing them to 600 and restarting the service was ok.

2) I used a single disk configured as JBOD, so I chose that option instead of the default proposed RAID6. But the playbook failed with:

PLAY [gluster_servers] *********************************************************
TASK [Create LVs with specified size for the VGs] ******************************
changed: [192.168.124.211] => (item={u'lv': u'gluster_thinpool_sdb', u'size': u'50GB', u'extent': u'100%FREE', u'vg': u'gluster_vg_sdb'})
PLAY RECAP *********************************************************************
192.168.124.211 : ok=1 changed=1 unreachable=0 failed=0
Ignoring errors...
Error: Section diskcount not found in the configuration file

Reading inside the playbooks involved here:
/usr/share/gdeploy/playbooks/auto_lvcreate_for_gluster.yml
/usr/share/gdeploy/playbooks/vgcreate.yml
I found the snippet:

- name: Convert the logical volume
  lv: action=convert thinpool={{ item.vg }}/{{item.pool }} poolmetadata={{ item.vg }}/'metadata' poolmetadataspare=n vgname={{ item.vg }} disktype="{{disktype}}" diskcount="{{ diskcount }}" stripesize="{{stripesize}}" chunksize="{{ chunksize | default('') }}" snapshot_reserve="{{ snapshot_reserve }}"
  with_items: "{{ lvpools }}"
  ignore_errors: yes

I simply edited gdeploy.conf from the GUI button, adding this section under the [disktype] one:

[diskcount]
1

then cleaned up the lv/vg/pv and the gdeploy step completed successfully.

3) At the first stage of the ansible deploy I got this failed command, which does not seem to prevent completion but which I have not understood:
PLAY [gluster_servers] ********************************************************* TASK [Run a command in the shell] ********************************************** failed: [192.168.124.211] (item=vdsm-tool configure --force) => {"changed": true, "cmd": "vdsm-tool configure --force", "delta": "0:00:01.475528", "end": "2019-01-11 10:59:55.147601", "item": "vdsm-tool configure --force", "msg": "non-zero return code", "rc": 1, "start": "2019-01-11 10:59:53.672073", "stderr": "Traceback (most recent call last):\n File \"/usr/bin/vdsm-tool\", line 220, in main\n return tool_command[cmd][\"command\"](*args)\n File \"/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py\", line 40, in wrapper\n func(*args, **kwargs)\n File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py\", line 143, in configure\n _configure(c)\n File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py\", line 90, in _configure\n getattr(module, 'configure', lambda: None)()\n File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurators/bond_defaults.py\", line 39, in configure\n sysfs_options_mapper.dump_bonding_options()\n File \"/usr/lib/python2.7/site-packages/vdsm/network/link/bond/sysfs_options_mapper.py\", line 48, in dump_bonding_options\n with open(sysfs_options.BONDING_DEFAULTS, 'w') as f:\nIOError: [Errno 2] No such file or directory: '/var/run/vdsm/bonding-defaults.json'", "stderr_lines": ["Traceback (most recent call last):", " File \"/usr/bin/vdsm-tool\", line 220, in main", " return tool_command[cmd][\"command\"](*args)", " File \"/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py\", line 40, in wrapper", " func(*args, **kwargs)", " File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py\", line 143, in configure", " _configure(c)", " File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py\", line 90, in _configure", " getattr(module, 'configure', lambda: None)()", " File \"/usr/lib/python2.7/site-packages/vdsm/tool/configurators/bond_defaults.py\", line 39, in configure", " sysfs_options_mapper.dump_bonding_options()", " File \"/usr/lib/python2.7/site-packages/vdsm/network/link/bond/sysfs_options_mapper.py\", line 48, in dump_bonding_options", " with open(sysfs_options.BONDING_DEFAULTS, 'w') as f:", "IOError: [Errno 2] No such file or directory: '/var/run/vdsm/bonding-defaults.json'"], "stdout": "\nChecking configuration status...\n\nabrt is already configured for vdsm\nlvm is configured for vdsm\nlibvirt is already configured for vdsm\nSUCCESS: ssl configured to true. No conflicts\nManual override for multipath.conf detected - preserving current configuration\nThis manual override for multipath.conf was based on downrevved template. You are strongly advised to contact your support representatives\n\nRunning configure...\nReconfiguration of abrt is done.\nReconfiguration of passwd is done.\nReconfiguration of libvirt is done.", "stdout_lines": ["", "Checking configuration status...", "", "abrt is already configured for vdsm", "lvm is configured for vdsm", "libvirt is already configured for vdsm", "SUCCESS: ssl configured to true. No conflicts", "Manual override for multipath.conf detected - preserving current configuration", "This manual override for multipath.conf was based on downrevved template. 
You are strongly advised to contact your support representatives", "", "Running configure...", "Reconfiguration of abrt is done.", "Reconfiguration of passwd is done.", "Reconfiguration of libvirt is done."]}
	to retry, use: --limit @/tmp/tmpQXe2el/shell_cmd.retry
PLAY RECAP *********************************************************************
192.168.124.211 : ok=0 changed=0 unreachable=0 failed=1

Would it be possible to somehow save the ansible playbook log even when it completes ok, instead of going directly to the "successful" page? Or is it stored anyway in some location on the host's disk?

I then proceeded with the Hosted Engine install/setup, and:

4) it fails here at the final stages of the local engine VM setup, during host activation:

[ INFO ] TASK [oVirt.hosted-engine-setup : Set Engine public key as authorized key without validating the TLS/SSL certificates]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Obtain SSO token using username/password credentials]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Ensure that the target datacenter is present]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Ensure that the target cluster is present in the target datacenter]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Enable GlusterFS at cluster level]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Set VLAN ID at datacenter level]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Force host-deploy in offline mode]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Add host]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Wait for the host to be up]

then after several minutes:

[ ERROR ] fatal: [localhost]: FAILED!
=> {"ansible_facts": {"ovirt_hosts": [{"address": "ov4301.localdomain.local", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "localdomain.local", "subject": "O=localdomain.local,CN=ov4301.localdomain.local"}, "cluster": {"href": "/ovirt-engine/api/clusters/5e8fea14-158b-11e9-b2f0-00163e29b9f2", "id": "5e8fea14-158b-11e9-b2f0-00163e29b9f2"}, "comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"supported_rng_sources": []}, "hooks": [], "href": "/ovirt-engine/api/hosts/4202de75-75d3-4dcb-b128-2c4a1d257a15", "id": "4202de75-75d3-4dcb-b128-2c4a1d257a15", "katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0, "memory": 0, "name": "ov4301.localdomain.local", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:iqeQjdWCm15+xe74xEnswrgRJF7JBAWrvsjO/RaW8q8", "port": 22}, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"total": 0}, "tags": [], "transparent_huge_pages": {"enabled": false}, "type": "ovirt_node", "unmanaged_networks": [], "update_available": false}]}, "attempts": 120, "changed": false} [ INFO ] TASK [oVirt.hosted-engine-setup : Fetch logs from the engine VM] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Set destination directory path] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Create destination directory] [ INFO ] changed: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : include_tasks] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Find the local appliance image] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : debug] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Set local_vm_disk_path] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Give the vm time to flush dirty buffers] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Copy engine logs] [ INFO ] TASK [oVirt.hosted-engine-setup : include_tasks] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Remove local vm dir] [ INFO ] changed: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : debug] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Remove temporary entry in /etc/hosts for the local VM] [ INFO ] changed: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Notify the user about a failure] [ ERROR ] fatal: [localhost]: FAILED! 
=> {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"} Going to see the log /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-20190111113227-ov4301.localdomain.local-5d387e0d.log it seems the error is about ovirt-imageio-daemon 2019-01-11 11:32:26,893+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service'), rc=1 2019-01-11 11:32:26,894+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service') stdout: 2019-01-11 11:32:26,895+0100 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service') stderr: Job for ovirt-imageio-daemon.service failed because the control process exited with error code. See "systemctl status ovirt-imageio-daemon.service" and "journalctl -xe" for details. 2019-01-11 11:32:26,896+0100 DEBUG otopi.context context._executeMethod:143 method exception Traceback (most recent call last): File "/tmp/ovirt-PBFI2dyoDO/pythonlib/otopi/context.py", line 133, in _executeMethod method['method']() File "/tmp/ovirt-PBFI2dyoDO/otopi-plugins/ovirt-host-deploy/vdsm/packages.py", line 175, in _start self.services.state('ovirt-imageio-daemon', True) File "/tmp/ovirt-PBFI2dyoDO/otopi-plugins/otopi/services/systemd.py", line 141, in state service=name, RuntimeError: Failed to start service 'ovirt-imageio-daemon' 2019-01-11 11:32:26,898+0100 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Closing up': Failed to start service 'ovirt-imageio-daemon' 2019-01-11 11:32:26,899+0100 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%EventEnd STAGE closeup METHOD otopi.plugins.ovirt_host_deploy.vdsm.packages.Plugin._start (odeploycons.packages.vdsm.started) The reason: [root@ov4301 ~]# systemctl status ovirt-imageio-daemon -l ● ovirt-imageio-daemon.service - oVirt ImageIO Daemon Loaded: loaded (/usr/lib/systemd/system/ovirt-imageio-daemon.service; disabled; vendor preset: disabled) Active: failed (Result: start-limit) since Fri 2019-01-11 11:32:29 CET; 27min ago Process: 11625 ExecStart=/usr/bin/ovirt-imageio-daemon (code=exited, status=1/FAILURE) Main PID: 11625 (code=exited, status=1/FAILURE) Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: ovirt-imageio-daemon.service: main process exited, code=exited, status=1/FAILURE Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Failed to start oVirt ImageIO Daemon. Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Unit ovirt-imageio-daemon.service entered failed state. Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: ovirt-imageio-daemon.service failed. Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: ovirt-imageio-daemon.service holdoff time over, scheduling restart. Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Stopped oVirt ImageIO Daemon. Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: start request repeated too quickly for ovirt-imageio-daemon.service Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Failed to start oVirt ImageIO Daemon. Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: Unit ovirt-imageio-daemon.service entered failed state. Jan 11 11:32:29 ov4301.localdomain.local systemd[1]: ovirt-imageio-daemon.service failed. 
[root@ov4301 ~]#

The file /var/log/ovirt-imageio-daemon/daemon.log contains:

2019-01-11 10:28:30,191 INFO (MainThread) [server] Starting (pid=3702, version=1.4.6)
2019-01-11 10:28:30,229 ERROR (MainThread) [server] Service failed (remote_service=<ovirt_imageio_daemon.server.RemoteService object at 0x7fea9dc88050>, local_service=<ovirt_imageio_daemon.server.LocalService object at 0x7fea9ca24850>, control_service=None, running=True)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 58, in main
    start(config)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 99, in start
    control_service = ControlService(config)
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 206, in __init__
    config.tickets.socket, uhttp.UnixWSGIRequestHandler)
  File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
    self.server_bind()
  File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/uhttp.py", line 79, in server_bind
    self.socket.bind(self.server_address)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory

One potential problem I noticed is that on this host I set up eth0 with 192.168.122.x (for ovirtmgmt) and eth1 with 192.168.124.y (for gluster, even if there is only one host for now, aiming at adding the other 2 hosts in a second step), and the libvirt network temporarily created for the local engine VM is also on the 192.168.124.0 network...

4: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:b8:6b:3c brd ff:ff:ff:ff:ff:ff
    inet 192.168.124.1/24 brd 192.168.124.255 scope global virbr0
       valid_lft forever preferred_lft forever
5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:b8:6b:3c brd ff:ff:ff:ff:ff:ff

I can change the gluster network of this environment and re-test, but would it be possible to have the libvirt network configurable? It seems risky to have a fixed one...
Can I go ahead from this failed hosted engine once I understand the reason for the ovirt-imageio-daemon failure, or am I forced to start from scratch? Supposing I power this host down and on again, how can I retry without scratching?

Gianluca
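P.S.: the subnet of that temporary bridge can be inspected with plain libvirt tooling; a quick sketch, assuming the bridge belongs to the network libvirt names "default":

# list the libvirt networks present on the host
virsh net-list --all
# dump the XML of the suspected network; the <ip address=.../> element shows the 192.168.124.1/24 range
virsh net-dumpxml default
# net-edit would allow changing the range by hand, though I don't know whether hosted-engine-setup tolerates that
virsh net-edit default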

On Fri, Jan 11, 2019 at 4:21 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
4) it fails here at the final stages of the local engine VM setup, during host activation:
Searching around I found these two bugzillas for 4.2 regarding ovirt-imageio-daemon, related to missing dirs/files in /var/run/vdsm because of errors at systemd-tmpfiles-setup.service startup (unknown vdsm user and kvm group):
https://bugzilla.redhat.com/show_bug.cgi?id=1576479
https://bugzilla.redhat.com/show_bug.cgi?id=1639667

So the temporary workaround was to pre-create them before running the setup:

mkdir /var/run/vdsm
chmod 755 /var/run/vdsm
chown vdsm.kvm /var/run/vdsm
mkdir /var/run/vdsm/dhclientmon
chmod 755 /var/run/vdsm/dhclientmon/
chown vdsm.kvm /var/run/vdsm/dhclientmon/
mkdir /var/run/vdsm/trackedInterfaces
chmod 755 /var/run/vdsm/trackedInterfaces/
chown vdsm.kvm /var/run/vdsm/trackedInterfaces/
mkdir /var/run/vdsm/v2v
chmod 700 /var/run/vdsm/v2v
chown vdsm.kvm /var/run/vdsm/v2v/
mkdir /var/run/vdsm/vhostuser
chmod 755 /var/run/vdsm/vhostuser/
chown vdsm.kvm /var/run/vdsm/vhostuser/
mkdir /var/run/vdsm/payload
chmod 755 /var/run/vdsm/payload/
chown vdsm.kvm /var/run/vdsm/payload/

Then a "systemctl start ovirt-imageio-daemon" completes ok, and the host addition completed ok.

The problem now is at the final step, where I get:

[ INFO ] TASK [oVirt.hosted-engine-setup : debug]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Fetch Datacenter ID]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Fetch Datacenter name]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Add NFS storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Add glusterfs storage domain]
[ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[Invalid parameter]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Invalid parameter]\". 
HTTP response code is 400."} In /var/log/ovirt-hosted-engine-setup I see file ovirt-hosted-engine-setup-ansible-create_storage_domain-2019011182910-ahbxsh.log with 2019-01-11 18:29:58,486+0100 INFO ansible task start {'status': 'OK', 'ansible_task': u'oVirt.hosted -engine-setup : Add glusterfs storage domain', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_storage_domain.yml', 'ansible_type': 'task'} 2019-01-11 18:29:58,486+0100 DEBUG ansible on_any args TASK: oVirt.hosted-engine-setup : Add glusterfs storage domain kwargs is_conditional:False 2019-01-11 18:30:12,738+0100 DEBUG var changed: host "localhost" var "otopi_storage_domain_details_gluster" type "<type 'dict'>" value: "{ "changed": false, "exception": "Traceback (most recent call last):\n File \"/tmp/ansible_ovirt_storage_domain_payload_WLrUHW/__main__.py\", line 682, in main\n ret = storage_domains_module.create()\n File \"/tmp/ansible_ovirt_storage_domain_payload_WLrUHW/ansible_ovirt_storage_domain_payload.zip/ansible/module_utils/ovirt.py\", line 587, in create\n **kwargs\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py\", line 24225, in add\n return self._internal_add(storage_domain, headers, query, wait)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 232, in _internal_add\n return future.wait() if wait else future\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 55, in wait\n return self._code(response)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 229, in callback\n self._check_fault(response)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 132, in _check_fault\n self._raise_error(response, body)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 118, in _raise_error\n raise error\nError: Fault reason is \"Operation Failed\". Fault detail is \"[Invalid parameter]\". HTTP response code is 400.\n", "failed": true, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Invalid parameter]\". HTTP response code is 400." 
}" 2019-01-11 18:30:12,738+0100 DEBUG var changed: host "localhost" var "ansible_play_hosts" type "<type 'list'>" value: "[]" 2019-01-11 18:30:12,739+0100 DEBUG var changed: host "localhost" var "play_hosts" type "<type 'list'>" value: "[]" 2019-01-11 18:30:12,739+0100 DEBUG var changed: host "localhost" var "ansible_play_batch" type "<type 'list'>" value: "[]" 2019-01-11 18:30:12,739+0100 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u'Add glusterfs storage domain', 'ansible_result': u'type: <type \'dict\'>\nstr: {\'_ansible_parsed\': True, u\'exception\': u\'Traceback (most recent call last):\\n File "/tmp/ansible_ovirt_storage_domain_payload_WLrUHW/__main__.py", line 682, in main\\n ret = storage_domains_module.create()\\n File "/tmp/ansible_ovirt_storage_domain_payload_WLrUHW/ansible_ovirt_storage_domain_pay', 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_storage_domain.yml'} 2019-01-11 18:30:12,739+0100 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7fa52df1f3d0> kwargs ignore_errors:None 2019-01-11 18:30:12,742+0100 INFO ansible stats {'status': 'FAILED', 'ansible_playbook_duration': 61.686781, 'ansible_result': u"type: <type 'dict'>\nstr: {u'localhost': {'unreachable': 0, 'skipped': 10, 'ok': 14, 'changed': 0, 'failures': 1}}", 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_storage_domain.yml', 'ansible_type': 'finish'} 2019-01-11 18:30:12,742+0100 DEBUG ansible on_any args <ansible.executor.stats.AggregateStats object at 0x7fa530118b50> kwargs Gianluca
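P.S.: since both bugzillas point at systemd-tmpfiles-setup.service failing because the vdsm user and kvm group do not yet exist at boot, an alternative to the manual mkdir/chmod/chown sequence above might simply be to re-run the tmpfiles step once vdsm is installed; a sketch, assuming vdsm ships its runtime directories as tmpfiles.d entries:

# re-apply all tmpfiles.d entries now that the vdsm user and kvm group exist
systemd-tmpfiles --create
# then retry the service that needs /var/run/vdsm
systemctl start ovirt-imageio-daemon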

On Fri, Jan 11, 2019 at 6:48 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
The problem now is at final step where I get
[ INFO ] TASK [oVirt.hosted-engine-setup : debug] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Fetch Datacenter ID] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Fetch Datacenter name] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Add NFS storage domain] [ INFO ] skipping: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Add glusterfs storage domain] [ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[Invalid parameter]". HTTP response code is 400. [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Invalid parameter]\". HTTP response code is 400."}
On the engine VM I see a reference to "CreateStorageDomainVDS failed: Invalid parameter: 'block_size=None'" in engine.log:

2019-01-11 18:30:07,890+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [70f9b925] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ov4301.localdomain.local command CreateStorageDomainVDS failed: Invalid parameter: 'block_size=None'
2019-01-11 18:30:07,891+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-1) [70f9b925] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand' return value 'StatusOnlyReturn [status=Status [code=1000, message=Invalid parameter: 'block_size=None']]'
2019-01-11 18:30:07,891+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-1) [70f9b925] HostName = ov4301.localdomain.local
2019-01-11 18:30:07,891+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-1) [70f9b925] Command 'CreateStorageDomainVDSCommand(HostName = ov4301.localdomain.local, CreateStorageDomainVDSCommandParameters:{hostId='0ae6c1d6-a1cc-49fc-8ecd-2a252cf2a3c4', storageDomain='StorageDomainStatic:{name='hosted_storage', id='c8005311-1e16-4ffe-bb37-e88dd0c90654'}', args='192.168.123.211:/engine'})' execution failed: VDSGenericException: VDSErrorException: Failed in vdscommand to CreateStorageDomainVDS, error = Invalid parameter: 'block_size=None'

BTW: in my second test, with the workaround of pre-creating the directories, I scratched everything beforehand and set the gluster network to 192.168.123.0 to avoid the conflict with the temporary bridge. I no longer got the json error described in item 3) of my first post.

Gianluca

On Fri, Jan 11, 2019 at 6:57 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Jan 11, 2019 at 6:48 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
The problem now is at final step where I get
[ INFO ] TASK [oVirt.hosted-engine-setup : debug] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Fetch Datacenter ID] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Fetch Datacenter name] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Add NFS storage domain] [ INFO ] skipping: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Add glusterfs storage domain] [ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[Invalid parameter]". HTTP response code is 400. [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Invalid parameter]\". HTTP response code is 400."}
On engine VM I see a reference to " CreateStorageDomainVDS failed: Invalid parameter: 'block_size=None'" in engine.log
So after starting from scratch and using also the info as detailed on thread: https://www.mail-archive.com/users@ovirt.org/msg52879.html the steps now have been: - install from ovirt-node-ng-installer-4.3.0-2019011010.el7.iso and reboot - connect to cockpit and open terminal mkdir /var/run/vdsm chmod 755 /var/run/vdsm chown vdsm.kvm /var/run/vdsm mkdir /var/run/vdsm/dhclientmon chmod 755 /var/run/vdsm/dhclientmon/ chown vdsm.kvm /var/run/vdsm/dhclientmon/ mkdir /var/run/vdsm/trackedInterfaces chmod 755 /var/run/vdsm/trackedInterfaces/ chown vdsm.kvm /var/run/vdsm/trackedInterfaces/ mkdir /var/run/vdsm/v2v chmod 700 /var/run/vdsm/v2v chown vdsm.kvm /var/run/vdsm/v2v/ mkdir /var/run/vdsm/vhostuser chmod 755 /var/run/vdsm/vhostuser/ chown vdsm.kvm /var/run/vdsm/vhostuser/ mkdir /var/run/vdsm/payload chmod 755 /var/run/vdsm/payload/ chown vdsm.kvm /var/run/vdsm/payload/ systemctl restart sshd - put in the newer version of vdsm-api.pickle from vdsm-api-4.30.5-2.gitf824ec2.el7.noarch.rpm in /usr/lib/python2.7/site-packages/vdsm/rpc/vdsm-api.pickle - run the wizard for the gluster+he setup (the right positioned option) inside the gdeploy text window click edit and add " [diskcount] 1 " under the section " [disktype] jbod " - first 2 steps ok - last step fails in finish part [ INFO ] TASK [oVirt.hosted-engine-setup : Fetch Datacenter name] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Add NFS storage domain] [ INFO ] skipping: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Add glusterfs storage domain] [ INFO ] changed: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Add iSCSI storage domain] [ INFO ] skipping: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Add Fibre Channel storage domain] [ INFO ] skipping: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Get storage domain details] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : debug] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Find the appliance OVF] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : debug] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Parse OVF] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Get required size] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : debug] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Remove unsuitable storage domain] [ INFO ] skipping: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : debug] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Check storage domain free space] [ INFO ] skipping: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Activate storage domain] [ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[]". HTTP response code is 400. [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". 
HTTP response code is 400."} On engine.log I see 2019-01-15 13:50:35,317+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] START, CreateStoragePoolVDSCommand(HostName = ov4301.localdomain.lo cal, CreateStoragePoolVDSCommandParameters:{hostId='e8f105f1-37ed-4ac4-bfc3-b1e55ed3027f', storagePoolId='96a31a7e-18bb-11e9-9a34-00163e6196f3', storagePoolName='Default', masterDomainId='14ec2fc7-8c2 b-487c-8f4f-428644650928', domainsIdList='[14ec2fc7-8c2b-487c-8f4f-428644650928]', masterVersion='1'}), log id: 4baccd53 2019-01-15 13:50:36,345+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] Failed in 'CreateStoragePoolVDS' method 2019-01-15 13:50:36,354+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [51725212] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ov4301.localdomain.local command CreateStoragePoolVDS failed: Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error')) 2019-01-15 13:50:36,354+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand' return value 'StatusOnlyReturn [status=Status [code=661, message=Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error'))]]' 2019-01-15 13:50:36,354+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] HostName = ov4301.localdomain.local 2019-01-15 13:50:36,355+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] Command 'CreateStoragePoolVDSCommand(HostName = ov4301.localdomain.local, CreateStoragePoolVDSCommandParameters:{hostId='e8f105f1-37ed-4ac4-bfc3-b1e55ed3027f', storagePoolId='96a31a7e-18bb-11e9-9a34-00163e6196f3', storagePoolName='Default', masterDomainId='14ec2fc7-8c2b-487c-8f4f-428644650928', domainsIdList='[14ec2fc7-8c2b-487c-8f4f-428644650928]', masterVersion='1'})' execution failed: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error')), code = 661 2019-01-15 13:50:36,355+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] FINISH, CreateStoragePoolVDSCommand, return: , log id: 4baccd53 2019-01-15 13:50:36,355+01 ERROR [org.ovirt.engine.core.bll.storage.pool.AddStoragePoolWithStoragesCommand] (default task-2) [51725212] Command 'org.ovirt.engine.core.bll.storage.pool.AddStoragePoolWithStoragesCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error')), code = 661 (Failed with error AcquireHostIdFailure and code 661) 2019-01-15 13:50:36,379+01 INFO [org.ovirt.engine.core.bll.CommandCompensator] (default task-2) [51725212] Command [id=c55d9962-368e-4e0c-8fee-bd06e7570062]: Compensating DELETED_OR_UPDATED_ENTITY of org.ovirt.engine.core.common.businessentities.StoragePool; 
snapshot: id=96a31a7e-18bb-11e9-9a34-00163e6196f3. On host: [root@ov4301 log]# cat /etc/hosts 192.168.124.50 ov43eng.localdomain.local # temporary entry added by hosted-engine-setup for the bootstrap VM 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 192.168.122.210 ov43eng.localdomain.local ov43eng 192.168.122.211 ov4301.localdomain.local ov4301 [root@ov4301 log]# [root@ov4301 log]# df -h | grep gluster /dev/mapper/gluster_vg_sdb-gluster_lv_engine 64G 36M 64G 1% /gluster_bricks/engine /dev/mapper/gluster_vg_sdb-gluster_lv_data 30G 34M 30G 1% /gluster_bricks/data /dev/mapper/gluster_vg_sdb-gluster_lv_vmstore 20G 34M 20G 1% /gluster_bricks/vmstore 192.168.123.211:/engine 64G 691M 64G 2% /rhev/data-center/mnt/glusterSD/192.168.123.211:_engine [root@ov4301 log]# and in its messages: Jan 15 13:35:49 ov4301 dnsmasq-dhcp[22934]: DHCPREQUEST(virbr0) 192.168.124.50 00:16:3e:61:96:f3 Jan 15 13:35:49 ov4301 dnsmasq-dhcp[22934]: DHCPACK(virbr0) 192.168.124.50 00:16:3e:61:96:f3 ov43eng Jan 15 13:35:49 ov4301 dnsmasq-dhcp[22934]: not giving name ov43eng to the DHCP lease of 192.168.124.50 because the name exists in /etc/hosts with address 192.168.122.210 Jan 15 13:40:01 ov4301 systemd: Started Session 38 of user root. Jan 15 13:47:12 ov4301 vdsm[29591]: WARN MOM not available. Jan 15 13:47:12 ov4301 vdsm[29591]: WARN MOM not available, KSM stats will be missing. Jan 15 13:49:05 ov4301 python: ansible-setup Invoked with filter=* gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10 Jan 15 13:49:19 ov4301 python: ansible-stat Invoked with checksum_algorithm=sha1 get_checksum=True follow=False path=/var/tmp/localvmOIXI_W get_md5=None get_mime=True get_attributes=True Jan 15 13:49:24 ov4301 python: ansible-ovirt_auth Invoked with username=None kerberos=False timeout=0 url=None insecure=True hostname=None compress=True state=present headers=None token=None ovirt_auth=None ca_file=None password=NOT_LOGGING_PARAMETER Jan 15 13:49:29 ov4301 python: ansible-ovirt_host_facts Invoked with all_content=False pattern=name=ov4301.localdomain.local fetch_nested=False nested_attributes=[] auth={'timeout': 0, 'url': ' https://ov43eng.localdomain.local/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'Q8qt0Z9DmHJRdg3wk7YxNOAs0JPpBMxxstVx3I8skbulwRWp1SsVXuZYq4DUuPWeEnUZ2bD8TAuwCzJ3qlFYlw', 'ca_file': None} Jan 15 13:49:35 ov4301 python: ansible-ovirt_cluster_facts Invoked with pattern= fetch_nested=False nested_attributes=[] auth={'timeout': 0, 'url': 'https://ov43eng.localdomain.local/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'Q8qt0Z9DmHJRdg3wk7YxNOAs0JPpBMxxstVx3I8skbulwRWp1SsVXuZYq4DUuPWeEnUZ2bD8TAuwCzJ3qlFYlw', 'ca_file': None} Jan 15 13:49:43 ov4301 python: ansible-ovirt_datacenter_facts Invoked with pattern= fetch_nested=False nested_attributes=[] auth={'timeout': 0, 'url': 'https://ov43eng.localdomain.local/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'Q8qt0Z9DmHJRdg3wk7YxNOAs0JPpBMxxstVx3I8skbulwRWp1SsVXuZYq4bD8TAuwCzJ3qlFYlw', 'ca_file': None} host=ov4301.localdomain.local nested_attributes=[] wait=True domain_function=data name=hosted_storage critical_space_action_blocker=None posixfs=None poll_interval=3 glusterfs=None nfs=None timeout=180 backup=None discard_after_delete=None Jan 15 13:50:34 ov4301 systemd: Started Session c24 of user 
root. Jan 15 13:50:34 ov4301 sanlock[26802]: 2019-01-15 13:50:34 11549 [21290]: s1 wdmd_connect failed -13 Jan 15 13:50:34 ov4301 sanlock[26802]: 2019-01-15 13:50:34 11549 [21290]: s1 connect_watchdog failed -1 Jan 15 13:50:35 ov4301 sanlock[26802]: 2019-01-15 13:50:35 11550 [26810]: s1 add_lockspace fail result -203 Jan 15 13:56:48 ov4301 dnsmasq-dhcp[22934]: DHCPREQUEST(virbr0) 192.168.124.50 00:16:3e:61:96:f3 Jan 15 13:56:48 ov4301 dnsmasq-dhcp[22934]: DHCPACK(virbr0) 192.168.124.50 00:16:3e:61:96:f3 ov43eng Jan 15 13:56:48 ov4301 dnsmasq-dhcp[22934]: not giving name ov43eng to the DHCP lease of 192.168.124.50 because the name exists in /etc/hosts with address 192.168.122.210 Jan 15 13:59:34 ov4301 chronyd[26447]: Source 212.45.144.206 replaced with 80.211.52.109 Jan 15 13:59:44 ov4301 vdsm[29591]: WARN MOM not available. Jan 15 13:59:44 ov4301 vdsm[29591]: WARN MOM not available, KSM stats will be missing. DUuPWeEnUZ2bD8TAuwCzJ3qlFYlw', 'ca_file': None} Jan 15 13:49:54 ov4301 python: ansible-ovirt_storage_domain Invoked with comment=None warning_low_space=None fetch_nested=False localfs=None data_center=Default id=None iscsi=None state=unattached wipe_after_delete=None destroy=None fcp=None description=None format=None auth={'username':********@internal', 'url': ' https://ov43eng.localdomain.local/ovirt-engine/api', 'insecure': True, 'password': 'passw0rd'} host=ov4301.localdomain.local nested_attributes=[] wait=True domain_function=data name=hosted_storage critical_space_action_blocker=None posixfs=None poll_interval=3 glusterfs={'path': '/engine', 'mount_options': None, 'address': '192.168.123.211'} nfs=None timeout=180 backup=None discard_after_delete=None Jan 15 13:50:01 ov4301 systemd: Started Session 39 of user root. Jan 15 13:50:01 ov4301 systemd: Created slice vdsm-glusterfs.slice. Jan 15 13:50:01 ov4301 systemd: Started /usr/bin/mount -t glusterfs 192.168.123.211:/engine /rhev/data-center/mnt/glusterSD/192.168.123.211: _engine. Jan 15 13:50:01 ov4301 kernel: fuse init (API version 7.22) Jan 15 13:50:01 ov4301 systemd: Mounting FUSE Control File System... Jan 15 13:50:02 ov4301 systemd: Mounted FUSE Control File System. Jan 15 13:50:02 ov4301 systemd: Started Session c20 of user root. Jan 15 13:50:02 ov4301 systemd: Started Session c21 of user root. Jan 15 13:50:03 ov4301 systemd: Started Session c22 of user root. Jan 15 13:50:03 ov4301 systemd: Started Session c23 of user root. 
Jan 15 13:50:12 ov4301 python: ansible-ovirt_storage_domain_facts Invoked with pattern=name=hosted_storage fetch_nested=False nested_attributes=[] auth={'timeout': 0, 'url': ' https://ov43eng.localdomain.local/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'Q8qt0Z9DmHJRdg3wk7YxNOAs0JPpBMxxstVx3I8skbulwRWp1SsVXuZYq4DUuPWeEnUZ2bD8TAuwCzJ3qlFYlw', 'ca_file': None} Jan 15 13:50:18 ov4301 python: ansible-find Invoked with excludes=None paths=['/var/tmp/localvmOIXI_W/master'] file_type=file age=None contains=None recurse=True age_stamp=mtime patterns=['^.*.(?<!meta).ovf$'] depth=None get_checksum=False use_regex=True follow=False hidden=False size=None Jan 15 13:50:21 ov4301 python: ansible-xml Invoked with xpath=/ovf:Envelope/Section/Disk count=False set_children=None xmlstring=None strip_cdata_tags=False attribute=size pretty_print=False add_children=None value=None content=attribute state=present namespaces={'vssd': ' http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_VirtualSystemSettingDa...', 'rasd': ' http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSett...', 'xsi': 'http://www.w3.org/2001/XMLSchema-instance', 'ovf': ' http://schemas.dmtf.org/ovf/envelope/1/'} input_type=yaml print_match=False path=/var/tmp/localvmOIXI_W/master/vms/c99e3e6b-db14-446f-aaee-48a056d3dd93/c99e3e6b-db14-446f-aaee-48a056d3dd93.ovf backup=False Jan 15 13:50:30 ov4301 python: ansible-ovirt_storage_domain Invoked with comment=None warning_low_space=None fetch_nested=False localfs=None data_center=Default id=None iscsi=None state=present wipe_after_delete=None destroy=None fcp=None description=None format=None auth={'timeout': 0, 'url': 'https://ov43eng.localdomain.local/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'Q8qt0Z9DmHJRdg3wk7YxNOAs0JPpBMxxstVx3I8skbulwRWp1SsVXuZYq4DUuPWeEnUZ2bD8TAuwCzJ3qlFYlw', 'ca_file': None} host=ov4301.localdomain.local nested_attributes=[] wait=True domain_function=data name=hosted_storage critical_space_action_blocker=None posixfs=None poll_interval=3 glusterfs=None nfs=None timeout=180 backup=None discard_after_delete=None Jan 15 13:50:34 ov4301 systemd: Started Session c24 of user root. Jan 15 13:50:34 ov4301 sanlock[26802]: 2019-01-15 13:50:34 11549 [21290]: s1 wdmd_connect failed -13 Jan 15 13:50:34 ov4301 sanlock[26802]: 2019-01-15 13:50:34 11549 [21290]: s1 connect_watchdog failed -1 Jan 15 13:50:35 ov4301 sanlock[26802]: 2019-01-15 13:50:35 11550 [26810]: s1 add_lockspace fail result -203 Jan 15 13:56:48 ov4301 dnsmasq-dhcp[22934]: DHCPREQUEST(virbr0) 192.168.124.50 00:16:3e:61:96:f3 Jan 15 13:56:48 ov4301 dnsmasq-dhcp[22934]: DHCPACK(virbr0) 192.168.124.50 00:16:3e:61:96:f3 ov43eng Jan 15 13:56:48 ov4301 dnsmasq-dhcp[22934]: not giving name ov43eng to the DHCP lease of 192.168.124.50 because the name exists in /etc/hosts with address 192.168.122.210 Jan 15 13:59:34 ov4301 chronyd[26447]: Source 212.45.144.206 replaced with 80.211.52.109 Jan 15 13:59:44 ov4301 vdsm[29591]: WARN MOM not available. Jan 15 13:59:44 ov4301 vdsm[29591]: WARN MOM not available, KSM stats will be missing. 
In vdsm.log : 2019-01-15 13:50:34,980+0100 INFO (jsonrpc/2) [storage.StorageDomain] sdUUID=14ec2fc7-8c2b-487c-8f4f-428644650928 (fileSD:533) 2019-01-15 13:50:34,984+0100 INFO (jsonrpc/2) [storage.StoragePool] Creating pool directory '/rhev/data-center/96a31a7e-18bb-11e9-9a34-00163e6196f3' (sp:634) 2019-01-15 13:50:34,984+0100 INFO (jsonrpc/2) [storage.fileUtils] Creating directory: /rhev/data-center/96a31a7e-18bb-11e9-9a34-00163e6196f3 mode: None (fileUtils:199) 2019-01-15 13:50:34,985+0100 INFO (jsonrpc/2) [storage.SANLock] Acquiring host id for domain 14ec2fc7-8c2b-487c-8f4f-428644650928 (id=250, async=False) (clusterlock:294) 2019-01-15 13:50:35,987+0100 INFO (jsonrpc/2) [vdsm.api] FINISH createStoragePool error=Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error')) from=::ffff:192.168.124.50,42356, flow_id=51725212, task_id=7cbd7c09-e934-4396-bd9d-61e9f0e00bd3 (api:52) 2019-01-15 13:50:35,988+0100 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='7cbd7c09-e934-4396-bd9d-61e9f0e00bd3') Unexpected error (task:875) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run return fn(*args, **kargs) File "<string>", line 2, in createStoragePool File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method ret = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1003, in createStoragePool leaseParams) File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 636, in create self._acquireTemporaryClusterLock(msdUUID, leaseParams) File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 567, in _acquireTemporaryClusterLock msd.acquireHostId(self.id) File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 860, in acquireHostId self._manifest.acquireHostId(hostId, async) File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 467, in acquireHostId self._domainLock.acquireHostId(hostId, async) File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 325, in acquireHostId raise se.AcquireHostIdFailure(self._sdUUID, e) AcquireHostIdFailure: Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error')) 2019-01-15 13:50:35,988+0100 INFO (jsonrpc/2) [storage.TaskManager.Task] (Task='7cbd7c09-e934-4396-bd9d-61e9f0e00bd3') aborting: Task is aborted: "Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error'))" - code 661 (task:1181) 2019-01-15 13:50:35,989+0100 ERROR (jsonrpc/2) [storage.Dispatcher] FINISH createStoragePool error=Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error')) (dispatcher:81) 2019-01-15 13:50:35,990+0100 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call StoragePool.create failed (error 661) in 1.01 seconds (__init__:312) 2019-01-15 13:50:38,109+0100 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=e69af7e1-c456-4822-ae06-7b309263257d (api:48) 2019-01-15 13:50:38,109+0100 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=e69af7e1-c456-4822-ae06-7b309263257d (api:54) 2019-01-15 13:50:38,110+0100 INFO (vmrecovery) [vds] recovery: waiting for storage pool to 
go up (clientIF:705) 2019-01-15 13:50:39,802+0100 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call GlusterHost.list succeeded in 0.33 seconds (__init__:312) 2019-01-15 13:50:39,996+0100 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call GlusterVolume.list succeeded in 0.18 seconds (__init__:312) 2019-01-15 13:50:43,115+0100 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=729c054e-021a-4d2e-b36c-edfb186a1210 (api:48) 2019-01-15 13:50:43,116+0100 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=729c054e-021a-4d2e-b36c-edfb186a1210 (api:54) 2019-01-15 13:50:43,116+0100 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:705) 2019-01-15 13:50:43,611+0100 INFO (jsonrpc/5) [api.host] START getStats() from=::ffff:192.168.124.50,42356 (api:48) 2019-01-15 13:50:43,628+0100 INFO (jsonrpc/5) [vdsm.api] START repoStats(domains=()) from=::ffff:192.168.124.50,42356, task_id=868d05f4-5535-4cb3-b283-92d3c1595bb3 (api:48) 2019-01-15 13:50:43,628+0100 INFO (jsonrpc/5) [vdsm.api] FINISH repoStats return={} from=::ffff:192.168.124.50,42356, task_id=868d05f4-5535-4cb3-b283-92d3c1595bb3 (api:54) 2019-01-15 13:50:43,628+0100 INFO (jsonrpc/5) [vdsm.api] START multipath_health() from=::ffff:192.168.124.50,42356, task_id=7f4fdad0-2b2e-4dcf-88b8-b7cf2689d4d9 (api:48) Gianluca Gianluca
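P.S.: since the add_lockspace failure points at the watchdog daemon ("wdmd_connect failed -13"), these are the checks I would start from; a sketch using standard sanlock/systemd commands, nothing oVirt-specific assumed:

# is wdmd running at all, and could sanlock talk to it?
systemctl status wdmd sanlock
# lockspaces/resources as currently seen by sanlock
sanlock client status
# journal entries for both daemons around the failure time
journalctl -u wdmd -u sanlock --since "2019-01-15 13:50"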

The mail was partly scrambled in its contents, so I am adding some clarifications here.

On Tue, Jan 15, 2019 at 2:38 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
So after starting from scratch and using also the info as detailed on thread: https://www.mail-archive.com/users@ovirt.org/msg52879.html
the steps now have been:
- install from ovirt-node-ng-installer-4.3.0-2019011010.el7.iso and reboot
- connect to cockpit and open terminal
This step is related to the ssh daemon:

cd /etc/ssh
chmod 600 *key
systemctl restart sshd

The step below is related to ovirt-imageio-daemon:
mkdir /var/run/vdsm
chmod 755 /var/run/vdsm
chown vdsm.kvm /var/run/vdsm
mkdir /var/run/vdsm/dhclientmon
chmod 755 /var/run/vdsm/dhclientmon/
chown vdsm.kvm /var/run/vdsm/dhclientmon/
mkdir /var/run/vdsm/trackedInterfaces
chmod 755 /var/run/vdsm/trackedInterfaces/
chown vdsm.kvm /var/run/vdsm/trackedInterfaces/
mkdir /var/run/vdsm/v2v
chmod 700 /var/run/vdsm/v2v
chown vdsm.kvm /var/run/vdsm/v2v/
mkdir /var/run/vdsm/vhostuser
chmod 755 /var/run/vdsm/vhostuser/
chown vdsm.kvm /var/run/vdsm/vhostuser/
mkdir /var/run/vdsm/payload
chmod 755 /var/run/vdsm/payload/
chown vdsm.kvm /var/run/vdsm/payload/
systemctl restart sshd
Actually: systemctl restart ovirt-imageio-daemon
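For reference, the same /var/run/vdsm workaround can be written more compactly; a sketch equivalent to the commands quoted above (v2v is the only directory with mode 700):

mkdir -p /var/run/vdsm
chown vdsm.kvm /var/run/vdsm
chmod 755 /var/run/vdsm
for d in dhclientmon trackedInterfaces v2v vhostuser payload; do
    mkdir -p /var/run/vdsm/$d
    chown vdsm.kvm /var/run/vdsm/$d
    chmod 755 /var/run/vdsm/$d
done
chmod 700 /var/run/vdsm/v2v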
- put in the newer version of vdsm-api.pickle from vdsm-api-4.30.5-2.gitf824ec2.el7.noarch.rpm in /usr/lib/python2.7/site-packages/vdsm/rpc/vdsm-api.pickle
Alternatively, vdsm-api.pickle can be downloaded directly from here: https://drive.google.com/file/d/1AhakKhm_dzx-Gxt-Y1OojzRUwHs75kot/view?usp=s...
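If you prefer to take the file straight from the rpm instead of the Google Drive link, something like this should work; a sketch that assumes the vdsm-api rpm has already been downloaded to the current directory:

# extract only vdsm-api.pickle from the package into ./usr/lib/...
rpm2cpio vdsm-api-4.30.5-2.gitf824ec2.el7.noarch.rpm | cpio -idmv './usr/lib/python2.7/site-packages/vdsm/rpc/vdsm-api.pickle'
# keep a copy of the original file, then put the new one in place
cp /usr/lib/python2.7/site-packages/vdsm/rpc/vdsm-api.pickle /usr/lib/python2.7/site-packages/vdsm/rpc/vdsm-api.pickle.orig
cp ./usr/lib/python2.7/site-packages/vdsm/rpc/vdsm-api.pickle /usr/lib/python2.7/site-packages/vdsm/rpc/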
- run the wizard for the gluster+he setup (the option positioned on the right); inside the gdeploy text window click Edit and add "[diskcount] 1"
under the "[disktype] jbod" section
In my case, with a single disk, I chose the JBOD option.
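To be explicit, after that edit the relevant part of gdeploy.conf looks roughly like this (the diskcount value reflects my single-disk JBOD layout):

[disktype]
jbod

[diskcount]
1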
- first 2 steps ok
- last step fails in finish part
[ INFO ] TASK [oVirt.hosted-engine-setup : Fetch Datacenter name] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Add NFS storage domain] [ INFO ] skipping: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Add glusterfs storage domain] [ INFO ] changed: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Add iSCSI storage domain] [ INFO ] skipping: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Add Fibre Channel storage domain] [ INFO ] skipping: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Get storage domain details] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : debug] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Find the appliance OVF] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : debug] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Parse OVF] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Get required size] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : debug] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Remove unsuitable storage domain] [ INFO ] skipping: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : debug] [ INFO ] ok: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Check storage domain free space] [ INFO ] skipping: [localhost] [ INFO ] TASK [oVirt.hosted-engine-setup : Activate storage domain] [ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[]". HTTP response code is 400. [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400."}
On engine.log I see
2019-01-15 13:50:35,317+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] START, CreateStoragePoolVDSCommand(HostName = ov4301.localdomain.lo cal, CreateStoragePoolVDSCommandParameters:{hostId='e8f105f1-37ed-4ac4-bfc3-b1e55ed3027f', storagePoolId='96a31a7e-18bb-11e9-9a34-00163e6196f3', storagePoolName='Default', masterDomainId='14ec2fc7-8c2 b-487c-8f4f-428644650928', domainsIdList='[14ec2fc7-8c2b-487c-8f4f-428644650928]', masterVersion='1'}), log id: 4baccd53 2019-01-15 13:50:36,345+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] Failed in 'CreateStoragePoolVDS' method 2019-01-15 13:50:36,354+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [51725212] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ov4301.localdomain.local command CreateStoragePoolVDS failed: Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error')) 2019-01-15 13:50:36,354+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand' return value 'StatusOnlyReturn [status=Status [code=661, message=Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error'))]]' 2019-01-15 13:50:36,354+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] HostName = ov4301.localdomain.local 2019-01-15 13:50:36,355+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] Command 'CreateStoragePoolVDSCommand(HostName = ov4301.localdomain.local, CreateStoragePoolVDSCommandParameters:{hostId='e8f105f1-37ed-4ac4-bfc3-b1e55ed3027f', storagePoolId='96a31a7e-18bb-11e9-9a34-00163e6196f3', storagePoolName='Default', masterDomainId='14ec2fc7-8c2b-487c-8f4f-428644650928', domainsIdList='[14ec2fc7-8c2b-487c-8f4f-428644650928]', masterVersion='1'})' execution failed: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error')), code = 661 2019-01-15 13:50:36,355+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] FINISH, CreateStoragePoolVDSCommand, return: , log id: 4baccd53 2019-01-15 13:50:36,355+01 ERROR [org.ovirt.engine.core.bll.storage.pool.AddStoragePoolWithStoragesCommand] (default task-2) [51725212] Command 'org.ovirt.engine.core.bll.storage.pool.AddStoragePoolWithStoragesCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error')), code = 661 (Failed with error AcquireHostIdFailure and code 661) 2019-01-15 13:50:36,379+01 INFO [org.ovirt.engine.core.bll.CommandCompensator] (default task-2) [51725212] Command [id=c55d9962-368e-4e0c-8fee-bd06e7570062]: Compensating DELETED_OR_UPDATED_ENTITY of org.ovirt.engine.core.common.businessentities.StoragePool; snapshot: id=96a31a7e-18bb-11e9-9a34-00163e6196f3.
On host:
[root@ov4301 log]# cat /etc/hosts 192.168.124.50 ov43eng.localdomain.local # temporary entry added by hosted-engine-setup for the bootstrap VM 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 192.168.122.210 ov43eng.localdomain.local ov43eng 192.168.122.211 ov4301.localdomain.local ov4301 [root@ov4301 log]#
[root@ov4301 log]# df -h | grep gluster /dev/mapper/gluster_vg_sdb-gluster_lv_engine 64G 36M 64G 1% /gluster_bricks/engine /dev/mapper/gluster_vg_sdb-gluster_lv_data 30G 34M 30G 1% /gluster_bricks/data /dev/mapper/gluster_vg_sdb-gluster_lv_vmstore 20G 34M 20G 1% /gluster_bricks/vmstore 192.168.123.211:/engine 64G 691M 64G 2% /rhev/data-center/mnt/glusterSD/192.168.123.211:_engine [root@ov4301 log]#
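Before retrying the storage domain steps it may also be worth confirming that the engine volume itself is healthy; a minimal sketch using the standard gluster CLI:

# overall definition and brick/process status of the engine volume
gluster volume info engine
gluster volume status engine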
and in its messages:
Jan 15 13:35:49 ov4301 dnsmasq-dhcp[22934]: DHCPREQUEST(virbr0) 192.168.124.50 00:16:3e:61:96:f3
Jan 15 13:35:49 ov4301 dnsmasq-dhcp[22934]: DHCPACK(virbr0) 192.168.124.50 00:16:3e:61:96:f3 ov43eng
Jan 15 13:35:49 ov4301 dnsmasq-dhcp[22934]: not giving name ov43eng to the DHCP lease of 192.168.124.50 because the name exists in /etc/hosts with address 192.168.122.210
Jan 15 13:40:01 ov4301 systemd: Started Session 38 of user root.
Jan 15 13:47:12 ov4301 vdsm[29591]: WARN MOM not available.
Jan 15 13:47:12 ov4301 vdsm[29591]: WARN MOM not available, KSM stats will be missing.
Jan 15 13:49:05 ov4301 python: ansible-setup Invoked with filter=* gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10
Jan 15 13:49:19 ov4301 python: ansible-stat Invoked with checksum_algorithm=sha1 get_checksum=True follow=False path=/var/tmp/localvmOIXI_W get_md5=None get_mime=True get_attributes=True
Jan 15 13:49:24 ov4301 python: ansible-ovirt_auth Invoked with username=None kerberos=False timeout=0 url=None insecure=True hostname=None compress=True state=present headers=None token=None ovirt_auth=None ca_file=None password=NOT_LOGGING_PARAMETER
Jan 15 13:49:29 ov4301 python: ansible-ovirt_host_facts Invoked with all_content=False pattern=name=ov4301.localdomain.local fetch_nested=False nested_attributes=[] auth={'timeout': 0, 'url': 'https://ov43eng.localdomain.local/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'Q8qt0Z9DmHJRdg3wk7YxNOAs0JPpBMxxstVx3I8skbulwRWp1SsVXuZYq4DUuPWeEnUZ2bD8TAuwCzJ3qlFYlw', 'ca_file': None}
Jan 15 13:49:35 ov4301 python: ansible-ovirt_cluster_facts Invoked with pattern= fetch_nested=False nested_attributes=[] auth={'timeout': 0, 'url': 'https://ov43eng.localdomain.local/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'Q8qt0Z9DmHJRdg3wk7YxNOAs0JPpBMxxstVx3I8skbulwRWp1SsVXuZYq4DUuPWeEnUZ2bD8TAuwCzJ3qlFYlw', 'ca_file': None}
Jan 15 13:49:43 ov4301 python: ansible-ovirt_datacenter_facts Invoked with pattern= fetch_nested=False nested_attributes=[] auth={'timeout': 0, 'url': 'https://ov43eng.localdomain.local/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'Q8qt0Z9DmHJRdg3wk7YxNOAs0JPpBMxxstVx3I8skbulwRWp1SsVXuZYq4DUuPWeEnUZ2bD8TAuwCzJ3qlFYlw', 'ca_file': None}
Jan 15 13:49:54 ov4301 python: ansible-ovirt_storage_domain Invoked with comment=None warning_low_space=None fetch_nested=False localfs=None data_center=Default id=None iscsi=None state=unattached wipe_after_delete=None destroy=None fcp=None description=None format=None auth={'username':********@internal', 'url': 'https://ov43eng.localdomain.local/ovirt-engine/api', 'insecure': True, 'password': 'passw0rd'} host=ov4301.localdomain.local nested_attributes=[] wait=True domain_function=data name=hosted_storage critical_space_action_blocker=None posixfs=None poll_interval=3 glusterfs={'path': '/engine', 'mount_options': None, 'address': '192.168.123.211'} nfs=None timeout=180 backup=None discard_after_delete=None
Jan 15 13:50:01 ov4301 systemd: Started Session 39 of user root.
Jan 15 13:50:01 ov4301 systemd: Created slice vdsm-glusterfs.slice.
Jan 15 13:50:01 ov4301 systemd: Started /usr/bin/mount -t glusterfs 192.168.123.211:/engine /rhev/data-center/mnt/glusterSD/192.168.123.211:_engine.
Jan 15 13:50:01 ov4301 kernel: fuse init (API version 7.22)
Jan 15 13:50:01 ov4301 systemd: Mounting FUSE Control File System...
Jan 15 13:50:02 ov4301 systemd: Mounted FUSE Control File System.
Jan 15 13:50:02 ov4301 systemd: Started Session c20 of user root.
Jan 15 13:50:02 ov4301 systemd: Started Session c21 of user root.
Jan 15 13:50:03 ov4301 systemd: Started Session c22 of user root.
Jan 15 13:50:03 ov4301 systemd: Started Session c23 of user root.
Jan 15 13:50:12 ov4301 python: ansible-ovirt_storage_domain_facts Invoked with pattern=name=hosted_storage fetch_nested=False nested_attributes=[] auth={'timeout': 0, 'url': 'https://ov43eng.localdomain.local/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'Q8qt0Z9DmHJRdg3wk7YxNOAs0JPpBMxxstVx3I8skbulwRWp1SsVXuZYq4DUuPWeEnUZ2bD8TAuwCzJ3qlFYlw', 'ca_file': None}
Jan 15 13:50:18 ov4301 python: ansible-find Invoked with excludes=None paths=['/var/tmp/localvmOIXI_W/master'] file_type=file age=None contains=None recurse=True age_stamp=mtime patterns=['^.*.(?<!meta).ovf$'] depth=None get_checksum=False use_regex=True follow=False hidden=False size=None
Jan 15 13:50:21 ov4301 python: ansible-xml Invoked with xpath=/ovf:Envelope/Section/Disk count=False set_children=None xmlstring=None strip_cdata_tags=False attribute=size pretty_print=False add_children=None value=None content=attribute state=present namespaces={'vssd': 'http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_VirtualSystemSettingDa...', 'rasd': 'http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSett...', 'xsi': 'http://www.w3.org/2001/XMLSchema-instance', 'ovf': 'http://schemas.dmtf.org/ovf/envelope/1/'} input_type=yaml print_match=False path=/var/tmp/localvmOIXI_W/master/vms/c99e3e6b-db14-446f-aaee-48a056d3dd93/c99e3e6b-db14-446f-aaee-48a056d3dd93.ovf backup=False
Jan 15 13:50:30 ov4301 python: ansible-ovirt_storage_domain Invoked with comment=None warning_low_space=None fetch_nested=False localfs=None data_center=Default id=None iscsi=None state=present wipe_after_delete=None destroy=None fcp=None description=None format=None auth={'timeout': 0, 'url': 'https://ov43eng.localdomain.local/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'Q8qt0Z9DmHJRdg3wk7YxNOAs0JPpBMxxstVx3I8skbulwRWp1SsVXuZYq4DUuPWeEnUZ2bD8TAuwCzJ3qlFYlw', 'ca_file': None} host=ov4301.localdomain.local nested_attributes=[] wait=True domain_function=data name=hosted_storage critical_space_action_blocker=None posixfs=None poll_interval=3 glusterfs=None nfs=None timeout=180 backup=None discard_after_delete=None
Jan 15 13:50:34 ov4301 systemd: Started Session c24 of user root.
Jan 15 13:50:34 ov4301 sanlock[26802]: 2019-01-15 13:50:34 11549 [21290]: s1 wdmd_connect failed -13
Jan 15 13:50:34 ov4301 sanlock[26802]: 2019-01-15 13:50:34 11549 [21290]: s1 connect_watchdog failed -1
Jan 15 13:50:35 ov4301 sanlock[26802]: 2019-01-15 13:50:35 11550 [26810]: s1 add_lockspace fail result -203
Jan 15 13:56:48 ov4301 dnsmasq-dhcp[22934]: DHCPREQUEST(virbr0) 192.168.124.50 00:16:3e:61:96:f3
Jan 15 13:56:48 ov4301 dnsmasq-dhcp[22934]: DHCPACK(virbr0) 192.168.124.50 00:16:3e:61:96:f3 ov43eng
Jan 15 13:56:48 ov4301 dnsmasq-dhcp[22934]: not giving name ov43eng to the DHCP lease of 192.168.124.50 because the name exists in /etc/hosts with address 192.168.122.210
Jan 15 13:59:34 ov4301 chronyd[26447]: Source 212.45.144.206 replaced with 80.211.52.109
Jan 15 13:59:44 ov4301 vdsm[29591]: WARN MOM not available.
Jan 15 13:59:44 ov4301 vdsm[29591]: WARN MOM not available, KSM stats will be missing.
In vdsm.log:
2019-01-15 13:50:34,980+0100 INFO (jsonrpc/2) [storage.StorageDomain] sdUUID=14ec2fc7-8c2b-487c-8f4f-428644650928 (fileSD:533)
2019-01-15 13:50:34,984+0100 INFO (jsonrpc/2) [storage.StoragePool] Creating pool directory '/rhev/data-center/96a31a7e-18bb-11e9-9a34-00163e6196f3' (sp:634)
2019-01-15 13:50:34,984+0100 INFO (jsonrpc/2) [storage.fileUtils] Creating directory: /rhev/data-center/96a31a7e-18bb-11e9-9a34-00163e6196f3 mode: None (fileUtils:199)
2019-01-15 13:50:34,985+0100 INFO (jsonrpc/2) [storage.SANLock] Acquiring host id for domain 14ec2fc7-8c2b-487c-8f4f-428644650928 (id=250, async=False) (clusterlock:294)
2019-01-15 13:50:35,987+0100 INFO (jsonrpc/2) [vdsm.api] FINISH createStoragePool error=Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error')) from=::ffff:192.168.124.50,42356, flow_id=51725212, task_id=7cbd7c09-e934-4396-bd9d-61e9f0e00bd3 (api:52)
2019-01-15 13:50:35,988+0100 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='7cbd7c09-e934-4396-bd9d-61e9f0e00bd3') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in createStoragePool
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1003, in createStoragePool
    leaseParams)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 636, in create
    self._acquireTemporaryClusterLock(msdUUID, leaseParams)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 567, in _acquireTemporaryClusterLock
    msd.acquireHostId(self.id)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 860, in acquireHostId
    self._manifest.acquireHostId(hostId, async)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 467, in acquireHostId
    self._domainLock.acquireHostId(hostId, async)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 325, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error'))
2019-01-15 13:50:35,988+0100 INFO (jsonrpc/2) [storage.TaskManager.Task] (Task='7cbd7c09-e934-4396-bd9d-61e9f0e00bd3') aborting: Task is aborted: "Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error'))" - code 661 (task:1181)
2019-01-15 13:50:35,989+0100 ERROR (jsonrpc/2) [storage.Dispatcher] FINISH createStoragePool error=Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error')) (dispatcher:81)
2019-01-15 13:50:35,990+0100 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call StoragePool.create failed (error 661) in 1.01 seconds (__init__:312)
2019-01-15 13:50:38,109+0100 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=e69af7e1-c456-4822-ae06-7b309263257d (api:48)
2019-01-15 13:50:38,109+0100 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=e69af7e1-c456-4822-ae06-7b309263257d (api:54)
2019-01-15 13:50:38,110+0100 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:705)
2019-01-15 13:50:39,802+0100 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call GlusterHost.list succeeded in 0.33 seconds (__init__:312)
2019-01-15 13:50:39,996+0100 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call GlusterVolume.list succeeded in 0.18 seconds (__init__:312)
2019-01-15 13:50:43,115+0100 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList(options=None) from=internal, task_id=729c054e-021a-4d2e-b36c-edfb186a1210 (api:48)
2019-01-15 13:50:43,116+0100 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=729c054e-021a-4d2e-b36c-edfb186a1210 (api:54)
2019-01-15 13:50:43,116+0100 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:705)
2019-01-15 13:50:43,611+0100 INFO (jsonrpc/5) [api.host] START getStats() from=::ffff:192.168.124.50,42356 (api:48)
2019-01-15 13:50:43,628+0100 INFO (jsonrpc/5) [vdsm.api] START repoStats(domains=()) from=::ffff:192.168.124.50,42356, task_id=868d05f4-5535-4cb3-b283-92d3c1595bb3 (api:48)
2019-01-15 13:50:43,628+0100 INFO (jsonrpc/5) [vdsm.api] FINISH repoStats return={} from=::ffff:192.168.124.50,42356, task_id=868d05f4-5535-4cb3-b283-92d3c1595bb3 (api:54)
2019-01-15 13:50:43,628+0100 INFO (jsonrpc/5) [vdsm.api] START multipath_health() from=::ffff:192.168.124.50,42356, task_id=7f4fdad0-2b2e-4dcf-88b8-b7cf2689d4d9 (api:48)
Gianluca
Regarding the sanlock daemon and the watchdog multiplexing daemon: the latter seems to have no log file, so I can only see the status of the service:

[root@ov4301 ~]# systemctl status wdmd
● wdmd.service - Watchdog Multiplexing Daemon
   Loaded: loaded (/usr/lib/systemd/system/wdmd.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2019-01-15 10:38:13 CET; 4h 32min ago
 Main PID: 3763 (wdmd)
    Tasks: 1
   CGroup: /system.slice/wdmd.service
           └─3763 /usr/sbin/wdmd

Jan 15 10:38:12 ov4301.localdomain.local systemd[1]: Starting Watchdog Multiplexing Daemon...
Jan 15 10:38:13 ov4301.localdomain.local systemd-wdmd[3707]: Loading the softdog kernel module: ...]
Jan 15 10:38:13 ov4301.localdomain.local wdmd[3759]: group 'sanlock' not found, using socket gid: 0
Jan 15 10:38:13 ov4301.localdomain.local wdmd[3763]: wdmd started S0 H1 G0
Jan 15 10:38:13 ov4301.localdomain.local wdmd[3763]: /dev/watchdog0 armed with fire_timeout 60
Jan 15 10:38:13 ov4301.localdomain.local systemd[1]: Started Watchdog Multiplexing Daemon.
Hint: Some lines were ellipsized, use -l to show in full.
[root@ov4301 ~]#

while for sanlock:

[root@ov4301 ~]# cat /var/log/sanlock.log
2019-01-15 10:38:13 6 [3721]: sanlock daemon started 3.6.0 host 8fd6d41c-99e8-4c3a-8212-68dd1856927c.ov4301.loc
2019-01-15 10:38:18 11 [3721]: helper pid 3725 dead wait 0
2019-01-15 12:54:56 8211 [26802]: sanlock daemon started 3.6.0 host 4dc13694-53a6-41dc-93b2-8ce371e903f5.ov4301.loc
2019-01-15 12:54:57 8211 [26802]: set scheduler RR|RESET_ON_FORK priority 99 failed: Operation not permitted
2019-01-15 13:50:34 11549 [26810]: s1 lockspace 14ec2fc7-8c2b-487c-8f4f-428644650928:250:/rhev/data-center/mnt/glusterSD/192.168.123.211:_engine/14ec2fc7-8c2b-487c-8f4f-428644650928/dom_md/ids:0
2019-01-15 13:50:34 11549 [21290]: s1 wdmd_connect failed -13
2019-01-15 13:50:34 11549 [21290]: s1 connect_watchdog failed -1
2019-01-15 13:50:35 11550 [26810]: s1 add_lockspace fail result -203
[root@ov4301 ~]#

The question is: why do these daemons start (with errors) upon node boot after install? Or is it a bug that the groups are not in place yet, and can I restart other daemons too in the initial step (as I did for ovirt-imageio-daemon)?

On Tue, Jan 15, 2019 at 4:32 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
The mail was partly scrambled in its contents so I put some clarification here:
On Tue, Jan 15, 2019 at 2:38 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
So after starting from scratch, and also using the info detailed in this thread: https://www.mail-archive.com/users@ovirt.org/msg52879.html
the steps now have been:
- install from ovirt-node-ng-installer-4.3.0-2019011010.el7.iso and reboot
- connect to cockpit and open terminal
This step is related to the ssh daemon:

cd /etc/ssh
chmod 600 *key
systemctl restart sshd
The step below is related to ovirt-imageio-daemon
mkdir /var/run/vdsm
chmod 755 /var/run/vdsm
chown vdsm.kvm /var/run/vdsm
mkdir /var/run/vdsm/dhclientmon
chmod 755 /var/run/vdsm/dhclientmon/
chown vdsm.kvm /var/run/vdsm/dhclientmon/
mkdir /var/run/vdsm/trackedInterfaces
chmod 755 /var/run/vdsm/trackedInterfaces/
chown vdsm.kvm /var/run/vdsm/trackedInterfaces/
mkdir /var/run/vdsm/v2v
chmod 700 /var/run/vdsm/v2v
chown vdsm.kvm /var/run/vdsm/v2v/
mkdir /var/run/vdsm/vhostuser
chmod 755 /var/run/vdsm/vhostuser/
chown vdsm.kvm /var/run/vdsm/vhostuser/
mkdir /var/run/vdsm/payload
chmod 755 /var/run/vdsm/payload/
chown vdsm.kvm /var/run/vdsm/payload/
(a compact loop form of these commands is sketched right after this step)
systemctl restart sshd
Actually:
systemctl restart ovirt-imageio-daemon
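For reference, the same workaround can be written as a short loop (an untested sketch, assuming exactly the paths, owner and modes listed above; only /var/run/vdsm/v2v uses mode 700):

# recreate the runtime directories expected under /var/run/vdsm, then restart the daemon
mkdir -p /var/run/vdsm
chmod 755 /var/run/vdsm
chown vdsm.kvm /var/run/vdsm
for d in dhclientmon trackedInterfaces vhostuser payload; do
    mkdir -p "/var/run/vdsm/$d"
    chmod 755 "/var/run/vdsm/$d"
    chown vdsm.kvm "/var/run/vdsm/$d"
done
mkdir -p /var/run/vdsm/v2v
chmod 700 /var/run/vdsm/v2v
chown vdsm.kvm /var/run/vdsm/v2v
systemctl restart ovirt-imageio-daemon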
- put the newer version of vdsm-api.pickle from vdsm-api-4.30.5-2.gitf824ec2.el7.noarch.rpm into /usr/lib/python2.7/site-packages/vdsm/rpc/vdsm-api.pickle (one way to extract it from the rpm is sketched after this step)
alternatively, vdsm-api.pickle can be downloaded directly from here:
https://drive.google.com/file/d/1AhakKhm_dzx-Gxt-Y1OojzRUwHs75kot/view?usp=s...
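One way to pull that single file out of the rpm without installing the whole package, sketched here as an assumption rather than the exact procedure used in the thread:

# extract only vdsm-api.pickle from the rpm, then copy it over the installed one
rpm2cpio vdsm-api-4.30.5-2.gitf824ec2.el7.noarch.rpm | cpio -idmv './usr/lib/python2.7/site-packages/vdsm/rpc/vdsm-api.pickle'
cp ./usr/lib/python2.7/site-packages/vdsm/rpc/vdsm-api.pickle /usr/lib/python2.7/site-packages/vdsm/rpc/vdsm-api.pickle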
- run the wizard for the gluster+he setup (the right-positioned option); inside the gdeploy text window click edit and add "[diskcount] 1" under the section "[disktype] jbod"
In my case, with a single disk, I chose the JBOD option
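After that edit, the relevant fragment of the gdeploy configuration should look roughly like this (a sketch based only on the two section names mentioned above; the rest of the generated file is left untouched):

[disktype]
jbod

[diskcount]
1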
- first 2 steps ok
- last step fails in finish part
[ INFO ] TASK [oVirt.hosted-engine-setup : Fetch Datacenter name]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Add NFS storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Add glusterfs storage domain]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Add iSCSI storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Add Fibre Channel storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Get storage domain details]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : debug]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Find the appliance OVF]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : debug]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Parse OVF]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Get required size]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : debug]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Remove unsuitable storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : debug]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Check storage domain free space]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Activate storage domain]
[ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400."}
In engine.log I see:
2019-01-15 13:50:35,317+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] START, CreateStoragePoolVDSCommand(HostName = ov4301.localdomain.local, CreateStoragePoolVDSCommandParameters:{hostId='e8f105f1-37ed-4ac4-bfc3-b1e55ed3027f', storagePoolId='96a31a7e-18bb-11e9-9a34-00163e6196f3', storagePoolName='Default', masterDomainId='14ec2fc7-8c2b-487c-8f4f-428644650928', domainsIdList='[14ec2fc7-8c2b-487c-8f4f-428644650928]', masterVersion='1'}), log id: 4baccd53
2019-01-15 13:50:36,345+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] Failed in 'CreateStoragePoolVDS' method
2019-01-15 13:50:36,354+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [51725212] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ov4301.localdomain.local command CreateStoragePoolVDS failed: Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error'))
2019-01-15 13:50:36,354+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand' return value 'StatusOnlyReturn [status=Status [code=661, message=Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error'))]]'
2019-01-15 13:50:36,354+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] HostName = ov4301.localdomain.local
2019-01-15 13:50:36,355+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] Command 'CreateStoragePoolVDSCommand(HostName = ov4301.localdomain.local, CreateStoragePoolVDSCommandParameters:{hostId='e8f105f1-37ed-4ac4-bfc3-b1e55ed3027f', storagePoolId='96a31a7e-18bb-11e9-9a34-00163e6196f3', storagePoolName='Default', masterDomainId='14ec2fc7-8c2b-487c-8f4f-428644650928', domainsIdList='[14ec2fc7-8c2b-487c-8f4f-428644650928]', masterVersion='1'})' execution failed: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error')), code = 661
2019-01-15 13:50:36,355+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-2) [51725212] FINISH, CreateStoragePoolVDSCommand, return: , log id: 4baccd53
2019-01-15 13:50:36,355+01 ERROR [org.ovirt.engine.core.bll.storage.pool.AddStoragePoolWithStoragesCommand] (default task-2) [51725212] Command 'org.ovirt.engine.core.bll.storage.pool.AddStoragePoolWithStoragesCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: (u'14ec2fc7-8c2b-487c-8f4f-428644650928', SanlockException(-203, 'Sanlock lockspace add failure', 'Watchdog device error')), code = 661 (Failed with error AcquireHostIdFailure and code 661)
2019-01-15 13:50:36,379+01 INFO [org.ovirt.engine.core.bll.CommandCompensator] (default task-2) [51725212] Command [id=c55d9962-368e-4e0c-8fee-bd06e7570062]: Compensating DELETED_OR_UPDATED_ENTITY of org.ovirt.engine.core.common.businessentities.StoragePool; snapshot: id=96a31a7e-18bb-11e9-9a34-00163e6196f3.
On host:
[root@ov4301 log]# cat /etc/hosts
192.168.124.50 ov43eng.localdomain.local # temporary entry added by hosted-engine-setup for the bootstrap VM
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.122.210 ov43eng.localdomain.local ov43eng
192.168.122.211 ov4301.localdomain.local ov4301
[root@ov4301 log]#
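The dnsmasq lines earlier in this thread ("not giving name ov43eng to the DHCP lease of 192.168.124.50 because the name exists in /etc/hosts with address 192.168.122.210") refer to exactly this duplicate entry. A quick check of what the node actually resolves, added here only as a hypothetical aside and not part of the original exchange:

# which address does the node resolve for the engine name, and what does /etc/hosts say
getent hosts ov43eng.localdomain.local
grep ov43eng /etc/hosts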
Gianluca
regarding the sanlock daemon and the watchdog multiplexing daemon: the latter seems to have no log file, so I can only see the status of the service:
[root@ov4301 ~]# systemctl status wdmd
● wdmd.service - Watchdog Multiplexing Daemon
   Loaded: loaded (/usr/lib/systemd/system/wdmd.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2019-01-15 10:38:13 CET; 4h 32min ago
 Main PID: 3763 (wdmd)
    Tasks: 1
   CGroup: /system.slice/wdmd.service
           └─3763 /usr/sbin/wdmd

Jan 15 10:38:12 ov4301.localdomain.local systemd[1]: Starting Watchdog Multiplexing Daemon...
Jan 15 10:38:13 ov4301.localdomain.local systemd-wdmd[3707]: Loading the softdog kernel module: ...]
Jan 15 10:38:13 ov4301.localdomain.local wdmd[3759]: group 'sanlock' not found, using socket gid: 0
Looks like the host is not configured properly. Running vdsm-tool configure --force should fix this, and must be part of the install process.
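A minimal sketch of that suggestion, assuming the goal is simply to re-run the host configuration and then restart the two affected daemons once the sanlock group exists (an assumption, not a documented procedure):

# check whether the sanlock group is present, re-run host configuration,
# then restart the watchdog multiplexer and sanlock
getent group sanlock
vdsm-tool configure --force
systemctl restart wdmd sanlock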
_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZHLDIPDFQ5PBCQ...

On Tue, Jan 15, 2019 at 9:27 PM Nir Soffer <nsoffer@redhat.com> wrote:
Looks like the host is not configured properly.
Running vdsm-tool configure --force should fix this, and must be part of the install process.
Has this been fixed? If not, is this tracked in a BZ?
_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/RDYYVHJ6EZCYLQ...
--
SANDRO BONAZZOLA
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo@redhat.com <https://red.ht/sig>
participants (3)
- Gianluca Cecchi
- Nir Soffer
- Sandro Bonazzola