Multiple hosts stuck in Connecting state waiting for storage pool to go up.
by ivan.lezhnjov.iv@gmail.com
Hi!
We have a problem with multiple hosts stuck in Connecting state, which I hoped somebody here could help us wrap our heads around.
All hosts, except one, seem to have very similar symptoms but I'll focus on one host that represents the rest.
So, the host is stuck in Connecting state and this is what we see in the oVirt log files.
/var/log/ovirt-engine/engine.log:
2023-04-20 09:51:53,021+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = ABC010-176-XYZ, VdsIdAndVdsVDSCommandParametersBase:{hostId='2c458562-3d4d-4408-afc9-9a9484984a91', vds='Host[ABC010-176-XYZ,2c458562-3d4d-4408-afc9-9a9484984a91]'})' execution failed: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: SSL session is invalid
2023-04-20 09:55:16,556+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-67) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ABC010-176-XYZ command Get Host Capabilities failed: Message timeout which can be caused by communication issues
/var/log/vdsm/vdsm.log:
2023-04-20 17:48:51,977+0300 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList() from=internal, task_id=ebce7c8c-6ded-454e-9aee-86edf72764ef (api:31)
2023-04-20 17:48:51,977+0300 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=ebce7c8c-6ded-454e-9aee-86edf72764ef (api:37)
2023-04-20 17:48:51,978+0300 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723)
Both engine.log and vdsm.log are flooded with these messages, repeated at regular intervals ad infinitum. This is one common symptom shared by multiple hosts in our deployment: they all have these message loops in engine.log and vdsm.log.
Running vdsm-client Host getConnectedStoragePools also returns an empty list ([]) on all hosts (interestingly, one host did show a Storage Pool UUID and yet was still stuck in Connecting state).
This particular host (ABC010-176-XYZ) is connected to 3 Ceph iSCSI storage domains, and lsblk shows 3 block devices with matching UUIDs among their device components. So the storage seems to be connected, but the storage pool is not? How is that even possible?
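As a sanity check, the contradiction can be stated as a tiny helper that compares what VDSM reports (getConnectedStoragePools) with what the kernel sees (lsblk). The UUIDs below are made-up placeholders, not values from our hosts:

```python
def storage_inconsistency(connected_pools, visible_domains):
    """Summarize the mismatch between connected pools as reported by
    VDSM and storage-domain devices visible to the kernel."""
    if visible_domains and not connected_pools:
        return "devices visible but no storage pool connected"
    if connected_pools and not visible_domains:
        return "pool connected but no domain devices visible"
    return "consistent"

# On the stuck host: vdsm-client returned [], lsblk showed all 3 domains
# (hypothetical placeholder UUIDs).
pools = []
domains = ["11111111-2222-3333-4444-555555555555",
           "66666666-7777-8888-9999-aaaaaaaaaaaa",
           "bbbbbbbb-cccc-dddd-eeee-ffffffffffff"]
print(storage_inconsistency(pools, domains))
# prints: devices visible but no storage pool connected
```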
Now, what's even weirder is that we tried rebooting the host (via the Administration Portal) and it didn't help. We even tried removing and re-adding the host in the Administration Portal, but to no avail.
Additionally, the host refused to go into Maintenance mode, so we had to force it by manually updating the engine DB.
We also tried reinstalling the host via the Administration Portal and ran into another weird problem (I'm not sure whether it's related or deserves a dedicated discussion thread): the underlying Ansible playbook exited with the following error message:
"stdout" : "fatal: [10.10.10.176]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Data could not be sent to remote host \\\"10.10.10.176\\\". Make sure this host can be reached over ssh: \", \"unreachable\": true}",
Counterintuitively, just before running Reinstall via the Administration Portal we had been able to reboot the same host (which, as you know, oVirt also does via Ansible). So, no changes on the host in between, just different Ansible playbooks. To confirm that we actually had ssh access to the host, we successfully ran ssh -p $PORT root@10.10.10.176 -i /etc/pki/ovirt-engine/keys/engine_id_rsa and it worked.
That made us scratch our heads for a while, but what seems to have fixed Ansible's ssh access problem was a manual full stop of all VDSM-related systemd services on the host. It was just a wild guess, but as soon as we stopped all VDSM services, Ansible stopped complaining about not being able to reach the target host and successfully did its job.
I'm sure you'd like to see more logs, but I'm not certain what exactly is relevant. There are a ton of logs, as this deployment comprises nearly 80 hosts. So, I guess it's best if you just request specific logs, messages or configuration details and I'll cherry-pick what's relevant.
We don't really understand what's going on and would appreciate any help. We've tried just about everything we could think of to resolve this issue and are running out of ideas.
If you have any questions just ask and I'll do my best to answer them.
8 months, 3 weeks
Direct LUN I/O errors with SCSI Pass-through enabled
by mgs@ordix.de
Hi,
in our environment (version 4.4.10.7) we use Fibre Channel LUNs, which we attach directly to the VMs (as Direct LUNs) with VirtIO-SCSI and SCSI pass-through enabled. The virtual machines run an application that requires a physical_block_size of 4096 and a logical_block_size of 512. For this reason, we had to enable SCSI pass-through: only with SCSI pass-through is the correct physical_block_size passed through to the VM.
Now we have the following problem on just about every VM:
Error messages of the following form occur in the VMs (in /var/log/messages):
kernel: blk_update_request: I/O error, dev sdd, sector 352194592 op 0x1:(WRITE) flags 0xc800 phys_seg 16 prio class 0
This error message coincides with a crash of the application and appears to come from the SCSI layer.
We are currently looking for an alternative to SCSI pass-through: we would like to use plain VirtIO and still pass the physical_block_size somehow. Since the VMs' XML files are transient, we cannot make any changes there.
Does anyone have an idea what the error could be or how to pass the correct physical_block_size? Could VDSM hooks help with this?
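To make the question more concrete, here is the kind of hook we have in mind: a before_vm_start VDSM hook that injects a `<blockio>` element into each disk, which libvirt uses to advertise block sizes to the guest without SCSI pass-through. This is an untested sketch; the `hooking` module is VDSM's hook API and only the XML-editing part has been exercised outside a host:

```python
#!/usr/bin/python3
# Hypothetical VDSM before_vm_start hook: inject a <blockio> element into
# each <disk device='disk'> so libvirt advertises 4096/512 block sizes.
import xml.dom.minidom


def add_blockio(domxml, physical=4096, logical=512):
    for disk in domxml.getElementsByTagName('disk'):
        if disk.getAttribute('device') != 'disk':
            continue  # skip cdroms, floppies, etc.
        if disk.getElementsByTagName('blockio'):
            continue  # leave disks that already declare block sizes alone
        blockio = domxml.createElement('blockio')
        blockio.setAttribute('physical_block_size', str(physical))
        blockio.setAttribute('logical_block_size', str(logical))
        disk.appendChild(blockio)
    return domxml


if __name__ == '__main__':
    try:
        import hooking  # VDSM's hook API, only present on a host
    except ImportError:
        pass  # not running under VDSM
    else:
        hooking.write_domxml(add_blockio(hooking.read_domxml()))
```

Installed under /usr/libexec/vdsm/hooks/before_vm_start/, this would survive restarts, unlike edits to the transient XML.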
Thank you and regards
Miguel
8 months, 3 weeks
Failed to login: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
by p.olivera@telfy.com
Hi community,
We're encountering the following error when attempting to log in:
Warning alert: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
The certificate is valid until 2027:
[root@engine9 certs]# openssl x509 -subject -noout -dates -in engine.cer
subject=C = US, O = telfy.com, CN = engine9.telfy.com
notBefore=Sep 25 10:18:03 2022 GMT
notAfter=Sep 27 10:18:03 2027 GMT
It's worth noting that our time zone recently switched to GMT+1. Could this change be related to the issue?
Has anyone else experienced this problem, and if so, how was it resolved?
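For reference, this is how I double-check the dates: a quick sketch that parses openssl's default date format and compares in UTC, so the switch to GMT+1 should not affect the result unless the system clock itself is wrong. Note that PKIX validates every certificate in the chain, so the CA certificate's dates are worth checking too, not only engine.cer:

```python
from datetime import datetime, timezone

OPENSSL_DATE = "%b %d %H:%M:%S %Y GMT"  # e.g. "Sep 25 10:18:03 2022 GMT"


def cert_window_ok(not_before, not_after, now=None):
    """True if `now` (UTC) falls inside the validity window parsed from
    openssl's notBefore/notAfter output lines."""
    start = datetime.strptime(not_before, OPENSSL_DATE).replace(tzinfo=timezone.utc)
    end = datetime.strptime(not_after, OPENSSL_DATE).replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return start <= now <= end


# The dates from engine.cer above:
print(cert_window_ok("Sep 25 10:18:03 2022 GMT", "Sep 27 10:18:03 2027 GMT"))
```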
Thank you.
8 months, 3 weeks
Migrated VM has snapshots with same id
by mihailmataras@gmail.com
Hi community,
We've encountered a problem with virtual machines migrated from RHV (4.4.9.5-0.1.el8ev) to oVirt (4.5.4-1.el8) using a data domain.
When we create a snapshot on a migrated VM (in oVirt), the web UI shows that the snapshot has the same ID as the "Active VM" snapshot, and that the disk is in Illegal status.
Meanwhile, in the engine database the IDs are different, and vdsm-tool shows only one snapshot, in status LEGAL.
engine=# select image_guid,parentid,imagestatus,vm_snapshot_id,volume_type,volume_format,active from images where image_group_id='12a8999f-3f6e-4377-8378-40370aa68d3f';
image_guid | parentid | imagestatus | vm_snapshot_id | volume_type | volume_format | active
--------------------------------------+--------------------------------------+-------------+--------------------------------------+-------------+---------------+--------
5cf5a41d-7236-4c41-ae9c-7c6359210db6 | 6d9cc470-d66f-4e68-a0aa-6e8985f392b0 | 1 | 88d91f15-26ad-48c1-91ec-422c1c04e79b | 2 | 4 | t
6d9cc470-d66f-4e68-a0aa-6e8985f392b0 | 00000000-0000-0000-0000-000000000000 | 4 | a47261da-4ebb-42e3-a74a-0882f8e27c73 | 2 | 4 | f
(2 rows)
vdsm-tool dump-volume-chains 5e88e116-fb90-41ce-b9c3-01ccb9302b16 | grep 12a8999f-3f6e-4377-8378-40370aa68d3f -A 10
image: 12a8999f-3f6e-4377-8378-40370aa68d3f
- 6d9cc470-d66f-4e68-a0aa-6e8985f392b0
status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE, capacity: 214748364800, truesize: 121198608384
We know how to fix this using database commands, but when we create a snapshot it happens again. The question is: how do we prevent it?
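To illustrate the invariant that looks broken in the UI: each volume of a disk should carry a distinct vm_snapshot_id. A small sketch of that check, fed with the two rows from the query above:

```python
from collections import Counter


def duplicate_snapshot_ids(volumes):
    """Given (image_guid, vm_snapshot_id) pairs for one disk, return any
    snapshot IDs shared by more than one volume; a healthy chain has a
    distinct vm_snapshot_id per volume."""
    counts = Counter(snap_id for _guid, snap_id in volumes)
    return sorted(sid for sid, n in counts.items() if n > 1)


# The two rows from the engine DB query above:
rows = [
    ("5cf5a41d-7236-4c41-ae9c-7c6359210db6", "88d91f15-26ad-48c1-91ec-422c1c04e79b"),
    ("6d9cc470-d66f-4e68-a0aa-6e8985f392b0", "a47261da-4ebb-42e3-a74a-0882f8e27c73"),
]
print(duplicate_snapshot_ids(rows))
# prints: [] -- the DB rows themselves are consistent; the duplicate ID
# shows up only in the web UI
```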
Thank you.
8 months, 4 weeks
Re: Failed to synchronize networks of Provider ovirt-provider-ovn
by Mail SET Inc. Group
Yes, I used the same manual to change the WebUI SSL certificate.
ovirt-ca-file= points to the same SSL file that the WebUI uses.
Yes, I restarted ovirt-provider-ovn, I restarted the engine, I restarted everything I could restart. Nothing...
> On 12 Sep 2018, at 16:11, Dominik Holler <dholler@redhat.com> wrote:
>
> On Wed, 12 Sep 2018 14:23:54 +0300
> "Mail SET Inc. Group" <mail@set-pro.net> wrote:
>
>> Ok!
>
> Not exactly, please use users@ovirt.org for such questions.
> Others could benefit from these questions, too.
> Please write the next mail to users@ovirt.org and keep me in CC.
>
>> What i did:
>>
>> 1) install oVirt "out of the box" (4.2.5.2-1.el7);
>> 2) generate my own SSL certificate for the engine using my FreeIPA CA, install it and
>
> What does "Install it" mean? You can use the doc from the following link
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/...
>
> Ensure that ovirt-ca-file= in
> /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf
> points to the correct file and ovirt-provider-ovn is restarted.
>
>> got this issue;
>>
>>
>> [root@engine ~]# tail -n 50 /var/log/ovirt-provider-ovn.log
>> 2018-09-12 14:10:23,828 root [SSL: CERTIFICATE_VERIFY_FAILED]
>> certificate verify failed (_ssl.c:579) Traceback (most recent call
>> last): File "/usr/share/ovirt-provider-ovn/handlers/base_handler.py",
>> line 133, in _handle_request method, path_parts, content
>> File "/usr/share/ovirt-provider-ovn/handlers/selecting_handler.py",
>> line 175, in handle_request return
>> self.call_response_handler(handler, content, parameters) File
>> "/usr/share/ovirt-provider-ovn/handlers/keystone.py", line 33, in
>> call_response_handler return response_handler(content, parameters)
>> File "/usr/share/ovirt-provider-ovn/handlers/keystone_responses.py",
>> line 62, in post_tokens user_password=user_password) File
>> "/usr/share/ovirt-provider-ovn/auth/plugin_facade.py", line 26, in
>> create_token return auth.core.plugin.create_token(user_at_domain,
>> user_password) File
>> "/usr/share/ovirt-provider-ovn/auth/plugins/ovirt/plugin.py", line
>> 48, in create_token timeout=self._timeout()) File
>> "/usr/share/ovirt-provider-ovn/auth/plugins/ovirt/sso.py", line 75,
>> in create_token username, password, engine_url, ca_file, timeout)
>> File "/usr/share/ovirt-provider-ovn/auth/plugins/ovirt/sso.py", line
>> 91, in _get_sso_token timeout=timeout File
>> "/usr/share/ovirt-provider-ovn/auth/plugins/ovirt/sso.py", line 54,
>> in wrapper response = func(*args, **kwargs) File
>> "/usr/share/ovirt-provider-ovn/auth/plugins/ovirt/sso.py", line 47,
>> in wrapper raise BadGateway(e) BadGateway: [SSL:
>> CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:579)
>>
>>
>> [root@engine ~]# tail -n 20 /var/log/ovirt-engine/engine.log
>> 2018-09-12 14:10:23,773+03 INFO
>> [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand]
>> (EE-ManagedThreadFactory-engineScheduled-Thread-47) [316db685] Lock
>> Acquired to object
>> 'EngineLock:{exclusiveLocks='[14e4fb72-9764-4757-b37d-4d487995571a=PROVIDER]',
>> sharedLocks=''}' 2018-09-12 14:10:23,778+03 INFO
>> [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand]
>> (EE-ManagedThreadFactory-engineScheduled-Thread-47) [316db685]
>> Running command: SyncNetworkProviderCommand internal: true.
>> 2018-09-12 14:10:23,836+03 ERROR
>> [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand]
>> (EE-ManagedThreadFactory-engineScheduled-Thread-47) [316db685]
>> Command
>> 'org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand'
>> failed: EngineException: (Failed with error Bad Gateway and code
>> 5050) 2018-09-12 14:10:23,837+03 INFO
>> [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand]
>> (EE-ManagedThreadFactory-engineScheduled-Thread-47) [316db685] Lock
>> freed to object
>> 'EngineLock:{exclusiveLocks='[14e4fb72-9764-4757-b37d-4d487995571a=PROVIDER]',
>> sharedLocks=''}' 2018-09-12 14:14:12,477+03 INFO
>> [org.ovirt.engine.core.sso.utils.AuthenticationUtils] (default
>> task-6) [] User admin@internal successfully logged in with scopes:
>> ovirt-app-admin ovirt-app-api ovirt-app-portal
>> ovirt-ext=auth:sequence-priority=~ ovirt-ext=revoke:revoke-all
>> ovirt-ext=token-info:authz-search
>> ovirt-ext=token-info:public-authz-search
>> ovirt-ext=token-info:validate ovirt-ext=token:password-access
>> 2018-09-12 14:14:12,587+03 INFO
>> [org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default
>> task-6) [1bf1b763] Running command: CreateUserSessionCommand
>> internal: false. 2018-09-12 14:14:12,628+03 INFO
>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> (default task-6) [1bf1b763] EVENT_ID: USER_VDC_LOGIN(30), User
>> admin@internal-authz connecting from '10.0.3.61' using session
>> 's8jAm7BUJGlicthm6yZBA3CUM8QpRdtwFaK3M/IppfhB3fHFB9gmNf0cAlbl1xIhcJ2WX+ww7e71Ri+MxJSsIg=='
>> logged in. 2018-09-12 14:14:30,972+03 INFO
>> [org.ovirt.engine.core.bll.provider.ImportProviderCertificateCommand]
>> (default task-6) [ee3cc8a7-4485-4fdf-a0c2-e9d67b5cfcd3] Running
>> command: ImportProviderCertificateCommand internal: false. Entities
>> affected : ID: aaa00000-0000-0000-0000-123456789aaa Type:
>> SystemAction group CREATE_STORAGE_POOL with role type ADMIN
>> 2018-09-12 14:14:30,982+03 INFO
>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> (default task-6) [ee3cc8a7-4485-4fdf-a0c2-e9d67b5cfcd3] EVENT_ID:
>> PROVIDER_CERTIFICATE_IMPORTED(213), Certificate for provider
>> ovirt-provider-ovn was imported. (User: admin@internal-authz)
>> 2018-09-12 14:14:31,006+03 INFO
>> [org.ovirt.engine.core.bll.provider.TestProviderConnectivityCommand]
>> (default task-6) [a48d94ab-b0b2-42a2-a667-0525b4c652ea] Running
>> command: TestProviderConnectivityCommand internal: false. Entities
>> affected : ID: aaa00000-0000-0000-0000-123456789aaa Type:
>> SystemAction group CREATE_STORAGE_POOL with role type ADMIN
>> 2018-09-12 14:14:31,058+03 ERROR
>> [org.ovirt.engine.core.bll.provider.TestProviderConnectivityCommand]
>> (default task-6) [a48d94ab-b0b2-42a2-a667-0525b4c652ea] Command
>> 'org.ovirt.engine.core.bll.provider.TestProviderConnectivityCommand'
>> failed: EngineException: (Failed with error Bad Gateway and code
>> 5050) 2018-09-12 14:15:10,954+03 INFO
>> [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService]
>> (EE-ManagedThreadFactory-engineThreadMonitoring-Thread-1) [] Thread
>> pool 'default' is using 0 threads out of 1, 5 threads waiting for
>> tasks. 2018-09-12 14:15:10,954+03 INFO
>> [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService]
>> (EE-ManagedThreadFactory-engineThreadMonitoring-Thread-1) [] Thread
>> pool 'engine' is using 0 threads out of 500, 16 threads waiting for
>> tasks and 0 tasks in queue. 2018-09-12 14:15:10,954+03 INFO
>> [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService]
>> (EE-ManagedThreadFactory-engineThreadMonitoring-Thread-1) [] Thread
>> pool 'engineScheduled' is using 0 threads out of 100, 100 threads
>> waiting for tasks. 2018-09-12 14:15:10,954+03 INFO
>> [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService]
>> (EE-ManagedThreadFactory-engineThreadMonitoring-Thread-1) [] Thread
>> pool 'engineThreadMonitoring' is using 1 threads out of 1, 0 threads
>> waiting for tasks. 2018-09-12 14:15:10,954+03 INFO
>> [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService]
>> (EE-ManagedThreadFactory-engineThreadMonitoring-Thread-1) [] Thread
>> pool 'hostUpdatesChecker' is using 0 threads out of 5, 2 threads
>> waiting for tasks. 2018-09-12 14:15:23,843+03 INFO
>> [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand]
>> (EE-ManagedThreadFactory-engineScheduled-Thread-61) [2455041f] Lock
>> Acquired to object
>> 'EngineLock:{exclusiveLocks='[14e4fb72-9764-4757-b37d-4d487995571a=PROVIDER]',
>> sharedLocks=''}' 2018-09-12 14:15:23,849+03 INFO
>> [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand]
>> (EE-ManagedThreadFactory-engineScheduled-Thread-61) [2455041f]
>> Running command: SyncNetworkProviderCommand internal: true.
>> 2018-09-12 14:15:23,900+03 ERROR
>> [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand]
>> (EE-ManagedThreadFactory-engineScheduled-Thread-61) [2455041f]
>> Command
>> 'org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand'
>> failed: EngineException: (Failed with error Bad Gateway and code
>> 5050) 2018-09-12 14:15:23,901+03 INFO
>> [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand]
>> (EE-ManagedThreadFactory-engineScheduled-Thread-61) [2455041f] Lock
>> freed to object
>> 'EngineLock:{exclusiveLocks='[14e4fb72-9764-4757-b37d-4d487995571a=PROVIDER]',
>> sharedLocks=''}'
>>
>>
>> [root@engine ~]# cat /etc/ovirt-provider-ovn/conf.d/10-setup-ovirt-provider-ovn.conf
>> # This file is automatically generated by engine-setup. Please do not edit manually
>> [OVN REMOTE]
>> ovn-remote=ssl:127.0.0.1:6641
>> [SSL]
>> https-enabled=true
>> ssl-cacert-file=/etc/pki/ovirt-engine/ca.pem
>> ssl-cert-file=/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer
>> ssl-key-file=/etc/pki/ovirt-engine/keys/ovirt-provider-ovn.key.nopass
>> [OVIRT]
>> ovirt-sso-client-secret=Ms7Gw9qNT6IkXu7oA54tDmxaZDIukABV
>> ovirt-host=https://engine.set.local:443
>> ovirt-sso-client-id=ovirt-provider-ovn
>> ovirt-ca-file=/etc/pki/ovirt-engine/apache-ca.pem
>> [PROVIDER]
>> provider-host=engine.set.local
>>
>>
>>> On 12 Sep 2018, at 13:59, Dominik Holler <dholler@redhat.com>
>>> wrote:
>>>
>>> On Wed, 12 Sep 2018 13:04:53 +0300
>>> "Mail SET Inc. Group" <mail@set-pro.net> wrote:
>>>
>>>> Hello Dominik!
>>>> I have a same issue with OVN provider and SSL
>>>> https://www.mail-archive.com/users@ovirt.org/msg47020.html
>>>> <https://www.mail-archive.com/users@ovirt.org/msg47020.html> But
>>>> certificate changes not helps to resolve it. Maybe you can help me
>>>> with this?
>>>
>>> Sure. Can you please share the relevant lines of
>>> ovirt-provider-ovn.log and engine.log, and let us know whether you
>>> are using the certificates generated by engine-setup, with
>>> users@ovirt.org in CC? Thanks,
>>> Dominik
>>>
>>
>
>
9 months
Unable to enable HPET component of a specific VM in oVirt 4.7
by ricardoot@gmail.com
Hello community members,
I'm currently using oVirt 4.7 as my virtualization environment, and I'm facing an issue with enabling the HPET (High Precision Event Timer) component in the XML configuration file of a virtual machine (VM).
Upon inspecting the XML file, I noticed that there is no `<timer name='hpet' present='yes'/>` line, indicating that the HPET component is disabled.
Here are the steps I have taken so far:
1. I verified that the VM's XML configuration file does not include the `<timer name='hpet' present='yes'/>` line.
2. While the VM was powered on, I used the following command to edit the XML configuration file:
```
virsh edit VM_NAME
```
I added the `<timer name='hpet' present='yes'/>` line to the XML file. However, the changes did not persist after restarting the VM.
To provide additional information, on the host where oVirt is running, the available clock sources can be viewed by executing the following command:
```
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
```
The output shows the available clock sources, such as `tsc`, `hpet`, and `acpi_pm`.
To resolve the authentication prompt that `virsh` raises on an oVirt host, I created a user with appropriate privileges using the following command:
```
sudo saslpasswd2 -a libvirt USERNAME
```
After creating the user, I was able to authenticate successfully with the `virsh` command using the newly created credentials.
However, I'm unable to find an option to add the HPET parameter in the oVirt web console; the option to configure HPET does not seem to be available there.
Has anyone else encountered a similar issue in oVirt 4.7? Could you please provide guidance or suggest a solution to enable the HPET component in the XML configuration of a powered-off VM in oVirt 4.7? Any insights, experiences, or suggestions would be greatly appreciated.
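Since the engine regenerates the domain XML every time a VM starts (which is why my `virsh edit` changes did not survive a restart), the workaround I'm currently considering is a VDSM before_vm_start hook. This is only a sketch; the `hooking` module is VDSM's hook API and only the XML editing has been exercised:

```python
#!/usr/bin/python3
# Hypothetical VDSM before_vm_start hook that forces
# <timer name='hpet' present='yes'/> into the domain's <clock> element.
import xml.dom.minidom


def enable_hpet(domxml):
    clocks = domxml.getElementsByTagName('clock')
    if not clocks:
        return domxml  # no <clock> element; leave the XML untouched
    clock = clocks[0]
    for timer in clock.getElementsByTagName('timer'):
        if timer.getAttribute('name') == 'hpet':
            timer.setAttribute('present', 'yes')  # flip an existing entry
            return domxml
    timer = domxml.createElement('timer')  # or add one if it is absent
    timer.setAttribute('name', 'hpet')
    timer.setAttribute('present', 'yes')
    clock.appendChild(timer)
    return domxml


if __name__ == '__main__':
    try:
        import hooking  # VDSM's hook API, only present on a host
    except ImportError:
        pass  # not running under VDSM
    else:
        hooking.write_domxml(enable_hpet(hooking.read_domxml()))
```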
Thank you in advance for your assistance!
Best regards,
9 months
Can I get a way to commit ovirt Project?
by rak kim
Hello,
I have been using the OLV/RHEV products for about 3 years,
and I often visit this community to find solutions to errors as well.
Now I would like to try oVirt and develop improvements to some of its functions.
I'm using a test environment built from compiled oVirt 4.5 packages,
but I haven't yet set up a build/development environment.
1. Can you share any advice from your own oVirt development environment experience?
2. oVirt consists of several projects; how are they merged and compiled into one source tree?
3. Is there a process for discussing which features to fix before making improvements, or can I just open a PR?
4. Any tips on how to debug, etc.?
9 months
fencing state/sequence diagrams
by Renaud
Hello everyone,
After a network outage, we experienced chaotic connectivity between the
oVirt manager (located on a VM outside the cluster) and the hypervisors
themselves.
During the incident only the connectivity between the manager and the
hypervisors was degraded, yet during the recovery several of our
hypervisors were fenced.
I'm looking for a diagram or text that describes all the rules for
triggering fencing.
Specifically, I have questions about the following (just in case):
* Does the manager play an active role in the decision to fence a
hypervisor?
Regards,
Renaud
9 months, 1 week
HE deploy fails at "Initialize lockspace volume" step
by Giuliano David
Hi everyone.
I need help understanding a failure deploying the hosted engine on a
fresh-installed oVirt 4.5.4 el8 node.
After the setup via the official ISO, I log in to the node via ssh and
issue the command:
# hosted-engine --deploy --4 --ansible-extra-vars=he_offline_deployment=true
-- Note --
The extra Ansible variable is the only way I found to stop the deployed
hosted engine from downloading the latest OS updates, which would break
Python compatibility between the Ansible playbook on the deploying node
and the Ansible host in the deployed engine.
Without that extra variable the deployment fails for assorted reasons.
-- End note --
The deployment proceeds until I specify an iSCSI target and a (free) LUN.
The playbook adds the storage domain, creates the HE disk and transfers
the HE VM to the domain. Then an error occurs:
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Initialize lockspace
volume]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Workaround for
ovirt-ha-broker start failures]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Initialize lockspace
volume]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 5, "changed":
true, "cmd": ["hosted-engine", "--reinitialize-lockspace", "--force"],
"delta": "0:00:00.170053", "end": "2023-10-20 11:21:18.111299", "msg":
"non-zero return code", "rc": 1, "start": "2023-10-20 11:21:17.941246",
"stderr": "Traceback (most recent call last):\n File
\"/usr/lib64/python3.6/runpy.py\", line 193, in _run_module_as_main\n
\"__main__\", mod_spec)\n File \"/usr/lib64/python3.6/runpy.py\", line
85, in _run_code\n exec(code, run_globals)\n File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/reinitialize_lockspace.py\",
line 30, in <module>\n ha_cli.reset_lockspace(force)\n File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/client/client.py\",
line 286, in reset_lockspace\n stats =
broker.get_stats_from_storage()\n File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py\",
line 148, in get_stats_from_storage\n result =
self._proxy.get_stats()\n File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1112, in __call__\n
return self.__send(self.__name, args)\n File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1452, in __request\n
verbose=self.__verbose\n File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1154, in request\n
return self.single_request(host, handler, request_body, verbose)\n File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1166, in
single_request\n http_conn = self.send_request(host, handler,
request_body, verbose)\n File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1279, in
send_request\n self.send_content(connection, request_body)\n File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1309, in
send_content\n connection.endheaders(request_body)\n File
\"/usr/lib64/python3.6/http/client.py\", line 1268, in endheaders\n
self._send_output(message_body, encode_chunked=encode_chunked)\n File
\"/usr/lib64/python3.6/http/client.py\", line 1044, in _send_output\n
self.send(msg)\n File \"/usr/lib64/python3.6/http/client.py\", line
982, in send\n self.connect()\n File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py\",
line 76, in connect\n
self.sock.connect(base64.b16decode(self.host))\nFileNotFoundError:
[Errno 2] No such file or directory", "stderr_lines": ["Traceback (most
recent call last):", " File \"/usr/lib64/python3.6/runpy.py\", line
193, in _run_module_as_main", " \"__main__\", mod_spec)", " File
\"/usr/lib64/python3.6/runpy.py\", line 85, in _run_code", " exec(code,
run_globals)", " File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/reinitialize_lockspace.py\",
line 30, in <module>", " ha_cli.reset_lockspace(force)", " File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/client/client.py\",
line 286, in reset_lockspace", " stats =
broker.get_stats_from_storage()", " File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py\",
line 148, in get_stats_from_storage", " result =
self._proxy.get_stats()", " File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1112, in __call__", "
return self.__send(self.__name, args)", " File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1452, in __request",
" verbose=self.__verbose", " File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1154, in request", "
return self.single_request(host, handler, request_body, verbose)", "
File \"/usr/lib64/python3.6/xmlrpc/client.py\", line 1166, in
single_request", " http_conn = self.send_request(host, handler,
request_body, verbose)", " File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1279, in send_request",
" self.send_content(connection, request_body)", " File
\"/usr/lib64/python3.6/xmlrpc/client.py\", line 1309, in send_content",
" connection.endheaders(request_body)", " File
\"/usr/lib64/python3.6/http/client.py\", line 1268, in endheaders", "
self._send_output(message_body, encode_chunked=encode_chunked)", " File
\"/usr/lib64/python3.6/http/client.py\", line 1044, in _send_output",
" self.send(msg)", " File \"/usr/lib64/python3.6/http/client.py\",
line 982, in send", " self.connect()", " File
\"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py\",
line 76, in connect", " self.sock.connect(base64.b16decode(self.host))",
"FileNotFoundError: [Errno 2] No such file or directory"], "stdout": "",
"stdout_lines": []}
Then the playbook cleans up the whole installation and exits.
I really can't figure out what's going on...
This is the furthest I have gotten in deploying HE after two weeks of
failures and errors of every kind.
Please, can someone point me in the right direction to solve this new issue?
Thanks.
giuliano
9 months, 1 week
No default gateway defined
by gbv.kris@mail.ru
While deploying isolated nested virtualization, we received the error "default gateway is not defined".
Two virtual machines were created in the zVirt virtualization environment: the first running AstraLinux, the second running zvirtNode. The VMs are connected to each other by a virtual network. After starting the installation in Hosted Engine mode from the command line, this error appeared. The setup was performed without DNS and without a default gateway.
9 months, 1 week