Re: Self-hosted-engine timeout and recovery time
by Yedidyah Bar David
On Wed, Sep 21, 2022 at 12:22 AM Marcos Sungaila
<marcos.sungaila(a)oracle.com> wrote:
>
> Hi all,
>
> I have a cluster running the 4.4.10 release with 6 KVM hosts and Self-Hosted-Engine.
What storage?
> I'm testing some network outage scenarios, and I faced strange behavior.
I suppose you have redundancy in your network.
It's important to clarify (for yourself, mainly) what exactly you
test, what's important, what's expected, etc.
> After disconnecting the KVM host hosting the SHE, there was a long timeout before the Self-Hosted-Engine switched over to another host as expected.
I suggest studying the ha-agent logs, /var/log/ovirt-hosted-engine-ha/agent.log.
Much of the relevant code is in ovirt_hosted_engine_ha/agent/states.py
(in the git repo, or under /usr/lib/python3.6/site-packages/ on your
machine).
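For example, something along these lines (off the top of my head, untested)
should show the agent's view of the state transitions on a host while you run
the test:

  # overall HA state as the agents see it
  hosted-engine --vm-status
  # follow the agent's state machine transitions live
  tail -f /var/log/ovirt-hosted-engine-ha/agent.log | grep -i state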
> Also, it took a relatively long time to take over the HA VMs from the failing server.
That's a separate issue, about which I personally know very little.
You might want to start a separate thread about it.
I do know, though, that if you keep the storage connected, the host
might be able to keep updating VM leases on the storage. See e.g.:
https://www.ovirt.org/develop/release-management/features/storage/vm-leas...
I didn't check the admin guide, but I suppose it has some material about HA VMs.
> Is there a configuration where I can reduce the SHE timeout to make this recovery process faster?
IIRC there is nothing user-configurable.
You can see most relevant constants in
ovirt_hosted_engine_ha/agent/constants.py{,.in}.
Nothing stops you from changing them, but please note that this is
somewhat risky, and I strongly suggest doing very careful testing with
your new settings. It might make sense to methodically go through all
the possible state changes in the state machine mentioned above.
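For example, something like this (just a sketch; the exact constant names
differ between versions, so check the file itself) lists the timeout-related
values on a host:

  grep -nE 'TIMEOUT|INTERVAL|RETRY|DELAY' \
      /usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/constants.py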
The general assumption is that, for critical setups, the network and
storage are redundant, and that the engine itself is not considered
critical, in the sense that if it's dead, all your VMs are still alive.
It is also assumed that it's more important not to corrupt VM disk
images (e.g. by starting the same VM concurrently on two hosts) than to
keep the VM alive.
Best regards,
--
Didi
all active domains with status unknown in old 4.3 cluster
by Jorick Astrego
Hi,
Currently I'm debugging a client's ovirt 4.3 cluster. I was adding two
new gluster domains and got a timeout "VDSM command
AttachStorageDomainVDS failed: Resource timeout: ()" and "Failed to
attach Storage Domain *** to Data Center **".
Then I had to restart ovirt-engine, and now all the domains, including the
NFS domains, have status "unknown", and I see "VDSM command
GetStoragePoolInfoVDS failed: Resource timeout: ()" in the events.
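So far I've only looked at the engine events; I guess the next step is to grep
the engine log and the SPM host's vdsm log, something along these lines
(standard 4.3 log locations, adjust as needed):

  # on the engine
  grep -iE 'GetStoragePoolInfoVDS|AttachStorageDomainVDS' /var/log/ovirt-engine/engine.log | tail -n 20
  # on the SPM host
  grep -i 'storagepool' /var/log/vdsm/vdsm.log | tail -n 20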
Anyone fixed this before or have any tips?
Met vriendelijke groet, With kind regards,
Jorick Astrego
Netbulae Virtualization Experts
----------------
Tel: 053 20 30 270 info(a)netbulae.eu Staalsteden 4-3A KvK 08198180
Fax: 053 20 30 271 www.netbulae.eu 7547 TA Enschede BTW NL821234584B01
----------------
Snapshot task stuck at oVirt 4.4.8
by nicolas@devels.es
Hi,
We're running oVirt 4.4.8 and one of our users tried to create a
snapshot on a VM. The snapshot task got stuck (not sure why) and since
then a "locked" icon is being shown on the VM. We need to remove this
VM, but since it has a pending task, we're unable to do so.
The ovirt-engine log shows hundreds of events like:
[2022-09-20 09:23:09,286+01 INFO
[org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-27)
[2769dad5-3ec3-4c46-90a2-924746ea8d97] Command 'CreateSnapshotForVm'
(id: '4fcb6ab7-2cd7-4a0c-be97-f6979be25bb9') waiting on child command
id: 'cbb7a2c0-2111-4958-a55d-d48bf2d8591b'
type:'CreateLiveSnapshotForVm' to complete
An ovirt-engine restart didn't make any difference.
Is there a way to remove this task manually, even changing something in
the DB?
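For reference, this is roughly the kind of thing I had in mind; I haven't run
anything yet, and the table/column names below are guesses from memory, so
they may well be off:

  # query-only pass of the dbutils lock checker
  /usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh -t all -q
  # read-only look at the stuck command in the engine DB
  /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c \
      "select command_id, command_type, status from command_entities
       where command_id = '4fcb6ab7-2cd7-4a0c-be97-f6979be25bb9';"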
Thanks.
oVirt Engine VM On Rocky Linux
by Matthew J Black
Hi Everybody (Hi Dr. Nick),
Has anyone attempted to migrate the oVirt Engine VM over to Rocky Linux (v8.6), and if so, any "gotchas" we need to know about?
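For context, my plan would be the usual engine-backup / restore approach,
roughly along these lines (a sketch, not a tested procedure):

  # on the current engine
  engine-backup --mode=backup --scope=all --file=engine.backup --log=backup.log
  # on the new Rocky 8.6 VM, after installing the ovirt-engine packages
  engine-backup --mode=restore --file=engine.backup --log=restore.log \
      --provision-all-databases --restore-permissions
  engine-setup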
Cheers
Dulux-Oz
oVirt & (Ceph) iSCSI
by Matthew J Black
Hi Everybody (Hi Dr. Nick),
So, next question in my on-going saga: *somewhere* in the documentation I read that when using oVirt with multiple iSCSI paths (in my case, multiple Ceph iSCSI Gateways) we need to set up DM Multipath.
My question is: Is this still relevant information when using oVirt v4.5.2?
Relevant link referred to by the oVirt Documentation:
- https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/...
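For what it's worth, this is how I was planning to sanity-check the paths on a
host once the gateways are attached (my own guess at the right checks, not
something from the docs):

  # is multipathd running on the host?
  systemctl status multipathd
  # do the Ceph iSCSI LUNs show up with multiple paths?
  multipath -ll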
Cheers
Dulux-Oz
Self-hosted-engine timeout and recovery time
by Marcos Sungaila
Hi all,
I have a cluster running the 4.4.10 release with 6 KVM hosts and Self-Hosted-Engine.
I'm testing some network outage scenarios, and I faced strange behavior.
After disconnecting the KVM host hosting the SHE, there was a long timeout before the Self-Hosted-Engine switched over to another host as expected.
It also took a relatively long time to take over the HA VMs from the failing server.
Is there a configuration where I can reduce the SHE timeout to make this recovery process faster?
Regards,
Marcos Sungaila
How do I migrate a running VM off unassigned host?
by David White
OK, now that I'm able to (re)deploy oVirt to new hosts, I need to migrate VMs that are running on hosts that are currently in an "unassigned" state in the cluster.
This is the result of having moved the oVirt engine OUT of a hyperconverged environment onto its own stand-alone system, while simultaneously upgrading oVirt from v4.4 to the latest v4.5.
See the following email threads:
- https://lists.ovirt.org/archives/list/users@ovirt.org/thread/TZAUCM3GB5ER...
- https://lists.ovirt.org/archives/list/users@ovirt.org/thread/3IWXZ7VXM6CY...
The oVirt engine knows about the VMs, and oVirt knows about the storage those VMs are on. But the engine sees 2 of my hosts as "unassigned", and I've been unable to migrate the disks to new storage, live migrate a VM off an unassigned host, or clone an existing VM.
Is there a way to recover from this scenario? I was thinking of something along the lines of manually shutting down the VM on the unassigned host, and then somehow forcing the engine to bring the VM online again on a healthy host.
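In case it's relevant, I assume I can still confirm what is actually running on
those hosts from their own shells, e.g. with the read-only libvirt socket:

  virsh -r list

but I'd rather not power anything off until I know the engine can restart the
VMs elsewhere.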
Thanks,
David
Sent with Proton Mail secure email.
long-running backup (hung in image finalizing state)
by Jirka Simon
Hello there.
We have an issue with backups on our cluster: one backup started 2 days ago
and is still in the finalizing state.
select * from vm_backups;
 backup_id          | b9c458e6-64e2-41c2-93b8-96761e71f82b
 from_checkpoint_id |
 to_checkpoint_id   | 7a558f2a-57b6-432f-b5dd-85f5fb9dac8e
 vm_id              | c3b2199f-35cc-41dc-8787-835e945217d2
 phase              | Ready
 _create_date       | 2022-09-17 00:44:56.877+02
 host_id            |
 description        |
 _update_date       | 2022-09-17 00:45:19.057+02
 backup_type        | hybrid
 snapshot_id        | 0c6ebd56-dcfe-46a8-91cc-327cc94e9773
 is_stopped         | f
(1 row)
And if I check the image_transfers table, I see bytes_sent = bytes_total:
engine=# select it.disk_id, bd.disk_alias, it.last_updated, it.bytes_sent, it.bytes_total
         from image_transfers as it, base_disks as bd
         where it.disk_id = bd.disk_id;

 disk_id      | 950279ef-485c-400e-ba66-a3f545618de5
 disk_alias   | log1.util.prod.hq.sldev.cz_log1.util.prod.hq.sldev.cz
 last_updated | 2022-09-17 01:43:09.229+02
 bytes_sent   | 214748364800
 bytes_total  | 214748364800
There is no error in the logs.
If I use /usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh -t all -qc, it reports no records anywhere.
I can clean these records from the DB to fix it, but it will happen again in a few days.
vdsm.x86_64 4.50.2.2-1.el8
ovirt-engine.noarch 4.5.2.4-1.el8
Is there anything I can check to find the reason for this?
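The only other thing I have thought of so far is pulling the transfer's phase
out directly via the engine DB wrapper script (column names from memory, so
they may be off):

  /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c \
      "select disk_id, phase, type, last_updated from image_transfers;"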
Thank you Jirka
Unable to deploy to new host
by David White
I currently have a self-hosted engine that was restored from a backup of an engine that was originally in a hyperconverged state. (See https://lists.ovirt.org/archives/list/users@ovirt.org/message/APQ3XBU...).
This was also an upgrade from ovirt 4.4 to ovirt 4.5.
There were 4 hosts in this cluster. Unfortunately, 2 of them are completely in an "Unassigned" state right now, and I don't know why. The VMs on those hosts are working fine, but I have no way to move the VMs or manage them.
More to the point of this email:
I'm trying to re-deploy onto a 3rd host. I did a fresh install of Rocky Linux 8, and followed the instructions at https://ovirt.org/download/ and at https://ovirt.org/download/install_on_rhel.html, including the part there that is specific to Rocky.
After installing the centos-release-ovirt45 package, I then logged into the oVirt engine web UI, and went to Compute -> Hosts -> New, and have tried (and failed) many times to install / deploy to this new host.
The last error in the host deploy log is the following:
2022-09-18 21:29:39 EDT - { "uuid" : "94b93e6a-5410-4d26-b058-d7d1db0a151e",
"counter" : 404,
"stdout" : "fatal: [cha2-storage.mgt.example.com]: FAILED! => {\"msg\": \"The conditional check 'cluster_switch == \\\"ovs\\\" or (ovn_central is defined and ovn_central | ipaddr)' failed. The error was: The ipaddr filter requires python's netaddr be installed on the ansible controller\\n\\nThe error appears to be in '/usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-provider-ovn-driver/tasks/configure.yml': line 3, column 5, but may\\nbe elsewhere in the file depending on the exact syntax problem.\\n\\nThe offending line appears to be:\\n\\n- block:\\n - name: Install ovs\\n ^ here\\n\"}",
"start_line" : 405,
"end_line" : 406,
"runner_ident" : "e2cbd38d-64fa-4ecd-82c6-114420ea14a4",
"event" : "runner_on_failed",
"pid" : 65899,
"created" : "2022-09-19T01:29:38.983937",
"parent_uuid" : "02113221-f1b3-920f-8bd4-00000000003d",
"event_data" : {
"playbook" : "ovirt-host-deploy.yml",
"playbook_uuid" : "73a6e8f1-3836-49e1-82fd-5367b0bf4e90",
"play" : "all",
"play_uuid" : "02113221-f1b3-920f-8bd4-000000000006",
"play_pattern" : "all",
"task" : "Install ovs",
"task_uuid" : "02113221-f1b3-920f-8bd4-00000000003d",
"task_action" : "package",
"task_args" : "",
"task_path" : "/usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-provider-ovn-driver/tasks/configure.yml:3",
"role" : "ovirt-provider-ovn-driver",
"host" : "cha2-storage.mgt.example.com",
"remote_addr" : "cha2-storage.mgt.example.com",
"res" : {
"msg" : "The conditional check 'cluster_switch == \"ovs\" or (ovn_central is defined and ovn_central | ipaddr)' failed. The error was: The ipaddr filter requires python's netaddr be installed on the ansible controller\n\nThe error appears to be in '/usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-provider-ovn-driver/tasks/configure.yml': line 3, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n- block:\n - name: Install ovs\n ^ here\n",
"_ansible_no_log" : false
},
"start" : "2022-09-19T01:29:38.919334",
"end" : "2022-09-19T01:29:38.983680",
"duration" : 0.064346,
"ignore_errors" : null,
"event_loop" : null,
"uuid" : "94b93e6a-5410-4d26-b058-d7d1db0a151e"
}
}
On the engine, I have verified that netaddr is installed. And just for kicks, I've installed as many different versions as I can find:
[root@ovirt-engine1 host-deploy]# rpm -qa | grep netaddr
python38-netaddr-0.7.19-8.1.1.el8.noarch
python2-netaddr-0.7.19-8.1.1.el8.noarch
python3-netaddr-0.7.19-8.1.1.el8.noarch
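I haven't yet checked which interpreter the engine's ansible actually runs
under, or whether that interpreter can import netaddr; something like this
(untested) should tell:

  # which python does ansible itself use?
  head -1 "$(command -v ansible)"
  # can that interpreter import netaddr? (swap in the interpreter from the shebang above)
  /usr/bin/python3 -c 'import netaddr; print(netaddr.__file__)'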
The engine is based on CentOS Stream 8 (when I moved the engine out of the hyperconverged environment, my goal was to keep things as close to the original environment as possible)
[root@ovirt-engine1 host-deploy]# cat /etc/redhat-release
CentOS Stream release 8
The engine is fully up-to-date:
[root@ovirt-engine1 host-deploy]# uname -a
Linux ovirt-engine1.mgt.barredowlweb.com 4.18.0-408.el8.x86_64 #1 SMP Mon Jul 18 17:42:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
And the engine has the following repos:
[root@ovirt-engine1 host-deploy]# yum repolist
repo id                                 repo name
appstream CentOS Stream 8 - AppStream
baseos CentOS Stream 8 - BaseOS
centos-ceph-pacific CentOS-8-stream - Ceph Pacific
centos-gluster10 CentOS-8-stream - Gluster 10
centos-nfv-openvswitch CentOS-8 - NFV OpenvSwitch
centos-opstools CentOS-OpsTools - collectd
centos-ovirt45 CentOS Stream 8 - oVirt 4.5
extras CentOS Stream 8 - Extras
extras-common CentOS Stream 8 - Extras common packages
ovirt-45-centos-stream-openstack-yoga CentOS Stream 8 - oVirt 4.5 - OpenStack Yoga Repository
ovirt-45-upstream oVirt upstream for CentOS Stream 8 - oVirt 4.5
powertools CentOS Stream 8 - PowerTools
Why does deploying to this new Rocky host keep failing?
Sent with Proton Mail secure email.