Problems after upgrade from 4.4.3 to 4.4.4
by tferic@swissonline.ch
Hi
I have problems after upgrading my 2-node cluster from 4.4.3 to 4.4.4.
Initially, I performed the upgrade of the oVirt hosts using the oVirt GUI (I wasn't planning any changes).
It appears that the upgrade broke the system.
On host1, the ovirt-engine was configured to run on the oVirt host itself (not self-hosted engine).
After the upgrade, the oVirt GUI didn't load in the Browser anymore.
I tried to fix the issue by migrating to a self-hosted engine, which did not work, so I ran an engine restore and engine-setup in order to get back to the initial state.
I am now able to login to the oVirt GUI again, but I am having the following problems:
host1 is in status "Unassigned", and it has the SPM role. It cannot be set to maintenance mode, nor re-installed from GUI, but I am able to reboot the host from oVirt.
All Storage Domains are inactive. (all NFS)
In the /var/log/messages log, I can see the following message appearing frequently: "vdsm[5935]: ERROR ssl handshake: socket error, address: ::ffff:192.168.100.61"
The cluster is down and no VMs can be run. I don't know how to fix either of these issues.
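In case it helps with triage: that repeating vdsm handshake error usually points at a certificate mismatch after the restore/engine-setup cycle. The sketch below only illustrates the kind of chain check involved, using throwaway certs; the real vdsm cert locations (commonly under /etc/pki/vdsm/certs/) are an assumption to verify on the host first.

```shell
# Illustrative only: recreate the chain check with throwaway certificates.
# On a real host you would point "openssl verify" at the vdsm CA cert and
# host cert instead (paths are an assumption -- check locally).
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=test-ca" \
  -keyout "$tmp/ca.key" -out "$tmp/ca.pem" -days 1 2>/dev/null
openssl req -newkey rsa:2048 -nodes -subj "/CN=host1" \
  -keyout "$tmp/host.key" -out "$tmp/host.csr" 2>/dev/null
openssl x509 -req -in "$tmp/host.csr" -CA "$tmp/ca.pem" -CAkey "$tmp/ca.key" \
  -CAcreateserial -out "$tmp/host.pem" -days 1 2>/dev/null
# Prints "<path>: OK" when the cert chains to the CA; after a botched
# restore, the vdsm cert may no longer chain to the engine CA.
openssl verify -CAfile "$tmp/ca.pem" "$tmp/host.pem"
```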
Does anyone have an idea?
I am appending a tar file containing log files to this email.
http://gofile.me/5fp92/d7iGEqh3H
Many thanks
Toni
3 years, 7 months
No option to remove OpenStack Glance storage domain.
by Gary Taylor
Hi,
I was playing around a few months ago and added an ovirt-image-repository storage domain. There wasn't a real purpose for doing it, just trying to learn and play around. I tried to remove it earlier but couldn't figure out how because the Remove button is greyed out. I got busy and forgot about it. I am trying to clean it up now but the Remove button is still greyed out for that domain. How do I get rid of it now? It isn't being used. It's unattached. I'm the admin.
https://imgur.com/KBKUu16.png
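For what it's worth, the ovirt-image-repository domain is backed by an external Glance provider, which is why the Remove button in the Storage view stays greyed out; removing the provider entry should take the domain with it. A hedged REST sketch (engine FQDN and credentials below are placeholders):

```shell
# Placeholders throughout -- substitute your engine FQDN, user, and password.
ENGINE=engine.example.com
# List the Glance image providers to find the repository's id:
curl -s -k -u 'admin@internal:password' \
  "https://$ENGINE/ovirt-engine/api/openstackimageproviders"
# Then delete the matching provider (substitute the real <id>):
# curl -s -k -u 'admin@internal:password' -X DELETE \
#   "https://$ENGINE/ovirt-engine/api/openstackimageproviders/<id>"
```

The same removal should also be possible from Administration > Providers in the UI.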
oVirt Open Virtualization Manager
Software Version: 4.4.4.7-1.el8
Thank-you,
Gary
3 years, 7 months
Hosted-Engine vs Standalone Engine
by Ian Easter
Hello Folks,
I have had to install a Hosted-Engine a few times in my environment. There
have been some hardware issues and power issues that left the HE
unrecoverable.
In this situation, would the Standalone Engine install be more viable and
less prone to become inoperable due to the previous issues?
My assumption would be to have a dedicated bare-metal server run the Engine to
control and maintain my blades.
*Thank you,*
*Ian*
3 years, 7 months
Error while updating volume meta data: [Errno 17] File exists", ), code = 208
by adrianquintero@gmail.com
Hi,
I am encountering the following error every hour in the ovirt-engine.log file and I can also see the same in the oVirt engine UI
---------------------------------------
2021-04-05 01:49:20,072-04 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-76) [7dd3eb3d] Command 'SetVolumeDescriptionVDSCommand( SetVolumeDescriptionVDSCommandParameters:{storagePoolId='dfe84316-bbdb-11ea-beb6-00163e1ab088', ignoreFailoverLimit='false', storageDomainId='9ed31575-9cc4-4b05-a5d9-23d21d6e915e', imageGroupId='2af73aad-5079-4583-824c-a93b20b93835', imageId='ad5a523c-fdc1-49dd-900b-73f801564f1d'})' execution failed: IRSGenericException: IRSErrorException: Failed to SetVolumeDescriptionVDS, error = Error while updating volume meta data: ("(u'/rhev/data-center/mnt/glusterSD/192.168.0.4:_data1/9ed31575-9cc4-4b05-a5d9-23d21d6e915e/images/2af73aad-5079-4583-824c-a93b20b93835/ad5a523c-fdc1-49dd-900b-73f801564f1d',)[Errno 17] File exists",), code = 208
2021-04-05 02:49:14,333-04 ERROR [org.ovirt.engine.core.bll.gluster.GlusterGeoRepSyncJob] (DefaultQuartzScheduler3) [2bf694b9] VDS error Command execution failed: rc=2 out='geo-replication command failed\n' err=''
2021-04-05 02:49:20,332-04 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-5) [6b6dd1af] Command 'SetVolumeDescriptionVDSCommand( SetVolumeDescriptionVDSCommandParameters:{storagePoolId='dfe84316-bbdb-11ea-beb6-00163e1ab088', ignoreFailoverLimit='false', storageDomainId='9ed31575-9cc4-4b05-a5d9-23d21d6e915e', imageGroupId='2af73aad-5079-4583-824c-a93b20b93835', imageId='ad5a523c-fdc1-49dd-900b-73f801564f1d'})' execution failed: IRSGenericException: IRSErrorException: Failed to SetVolumeDescriptionVDS, error = Error while updating volume meta data: ("(u'/rhev/data-center/mnt/glusterSD/192.168.0.4:_data1/9ed31575-9cc4-4b05-a5d9-23d21d6e915e/images/2af73aad-5079-4583-824c-a93b20b93835/ad5a523c-fdc1-49dd-900b-73f801564f1d',)[Errno 17] File exists",), code = 208
---------------------------------------
I looked up the mentioned error codes, but I am not sure if these are the correct ones:
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/...
---------------------------------------
17 > VDS_MAINTENANCE_FAILED > Error > Failed to switch Host ${VdsName} to Maintenance mode.
208 > PROVIDER_UPDATE_FAILED > Error > Failed to update provider ${ProviderName}. (User: ${UserName})
---------------------------------------
From the Engine UI's event tab I can see:
---------------------------------------
Failed to update VMs/Templates OVF data for Storage Domain data1 in Data Center Default.
Failed to update OVF disks 2af73aad-5079-4583-824c-a93b20b93835, OVF data isn't updated on those OVF stores (Data Center Default, Storage Domain data1).
VDSM command SetVolumeDescriptionVDS failed: Error while updating volume meta data: ("(u'/rhev/data-center/mnt/glusterSD/192.168.0.4:_data1/9ed31575-9cc4-4b05-a5d9-23d21d6e915e/images/2af73aad-5079-4583-824c-a93b20b93835/ad5a523c-fdc1-49dd-900b-73f801564f1d',)[Errno 17] File exists",)
---------------------------------------
I know the error belongs to the "data1" storage domain, which is a 12-node Distributed Replicate volume (Replica 3) in HCI.
If I try to update the OVF using the UI, I get the following log entries in the engine:
---------------------------------------
2021-04-05 03:20:03,101-04 WARN [org.ovirt.engine.core.dal.job.ExecutionMessageDirector] (default task-1183) [6761980f-dcf6-400e-bfd8-fd2c70e7569a] The message key 'UpdateOvfStoreForStorageDomain' is missing from 'bundles/ExecutionMessages'
2021-04-05 03:20:03,178-04 INFO [org.ovirt.engine.core.bll.storage.domain.UpdateOvfStoreForStorageDomainCommand] (default task-1183) [6761980f-dcf6-400e-bfd8-fd2c70e7569a] Lock Acquired to object 'EngineLock:{exclusiveLocks='[9ed31575-9cc4-4b05-a5d9-23d21d6e915e=STORAGE]', sharedLocks=''}'
2021-04-05 03:20:03,204-04 INFO [org.ovirt.engine.core.bll.storage.domain.UpdateOvfStoreForStorageDomainCommand] (default task-1183) [6761980f-dcf6-400e-bfd8-fd2c70e7569a] Running command: UpdateOvfStoreForStorageDomainCommand internal: false. Entities affected : ID: 9ed31575-9cc4-4b05-a5d9-23d21d6e915e Type: StorageAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN
2021-04-05 03:20:03,505-04 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [13f4ac4f] Failed in 'SetVolumeDescriptionVDS' method
2021-04-05 03:20:03,512-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [13f4ac4f] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command SetVolumeDescriptionVDS failed: Error while updating volume meta data: ("(u'/rhev/data-center/mnt/glusterSD/192.168.0.4:_data1/9ed31575-9cc4-4b05-a5d9-23d21d6e915e/images/2af73aad-5079-4583-824c-a93b20b93835/ad5a523c-fdc1-49dd-900b-73f801564f1d',)[Errno 17] File exists",)
2021-04-05 03:20:03,513-04 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [13f4ac4f] Command 'SetVolumeDescriptionVDSCommand( SetVolumeDescriptionVDSCommandParameters:{storagePoolId='dfe84316-bbdb-11ea-beb6-00163e1ab088', ignoreFailoverLimit='false', storageDomainId='9ed31575-9cc4-4b05-a5d9-23d21d6e915e', imageGroupId='2af73aad-5079-4583-824c-a93b20b93835', imageId='ad5a523c-fdc1-49dd-900b-73f801564f1d'})' execution failed: IRSGenericException: IRSErrorException: Failed to SetVolumeDescriptionVDS, error = Error while updating volume meta data: ("(u'/rhev/data-center/mnt/glusterSD/192.168.0.4:_data1/9ed31575-9cc4-4b05-a5d9-23d21d6e915e/images/2af73aad-5079-4583-824c-a93b20b93835/ad5a523c-fdc1-49dd-900b-73f801564f1d',)[Errno 17] File exists",), code = 208
2021-04-05 03:20:03,538-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [13f4ac4f] EVENT_ID: UPDATE_FOR_OVF_STORES_FAILED(1,016), Failed to update OVF disks 2af73aad-5079-4583-824c-a93b20b93835, OVF data isn't updated on those OVF stores (Data Center Default, Storage Domain data1).
2021-04-05 03:20:03,513-04 WARN [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [13f4ac4f] failed to update domain '9ed31575-9cc4-4b05-a5d9-23d21d6e915e' ovf store disk '2af73aad-5079-4583-824c-a93b20b93835'
2021-04-05 03:20:03,558-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [13f4ac4f] EVENT_ID: UPDATE_OVF_FOR_STORAGE_DOMAIN_FAILED(190), Failed to update VMs/Templates OVF data for Storage Domain data1 in Data Center Default.
---------------------------------------
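Errno 17 is EEXIST: the metadata update is trying to create a file that already exists, typically a leftover temp artifact next to the volume metadata on the mount. A quick local sketch of what such a leftover looks like (the UUIDs and the ".meta.new" suffix are illustrative stand-ins, not taken from the logs above):

```shell
# Simulated storage-domain images tree; all names here are illustrative.
tmp=$(mktemp -d)
img="$tmp/images/2af73aad-5079-4583-824c-a93b20b93835"
mkdir -p "$img"
touch "$img/volume.meta"        # the metadata file itself
touch "$img/volume.meta.new"    # a stale temp file from an interrupted write
# Anything matching a temp/leftover pattern next to the volume metadata is a
# candidate for what makes the next update fail with EEXIST (errno 17):
find "$tmp/images" -name '*.meta.*'
```

On the real gluster mount the equivalent search would be run against the image directory from the log line, before touching anything.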
Any ideas on how to overcome this issue?
Thanks,
Adrian
3 years, 7 months
Upgrade to release for oVirt 4.4.5 failing
by Gary Pedretty
Upgraded my hosted-engine installation to the latest version (4.4.5) today and the engine will not come online. It starts, and I can ssh in and connect via cockpit, but it never passes the liveliness check according to the command-line vm-status, and the web portal for the engine never loads. All processes appear to be working and nothing jumps out at me in the logs, but the HA agent keeps rebooting it on different hosts with the same result. All 8 hosts in the cluster are also updated to the latest updates across the board. This is CentOS Stream 8.
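For completeness, these are the checks I am watching it with (the engine FQDN below is a placeholder; as far as I understand it, the liveliness check polls the engine's health servlet):

```shell
# Run on any HA host: engine VM state plus the liveliness result.
hosted-engine --vm-status
# The health servlet the liveliness check polls (FQDN is a placeholder):
curl -s http://engine.example.com/ovirt-engine/services/health
# The HA agent's reasoning for the restart loop is logged here:
tail -n 50 /var/log/ovirt-hosted-engine-ha/agent.log
```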
Ideas?
Gary
_______________________________
Gary Pedretty
IT Manager
Ravn Alaska
Office: 907-266-8451
Mobile: 907-388-2247
Email: gary.pedretty(a)ravnalaska.com
"We call Alaska......Home!"
3 years, 7 months
Standard operating procedure for a node failure on HCI required
by Thomas Hoberg
oVirt may have started as a vSphere 'look-alike', but it graduated to a Nutanix 'clone', at least in terms of marketing.
IMHO that means the 3-node hyperconverged default oVirt setup (2 replicas and 1 arbiter) deserves special love in terms of documenting failure scenarios.
3-node HCI is supposed to defend you against long-term effects of any single point of failure. There is no protection against the loss of dynamic state/session data, but state-free services should recover or resume: that's what it's all about.
Sadly, what I find missing in the oVirt and Gluster documentation is an SOP (standard operating procedure) to follow for that late-night/early-morning on-call wakeup when one of those three HCI nodes has failed... dramatically, or via a 'brown-out' where, e.g., only the storage part was actually lost.
My impression is that the oVirt and Gluster teams are barely talking, but in HCI that's fatal.
And I sure can't find those recovery procedures, not even in the commercial RH documents.
So please, either add them or show me where I missed them.
3 years, 7 months
oVirt cannot start VM after reboot with cluster chipset set to i440fx
by Joris Dobbelsteen
Hi
I’m running an oVirt 4.4.4 installation (with a single node and single cluster). After a recent reboot, I’m trying to start the VMs, which fails with:
VM udc1 is down with error. Exit message: XML error: The device at PCI address 0000:00:02.0 cannot be plugged into the PCI controller with index='0'. It requires a controller that accepts a pcie-root-port..
Dmesg shows:
Mar 15 17:07:48 ovirt1 journal[2245]: internal error: a PCI slot is needed to connect a PCI controller model='pcie-root-port', but none is available, and it cannot be automatically added
The issue seems to be with the “custom chipset/firmware” set to i440fx rather than Q35 for the cluster. The VM is configured to use the cluster default.
The rationale for running i440fx is that Debian doesn’t actually support Q35 with cloud-init, leading to an uninitialized VM. Since I’m running mainly Debian/Ubuntu VMs, I need i440fx set instead.
The error is also visible in oVirt Manager, when checking out the “Vm Devices” for the failed VM (all of them).
The workaround for starting the VM seems to be to do:
Edit -> System -> Advanced Parameters -> Custom Chipset/Firmware Type -> Change to Q35 and back to cluster default (being i440). This resets the list to remove the need for a pcie-root-port.
It gives the impression that there is an initialization error in the oVirt engine, where the Q35 devices are loaded incorrectly, or perhaps not fetched as the right cluster default.
I’ve not tested what happens if i440fx is configured at the VM level rather than the cluster level.
Does anyone know how to resolve this for future versions?
Or what can be done to make this work after a reboot?
Best regards,
Joris
3 years, 7 months
VMConsole: Certificate invalid: name is not a listed principal
by Stefan Seifried
Hi,
I recently tried out the vm console feature on an ovirt eval environment running 4.4.4.7-1.el8:
1.) pasted the pub-key of my local user into the web interface (User->Options)
2.) connected via ssh like so
ssh -p 2222 sevm -l ovirt-vmconsole
3.) got the list of all the VMs :)
4.) chose a VM with a virtio serial console enabled
5.) Certificate invalid :(
---
Please, enter the id of the Serial Console you want to connect to.
To disconnect from a Serial Console, enter the sequence: <Enter><~><.>
SELECT> 24
Certificate invalid: name is not a listed principal
Host key verification failed.
Connection to sevm closed.
---
I guess something's wrong in "/etc/pki/ovirt-vmconsole". Is there any additional information I can get out of the logs or the console proxy (e.g. the "name" which is not a listed principal)?
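One way to see which principals are actually on a certificate is to dump it with ssh-keygen; the sketch below builds a throwaway cert just to show the output (the principal name is illustrative):

```shell
# Throwaway CA and user key; the principal name below is illustrative.
tmp=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -f "$tmp/ca"
ssh-keygen -q -t ed25519 -N '' -f "$tmp/user"
# Sign the user key, embedding one principal:
ssh-keygen -q -s "$tmp/ca" -I demo-id -n ovirt-vmconsole-user "$tmp/user.pub"
# -L dumps the certificate; the Principals section is what gets matched,
# and a mismatch is what produces "name is not a listed principal":
ssh-keygen -L -f "$tmp/user-cert.pub" | grep -A1 'Principals'
```

The certificates under /etc/pki/ovirt-vmconsole should be inspectable the same way.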
To be honest I never really worked with SSH certificates, so I fear I will do something stupid if I try to fix this head on.
So any advice or help on this issue is appreciated.
Thanks,
Stefan
3 years, 7 months
Re: Power failure makes cluster and hosted engine unusable
by Thomas Hoberg
I am glad you got it done!
I find that oVirt resembles an adventure game (with all its huge emotional rewards, once you prevail) more than a streamlined machine that just works every time you push a button.
Streamlined machines are boring, sure, but they are really what I am looking for when the mission is to run infrastructure.
3 years, 7 months
Re: Power failure makes cluster and hosted engine unusable
by Roman Bednar
Hi Seann,
On Mon, Mar 29, 2021 at 8:31 PM Seann G. Clark via Users <users(a)ovirt.org>
wrote:
> All,
>
>
>
> After a power failure, and generator failure I lost my cluster, and the
> Hosted engine refused to restart after power was restored. I would expect,
> once storage comes up that the hosted engine comes back online without too
> much of a fight. In practice because the SPM went down as well, there is no
> (clearly documented) way to clear any of the stale locks, and no way to
> recover both the hosted engine and the cluster.
>
Could you provide more details/logs on the storage not coming up? More
information about the current locks would also be great; is there any procedure
you tried that did not work for cleaning them up?
I have spent the last 12 hours trying to get a functional hosted-engine
> back online, on a new node and each attempt hits a new error, from the
> installer not understanding that 16384mb of dedicated VM memory out of
> 192GB free on the host is indeed bigger than 4096MB, to ansible dying on
> an error like this “Error while executing action: Cannot add Storage
> Connection. Storage connection already exists.”
>
> The memory error referenced above shows up as:
>
> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg":
> "Available memory ( {'failed': False, 'changed': False, 'ansible_facts':
> {u'max_mem': u'180746'}}MB ) is less then the minimal requirement (4096MB).
> Be aware that 512MB is reserved for the host and cannot be allocated to the
> engine VM."}
>
> That is what I typically get when I try the steps outlined in the KB
> “CHAPTER 7. RECOVERING A SELF-HOSTED ENGINE FROM AN EXISTING BACKUP” from
> the RH Customer portal. I have tried this numerous ways, and the cluster
> still remains in a bad state, with the hosted engine being 100% inoperable.
>
This could be a bug in the ansible role. Did that happen during
"hosted-engine --deploy" or another part of the recovery guide? Please provide
logs here as well; it seems like a completely separate issue, though.
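For reference, the recovery guide you mention restores the backup as part of the deployment itself rather than as a separate step; a sketch of that flow (the backup file name is a placeholder):

```shell
# Run on the host that should (re)deploy the hosted engine:
hosted-engine --deploy --restore-from-file=engine-backup.tar
```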
>
> What I do have are the two host that are part of the cluster and can host
> the engine, and backups of the original hosted engine, both disk and
> engine-backup generated. I am not sure what I can do next, to recover this
> cluster, any suggestions would be apricated.
>
>
>
> Regards,
>
> Seann
>
>
>
>
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/JLDIFTKYDPQ...
>
3 years, 7 months