February 2019 - Users - oVirt List Archives

Version 4.3.0 installed, but shows 4.2.8
by arturo＠somosespeciales.com 05 Feb '19

Hi: I just did a fresh installation from the latest oVirt release (ovirt-node-ng-installer-4.3.0-2019020409.el7.iso). After the installation finished and I logged in to Cockpit, it shows the version as 4.2.8. Here is a screenshot: https://drive.google.com/uc?export=download&id=1muIsMPeiMYi-FN4Y4NTxSfQ-EeV… Any ideas? Thanks in advance. Greetings

oVirt 4.3 node-ng host kernel options for nvidia vGPU
by Edward Berger 05 Feb '19

Hi, One of our projects wants to try offering VMs with nvidia vGPU. My co-worker had some problems before, so I thought I'd try the latest 4.3 ovirt-node-ng. In the "Edit Host" -> Kernel dialog I see two promising checkbox options: "Hostdev Passthrough & SR-IOV" (which adds intel_iommu=on to the kernel line) and "Blacklist Nouveau" (which adds rdblacklist=nouveau to the kernel line). However, they seem to act as mutually exclusive options: when both are selected, the kernel command line box is outlined in red and I can't continue. Am I wrong to want both options?

Host compatibility issue after upgrade from 4.2.8 to 4.3.0
by ronjero＠gmail.com 05 Feb '19

I have a three-node (hyperconverged) cluster. All hosts are identical hardware-wise; however, after the upgrade one of the three is getting kicked out of the cluster with the following error:

Host is compatible with versions (3.6,4.0,4.1,4.2) and cannot join Cluster...

The hosts have SandyBridge processors but do have SSBD:

CPU Type: Intel SandyBridge IBRS SSBD Family.

virsh -r capabilities | grep ssbd
<feature name='ssbd'/>

The only other thing I should mention is that I *may* have changed the cluster compatibility level to 4.3 while this particular node was in maintenance mode (I don't know if that has any influence on this issue). Any help getting this node back into the cluster would be greatly appreciated.

Ron.
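To compare all three hosts rather than grepping one at a time, the same check can be scripted. The following is a minimal Python sketch, assuming the capabilities XML lists host CPU flags as <feature name='...'/> elements under host/cpu (as the grep output above suggests). The sample XML is a shortened, hypothetical stand-in for real output, and the helper names are made up for illustration.

```python
import subprocess
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for `virsh -r capabilities` output.
# Only the <feature name='ssbd'/> element is taken from the post.
SAMPLE_CAPABILITIES = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <feature name='ssbd'/>
    </cpu>
  </host>
</capabilities>
"""


def read_capabilities():
    """Return the capabilities XML of the local host (read-only connection)."""
    result = subprocess.run(
        ["virsh", "-r", "capabilities"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def host_cpu_features(capabilities_xml):
    """Collect the names of all <feature> elements under host/cpu."""
    root = ET.fromstring(capabilities_xml)
    return {f.get("name") for f in root.findall("./host/cpu/feature")}


if __name__ == "__main__":
    features = host_cpu_features(SAMPLE_CAPABILITIES)
    print("ssbd present:", "ssbd" in features)
```

On a real host, pass read_capabilities() instead of the sample string and diff the resulting sets between the working nodes and the rejected one.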

ERROR running your engine inside of the hosted-engine VM and are not in "Global Maintenance" mode
by mhumaj＠gmail.com 05 Feb '19

Hi, We upgraded oVirt to 4.3. After the upgrade we wanted to run engine-setup, but we do not know how to put this machine, which is simply another virtual machine running ovirt-engine, into global maintenance. The hosted engine is running on the hosts.

During execution engine service will be stopped (OK, Cancel) [OK]:
[ ERROR ] It seems that you are running your engine inside of the hosted-engine VM and are not in "Global Maintenance" mode. In that case you should put the system into the "Global Maintenance" mode before running engine-setup, or the hosted-engine HA agent might kill the machine, which might corrupt your data.
[ ERROR ] Failed to execute stage 'Setup validation': Hosted Engine setup detected, but Global Maintenance is not set.
[ INFO ] Stage: Clean up
Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20190205121802-l7llrw.log
[ INFO ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20190205121855-setup.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Execution of setup failed

From the hosted-engine nodes:

--== Host 2 status ==--
Host ID : 2
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}

Can anyone please tell me how to set global maintenance on the virtual machine where ovirt-engine runs, rather than on the hosts? Even if I put the hosts into global maintenance, I am unable to run engine-setup on the VM with ovirt-engine. Thanks
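When collecting this status from several hosts, it can help to pull the "Engine status" JSON out of the text programmatically. The following is a minimal Python sketch, assuming the "Host ID : N" and "Engine status : {...}" line layout shown in the excerpt above (which looks like output of hosted-engine --vm-status). The function name is hypothetical, and real output contains more fields than are handled here.

```python
import json
import re

# Excerpt copied from the post above.
SAMPLE_STATUS = """\
--== Host 2 status ==--
Host ID : 2
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
"""

HOST_ID = re.compile(r"^Host ID\s*:\s*(\d+)\s*$")
ENGINE_STATUS = re.compile(r"^Engine status\s*:\s*(\{.*\})\s*$")


def engine_status_by_host(text):
    """Map each host ID to the decoded 'Engine status' JSON that follows it."""
    statuses = {}
    current_host = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        host_match = HOST_ID.match(line)
        if host_match:
            current_host = int(host_match.group(1))
            continue
        status_match = ENGINE_STATUS.match(line)
        if status_match and current_host is not None:
            statuses[current_host] = json.loads(status_match.group(1))
    return statuses


if __name__ == "__main__":
    for host, status in engine_status_by_host(SAMPLE_STATUS).items():
        print(host, status["vm"], status["health"], status["reason"])
```

This only summarizes what each host reports; it does not change the maintenance state.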

[4.3.0] VNC Virt-viewer console not opening
by Nicolas Ecarnot 05 Feb '19

Hello,

First, congratulations to all of you who worked on this 4.3.0 release, and obviously thank you.

Today, I upgraded 4 oVirt setups (4 DCs) from 4.2.7 to 4.3.0. It went well on all 4 DCs. But on one of them, when I try to open a console, I see it open in a flash (it opens and closes immediately).

I'm using Firefox 64.0 on Ubuntu 18.10, and all my VMs are set up like this:
- video type : QXL
- Gfx protocol : VNC
- VNC Kbd layout : fr
and I'm using virt-viewer.

On the problematic DC, all the VMs show the same issue. When I use Spice instead of VNC, it works nicely. When I try noVNC, the additional tab opens and shows "Unsupported security types: 19".

I tried to track down this issue with the Firefox dev console, but it's beyond my understanding. Chromium shows the same blinking open/close.

I'd be glad to learn how to provide additional debug messages, but /var/log/ovirt-engine/engine.log does not give any useful hint:

2019-02-04 16:57:04,150+01 INFO [org.ovirt.engine.core.bll.SetVmTicketCommand] (default task-24) [1fb01d42] Running command: SetVmTicketCommand internal: false. Entities affected : ID: 0c3e02b3-7fec-4bb1-b3d6-2e6c228e7278 Type: VMAction group CONNECT_TO_VM with role type USER
2019-02-04 16:57:04,155+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SetVmTicketVDSCommand] (default task-24) [1fb01d42] START, SetVmTicketVDSCommand(HostName = hv01.prd.sdis38.fr, SetVmTicketVDSCommandParameters:{hostId=' 687c1c01-a5e1-449c-89d2-9713ccfc2487', vmId='0c3e02b3-7fec-4bb1-b3d6-2e6c228e7278', protocol='VNC', ticket='IivrpGHx5zSw', validTime='120', userName='admin', userId='4a340386-851a-11e8-863d-3417ebeef1af', disconnectAction='NONE'} ), log id: 2a897f30
2019-02-04 16:57:04,188+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SetVmTicketVDSCommand] (default task-24) [1fb01d42] FINISH, SetVmTicketVDSCommand, return: , log id: 2a897f30
2019-02-04 16:57:04,211+01 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-24) [1fb01d42] EVENT_ID: VM_SET_TICKET(164), User admin@internal-authz initiated console session for VM ad02.ct at.sdis38.fr

What could I give to help you help me?

-- 
Nicolas ECARNOT

Upgrade guide for node from 4.2->4.3
by Juhani Rautiainen 05 Feb '19

Hi! Thanks for the new release. I managed to upgrade the engine from 4.2 to 4.3 with the old upgrade instructions, but I'm having problems with 4.2 node upgrades. Adding repos with ovirt-release43 doesn't allow the node upgrade: there are lots of missing dependencies. The dpdk dependencies can be solved by adding CentOS Extras, but which repo should be used for the openscap, openscap-utils and scap-security-guide packages? -Juhani

Ovirt cluster unstable; gluster to blame (again)
by Jim Kusznir 04 Feb '19

Hi all:

Once again my production ovirt cluster is collapsing in on itself. My servers are intermittently unavailable or degrading, and customers are noticing and calling in. This seems to be yet another gluster failure that I haven't been able to pin down.

I posted about this a while ago, but didn't get anywhere (no replies that I found). The problem started out as a glusterfsd process consuming large amounts of RAM (up to the point where RAM and swap were exhausted and the kernel OOM killer killed off the glusterfsd process). For reasons not clear to me at this time, that resulted in any VMs running on that host and that gluster volume being paused with an I/O error (the glusterfs process is usually unharmed; why it didn't continue I/O with the other servers is confusing to me).

I have 3 servers and a total of 4 gluster volumes (engine, iso, data, and data-hdd). The first 3 are replica 2+arb; the 4th (data-hdd) is replica 3. The first 3 are backed by an LVM partition (some thin provisioned) on an SSD; the 4th is on a Seagate hybrid disk (HDD + some internal flash for acceleration). data-hdd is the only thing on the disk. The servers are Dell R610s with the PERC/6i RAID card, with the disks individually passed through to the OS (no RAID enabled).

The above RAM usage issue came from the data-hdd volume. Yesterday, I caught one of the glusterfsd processes at high RAM usage before the OOM killer had to run. I was able to migrate the VMs off the machine and, for good measure, reboot the entire machine (after taking this opportunity to run the software updates that ovirt said were pending). Upon booting back up, the necessary volume healing began. However, this time the healing caused all three servers to go to very, very high load averages (I saw just under 200 on one server; typically they've been 40-70), with top reporting IO wait at 7-20%. The network for this volume is a dedicated gigabit network. According to bwm-ng, the network bandwidth would initially hit 50MB/s (yes, bytes), but tailed off to mostly kB/s for a while. All machines' load averages were still 40+, and gluster volume heal data-hdd info reported 5 items needing healing. Servers were intermittently experiencing IO issues, even on the 3 gluster volumes that appeared largely unaffected. Even OS activities on the hosts themselves (logging in, running commands) would often be very delayed. The ovirt engine was seemingly randomly throwing engine down / engine up / engine failed notifications. Responsiveness on ANY VM was horrific most of the time, with random VMs being inaccessible.

I let the gluster heal run overnight. By morning, there were still 5 items needing healing, all three servers were still experiencing high load, and the servers were still largely unstable.

I've noticed that all of my ovirt outages (and I've had a lot, way more than is acceptable for a production cluster) have come from gluster. I still have 3 VMs whose hard disk images were corrupted by my last gluster crash and that I haven't had time to repair / rebuild yet (I believe this crash was caused by the OOM issue previously mentioned, but I didn't know it at the time).

Is gluster really ready for production yet? It seems so unstable to me.... I'm looking at replacing gluster with a dedicated NFS server, likely FreeNAS. Any suggestions? What is the "right" way to do production storage on this (3 node cluster)? Can I get this gluster volume stable enough to get my VMs to run reliably again until I can deploy another storage solution?

--Jim

ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.
by melnyksergii＠gmail.com 04 Feb '19

Dear all, I have an error in oVirt 4.2.7. In the dashboard I see:

ETL service aggregation to hourly tables has encountered an error. Please consult the service log for more details.

In the log on the ovirt engine server:

2019-01-14 15:59:59|rwL6AB|euUXph|wfcjQ7|OVIRT_ENGINE_DWH|HourlyTimeKeepingJob|Default|5|tWarn|tWarn_1|2019-01-14 15:59:59| ETL service aggregation to hourly tables has encountered an error. lastHourAgg value =Mon Jan 14 14:00:00 EET 2019 and runTime = Mon Jan 14 15:59:59 EET 2019 .Please consult the service log for more details.|42

Some sources say the problem is in the PostgreSQL DB, but I don't understand how I can fix it. Thanks
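To see how often the warning fires and how far the aggregation has fallen behind, the two timestamps in the message can be extracted from each log line. The following is a minimal Python sketch, assuming the message wording shown above. The function names are hypothetical, the time zone token (EET) is simply dropped because both timestamps carry the same one, and the sketch only measures the gap without diagnosing the cause.

```python
import re
from datetime import datetime

# Log line copied from the post above.
LOG_LINE = (
    "2019-01-14 15:59:59|rwL6AB|euUXph|wfcjQ7|OVIRT_ENGINE_DWH|"
    "HourlyTimeKeepingJob|Default|5|tWarn|tWarn_1|2019-01-14 15:59:59| "
    "ETL service aggregation to hourly tables has encountered an error. "
    "lastHourAgg value =Mon Jan 14 14:00:00 EET 2019 and runTime = "
    "Mon Jan 14 15:59:59 EET 2019 .Please consult the service log for "
    "more details.|42"
)

STAMPS = re.compile(
    r"lastHourAgg value =\s*(?P<last>.+?) and runTime =\s*(?P<run>.+?)\s*\.Please"
)


def parse_stamp(stamp):
    """Parse 'Mon Jan 14 14:00:00 EET 2019', ignoring weekday and zone."""
    _weekday, month, day, clock, _zone, year = stamp.split()
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


def aggregation_lag(line):
    """Return runTime minus lastHourAgg, or None if the line does not match."""
    match = STAMPS.search(line)
    if match is None:
        return None
    return parse_stamp(match["run"]) - parse_stamp(match["last"])


if __name__ == "__main__":
    print("lag:", aggregation_lag(LOG_LINE))
```

Running this over the whole DWH log would show whether the gap stays constant or keeps growing over time.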

One disk in illegal state after deleting snapshot with several disks
by Florian Schmid 04 Feb '19

Hi,

I'm using oVirt 4.2.5. I know it is not the newest anymore, but our cluster is quite big and I will upgrade to 4.2.8 as soon as possible.

I have a VM with several disks, one with virtio (boot device) and 3 other disks with virtio-scsi:

log              8190c797-0ed8-421f-85c7-cc1f540408f8   1 GiB
root             5edab51c-9113-466c-bd27-e73d4bfb29c4  10 GiB
tmp              11d74762-6053-4347-bdf2-4838dc2ea6f0   1 GiB
web_web-content  bb5b1881-d40f-4ad1-a8c8-8ee594b3fe8a  20 GiB

The snapshots were quite small, because not much is changing there. All disks are on an NFS v3 share running on a NetApp cluster.

Some IDs:
VM ID: bc25c5c9-353b-45ba-b0d5-5dbba41e9c5f
affected disk ID: 6cbd2f85-8335-416f-a208-ef60ecd839a4
Snapshot ID: c8103ae8-3432-4b69-8b91-790cdc37a2da
Snapshot disk ID: 2564b125-857e-41fa-b187-2832df277ccf
Task ID: 2a60efb5-1a11-49ac-a7f0-406faac219d6
Storage domain ID: 14794a3e-16fc-4dd3-a867-10507acfe293

After I triggered the snapshot delete task (2a60efb5-1a11-49ac-a7f0-406faac219d6), the deletion ran for about one hour. I thought it was hanging, so I restarted the engine process on the self-hosted engine... After that, the snapshot was still in the locked state, so I deleted the lock:

/usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh -t snapshot c8103ae8-3432-4b69-8b91-790cdc37a2da
##########################################
CAUTION, this operation may lead to data corruption and should be used with care. Please contact support prior to running this command
##########################################
Are you sure you want to proceed? [y/n] y
select fn_db_unlock_snapshot('c8103ae8-3432-4b69-8b91-790cdc37a2da');
INSERT 0 1
unlock snapshot c8103ae8-3432-4b69-8b91-790cdc37a2da completed successfully.

When I tried to delete the snapshot again, the engine gave the error that the disk is in status Illegal.

The snapshot file is still there for the disk log:

-rw-rw----. 1 vdsm kvm 282M Feb 1 13:33 2564b125-857e-41fa-b187-2832df277ccf
-rw-rw----. 1 vdsm kvm 1.0M Jan 22 03:22 2564b125-857e-41fa-b187-2832df277ccf.lease
-rw-r--r--. 1 vdsm kvm 267 Feb 1 14:05 2564b125-857e-41fa-b187-2832df277ccf.meta
-rw-rw----. 1 vdsm kvm 1.0G Feb 1 11:53 6cbd2f85-8335-416f-a208-ef60ecd839a4
-rw-rw----. 1 vdsm kvm 1.0M Jan 10 12:07 6cbd2f85-8335-416f-a208-ef60ecd839a4.lease
-rw-r--r--. 1 vdsm kvm 272 Jan 22 03:22 6cbd2f85-8335-416f-a208-ef60ecd839a4.meta

All other snapshots have been merged successfully. I unmounted the disk inside the VM after I saw that the snapshot disk is still in use. That's why the date does not change anymore. The strange thing is that the merge seems to have worked for a short time, because the modification time of the underlying disk has also changed...

In the database, I have this data about the VM and its snapshot:

engine=# select snapshot_id,snapshot_type,status,description from snapshots where vm_id='bc25c5c9-353b-45ba-b0d5-5dbba41e9c5f';
             snapshot_id              | snapshot_type | status | description
--------------------------------------+---------------+--------+-------------
 f596ba1c-4a6e-4372-9df4-c8e870c55fea | ACTIVE        | OK     | Active VM
 c8103ae8-3432-4b69-8b91-790cdc37a2da | REGULAR       | OK     | cab-3449

engine=# select image_guid,parentid,imagestatus,vm_snapshot_id,volume_type,volume_format,active from images where image_group_id='8190c797-0ed8-421f-85c7-cc1f540408f8';
              image_guid              |               parentid               | imagestatus |            vm_snapshot_id            | volume_type | volume_format | active
--------------------------------------+--------------------------------------+-------------+--------------------------------------+-------------+---------------+--------
 2564b125-857e-41fa-b187-2832df277ccf | 6cbd2f85-8335-416f-a208-ef60ecd839a4 |           1 | f596ba1c-4a6e-4372-9df4-c8e870c55fea |           2 |             4 | t
 6cbd2f85-8335-416f-a208-ef60ecd839a4 | 00000000-0000-0000-0000-000000000000 |           4 | c8103ae8-3432-4b69-8b91-790cdc37a2da |           2 |             5 | f

vdsm-tool dump-volume-chains 14794a3e-16fc-4dd3-a867-10507acfe293:

image: 8190c797-0ed8-421f-85c7-cc1f540408f8
  - 6cbd2f85-8335-416f-a208-ef60ecd839a4 status: OK, voltype: INTERNAL, format: RAW, legality: LEGAL, type: SPARSE
  - 2564b125-857e-41fa-b187-2832df277ccf status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE

@bzlotnik, it would be great if you could help me get the disk back without stopping or starting the VM. I'm really afraid of deleting snapshots now... I will send you the vdsm log from the host running the VM and from the SPM, and the engine.log.

Thank you very much!

BR Florian Schmid
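The data above suggests that the engine database and the storage disagree about one volume. The following is a minimal Python sketch that makes that comparison explicit, using only the values quoted in the post. The meaning of the numeric imagestatus codes (1 = OK, 2 = LOCKED, 4 = ILLEGAL) is an assumption here, although it matches the "Illegal" error the engine reported, and the function and variable names are hypothetical.

```python
# Assumed meaning of the engine's numeric imagestatus column.
IMAGE_STATUS = {0: "UNASSIGNED", 1: "OK", 2: "LOCKED", 4: "ILLEGAL"}

# (image_guid, imagestatus) rows copied from the images query above.
ENGINE_ROWS = [
    ("2564b125-857e-41fa-b187-2832df277ccf", 1),
    ("6cbd2f85-8335-416f-a208-ef60ecd839a4", 4),
]

# Legality per volume as reported by vdsm-tool dump-volume-chains above.
VDSM_LEGALITY = {
    "6cbd2f85-8335-416f-a208-ef60ecd839a4": "LEGAL",
    "2564b125-857e-41fa-b187-2832df277ccf": "LEGAL",
}


def find_mismatches(engine_rows, vdsm_legality):
    """List volumes where the engine DB and the storage metadata disagree."""
    mismatches = []
    for guid, code in engine_rows:
        engine_state = IMAGE_STATUS.get(code, f"UNKNOWN({code})")
        storage_state = vdsm_legality.get(guid, "MISSING")
        engine_ok = engine_state == "OK"
        storage_ok = storage_state == "LEGAL"
        if engine_ok != storage_ok:
            mismatches.append((guid, engine_state, storage_state))
    return mismatches


if __name__ == "__main__":
    for guid, engine_state, storage_state in find_mismatches(ENGINE_ROWS, VDSM_LEGALITY):
        print(f"{guid}: engine={engine_state}, storage={storage_state}")
```

With the values from the post, only the base volume 6cbd2f85-8335-416f-a208-ef60ecd839a4 is flagged: the engine marks it illegal while vdsm reports it as legal. The sketch only highlights the inconsistency; it says nothing about how to repair it safely.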

Wrong CPU performance report
by Hetz Ben Hamo 04 Feb '19

Hi, I'm running oVirt 4.2.7.1. I installed Windows 10 Pro as a guest, along with QXL and all the drivers, as well as the QEMU guest agent. While Windows reports a CPU usage of about 2-4% when idle, oVirt reports 24-27% CPU usage. Is this a bug? Should I report it? Thanks
