Random host disconnects
by anton.louw@voxtelecom.co.za
Hi All,
I have a strange issue in my oVirt environment. I currently have a standalone manager running in VMware. In my oVirt environment I have two Data Centers. The manager currently sits on the same subnet as DC1. Randomly, hosts in DC2 will go “Not Responding” and then, a couple of seconds later, activate again.
The strange thing is that when the manager was sitting on the same subnet as DC2, hosts in DC1 would randomly go “Not Responding”.
I have tried going through the logs, but I cannot see anything out of the ordinary that explains why the hosts would drop the connection. I have attached the engine.log for anybody who would like to do a spot check.
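In case it helps, this is roughly the kind of spot check I can run and share (default log locations; the grep patterns are only a starting point, not exact message texts):
  # On the engine VM: events around a "Not Responding" transition
  grep -iE 'not responding|heartbeat' /var/log/ovirt-engine/engine.log | tail -n 50
  # On an affected host in DC2: did vdsm notice a gap at the same time?
  grep -iE 'heartbeat|connection reset' /var/log/vdsm/vdsm.log | tail -n 50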
Thanks
4 years, 2 months
Disconnected Server has closed the connection.
by info@worldhostess.com
It seems that the installation is all done, but I have a problem: it takes very long to open the web pages, and it disconnects all the time, so it is impossible to do anything.
I can ping the hostname, as I set up a sub-domain for it. To be honest, I am new to this and it took me days to get to this point. I think there are some issues with my network settings.
If there are any oVirt experts who can check my installation and give me advice on how to improve it, it would be greatly appreciated.
I followed "Installing oVirt as a self-hosted engine using the Cockpit web interface".
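These are the basic checks I could run and report back, if useful (engine.example.com and the engine IP below are placeholders for my real FQDN and address):
  hosted-engine --vm-status            # on the host: is the engine VM up and reported healthy?
  dig +short engine.example.com        # forward DNS must return the engine VM's IP
  dig +short -x <engine-IP>            # reverse DNS should map back to the same name
  ping -c 3 engine.example.com         # basic reachability and latency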
4 years, 2 months
Gluster quorum issue on 3-node HCI with 5 extra nodes as compute and storage nodes
by thomas@hoberg.net
Yes, I've also posted this on the Gluster Slack. But I am using Gluster mostly because it's part of oVirt HCI, so don't just send me away, please!
Problem: GlusterD refusing to start due to quorum issues for volumes where it isn’t contributing any brick
(I've had this before on a different farm, but there it was transitory. Now I have it in a more observable manner, which is why I'm opening a new topic.)
In a test farm with recycled servers, I started running Gluster via oVirt three-node HCI, because I originally got three machines.
They were set up as group A in a 2:1 (replica:arbiter) oVirt HCI setup with 'engine', 'vmstore' and 'data' volumes, one brick on each node.
I then got another five machines with hardware specs rather different from group A, so I set those up as group B, mostly to act as compute nodes but also to provide extra storage, mostly to be used externally as GlusterFS shares. It took a bit of fiddling with Ansible, but I got these five nodes to serve two more Gluster volumes, 'tape' and 'scratch', using dispersed bricks (4 disperse : 1 redundancy), RAID5 in my mind.
The two groups are in one Gluster cluster, not because they serve bricks to the same volumes, but because oVirt doesn't like nodes to be in different Gluster clusters (or actually, to already be in a Gluster cluster when you add them as a host node). But the two groups provide bricks to distinct volumes; there is no overlap.
After setup, things ran fine for weeks, but now I needed to restart a machine from group B, which has 'tape' and 'scratch' bricks but none from the original oVirt 'engine', 'vmstore' and 'data' volumes in group A. Yet the Gluster daemon refuses to start, citing a loss of quorum for those three volumes, even though it has no bricks in them… which makes no sense to me.
I am afraid the source of the problem is conceptual: I clearly don't fully understand some of Gluster's design assumptions.
And I'm afraid the design assumptions of Gluster and of oVirt (even with HCI) are not as closely related as one might assume from the marketing materials on the oVirt home page.
But most of all I'd like to know: How do I fix this now?
I can't heal 'tape' and 'scratch', which are growing ever more apart while the glusterd on this machine in group B refuses to come online for lack of a quorum on volumes where it is not contributing bricks.
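For reference, these are the checks I'd run next on the group-B node (volume names as above); the commented-out line is the relaxation I would rather avoid:
  gluster peer status                           # how many peers does this node currently see?
  gluster volume status tape
  gluster volume get tape all | grep -i quorum  # which quorum options are actually in effect
  # If cluster-wide server quorum really is the blocker and the risk is acceptable:
  # gluster volume set tape cluster.server-quorum-type none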
4 years, 2 months
What is the purpose of memory deflation in oVirt memory ballooning?
by pub.virtualization@gmail.com
Hi, guys
Why does momd (the ballooning manager in oVirt) explicitly deflate the balloon when the host has plenty of memory?
As far as I know, momd supports memory ballooning via the setMemory API to inflate/deflate the balloon in the guest,
and I've just checked the memory change in the guest after inflating the balloon.
As expected, memory (total, free, available) in the guest was reduced just after inflating the balloon, but it was "automatically" restored to its initial value after a few seconds.
So I'm wondering why an explicit deflation is additionally required even though the memory is restored automatically after a few seconds.
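For context, a minimal way to reproduce the inflate/deflate outside of MOM would be something like this (the domain name guest1 and the sizes are just examples):
  virsh dommemstat guest1                 # "actual" is the current balloon target in KiB
  virsh setmem guest1 2097152 --current   # shrink to 2 GiB: inflate the balloon, the guest sees less memory
  virsh setmem guest1 4194304 --current   # grow back to 4 GiB: deflate, memory is handed back to the guest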
Thanks.
4 years, 2 months
Any ETA for 4.4.2 final?
by Gianluca Cecchi
Hello,
I would like to upgrade a 4.4.0 environment to the latest 4.4.2 when
available.
Any indication of whether there are any show-stoppers after RC5, released on
27 August, or any ETA for further release candidates?
Thanks,
Gianluca
4 years, 2 months
oVirt disk disappeared after importing iSCSI domain
by gantonjo-ovirt@yahoo.com
Hi all oVirt experts.
So, I managed to f.. up a VM today. I had one VM running on an oVirt 4.3 cluster, with its disk located on an iSCSI data domain. I stopped the VM, put the storage domain in maintenance mode, then detached and removed the storage domain from the oVirt 4.3 cluster.
Then I imported the storage domain into our new oVirt 4.4.1 cluster, let the process convert it from V4 to V5 format, and activated the storage domain in the new cluster. On the information page for the imported storage domain I expected to see the "Import VM" and "Import Disk" menus, but these did not appear as they did for other storage domains that I had successfully moved from the old to the new cluster.
Clicking "Scan Disks" did not help either.
So, now I am stuck with a storage domain where the VM's disk is located, but oVirt is not able to see the disk.
What can I do to recover the lost disk from the storage domain? Any CLI commands available in oVirt 4.4.1 would be nice.
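In case it is relevant, I was thinking of querying the REST API for disks the engine has not registered yet, something like the call below (URL, credentials and the storage-domain UUID are placeholders, and I'm not certain this is the right endpoint):
  curl -sk -u admin@internal:PASSWORD \
    "https://engine.example.com/ovirt-engine/api/storagedomains/SD_UUID/disks;unregistered"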
Thanks in advance for your quick and good help.
4 years, 2 months
Multiple GPU Passthrough with NVLink (Invalid I/O region)
by Vinícius Ferrão
Hello, here we go again.
I’m trying to pass through 4x NVIDIA Tesla V100 GPUs (with NVLink) to a single VM, but things aren’t going well: only one GPU is usable in the VM. lspci shows all four GPUs, but three of them are unusable:
08:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
09:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
0a:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
0b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
There are some errors in dmesg regarding a misconfigured BIOS:
[ 27.295972] nvidia: loading out-of-tree module taints kernel.
[ 27.295980] nvidia: module license 'NVIDIA' taints kernel.
[ 27.295981] Disabling lock debugging due to kernel taint
[ 27.304180] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 27.364244] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[ 27.579261] nvidia 0000:09:00.0: enabling device (0000 -> 0002)
[ 27.579560] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:09:00.0)
[ 27.579560] NVRM: The system BIOS may have misconfigured your GPU.
[ 27.579566] nvidia: probe of 0000:09:00.0 failed with error -1
[ 27.580727] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:0a:00.0)
[ 27.580729] NVRM: The system BIOS may have misconfigured your GPU.
[ 27.580734] nvidia: probe of 0000:0a:00.0 failed with error -1
[ 27.581299] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:0b:00.0)
[ 27.581300] NVRM: The system BIOS may have misconfigured your GPU.
[ 27.581305] nvidia: probe of 0000:0b:00.0 failed with error -1
[ 27.581333] NVRM: The NVIDIA probe routine failed for 3 device(s).
[ 27.581334] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 450.51.06 Sun Jul 19 20:02:54 UTC 2020
[ 27.649128] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 450.51.06 Sun Jul 19 20:06:42 UTC 2020
The host is Secure Intel Skylake (x86_64). The VM is running with the Q35 chipset and UEFI (pc-q35-rhel8.2.0).
I’ve tried changing the I/O mapping options on the host, with both 56TB and 12TB, without success; same results. I didn’t try 512GB since the machine has 768GB of system RAM.
I tried blacklisting nouveau on the host: nothing.
I installed the NVIDIA drivers on the host: nothing.
On the host I can use all four V100s, but inside a single VM it’s impossible.
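In case it helps with the diagnosis, a way to compare the BAR layout on the host versus inside the VM would be something like this (PCI addresses taken from the listings above; adjust as needed):
  # On the host: each SXM2 V100 should expose a large 64-bit prefetchable BAR
  lspci -vvv -d 10de: | grep -iE '^[0-9a-f]|region'
  # Inside the VM: the failing devices show empty regions, matching the NVRM messages above
  lspci -vvv -s 09:00.0 | grep -i region
  dmesg | grep -iE 'BAR|MMIO' | tail -n 30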
Any suggestions?
4 years, 2 months
Testing oVirt 4.4.1 nested KVM on Skylake-Client (Core i5) does not work
by wodel youchi
Hi,
I've been using my Core i5 6500 (Skylake-Client) for some time now to test
oVirt on my machine.
However, this is no longer working.
I am using Fedora 32 as my base system with nested KVM enabled. When I try
to install oVirt 4.4 as a single-node HCI, I get an error in the last phase,
which consists of copying the VM Manager to the engine volume and booting it.
It is the boot that fails, with an error about the CPU:
*the CPU is incompatible with host CPU: Host CPU does not provide required
features: mpx*
*This is the CPU part from virsh domcapabilities on my physical machine*
<cpu>
<mode name='host-passthrough' supported='yes'/>
<mode name='host-model' supported='yes'>
*<model fallback='forbid'>Skylake-Client-IBRS</model> *
<vendor>Intel</vendor>
<feature policy='require' name='ss'/>
<feature policy='require' name='vmx'/>
<feature policy='require' name='pdcm'/>
<feature policy='require' name='hypervisor'/>
<feature policy='require' name='tsc_adjust'/>
<feature policy='require' name='clflushopt'/>
<feature policy='require' name='umip'/>
<feature policy='require' name='md-clear'/>
<feature policy='require' name='stibp'/>
<feature policy='require' name='arch-capabilities'/>
<feature policy='require' name='ssbd'/>
<feature policy='require' name='xsaves'/>
<feature policy='require' name='pdpe1gb'/>
<feature policy='require' name='invtsc'/>
<feature policy='require' name='ibpb'/>
<feature policy='require' name='amd-ssbd'/>
<feature policy='require' name='skip-l1dfl-vmentry'/>
</mode>
<mode name='custom' supported='yes'>
<model usable='yes'>qemu64</model>
<model usable='yes'>qemu32</model>
<model usable='no'>phenom</model>
<model usable='yes'>pentium3</model>
<model usable='yes'>pentium2</model>
<model usable='yes'>pentium</model>
<model usable='yes'>n270</model>
<model usable='yes'>kvm64</model>
<model usable='yes'>kvm32</model>
<model usable='yes'>coreduo</model>
<model usable='yes'>core2duo</model>
<model usable='no'>athlon</model>
<model usable='yes'>Westmere-IBRS</model>
<model usable='yes'>Westmere</model>
<model usable='no'>Skylake-Server-IBRS</model>
<model usable='no'>Skylake-Server</model>
<model usable='yes'>Skylake-Client-IBRS</model>
<model usable='yes'>Skylake-Client</model>
<model usable='yes'>SandyBridge-IBRS</model>
<model usable='yes'>SandyBridge</model>
<model usable='yes'>Penryn</model>
<model usable='no'>Opteron_G5</model>
<model usable='no'>Opteron_G4</model>
<model usable='no'>Opteron_G3</model>
<model usable='yes'>Opteron_G2</model>
<model usable='yes'>Opteron_G1</model>
<model usable='yes'>Nehalem-IBRS</model>
<model usable='yes'>Nehalem</model>
<model usable='yes'>IvyBridge-IBRS</model>
<model usable='yes'>IvyBridge</model>
<model usable='no'>Icelake-Server</model>
<model usable='no'>Icelake-Client</model>
<model usable='yes'>Haswell-noTSX-IBRS</model>
<model usable='yes'>Haswell-noTSX</model>
<model usable='yes'>Haswell-IBRS</model>
<model usable='yes'>Haswell</model>
<model usable='no'>EPYC-IBPB</model>
<model usable='no'>EPYC</model>
<model usable='no'>Dhyana</model>
<model usable='yes'>Conroe</model>
<model usable='no'>Cascadelake-Server</model>
<model usable='yes'>Broadwell-noTSX-IBRS</model>
<model usable='yes'>Broadwell-noTSX</model>
<model usable='yes'>Broadwell-IBRS</model>
<model usable='yes'>Broadwell</model>
<model usable='yes'>486</model>
</mode>
</cpu>
*Here is the lscpu of my physical machine*
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Model name: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
Stepping: 3
CPU MHz: 954.588
CPU max MHz: 3600.0000
CPU min MHz: 800.0000
BogoMIPS: 6399.96
Virtualization: VT-x
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 1 MiB
L3 cache: 6 MiB
NUMA node0 CPU(s): 0-3
Vulnerability Itlb multihit: KVM: Mitigation: Split huge pages
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional
cache flushes, SMT disabled
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT
disabled
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass
disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and
__user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full generic retpoline, IBPB
conditional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Srbds: Vulnerable: No microcode
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT
disabled
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch
cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi
flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
erms invpcid rtm *mpx* rdseed adx smap clflushopt intel_pt xsaveopt xsavec
xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
md_clear flush_l1d
*Here is the CPU part from virsh dumpxml of my ovirt hypervisor*
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>Skylake-Client-IBRS</model>
<vendor>Intel</vendor>
<feature policy='require' name='ss'/>
<feature policy='require' name='vmx'/>
<feature policy='require' name='pdcm'/>
<feature policy='require' name='hypervisor'/>
<feature policy='require' name='tsc_adjust'/>
<feature policy='require' name='clflushopt'/>
<feature policy='require' name='umip'/>
<feature policy='require' name='md-clear'/>
<feature policy='require' name='stibp'/>
<feature policy='require' name='arch-capabilities'/>
<feature policy='require' name='ssbd'/>
<feature policy='require' name='xsaves'/>
<feature policy='require' name='pdpe1gb'/>
<feature policy='require' name='ibpb'/>
<feature policy='require' name='amd-ssbd'/>
<feature policy='require' name='skip-l1dfl-vmentry'/>
<feature policy='disable' name='mpx'/>
</cpu>
*Here is the lscpu of my oVirt hypervisor*
[root@node1 ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Model name: Intel Core Processor (Skylake, IBRS)
Stepping: 3
CPU MHz: 3191.998
BogoMIPS: 6383.99
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc
rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma
cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault
invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept
vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx
smap clflushopt xsaveopt xsavec xgetbv1 xsaves arat umip md_clear
arch_capabilities
It seems that not all of the flags are presented to the hypervisor, in particular mpx, which causes the error.
Is there a workaround for this?
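One workaround I am considering, on the Fedora 32 host, is to stop defining a custom CPU model for the hypervisor VM and pass the physical CPU through instead, so that mpx becomes visible again (a sketch only; node1 is assumed to be the libvirt name of my hypervisor VM):
  virsh dumpxml node1 | grep -A3 '<cpu'   # currently mode='custom' with mpx disabled
  virsh edit node1                        # replace the <cpu mode='custom' ...> block with <cpu mode='host-passthrough'/>
  # after restarting node1, inside the nested hypervisor:
  grep -c mpx /proc/cpuinfo               # non-zero means the flag is now exposed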
Regards.
4 years, 2 months