Tasks stuck waiting on each other after failed storage migration (yet not visible on SPM)
by David Sekne
Hello,
I'm running oVirt version 4.3.9.4-1.el7.
After a failed live storage migration, a VM got stuck with a snapshot.
Checking the engine logs, I can see that the snapshot removal task is
waiting for the Merge to complete, and vice versa.
2020-05-26 18:34:04,826+02 INFO
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback]
(EE-ManagedThreadFactory-engineScheduled-Thread-70)
[90f428b0-9c4e-4ac0-8de6-1103fc13da9e] Command
'RemoveSnapshotSingleDiskLive' (id: '60ce36c1-bf74-40a9-9fb0-7fcf7eb95f40')
waiting on child command id: 'f7d1de7b-9e87-47ba-9ba0-ee04301ba3b1'
type:'Merge' to complete
2020-05-26 18:34:04,827+02 INFO
[org.ovirt.engine.core.bll.MergeCommandCallback]
(EE-ManagedThreadFactory-engineScheduled-Thread-70)
[90f428b0-9c4e-4ac0-8de6-1103fc13da9e] Waiting on merge command to complete
(jobId = f694590a-1577-4dce-bf0c-3a8d74adf341)
2020-05-26 18:34:04,845+02 INFO
[org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback]
(EE-ManagedThreadFactory-engineScheduled-Thread-70)
[90f428b0-9c4e-4ac0-8de6-1103fc13da9e] Command 'RemoveSnapshot' (id:
'47c9a847-5b4b-4256-9264-a760acde8275') waiting on child command id:
'60ce36c1-bf74-40a9-9fb0-7fcf7eb95f40' type:'RemoveSnapshotSingleDiskLive'
to complete
2020-05-26 18:34:14,277+02 INFO
[org.ovirt.engine.core.vdsbroker.monitoring.VmJobsMonitoring]
(EE-ManagedThreadFactory-engineScheduled-Thread-96) [] VM Job
[f694590a-1577-4dce-bf0c-3a8d74adf341]: In progress (no change)
I cannot see any running tasks on the SPM (vdsm-client Host
getAllTasksInfo). I also cannot find the task ID in any of the other
nodes' logs.
I already tried restarting the Engine (didn't help).
To start with, I'm puzzled as to where this task is queued.
Any ideas on how I could resolve this?
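For reference, my understanding is that a live merge runs on the host where the VM itself is running (not on the SPM), and that the engine tracks the command state in its own database, so these are the checks I have to work with. A rough sketch - the disk target and the DB column names are guesses based on the default layout:

# on the host currently running the VM - the merge should show up as a libvirt block job
virsh -r list --all
virsh -r blockjob <vm-name> vda --info     # repeat per disk; 'vda' is only an example target

# on the engine - async commands are tracked in the 'command_entities' table of the 'engine' DB
# (column names may differ per version; on 4.3 PostgreSQL runs from the rh-postgresql10 SCL,
#  so the psql invocation may also differ)
sudo -u postgres psql engine -c "select command_id, command_type, status, created_at
  from command_entities
  where command_id in ('47c9a847-5b4b-4256-9264-a760acde8275',
                       '60ce36c1-bf74-40a9-9fb0-7fcf7eb95f40',
                       'f7d1de7b-9e87-47ba-9ba0-ee04301ba3b1');"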
Thank you.
Regards,
David
Re: Single instance scaleup.
by Strahil
Hi Leo,
As you do not have a multi-brick distributed volume (each volume has a single brick), you can easily switch to replica 2 arbiter 1 or replica 3 volumes.
You can use the following for adding the bricks:
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Ad...
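In short, for a single-brick volume the conversion is just an 'add-brick' that raises the replica count. Roughly like this - the brick paths for the new hosts are only placeholders, so adjust them to your layout, and run a full heal afterwards:

# engine volume: plain replica 3 across the three hosts (new brick paths are placeholders)
gluster volume add-brick engine replica 3 \
    192.168.80.192:/gluster_bricks/engine/engine \
    192.168.80.193:/gluster_bricks/engine/engine

# data volume: replica 3 with the third brick acting as arbiter (paths are placeholders)
gluster volume add-brick ssd-samsung replica 3 arbiter 1 \
    192.168.80.192:/gluster_bricks/sdc/data \
    192.168.80.193:/gluster_bricks/arbiter/data

# then let the new bricks sync up
gluster volume heal engine full
gluster volume heal ssd-samsung full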
Best Regards,
Strahil Nikolov

On May 26, 2019 10:54, Leo David <leoalex(a)gmail.com> wrote:
>
> Hi Stahil,
> Thank you so much for your input!
>
> gluster volume info
>
>
> Volume Name: engine
> Type: Distribute
> Volume ID: d7449fc2-cc35-4f80-a776-68e4a3dbd7e1
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.80.191:/gluster_bricks/engine/engine
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard: on
> performance.low-prio-threads: 32
> performance.strict-o-direct: off
> network.remote-dio: off
> network.ping-timeout: 30
> user.cifs: off
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> cluster.eager-lock: enable
> Volume Name: ssd-samsung
> Type: Distribute
> Volume ID: 76576cc6-220b-4651-952d-99846178a19e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.80.191:/gluster_bricks/sdc/data
> Options Reconfigured:
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> user.cifs: off
> network.ping-timeout: 30
> network.remote-dio: off
> performance.strict-o-direct: on
> performance.low-prio-threads: 32
> features.shard: on
> storage.owner-gid: 36
> storage.owner-uid: 36
> transport.address-family: inet
> nfs.disable: on
>
> The other two hosts will be 192.168.80.192/193 - this is a dedicated Gluster network over a 10Gb SFP+ switch.
> - host 2 will have an identical hardware configuration to host 1 (each disk is actually a RAID 0 array)
> - host 3 has:
>   - 1 SSD for the OS
>   - 1 SSD for adding to the engine volume in a full replica 3
>   - 2 SSDs in a RAID 1 array to be added as the arbiter for the data volume (ssd-samsung)
> So the plan is to have "engine" scaled to a full replica 3, and "ssd-samsung" scaled to a replica 3 arbitrated.
>
>
>
>
> On Sun, May 26, 2019 at 10:34 AM Strahil <hunter86_bg(a)yahoo.com> wrote:
>>
>> Hi Leo,
>>
>> Gluster is quite smart, but in order to provide any hints, can you provide the output of 'gluster volume info <glustervol>'?
>> If you have 2 more systems, keep in mind that it is best to mirror the storage on the second replica (2 disks on 1 machine -> 2 disks on the new machine), while for the arbiter this is not necessary.
>>
>> What are your network and NICs? Based on my experience, I can recommend at least 10 Gbit/s interface(s).
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On May 26, 2019 07:52, Leo David <leoalex(a)gmail.com> wrote:
>>>
>>> Hello Everyone,
>>> Can someone help me clarify this?
>>> I have a single-node 4.2.8 installation (only two Gluster storage domains - distributed single-drive volumes). Now I just got two identical servers and I would like to go for a 3-node setup.
>>> Is it possible (after joining the new nodes to the cluster) to expand the existing volumes across the new nodes and change them to replica 3 arbitrated?
>>> If so, could you share with me what the procedure would be?
>>> Thank you very much!
>>>
>>> Leo
>
>
>
> --
> Best regards, Leo David
oVirt 4.4 install fails
by Me
Hi All
Not sure where to start, but here goes.
I'm not totally new to oVirt; I used RHEV 3.x in production for several
years, and it was a breeze to set up.
I'm installing 4.4 onto a host with a local SSD and FC for storage.
Issue 1: having selected the SSD for the install, which already has a
failed 4.4 beta install on it (several times over), I reclaim the space,
and after a few minutes of not being able to enter a root password on the
next install screen, the install fails because it can't delete the data on
the SSD! Yes, I really tried this several times. The workaround: choose
the recovery option to get a prompt, run fdisk /dev/sda, delete the two
partitions created by oVirt, and then I can install. This was the case
with the beta I tried a few weeks ago too.
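For the record, the clean-up from that recovery prompt amounts to something like this (destructive, so double-check the device name; the wipefs line is an untested shortcut for what I otherwise do by hand in fdisk):

# wipe the leftover partition signatures so the installer can reuse the disk
wipefs -a /dev/sda
# or, by hand, as described above:
fdisk /dev/sda      # 'd' twice to delete the two oVirt partitions, then 'w' to write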
Having reconfigured the switch port attached to the host as a dumb 10GbE
port, as the enterprise OS installer still doesn't appear to support
anything more advanced like teaming and VLANs, I have the initial
install on the single SSD and a network connection.
Issue 2: I use Firefox 72.0.2 on Linux x64 to connect to the web interface
at https://hostname:9090, but I can't enter login details because the
input boxes (everything, in fact) are disabled. There is no warning
like "we don't like your choice of browser", but the screen is a
not-very-accessible dark grey on darker grey (a poor choice in what I
thought were more enlightened times), so that may be the case. I have
disabled all security add-ons in Firefox; it makes no difference.
Any suggestions?
M
basic infra and glusterfs sizing question
by Jiří Sléžka
Hello,
I am just curious whether the basic Gluster HCI layout suggested in
Cockpit has some deeper meaning.
Three volumes are suggested:
* engine - this one is clear: it is the volume where the engine VM runs.
When this VM is 51 GB, how small could this volume be? I have 1 TB of SSD
storage and I would like to utilize it as much as possible. Could I create
this volume only as big as the VM itself? Is that safe, for example, for
future upgrades?
* vmstore - it makes sense that this is the space for all the other VMs
running in oVirt. Right?
* data - what purpose does this volume have? Other data such as ISOs?
Direct disks?
Another infra question... or maybe a request for comments.
I have a small number of public IPv4 addresses in my housing facility (but
I have my own switches there, so I can create VLANs and separate internal
traffic).
I can only access these public IPv4 addresses directly. I would like to
conserve these addresses as much as possible, so what is the best
approach in your opinion?
* Install all hosts and the HE with the management network on private addresses
* have a small router (a hardware appliance running, for example, LEDE) which will
use one IPv4 address and do NAT and VPN for accessing my internal VLANs
  + looks like a simple approach to me
  - single point of failure in this router (not really - just in case
oVirt is badly broken and I need to access the internal VLANs to recover it)
* have this router as a virtual appliance inside oVirt (something like
pfSense, for example)
  + no need for a hardware router
  + not sure, but I could probably configure VRRP redundancy
  - still a single point of failure, as in the first case
* any other approach? Could OVN help here somehow?
* Install all hosts and the HE with public addresses :-)
  + direct access to all hosts
  - a 3-node HCI cluster uses 4 public IP addresses
Thanks for your opinions
Cheers,
Jiri
ETL service aggregation error
by Ayansh Rocks
Hi,
I am using a 4.3.7 self-hosted engine. For the past few days I have been
regularly getting the error messages below:
[image: image.png]
Logs in /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log
[image: image.png]
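In case the screenshots do not come through, the errors can also be pulled out of the DWH log in text form, roughly like this (a sketch; the grep pattern is only a guess at what to filter on):

# status of the DWH service that does the ETL aggregation
systemctl status ovirt-engine-dwhd
# pull the recent errors out of the DWH log
grep -iE 'error|exception|aggregat' /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log | tail -n 50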
What could be the reason for this?
Thanks
Shashank
4.4 regression: engine-setup fails if admin password in answerfile contains a "%"
by Stephen Panicho
I encountered this error when deploying the Hosted Engine via Cockpit:
[ INFO ] TASK [ovirt.engine-setup : Run engine-setup with answerfile]
[ ERROR ] fatal: [localhost -> engine.ovirt.trashnet.xyz]: FAILED! =>
{"changed": true, "cmd": ["engine-setup", "--accept-defaults",
"--config-append=/root/ovirt-engine-answers"], "delta": "0:00:01.396490",
"end": "2020-05-22 18:32:41.965984", "msg": "non-zero return code", "rc":
1, "start": "2020-05-22 18:32:40.569494", "stderr": "", "stderr_lines": [],
"stdout": "[ INFO ] Stage: Initializing\n[ ERROR ] Failed to execute stage
'Initializing': '%' must be followed by '%' or '(', found: '%JUUj'\n[ INFO
] Stage: Clean up\n Log file is located at
/var/log/ovirt-engine/setup/ovirt-engine-setup-20200522183241-c7d1kh.log\n[
ERROR ] Failed to execute stage 'Clean up': 'NoneType' object has no
attribute 'cleanup'\n[ INFO ] Generating answer file
'/var/lib/ovirt-engine/setup/answers/20200522183241-setup.conf'\n[ INFO ]
Stage: Pre-termination\n[ INFO ] Stage: Termination\n[ ERROR ] Execution of
setup failed", "stdout_lines": ["[ INFO ] Stage: Initializing", "[ ERROR ]
Failed to execute stage 'Initializing': '%' must be followed by '%' or '(',
found: '%JUUj'", "[ INFO ] Stage: Clean up", " Log file is located at
/var/log/ovirt-engine/setup/ovirt-engine-setup-20200522183241-c7d1kh.log",
"[ ERROR ] Failed to execute stage 'Clean up': 'NoneType' object has no
attribute 'cleanup'", "[ INFO ] Generating answer file
'/var/lib/ovirt-engine/setup/answers/20200522183241-setup.conf'", "[ INFO ]
Stage: Pre-termination", "[ INFO ] Stage: Termination", "[ ERROR ]
Execution of setup failed"]}
The important bit is this: Failed to execute stage 'Initializing': '%' must
be followed by '%' or '(', found: '%JUUj'"
Hey! Those are the last few characters of the admin password. Note that I
don't mean the root password to the VM, but the one for the "admin" user of
the web interface. I added some debug lines to the Ansible play to see the
answerfile that was being generated.
OVESETUP_CONFIG/adminPassword=str:&6&yGfcWf#b%JUUj
Apparently engine-setup can no longer handle an answerfile with a "%"
character in it. This same password worked in 4.3.
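That error string is exactly what Python's configparser interpolation raises, so the parsing behaviour can be reproduced on its own, completely outside engine-setup (just an illustration of why the bare '%' blows up; it says nothing about where engine-setup actually parses the answerfile):

python3 - <<'EOF'
from configparser import ConfigParser, InterpolationSyntaxError

cp = ConfigParser()                      # BasicInterpolation is the default
cp.read_string("[ovirt]\npassword = &6&yGfcWf#b%JUUj\n")
try:
    cp.get("ovirt", "password")          # interpolation is applied on get()
except InterpolationSyntaxError as exc:
    print("reproduced:", exc)
EOF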
Once I changed the admin password, installation progressed normally.
ovirt imageio problem...
by matteo fedeli
Hi! I installed CentOS 8 and the oVirt packages following these steps:
systemctl enable --now cockpit.socket
yum install https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm
yum module -y enable javapackages-tools
yum module -y enable pki-deps
yum module -y enable postgresql:12
yum -y install glibc-locale-source glibc-langpack-en
localedef -v -c -i en_US -f UTF-8 en_US.UTF-8
yum update
yum install ovirt-engine
engine-setup (keeping all the defaults)
Is it possible that the ovirt-imageio-proxy service is not installed? (service ovirt-imageio-proxy status --> not found, yum install ovirt-imageio-proxy --> not found.) I'm not able to upload ISOs... I also installed the CA cert in Firefox...
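What I can check locally, in case it helps (my guess is that in 4.4 the proxy was folded into a single 'ovirt-imageio' service, so that service name is an assumption):

# which imageio packages actually got pulled in
dnf list installed 'ovirt-imageio*'
# guessed 4.4 service name - verify before relying on it
systemctl status ovirt-imageio
journalctl -u ovirt-imageio --since today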
Issues deploying 4.4 with HE on new EPYC hosts
by Mark R
Hello all,
I have some EPYC servers that are not yet in production, so I wanted to go ahead and move them off of 4.3 (which was working) to 4.4. I flattened and reinstalled the hosts with CentOS 8.1 Minimal and installed all updates. Some very simple networking, just a bond and two iSCSI interfaces. After adding the oVirt 4.4 repo and installing the requirements, I run 'hosted-engine --deploy' and proceed through the setup. Everything looks as though it is going nicely and the local HE starts and runs perfectly. After copying the HE disks out to storage, the system tries to start it there but is using a different CPU definition and it's impossible to start it. At this point I'm stuck but hoping someone knows the fix, because this is as vanilla a deployment as I could attempt and it appears EPYC CPUs are a no-go right now with 4.4.
When the HostedEngineLocal VM is running, the CPU definition is:
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>EPYC-IBPB</model>
<vendor>AMD</vendor>
<feature policy='require' name='x2apic'/>
<feature policy='require' name='tsc-deadline'/>
<feature policy='require' name='hypervisor'/>
<feature policy='require' name='tsc_adjust'/>
<feature policy='require' name='clwb'/>
<feature policy='require' name='umip'/>
<feature policy='require' name='arch-capabilities'/>
<feature policy='require' name='cmp_legacy'/>
<feature policy='require' name='perfctr_core'/>
<feature policy='require' name='wbnoinvd'/>
<feature policy='require' name='amd-ssbd'/>
<feature policy='require' name='skip-l1dfl-vmentry'/>
<feature policy='disable' name='monitor'/>
<feature policy='disable' name='svm'/>
<feature policy='require' name='topoext'/>
</cpu>
Once the HostedEngine VM is defined and trying to start, the CPU definition is simply:
<cpu mode='custom' match='exact' check='partial'>
<model fallback='allow'>EPYC</model>
<topology sockets='16' cores='4' threads='1'/>
<feature policy='require' name='ibpb'/>
<feature policy='require' name='virt-ssbd'/>
<numa>
<cell id='0' cpus='0-63' memory='16777216' unit='KiB'/>
</numa>
</cpu>
On attempts to start it, the host is logging this error: "CPU is incompatible with host CPU: Host CPU does not provide required features: virt-ssbd".
So, the HostedEngineLocal VM works because it requires 'amd-ssbd' instead of 'virt-ssbd', and a VM requiring 'virt-ssbd' can't run on EPYC CPUs with CentOS 8.1. As mentioned, the HostedEngine ran fine on oVirt 4.3 with CentOS 7.8, and on 4.3 the CPU definition also required 'virt-ssbd', so I can only imagine that, perhaps because of the newer kernel, the HE now needs to require 'amd-ssbd' instead?
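For what it's worth, this is how I am comparing what the host CPU exposes with what libvirt thinks it can give to guests (just diagnostics; the grep patterns are my own):

# mitigation-related CPU flags as the host kernel sees them
grep -oE 'amd_ssbd|virt_ssbd|ssbd' /proc/cpuinfo | sort | uniq -c
# features libvirt/QEMU report as usable for guests on this host
virsh -r domcapabilities | grep -iE 'ssbd|epyc'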
Any clues to help with this? I can completely wipe/reconfigure the hosts as needed, so I'm willing to try whatever it takes to move forward with a 4.4 deployment.
Thanks!
Mark
tun: unexpected GSO type: 0x0, gso_size 1368, hdr_len 66
by lejeczek
hi everyone,
With 4.4 I get:
...
tun: unexpected GSO type
...
It happens with a "third-party" kernel, namely
5.6.15-1.el8.elrepo.x86_64 on CentOS 8.
I wonder if anybody sees the same or something similar, and I also
wonder if I should report it somewhere in Bugzilla as a
"heads-up" for newer kernels?
[Fri May 29 08:00:02 2020] tun: unexpected GSO type: 0x0,
gso_size 1368, hdr_len 66
[Fri May 29 08:00:02 2020] tun: 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 ................
[Fri May 29 08:00:02 2020] tun: 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 ................
[Fri May 29 08:00:02 2020] tun: 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 ................
[Fri May 29 08:00:02 2020] tun: 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 ................
[Fri May 29 08:00:02 2020] ------------[ cut here ]------------
[Fri May 29 08:00:02 2020] WARNING: CPU: 2 PID: 3605 at
drivers/net/tun.c:2123 tun_do_read+0x524/0x6c0 [tun]
[Fri May 29 08:00:02 2020] Modules linked in: sd_mod sg
vhost_net vhost tap xt_CHECKSUM xt_MASQUERADE xt_conntrack
nf_nat_tftp nf_conntrack_tftp tun nft_nat ipt_REJECT bridge
nft_counter nft_objref nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_masq nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat
nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 qedf qed
ip6_tables crc8 nft_compat bnx2fc ip_set cnic uio libfcoe
8021q garp mrp stp llc libfc scsi_transport_fc nf_tables
nfnetlink sunrpc vfat fat ext4 mbcache jbd2
snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg
snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device
edac_mce_amd snd_pcm kvm_amd kvm eeepc_wmi asus_wmi
sp5100_tco sparse_keymap irqbypass rfkill wmi_bmof pcspkr
joydev i2c_piix4 k10temp snd_timer snd soundcore gpio_amdpt
gpio_generic acpi_cpufreq ip_tables xfs libcrc32c dm_crypt
ax88179_178a usbnet mii hid_lenovo nouveau video mxm_wmi
i2c_algo_bit
[Fri May 29 08:00:02 2020] drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops crct10dif_pclmul ttm
crc32_pclmul crc32c_intel ahci libahci drm
ghash_clmulni_intel libata nvme ccp r8169 nvme_core realtek
wmi t10_pi pinctrl_amd dm_mirror dm_region_hash dm_log dm_mod
[Fri May 29 08:00:02 2020] CPU: 2 PID: 3605 Comm: vhost-3578
Not tainted 5.6.15-1.el8.elrepo.x86_64 #1
[Fri May 29 08:00:02 2020] Hardware name: System
manufacturer System Product Name/PRIME B450M-A, BIOS 2006
11/13/2019
[Fri May 29 08:00:02 2020] RIP: 0010:tun_do_read+0x524/0x6c0
[tun]
[Fri May 29 08:00:02 2020] Code: 00 6a 01 0f b7 44 24 22 b9
10 00 00 00 48 c7 c6 cb 33 09 c1 48 c7 c7 d1 33 09 c1 83 f8
40 48 0f 4f c2 31 d2 50 e8 4c 14 df c5 <0f> 0b 58 5a 48 c7
c5 ea ff ff ff e9 d2 fc ff ff 4c 89 e2 be 04 00
[Fri May 29 08:00:02 2020] RSP: 0018:ffffaaf301dfbcb8
EFLAGS: 00010292
[Fri May 29 08:00:02 2020] RAX: 0000000000000000 RBX:
ffff88ceae6b4800 RCX: 0000000000000007
[Fri May 29 08:00:02 2020] RDX: 0000000000000000 RSI:
0000000000000096 RDI: ffff88d14e8996b0
[Fri May 29 08:00:02 2020] RBP: 000000000000004e R08:
0000000000000516 R09: 0000000000000055
[Fri May 29 08:00:02 2020] R10: 000000000000072e R11:
ffffaaf301dfba88 R12: ffffaaf301dfbe50
[Fri May 29 08:00:02 2020] R13: ffff88d0e98b8900 R14:
0000000000000000 R15: 0000000000000000
[Fri May 29 08:00:02 2020] FS: 0000000000000000(0000)
GS:ffff88d14e880000(0000) knlGS:0000000000000000
[Fri May 29 08:00:02 2020] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[Fri May 29 08:00:02 2020] CR2: 000055f6b3af2bd8 CR3:
00000003c6c84000 CR4: 0000000000340ee0
[Fri May 29 08:00:02 2020] Call Trace:
[Fri May 29 08:00:02 2020] ? __wake_up_common+0x77/0x140
[Fri May 29 08:00:02 2020] tun_recvmsg+0x6b/0xf0 [tun]
[Fri May 29 08:00:02 2020] handle_rx+0x573/0x940 [vhost_net]
[Fri May 29 08:00:02 2020] ? log_used.part.45+0x20/0x20 [vhost]
[Fri May 29 08:00:02 2020] vhost_worker+0xcc/0x140 [vhost]
[Fri May 29 08:00:02 2020] kthread+0x10c/0x130
[Fri May 29 08:00:02 2020] ? kthread_park+0x80/0x80
[Fri May 29 08:00:02 2020] ret_from_fork+0x22/0x40
[Fri May 29 08:00:02 2020] ---[ end trace 9df20668f2e81977 ]---
many thanks, L.
Problem with Ovirt Machines
by aigini82@gmail.com
Hi,
Our company uses oVirt to host some of its virtual machines. The version used is 4.2.6.4-1.el7, and there are about 36 virtual machines in it.
The host machine has 30 GB of RAM and 6 CPUs. Some of the VMs on this oVirt host run with 4 CPUs, some with 2 CPUs.
The problem I face now is that there was recently a need for a high-CPU, high-memory VM for DR. I created a VM with 16 GB of RAM and 6 CPUs, without first checking the CPUs available on the host. After DR, the VM was brought down. Later, another person on the team brought the VM back up for a different DR use, a much larger DB restoration.
This caused the VM to pause due to a storage error. Then worse things happened: two other VMs inadvertently went down. Although I assumed that this was caused by storage errors/problems, the senior admins on the team concluded that the problem was due to fencing, because the VM was using the maximum CPU allotted to the host.
Now what I need to know is how to properly allocate CPU resources on a host so it can run multiple virtual machines, as in the situation above.
I even tried to look for errors in vdsm.log, but this log was not available on the host machine or in the affected VM. My colleague asked me to check the "Events" section of the oVirt management interface to review past events. However, I don't find many details about the fencing activity, how the fencing occurred, or what caused it.
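For reference, the default log locations and the kind of searches I mean are these (a rough sketch; the grep patterns are just my guesses at what to look for):

# on the engine VM - fencing / power-management actions are logged by the engine
grep -iE 'fence|power management' /var/log/ovirt-engine/engine.log
# on each hypervisor host (not inside the guests) - this is where vdsm.log normally lives
grep -iE 'pause|abnormal|storage' /var/log/vdsm/vdsm.log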
And how did they conclude that the CPU count caused the fencing and not the storage?