Lots of problems with deploying the hosted-engine (ovirt 4.4 | CentOS 8.2.2004)
by jonas
Hi!
I have been banging my head against deploying the oVirt 4.4 self-hosted
engine on CentOS 8.2 for the last couple of days.
First I was astonished that resources.ovirt.org has no IPv6
connectivity, which made my initial plan for a mostly IPv6-only
deployment impossible.
CentOS was installed from scratch using the ks.cfg Kickstart file below,
which also adds the oVirt 4.4 repo and installs cockpit-ovirt-dashboard
and ovirt-engine-appliance.
When deploying the hosted-engine from cockpit while logged in as a
non-root (although privileged) user, the "(3) Prepare VM" step instantly
fails with a nondescript error message and without generating any logs.
Using the browser dev tools I found that this happens because the Ansible
vars file cannot be created: the non-root user has no write permission in
'/var/lib/ovirt-hosted-engine-setup/cockpit/'. Shouldn't cockpit be
capable of using sudo when appropriate, or at least give a more
descriptive error message?
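For reference, the check and the workaround boil down to something like this (the workaround is simply what the next paragraph describes, i.e. running the deployment as root):

# The directory the wizard wants to write the Ansible vars file into is not
# writable by my unprivileged user:
ls -ld /var/lib/ovirt-hosted-engine-setup/cockpit/

# Workaround: log into cockpit as root, or run the setup from the CLI instead:
sudo ovirt-hosted-engine-setup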
After logging into cockpit as root, or when using the command-line
ovirt-hosted-engine-setup tool, the deployment fails with "Failed to
download metadata for repo 'AppStream'".
This seems to be because a) the dnsmasq running on the host does not
forward DNS queries, even though the host itself can resolve DNS just
fine, and b) there also does not seem to be any functioning routing set
up to reach anything outside the host.
Regarding a), it is strange that dnsmasq is running with a config file
'/var/lib/libvirt/dnsmasq/default.conf' containing the 'no-resolv'
option. Could systemd-resolved be interfering with dnsmasq (see the ss
-tulpen output below)? I tried manually stopping systemd-resolved, but
got the same behaviour as before.
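To illustrate where I am stuck, this is roughly what I would test and try next (the forwarder address is the redacted DNS server from my setup, and I have not verified that hosted-engine-setup tolerates editing the 'default' libvirt network mid-deploy, so please treat it as a sketch):

# With 'no-resolv' and no 'server=' lines in default.conf, this dnsmasq has no
# upstream to forward to, so a query against it should fail:
dig +short @fd00:1234:5678:900::1 resources.ovirt.org AAAA

# Possible workaround: add <dns><forwarder addr='*REDACTED*:1052::11'/></dns>
# to the network definition and restart the network:
virsh net-edit default
virsh net-destroy default && virsh net-start default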
I hope someone can give me a hint on how to get past this problem, as so
far my oVirt experience has been a bit sub-par. :D
Also, when running ovirt-hosted-engine-cleanup, the extracted engine VM
images in /var/tmp/localvm* are not removed, leading to a "disk space
leak" with subsequent runs.
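For now I remove the leftovers by hand, which seems safe as long as no deployment is currently running:

# ovirt-hosted-engine-cleanup leaves the extracted appliance of the bootstrap VM behind:
rm -rf /var/tmp/localvm*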
Best regards
Jonas
--- ss -tulpen output post deploy-run ---
[root@nxtvirt ~]# ss -tulpen | grep ':53 '
udp UNCONN 0 0 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=1379,fd=18)) uid:193 ino:32910 sk:6 <->
udp UNCONN 0 0 [fd00:1234:5678:900::1]:53 [::]:* users:(("dnsmasq",pid=13525,fd=15)) uid:979 ino:113580 sk:d v6only:1 <->
udp UNCONN 0 0 [fe80::5054:ff:fe94:f314]%virbr0:53 [::]:* users:(("dnsmasq",pid=13525,fd=12)) uid:979 ino:113575 sk:e v6only:1 <->
tcp LISTEN 0 32 [fd00:1234:5678:900::1]:53 [::]:* users:(("dnsmasq",pid=13525,fd=16)) uid:979 ino:113581 sk:20 v6only:1 <->
tcp LISTEN 0 32 [fe80::5054:ff:fe94:f314]%virbr0:53 [::]:* users:(("dnsmasq",pid=13525,fd=13)) uid:979 ino:113576 sk:21 v6only:1 <->
--- running dnsmasq processes on host ('nxtvirt') post deploy-run ---
dnsmasq 13525 0.0 0.0 71888 2344 ? S 12:31 0:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
root 13526 0.0 0.0 71860 436 ? S 12:31 0:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
--- /var/lib/libvirt/dnsmasq/default.conf ---
##WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
##OVERWRITTEN AND LOST. Changes to this configuration should be made using:
## virsh net-edit default
## or other application using the libvirt API.
##
## dnsmasq conf file created by libvirt
strict-order
pid-file=/run/libvirt/network/default.pid
except-interface=lo
bind-dynamic
interface=virbr0
dhcp-option=3
no-resolv
ra-param=*,0,0
dhcp-range=fd00:1234:5678:900::10,fd00:1234:5678:900::ff,64
dhcp-lease-max=240
dhcp-hostsfile=/var/lib/libvirt/dnsmasq/default.hostsfile
addn-hosts=/var/lib/libvirt/dnsmasq/default.addnhosts
enable-ra
--- cockpit wizard overview before the 'Prepare VM' step ---
VM
Engine FQDN: engine.*REDACTED*
MAC Address: 00:16:3e:20:13:b3
Network Configuration: Static
VM IP Address: *REDACTED*:1099:babe::3/64
Gateway Address: *REDACTED*:1099::1
DNS Servers: *REDACTED*:1052::11
Root User SSH Access: yes
Number of Virtual CPUs: 4
Memory Size (MiB): 4096
Root User SSH Public Key: (None)
Add Lines to /etc/hosts: yes
Bridge Name: ovirtmgmt
Apply OpenSCAP profile: no
Engine
SMTP Server Name: localhost
SMTP Server Port Number: 25
Sender E-Mail Address: root@localhost
Recipient E-Mail Addresses: root@localhost
--- ks.cfg ---
#version=RHEL8
ignoredisk --only-use=vda
autopart --type=lvm
# Partition clearing information
clearpart --drives=vda --all --initlabel
# Use graphical install
#graphical
text
# Use CDROM installation media
cdrom
# Keyboard layouts
keyboard --vckeymap=de --xlayouts='de','us'
# System language
lang en_US.UTF-8
# Network information
network --bootproto=static --device=enp1s0 --ip=192.168.199.250 --netmask=255.255.255.0 --gateway=192.168.199.10 --ipv6=*REDACTED*:1090:babe::250/64 --ipv6gateway=*REDACTED*:1090::1 --hostname=nxtvirt.*REDACTED* --nameserver=*REDACTED*:1052::11 --activate
network --hostname=nxtvirt.*REDACTED*
# Root password
rootpw --iscrypted $6$*REDACTED*
firewall --enabled --service=cockpit --service=ssh
# Run the Setup Agent on first boot
firstboot --enable
# Do not configure the X Window System
skipx
# System services
services --enabled="chronyd"
# System timezone
timezone Etc/UTC --isUtc --ntpservers=ntp.*REDACTED*,ntp2.*REDACTED*
user --name=nonrootuser --groups=wheel --password=$6$*REDACTED* --iscrypted
# KVM Users/Groups
group --name=kvm --gid=36
user --name=vdsm --uid=36 --gid=36
%packages
@^server-product-environment
#@graphical-admin-tools
@headless-management
kexec-tools
cockpit
%end
%addon com_redhat_kdump --enable --reserve-mb='auto'
%end
%anaconda
pwpolicy root --minlen=6 --minquality=1 --notstrict --nochanges --notempty
pwpolicy user --minlen=6 --minquality=1 --notstrict --nochanges --emptyok
pwpolicy luks --minlen=6 --minquality=1 --notstrict --nochanges --notempty
%end
%post --erroronfail --log=/root/ks-post.log
#!/bin/sh
dnf update -y
# NFS storage
mkdir -p /opt/ovirt/nfs-storage
chown -R 36:36 /opt/ovirt/nfs-storage
chmod 0755 /opt/ovirt/nfs-storage
echo "/opt/ovirt/nfs-storage localhost" > /etc/exports
echo "/opt/ovirt/nfs-storage engine.*REDACTED*" >> /etc/exports
dnf install -y nfs-utils
systemctl enable nfs-server.service
# Install ovirt packages
dnf install -y https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm
dnf install -y cockpit-ovirt-dashboard ovirt-engine-appliance
# Enable cockpit
systemctl enable cockpit.socket
%end
#reboot --eject --kexec
reboot --eject
--- Host (nxtvirt) ip -a post deploy-run ---
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:ad:79:1b brd ff:ff:ff:ff:ff:ff
    inet 192.168.199.250/24 brd 192.168.199.255 scope global noprefixroute enp1s0
       valid_lft forever preferred_lft forever
    inet6 *REDACTED*:1099:babe::250/64 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fead:791b/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
5: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:94:f3:14 brd ff:ff:ff:ff:ff:ff
    inet6 fd00:1234:5678:900::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe94:f314/64 scope link
       valid_lft forever preferred_lft forever
6: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:94:f3:14 brd ff:ff:ff:ff:ff:ff
7: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master virbr0 state UNKNOWN group default qlen 1000
    link/ether fe:16:3e:68:d3:8a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:fe68:d38a/64 scope link
       valid_lft forever preferred_lft forever
--- iptables-save post deploy-run ---
# Generated by iptables-save v1.8.4 on Sun Jun 28 13:20:53 2020
*filter
:INPUT ACCEPT [4007:8578553]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [3920:7633249]
:LIBVIRT_INP - [0:0]
:LIBVIRT_OUT - [0:0]
:LIBVIRT_FWO - [0:0]
:LIBVIRT_FWI - [0:0]
:LIBVIRT_FWX - [0:0]
-A INPUT -j LIBVIRT_INP
-A FORWARD -j LIBVIRT_FWX
-A FORWARD -j LIBVIRT_FWI
-A FORWARD -j LIBVIRT_FWO
-A OUTPUT -j LIBVIRT_OUT
-A LIBVIRT_INP -i virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A LIBVIRT_INP -i virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A LIBVIRT_INP -i virbr0 -p udp -m udp --dport 67 -j ACCEPT
-A LIBVIRT_INP -i virbr0 -p tcp -m tcp --dport 67 -j ACCEPT
-A LIBVIRT_OUT -o virbr0 -p udp -m udp --dport 53 -j ACCEPT
-A LIBVIRT_OUT -o virbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A LIBVIRT_OUT -o virbr0 -p udp -m udp --dport 68 -j ACCEPT
-A LIBVIRT_OUT -o virbr0 -p tcp -m tcp --dport 68 -j ACCEPT
-A LIBVIRT_FWO -i virbr0 -j REJECT --reject-with icmp-port-unreachable
-A LIBVIRT_FWI -o virbr0 -j REJECT --reject-with icmp-port-unreachable
-A LIBVIRT_FWX -i virbr0 -o virbr0 -j ACCEPT
COMMIT
# Completed on Sun Jun 28 13:20:53 2020
# Generated by iptables-save v1.8.4 on Sun Jun 28 13:20:53 2020
*security
:INPUT ACCEPT [3959:8576054]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [3920:7633249]
COMMIT
# Completed on Sun Jun 28 13:20:53 2020
# Generated by iptables-save v1.8.4 on Sun Jun 28 13:20:53 2020
*raw
:PREROUTING ACCEPT [4299:8608260]
:OUTPUT ACCEPT [3920:7633249]
COMMIT
# Completed on Sun Jun 28 13:20:53 2020
# Generated by iptables-save v1.8.4 on Sun Jun 28 13:20:53 2020
*mangle
:PREROUTING ACCEPT [4299:8608260]
:INPUT ACCEPT [4007:8578553]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [3920:7633249]
:POSTROUTING ACCEPT [3923:7633408]
:LIBVIRT_PRT - [0:0]
-A POSTROUTING -j LIBVIRT_PRT
COMMIT
# Completed on Sun Jun 28 13:20:53 2020
# Generated by iptables-save v1.8.4 on Sun Jun 28 13:20:53 2020
*nat
:PREROUTING ACCEPT [337:32047]
:INPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [159:9351]
:OUTPUT ACCEPT [159:9351]
:LIBVIRT_PRT - [0:0]
-A POSTROUTING -j LIBVIRT_PRT
COMMIT
# Completed on Sun Jun 28 13:20:53 2020
Weird problem starting VMs in oVirt-4.4
by Joop
Hi All,
Just had a rather new experience: starting a VM worked, but the VM
dropped into the grub2 rescue console because something was wrong with
its virtio-scsi disk.
The message is: Booting from Hard Disk ....
error: ../../grub-core/kern/dl.c:266:invalid arch-independent ELF magic.
entering rescue mode...
Doing a Ctrl-Alt-Del through the SPICE console lets the VM boot
correctly. Shutting it down and repeating the procedure, I get the disk
problem every time. The weird thing is that if I activate the boot menu
and then start the VM straight away, all is OK.
I don't see any ERROR messages in either vdsm.log or engine.log.
If I had to guess, it looks like the disk image isn't connected yet when
the VM boots, but that's weird, isn't it?
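For reference, this is roughly where I have been looking so far, without finding anything obvious (VMNAME is a placeholder):

# Disk/controller related lines in the libvirt domain log for the VM:
grep -i 'scsi' /var/log/libvirt/qemu/VMNAME.log | tail -n 20

# VM start-up as seen by VDSM around the same time:
grep -i 'VMNAME' /var/log/vdsm/vdsm.log | tail -n 50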
Regards,
Joop
Ovirt 4.3.10 Glusterfs SSD slow performance over 10GE
by jury cat
Hello all,
I am using oVirt 4.3.10 on CentOS 7.8 with GlusterFS 6.9.
My Gluster setup consists of 3 hosts in replica 3 (2 data hosts + 1 arbiter).
All 3 hosts are Dell R720s with a PERC H710 Mini RAID controller (maximum
throughput 6 Gb/s) and 2 x 1 TB Samsung SSDs in RAID 0. The volume is
partitioned using LVM thin provisioning and formatted as XFS.
The hosts have separate 10GbE network cards for storage traffic.
The Gluster network is connected to these 10GbE network cards and the
volume is mounted using FUSE GlusterFS (NFS is disabled). The migration
network is also activated on the same storage network.
The problem is that the 10GbE network is not used to its full potential
by Gluster.
If I do live migration of VMs I see speeds of 7-9 Gbit/s.
The same network tested with iperf3 reported 9.9 Gbit/s, which excludes
the network setup as a bottleneck (I will not paste all the iperf3 tests
here for now).
I did not enable all the volume options from "Optimize for Virt Store",
because of the bug that prevents setting the volume option
cluster.granular-entry-heal to enable (this was fixed in vdsm-4.40, but
that only works on CentOS 8 with oVirt 4.4).
I would be happy to know what all these "Optimize for Virt Store" options
are, so I can set them manually.
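As far as I know (please correct me if I am wrong), the "Optimize for Virt Store" checkbox is backed by the 'virt' option group that ships with the glusterfs packages, so something like this should show and apply the same options manually (VOLNAME is a placeholder):

# The option group file shipped on each gluster host:
cat /var/lib/glusterd/groups/virt

# Apply the whole group to the volume, then verify:
gluster volume set VOLNAME group virt
gluster volume info VOLNAME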
The write speed of the disks inside the host, measured with dd, is
between 700 MB/s and 1 GB/s:
[root@host1 ~]# dd if=/dev/zero of=test bs=100M count=40 count=80 status=progress
8074035200 bytes (8.1 GB) copied, 11.059372 s, 730 MB/s
80+0 records in
80+0 records out
8388608000 bytes (8.4 GB) copied, 11.9928 s, 699 MB/s
The dd write test on the gluster volume inside the host is poor, only
~120 MB/s.
During the dd test, if I look at Networks -> Gluster network -> Hosts,
the Tx and Rx network speed barely reaches over 1 Gbit/s (~1073 Mbit/s)
out of a maximum of 10000 Mbit/s.
dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/gluster1.domain.local\:_data/test bs=100M count=80 status=progress
8283750400 bytes (8.3 GB) copied, 71.297942 s, 116 MB/s
80+0 records in
80+0 records out
8388608000 bytes (8.4 GB) copied, 71.9545 s, 117 MB/s
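One caveat I am aware of: without oflag=direct these dd runs partly measure the page cache, so for comparing the brick with the FUSE mount it is probably fairer to repeat both tests roughly like this (same paths as above; if the FUSE mount refuses O_DIRECT, conv=fdatasync is a rougher alternative):

dd if=/dev/zero of=test bs=1M count=4096 oflag=direct status=progress
dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/gluster1.domain.local\:_data/test bs=1M count=4096 oflag=direct status=progress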
I have attached my Gluster volume settings and mount options.
Thanks,
Emy
oVirt-node 4.4.0 - Hosted engine deployment fails when host is unable to download updates
by Marco Fais
Hi,
fresh installation of oVirt-node 4.4.0 on a cluster -- the hosted-engine
--deploy command fails if DNF is unable to download updates.
This cluster is not connected to the public network at the moment.
If I use a proxy (setting the relevant env. variables) it fails at a later
stage (I think the engine VM is trying to download updates as well, but
encounters the same issue and doesn't seem to use the proxy).
With oVirt-node 4.3.x I didn't have this issue -- any suggestions?
[~]# hosted-engine --deploy
[ INFO ] Stage: Initializing
[ INFO ] Stage: Environment setup
During customization use CTRL-D to abort.
Continuing will configure this host for serving as hypervisor and
will create a local VM with a running engine.
The locally running engine will be used to configure a new
storage domain and create a VM there.
At the end the disk of the local VM will be moved to the shared
storage.
Are you sure you want to continue? (Yes, No)[Yes]:
Configuration files:
Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20200604214638-vae2wf.log
Version: otopi-1.9.1 (otopi-1.9.1-1.el8)
[ INFO ] DNF Downloading 1 files, 0.00KB
[ INFO ] DNF Downloaded Extra Packages for Enterprise Linux 8 - x86_64
[ ERROR ] DNF Failed to download metadata for repo 'ovirt-4.4-epel'
[ ERROR ] DNF Failed to download metadata for repo 'ovirt-4.4-epel'
[ ERROR ] Failed to execute stage 'Environment setup': Failed to download metadata for repo 'ovirt-4.4-epel'
[ INFO ] Stage: Clean up
[...]
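One thing I have not tried yet (and I am not sure whether the locally built engine VM would inherit it) is pointing dnf itself at the proxy instead of relying on environment variables; the proxy URL below is a placeholder:

# dnf reads this for every repo operation on the host, including the ones
# triggered by hosted-engine --deploy:
echo 'proxy=http://proxy.example.com:3128' >> /etc/dnf/dnf.conf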
Thanks,
Marco
New fenceType in oVirt code for IBM OpenBMC
by Vinícius Ferrão
Hello,
After some days of scratching my head I found that oVirt is probably missing a fenceType for IBM's implementation of OpenBMC in the Power Management section. The host machine is an OpenPOWER AC922 (ppc64le).
The BMC is basically an "ipmilan" device, but the cipher must be set to 3 or 17:
[root@h01 ~]# ipmitool -I lanplus -H 10.20.10.2 -U root -P 0penBmc -L operator -C 3 channel getciphers ipmi
ID IANA Auth Alg Integrity Alg Confidentiality Alg
3 N/A hmac_sha1 hmac_sha1_96 aes_cbc_128
17 N/A hmac_sha256 sha256_128 aes_cbc_128
The default ipmilan connector forces the option cipher=1 which breaks the communication.
So I was reading the code and found this "fenceType" class, but I wasn't able to find where those classes are defined, so that I could create another one, called something like openbmc, that sets cipher=17 by default.
Another issue is how unhelpful the error output is: it only returns a generic JSON-RPC error. I don't know how to suggest a fix for this either.
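In the meantime, my plan is to test the fence agent directly from one of the hosts with the cipher overridden (same BMC address and credentials as above), and, if that works, to try passing cipher=17 in the Power Management "Options" field, although I am not sure the engine forwards arbitrary options to fence_ipmilan:

# Manual test of the stock agent with a non-default cipher:
fence_ipmilan --ip=10.20.10.2 --username=root --password=0penBmc --lanplus --cipher=17 --action=status --verbose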
Thanks,
VMs shutdown mysteriously
by Bobby
Hello,
All 4 VMs on one of my oVirt cluster nodes shut down for an unknown
reason, almost simultaneously.
Please help me find the root cause.
Thanks.
Please note that the host seems to be doing fine, it never crashed or
hung, and I could migrate the VMs back to it later.
Here is the exact timeline of all the related events combined from the host
and the VM(s):
On oVirt host:
/var/log/vdsm/vdsm.log:
2020-06-25 15:25:16,944-0500 WARN (qgapoller/3)
[virt.periodic.VmDispatcher] could not run <function <lambda> at
0x7f4ed2f9f5f0> on ['e0257b06-28fd-4d41-83a9-adf1904d3622'] (periodic:289)
2020-06-25 15:25:19,203-0500 WARN (libvirt/events) [root] File:
/var/lib/libvirt/qemu/channels/e0257b06-28fd-4d41-83a9-adf1904d3622.ovirt-guest-agent.0
already removed (fileutils:54)
2020-06-25 15:25:19,203-0500 WARN (libvirt/events) [root] File:
/var/lib/libvirt/qemu/channels/e0257b06-28fd-4d41-83a9-adf1904d3622.org.qemu.guest_agent.0
already removed (fileutils:54)
[root@athos log]# journalctl -u NetworkManager --since=today
-- Logs begin at Wed 2020-05-20 22:07:33 CDT, end at Thu 2020-06-25
16:36:05 CDT. --
Jun 25 15:25:18 athos NetworkManager[1600]: <info> [1593116718.1136]
device (vnet0): state change: disconnected -> unmanaged (reason
'unmanaged', sys-iface-state: 'removed')
Jun 25 15:25:18 athos NetworkManager[1600]: <info> [1593116718.1146]
device (vnet0): released from master device SRV-VL
/var/log/messages:
Jun 25 15:25:18 athos kernel: SRV-VL: port 2(vnet0) entered disabled state
Jun 25 15:25:18 athos NetworkManager[1600]: <info> [1593116718.1136]
device (vnet0): state change: disconnected -> unmanaged (reason
'unmanaged', sys-iface-state: 'removed')
Jun 25 15:25:18 athos NetworkManager[1600]: <info> [1593116718.1146]
device (vnet0): released from master device SRV-VL
Jun 25 15:25:18 athos libvirtd: 2020-06-25 20:25:18.122+0000: 2713: error :
qemuMonitorIO:718 : internal error: End of file from qemu monitor
/var/log/libvirt/qemu/aries.log:
2020-06-25T20:25:28.353975Z qemu-kvm: terminating on signal 15 from pid
2713 (/usr/sbin/libvirtd)
2020-06-25 20:25:28.584+0000: shutting down, reason=shutdown
=============================================================================================
On the first VM effected (same thing on others):
/var/log/ovirt-guest-agent/ovirt-guest-agent.log:
MainThread::INFO::2020-06-25
15:25:20,270::ovirt-guest-agent::104::root::Stopping oVirt guest agent
CredServer::INFO::2020-06-25
15:25:20,626::CredServer::262::root::CredServer has stopped.
MainThread::INFO::2020-06-25
15:25:21,150::ovirt-guest-agent::78::root::oVirt guest agent is down.
=============================================================================================
Package versions installed:
Host OS version: CentOS 7.7.1908:
ovirt-hosted-engine-ha-2.3.5-1.el7.noarch
ovirt-provider-ovn-driver-1.2.22-1.el7.noarch
ovirt-release43-4.3.6-1.el7.noarch
ovirt-imageio-daemon-1.5.2-0.el7.noarch
ovirt-vmconsole-1.0.7-2.el7.noarch
ovirt-imageio-common-1.5.2-0.el7.x86_64
ovirt-engine-sdk-python-3.6.9.1-1.el7.noarch
ovirt-vmconsole-host-1.0.7-2.el7.noarch
ovirt-host-4.3.4-1.el7.x86_64
libvirt-4.5.0-23.el7_7.1.x86_64
libvirt-daemon-4.5.0-23.el7_7.1.x86_64
qemu-kvm-ev-2.12.0-33.1.el7.x86_64
qemu-kvm-common-ev-2.12.0-33.1.el7.x86_64
On guest VM:
ovirt-guest-agent-1.0.13-1.el6.noarch
qemu-guest-agent-0.12.1.2-2.491.el6_8.3.x86_64
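One more data point I can collect if it helps: the qemu log above says the guests were terminated with signal 15 by libvirtd (pid 2713), so something asked libvirtd (or libvirt-guests) to stop them. This is what I plan to check around the 15:25 window:

journalctl -u libvirtd --since "2020-06-25 15:20" --until "2020-06-25 15:30"
journalctl --since "2020-06-25 15:20" --until "2020-06-25 15:30" | grep -iE 'oom|libvirt|shutdown|sanlock'
last -x | head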
status of oVirt 4.4.x and CentOS 8.2
by Gianluca Cecchi
Hello,
what is the current status, both when using plain CentOS based nodes and
when using ovirt-node-ng?
Does the release of CentOS 8.2 impact new installations of 4.4.0 and/or
4.4.1 RC?
Thanks,
Gianluca
Localdisk hook not working
by tim-nospam@bordemann.com
Hi,
I'd like to use the localdisk hook of vdsm and have configured everything according to the readme: https://github.com/oVirt/vdsm/tree/master/vdsm_hooks/localdisk
After installing the hook, configuring the ovirt-engine, creating the volume group, adding the custom property 'localdisk' to the virtual machine and fixing a small bug in the localdisk-helper, vdsm creates the logical volume in the 'ovirt-local' volume group when the virtual machine is started. Unfortunately, nothing more seems to happen after that. There is no activity on the NAS that would indicate the disk image is being pulled to the host, and I see no errors on the host either.
Is anyone currently running ovirt 4.4 with the localdisk hook?
What else can I do to find out why the image is not being copied to my host?
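For the record, this is roughly how I looked for errors so far, assuming the hook logs through VDSM like the other hooks do:

# Hook and helper messages should end up in the VDSM log:
grep -iE 'localdisk|hook' /var/log/vdsm/vdsm.log | tail -n 50

# The logical volume that was created, with its tags and size:
lvs -a -o +lv_tags ovirt-local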
Thanks,
Tim
Re: VDSM not binding on IPv4 after ovirt-engine restart
by Dominik Holler
On Tue, Jun 30, 2020 at 6:13 PM Erez Zarum <erezz(a)nanosek.com> wrote:
> While troubleshooting a fresh installation of (after a failed one) that
> caused all the hosts but the one running the hosted-engine to become in
> “Unassigned” state I noticed that the ovirt-engine complains about not
> being able to contact the VDSM.
> I noticed that VDSM has stopped listening on IPv4.
>
>
Thanks for sharing the details.
> I didn’t disable any IPv6 as it states not to disable it on hosts that are
> capable running the hosted-engine and it seems that the reason behind it
> is that the hosted-engine talks to the host it runs on through “localhost”,
> this also explains why the host which the hosted-engine runs on is “OK”.
>
> Below is from a host that does not run the hosted-engine:
> # ss -atn | grep 543
> LISTEN 0 5 *:54322 *:*
> ESTAB 0 0 127.0.0.1:54792 127.0.0.1:54321
> ESTAB 0 0 127.0.0.1:54798 127.0.0.1:54321
> LISTEN 0 5 [::]:54321 [::]:*
> ESTAB 0 0 [::ffff:127.0.0.1]:54321
> [::ffff:127.0.0.1]:54798
> ESTAB 0 0 [::ffff:127.0.0.1]:54321
> [::ffff:127.0.0.1]:54792
> ESTAB 0 0 [::1]:54321 [::1]:50238
> ESTAB 0 0 [::1]:50238 [::1]:54321
>
> Below is from a host that runs the hosted-engine at the moment:
> # ss -atn | grep 543
> LISTEN 0 5 *:54322 *:*
> LISTEN 0 5 [::]:54321 [::]:*
> ESTAB 0 0 [::1]:51230 [::1]:54321
> ESTAB 0 0 [::1]:54321 [::1]:51242
> ESTAB 0 0 [::ffff:10.46.20.23]:54321
> [::ffff:10.46.20.20]:45706
> ESTAB 0 0 [::ffff:10.46.20.23]:54321
> [::ffff:10.46.20.20]:45746
> ESTAB 0 0 [::1]:51240 [::1]:54321
> ESTAB 0 0 [::1]:54321 [::1]:51230
> ESTAB 0 0 [::1]:51242 [::1]:54321
> ESTAB 0 0 [::1]:54321 [::1]:51240
>
> The hosted-engine IP is 10.46.20.20 and the host is 10.46.20.23.
>
>
Why do you think the host does not listen on IPv4 anymore?
Can you please share the output of
"nc -vz 10.46.20.23 54321"
executed on the engine VM or on another host?
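Just to explain the question: a socket bound to [::]:54321 without the v6only flag is dual-stack, and the established sessions from [::ffff:10.46.20.20] in your output are IPv4 connections shown as IPv4-mapped addresses. Two quick checks on the host (purely a sketch):

# 1 would force v6-only sockets; the default 0 lets [::] listeners accept IPv4:
sysctl net.ipv6.bindv6only

# The vdsm listener with flags -- look for the absence of a v6only marker:
ss -tlnpe | grep 54321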
> /etc/hosts on all hosts:
> 127.0.0.1 localhost localhost.localdomain localhost4
> localhost4.localdomain4
> ::1 localhost localhost.localdomain localhost6
> localhost6.localdomain6
>
> Perhaps this is relevant but all hosts are enrolled into IDM (FreeIPA) and
> as an outcome they all have a DNS record and a PTR record as well as the
> ovirt-engine VM.
>
> # cat /etc/vdsm/vdsm.conf
> [vars]
> ssl = true
> ssl_ciphers = HIGH:!aNULL
> ssl_excludes = OP_NO_TLSv1,OP_NO_TLSv1_1
>
> [addresses]
> management_port = 54321
>
> I have tried adding “management_ip = 0.0.0.0” but then it only binds to
> IPv4 and yet, the host still shows as Unassigned, sometimes it switches to
> “NonResponsive” and trying to “Reinstall” the host fails, the ovirt-engine
> complains it can't contact/reach the VDSM, while using netcat from the
> ovirt-engine it works.
>
> I have KSM and Memory Ballooning enabled on the Cluster as well.
>
> oVirt 4.3.10 installed on CentOS 7.8.2003
> The self-hosted Engine runs on an external GlusterFS, before reinstalling
> everything (fresh start of OS, etc..) I tried iSCSI as well.
>
>
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/NCTWZLS2VPI...
>
VDSM not binding on IPv4 after ovirt-engine restart
by Erez Zarum
While troubleshooting a fresh installation (after a failed one) in which all the hosts except the one running the hosted-engine ended up in the “Unassigned” state, I noticed that the ovirt-engine complains about not being able to contact VDSM.
I noticed that VDSM has stopped listening on IPv4.
I didn't disable IPv6 anywhere, as the documentation states not to disable it on hosts that are capable of running the hosted-engine. The reason seems to be that the hosted-engine talks to the host it runs on through “localhost”; this also explains why the host that the hosted-engine currently runs on is “OK”.
Below is from a host that does not run the hosted-engine:
# ss -atn | grep 543
LISTEN 0 5 *:54322 *:*
ESTAB 0 0 127.0.0.1:54792 127.0.0.1:54321
ESTAB 0 0 127.0.0.1:54798 127.0.0.1:54321
LISTEN 0 5 [::]:54321 [::]:*
ESTAB 0 0 [::ffff:127.0.0.1]:54321 [::ffff:127.0.0.1]:54798
ESTAB 0 0 [::ffff:127.0.0.1]:54321 [::ffff:127.0.0.1]:54792
ESTAB 0 0 [::1]:54321 [::1]:50238
ESTAB 0 0 [::1]:50238 [::1]:54321
Below is from a host that runs the hosted-engine at the moment:
# ss -atn | grep 543
LISTEN 0 5 *:54322 *:*
LISTEN 0 5 [::]:54321 [::]:*
ESTAB 0 0 [::1]:51230 [::1]:54321
ESTAB 0 0 [::1]:54321 [::1]:51242
ESTAB 0 0 [::ffff:10.46.20.23]:54321 [::ffff:10.46.20.20]:45706
ESTAB 0 0 [::ffff:10.46.20.23]:54321 [::ffff:10.46.20.20]:45746
ESTAB 0 0 [::1]:51240 [::1]:54321
ESTAB 0 0 [::1]:54321 [::1]:51230
ESTAB 0 0 [::1]:51242 [::1]:54321
ESTAB 0 0 [::1]:54321 [::1]:51240
The hosted-engine IP is 10.46.20.20 and the host is 10.46.20.23.
/etc/hosts on all hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
Perhaps this is relevant but all hosts are enrolled into IDM (FreeIPA) and as an outcome they all have a DNS record and a PTR record as well as the ovirt-engine VM.
# cat /etc/vdsm/vdsm.conf
[vars]
ssl = true
ssl_ciphers = HIGH:!aNULL
ssl_excludes = OP_NO_TLSv1,OP_NO_TLSv1_1
[addresses]
management_port = 54321
I have tried adding “management_ip = 0.0.0.0”, but then it only binds to IPv4 and the host still shows as Unassigned, sometimes switching to “NonResponsive”, and trying to “Reinstall” the host fails: the ovirt-engine complains it can't contact/reach the VDSM, while using netcat from the ovirt-engine works.
I have KSM and Memory Ballooning enabled on the Cluster as well.
oVirt 4.3.10 installed on CentOS 7.8.2003
The self-hosted Engine runs on an external GlusterFS, before reinstalling everything (fresh start of OS, etc..) I tried iSCSI as well.