Weird problem starting VMs in oVirt-4.4
by Joop
Hi All,
Just had a rather new experience: starting a VM worked, but the guest dropped
into the grub2 rescue console because something was wrong with its virtio-scsi
disk. The message is:
Booting from Hard Disk ....
error: ../../grub-core/kern/dl.c:266:invalid arch-independent ELF magic.
entering rescue mode...
Doing a Ctrl-Alt-Del through the SPICE console lets the VM boot correctly.
Shutting it down and repeating the procedure, I get the disk problem every
time. The weird thing is that if I activate the boot menu and then start the
VM straight away, everything is OK.
I don't see any ERROR messages in either vdsm.log or engine.log.
If I had to guess, it looks like the disk image isn't connected yet when the
VM boots, but that's weird, isn't it?
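For what it's worth, a quick (and admittedly hedged) way to check that guess
would be to dump the live domain XML right after a failed boot and confirm the
virtio-scsi disk is actually attached; 'myvm' below is just a placeholder for
the VM name:
virsh -r dumpxml myvm | grep -E -A4 '<disk|virtio-scsi'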
Regards,
Joop
Ovirt 4.3.10 Glusterfs SSD slow performance over 10GE
by jury cat
Hello all,
I am using oVirt 4.3.10 on CentOS 7.8 with GlusterFS 6.9.
My Gluster setup consists of 3 hosts in replica 3 (2 hosts + 1 arbiter).
All 3 hosts are Dell R720s with a PERC H710 Mini RAID controller (maximum
throughput 6 Gb/s) and 2×1 TB Samsung SSDs in RAID 0. The volume is
partitioned using LVM thin provisioning and formatted with XFS.
The hosts have separate 10GbE network cards for storage traffic.
The Gluster network is connected to these 10GbE network cards and the volume
is mounted using FUSE GlusterFS (NFS is disabled). The migration network is
also active on the same storage network.
The problem is that the 10GbE network is not used to its full potential by
Gluster.
If I do live migration of VMs I can see speeds of 7-9 Gbit/s.
The same network tested using iperf3 reports 9.9 Gbit/s, which rules out the
network setup as a bottleneck (I will not paste all the iperf3 tests here for
now).
I did not enable all the volume options from "Optimize for Virt Store",
because of the bug where the volume option cluster.granular-entry-heal cannot
be set to enable (this was fixed in vdsm 4.40, but that only works on CentOS 8
with oVirt 4.4).
I would be happy to know what all these "Optimize for Virt Store" options are,
so I can set them manually.
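For what it's worth, my current understanding (hedged, happy to be corrected)
is that "Optimize for Virt Store" mostly boils down to applying the stock
gluster "virt" option group plus the vdsm ownership options, roughly like this
("data" is my volume name, taken from the mount path below; the exact option
list lives in /var/lib/glusterd/groups/virt and may differ per gluster
version):
gluster volume set data group virt
gluster volume set data storage.owner-uid 36
gluster volume set data storage.owner-gid 36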
The speed of the disk inside the host using dd is between 700 MB/s and 1 GB/s.
[root@host1 ~]# dd if=/dev/zero of=test bs=100M count=80 status=progress
8074035200 bytes (8.1 GB) copied, 11.059372 s, 730 MB/s
80+0 records in
80+0 records out
8388608000 bytes (8.4 GB) copied, 11.9928 s, 699 MB/s
The dd write test on the Gluster volume inside the host is poor, only
~120 MB/s.
During the dd test, if I look at Networks -> Gluster network -> Hosts, the Tx
and Rx network speed barely goes over 1 Gb/s (~1073 Mb/s) out of a maximum of
10000 Mb/s.
dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/gluster1.domain.local\:_data/test bs=100M count=80 status=progress
8283750400 bytes (8.3 GB) copied, 71.297942 s, 116 MB/s
80+0 records in
80+0 records out
8388608000 bytes (8.4 GB) copied, 71.9545 s, 117 MB/s
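For comparison, I could repeat both tests with the page cache taken out of the
picture (hedged sketch; same paths as above, and oflag=direct may be refused
on the FUSE mount depending on volume options such as
performance.strict-o-direct and network.remote-dio):
dd if=/dev/zero of=test bs=1M count=8192 oflag=direct status=progress
dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/gluster1.domain.local\:_data/test bs=1M count=8192 oflag=direct status=progress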
I have attached my Gluster volume settings and mount options.
Thanks,
Emy
oVirt-node 4.4.0 - Hosted engine deployment fails when host is unable to download updates
by Marco Fais
Hi,
fresh installation of oVirt-node 4.4.0 on a cluster -- the hosted-engine
--deploy command fails if DNF is unable to download updates.
This cluster is not connected to the public network at the moment.
If I use a proxy (setting the relevant env. variables) it fails at a later
stage (I think the engine VM is trying to download updates as well, but
encounters the same issue and doesn't seem to use the proxy).
With oVirt-node 4.3.x I didn't have this issue -- any suggestions?
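In case it's relevant: besides the environment variables I also considered
pointing DNF at the proxy directly (hedged sketch; proxy.example.com:3128 is a
placeholder), though I don't know how the same setting would reach the engine
VM during deployment:
echo 'proxy=http://proxy.example.com:3128' >> /etc/dnf/dnf.conf
The failing run looks like this: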
[~]# hosted-engine --deploy
[ INFO ] Stage: Initializing
[ INFO ] Stage: Environment setup
During customization use CTRL-D to abort.
Continuing will configure this host for serving as hypervisor and
will create a local VM with a running engine.
The locally running engine will be used to configure a new
storage domain and create a VM there.
At the end the disk of the local VM will be moved to the shared
storage.
Are you sure you want to continue? (Yes, No)[Yes]:
Configuration files:
Log file:
/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20200604214638-vae2wf.log
Version: otopi-1.9.1 (otopi-1.9.1-1.el8)
[ INFO ] DNF Downloading 1 files, 0.00KB
[ INFO ] DNF Downloaded Extra Packages for Enterprise Linux 8 - x86_64
[ ERROR ] DNF Failed to download metadata for repo 'ovirt-4.4-epel'
[ ERROR ] DNF Failed to download metadata for repo 'ovirt-4.4-epel'
[ ERROR ] Failed to execute stage 'Environment setup': Failed to download metadata for repo 'ovirt-4.4-epel'
[ INFO ] Stage: Clean up
[...]
Thanks,
Marco
New fenceType in oVirt code for IBM OpenBMC
by Vinícius Ferrão
Hello,
After some days of scratching my head I found that oVirt is probably missing a fenceType for IBM's implementation of OpenBMC in the Power Management section. The host machine is an OpenPOWER AC922 (ppc64le).
The BMC is basically an “ipmilan” device, but by default only ciphers 3 and 17 are available:
[root@h01 ~]# ipmitool -I lanplus -H 10.20.10.2 -U root -P 0penBmc -L operator -C 3 channel getciphers ipmi
ID IANA Auth Alg Integrity Alg Confidentiality Alg
3 N/A hmac_sha1 hmac_sha1_96 aes_cbc_128
17 N/A hmac_sha256 sha256_128 aes_cbc_128
The default ipmilan connector forces the option cipher=1, which breaks the communication.
So I was reading the code and found this “fenceType” class, but I wasn't able to find where those types are defined, so that I could create another one, called something like openbmc, that sets cipher=17 by default.
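Presumably the behaviour can also be checked by hand with the fence agent
itself (untested sketch; I'm assuming fence_ipmilan from fence-agents is what
ends up being called, with the same BMC address and credentials as above):
fence_ipmilan -a 10.20.10.2 -l root -p 0penBmc -P -L operator -C 17 -o status   # cipher 17, offered by the BMC
fence_ipmilan -a 10.20.10.2 -l root -p 0penBmc -P -L operator -C 1 -o status    # cipher 1, what the connector forces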
Another issue is how unhelpful the error output is: it only returns a generic JSON-RPC error. But I don't know how to suggest a fix for this.
Thanks,
VMs shutdown mysteriously
by Bobby
Hello,
All 4 VMs on one of my oVirt cluster nodes shut down for an unknown reason,
almost simultaneously.
Please help me find the root cause.
Thanks.
Please note that the host itself seems to be doing fine and never crashes or
hangs, and I can migrate VMs back to it later.
Here is the exact timeline of all the related events combined from the host
and the VM(s):
On oVirt host:
/var/log/vdsm/vdsm.log:
2020-06-25 15:25:16,944-0500 WARN (qgapoller/3)
[virt.periodic.VmDispatcher] could not run <function <lambda> at
0x7f4ed2f9f5f0> on ['e0257b06-28fd-4d41-83a9-adf1904d3622'] (periodic:289)
2020-06-25 15:25:19,203-0500 WARN (libvirt/events) [root] File:
/var/lib/libvirt/qemu/channels/e0257b06-28fd-4d41-83a9-adf1904d3622.ovirt-guest-agent.0
already removed (fileutils:54)
2020-06-25 15:25:19,203-0500 WARN (libvirt/events) [root] File:
/var/lib/libvirt/qemu/channels/e0257b06-28fd-4d41-83a9-adf1904d3622.org.qemu.guest_agent.0
already removed (fileutils:54)
[root@athos log]# journalctl -u NetworkManager --since=today
-- Logs begin at Wed 2020-05-20 22:07:33 CDT, end at Thu 2020-06-25
16:36:05 CDT. --
Jun 25 15:25:18 athos NetworkManager[1600]: <info> [1593116718.1136]
device (vnet0): state change: disconnected -> unmanaged (reason
'unmanaged', sys-iface-state: 'removed')
Jun 25 15:25:18 athos NetworkManager[1600]: <info> [1593116718.1146]
device (vnet0): released from master device SRV-VL
/var/log/messages:
Jun 25 15:25:18 athos kernel: SRV-VL: port 2(vnet0) entered disabled state
Jun 25 15:25:18 athos NetworkManager[1600]: <info> [1593116718.1136]
device (vnet0): state change: disconnected -> unmanaged (reason
'unmanaged', sys-iface-state: 'removed')
Jun 25 15:25:18 athos NetworkManager[1600]: <info> [1593116718.1146]
device (vnet0): released from master device SRV-VL
Jun 25 15:25:18 athos libvirtd: 2020-06-25 20:25:18.122+0000: 2713: error :
qemuMonitorIO:718 : internal error: End of file from qemu monitor
/var/log/libvirt/qemu/aries.log:
2020-06-25T20:25:28.353975Z qemu-kvm: terminating on signal 15 from pid
2713 (/usr/sbin/libvirtd)
2020-06-25 20:25:28.584+0000: shutting down, reason=shutdown
=============================================================================================
On the first VM affected (same thing on the others):
/var/log/ovirt-guest-agent/ovirt-guest-agent.log:
MainThread::INFO::2020-06-25
15:25:20,270::ovirt-guest-agent::104::root::Stopping oVirt guest agent
CredServer::INFO::2020-06-25
15:25:20,626::CredServer::262::root::CredServer has stopped.
MainThread::INFO::2020-06-25
15:25:21,150::ovirt-guest-agent::78::root::oVirt guest agent is down.
=============================================================================================
Package versions installed:
Host OS version: CentOS 7.7.1908:
ovirt-hosted-engine-ha-2.3.5-1.el7.noarch
ovirt-provider-ovn-driver-1.2.22-1.el7.noarch
ovirt-release43-4.3.6-1.el7.noarch
ovirt-imageio-daemon-1.5.2-0.el7.noarch
ovirt-vmconsole-1.0.7-2.el7.noarch
ovirt-imageio-common-1.5.2-0.el7.x86_64
ovirt-engine-sdk-python-3.6.9.1-1.el7.noarch
ovirt-vmconsole-host-1.0.7-2.el7.noarch
ovirt-host-4.3.4-1.el7.x86_64
libvirt-4.5.0-23.el7_7.1.x86_64
libvirt-daemon-4.5.0-23.el7_7.1.x86_64
qemu-kvm-ev-2.12.0-33.1.el7.x86_64
qemu-kvm-common-ev-2.12.0-33.1.el7.x86_64
On guest VM:
ovirt-guest-agent-1.0.13-1.el6.noarch
qemu-guest-agent-0.12.1.2-2.491.el6_8.3.x86_64
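One more thing I plan to check (hedged suggestion, timestamps taken from the
logs above): since the qemu log shows the guests being terminated on signal 15
from libvirtd's pid, it may be worth confirming whether libvirtd itself was
restarted or stopped around that moment:
journalctl -u libvirtd --since "2020-06-25 15:20:00" --until "2020-06-25 15:30:00"
systemctl status libvirtd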
status of oVirt 4.4.x and CentOS 8.2
by Gianluca Cecchi
Hello,
what is the current status, both when using plain CentOS-based nodes and
ovirt-node-ng?
Does the release of CentOS 8.2 impact new installations of 4.4.0 and/or
4.4.1rc?
Thanks,
Gianluca
Localdisk hook not working
by tim-nospam@bordemann.com
Hi,
I'd like to use the localdisk hook of vdsm and have configured everything according to the readme: https://github.com/oVirt/vdsm/tree/master/vdsm_hooks/localdisk
After installing the hook, configuring the ovirt-engine, creating the volume group, adding the custom property 'localdisk' to the virtual machine, and fixing a small bug in the localdisk-helper, vdsm creates the logical volume in the 'ovirt-local' volume group when the virtual machine starts. Unfortunately, nothing more seems to happen after that: there is no activity on the NAS that would indicate the disk image is being pulled to the host, and I can see no errors on the host either.
Is anyone currently running ovirt 4.4 with the localdisk hook?
What else can I do to find out why the image is not being copied to my host?
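So far the only extra diagnostics I could think of (hedged; 'ovirt-local' is
the volume group name from the readme) are checking what the hook actually
created and whether vdsm logged anything about it:
lvs -o +lv_tags ovirt-local
grep -i localdisk /var/log/vdsm/vdsm.log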
Thanks,
Tim
Re: VDSM not binding on IPv4 after ovirt-engine restart
by Dominik Holler
On Tue, Jun 30, 2020 at 6:13 PM Erez Zarum <erezz(a)nanosek.com> wrote:
> While troubleshooting a fresh installation (after a failed one) in which all
> the hosts except the one running the hosted-engine ended up in the
> “Unassigned” state, I noticed that the ovirt-engine complains about not
> being able to contact VDSM.
> I noticed that VDSM has stopped listening on IPv4.
>
>
Thanks for sharing the details.
> I didn’t disable IPv6, as the documentation says not to disable it on hosts
> that are capable of running the hosted-engine. The reason seems to be that
> the hosted-engine talks to the host it runs on through “localhost”, which
> also explains why the host the hosted-engine runs on is “OK”.
>
> Below is from a host that does not run the hosted-engine:
> # ss -atn | grep 543
> LISTEN 0 5 *:54322 *:*
> ESTAB 0 0 127.0.0.1:54792 127.0.0.1:54321
> ESTAB 0 0 127.0.0.1:54798 127.0.0.1:54321
> LISTEN 0 5 [::]:54321 [::]:*
> ESTAB 0 0 [::ffff:127.0.0.1]:54321 [::ffff:127.0.0.1]:54798
> ESTAB 0 0 [::ffff:127.0.0.1]:54321 [::ffff:127.0.0.1]:54792
> ESTAB 0 0 [::1]:54321 [::1]:50238
> ESTAB 0 0 [::1]:50238 [::1]:54321
>
> Below is from a host that runs the hosted-engine at the moment:
> # ss -atn | grep 543
> LISTEN 0 5 *:54322 *:*
> LISTEN 0 5 [::]:54321 [::]:*
> ESTAB 0 0 [::1]:51230 [::1]:54321
> ESTAB 0 0 [::1]:54321 [::1]:51242
> ESTAB 0 0 [::ffff:10.46.20.23]:54321 [::ffff:10.46.20.20]:45706
> ESTAB 0 0 [::ffff:10.46.20.23]:54321 [::ffff:10.46.20.20]:45746
> ESTAB 0 0 [::1]:51240 [::1]:54321
> ESTAB 0 0 [::1]:54321 [::1]:51230
> ESTAB 0 0 [::1]:51242 [::1]:54321
> ESTAB 0 0 [::1]:54321 [::1]:51240
>
> The hosted-engine IP is 10.46.20.20 and the host is 10.46.20.23.
>
>
Why do you think the host does not listen to IPv4 anymore?
Can you please share the output of
"nc -vz 10.46.20.23 54321"
executed on engine VM or another host?
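Note that a socket listening on [::]:54321 usually accepts IPv4 connections
as well, via IPv4-mapped addresses (that is what the [::ffff:...] entries
above are), unless v6-only binding is enforced. If in doubt, that setting can
be checked on the host with:
sysctl net.ipv6.bindv6only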
> /etc/hosts on all hosts:
> 127.0.0.1 localhost localhost.localdomain localhost4
> localhost4.localdomain4
> ::1 localhost localhost.localdomain localhost6
> localhost6.localdomain6
>
> Perhaps this is relevant but all hosts are enrolled into IDM (FreeIPA) and
> as an outcome they all have a DNS record and a PTR record as well as the
> ovirt-engine VM.
>
> # cat /etc/vdsm/vdsm.conf
> [vars]
> ssl = true
> ssl_ciphers = HIGH:!aNULL
> ssl_excludes = OP_NO_TLSv1,OP_NO_TLSv1_1
>
> [addresses]
> management_port = 54321
>
> I have tried adding “management_ip = 0.0.0.0”; then it binds only to IPv4,
> and yet the host still shows as Unassigned, sometimes switching to
> “NonResponsive”. Trying to “Reinstall” the host fails, and the ovirt-engine
> complains it can't contact/reach VDSM, even though using netcat from the
> ovirt-engine works.
>
> I have KSM and Memory Ballooning enabled on the Cluster as well.
>
> oVirt 4.3.10 installed on CentOS 7.8.2003
> The self-hosted Engine runs on an external GlusterFS; before reinstalling
> everything (fresh start of OS, etc.) I tried iSCSI as well.
>
>
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/NCTWZLS2VPI...
>
VDSM not binding on IPv4 after ovirt-engine restart
by Erez Zarum
While troubleshooting a fresh installation (after a failed one) in which all the hosts except the one running the hosted-engine ended up in the “Unassigned” state, I noticed that the ovirt-engine complains that it cannot contact VDSM.
I noticed that VDSM has stopped listening on IPv4.
I didn’t disable IPv6, as the documentation says not to disable it on hosts that are capable of running the hosted-engine. The reason seems to be that the hosted-engine talks to the host it runs on through “localhost”, which also explains why the host the hosted-engine runs on is “OK”.
Below is from a host that does not run the hosted-engine:
# ss -atn | grep 543
LISTEN 0 5 *:54322 *:*
ESTAB 0 0 127.0.0.1:54792 127.0.0.1:54321
ESTAB 0 0 127.0.0.1:54798 127.0.0.1:54321
LISTEN 0 5 [::]:54321 [::]:*
ESTAB 0 0 [::ffff:127.0.0.1]:54321 [::ffff:127.0.0.1]:54798
ESTAB 0 0 [::ffff:127.0.0.1]:54321 [::ffff:127.0.0.1]:54792
ESTAB 0 0 [::1]:54321 [::1]:50238
ESTAB 0 0 [::1]:50238 [::1]:54321
Below is from a host that runs the hosted-engine at the moment:
# ss -atn | grep 543
LISTEN 0 5 *:54322 *:*
LISTEN 0 5 [::]:54321 [::]:*
ESTAB 0 0 [::1]:51230 [::1]:54321
ESTAB 0 0 [::1]:54321 [::1]:51242
ESTAB 0 0 [::ffff:10.46.20.23]:54321 [::ffff:10.46.20.20]:45706
ESTAB 0 0 [::ffff:10.46.20.23]:54321 [::ffff:10.46.20.20]:45746
ESTAB 0 0 [::1]:51240 [::1]:54321
ESTAB 0 0 [::1]:54321 [::1]:51230
ESTAB 0 0 [::1]:51242 [::1]:54321
ESTAB 0 0 [::1]:54321 [::1]:51240
The hosted-engine IP is 10.46.20.20 and the host is 10.46.20.23.
/etc/hosts on all hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
Perhaps this is relevant but all hosts are enrolled into IDM (FreeIPA) and as an outcome they all have a DNS record and a PTR record as well as the ovirt-engine VM.
# cat /etc/vdsm/vdsm.conf
[vars]
ssl = true
ssl_ciphers = HIGH:!aNULL
ssl_excludes = OP_NO_TLSv1,OP_NO_TLSv1_1
[addresses]
management_port = 54321
I have tried adding “management_ip = 0.0.0.0”; then it binds only to IPv4, and yet the host still shows as Unassigned, sometimes switching to “NonResponsive”. Trying to “Reinstall” the host fails, and the ovirt-engine complains it can't contact/reach VDSM, even though using netcat from the ovirt-engine works.
I have KSM and Memory Ballooning enabled on the Cluster as well.
oVirt 4.3.10 installed on CentOS 7.8.2003
The self-hosted Engine runs on an external GlusterFS; before reinstalling everything (fresh start of OS, etc.) I tried iSCSI as well.
4.4.1-rc5: Looking for correct way to configure machine=q35 instead of machine=pc for arch=x86_64
by Glenn Marcy
Hello, I am hoping for some insight from folks with more hosted engine install experience.
When I try to install the hosted engine using the RC5 dist I get the following error during the startup
of the HostedEngine VM:
XML error: The PCI controller with index='0' must be model='pci-root' for this machine type, but model='pcie-root' was found instead
This is due to the HE Domain XML description using machine="pc-i440fx-rhel7.6.0".
I've tried to override the default of 'pc' from ovirt-ansible-hosted-engine-setup/defaults/main.yml:
he_emulated_machine: pc
by passing to the ovirt-hosted-engine-setup script a --config-append=file parameter where file contains:
[environment:default]
OVEHOSTED_VM/emulatedMachine=str:q35
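For completeness, this is roughly how I pass the override (the file name
he-q35.conf is just a placeholder):
cat > he-q35.conf <<'EOF'
[environment:default]
OVEHOSTED_VM/emulatedMachine=str:q35
EOF
hosted-engine --deploy --config-append=he-q35.conf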
When the "Create ovirt-hosted-engine-ha run directory" step finishes the vm.conf file contains:
cpuType=IvyBridge,+pcid,+spec-ctrl,+ssbd,+md-clear
emulatedMachine=q35
At the "Start ovirt-ha-broker service on the host" step that file is removed. When that file appears
again during the "Check engine VM health" step it now contains:
cpuType=IvyBridge,+pcid,+spec-ctrl,+ssbd,+md-clear
emulatedMachine=pc-i440fx-rhel7.6.0
After that the install fails with the metadata from "virsh dumpxml HostedEngine" containing:
<ovirt-vm:exitCode type="int">1</ovirt-vm:exitCode>
<ovirt-vm:exitMessage>XML error: The PCI controller with index='0' must be model='pci-root' for this machine type, but model='pcie-root' was found instead</ovirt-vm:exitMessage>
Interestingly enough, the HostedEngineLocal VM that is running the appliance image has the value I need:
<type arch='x86_64' machine='pc-q35-rhel8.2.0'>hvm</type>
Does anyone on the list have any experience with where this needs to be overridden? Somewhere in the
hosted engine setup or do I need to do something at a deeper level like vdsm or libvirt?
Help much appreciated!
Thanks,
Glenn