Re: [External] : Unrecoverable NMI error on HP Gen8 hosts.
by Diggy Mc
> Hi Diggy,
>
> I'm not sure if it's an oVirt issue, but it can be a network or firewall issue.
> Did you test the connection between oVirt hosts and the iLO interfaces?
> Simple tests like ping to ensure one host can reach others iLO interfaces and ipmitool to
> ensure you can connect to the management interfaces?
>
> Marcos
>
It is not network or firewall related. There is no firewall between the oVirt hosts/engine and the iLO interfaces. When I configured oVirt power management, the offered test passed. I'm not running IPMI. On a side note, the problem exists both before and after enabling oVirt's power management feature.
2 years, 10 months
Instability after update
by Andrea Chierici
Cheers,
I am in trouble...
I am running an ovirt manager v4.4.9.5-1.el8. This week I updated all
the hosts to latest release (starting from repo
ovirt-release44-4.4.9.3-1.el8.noarch) since I hadn't done that when I
upgraded the manager. Before that the system was rock solid.
Unfortunately, after the upgrade, I get frequent errors on running VMs:
VM XXXXXXXX is down with error. Exit message: Lost connection with qemu
process.
I can't get any hint from the logs, I wonder if someone has any idea of
what is going on.
Thanks,
Andrea
--
Andrea Chierici - INFN-CNAF
Viale Berti Pichat 6/2, 40127 BOLOGNA
Office Tel: +39 051 2095463
SkypeID ataruz
--
2 years, 10 months
Unrecoverable NMI error on HP Gen8 hosts.
by Diggy Mc
I have oVirt Node v4.4.8.3 running on several HP ProLiant Gen8 servers. I receive the following error under certain circumstances:
"An Unrecoverable System Error (NMI) has occurred (iLO application watchdog timeout NMI, Service Information: 0x0000002B, 0x00000000)"
When a host starts taking a load (but nowhere near a threshold), I encounter the above iLO-logged error and the host locks-up. I have had to grossly under-utilize my hosts to avoid this problem. I'm hoping for a better fix or work-around.
I've had the same problem beginning with my oVirt 4.3.x hosts, so it isn't oVirt version specific.
The little information I could find on the error wasn't helpful. Red Hat acknowledges the issue, but limited to shutdown/reboot operations; not during "normal" operations.
Anyone else experienced this problem? How did you fix it or work around it? I'd like to better utilize my servers if possible.
In advance, thank you to anyone and everyone who offers help.
2 years, 10 months
support of AMD EPYC 3rd Genneration Milan
by samuel.xhu@horebdata.cn
Helllo, Ovirt experts,
Does Ovirt now support the use of AMD EPYC 3rd Genneration Milan CPU? and if yes, from which version?
best regards,
Samuel
Do Right Thing (做正确的事) / Pursue Excellence (追求卓越) / Help Others Succeed (成就他人)
2 years, 10 months
VM pause and snapshot is in illegal state
by vtse@bardel.ca
Hi,
One of our VM is pause due to lack of storage space while the VM is in snapshot deleting task. Now can't restart or shutdown the VM, and can't delete the snapshot.
How i can fix this problem? Any help is much appreciated!
Thank you in advanced!
Regards,
Victor
2 years, 10 months
How to find out I / O usge from servers
by ovirt.org@nevim.eu
Hi everyone, I would like via API, or. pull information about I / O usage by individual servers directly from the postre database. But I have a data warehouse turned off for performance reasons. Is this information collected somewhere so that I can collect it from somewhere in my external database or read it from the virus database?
Thanks for the information.
2 years, 10 months
sanlock issues after 4.3 to 4.4 migration
by Strahil Nikolov
Hello All,
I was trying to upgrade my single node setup (Actually it used to be 2+1 arbiter, but one of the data nodes died) from 4.3.10 to 4.4.?
The deployment failed on 'hosted-engine --reinitialize-lockspace --force' and it seems that sanlock fails to obtain a lock:
# hosted-engine --reinitialize-lockspace --force
Traceback (most recent call last):
File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/reinitialize_lockspace.py", line 30, in <module>
ha_cli.reset_lockspace(force)
File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 286, in reset_lockspace
stats = broker.get_stats_from_storage()
File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 148, in get_stats_from_storage
result = self._proxy.get_stats()
File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
return self.__send(self.__name, args)
File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
verbose=self.__verbose
File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
return self.single_request(host, handler, request_body, verbose)
File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_request
http_conn = self.send_request(host, handler, request_body, verbose)
File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request
self.send_content(connection, request_body)
File "/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content
connection.endheaders(request_body)
File "/usr/lib64/python3.6/http/client.py", line 1268, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib64/python3.6/http/client.py", line 1044, in _send_output
self.send(msg)
File "/usr/lib64/python3.6/http/client.py", line 982, in send
self.connect()
File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 74, in connect
self.sock.connect(base64.b16decode(self.host))
FileNotFoundError: [Errno 2] No such file or directory
# grep sanlock /var/log/messages | tail
Jan 6 08:29:48 ovirt2 sanlock[1269]: 2022-01-06 08:29:48 19341 [77108]: s1777 failed to read device to find sector size error -223 /run/vdsm/storage/ca3807b9-5afc-4bcd-a557-aacbcc53c340/39ee18b2-3d7b-4d48-8a0e-3ed7947b5038/d95ae3ee-b6d3-46c4-b6a2-75f96134c7f1
Jan 6 08:29:49 ovirt2 sanlock[1269]: 2022-01-06 08:29:49 19342 [1310]: s1777 add_lockspace fail result -223
Jan 6 08:29:54 ovirt2 sanlock[1269]: 2022-01-06 08:29:54 19347 [77113]: s1778 failed to read device to find sector size error -223 /run/vdsm/storage/ca3807b9-5afc-4bcd-a557-aacbcc53c340/39ee18b2-3d7b-4d48-8a0e-3ed7947b5038/d95ae3ee-b6d3-46c4-b6a2-75f96134c7f1
Jan 6 08:29:55 ovirt2 sanlock[1269]: 2022-01-06 08:29:55 19348 [1310]: s1778 add_lockspace fail result -223
Jan 6 08:30:00 ovirt2 sanlock[1269]: 2022-01-06 08:30:00 19353 [77138]: s1779 failed to read device to find sector size error -223 /run/vdsm/storage/ca3807b9-5afc-4bcd-a557-aacbcc53c340/39ee18b2-3d7b-4d48-8a0e-3ed7947b5038/d95ae3ee-b6d3-46c4-b6a2-75f96134c7f1
Jan 6 08:30:01 ovirt2 sanlock[1269]: 2022-01-06 08:30:01 19354 [1311]: s1779 add_lockspace fail result -223
Jan 6 08:30:06 ovirt2 sanlock[1269]: 2022-01-06 08:30:06 19359 [77144]: s1780 failed to read device to find sector size error -223 /run/vdsm/storage/ca3807b9-5afc-4bcd-a557-aacbcc53c340/39ee18b2-3d7b-4d48-8a0e-3ed7947b5038/d95ae3ee-b6d3-46c4-b6a2-75f96134c7f1
Jan 6 08:30:07 ovirt2 sanlock[1269]: 2022-01-06 08:30:07 19360 [1310]: s1780 add_lockspace fail result -223
Jan 6 08:30:12 ovirt2 sanlock[1269]: 2022-01-06 08:30:12 19365 [77151]: s1781 failed to read device to find sector size error -223 /run/vdsm/storage/ca3807b9-5afc-4bcd-a557-aacbcc53c340/39ee18b2-3d7b-4d48-8a0e-3ed7947b5038/d95ae3ee-b6d3-46c4-b6a2-75f96134c7f1
Jan 6 08:30:13 ovirt2 sanlock[1269]: 2022-01-06 08:30:13 19366 [1310]: s1781 add_lockspace fail result -223
# sanlock client status
daemon 5f37f400-b865-11dc-a4f5-2c4d54502372
p -1 helper
p -1 listener
p -1 status
s ca3807b9-5afc-4bcd-a557-aacbcc53c340:1:/rhev/data-center/mnt/glusterSD/ovirt2\:_engine44/ca3807b9-5afc-4bcd-a557-aacbcc53c340/dom_md/ids:0
Could it be related to the sector size of the Gluster's Brick?
# smartctl -a /dev/sdb | grep 'Sector Sizes'
Sector Sizes: 512 bytes logical, 4096 bytes physical
Any hint will be helpful
Best Regads,
Strahil Nikolov
2 years, 10 months
Lots of storage.MailBox.SpmMailMonitor
by Fabrice Bacchella
My vdsm log files are huge:
-rw-r--r-- 1 vdsm kvm 1.8G Nov 22 11:32 vdsm.log
And this is juste half an hour of logs:
$ head -1 vdsm.log
2018-11-22 11:01:12,132+0100 ERROR (mailbox-spm) [storage.MailBox.SpmMailMonitor] mailbox 2 checksum failed, not clearing mailbox, clearing new mail (data='...lots of data', expected='\xa4\x06\x08\x00') (mailbox:612)
I just upgraded vdsm:
$ rpm -qi vdsm
Name : vdsm
Version : 4.20.43
2 years, 10 months
w2k19 runOnce GuestTools ISO missing
by hellweiss
Hello List,
I've latest ovirt Node on the Servers: ovirt-node-ng-4.4.9.3-0.20211215.0+1 and latest Engine 4.4.9 on CentOS Stream.
When trying to run Windows 2019 Server installation from ISO with runOnce and Attach Windows Guest ISO
Machine can not be started. If I remove Attach Windows Guest CD VM boots just fine.
Workaround is to use VM Portal and change CD there when Windows Installer doesnt find any Disk.
Latest Guest Tools ISO is installed on engine with using dnf install virtio-win in /usr /share/virtio-win
I copied the Folder /usr/share/virtio-win from engine to every Node, but that also doesn't work.
dnf search virtio-win on ovirt Node gives no results.
What to do to get the VM with runOnce with Windows Guest Tools ISO attached to boot ?
Why not let the User choose the ISO to use from the ISO Storage Domain when runOnce and Attach Windows Guest ISO is enabled ?
Thanks
Uli
2 years, 10 months
Ovirt 4.4.9 install fails (guestfish)?
by Andy Kress
Support,
Happy new year, I am trying a fresh install of ovirt (4.4.9) and seem to not be able to get past deploying the host to the glusterfs volume. I am able to mount the volume from each host and have checked the configuration of the engine volume which allows vdsm and KVM to mount. I dont know if this is a glustermount problem or guestfish as the install fails when moving the VM to the hosted storage. Attached is the install log as well as the installed packages.
The specific error is genereated when the install "inject network configuration with guestfish" on install is: libguestfs: error: appliance closed the connection unexpectedly.\nThis usually means the libguestfs appliance crashed
Based off the error, "relabeling" I have even attempted to disable selinux in the engine during install which produced the same error. Additionally I also downgraded
Any help is appreciated.
Thanks
AK
2 years, 10 months