Is the udev settling issue more widespread? Getting 'qemu-img convert' failures also while copying disks between data and vmstore domains
by thomas@hoberg.net
While trying to diagnose an issue with a set of VMs that get stopped for I/O problems at startup, I have to deal with the fact that their boot disks cause this issue no matter where I connect them. They might have been the first disks I ever tried to sparsify, and I was afraid that might have messed them up. The images are for a nested oVirt deployment and they worked just fine before I shut those VMs down...
So I first tried to hook them up as secondary disks to another VM to have a look, but that just caused the other VM to stop at boot.
I also tried downloading, exporting, and plain copying the disks, to no avail; OVA exports of the entire VM fail again (the fix is in!).
So to make sure copying disks between volumes *generally* works, I tried copying a disk from a working (but stopped) VM from 'vmstore' to 'data' on my 3nHCI farm, but that failed, too!
Plenty of space all around, but all disks are using thin/sparse/VDO on SSD underneath.
Before I open a bug, I'd like some feedback: is this part of standard QA testing, is it happening to you, etc.?
Still on oVirt 4.3.11, with pack_ova.py patched to wait for the udev settle.
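For context, the pack_ova.py patch essentially waits for udev to finish processing its event queue before touching the image. A minimal sketch of that kind of wait (the function name and timeout are my own, not the actual patch):

```python
import shutil
import subprocess

def wait_for_udev_settle(timeout=5):
    """Block until the udev event queue is empty, or the timeout expires.

    Returns the udevadm exit code, or None when udevadm is unavailable
    (e.g. on a non-Linux system)."""
    if shutil.which("udevadm") is None:
        return None
    # 'udevadm settle' exits 0 once the event queue is empty.
    return subprocess.call(["udevadm", "settle", "--timeout={}".format(timeout)])
```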
This is from the engine.log on the hosted-engine:
2020-08-12 00:04:15,870+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM gem2 command HSMGetAllTasksStatusesVDS failed: low level Image copy failed: ("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/aca27b96-7215-476f-b793-fb0396543a2e/311f853c-e9cc-4b9e-8a00-5885ec7adf14', '-O', 'raw', u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_data/32129b5f-d47c-495b-a282-7eae1079257e/images/f6a08d2a-4ddb-42da-88e6-4f92a38b9c95/e0d00d46-61a1-4d8c-8cb4-2e5f1683d7f5'] failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading sector 131072: Transport endpoint is not connected\\nqemu-img: error while reading sector 135168: Transport endpoint is not connected\\nqemu-img: error while reading sector 139264: Transport endpoint is not connected\\nqemu-img: error while reading sector 143360: Transport endpoint is not connected\\nqemu-img: error while reading sector 147456: Transport endpoint is not connected\\nqemu-img: error while reading sector 151552: Transport endpoint is not connected\\n')",)
and this is from the vdsm.log on the gem2 node:
Error: Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/aca27b96-7215-476f-b793-fb0396543a2e/311f853c-e9cc-4b9e-8a00-5885ec7adf14', '-O', 'raw', u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_data/32129b5f-d47c-495b-a282-7eae1079257e/images/f6a08d2a-4ddb-42da-88e6-4f92a38b9c95/e0d00d46-61a1-4d8c-8cb4-2e5f1683d7f5'] failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading sector 131072: Transport endpoint is not connected\nqemu-img: error while reading sector 135168: Transport endpoint is not connected\nqemu-img: error while reading sector 139264: Transport endpoint is not connected\nqemu-img: error while reading sector 143360: Transport endpoint is not connected\nqemu-img: error while reading sector 147456: Transport endpoint is not connected\nqemu-img: error while reading sector 151552: Transport endpoint is not connected\n')
2020-08-12 00:03:15,428+0200 ERROR (tasks/7) [storage.Image] Unexpected error (image:849)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 837, in copyCollapsed
raise se.CopyImageError(str(e))
CopyImageError: low level Image copy failed: ("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/aca27b96-7215-476f-b793-fb0396543a2e/311f853c-e9cc-4b9e-8a00-5885ec7adf14', '-O', 'raw', u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_data/32129b5f-d47c-495b-a282-7eae1079257e/images/f6a08d2a-4ddb-42da-88e6-4f92a38b9c95/e0d00d46-61a1-4d8c-8cb4-2e5f1683d7f5'] failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading sector 131072: Transport endpoint is not connected\\nqemu-img: error while reading sector 135168: Transport endpoint is not connected\\nqemu-img: error while reading sector 139264: Transport endpoint is not connected\\nqemu-img: error while reading sector 143360: Transport endpoint is not connected\\nqemu-img: error while reading sector 147456: Transport endpoint is not connected\\nqemu-img: error while reading sector 151552: Transport endpoint is not connected\\n')",)
2020-08-12 00:03:15,429+0200 ERROR (tasks/7) [storage.TaskManager.Task] (Task='6399d533-e96a-412d-b0c3-0548e24d658d') Unexpected error (task:875)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
return fn(*args, **kargs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
return self.cmd(*self.argslist, **self.argsdict)
File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
return method(self, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1633, in copyImage
postZero, force, discard)
File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 837, in copyCollapsed
raise se.CopyImageError(str(e))
CopyImageError: low level Image copy failed: ("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/aca27b96-7215-476f-b793-fb0396543a2e/311f853c-e9cc-4b9e-8a00-5885ec7adf14', '-O', 'raw', u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_data/32129b5f-d47c-495b-a282-7eae1079257e/images/f6a08d2a-4ddb-42da-88e6-4f92a38b9c95/e0d00d46-61a1-4d8c-8cb4-2e5f1683d7f5'] failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading sector 131072: Transport endpoint is not connected\\nqemu-img: error while reading sector 135168: Transport endpoint is not connected\\nqemu-img: error while reading sector 139264: Transport endpoint is not connected\\nqemu-img: error while reading sector 143360: Transport endpoint is not connected\\nqemu-img: error while reading sector 147456: Transport endpoint is not connected\\nqemu-img: error while reading sector 151552: Transport endpoint is not connected\\n')",)
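Side note: the failing sectors in that stderr are evenly spaced 4096 sectors apart. Assuming qemu's usual 512-byte sectors, that is a 2 MiB stride, which is easy to check with a small parser — purely an illustrative sketch of mine, not anything vdsm does:

```python
import re

def failed_offsets(stderr):
    """Extract the failing sector numbers from qemu-img stderr and
    return their byte offsets (assuming 512-byte sectors)."""
    sectors = [int(s) for s in re.findall(r"error while reading sector (\d+)", stderr)]
    return [s * 512 for s in sectors]

err = ("qemu-img: error while reading sector 131072: Transport endpoint is not connected\n"
       "qemu-img: error while reading sector 135168: Transport endpoint is not connected\n")
print(failed_offsets(err))  # → [67108864, 69206016], i.e. 2 MiB apart
```

That 2 MiB stride is suspiciously regular, which is part of why this smells like a storage-layer (gluster) problem rather than random media errors.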
oVirt 4.4.0 Release is now generally available
by Sandro Bonazzola
The oVirt Project is excited to announce the general availability of the
oVirt 4.4.0 Release, as of May 20th, 2020.
This release unleashes an altogether more powerful and flexible open source
virtualization solution that encompasses hundreds of individual changes and
a wide range of enhancements across the engine, storage, network, user
interface, and analytics, as compared to oVirt 4.3.
Important notes before you install / upgrade
Some of the features included in the oVirt 4.4.0 release require content
that will be available in CentOS Linux 8.2. They cannot be tested on RHEL 8.2
yet due to an incompatibility in the openvswitch package shipped in the
CentOS Virt SIG, which requires rebuilding openvswitch on top of CentOS
8.2. The OVS cluster switch type is not implemented for CentOS 8 hosts.
Please note that oVirt 4.4 only supports clusters and datacenters with
compatibility version 4.2 and above. If clusters or datacenters are running
with an older compatibility version, you need to upgrade them to at least
4.2 (4.3 is recommended).
Please note that in RHEL 8 / CentOS 8 several devices that worked on EL7
are no longer supported.
For example, the megaraid_sas driver has been removed. If you use Enterprise
Linux 8 hosts, you can try to provide the necessary drivers for the deprecated
hardware using the DUD method (see the users mailing list thread on this at
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/NDSVUZSESOXE...
)
Installation instructions
For the engine: either use the oVirt appliance or install CentOS Linux 8
minimal by following these steps:
- Install the CentOS Linux 8 image from
http://centos.mirror.garr.it/centos/8.1.1911/isos/x86_64/CentOS-8.1.1911-...
- dnf install https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm
- dnf update (reboot if needed)
- dnf module enable -y javapackages-tools pki-deps postgresql:12
- dnf install ovirt-engine
- engine-setup
For the nodes:
Either use oVirt Node ISO or:
- Install CentOS Linux 8 from
http://centos.mirror.garr.it/centos/8.1.1911/isos/x86_64/CentOS-8.1.1911-...,
selecting the minimal installation.
- dnf install https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm
- dnf update (reboot if needed)
- Attach the host to the engine and let it be deployed.
Update instructions
Update from oVirt 4.4 Release Candidate
On the engine side and on CentOS hosts, you’ll need to switch from
ovirt44-pre to ovirt44 repositories.
In order to do so, you need to:
1. dnf remove ovirt-release44-pre
2. rm -f /etc/yum.repos.d/ovirt-4.4-pre-dependencies.repo
3. rm -f /etc/yum.repos.d/ovirt-4.4-pre.repo
4. dnf install https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm
5. dnf update
On the engine side you’ll need to run engine-setup only if you were not
already on the latest release candidate.
On oVirt Node, you’ll need to upgrade with:
1. Move the node to maintenance
2. dnf install
https://resources.ovirt.org/pub/ovirt-4.4/rpm/el8/noarch/ovirt-node-ng-im...
3. Reboot
4. Activate the host
Update from oVirt 4.3
oVirt 4.4 is available only for CentOS 8. In-place upgrades from previous
installations based on CentOS 7 are not possible. For the engine, take a
backup and restore it into a new engine. Nodes will need to be
reinstalled.
A 4.4 engine can still manage existing 4.3 hosts, but you can’t add new
ones.
For a standalone engine, please refer to upgrade procedure at
https://ovirt.org/documentation/upgrade_guide/#Upgrading_from_4-3
If needed, run ovirt-engine-rename (see engine rename tool documentation at
https://www.ovirt.org/documentation/admin-guide/chap-Utilities.html )
When upgrading hosts:
You need to upgrade one host at a time.
1. Turn the host to maintenance. Virtual machines on that host should
migrate automatically to a different host.
2. Remove it from the engine
3. Re-install it with el8 or oVirt Node as per the installation instructions
4. Re-add the host to the engine
Please note that you may see some issues live migrating VMs from el7 to
el8. If you hit such a case, please shut down the VM on the el7 host and
start it on the new el8 host, in order to be able to move the next el7 host
to maintenance.
What’s new in oVirt 4.4.0 Release?
- Hypervisors based on CentOS Linux 8 (rebuilt from the award-winning RHEL8),
for both oVirt Node and standalone CentOS Linux hosts.
- Easier network management and configuration flexibility with
NetworkManager.
- VMs based on a more modern Q35 chipset, with legacy SeaBIOS and UEFI
firmware.
- Support for direct passthrough of local host disks to VMs.
- Live migration improvements for High Performance guests.
- New Windows guest tools installer based on the WiX framework, now moved to
the VirtioWin project.
- Dropped support for cluster levels prior to 4.2.
- Dropped API/SDK v3 support, deprecated in past versions.
- 4K block disk support only for file-based storage; iSCSI/FC storage does
not support 4K disks yet.
- You can export a VM to a data domain.
- You can edit floating disks.
- Ansible Runner (ansible-runner) is integrated within the engine,
enabling more detailed monitoring of playbooks executed from the engine.
- Adding and reinstalling hosts is now completely based on Ansible,
replacing ovirt-host-deploy, which is not used anymore.
- The OpenStack Neutron Agent can no longer be configured by oVirt; it
should be configured by TripleO instead.
This release is available now on x86_64 architecture for:
* Red Hat Enterprise Linux 8.1
* CentOS Linux (or similar) 8.1
This release supports Hypervisor Hosts on x86_64 and ppc64le architectures
for:
* Red Hat Enterprise Linux 8.1
* CentOS Linux (or similar) 8.1
* oVirt Node 4.4 based on CentOS Linux 8.1 (available for x86_64 only)
See the release notes [1] for installation instructions and a list of new
features and bugs fixed.
If you manage more than one oVirt instance, OKD or RDO, we also recommend
trying ManageIQ <http://manageiq.org/>.
In such a case, please be sure to take the qc2 image and not the ova image.
Notes:
- oVirt Appliance is already available for CentOS Linux 8
- oVirt Node NG is already available for CentOS Linux 8
Additional Resources:
* Read more about the oVirt 4.4.0 release highlights:
http://www.ovirt.org/release/4.4.0/
* Get more oVirt project updates on Twitter: https://twitter.com/ovirt
* Check out the latest project news on the oVirt blog:
http://www.ovirt.org/blog/
[1] http://www.ovirt.org/release/4.4.0/
[2] http://resources.ovirt.org/pub/ovirt-4.4/iso/
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com
*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*
Reverting to Snapshot
by Christian Reiss
Hey folks,
this, I hope, is a trivial question, but I really can't find the button
for it. If you take a snapshot of any VM, how do you revert to that
snapshot, discarding all changes after?
I see options to clone, delete, take snapshot, preview... but revert?
Really in the true sense of "discard all changes, go back to this
snapshot state".
What am I not seeing here? :)
--
with kind regards,
mit freundlichen Gruessen,
Christian Reiss
oVirt Node 4.1.1 Hosted Engine Deployment Fibre Channel No LUNS found
by hkexdong@yahoo.com.hk
I have an external RAID subsystem connected to the host by a SAS cable (SFF-8644).
I followed the instructions of the RAID card manufacturer and included the driver during host installation. I can see the created RAID volumes (LUNs) available in "Installation Destination", although I chose to install the engine on the host's local disk.
The installation succeeded and I proceeded to hosted engine deployment via Cockpit. Now I am stuck at part 4, Storage.
I believe selecting "Fibre Channel" as the "Storage Type" is correct even though I'm not using a fibre cable, as NFS and iSCSI utilize the network.
I have confirmed there are 2 RAID volumes (LUN0 & LUN1) created in the external RAID subsystem. Why can oVirt not discover them? What could be wrong :(
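When I hit LUN discovery problems, the first thing I check is whether the kernel itself sees the devices at all; if they are missing here, the problem is below oVirt (driver/cabling), and if they are present, it is in discovery. A minimal sysfs sketch (standard Linux paths, nothing oVirt-specific):

```python
import glob
import os

def list_scsi_disks():
    """Return the SCSI disk names (sda, sdb, ...) the kernel currently
    exposes under /sys/block."""
    return sorted(os.path.basename(p) for p in glob.glob("/sys/block/sd*"))

print(list_scsi_disks())
```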
Hosted Engine Deployment stuck at 3. Prepare VM
by hkexdong@yahoo.com.hk
The oVirt Node version is 4.4.1. This version deployed successfully before.
But after I compiled the RAID driver, it now gets stuck at Prepare VM.
The last message is "[ INFO ] TASK [ovirt.hosted_engine_setup : Install ovirt-engine-appliance rpm]"
Following the instructions from the RAID controller manufacturer, I installed "kernel-headers-xxx.rpm" and "kernel-devel-xxx.rpm" extracted from the official CentOS 8.2.2004 ISO, along with a bunch of packages needed to compile the driver (e.g. gcc, make, zlib-devel, etc.)
I think those packages ruined the deployment, but I would still like to know which part it is actually stuck at, and why. Is there any way to check the deployment log?
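To partially answer my own question: I believe the setup writes its logs under /var/log/ovirt-hosted-engine-setup/ (that path is from memory, so treat it as an assumption). A small helper to grab the newest one:

```python
import glob
import os

def latest_setup_log(logdir="/var/log/ovirt-hosted-engine-setup"):
    """Return the most recently modified *.log file in logdir, or None
    when the directory has no logs (or does not exist)."""
    logs = glob.glob(os.path.join(logdir, "*.log"))
    return max(logs, key=os.path.getmtime) if logs else None

print(latest_setup_log())
```

Tailing that file while the deployment runs usually shows which ansible task it is actually sitting on.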
ovirt4.4 and ldap auth with starttls
by Jiří Sléžka
Hello,
better to start a new thread...
It looks like TLS 1.0 is not supported anymore in
ovirt-engine-extension-aaa-ldap.
I just migrated the engine from 4.3 to 4.4 and cannot use my ldap profile
because:
server_error: The connection reader was unable to successfully complete
TLS negotiation: SSLHandshakeException(The server selected protocol
version TLS10 is not accepted by client preferences [TLS12]),
ldapSDKVersion=4.0.14, revision=c0fb784eebf9d36a67c736d0428fb3577f2e25bb
but when I try to force TLS 1.0 by setting
...
pool.default.ssl.startTLS = true
pool.default.ssl.startTLSProtocol = TLSv1
...
I got
server_error: The connection reader was unable to successfully complete
TLS negotiation: SSLHandshakeException(No appropriate protocol (protocol
is disabled or cipher suites are inappropriate)), ldapSDKVersion=4.0.14,
revision=c0fb784eebf9d36a67c736d0428fb3577f2e25bb
I can't switch to anything better on the server side; is it possible to
allow weak ciphers/protocols on the client side?
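In case it helps others, since the error comes from the JVM's client preferences, the knob people usually reach for is the JDK's disabled-algorithms list. A sketch only — I have not verified this on a 4.4 engine, the file path varies per JDK build, and the exact default list below is illustrative, not copied from EL8:

```properties
# $JAVA_HOME/conf/security/java.security (path is an assumption)
# The EL8 defaults disable TLSv1; removing "TLSv1" from this list
# re-enables it client-side. Weakens security -- stopgap only.
# Before (illustrative):
#   jdk.tls.disabledAlgorithms=SSLv3, TLSv1, TLSv1.1, RC4, DES, MD5withRSA, DH keySize < 1024
# After:
jdk.tls.disabledAlgorithms=SSLv3, RC4, DES, MD5withRSA, DH keySize < 1024
```

On EL8 the system-wide crypto policies may override this as well; I believe `update-crypto-policies --set LEGACY` is the blunt system-wide equivalent, with the same security caveat.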
Thanks in advance,
Jiri
ovirt4.4.1 engine Deployment failure
by xilazz@126.com
Hello, everyone
I am using these versions for my test :
- ovirt-engine-appliance-4.4-20200723102445.1.el8.x86_64.rpm
- ovirt-node-ng-installer-4.4.1-2020072310.el8.iso
But I always get an error during hosted-engine deploy: The error was: error while evaluating conditional ((otopi_host_net.ansible_facts.otopi_host_net | length == 0)). The specific output is: fatal: [localhost]: FAILED! => {"msg": "The conditional check '(otopi_host_net.ansible_facts.otopi_host_net | length == 0)' failed. The error was: error while evaluating conditional ((otopi_host_net.ansible_facts.otopi_host_net | length == 0)): 'list object' has no attribute 'ansible_facts'\n\nThe error appears to be in '/usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/filter_team_devices.yml': line 29, column 13, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n- debug: var=otopi_host_net\n ^ here\n\nThere appears to be both 'k=v' shorthand syntax and YAML in this task. Only one syntax may be used.\n"}
[ ERROR ] Failed to execute stage 'Environment customization': Failed executing ansible-playbook
The test machine has four network cards, but I haven't configured a team. I don't know why this happens; it has been tormenting me for several days. I don't know if you've ever been in a situation like this, other than installing a lower version.
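For what it's worth, the error message itself points at two problems in that task: the `- debug: var=otopi_host_net` line mixes k=v shorthand with YAML, and the conditional dereferences ansible_facts on a register that turned out to be a list. A defensive rewrite might look like this — purely illustrative on my part, not the actual role code:

```yaml
# Hypothetical rewrite of the task from filter_team_devices.yml:
# pure YAML syntax for debug, and default() guards so a register that
# is a list (or missing ansible_facts) no longer raises an error.
- debug:
    var: otopi_host_net
  when: (otopi_host_net.ansible_facts | default({})).otopi_host_net | default([]) | length == 0
```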
hosted-engine upgrade from 4.3 to 4.4 fails with "Cannot edit VM."
by d@sekretev.ru
Hi!
hosted-engine --deploy --restore-from-file=ovirt_engine_full.arch
fails with
[ ERROR ] ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot edit VM. A VM running the engine ("hosted engine") cannot be set to highly available as it has its own HA mechanism.]". HTTP response code is 409.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Cannot edit VM. A VM running the engine (\"hosted engine\") cannot be set to highly available as it has its own HA mechanism.]\". HTTP response code is 409."}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
Can anybody help with this error? Maybe someone has access to this page: https://access.redhat.com/solutions/5303571?