Hostedengine Database broken
by marcel@deheureu.se
Hi ovirt list,
In our test system the engine crashed and the database is broken. Of course it was a test system with no database backup, but there are three important VMs on it.
Our plan is to deploy a new hosted engine and connect it to the old data center and GlusterFS.
If I remember right, the VMs will be stopped but we will not lose them. So if I connect the existing data center and click on one VM, it will ask me to import it, right?
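For reference, this is roughly how I understand the import could also be driven through the Python SDK once the old data domain is attached to the new engine (only a sketch; the engine URL, credentials, storage domain name and cluster name below are placeholders):

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details for the *new* engine.
connection = sdk.Connection(
    url='https://new-engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)

# Find the re-attached data domain that still holds the old VMs.
sds_service = connection.system_service().storage_domains_service()
sd = sds_service.list(search='name=old-gluster-data')[0]
sd_service = sds_service.storage_domain_service(sd.id)

# VMs that exist on the domain but are unknown to the new engine show up
# as "unregistered"; registering them is the import step the UI offers.
unreg_vms_service = sd_service.vms_service()
for vm in unreg_vms_service.list(unregistered=True):
    unreg_vms_service.vm_service(vm.id).register(
        cluster=types.Cluster(name='Default'),
        allow_partial_import=True,
    )

connection.close()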
Or do you have a better idea?
Br
Marcel
3 years, 3 months
Upgrading Nodes that use a Local Storage Domain
by Nur Imam Febrianto
Hi,
Currently we have several nodes (using oVirt Node) that are configured with a Local Storage Domain. How do I appropriately update these nodes? Running yum update always fails while installing the oVirt image update (scriptlet failed). Are there any specific steps to update a node that uses local storage?
Thanks in advance.
Regards,
Nur Imam Febrianto
3 years, 3 months
Time Drift Issues
by Nur Imam Febrianto
Hi All,
Recently I got a warning in our cluster about time drift:
Host xxxxx has time-drift of 672 seconds while maximum configured value is 300 seconds.
What should I do to address this issue? Should I reconfigure or configure the NTP client on all hosts?
Thanks in advance.
Regards,
Nur Imam Febrianto
3 years, 3 months
Change Cluster CPU type
by Andrey Rusakov
We have been using oVirt for several years.
We started with Xeon v2 CPUs.
At the moment all of our hosts are on v3, and we will change them to v4 in a month.
So I would like to change the cluster's default CPU type to Haswell, but I get an error about the custom CPU on the HostedEngine.
The HostedEngine config is locked.
Is there a solution to change the HostedEngine CPU type?
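For reference, this is roughly how I would expect the cluster CPU type change to be driven through the Python SDK (only a sketch with placeholder connection details; the exact CPU type string is an assumption, and it may hit the same HostedEngine custom-CPU error described above):

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)

clusters_service = connection.system_service().clusters_service()
cluster = clusters_service.list(search='name=Default')[0]

# The CPU type string must match one of the names the engine offers for
# the cluster's compatibility version; "Intel Haswell Family" is assumed.
clusters_service.cluster_service(cluster.id).update(
    types.Cluster(cpu=types.Cpu(type='Intel Haswell Family')),
)

connection.close()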
P.S. There is also some problem with the cluster compatibility version (4.5 / 4.6).
3 years, 3 months
Update to 4.4.8 leaves cluster in a circular error-state
by Paul-Erik Törrönen
Having updated the RPM packages for a DC with a cluster containing 2 hosts
(and executed the engine-setup on the engine machine), I now face the
following issue:
One of the VMs had a couple of snapshots and apparently this interferes
with the upgrade of the cluster version, which currently is 4.4.
Compatibility version 4.4 has become incompatible with the chosen CPU
type (Intel Westmere Family; the hosts are both Dell x10-series with
Xeon X56xx CPUs) and requires an upgrade to a newer compatibility
version, presumably 4.6, so the DC/cluster state is "unknown". However,
I can't do this because the snapshots exist: I get "Cannot change
cluster version since following VMs are previewing snapshots:", listing
a VM with snapshots. The snapshots in turn cannot be committed or undone
because: "Cannot revert to Snapshot. Unknown Data Center status.".
So how do I break this circular error? The VM itself is not essential;
it can be removed if that solves the case. However, that cannot be done
from the UI either, because: "Cannot remove VM. Unknown Data Center
status.".
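For completeness, this is roughly how the same Commit/Undo could be
driven through the Python SDK (only a sketch with placeholder connection
details and VM name; I expect it to run into the same "Unknown Data
Center status" validation as the UI):

import ovirtsdk4 as sdk

# Placeholder connection details.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=vm-with-snapshots')[0]
vm_service = vms_service.vm_service(vm.id)

# Either keep the previewed snapshot ...
vm_service.commit_snapshot()
# ... or go back to the pre-preview state instead:
# vm_service.undo_snapshot()

connection.close()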
Poltsi
3 years, 3 months
Getting an issue in the finalize step after the disk is uploaded successfully
by avishek.sarkar@broadcom.com
I have used upload-disk.py to upload a qcow2 image to the data center; however, I am getting the error below:
./upload-disk.py --engine-url 'https://xyz.net' --username='xyz(a)abc.net' --password-file=password --sd-name='SEC-ABC' --cafile='/home/jenkins/crt/ca.crt' --disk-format="qcow2" --disk-sparse image-XYZ
Checking image...
Image format: qcow2
Disk format: cow
Disk content type: data
Disk provisioned size: 10737418240
Disk initial size: 5526847488
Disk name: image-avishek.qcow2
Connecting...
Creating disk...
Creating transfer session...
Uploading image...
[ 100.00% ] 10.00 GiB, 11.34 seconds, 903.09 MiB/s
Finalizing transfer session...
Traceback (most recent call last):
File "./upload-disk.py", line 300, in <module>
transfer_service.finalize()
File "/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py", line 13971, in finalize
return self._internal_action(action, 'finalize', None, headers, query, wait)
File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 299, in _internal_action
return future.wait() if wait else future
File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 55, in wait
return self._code(response)
File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 296, in callback
self._check_fault(response)
File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 132, in _check_fault
self._raise_error(response, body)
File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 118, in _raise_error
raise error
ovirtsdk4.AuthError: Fault reason is "Operation Failed". Fault detail is "[User is not authorized to perform this action.]". HTTP response code is 403.
3 years, 3 months
NFS storage was locked for 45 minutes after I attempted a clone operation
by David White
I have an HCI cluster running on Gluster storage. I exposed an NFS share into oVirt as a storage domain so that I could clone all of my VMs (I'm preparing to move physically to a new datacenter). I got 3-4 VMs cloned perfectly fine yesterday. But then this evening, I tried to clone a big VM, and it caused the disk to lock up. The VM went totally unresponsive, and I didn't see a way to cancel the clone. Nagios NRPE (on the client VM) was reporting server load over 65+, but I was never able to establish an SSH connection.
Eventually, I tried restarting the ovirt-engine, per https://access.redhat.com/solutions/396753. When that didn't work, I powered down the VM completely. But the disks were still locked. So I then tried to put the storage domain into maintenance mode, but that wound up putting the entire domain into a "locked" state. Finally, the disks unlocked, and I was able to power the VM back online.
From start to finish, my VM was down for about 45 minutes, including the time when NRPE was still sending data to Nagios.
What logs should I look at, how can I troubleshoot what went wrong here, and how can I hopefully prevent this from happening again?
3 years, 3 months
oVirt Node 4.4.8.3 Async update
by Sandro Bonazzola
oVirt Node 4.4.8.3 Async update
On September 3rd 2021 the oVirt project released an async update for oVirt
Node consuming the following packages:
- ovirt-release44 4.4.8.3
- ovirt-node-ng-image-update 4.4.8.3
The oVirt Node respin also consumed the most recent CentOS Stream and
Advanced Virtualization content.
It also includes fixes for:
- Bug 1996602 <https://bugzilla.redhat.com/show_bug.cgi?id=1996602> - VM
remains in paused state when trying to write on a resized disk that
resides on iSCSI
  - Via qemu-kvm-6.0.0-29.el8s which shipped live.
Here’s the full list of changes:
--- ovirt-node-ng-image-4.4.8.2.manifest-rpm 2021-09-01 12:59:05.195037688
+0200
+++ ovirt-node-ng-image-4.4.8.3.manifest-rpm 2021-09-03 13:27:52.006368887
+0200
@@ -47 +46,0 @@
-boost-iostreams-1.66.0-10.el8.x86_64
@@ -112 +111 @@
-device-mapper-persistent-data-0.9.0-1.el8.x86_64
+device-mapper-persistent-data-0.9.0-4.el8.x86_64
@@ -355 +354 @@
-libcap-2.26-4.el8.x86_64
+libcap-2.26-5.el8.x86_64
@@ -637 +636 @@
-osinfo-db-20210215-1.el8.noarch
+osinfo-db-20210809-1.el8.noarch
@@ -648 +647 @@
-ovirt-node-ng-image-update-placeholder-4.4.8.2-1.el8.noarch
+ovirt-node-ng-image-update-placeholder-4.4.8.3-1.el8.noarch
@@ -656,2 +655,2 @@
-ovirt-release-host-node-4.4.8.2-1.el8.noarch
-ovirt-release44-4.4.8.2-1.el8.noarch
+ovirt-release-host-node-4.4.8.3-1.el8.noarch
+ovirt-release44-4.4.8.3-1.el8.noarch
@@ -768 +767 @@
-python3-eventlet-0.25.2-3.el8.noarch
+python3-eventlet-0.25.2-3.1.el8.noarch
@@ -814 +813 @@
-python3-os-brick-4.0.3-1.el8.noarch
+python3-os-brick-4.0.3-2.el8.noarch
@@ -890,14 +889,14 @@
-qemu-guest-agent-6.0.0-27.el8s.x86_64
-qemu-img-6.0.0-27.el8s.x86_64
-qemu-kvm-6.0.0-27.el8s.x86_64
-qemu-kvm-block-curl-6.0.0-27.el8s.x86_64
-qemu-kvm-block-gluster-6.0.0-27.el8s.x86_64
-qemu-kvm-block-iscsi-6.0.0-27.el8s.x86_64
-qemu-kvm-block-rbd-6.0.0-27.el8s.x86_64
-qemu-kvm-block-ssh-6.0.0-27.el8s.x86_64
-qemu-kvm-common-6.0.0-27.el8s.x86_64
-qemu-kvm-core-6.0.0-27.el8s.x86_64
-qemu-kvm-docs-6.0.0-27.el8s.x86_64
-qemu-kvm-hw-usbredir-6.0.0-27.el8s.x86_64
-qemu-kvm-ui-opengl-6.0.0-27.el8s.x86_64
-qemu-kvm-ui-spice-6.0.0-27.el8s.x86_64
+qemu-guest-agent-6.0.0-29.el8s.x86_64
+qemu-img-6.0.0-29.el8s.x86_64
+qemu-kvm-6.0.0-29.el8s.x86_64
+qemu-kvm-block-curl-6.0.0-29.el8s.x86_64
+qemu-kvm-block-gluster-6.0.0-29.el8s.x86_64
+qemu-kvm-block-iscsi-6.0.0-29.el8s.x86_64
+qemu-kvm-block-rbd-6.0.0-29.el8s.x86_64
+qemu-kvm-block-ssh-6.0.0-29.el8s.x86_64
+qemu-kvm-common-6.0.0-29.el8s.x86_64
+qemu-kvm-core-6.0.0-29.el8s.x86_64
+qemu-kvm-docs-6.0.0-29.el8s.x86_64
+qemu-kvm-hw-usbredir-6.0.0-29.el8s.x86_64
+qemu-kvm-ui-opengl-6.0.0-29.el8s.x86_64
+qemu-kvm-ui-spice-6.0.0-29.el8s.x86_64
@@ -934,2 +933,2 @@
-selinux-policy-3.14.3-78.el8.noarch
-selinux-policy-targeted-3.14.3-78.el8.noarch
+selinux-policy-3.14.3-79.el8.noarch
+selinux-policy-targeted-3.14.3-79.el8.noarch
@@ -1007 +1006 @@
-virt-v2v-1.42.0-14.el8s.x86_64
+virt-v2v-1.42.0-15.el8s.x86_64
--
Sandro Bonazzola
MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
Red Hat EMEA <https://www.redhat.com/>
sbonazzo(a)redhat.com
*Red Hat respects your work life balance. Therefore there is no need to
answer this email out of your office hours.*
3 years, 3 months
posix storage mount path error when creating volumes
by Sketch
My cluster was originally built on 4.3, and things were working as long as
my SPM was on 4.3. I just killed off the last 4.3 host and rebuilt it as
4.4, and upgraded my cluster and DC to compatibility level 4.6.
We had cephfs mounted as a posix FS which worked fine, but oddly in 4.3 we
would end up with two mounts for the same volume. The configuration had a
comma separated list of IPs as that is how ceph was configured for
redundancy, and this is the mount that shows up on both 4.3 and 4.4 hosts
(/rhev/data-center/mnt/10.1.88.75,10.1.88.76,10.1.88.77:_vmstore/). But
the 4.3 hosts would also have a duplicate mount which had the FQDN of one
of the servers instead of the comma separated list.
In 4.4, there's only a single mount and existing VMs start just fine, but
you can't create new disks or migrate existing disks onto the POSIX
storage volume. Based on the error I get on the SPM host when it tries
to create a volume (migration would also fail on the volume creation
task), my suspicion is that the parser does not like the comma in the
mount path:
2021-08-31 19:34:07,767-0700 INFO (jsonrpc/6) [vdsm.api] START createVolume(sdUUID='e8ec5645-fc1b-4d64-a145-44aa8ac5ef48', spUUID='2948c860-9bdf-11e8-a6b3-00163e0419f0', imgUUID='7d704b4d-1ebe-462f-b11e-b91039f43637', size='1073741824', volFormat=5, preallocate=1, diskType='DATA', volUUID='be6cb033-4e42-4bf5-a4a3-6ab5bf03edee', desc='{"DiskAlias":"test","DiskDescription":""}', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize=None, addBitmaps=False) from=::ffff:10.1.2.37,43490, flow_id=bb137995-1ffa-429f-b6eb-5b9ca9f8dfd7, task_id=2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed (api:48)
2021-08-31 19:34:07,767-0700 INFO (jsonrpc/6) [IOProcessClient] (Global) Starting client (__init__:340)
2021-08-31 19:34:07,782-0700 INFO (ioprocess/3193398) [IOProcess] (Global) Starting ioprocess (__init__:465)
2021-08-31 19:34:07,803-0700 INFO (jsonrpc/6) [vdsm.api] FINISH createVolume return=None from=::ffff:10.1.2.37,43490, flow_id=bb137995-1ffa-429f-b6eb-5b9ca9f8dfd7, task_id=2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed (api:54)
2021-08-31 19:34:07,844-0700 INFO (tasks/5) [storage.ThreadPool.WorkerThread] START task 2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed (cmd=<bound method Task.commit of <vdsm.storage.task.Task object at 0x7f4894279860>>, args=None) (threadPool:146)
2021-08-31 19:34:07,869-0700 INFO (tasks/5) [storage.StorageDomain] Create placeholder /rhev/data-center/mnt/10.1.88.75,10.1.88.76,10.1.88.77:_vmstore/e8ec5645-fc1b-4d64-a145-44aa8ac5ef48/images/7d704b4d-1ebe-462f-b11e-b91039f43637 for image's volumes (sd:1718)
2021-08-31 19:34:07,869-0700 ERROR (tasks/5) [storage.TaskManager.Task] (Task='2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed') Unexpected error (task:877)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run
return fn(*args, **kargs)
File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
return self.cmd(*self.argslist, **self.argsdict)
File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
return method(self, *args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 1945, in createVolume
initial_size=initialSize, add_bitmaps=addBitmaps)
File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 1216, in createVolume
initial_size=initial_size, add_bitmaps=add_bitmaps)
File "/usr/lib/python3.6/site-packages/vdsm/storage/volume.py", line 1174, in create
imgPath = dom.create_image(imgUUID)
File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 1721, in create_image
"create_image_rollback", [image_dir])
File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 385, in __init__
self.params = ParamList(argslist)
File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 298, in __init__
raise ValueError("ParamsList: sep %s in %s" % (sep, i))
ValueError: ParamsList: sep , in /rhev/data-center/mnt/10.1.88.75,10.1.88.76,10.1.88.77:_vmstore/e8ec5645-fc1b-4d64-a145-44aa8ac5ef48/images/7d704b4d-1ebe-462f-b11e-b91039f43637
2021-08-31 19:34:07,964-0700 INFO (tasks/5) [storage.ThreadPool.WorkerThread] FINISH task 2ddfd1bc-d7e1-4a1e-877a-68e1c2a897ed (threadPool:148)
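To illustrate my suspicion, here is a minimal sketch (not vdsm's actual
code) of what that last ValueError implies, assuming the task's rollback
parameters are serialized with "," as the separator, so any argument
containing a comma, like this mount path, is rejected before the image
directory is ever created:

# Simplified illustration only -- not taken from vdsm.
SEP = ","

def param_list(args, sep=SEP):
    # Refuse any argument that already contains the separator, since the
    # joined list could not be split apart unambiguously later.
    for arg in args:
        if sep in arg:
            raise ValueError("ParamsList: sep %s in %s" % (sep, arg))
    return sep.join(args)

image_dir = ("/rhev/data-center/mnt/10.1.88.75,10.1.88.76,10.1.88.77:_vmstore/"
             "e8ec5645-fc1b-4d64-a145-44aa8ac5ef48/images/"
             "7d704b4d-1ebe-462f-b11e-b91039f43637")

# Raises ValueError: ParamsList: sep , in /rhev/data-center/mnt/10.1.88.75,...
param_list([image_dir])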
This is a pretty major issue since we can no longer create new VMs. As a
workaround, I could change the mount path of the volume to reference only
a single IP, but oVirt won't let me edit the mount. I wonder if I could
edit it manually in the database, then reboot the hosts one by one to
make the change take effect without having to shut down hundreds of VMs
at once?
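If it helps, the storage server connections can at least be inspected
through the Python SDK before deciding whether to touch the database
(only a sketch with placeholder credentials; updating the connection
through the API instead of the database might also be an option, though
I assume the engine requires the domain to be in maintenance for that,
and I'm not certain whether the comma-separated spec lives in the
connection's address or path field for a POSIX domain):

import ovirtsdk4 as sdk

# Placeholder connection details.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)

# List the storage server connections the engine knows about, to find the
# one carrying the comma-separated ceph spec (and its id) before changing
# anything.
conns_service = connection.system_service().storage_connections_service()
for conn in conns_service.list():
    print(conn.id, conn.type, conn.address, conn.path, conn.vfs_type)

connection.close()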
3 years, 3 months