Image upload (with ovirt-imageio) stops abruptly (logs included) - Ovirt 4.5.4, hosted-engine
by adam.puchejda@nask.pl
Hi,
I have a problem with the upload feature in the oVirt GUI (Storage/Disks -> Upload button) that maybe you could help me resolve. The transfer starts... and then stops abruptly. What's more, the logs (below) do not register any errors. I am using a hosted engine in a brand new oVirt 4.5.4 environment with oVirt nodes running CentOS 8 Stream. I have imported the CA certificate into the browser (Chrome, Edge).
Thank you and best wishes,
Adam
Logs:
/var/log/ovirt-imageio/daemon.log
2023-03-15 08:51:17,123 INFO (Thread-25) [http] OPEN connection=25 client=local
2023-03-15 08:51:17,125 INFO (Thread-25) [tickets] [local] ADD transfer={'dirty': False, 'ops': ['write'], 'size': 160432128, 'sparse': True, 'inactivity_timeout': 60, 'transfer_id': '32a96cf6-a0d9-4e85-b80b-034a16a9214b', 'uuid': 'fc5acb59-e8a4-4cbf-bc55-d115aeb760b9', 'timeout': 300, 'url': 'nbd:unix:/run/vdsm/nbd/32a96cf6-a0d9-4e85-b80b-034a16a9214b.sock'}
2023-03-15 08:51:17,127 INFO (Thread-25) [http] CLOSE connection=25 client=local [connection 1 ops, 0.002591 s] [dispatch 1 ops, 0.001178 s]
2023-03-15 08:51:19,440 INFO (Thread-26) [http] OPEN connection=26 client=local
2023-03-15 08:51:19,442 INFO (Thread-26) [http] CLOSE connection=26 client=local [connection 1 ops, 0.001493 s] [dispatch 1 ops, 0.000457 s]
(..)
2023-03-15 08:52:21,808 INFO (Thread-33) [http] OPEN connection=33 client=local
2023-03-15 08:52:21,809 INFO (Thread-33) [tickets] [local] EXTEND timeout=300 transfer=32a96cf6-a0d9-4e85-b80b-034a16a9214b
2023-03-15 08:52:21,810 INFO (Thread-33) [http] CLOSE connection=33 client=local [connection 1 ops, 0.001537 s] [dispatch 1 ops, 0.000518 s]
/var/log/vdsm/vdsm.log
2023-03-15 08:51:16,810+0000 INFO (jsonrpc/5) [vdsm.api] START start_nbd_server(server_id='32a96cf6-a0d9-4e85-b80b-034a16a9214b', config={'detect_zeroes': True, 'discard': True, 'readonly': False, 'bitmap': None, 'sd_id': 'd3896a51-efde-4723-875f-bdc4c737d0d3', 'img_id': '4fbae53f-2f12-4e46-b54f-3ca09fb00755', 'vol_id': '104ea438-49d1-465e-a489-0695f100d325', 'backing_chain': True}) from=::ffff:[IPV4_NUMBER],41446, flow_id=cdd21264-4021-4ee9-aac1-aefe6f244bb1, task_id=0dea4f64-e790-49fe-bc0d-1f40e6f31610 (api:31)
2023-03-15 08:51:16,823+0000 INFO (jsonrpc/5) [storage.nbd] Starting transient service vdsm-nbd-32a96cf6-a0d9-4e85-b80b-034a16a9214b.service, serving /rhev/data-center/mnt/glusterSD/[HOSTNAME]:_ovirt-engine/d3896a51-efde-4723-875f-bdc4c737d0d3/images/4fbae53f-2f12-4e46-b54f-3ca09fb00755/104ea438-49d1-465e-a489-0695f100d325 via unix socket /run/vdsm/nbd/32a96cf6-a0d9-4e85-b80b-034a16a9214b.sock (nbd:138)
2023-03-15 08:51:16,881+0000 INFO (jsonrpc/5) [vdsm.api] FINISH start_nbd_server return={'result': 'nbd:unix:/run/vdsm/nbd/32a96cf6-a0d9-4e85-b80b-034a16a9214b.sock'} from=::ffff:[IPV4],41446, flow_id=cdd21264-4021-4ee9-aac1-aefe6f244bb1, task_id=0dea4f64-e790-49fe-bc0d-1f40e6f31610 (api:37)
2023-03-15 08:51:17,121+0000 INFO (jsonrpc/1) [vdsm.api] START add_image_ticket(ticket={'dirty': False, 'ops': ['write'], 'size': 160432128, 'sparse': True, 'inactivity_timeout': 60, 'transfer_id': '32a96cf6-a0d9-4e85-b80b-034a16a9214b', 'uuid': 'fc5acb59-e8a4-4cbf-bc55-d115aeb760b9', 'timeout': 300, 'url': 'nbd:unix:/run/vdsm/nbd/32a96cf6-a0d9-4e85-b80b-034a16a9214b.sock'}) from=::ffff:[IPV4_NUMBER],41446, flow_id=cdd21264-4021-4ee9-aac1-aefe6f244bb1, task_id=10f6ed38-2135-4c06-a948-af70af88f743 (api:31)
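One detail in the daemon log above is that every connection is client=local (the engine adding and extending the ticket); no remote connection carrying the actual upload data ever appears, which usually points at the browser not reaching the imageio daemon (TLS or firewall). A hedged sketch for checking this — ENGINE, PASSWORD and the default remote port 54322 are assumptions for a standard setup:

  # List the image transfers and their phase via the REST API:
  curl -k -u admin@internal:PASSWORD "https://ENGINE/ovirt-engine/api/imagetransfers"

  # On the host that owns the ticket, confirm the daemon listens for remote clients:
  ss -tlnp | grep 54322

  # Note the ticket above has inactivity_timeout=60: if no data arrives,
  # the daemon pauses the transfer after 60 s without logging an error.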
watchdog: BUG: soft lockup - CPU#3 stuck for XXXs! mitigations?
by Diego Ercolani
Hello,
I noticed that when you have poor storage performance, all the VMs suffer, logging entries like the one in the subject.
Searching around, I found a Red Hat solution:
https://access.redhat.com/solutions/5427
which suggests addressing the issue (if rocket-grade performance from the storage is not possible) by tuning the kernel's elevator/I/O scheduler.
But in the virtual machine I have this default queue:
root@openproject:# cat /sys/block/sda/queue/scheduler
[none] mq-deadline
So the suggested tuning seems a little outdated.
Does this solution actually resolve the issue?
What do the kernel gurus here suggest?
There is also the problem that after these events the clock in the VMs (including the hosted engine) returns an incorrect date and time (often in the year 2177), creating annoying problems in the Data Warehouse and breaking all the certificate clients....
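Back to the scheduler question: for reference, the scheduler can be inspected and switched at runtime. A minimal sketch, assuming sda is the guest disk to tune — the udev rule file name is illustrative, and 'none' is only a common recommendation for virtio disks, not a guaranteed fix:

  # The bracketed entry is the active scheduler:
  cat /sys/block/sda/queue/scheduler

  # Switch at runtime (not persistent across reboots):
  echo none > /sys/block/sda/queue/scheduler

  # Persist with a udev rule, e.g. /etc/udev/rules.d/60-io-scheduler.rules:
  # ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="none"

  # The soft-lockup detector threshold (seconds) is also tunable:
  sysctl kernel.watchdog_thresh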
RBD Mirror support
by Murilo Morais
Good evening everyone!
Guys, I managed to bring up RBD through Cinder without problems. Everything
works, including removing the Storage Domain (through postgres).
The original objective was to set up RBD Mirror, but I'm not succeeding,
because Cinder is attaching the volume through KRBD, which doesn't support
journaling, and that ends up breaking the mirror...
Is there any way/configuration to make Cinder attach the volume using librbd
instead of KRBD? In my scenario we have to use mirroring.
Thanks in advance.
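A possible workaround, since journal-based mirroring indeed requires librbd: Ceph Octopus and later also support snapshot-based RBD mirroring, which does not need the journaling feature and therefore works with krbd. A hedged sketch with placeholder pool/image names:

  # Enable per-image mirroring on the pool:
  rbd mirror pool enable mypool image

  # Enable snapshot-based mirroring for one image (no journaling required):
  rbd mirror image enable mypool/myimage snapshot

  # Take a mirror snapshot every 30 minutes:
  rbd mirror snapshot schedule add --pool mypool --image myimage 30m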
Failed to remove MBS
by Murilo Morais
Good afternoon everybody!
I have an MBS (Managed Block Storage) Storage Domain that we no longer use
and want to remove. We are using version 4.4.10.
When trying to put the Storage Domain into Maintenance, a message appears
saying that it was not possible and that there is a Task still being
executed. I have already looked for that task, but I could not find it.
Therefore, I cannot put the Storage Domain into Maintenance in the
Datacenter, which makes it impossible to carry out the removal.
According to a bug report [1], the problem was fixed in version 4.5.0; the
problem is that we cannot perform the update.
In the DB I found two references to this Storage Domain, one in the
`storage_domain_static` table and another in the `cinder_storage` table. Is
removing these two references enough to remove this Storage Domain?
Is there any other way to perform this process manually?
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1959385
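Before deleting rows by hand, it may be worth hunting the phantom task with the helpers the engine ships. A hedged sketch — default database name/user assumed, and flags vary between versions, so check -h first:

  # Helper scripts for stuck tasks and entity locks:
  /usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -h
  /usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh -h

  # The task the UI complains about may be visible directly in the DB:
  sudo -u postgres psql engine -c "SELECT task_id, task_type, status FROM async_tasks;"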
Self Hosted Engine in unaligned state: node are
by Diego Ercolani
Hello,
ovirt-release-host-node-4.5.4-1.el8.x86_64
Today I found my cluster in an inconsistent state.
I have three nodes (ovirt-node2, ovirt-node3, ovirt-node4) with a self-hosted engine deployed using external NFS storage.
My first attempt was to launch hosted-engine --vm-status on the three nodes, and I got three inconsistent states:
[root@ovirt-node2 ~]# hosted-engine --vm-status
The hosted engine configuration has not been retrieved from shared storage yet,
please ensure that ovirt-ha-agent service is running.
--== Host ovirt-node3.ovirt (id: 1) status ==--
Host ID : 1
Host timestamp : 1942858
Score : 3400
Engine status : unknown stale-data
Hostname : ovirt-node3.ovirt
Local maintenance : False
stopped : False
crc32 : 37cf5256
conf_on_shared_storage : True
local_conf_timestamp : 1942859
Status up-to-date : False
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1942858 (Sun Mar 12 01:26:20 2023)
host-id=1
score=3400
vm_conf_refresh_time=1942859 (Sun Mar 12 01:26:22 2023)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host ovirt-node2.ovirt (id: 2) status ==--
Host ID : 2
Host timestamp : 4425500
Score : 3400
Engine status : unknown stale-data
Hostname : ovirt-node2.ovirt
Local maintenance : False
stopped : False
crc32 : ab944a8a
conf_on_shared_storage : True
local_conf_timestamp : 4425500
Status up-to-date : False
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=4425500 (Sun Mar 12 01:26:01 2023)
host-id=2
score=3400
vm_conf_refresh_time=4425500 (Sun Mar 12 01:26:01 2023)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False
[root@ovirt-node3 ~]# hosted-engine --vm-status
--== Host ovirt-node4.ovirt (id: 3) status ==--
Host ID : 3
Host timestamp : 4452814
Score : 3400
Engine status : unknown stale-data
Hostname : ovirt-node4.ovirt
Local maintenance : False
stopped : False
crc32 : 95890d21
conf_on_shared_storage : True
local_conf_timestamp : 4452814
Status up-to-date : False
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=4452814 (Sun Mar 12 01:25:55 2023)
host-id=3
score=3400
vm_conf_refresh_time=4452814 (Sun Mar 12 01:25:55 2023)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
[root@ovirt-node4 ~]# hosted-engine --vm-status
--== Host ovirt-node3.ovirt (id: 1) status ==--
Host ID : 1
Host timestamp : 1942848
Score : 3400
Engine status : unknown stale-data
Hostname : ovirt-node3.ovirt
Local maintenance : False
stopped : False
crc32 : 7f645fbc
conf_on_shared_storage : True
local_conf_timestamp : 1942848
Status up-to-date : False
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1942848 (Sun Mar 12 01:26:10 2023)
host-id=1
score=3400
vm_conf_refresh_time=1942848 (Sun Mar 12 01:26:10 2023)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False
--== Host ovirt-node2.ovirt (id: 2) status ==--
Host ID : 2
Host timestamp : 4428404
Score : 3400
Engine status : unknown stale-data
Hostname : ovirt-node2.ovirt
Local maintenance : False
stopped : False
crc32 : af938ff8
conf_on_shared_storage : True
local_conf_timestamp : 4428404
Status up-to-date : False
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=4428404 (Sun Mar 12 02:14:45 2023)
host-id=2
score=3400
vm_conf_refresh_time=4428404 (Sun Mar 12 02:14:45 2023)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False
--== Host ovirt-node4.ovirt (id: 3) status ==--
Host ID : 3
Host timestamp : 4470173
Score : 3400
Engine status : unknown stale-data
Hostname : ovirt-node4.ovirt
Local maintenance : False
stopped : False
crc32 : d8fdb650
conf_on_shared_storage : True
local_conf_timestamp : 4470173
Status up-to-date : False
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=4470173 (Sun Mar 12 06:15:15 2023)
host-id=3
score=3400
vm_conf_refresh_time=4470173 (Sun Mar 12 06:15:15 2023)
conf_on_shared_storage=True
maintenance=False
state=EngineStarting
stopped=False
Obviously there is something weird happening.
Currently I have put my cluster into global maintenance mode, but I had to launch hosted-engine --set-maintenance --mode=global on both node3 and node4.
Please give me some hint.... Over the weekend I received hundreds of mails telling me that the hosted engine went into an inconsistent state.
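The "stale-data" on every host usually means the HA agents stopped refreshing the shared metadata. A hedged recovery sketch using standard hosted-engine tooling — run while the cluster is in global maintenance, and only after checking the agent logs:

  # On each node, restart the HA services and watch the agent log:
  systemctl restart ovirt-ha-broker ovirt-ha-agent
  tail -f /var/log/ovirt-hosted-engine-ha/agent.log

  # If the hosts still disagree, reinitialize the sanlock lockspace from one node:
  hosted-engine --set-maintenance --mode=global
  hosted-engine --reinitialize-lockspace
  hosted-engine --set-maintenance --mode=none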
VM migration failed
by dobrodej88@gmail.com
Hi, when I tried to migrate a VM between hardware hosts, I got an error like:
ERROR (migsrc/53accc0c) [virt.vm] (vmId='53accc0c-15c6-4917-a9c9-5237f7f358b1') Cannot access backing file '/rhev/data-center/mnt/blockSD/30b8a383-f390-405f-ac0c-4ef7a19207fe/images/ca67b349-5a51-43de-96f0-592db000de05/360f96d3-f63f-43c4-bfd5-3a8fb43bd007' of storage file '/rhev/data-center/mnt/blockSD/30b8a383-f390-405f-ac0c-4ef7a19207fe/images/ca67b349-5a51-43de-96f0-592db000de05/f3a72252-904c-48b1-8591-f2dabd343b8e' (as uid:107, gid:107): No such file or directory (migration:282)
but the files do exist:
PROD [root@olvm1 vdsm]# ll /rhev/data-center/mnt/blockSD/30b8a383-f390-405f-ac0c-4ef7a19207fe/images/ca67b349-5a51-43de-96f0-592db000de05/360f96d3-f63f-43c4-bfd5-3a8fb43bd007
lrwxrwxrwx. 1 vdsm kvm 78 Mar 10 14:59 /rhev/data-center/mnt/blockSD/30b8a383-f390-405f-ac0c-4ef7a19207fe/images/ca67b349-5a51-43de-96f0-592db000de05/360f96d3-f63f-43c4-bfd5-3a8fb43bd007 -> /dev/30b8a383-f390-405f-ac0c-4ef7a19207fe/360f96d3-f63f-43c4-bfd5-3a8fb43bd007
PROD [root@olvm1 vdsm]# qemu-img check /rhev/data-center/mnt/blockSD/30b8a383-f390-405f-ac0c-4ef7a19207fe/images/ca67b349-5a51-43de-96f0-592db000de05/360f96d3-f63f-43c4-bfd5-3a8fb43bd007
No errors were found on the image.
14/1638400 = 0.00% allocated, 57.14% fragmented, 0.00% compressed clusters
Image end offset: 1310720
PROD [root@olvm1 vdsm]# qemu-img info /rhev/data-center/mnt/blockSD/30b8a383-f390-405f-ac0c-4ef7a19207fe/images/ca67b349-5a51-43de-96f0-592db000de05/360f96d3-f63f-43c4-bfd5-3a8fb43bd007
image: /rhev/data-center/mnt/blockSD/30b8a383-f390-405f-ac0c-4ef7a19207fe/images/ca67b349-5a51-43de-96f0-592db000de05/360f96d3-f63f-43c4-bfd5-3a8fb43bd007
file format: qcow2
virtual size: 100 GiB (107374182400 bytes)
disk size: 0 B
cluster_size: 65536
backing file: b7c03917-6fd4-4c88-98e3-e8fcdb3955bd (actual path: /rhev/data-center/mnt/blockSD/30b8a383-f390-405f-ac0c-4ef7a19207fe/images/ca67b349-5a51-43de-96f0-592db000de05/b7c03917-6fd4-4c88-98e3-e8fcdb3955bd)
backing file format: qcow2
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
Please tell me what the reason could be. Thank you.
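A hedged observation: the listing above was taken on the source host (olvm1), while the error is typically raised on the destination host, where each volume in the backing chain is an LVM logical volume that must be active before qemu can open it. A quick check on the destination — the VG is named after the storage domain UUID:

  # 'a' in the attr column means the LV is active:
  lvs -o lv_name,lv_attr 30b8a383-f390-405f-ac0c-4ef7a19207fe

  # vdsm.log on the destination, filtered by the same vmId, usually shows
  # why volume preparation failed.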
Re: Move disk between POSIX FS and Managed Block Storage
by Benny Zlotnik
This is correct, but since a move essentially means "copy and remove from
the source", it can be emulated by performing the two operations separately
(this should work for an MBS-to-MBS move as well).
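For scripting the same copy-then-delete flow, a hedged REST sketch — ENGINE, PASSWORD, DISK_ID and TARGET_SD_ID are placeholders, and the copy must finish before deleting the source:

  # Copy the disk to the target storage domain:
  curl -k -u admin@internal:PASSWORD -X POST -H 'Content-Type: application/xml' \
    -d '<action><storage_domain id="TARGET_SD_ID"/></action>' \
    "https://ENGINE/ovirt-engine/api/disks/DISK_ID/copy"

  # Then remove the source disk:
  curl -k -u admin@internal:PASSWORD -X DELETE \
    "https://ENGINE/ovirt-engine/api/disks/DISK_ID"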
On Fri, Mar 10, 2023 at 4:51 PM Murilo Morais <murilo(a)evocorp.com.br> wrote:
> Benny, thanks for replying, this is exactly what I was looking for.
>
> Another question:
> If the "move to" functionality has not yet been implemented, then moving
> between Managed Block Devices does not work either, correct?
>
> Em sex., 10 de mar. de 2023 às 11:46, Benny Zlotnik <bzlotnik(a)redhat.com>
> escreveu:
>
>> Move wasn't implemented; you can use the copy dialog and delete from the
>> source afterwards
>>
>> On Fri, Mar 10, 2023 at 4:17 PM Murilo Morais <murilo(a)evocorp.com.br>
>> wrote:
>>
>>> Good morning everybody!
>>>
>>> Guys, I managed to connect my oVirt cluster to the CEPH RBD, the VM
>>> disks are stored in CephFS (POSIX FS). How can I move the disk from POSIX
>>> FS to Managed Block Storage?
>>>
>>> Thanks in advance!
Move disk between POSIX FS and Managed Block Storage
by Murilo Morais
Good morning everybody!
Guys, I managed to connect my oVirt cluster to Ceph RBD; the VM disks are
stored in CephFS (POSIX FS). How can I move a disk from POSIX FS to
Managed Block Storage?
Thanks in advance!
renew certificates
by Demeter Tibor
Dear listmembers,
We have an oVirt 4.3 hyperconverged system and a couple of certificates will expire next month.
/etc/pki/ovirt-engine/certs/ovirt-provider-ovn Apr 11 08:16:33 2023 GMT
/etc/pki/ovirt-engine/certs/ovn-ndb.cer Apr 11 08:16:32 2023 GMT
/etc/pki/ovirt-engine/certs/ovn-sdb.cer Apr 11 08:16:32 2023 GMT
How can I renew these certificates?
Thanks in advance,
Regards
Tibor
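For reference, a hedged sketch of the usual approach: check the expiry, then let engine-setup renew the PKI (it prompts for this; whether it covers the OVN certificates can differ between versions, so verify afterwards):

  # Check when a certificate expires:
  openssl x509 -noout -enddate -in /etc/pki/ovirt-engine/certs/ovn-ndb.cer

  # On the engine, re-run setup and answer Yes when asked to renew certificates:
  engine-setup --offline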
vm disk stuck on "paused by system"
by andreas_nikiforou@hotmail.com
Hi,
I was working on my own backup application (a web front end for the oVirt SDK) and somehow managed to get my VM disks stuck in status "paused by system".
I have tried to stop the backup and finalize it manually, but nothing works.
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot backup VM: Disk is locked. Please try again later.]". HTTP response code is 409.
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot stop VM backup. There is an active image transfer for VM backup]". HTTP response code is 409.
The API shows that a backup exists and is in state <phase>ready</phase>, but the backup cannot be transferred.
If I try to take a new full backup of the VM, all I get is more snapshots stuck on "paused by system". I cannot delete the VM or the disk.
I am using the latest version of oVirt, Software Version 4.5.4-1.el8.
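The second error suggests an image transfer is still attached to the backup; a hedged REST sketch for finding and finalizing it — ENGINE, PASSWORD and the IDs are placeholders:

  # Find the lingering transfer; note its <id> and <phase>:
  curl -k -u admin@internal:PASSWORD "https://ENGINE/ovirt-engine/api/imagetransfers"

  # Finalize it so the disk lock can be released:
  curl -k -u admin@internal:PASSWORD -X POST -H 'Content-Type: application/xml' \
    -d '<action/>' "https://ENGINE/ovirt-engine/api/imagetransfers/TRANSFER_ID/finalize"

  # Then finalize the backup itself:
  curl -k -u admin@internal:PASSWORD -X POST -H 'Content-Type: application/xml' \
    -d '<action/>' "https://ENGINE/ovirt-engine/api/vms/VM_ID/backups/BACKUP_ID/finalize"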