Error attaching snapshot via Python SDK after update to oVirt 4.5
by luis.figueiredo10@gmail.com
Hi,
I have a problem using the Python SDK to make backups of virtual machine images after updating to oVirt 4.5.
Before the update, with oVirt 4.4 and CentOS 8, everything worked.
Scenario:
Ovirt-engine -> installed standalone, CentOS 9 Stream
Host -> CentOS 9 Stream - ISO used was ovirt-node-ng-installer-latest-el9.iso
Storage -> iSCSI
ovirt-backup -> VM running on the host, CentOS 8 Stream, with python3-ovirt-engine-sdk4 installed
Versions:
Version 4.5.6-1.el9
To reproduce the error, do the following:
1 -> Create a snapshot via the GUI
2 -> Run this script to find the snapshot ID
```
#!/bin/python
import sys
import time
import re
import configparser

import printf  # project-local helper shipped with VirtBKP (provides printf.OK / printf.ERROR)
import ovirtsdk4 as sdk

# Read the connection details from the VirtBKP configuration file
cfg = configparser.ConfigParser()
cfg.readfp(open("/opt/VirtBKP/default.conf"))
url = cfg.get('ovirt-engine', 'api_url')
user = cfg.get('ovirt-engine', 'api_user')
password = cfg.get('ovirt-engine', 'api_password')
ca_file = cfg.get('ovirt-engine', 'api_ca_file')

connection = None
try:
    connection = sdk.Connection(url, user, password, ca_file)
    # printf.OK("Connection to oVirt API success %s" % url)
except Exception as ex:
    print(ex)
    printf.ERROR("Connection to oVirt API has failed")

# List every VM and its snapshots
system_service = connection.system_service()
vms_service = system_service.vms_service()
vms = vms_service.list()
for vm in vms:
    vm_service = vms_service.vm_service(vm.id)
    snaps_service = vm_service.snapshots_service()
    snaps_map = {
        snap.id: snap.description
        for snap in snaps_service.list()
    }
    for snap_id, snap_description in snaps_map.items():
        print("VM: " + vm.name + ": " + snap_description + " " + snap_id)

# Close the connection to the server:
connection.close()
```
When I run the script I get the VM name plus snapshot ID, so the connection to the API is OK.
```
[root@ovirt-backup-lab VirtBKP]# ./list_machines_with_snapshots_all
VM: ovirt-backup: Active VM d9631ff9-3a67-49af-b6ae-ed4d164c38ee
VM: ovirt-backup: clean 07478a39-a1e8-4d42-bdf9-aa3464ee85a2
VM: testes: Active VM 46f6ea97-b182-496a-89db-278ca8bcc952
VM: testes: testes123 729d64cb-c939-400c-b7f7-8e675c37a882
```
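(As an aside, the same lookup can be narrowed to a single VM with a search query instead of iterating over every VM; a small sketch reusing the connection and vms_service objects from the script above, with 'testes' only as an example name:)
```
# Sketch: list the snapshots of one VM selected by name ('testes' is an example)
vm = vms_service.list(search='name=testes')[0]
for snap in vms_service.vm_service(vm.id).snapshots_service().list():
    print("VM: " + vm.name + ": " + snap.description + " " + snap.id)
```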
3 -> The problem occurs when I try to attach the snapshot disks to the agent VM.
I run the script below, which ran perfectly on the previous setup with oVirt 4.4 and CentOS 8.
```
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2017 Red Hat, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import subprocess
import os
import time
DATE = str((time.strftime("%Y%m%d-%H%M")))
# In order to send events we need to also send unique integer ids. These
# should usually come from an external database, but in this example we
# will just generate them from the current time in seconds since Jan 1st
# 1970.
event_id = int(time.time())
from dateutil.relativedelta import relativedelta
import uuid
import ovirtsdk4 as sdk
import ovirtsdk4.types as types
# Init the logging
import logging
logging.basicConfig(level=logging.DEBUG, filename='/var/log/virtbkp/dumpdebug.log')
# This function tries to find the device by its SCSI serial,
# which is the disk id.
#
def get_logical_name(diskid):
    logicalname = "None"
    loop = True
    timeout = 60
    i = 1
    while loop:
        if i <= timeout:
            logging.debug('[%s] Looking for disk with id \'%s\' (%s/%s).',str((time.strftime("%Y%m%d-%H%M"))),diskid,str(i),str(timeout))
            # query udev for a disk whose SCSI serial matches the disk id
            import pyudev
            devices = pyudev.Context().list_devices()
            for d in devices.match_property('SCSI_IDENT_SERIAL',diskid):
                if d.properties.get('DEVTYPE') == "disk":
                    logging.debug('[%s] found disk with logical name \'%s\'',str((time.strftime("%Y%m%d-%H%M"))),d.device_node)
                    logicalname = d.device_node
                    loop = False
                    continue
            # reload udev rules at one third and two thirds of the timeout
            if i == int(timeout/3) or i == int((timeout/3)*2):
                os.system("udevadm control --reload-rules && udevadm trigger")
                logging.debug('[%s] Reloading udev.',str((time.strftime("%Y%m%d-%H%M"))))
            i += 1
            time.sleep(1)
        else:
            logging.error('[%s] Timeout reached, something is wrong because we did not find the disk!',str((time.strftime("%Y%m%d-%H%M"))))
            loop = False
    return logicalname
# cmd="for d in `echo /sys/block/[sv]d*`; do disk=\"`echo $d | cut -d '/' -f4`\"; udevadm info --query=property --name /dev/${disk} | grep '"+diskid+"' 1>/dev/null && echo ${disk}; done"
#
# logging.debug('[%s] using cmd \'[%s]\'.',str((time.strftime("%Y%m%d-%H%M"))),cmd)
# while loop:
# try:
# logging.debug('[%s] running command %s/%s: \'%s\'.',str((time.strftime("%Y%m%d-%H%M"))),str(i),str(timeout),cmd)
# path = subprocess.check_output(cmd, shell=True, universal_newlines=True).replace("\n","")
# logging.debug('[%s] path is \'[%s]\'.',str((time.strftime("%Y%m%d-%H%M"))),str(path))
# if path.startswith("vd") or path.startswith("sd") :
# logicalname = "/dev/" + path
# except:
# if i <= timeout:
# logging.debug('[%s] Looking for disk with id \'%s\'. %s/%s.',str((time.strftime("%Y%m%d-%H%M"))),diskid,str(i),str(timeout))
# time.sleep(1)
# else:
# logging.debug('[%s] something wrong because we did not find this, will dump the disks attached now!',str((time.strftime("%Y%m%d-%H%M"))))
# cmd="for disk in `echo /dev/sd*`; do echo -n \"${disk}: \"; udevadm info --query=property --name $disk|grep SCSI_SERIAL; done"
# debug = subprocess.check_output(cmd, shell=True, universal_newlines=True)
# logging.debug('%s',str(debug))
# loop=False
# i+=1
# continue
# if str(logicalname) != "None":
# logging.debug('[%s] Found disk with id \'%s\' have logical name \'%s\'.',str((time.strftime("%Y%m%d-%H%M"))),diskid,logicalname)
# loop=False
# return logicalname
# This function is intended to create the backup image from the
# disk identified on the agent machine.
# We assume this will be run on the guest machine with the
# disks attached.
def create_qemu_backup(backupdir, logicalname, diskid, diskalias, event_id):
    # Advanced options for qemu-img convert, check "man qemu-img"
    qemu_options = "-o cluster_size=2M"
    # Timeout defined for the qemu execution time
    # 3600 = 1h, 7200 = 2h, ...
    qemu_exec_timeout = 7200
    # Define output file name and path
    ofile = backupdir + "/" + diskalias + ".qcow2"
    # Exec command for making the backup
    cmd = "qemu-img convert -O qcow2 " + qemu_options + " " + logicalname + " " + ofile
    logging.debug('[%s] Will backup with command \'%s\' with defined timeout \'%s\' seconds.',str((time.strftime("%Y%m%d-%H%M"))),cmd,str(qemu_exec_timeout))
    try:
        disktimeStarted = time.time()
        logging.info('[%s] QEMU backup starting, please hang on while we finish...',str((time.strftime("%Y%m%d-%H%M"))))
        events_service.add(
            event=types.Event(
                vm=types.Vm(
                    id=data_vm.id,
                ),
                origin=APPLICATION_NAME,
                severity=types.LogSeverity.NORMAL,
                custom_id=event_id,
                description=(
                    'QEMU backup starting for disk \'%s\'.' % diskalias
                ),
            ),
        )
        event_id += 1
        run = subprocess.check_output(cmd, shell=True, timeout=qemu_exec_timeout, universal_newlines=True, stderr=subprocess.STDOUT)
        disktimeDelta = time.time() - disktimeStarted
        diskrt = relativedelta(seconds=disktimeDelta)
        diskexectime = ('{:02d}:{:02d}:{:02d}'.format(int(diskrt.hours), int(diskrt.minutes), int(diskrt.seconds)))
        logging.info('[%s] Backup finished successfully for disk \'%s\' in \'%s\'.',str((time.strftime("%Y%m%d-%H%M"))),diskalias,str(diskexectime))
        events_service.add(
            event=types.Event(
                vm=types.Vm(
                    id=data_vm.id,
                ),
                origin=APPLICATION_NAME,
                severity=types.LogSeverity.NORMAL,
                custom_id=event_id,
                description=(
                    'QEMU backup finished for disk \'%s\' in \'%s\'.' % (diskalias, str(diskexectime))
                ),
            ),
        )
        event_id += 1
        return event_id
    except subprocess.TimeoutExpired as t:
        logging.error('[%s] Timeout of \'%s\' seconds expired, process \'%s\' killed.',str((time.strftime("%Y%m%d-%H%M"))),str(t.timeout),cmd)
        events_service.add(
            event=types.Event(
                vm=types.Vm(
                    id=data_vm.id,
                ),
                origin=APPLICATION_NAME,
                severity=types.LogSeverity.ERROR,
                custom_id=event_id,
                description=(
                    'Timeout of \'%s\' seconds expired, process \'%s\' killed.' % (str(t.timeout), cmd)
                ),
            ),
        )
        event_id += 1
        return event_id
    except subprocess.CalledProcessError as e:
        logging.error('[%s] Execution error, command output was:',str((time.strftime("%Y%m%d-%H%M"))))
        events_service.add(
            event=types.Event(
                vm=types.Vm(
                    id=data_vm.id,
                ),
                origin=APPLICATION_NAME,
                severity=types.LogSeverity.ERROR,
                custom_id=event_id,
                description=('Execution error.'),
            ),
        )
        event_id += 1
        logging.error('[%s] %s',str((time.strftime("%Y%m%d-%H%M"))),str(e.output))
        events_service.add(
            event=types.Event(
                vm=types.Vm(
                    id=data_vm.id,
                ),
                origin=APPLICATION_NAME,
                severity=types.LogSeverity.ERROR,
                custom_id=event_id,
                description=(
                    '\'%s\'' % (str(e.output))
                ),
            ),
        )
        event_id += 1
        return event_id
# Arguments
import sys
DATA_VM_BYPASSDISKS="None"
if len(sys.argv) < 3:
print("You must specify the right arguments!")
exit(1)
elif len(sys.argv) < 4:
DATA_VM_NAME = sys.argv[1]
SNAP_ID = sys.argv[2]
else:
exit(2)
logging.debug(
'[%s] Launched with arguments on vm \'%s\' and bypass disks \'%s\'.',
str((time.strftime("%Y%m%d-%H%M"))),
DATA_VM_NAME,
DATA_VM_BYPASSDISKS
)
# Parse the ini file with the configuration
import configparser
cfg = configparser.ConfigParser()
cfg.readfp(open("/opt/VirtBKP/default.conf"))
#BCKDIR = '/mnt/backups'
BCKDIR = cfg.get('ovirt-engine','backupdir')
# The connection details:
API_URL = cfg.get('ovirt-engine','api_url')
API_USER = cfg.get('ovirt-engine','api_user')
API_PASSWORD = cfg.get('ovirt-engine','api_password')
# The file containing the certificate of the CA used by the server. In
# a usual installation it will be in the file '/etc/pki/ovirt-engine/ca.pem'.
#API_CA_FILE = '/opt/VirtBKP/ca.crt'
API_CA_FILE = cfg.get('ovirt-engine','api_ca_file')
# The name of the application, to be used as the 'origin' of events
# sent to the audit log:
APPLICATION_NAME = 'Image Backup Service'
# The name of the virtual machine where we will attach the disks in
# order to actually back them up. This virtual machine will usually have
# some kind of backup software installed.
#AGENT_VM_NAME = 'ovirt-backup'
AGENT_VM_NAME = cfg.get('ovirt-engine','agent_vm_name')
## Connect to the server:
#connection = sdk.Connection(
# url=API_URL,
# username=API_USER,
# password=API_PASSWORD,
# ca_file=API_CA_FILE,
# debug=True,
# log=logging.getLogger(),
#)
#logging.info('[%s] Connected to the server.',str((time.strftime("%Y%m%d-%H%M"))))
cfg = configparser.ConfigParser()
cfg.readfp(open("/opt/VirtBKP/default.conf"))
url = cfg.get('ovirt-engine', 'api_url')
user = cfg.get('ovirt-engine', 'api_user')
password = cfg.get('ovirt-engine', 'api_password')
ca_file = cfg.get('ovirt-engine', 'api_ca_file')
connection = None
try:
    connection = sdk.Connection(url, user, password, ca_file)
    logging.info('[%s] Connected to the server.',str((time.strftime("%Y%m%d-%H%M"))))
except Exception as ex:
    print(ex)
    logging.error('[%s] Connection to oVirt API has failed: %s',str((time.strftime("%Y%m%d-%H%M"))),ex)
# Get the reference to the root of the services tree:
system_service = connection.system_service()
# Get the reference to the service that we will use to send events to
# the audit log:
events_service = system_service.events_service()
# Timer count for global process
totaltimeStarted = time.time()
# Get the reference to the service that manages the virtual machines:
vms_service = system_service.vms_service()
# Find the virtual machine that we want to back up. Note that we need to
# use the 'all_content' parameter to retrieve the OVF, as
# it isn't retrieved by default:
data_vm = vms_service.list(
search='name=%s' % DATA_VM_NAME,
all_content=True,
)[0]
logging.info(
'[%s] Found data virtual machine \'%s\', the id is \'%s\'.',
str((time.strftime("%Y%m%d-%H%M"))), data_vm.name, data_vm.id,
)
# Find the virtual machine where we will attach the disks in order to do
# the backup:
agent_vm = vms_service.list(
search='name=%s' % AGENT_VM_NAME,
)[0]
logging.info(
'[%s] Found agent virtual machine \'%s\', the id is \'%s\'.',
str((time.strftime("%Y%m%d-%H%M"))), agent_vm.name, agent_vm.id,
)
# Find the services that manage the data and agent virtual machines:
data_vm_service = vms_service.vm_service(data_vm.id)
agent_vm_service = vms_service.vm_service(agent_vm.id)
# Create a unique description for the snapshot, so that it is easier
# for the administrator to identify this snapshot as a temporary one
# created just for backup purposes:
#snap_description = '%s-backup-%s' % (data_vm.name, uuid.uuid4())
snap_description = 'BACKUP_%s_%s' % (data_vm.name, DATE)
# Send an external event to indicate to the administrator that the
# backup of the virtual machine is starting. Note that the description
# of the event contains the name of the virtual machine and the name of
# the temporary snapshot, this way, if something fails, the administrator
# will know what snapshot was used and remove it manually.
#events_service.add(
# event=types.Event(
# vm=types.Vm(
# id=data_vm.id,
# ),
# origin=APPLICATION_NAME,
# severity=types.LogSeverity.NORMAL,
# custom_id=event_id,
# description=(
# 'Backup of virtual machine \'%s\' using snapshot \'%s\' is '
# 'starting.' % (data_vm.name, snap_description)
# ),
# ),
#)
#event_id += 1
# Create the structure we will use to deploy the backup data
#bckfullpath = BCKDIR + "/" + data_vm.name + "/" + str((time.strftime("%Y%m%d-%H%M")))
#mkdir = "mkdir -p " + bckfullpath
#subprocess.call(mkdir, shell=True)
#logging.debug(
# '[%s] Created directory \'%s\' as backup destination.',
# str((time.strftime("%Y%m%d-%H%M"))),
# bckfullpath
#)
# Send the request to create the snapshot. Note that this will return
# before the snapshot is completely created, so we will later need to
# wait till the snapshot is completely created.
# The snapshot will not include memory. Set the persist_memorystate parameter
# to True to include it (in that case the VM will be paused for a while).
snaps_service = data_vm_service.snapshots_service()
#snap = snaps_service.add(
# snapshot=types.Snapshot(
# description=snap_description,
# persist_memorystate=False,
# ),
#)
#logging.info(
# '[%s] Sent request to create snapshot \'%s\', the id is \'%s\'.',
# str((time.strftime("%Y%m%d-%H%M"))), snap.description, snap.id,
#)
# Poll and wait till the status of the snapshot is 'ok', which means
# that it is completely created:
snap_id = SNAP_ID
snap_service = snaps_service.snapshot_service(snap_id)
#while snap.snapshot_status != types.SnapshotStatus.OK:
# logging.info(
# '[%s] Waiting till the snapshot is created, the status is now \'%s\'.',
# str((time.strftime("%Y%m%d-%H%M"))),
# snap.snapshot_status
# )
# time.sleep(1)
# snap = snap_service.get()
#logging.info('[%s] The snapshot is now complete.',str((time.strftime("%Y%m%d-%H%M"))))
# Retrieve the descriptions of the disks of the snapshot:
snap_disks_service = snap_service.disks_service()
snap_disks = snap_disks_service.list()
# Attach all the disks of the snapshot to the agent virtual machine, and
# save the resulting disk attachments in a list so that we can later
# detach them easily:
attachments_service = agent_vm_service.disk_attachments_service()
attachments = []
for snap_disk in snap_disks:
    attachment = attachments_service.add(
        attachment=types.DiskAttachment(
            disk=types.Disk(
                id=snap_disk.id,
                snapshot=types.Snapshot(
                    id=snap_id,
                ),
            ),
            active=True,
            bootable=False,
            interface=types.DiskInterface.VIRTIO_SCSI,
        ),
    )
    attachments.append(attachment)
    logging.info(
        '[%s] Attached disk \'%s\' to the agent virtual machine \'%s\'.',
        str((time.strftime("%Y%m%d-%H%M"))), attachment.disk.id, agent_vm.name
    )
    print(f"Attach disk:{attachment.disk.id} to the agent vm:{agent_vm.name}")
# Now that the disks are attached to the agent virtual machine, we
# can ask that virtual machine to perform the backup. Doing that
# requires a mechanism to talk to the backup software that runs inside the
# agent virtual machine. That is outside of the scope of the SDK. But if
# the guest agent is installed in the virtual machine then we can
# provide useful information, like the identifiers of the disks that have
# just been attached.
#for attachment in attachments:
# if attachment.logical_name is not None:
# logging.info(
# '[%s] Logical name for disk \'%s\' is \'%s\'.',
# str((time.strftime("%Y%m%d-%H%M"))), attachment.disk.id, attachment.logical_name,
# )
# else:
# logging.info(
# '[%s] The logical name for disk \'%s\' isn\'t available. Is the '
# 'guest agent installed?',
# str((time.strftime("%Y%m%d-%H%M"))),
# attachment.disk.id,
# )
# Close the connection to the server:
connection.close()
```
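(For completeness: the script above ends right after the attach loop, so the detach/cleanup step that would normally follow the qemu-img copy is not shown. A rough sketch of it, reusing the attachments, attachments_service and snap_service objects from the script, could look like this:)
```
# Rough cleanup sketch (not part of the script above): detach the snapshot
# disks from the agent VM after the copy, and optionally drop the snapshot.
for attachment in attachments:
    attachment_service = attachments_service.attachment_service(attachment.id)
    attachment_service.remove()
    logging.info('Detached disk \'%s\' from the agent virtual machine.', attachment.disk.id)
# snap_service.remove()  # only if the snapshot was created by the backup job itself
```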
The result of running the script is:
```
[root@ovirt-backup-lab VirtBKP]# ./teste.py ovirt-backup "729d64cb-c939-400c-b7f7-8e675c37a882"
Traceback (most recent call last):
File "./teste.py", line 412, in <module>
interface=types.DiskInterface.VIRTIO_SCSI,
File "/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py", line 7147, in add
return self._internal_add(attachment, headers, query, wait)
File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 232, in _internal_add
return future.wait() if wait else future
File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 55, in wait
return self._code(response)
File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 229, in callback
self._check_fault(response)
File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 132, in _check_fault
self._raise_error(response, body)
File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 118, in _raise_error
raise error
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Failed to hot-plug disk]". HTTP response code is 400.
```
On the engine log I have this:
```
2024-09-25 18:30:00,132+01 INFO [org.ovirt.engine.core.sso.service.AuthenticationService] (default task-27) [] User admin@internal-authz with profile [internal] successfully logged in with scopes: ovirt-app-api ovirt-ext=token-info:authz-search ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate ovirt-ext=token:password-access
2024-09-25 18:30:00,146+01 INFO [org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default task-27) [3f712b2] Running command: CreateUserSessionCommand internal: false.
2024-09-25 18:30:00,149+01 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-27) [3f712b2] EVENT_ID: USER_VDC_LOGIN(30), User admin@internal-authz connecting from '172.24.0.13' using session 'zVRF1cTF22ex9oRy2wcK/NIjfr2FwY2+AqhcRFv02J6tDPEoAC7YB329VVcrGrPoQcxJLRIokEDvt8j/PySxcg==' logged in.
2024-09-25 18:30:00,404+01 INFO [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] Lock Acquired to object 'EngineLock:{exclusiveLocks='[6dc760fb-13bd-4285-b474-2ff5e39af74e=DISK]', sharedLocks=''}'
2024-09-25 18:30:00,415+01 INFO [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] Running command: AttachDiskToVmCommand internal: false. Entities affected : ID: e302e3ac-6101-4317-b19d-46d951237122 Type: VMAction group CONFIGURE_VM_STORAGE with role type USER, ID: 6dc760fb-13bd-4285-b474-2ff5e39af74e Type: DiskAction group ATTACH_DISK with role type USER
2024-09-25 18:30:00,421+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] START, HotPlugDiskVDSCommand(HostName = hv1, HotPlugDiskVDSParameters:{hostId='35bd2f95-464f-4080-98e9-729a05f1a39b', vmId='e302e3ac-6101-4317-b19d-46d951237122', diskId='6dc760fb-13bd-4285-b474-2ff5e39af74e'}), log id: 7c9d787c
2024-09-25 18:30:00,422+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
<devices>
<disk snapshot="no" type="file" device="disk">
<target dev="sda" bus="scsi"/>
<source file="/rhev/data-center/mnt/blockSD/0b52e59e-fe08-4e18-8273-228955bba3b7/images/6dc760fb-13bd-4285-b474-2ff5e39af74e/46bcd0f2-33ff-40a3-a32b-f39b34e99f77">
<seclabel model="dac" type="none" relabel="no"/>
</source>
<driver name="qemu" io="threads" type="qcow2" error_policy="stop" cache="writethrough"/>
<alias name="ua-6dc760fb-13bd-4285-b474-2ff5e39af74e"/>
<address bus="0" controller="0" unit="1" type="drive" target="0"/>
<serial>6dc760fb-13bd-4285-b474-2ff5e39af74e</serial>
</disk>
</devices>
<metadata xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
<ovirt-vm:vm>
<ovirt-vm:device devtype="disk" name="sda">
<ovirt-vm:poolID>042813a0-7a69-11ef-a340-ac1f6b165d0d</ovirt-vm:poolID>
<ovirt-vm:volumeID>46bcd0f2-33ff-40a3-a32b-f39b34e99f77</ovirt-vm:volumeID>
<ovirt-vm:shared>transient</ovirt-vm:shared>
<ovirt-vm:imageID>6dc760fb-13bd-4285-b474-2ff5e39af74e</ovirt-vm:imageID>
<ovirt-vm:domainID>0b52e59e-fe08-4e18-8273-228955bba3b7</ovirt-vm:domainID>
</ovirt-vm:device>
</ovirt-vm:vm>
</metadata>
</hotplug>
2024-09-25 18:30:11,069+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] Failed in 'HotPlugDiskVDS' method
2024-09-25 18:30:11,075+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM hv1 command HotPlugDiskVDS failed: internal error: unable to execute QEMU command 'blockdev-add': Could not open '/rhev/data-center/mnt/blockSD/0b52e59e-fe08-4e18-8273-228955bba3b7/images/6dc760fb-13bd-4285-b474-2ff5e39af74e/46bcd0f2-33ff-40a3-a32b-f39b34e99f77': No such file or directory
2024-09-25 18:30:11,075+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand' return value 'StatusOnlyReturn [status=Status [code=45, message=internal error: unable to execute QEMU command 'blockdev-add': Could not open '/rhev/data-center/mnt/blockSD/0b52e59e-fe08-4e18-8273-228955bba3b7/images/6dc760fb-13bd-4285-b474-2ff5e39af74e/46bcd0f2-33ff-40a3-a32b-f39b34e99f77': No such file or directory]]'
2024-09-25 18:30:11,075+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] HostName = hv1
2024-09-25 18:30:11,075+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] Command 'HotPlugDiskVDSCommand(HostName = hv1, HotPlugDiskVDSParameters:{hostId='35bd2f95-464f-4080-98e9-729a05f1a39b', vmId='e302e3ac-6101-4317-b19d-46d951237122', diskId='6dc760fb-13bd-4285-b474-2ff5e39af74e'})' execution failed: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = internal error: unable to execute QEMU command 'blockdev-add': Could not open '/rhev/data-center/mnt/blockSD/0b52e59e-fe08-4e18-8273-228955bba3b7/images/6dc760fb-13bd-4285-b474-2ff5e39af74e/46bcd0f2-33ff-40a3-a32b-f39b34e99f77': No such file or directory, code = 45
2024-09-25 18:30:11,075+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] FINISH, HotPlugDiskVDSCommand, return: , log id: 7c9d787c
2024-09-25 18:30:11,075+01 ERROR [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] Command 'org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = internal error: unable to execute QEMU command 'blockdev-add': Could not open '/rhev/data-center/mnt/blockSD/0b52e59e-fe08-4e18-8273-228955bba3b7/images/6dc760fb-13bd-4285-b474-2ff5e39af74e/46bcd0f2-33ff-40a3-a32b-f39b34e99f77': No such file or directory, code = 45 (Failed with error FailedToPlugDisk and code 45)
2024-09-25 18:30:11,076+01 INFO [org.ovirt.engine.core.bll.CommandCompensator] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] Command [id=4181b4c0-f988-4e96-8dbc-6917328bfdc5]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.storage.DiskVmElement; snapshot: VmDeviceId:{deviceId='6dc760fb-13bd-4285-b474-2ff5e39af74e', vmId='e302e3ac-6101-4317-b19d-46d951237122'}.
2024-09-25 18:30:11,076+01 INFO [org.ovirt.engine.core.bll.CommandCompensator] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] Command [id=4181b4c0-f988-4e96-8dbc-6917328bfdc5]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.VmDevice; snapshot: VmDeviceId:{deviceId='6dc760fb-13bd-4285-b474-2ff5e39af74e', vmId='e302e3ac-6101-4317-b19d-46d951237122'}.
2024-09-25 18:30:11,083+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] EVENT_ID: USER_FAILED_ATTACH_DISK_TO_VM(2,017), Failed to attach Disk testes_Disk1 to VM ovirt-backup (User: admin@internal-authz).
2024-09-25 18:30:11,083+01 INFO [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-27) [8ee9996a-2ef8-42a4-a72b-18d406cc9199] Lock freed to object 'EngineLock:{exclusiveLocks='[6dc760fb-13bd-4285-b474-2ff5e39af74e=DISK]', sharedLocks=''}'
2024-09-25 18:30:11,083+01 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-27) [] Operation Failed: [Failed to hot-plug disk]
```
This is the scenario in my lab test. In production I have 4 hosts, 3 on CentOS 8 and 1 on the CentOS 9 oVirt Node image. If my ovirt-backup machine is running on the CentOS 9 oVirt Node host I get the same error as above, but if I migrate ovirt-backup to a CentOS 8 host I do not have this problem.
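(For reference, which host the backup VM is currently running on can also be checked through the SDK; a small sketch reusing the connection from the script above, with 'ovirt-backup' as the VM name:)
```
# Sketch: show the host that the agent VM is currently running on
agent = vms_service.list(search='name=ovirt-backup')[0]
if agent.host is not None:
    host = system_service.hosts_service().host_service(agent.host.id).get()
    print("ovirt-backup is running on host: " + host.name)
```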
Any idea what could cause this problem?
Sorry if my English isn't understandable
Thanks
Luís Figueiredo
Re: [External] : Lost console access to VMs after updating
by malcolm.strydom@pacxa.com
Thanks Marcos,
I verified that VNC encryption had not been turned on at the cluster level, as you described. Using the UI, I put each host into maintenance mode one at a time and then chose the reinstall option. After it was done and the host rebooted I tested again, and there was no change. I still have no consoles to any VMs.
Malcolm
cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center Default.
by Bill James
I'm trying to add 3 more nodes to an active cluster. 3 previous nodes are
working fine.
I've tried 2 new nodes and they get the same error.
It seems strange to me because the volume is mounted.
[root@ovirt5n prod vdsm]# df -h|grep rhev
10.2.2.230:/vol/ovirt_inside_export 440G 57G 384G 13% /rhev/data-center/mnt/10.2.2.230:_vol_ovirt__inside__export
ovirt1n-gl.j2noc.com:/gv0 11T 239G 11T 3% /rhev/data-center/mnt/glusterSD/ovirt1n-gl.j2noc.com:_gv0
engine.log says:
cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center
Default.
vdsm.log says:
2024-10-09 14:55:06,726-0700 ERROR (jsonrpc/2) [storage.dispatcher] FINISH connectStoragePool error=[Errno 13] Permission denied (dispatcher:70)
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/vdsm/storage/dispatcher.py", line 57, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/vdsm/storage/task.py", line 93, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python3.9/site-packages/vdsm/storage/task.py", line 1173, in prepare
  .....
  File "/usr/lib/python3.9/site-packages/ioprocess/__init__.py", line 479, in _sendCommand
    raise OSError(errcode, errstr)
PermissionError: [Errno 13] Permission denied
2024-10-09 14:55:06,726-0700 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call StoragePool.connect failed (error 302) in 2.09 seconds (__init__:300)
OS: Rocky 9
vdsm-4.50.5.1-1.el9.x86_64
Any ideas welcome.
Thanks
vdsm.log and engine.log attached.
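(Side note: since the ioprocess traceback ends in a permission error, one quick check on the new host is whether the mounted domain path is still owned by vdsm:kvm, uid/gid 36 on a standard oVirt host, and readable. A minimal sketch, using the gluster mount path from above:)
```
# Minimal check: ownership and mode of the mounted storage domain path.
# On an oVirt host the vdsm user and kvm group normally map to uid/gid 36.
import os, stat

path = "/rhev/data-center/mnt/glusterSD/ovirt1n-gl.j2noc.com:_gv0"
st = os.stat(path)
print("owner uid:gid =", st.st_uid, st.st_gid)   # expected 36:36 (vdsm:kvm)
print("mode =", stat.filemode(st.st_mode))
```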
freezing VMs
by change_jeeringly679@dralias.com
Hey,
I'm wondering if anyone is experiencing freezing VMs? Especially Windows servers and especially with a lot of RAM.
I found the following here:
https://forum.proxmox.com/threads/vms-freeze-with-100-cpu.127459/page-11
Post #218 and further down are really interesting, as the conclusion is that there might be a problem in kernels from 4.18.0-372.26.1 all the way up to the mainline 6.3 or LTS 6.1 kernels. The problem appears to be that mmu_notifier_seq is referenced as an integer in the is_page_fault_stale() function, causing KVM to freeze when the counter reaches the maximum integer value, 2,147,483,647.
Post #220 seems to state that it was fixed in kernel commit ba6e3fe25543.
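(Purely as an illustration of the wrap-around described above, not kernel code: a counter interpreted as a signed 32-bit integer goes negative one step past 2,147,483,647.)
```
# Illustration only: two's-complement wrap of a signed 32-bit counter
def to_int32(n):
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

print(to_int32(2147483647))      # 2147483647  (max signed 32-bit value)
print(to_int32(2147483647 + 1))  # -2147483648 (the counter wraps negative)
```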
The problem is that my nodes running on CentOS 8 are on kernel 4.18.0-408.el8.x86_64, which is affected. I could try to downgrade on those nodes, but that would lock me to the unsupported CentOS 8 oVirt nodes.
I tried a new oVirt node based on CentOS 9, and it comes with 5.14.0-514.el9.x86_64, which is also affected by the looks of it. I tried upgrading the kernel on CentOS 9 to the latest 6.1 LTS kernel and the latest 6.11 mainline kernel, and while the node works fine, it does not work for oVirt: the node cannot be activated once the new kernel is in use.
As I'm a fixer and not a developer, I think the task might be too big for me to fix oVirt and make it work with 6.1/6.3 kernels. My last attempt is going to be backporting the fix to the 5.14 kernel supplied with the CentOS 9 based oVirt nodes.
I know... I should probably look for a new solution, but oVirt has been running our many VMs quite well, at an affordable price. Yes, we have extra work fixing various issues that pop up from time to time, but if left alone it works well and is stable.
Has anyone else encountered these issues?
//J
Re: [External] : cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center Default.
by Bill James
basically following these steps:
https://www.ovirt.org/download/install_on_rhel.html
How do I know what storage domain it is saying it can't access?
From what I see in the engine log it's:
2024-10-10 08:40:09,219-07 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (EE-ManagedThreadFactory-engine-Thread-14) [] Could not connect host 'ovirt4n.j2noc.com' to pool 'Default': Error storage pool connection: ("spUUID=16933ed6-28dc-11ef-baff-c4cbe1c7b014, msdUUID=9df6c434-c848-4987-afd2-9fd8458ed78a, masterVersion=1, hostID=4, domainsMap={'9cc85468-53f3-49a9-bf05-359b0d715fd3': 'active', '9df6c434-c848-4987-afd2-9fd8458ed78a': 'active'}",)
9df6c434-c848-4987-afd2-9fd8458ed78a is ovirt1n-gv0 which is already
mounted and working fine.
[root@ovirt5n prod vdsm]# df -h |grep gv0
ovirt1n-gl.j2noc.com:/gv0 11T 239G 11T 3% /rhev/data-center/mnt/glusterSD/ovirt1n-gl.j2noc.com:_gv0
[root@ovirt5n prod vdsm]# cd /rhev/data-center/mnt/glusterSD/ovirt1n-gl.j2noc.com:_gv0
[root@ovirt5n prod ovirt1n-gl.j2noc.com:_gv0]# ls -l
total 4
drwxr-xr-x 5 vdsm kvm 4096 Sep 13 11:27 9df6c434-c848-4987-afd2-9fd8458ed78a
[root@ovirt5n prod ovirt1n-gl.j2noc.com:_gv0]# cd 9df6c434-c848-4987-afd2-9fd8458ed78a/images/
[root@ovirt5n prod images]# ls -l
total 48
drwxr-xr-x 2 vdsm kvm 4096 Sep 30 17:33 09393484-e5e8-49cc-bce4-393b01d84ebf
drwxr-xr-x 2 vdsm kvm 4096 Sep 13 14:41 0b761aa1-f61f-422f-acf4-43315745c671
drwxr-xr-x 2 vdsm kvm 4096 Sep 17 11:58 109e1d23-3007-424e-a787-56b5c2631e58
drwxr-xr-x 2 vdsm kvm 4096 Sep 13 15:53 11e7c409-686e-4e71-b9e8-8bb503fc91c0
drwxr-xr-x 2 vdsm kvm 4096 Sep 17 12:00 2e84682f-9351-4401-99fa-e58575567136
drwxr-xr-x 2 vdsm kvm 4096 Sep 13 14:46 8f1cd1ad-9836-4abd-bba8-e569ec38311e
drwxr-xr-x 2 vdsm kvm 4096 Sep 13 14:41 aaceb015-e849-4b40-bb4a-22af2ccb8a9d
drwxr-xr-x 2 vdsm kvm 4096 Sep 13 15:23 ada1430e-ba3d-4b02-85c8-d0ee01f5eb37
drwxr-xr-x 2 vdsm kvm 4096 Sep 17 11:59 b1950739-d63d-41e7-8b5b-c00279b6453b
drwxr-xr-x 2 vdsm kvm 4096 Sep 13 15:26 bb6c8234-23e9-47f4-93e2-5ebf6a028320
drwxr-xr-x 2 vdsm kvm 4096 Sep 30 16:55 c15562ca-a94e-4e8b-a957-81eb0441d9da
drwxr-xr-x 2 vdsm kvm 4096 Sep 30 17:33 cb1ffbda-229a-4393-a10a-b2e67bb80bfc
[root@ovirt5n prod images]#
Why does it say it can't access it?
On Thu, Oct 10, 2024 at 5:22 AM Marcos Sungaila <marcos.sungaila(a)oracle.com> wrote:
> Hi Bill,
>
> Which steps you ran through on the KVM hosts before trying to add them to the Cluster?
>
> Marcos
Importing Storage Domain's Export from one Data Center to Another Data Center
by calebmibaker@outlook.com
Is it possible to perform disaster recovery by importing a Storage Domain's export from one Data Center to another manually, without Ansible? I'm trying to import Data Center 1's storage domain export by cloning it and importing it into Data Center 2's environment via IP address and NFS name in oVirt's web portal. However, I receive this error message: "there is no storage domain under the specified path". Below is what I have done to try to troubleshoot the issue. After looking into the documentation, I haven't seen information on this method. Should I consider another option? Is there a preferred way of testing DR within an environment? Thank you.
- I have detached both of the export storage domains from both data centers before importing the cloned export.
- Changing import settings.
- I have adjusted the metadata information within the cloned export after mounting it and reviewing the logs.
iSCSI Storage Issue - move Hosts!?
by steve-pa@hotmail.com
Hello everyone,
I am currently trying to set up my VM environment again.
Status now:
Host1 + iSCSI storage connection
Plan:
Host2 + same iSCSI storage as above
Now I want to move all VMs to Host2. Host2 has already been added and I can start all VMs on it.
Problem:
Storage -> Manage Domain: Host1 is the host connected to the storage here, and I have no option to switch it to Host2.
What happens if I now put Host1 into maintenance mode or delete it completely? Will all my systems then crash because the storage connection has been lost?
Connect to hypervisor node with Virt-Manager
by luc.lalonde@polymtl.ca
Hello,
I can connect directly to one of my hypervisor nodes using virt-manager with these credentials:
Login: vdsm@ovirt
Password: shibboleth
I then get a listing of the machines on the node.
However, if I try to open the VM console, I'm prompted for another password... I have no idea what password to use.
Any ideas?
Thanks.