Error on incremental backup
by francesco@shellrent.com
Hi all,
I use a custom python script for performing full and incremental backup on few hundreds vms. I customized the example script in https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/backup....
Everything was working fine, but starting from a few days ago, some vms starts to fail the backup. I upgraded the engine (self-hosted) from 4.4.4 to 4.4.5 yesterday afternoon and today most of the vms failed the incremental and full backup.
The error is triggered when trying to get the backp_service in the following function:
```
def backup_disk(disk, backups_service, from_checkpoint_id):
# We need to set an array because backups_service.add need it
disks = []
disks.append(disk)
add_error = False
try:
backup = backups_service.add(
types.Backup(
disks=disks,
from_checkpoint_id=from_checkpoint_id
)
)
except Exception as e:
exception = str(e)
logging.error("error on backups_service.add, set backup object to False. Error: {} ".format(exception))
add_error = True
backup = False
if not add_error:
backup_service = backups_service.backup_service(backup.id)
logging.info("Backup phase: {}".format(backup.phase))
while backup.phase != types.BackupPhase.READY:
time.sleep(1)
try:
backup = backup_service.get()
except sdk.NotFoundError:
logging.error("backup_service NotFoundError, set backup object to False")
backup = False
break
if backup and backup.to_checkpoint_id is not None:
logging.info("Created checkpoint {}".format(backup.to_checkpoint_id))
return backup
```
Here's the error in the engine logs.
```
2021-04-21 11:31:21,750+02 INFO [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1) [] Thread pool 'default' is using 0 threads out of 1, 5 threads waiting for tasks.
2021-04-21 11:31:21,750+02 INFO [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1) [] Thread pool 'engine' is using 0 threads out of 500, 50 threads waiting for tasks and 0 tasks in queue.
2021-04-21 11:31:21,750+02 INFO [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1) [] Thread pool 'engineScheduledThreadPool' is using 2 threads out of 1, 98 threads waiting for tasks.
2021-04-21 11:31:21,750+02 INFO [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1) [] Thread pool 'engineThreadMonitoringThreadPool' is using 1 threads out of 1, 0 threads waiting for tasks.
2021-04-21 11:31:21,750+02 INFO [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1) [] Thread pool 'hostUpdatesChecker' is using 0 threads out of 5, 5 threads waiting for tasks.
2021-04-21 11:31:59,013+02 INFO [org.ovirt.engine.core.sso.service.AuthenticationService] (default task-19) [] User admin@internal-authz with profile [internal] successfully logged in with scopes: ovirt-app-api ovirt-ext=token-info:authz-search ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate ovirt-ext=token:password-access
2021-04-21 11:31:59,124+02 INFO [org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default task-19) [43d5f322] Running command: CreateUserSessionCommand internal: false.
2021-04-21 11:31:59,157+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-19) [43d5f322] EVENT_ID: USER_VDC_LOGIN(30), User admin@internal-authz connecting from '5.19.121.125' using session 'VN/aBJxvgcG1WBY9OQSjUnHL7HaSPhbLSX52inLiYFGG/gubMeZiv9Tu+3D4htqVwrb/kqlvMzBL3WAQXb+3Lg==' logged in.
2021-04-21 11:31:59,534+02 WARN [org.ovirt.engine.api.restapi.util.LinkHelper] (default task-19) [] Can't find relative path for class "org.ovirt.engine.api.resource.StorageDomainVmDiskAttachmentsResource", will return null
2021-04-21 11:31:59,535+02 WARN [org.ovirt.engine.api.restapi.util.LinkHelper] (default task-19) [] Can't find relative path for class "org.ovirt.engine.api.resource.StorageDomainVmDiskAttachmentsResource", will return null
2021-04-21 11:32:00,670+02 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Lock Acquired to object 'EngineLock:{exclusiveLocks='[0ff73c3a-fb49-4148-a3b4-6ce790523a49=DISK]', sharedLocks='[94ec9758-cfc1-41ba-ba71-e339d8df8cee=VM]'}'
2021-04-21 11:32:00,863+02 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Running command: StartVmBackupCommand internal: false. Entities affected : ID: 94ec9758-cfc1-41ba-ba71-e339d8df8cee Type: VMAction group BACKUP_DISK with role type ADMIN, ID: 0ff73c3a-fb49-4148-a3b4-6ce790523a49 Type: DiskAction group BACKUP_DISK with role type ADMIN
2021-04-21 11:32:00,863+02 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Creating VmBackup entity for VM '94ec9758-cfc1-41ba-ba71-e339d8df8cee'
2021-04-21 11:32:00,956+02 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Created VmBackup entity 'ee02f949-ccb9-452f-a1bb-8e05d0684b0f'
2021-04-21 11:32:00,956+02 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Redefine previous VM checkpoints for VM '94ec9758-cfc1-41ba-ba71-e339d8df8cee'
2021-04-21 11:32:01,316+02 INFO [org.ovirt.engine.core.bll.storage.backup.RedefineVmCheckpointCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Running command: RedefineVmCheckpointCommand internal: true. Entities affected : ID: 94ec9758-cfc1-41ba-ba71-e339d8df8cee Type: VMAction group BACKUP_DISK with role type ADMIN, ID: 0ff73c3a-fb49-4148-a3b4-6ce790523a49 Type: DiskAction group BACKUP_DISK with role type ADMIN
2021-04-21 11:32:01,318+02 INFO [org.ovirt.engine.core.bll.storage.backup.RedefineVmCheckpointCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] No previous VM checkpoints found for VM '94ec9758-cfc1-41ba-ba71-e339d8df8cee', skipping redefining VM checkpoints
2021-04-21 11:32:01,567+02 INFO [org.ovirt.engine.core.bll.storage.backup.RedefineVmCheckpointCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Lock freed to object 'EngineLock:{exclusiveLocks='[0ff73c3a-fb49-4148-a3b4-6ce790523a49=DISK]', sharedLocks='[94ec9758-cfc1-41ba-ba71-e339d8df8cee=VM]'}'
2021-04-21 11:32:01,567+02 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Successfully redefined previous VM checkpoints for VM '94ec9758-cfc1-41ba-ba71-e339d8df8cee'
2021-04-21 11:32:01,567+02 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Creating VmCheckpoint entity for VM '94ec9758-cfc1-41ba-ba71-e339d8df8cee'
2021-04-21 11:32:01,602+02 INFO [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Created VmCheckpoint entity 'b67a0436-42fb-4715-a0a5-6bfd9cb09214'
2021-04-21 11:32:01,736+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] EVENT_ID: VM_BACKUP_STARTED(10,790), Backup ee02f949-ccb9-452f-a1bb-8e05d0684b0f for VM vm1.cloudserver.tld started (User: admin@internal-authz).
2021-04-21 11:32:03,789+02 INFO [org.ovirt.engine.core.bll.storage.backup.CreateScratchDisksCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Running command: CreateScratchDisksCommand internal: true. Entities affected : ID: 94ec9758-cfc1-41ba-ba71-e339d8df8cee Type: VMAction group BACKUP_DISK with role type ADMIN
2021-04-21 11:32:03,910+02 INFO [org.ovirt.engine.core.bll.storage.backup.CreateScratchDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Running command: CreateScratchDiskCommand internal: true. Entities affected : ID: 4f6d464e-531a-431a-bfcc-ca9dfd1d0c9a Type: StorageAction group CREATE_DISK with role type USER
2021-04-21 11:32:03,910+02 INFO [org.ovirt.engine.core.bll.storage.backup.CreateScratchDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Creating scratch disk for disk ID 0ff73c3a-fb49-4148-a3b4-6ce790523a49
2021-04-21 11:32:03,977+02 INFO [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Running command: AddDiskCommand internal: true. Entities affected : ID: 4f6d464e-531a-431a-bfcc-ca9dfd1d0c9a Type: StorageAction group CREATE_DISK with role type USER
2021-04-21 11:32:04,038+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Running command: AddImageFromScratchCommand internal: true. Entities affected : ID: 4f6d464e-531a-431a-bfcc-ca9dfd1d0c9a Type: Storage
2021-04-21 11:32:04,109+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] EVENT_ID: FAILED_TO_STORE_ENTIRE_DISK_FIELD_IN_DISK_DESCRIPTION_METADATA(1,026), Failed to store field DiskDescription as a part of VM vm1.cloudserver.tld backup ee02f949-ccb9-452f-a1bb-8e05d0684b0f scratch disk for vm1.cloudserver.tld-0's description metadata due to storage space limitations. The field DiskDescription will be truncated.
2021-04-21 11:32:04,111+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='0532fe8c-d841-49ae-9615-45a7b865f3e3', ignoreFailoverLimit='false', storageDomainId='4f6d464e-531a-431a-bfcc-ca9dfd1d0c9a', imageGroupId='a419fb9b-486d-4a92-83d0-ef6c845ae5ba', imageSizeInBytes='268435456000', volumeFormat='COW', newImageId='0cbd890e-5bdc-4e8e-9e46-53bf5093cfd0', imageType='Sparse', newImageDescription='{"DiskAlias":"VM vm1.cloudserver.tld backup ee02f949-ccb9-452f-a1bb-8e05d0684b0f scratch disk for vm1.cloudserver.tld-0","DiskDescription":"Backup ee02f949-ccb9-452f-a1bb-8e05d0684b0f scratch di"}', imageInitialSizeInBytes='0', imageId='00000000-0000-0000-0000-000000000000', sourceImageGroupId='00000000-0000-0000-0000-000000000000', shouldAddBitmaps='false'}), log id: 34e22243
2021-04-21 11:32:04,137+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command CreateVolumeVDS failed: Invalid parameter: 'DiskType=SCRD'
2021-04-21 11:32:04,137+02 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Command 'CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='0532fe8c-d841-49ae-9615-45a7b865f3e3', ignoreFailoverLimit='false', storageDomainId='4f6d464e-531a-431a-bfcc-ca9dfd1d0c9a', imageGroupId='a419fb9b-486d-4a92-83d0-ef6c845ae5ba', imageSizeInBytes='268435456000', volumeFormat='COW', newImageId='0cbd890e-5bdc-4e8e-9e46-53bf5093cfd0', imageType='Sparse', newImageDescription='{"DiskAlias":"VM vm1.cloudserver.tld backup ee02f949-ccb9-452f-a1bb-8e05d0684b0f scratch disk for vm1.cloudserver.tld-0","DiskDescription":"Backup ee02f949-ccb9-452f-a1bb-8e05d0684b0f scratch di"}', imageInitialSizeInBytes='0', imageId='00000000-0000-0000-0000-000000000000', sourceImageGroupId='00000000-0000-0000-0000-000000000000', shouldAddBitmaps='false'})' execution failed: I
RSGenericException: IRSErrorException: Invalid parameter: 'DiskType=SCRD'
2021-04-21 11:32:04,137+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] FINISH, CreateVolumeVDSCommand, return: 00000000-0000-0000-0000-000000000000, log id: 34e22243
2021-04-21 11:32:04,137+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Command 'org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IrsOperationFailedNoFailoverException: IRSGenericException: IRSErrorException: Invalid parameter: 'DiskType=SCRD' (Failed with error InvalidParameterException and code 1000)
2021-04-21 11:32:04,170+02 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Removed task 'cb1f10fc-13e6-4ead-a33f-fad342a69ea2' from DataBase
```
Here's the SPM host vdsm logs:
```
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in _run
return fn(*args, **kargs)
File "<decorator-gen-61>", line 2, in createVolume
File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
ret = func(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1510, in createVolume
add_bitmaps=addBitmaps)
File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 1002, in validateCreateVolumeParams
add_bitmaps=add_bitmaps)
File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 649, in validateCreateVolumeParams
raise se.InvalidParameterException("DiskType", diskType)
vdsm.storage.exception.InvalidParameterException: Invalid parameter: 'DiskType=SCRD
```
It's a known bug in 4.4.4/4.4.5? Any suggestion?
If needed any other info lemme know, I'll provide you asap.
Thank you for your time
Francesco