Hi,

I upgraded one of the hosts where the backup failed, because I noticed that on the most recently deployed hosts the process was smooth and error-free. Maybe the update of one of the following packages (or others) fixed the issue:

FROM

[root@host1.server.tld ~]# rpm -qa | grep vdsm
vdsm-common-4.40.26.3-1.el8.noarch
vdsm-http-4.40.26.3-1.el8.noarch
vdsm-hook-vhostmd-4.40.26.3-1.el8.noarch
vdsm-yajsonrpc-4.40.26.3-1.el8.noarch
vdsm-network-4.40.26.3-1.el8.x86_64
vdsm-api-4.40.26.3-1.el8.noarch
vdsm-4.40.26.3-1.el8.x86_64
vdsm-hook-ethtool-options-4.40.26.3-1.el8.noarch
vdsm-client-4.40.26.3-1.el8.noarch
vdsm-hook-vmfex-dev-4.40.26.3-1.el8.noarch
vdsm-hook-openstacknet-4.40.26.3-1.el8.noarch
vdsm-jsonrpc-4.40.26.3-1.el8.noarch
vdsm-hook-fcoe-4.40.26.3-1.el8.noarch
vdsm-python-4.40.26.3-1.el8.noarch

TO

[root@host2.server.tld ~]# rpm -qa | grep vdsm
vdsm-4.40.50.9-1.el8.x86_64
vdsm-jsonrpc-4.40.50.9-1.el8.noarch
vdsm-api-4.40.50.9-1.el8.noarch
vdsm-python-4.40.50.9-1.el8.noarch
vdsm-yajsonrpc-4.40.50.9-1.el8.noarch
vdsm-hook-openstacknet-4.40.50.9-1.el8.noarch
vdsm-client-4.40.50.9-1.el8.noarch
vdsm-hook-vhostmd-4.40.50.9-1.el8.noarch
vdsm-hook-fcoe-4.40.50.9-1.el8.noarch
vdsm-common-4.40.50.9-1.el8.noarch
vdsm-http-4.40.50.9-1.el8.noarch
vdsm-hook-ethtool-options-4.40.50.9-1.el8.noarch
vdsm-network-4.40.50.9-1.el8.x86_64
vdsm-hook-vmfex-dev-4.40.50.9-1.el8.noarch


I shut down the VMs and upgraded the host via the engine (without a reboot). I hope it was the upgraded packages that helped, and not just the fact that the VMs were restarted.

Is there any way to upgrade the hosts/vdsm packages without shutting down the VMs? Sadly, migrating them is not an option.

Regards,
Francesco

On 22/04/2021 09:37, francesco--- via Devel wrote:
Hi all,

I use a custom Python script to perform full and incremental backups on a few hundred VMs. It is based on the example script at https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/backup_vm.py.
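
For reference, this is roughly how the script obtains the backups_service that gets passed to the function below. It is a simplified sketch following the structure of backup_vm.py; the engine URL, credentials and VM id are placeholders:

```
import ovirtsdk4 as sdk

# Placeholder engine URL and credentials -- replace with real values.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)

system_service = connection.system_service()
# Per-VM backups service, as used in backup_vm.py; the VM id is a placeholder.
vm_service = system_service.vms_service().vm_service('<vm-uuid>')
backups_service = vm_service.backups_service()
```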

Everything was working fine, but a few days ago some VMs started failing their backups. I upgraded the engine (self-hosted) from 4.4.4 to 4.4.5 yesterday afternoon, and today most of the VMs failed both incremental and full backups.

The error is triggered when trying to get the backup_service in the following function:

```
import logging
import time

import ovirtsdk4 as sdk
from ovirtsdk4 import types


def backup_disk(disk, backups_service, from_checkpoint_id):
    # backups_service.add expects a list of disks
    disks = [disk]
    add_error = False

    try:
        # Start the backup; from_checkpoint_id=None means a full backup,
        # a checkpoint id means an incremental backup.
        backup = backups_service.add(
            types.Backup(
                disks=disks,
                from_checkpoint_id=from_checkpoint_id
            )
        )
    except Exception as e:
        logging.error("error on backups_service.add, set backup object to False. Error: {}".format(e))
        add_error = True
        backup = False

    if not add_error:
        backup_service = backups_service.backup_service(backup.id)
        logging.info("Backup phase: {}".format(backup.phase))
        # Poll until the backup is ready for download.
        while backup.phase != types.BackupPhase.READY:
            time.sleep(1)
            try:
                backup = backup_service.get()
            except sdk.NotFoundError:
                logging.error("backup_service NotFoundError, set backup object to False")
                backup = False
                break

        if backup and backup.to_checkpoint_id is not None:
            logging.info("Created checkpoint {}".format(backup.to_checkpoint_id))

    return backup
```
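
For completeness, this is roughly how the function is called. The call site below is illustrative, reuses the connection from the sketch above, and uses a placeholder VM id:

```
# Illustrative call site, reusing 'connection' from the sketch above.
vm_id = '<vm-uuid>'
vm_service = connection.system_service().vms_service().vm_service(vm_id)
backups_service = vm_service.backups_service()

for attachment in vm_service.disk_attachments_service().list():
    disk = connection.follow_link(attachment.disk)
    # from_checkpoint_id=None starts a full backup; pass the previous
    # checkpoint id to run an incremental one instead.
    backup = backup_disk(disk, backups_service, from_checkpoint_id=None)
    if backup:
        logging.info("Backup {} reached phase {}".format(backup.id, backup.phase))
```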

Here's the error from the engine logs:

```
2021-04-21 11:31:21,750+02 INFO  [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1) [] Thread pool 'default' is using 0 threads out of 1, 5 threads waiting for tasks.
2021-04-21 11:31:21,750+02 INFO  [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1) [] Thread pool 'engine' is using 0 threads out of 500, 50 threads waiting for tasks and 0 tasks in queue.
2021-04-21 11:31:21,750+02 INFO  [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1) [] Thread pool 'engineScheduledThreadPool' is using 2 threads out of 1, 98 threads waiting for tasks.
2021-04-21 11:31:21,750+02 INFO  [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1) [] Thread pool 'engineThreadMonitoringThreadPool' is using 1 threads out of 1, 0 threads waiting for tasks.
2021-04-21 11:31:21,750+02 INFO  [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1) [] Thread pool 'hostUpdatesChecker' is using 0 threads out of 5, 5 threads waiting for tasks.
2021-04-21 11:31:59,013+02 INFO  [org.ovirt.engine.core.sso.service.AuthenticationService] (default task-19) [] User admin@internal-authz with profile [internal] successfully logged in with scopes: ovirt-app-api ovirt-ext=token-info:authz-search ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate ovirt-ext=token:password-access
2021-04-21 11:31:59,124+02 INFO  [org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default task-19) [43d5f322] Running command: CreateUserSessionCommand internal: false.
2021-04-21 11:31:59,157+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-19) [43d5f322] EVENT_ID: USER_VDC_LOGIN(30), User admin@internal-authz connecting from '5.19.121.125' using session 'VN/aBJxvgcG1WBY9OQSjUnHL7HaSPhbLSX52inLiYFGG/gubMeZiv9Tu+3D4htqVwrb/kqlvMzBL3WAQXb+3Lg==' logged in.
2021-04-21 11:31:59,534+02 WARN  [org.ovirt.engine.api.restapi.util.LinkHelper] (default task-19) [] Can't find relative path for class "org.ovirt.engine.api.resource.StorageDomainVmDiskAttachmentsResource", will return null
2021-04-21 11:31:59,535+02 WARN  [org.ovirt.engine.api.restapi.util.LinkHelper] (default task-19) [] Can't find relative path for class "org.ovirt.engine.api.resource.StorageDomainVmDiskAttachmentsResource", will return null
2021-04-21 11:32:00,670+02 INFO  [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Lock Acquired to object 'EngineLock:{exclusiveLocks='[0ff73c3a-fb49-4148-a3b4-6ce790523a49=DISK]', sharedLocks='[94ec9758-cfc1-41ba-ba71-e339d8df8cee=VM]'}'
2021-04-21 11:32:00,863+02 INFO  [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Running command: StartVmBackupCommand internal: false. Entities affected :  ID: 94ec9758-cfc1-41ba-ba71-e339d8df8cee Type: VMAction group BACKUP_DISK with role type ADMIN,  ID: 0ff73c3a-fb49-4148-a3b4-6ce790523a49 Type: DiskAction group BACKUP_DISK with role type ADMIN
2021-04-21 11:32:00,863+02 INFO  [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Creating VmBackup entity for VM '94ec9758-cfc1-41ba-ba71-e339d8df8cee'
2021-04-21 11:32:00,956+02 INFO  [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Created VmBackup entity 'ee02f949-ccb9-452f-a1bb-8e05d0684b0f'
2021-04-21 11:32:00,956+02 INFO  [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Redefine previous VM checkpoints for VM '94ec9758-cfc1-41ba-ba71-e339d8df8cee'
2021-04-21 11:32:01,316+02 INFO  [org.ovirt.engine.core.bll.storage.backup.RedefineVmCheckpointCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Running command: RedefineVmCheckpointCommand internal: true. Entities affected :  ID: 94ec9758-cfc1-41ba-ba71-e339d8df8cee Type: VMAction group BACKUP_DISK with role type ADMIN,  ID: 0ff73c3a-fb49-4148-a3b4-6ce790523a49 Type: DiskAction group BACKUP_DISK with role type ADMIN
2021-04-21 11:32:01,318+02 INFO  [org.ovirt.engine.core.bll.storage.backup.RedefineVmCheckpointCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] No previous VM checkpoints found for VM '94ec9758-cfc1-41ba-ba71-e339d8df8cee', skipping redefining VM checkpoints
2021-04-21 11:32:01,567+02 INFO  [org.ovirt.engine.core.bll.storage.backup.RedefineVmCheckpointCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Lock freed to object 'EngineLock:{exclusiveLocks='[0ff73c3a-fb49-4148-a3b4-6ce790523a49=DISK]', sharedLocks='[94ec9758-cfc1-41ba-ba71-e339d8df8cee=VM]'}'
2021-04-21 11:32:01,567+02 INFO  [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Successfully redefined previous VM checkpoints for VM '94ec9758-cfc1-41ba-ba71-e339d8df8cee'
2021-04-21 11:32:01,567+02 INFO  [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Creating VmCheckpoint entity for VM '94ec9758-cfc1-41ba-ba71-e339d8df8cee'
2021-04-21 11:32:01,602+02 INFO  [org.ovirt.engine.core.bll.storage.backup.StartVmBackupCommand] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Created VmCheckpoint entity 'b67a0436-42fb-4715-a0a5-6bfd9cb09214'
2021-04-21 11:32:01,736+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-19) [21bb0fd3-06fe-49c8-95ec-94856edabc41] EVENT_ID: VM_BACKUP_STARTED(10,790), Backup ee02f949-ccb9-452f-a1bb-8e05d0684b0f for VM vm1.cloudserver.tld started (User: admin@internal-authz).
2021-04-21 11:32:03,789+02 INFO  [org.ovirt.engine.core.bll.storage.backup.CreateScratchDisksCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Running command: CreateScratchDisksCommand internal: true. Entities affected :  ID: 94ec9758-cfc1-41ba-ba71-e339d8df8cee Type: VMAction group BACKUP_DISK with role type ADMIN
2021-04-21 11:32:03,910+02 INFO  [org.ovirt.engine.core.bll.storage.backup.CreateScratchDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Running command: CreateScratchDiskCommand internal: true. Entities affected :  ID: 4f6d464e-531a-431a-bfcc-ca9dfd1d0c9a Type: StorageAction group CREATE_DISK with role type USER
2021-04-21 11:32:03,910+02 INFO  [org.ovirt.engine.core.bll.storage.backup.CreateScratchDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Creating scratch disk for disk ID 0ff73c3a-fb49-4148-a3b4-6ce790523a49
2021-04-21 11:32:03,977+02 INFO  [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Running command: AddDiskCommand internal: true. Entities affected :  ID: 4f6d464e-531a-431a-bfcc-ca9dfd1d0c9a Type: StorageAction group CREATE_DISK with role type USER
2021-04-21 11:32:04,038+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Running command: AddImageFromScratchCommand internal: true. Entities affected :  ID: 4f6d464e-531a-431a-bfcc-ca9dfd1d0c9a Type: Storage
2021-04-21 11:32:04,109+02 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] EVENT_ID: FAILED_TO_STORE_ENTIRE_DISK_FIELD_IN_DISK_DESCRIPTION_METADATA(1,026), Failed to store field DiskDescription as a part of VM vm1.cloudserver.tld backup ee02f949-ccb9-452f-a1bb-8e05d0684b0f scratch disk for vm1.cloudserver.tld-0's description metadata due to storage space limitations. The field DiskDescription will be truncated.
2021-04-21 11:32:04,111+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='0532fe8c-d841-49ae-9615-45a7b865f3e3', ignoreFailoverLimit='false', storageDomainId='4f6d464e-531a-431a-bfcc-ca9dfd1d0c9a', imageGroupId='a419fb9b-486d-4a92-83d0-ef6c845ae5ba', imageSizeInBytes='268435456000', volumeFormat='COW', newImageId='0cbd890e-5bdc-4e8e-9e46-53bf5093cfd0', imageType='Sparse', newImageDescription='{"DiskAlias":"VM vm1.cloudserver.tld backup ee02f949-ccb9-452f-a1bb-8e05d0684b0f scratch disk for vm1.cloudserver.tld-0","DiskDescription":"Backup ee02f949-ccb9-452f-a1bb-8e05d0684b0f scratch di"}', imageInitialSizeInBytes='0', imageId='00000000-0000-0000-0000-000000000000', sourceImageGroupId='00000000-0000-0000-0000-000000000000', shouldAddBitmaps='false'}), log id: 34e22243
2021-04-21 11:32:04,137+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command CreateVolumeVDS failed: Invalid parameter: 'DiskType=SCRD'
2021-04-21 11:32:04,137+02 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Command 'CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='0532fe8c-d841-49ae-9615-45a7b865f3e3', ignoreFailoverLimit='false', storageDomainId='4f6d464e-531a-431a-bfcc-ca9dfd1d0c9a', imageGroupId='a419fb9b-486d-4a92-83d0-ef6c845ae5ba', imageSizeInBytes='268435456000', volumeFormat='COW', newImageId='0cbd890e-5bdc-4e8e-9e46-53bf5093cfd0', imageType='Sparse', newImageDescription='{"DiskAlias":"VM vm1.cloudserver.tld backup ee02f949-ccb9-452f-a1bb-8e05d0684b0f scratch disk for vm1.cloudserver.tld-0","DiskDescription":"Backup ee02f949-ccb9-452f-a1bb-8e05d0684b0f scratch di"}', imageInitialSizeInBytes='0', imageId='00000000-0000-0000-0000-000000000000', sourceImageGroupId='00000000-0000-0000-0000-000000000000', shouldAddBitmaps='false'})' execution failed: IRSGenericException: IRSErrorException: Invalid parameter: 'DiskType=SCRD'
2021-04-21 11:32:04,137+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] FINISH, CreateVolumeVDSCommand, return: 00000000-0000-0000-0000-000000000000, log id: 34e22243
2021-04-21 11:32:04,137+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Command 'org.ovirt.engine.core.bll.storage.disk.image.AddImageFromScratchCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IrsOperationFailedNoFailoverException: IRSGenericException: IRSErrorException: Invalid parameter: 'DiskType=SCRD' (Failed with error InvalidParameterException and code 1000)
2021-04-21 11:32:04,170+02 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-94) [21bb0fd3-06fe-49c8-95ec-94856edabc41] Removed task 'cb1f10fc-13e6-4ead-a33f-fad342a69ea2' from DataBase
```

And here are the vdsm logs from the SPM host:

```
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-61>", line 2, in createVolume
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1510, in createVolume
    add_bitmaps=addBitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 1002, in validateCreateVolumeParams
    add_bitmaps=add_bitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 649, in validateCreateVolumeParams
    raise se.InvalidParameterException("DiskType", diskType)
vdsm.storage.exception.InvalidParameterException: Invalid parameter: 'DiskType=SCRD'
```
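
The traceback ends in vdsm's createVolume parameter validation rejecting the 'SCRD' (scratch disk) type that the engine passes when it creates the scratch disk. As a purely illustrative sketch of that kind of whitelist check (not vdsm's actual code; the other type names are placeholders):

```
# Purely illustrative, NOT vdsm's real code: the validation accepts only a
# whitelist of known disk types and rejects anything else, exactly like the
# traceback above. A vdsm that does not know the scratch-disk type 'SCRD'
# refuses the engine's CreateVolume request with this error.
class InvalidParameterException(Exception):
    def __init__(self, name, value):
        super().__init__("Invalid parameter: '{}={}'".format(name, value))

# Placeholder whitelist; the real values live in vdsm's storage constants.
KNOWN_DISK_TYPES = {"DATA", "OVFS"}   # note: no "SCRD" here

def validate_create_volume_params(disk_type):
    if disk_type not in KNOWN_DISK_TYPES:
        raise InvalidParameterException("DiskType", disk_type)

validate_create_volume_params("SCRD")  # raises: Invalid parameter: 'DiskType=SCRD'
```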

Is this a known bug in 4.4.4/4.4.5? Any suggestions?

If you need any other info, let me know and I'll provide it ASAP.

Thank you for your time
Francesco
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/WYWMHZFHBVKAVASDWLFSTHXXGAG26K5D/

--
Shellrent - The first Italian hosting provider. Security First.
Francesco Lorenzini
System Administrator & DevOps Engineer
Shellrent Srl
Via dell'Edilizia, 19 - 36100 Vicenza
Tel. 0444321155 | Fax 04441492177