[ovirt-users] USER_CREATE_SNAPSHOT_FINISHED_FAILURE with Cinder storage stuck
Maor Lipchuk
mlipchuk at redhat.com
Tue Aug 22 07:33:29 UTC 2017
On Mon, Aug 21, 2017 at 6:12 PM, Matthias Leopold
<matthias.leopold at meduniwien.ac.at> wrote:
> Hi,
>
> we're experimenting with Cinder/Ceph Storage on oVirt 4.1.3. When we tried
> to snapshot a VM (2 disks on Cinder storage domain) the task never finished
> and now seems to be in an uninterruptible loop. We tried to stop it in
> various (brute force) ways, but the below messages (one of the disks as an
> example) are cluttering engine.log every 10 seconds. We tried the following:
>
> - deleting the VM
> - restarting ovirt-engine service
> - vdsClient -s 0 getAllTasksStatuses on SPM host (no result)
> - restarting vdsmd service on SPM host
> - /usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -u engine -d engine
> -c c841c979-70ea-4e06-b9c4-9c5ce014d76d
>
> None of this helped. How do we get rid of this failed transaction?
>
> thx
> matthias
>
> 2017-08-21 16:40:44,798+02 INFO
> [org.ovirt.engine.core.utils.transaction.TransactionSupport]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] transaction
> rolled back
> 2017-08-21 16:40:44,799+02 ERROR
> [org.ovirt.engine.core.bll.job.ExecutionHandler] (DefaultQuartzScheduler7)
> [080af640-bac3-4990-8bf4-6829551b538d] Exception:
> org.springframework.dao.DataIntegrityViolationException:
> CallableStatementCallback; SQL [{call insertstep(?, ?, ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?)}]; ERROR: insert or update on table "step" violates foreign
> key constraint "fk_step_job"
> 2017-08-21 16:40:44,805+02 ERROR
> [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Ending
> command
> 'org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand' with
> failure.
> 2017-08-21 16:40:44,807+02 WARN
> [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] No snapshot
> was created for VM 'c0235316-81c4-48be-9521-b86b338c7d20' which is in LOCKED
> status
> 2017-08-21 16:40:44,810+02 INFO
> [org.ovirt.engine.core.utils.transaction.TransactionSupport]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] transaction
> rolled back
> 2017-08-21 16:40:44,810+02 WARN
> [org.ovirt.engine.core.bll.lock.InMemoryLockManager]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Trying to
> release exclusive lock which does not exist, lock key:
> 'c0235316-81c4-48be-9521-b86b338c7d20VM'
> 2017-08-21 16:40:44,810+02 INFO
> [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Lock freed
> to object
> 'EngineLock:{exclusiveLocks='[c0235316-81c4-48be-9521-b86b338c7d20=VM]',
> sharedLocks=''}'
> 2017-08-21 16:40:44,829+02 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] EVENT_ID:
> USER_CREATE_SNAPSHOT_FINISHED_FAILURE(69), Correlation ID:
> 080af640-bac3-4990-8bf4-6829551b538d, Job ID:
> a3be8af1-8d33-4d35-9672-215ac7c9959f, Call Stack: null, Custom Event ID: -1,
> Message: Failed to complete snapshot 'test' creation for VM '<UNKNOWN>'.
> 2017-08-21 16:40:44,829+02 ERROR
> [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Failed
> invoking callback end method 'onFailed' for command
> 'c841c979-70ea-4e06-b9c4-9c5ce014d76d' with exception 'null', the callback
> is marked for end method retries
>
>
>
>
>
>
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
Hi Matthias,
Can you please attach the full engine log contains the first error
occurred so we can trace its origin and fix it?
Does it reproduced constantly?
The engine does not use VDSM tasks to manage Cinder, the engine use
Cinder as an external provider using the COCO infrastructure for async
tasks.
The COCO tasks are managed in the database using the command_entities
table, basically if you will remove all references of the command id
from the command_entities and restart engine you should not see it any
more.
Regards,
Maor
More information about the Users
mailing list