Re: [ovirt-users] move disk and lv not removed

Hi Gianluca,
Can you try to reproduce it with FC? Maybe it's somehow related to that. Did you try to lvremove the old LVs, or are they still in use? Trying to do this from the SPM should be ok. What do you mean by "synchronize the information at the other node side"? In any case, this is a disk that you don't need anymore, isn't it? You said that the copy part of the move operation went well.
Regards, Idan

On Thu, Oct 19, 2017 at 10:47 AM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Oct 19, 2017 at 7:36 AM, Idan Shaby <ishaby@redhat.com> wrote:
Hi Gianluca,
This sounds like a bug to me. Can you please attach engine and vdsm logs?
Thanks, Idan
Hi Idan,
I don't have the full files at hand right now, but these are the snippets I cut from vdsm.log and engine.log yesterday, around the time of the first error, after sending the first e-mail. It seems vdsm reports that the logical volume is still in use while trying to remove it. I tried to replicate this on another environment also based on block storage, but using iSCSI rather than FC, and I was not able to get the error: the move disk operation of the powered-off VM always completed successfully. Both tests were done moving preallocated disks, if that matters.
vdsm.log of host1
2017-10-18 14:59:28,907+0200 ERROR (libvirt/events) [storage.TaskManager.Task] (Task='6331ec00-b75b-465e-8d9c-04d465c09ea1') Unexpected error (task:872)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 879, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in teardownImage
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3189, in teardownImage
    dom.deactivateImage(imgUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1289, in deactivateImage
    lvm.deactivateLVs(self.sdUUID, volUUIDs)
  File "/usr/share/vdsm/storage/lvm.py", line 1304, in deactivateLVs
    _setLVAvailability(vgName, toDeactivate, "n")
  File "/usr/share/vdsm/storage/lvm.py", line 843, in _setLVAvailability
    raise error(str(e))
CannotDeactivateLogicalVolume: Cannot deactivate Logical Volume: ('General Storage Exception: ("5 [] [\' Logical volume c5d360f5-cd90-4b3e-bdef-924649bfbc5f/eff58f10-12bf-412f-a5db-944bfb10904d is used by another device.\']\\nc5d360f5-cd90-4b3e-bdef-924649bfbc5f/[\'eff58f10-12bf-412f-a5db-944bfb10904d\']",)',)

2017-10-18 14:59:28,907+0200 INFO (libvirt/events) [storage.TaskManager.Task] (Task='6331ec00-b75b-465e-8d9c-04d465c09ea1') aborting: Task is aborted: 'Cannot deactivate Logical Volume: (\'General Storage Exception: ("5 [] [\\\' Logical volume c5d360f5-cd90-4b3e-bdef-924649bfbc5f/eff58f10-12bf-412f-a5db-944bfb10904d is used by another device.\\\']\\\\nc5d360f5-cd90-4b3e-bdef-924649bfbc5f/[\\\'eff58f10-12bf-412f-a5db-944bfb10904d\\\']",)\',)' - code 552 (task:1177)

2017-10-18 14:59:28,908+0200 ERROR (libvirt/events) [storage.Dispatcher] FINISH teardownImage error=Cannot deactivate Logical Volume: ('General Storage Exception: ("5 [] [\' Logical volume c5d360f5-cd90-4b3e-bdef-924649bfbc5f/eff58f10-12bf-412f-a5db-944bfb10904d is used by another device.\']\\nc5d360f5-cd90-4b3e-bdef-924649bfbc5f/[\'eff58f10-12bf-412f-a5db-944bfb10904d\']",)',) (dispatcher:81)
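The "used by another device" error above means some device-mapper consumer still holds the LV. As an illustrative sketch (plain bash, not an oVirt tool; it only relies on standard LVM device-mapper naming rules), a small helper can compute the dm name of an LV so its holders can be inspected under /sys:

```shell
# Map a VG/LV pair to its device-mapper name: LVM escapes '-' as '--'
# in each name and joins the escaped VG and LV names with a single '-'.
lv_dm_name() {
    local vg="${1//-/--}" lv="${2//-/--}"
    printf '%s-%s\n' "$vg" "$lv"
}

# Example with the UUIDs from the log above (run on the affected host):
# dm=$(lv_dm_name c5d360f5-cd90-4b3e-bdef-924649bfbc5f eff58f10-12bf-412f-a5db-944bfb10904d)
# ls "/sys/block/$(basename "$(readlink -f "/dev/mapper/$dm")")/holders"
```

Anything listed under holders/ is the device-mapper node that keeps the LV open and makes the deactivation fail.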
engine.log

2017-10-18 15:01:53,097+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [] EVENT_ID: USER_MOVED_DISK_FINISHED_SUCCESS(2,010), Correlation ID: a8948cbf-80a2-462d-8805-91da94868c6e, Job ID: ffba7504-a324-43ca-ab4c-481610c1098b, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: User gcecchi@internal-authz finished moving disk padora1_u01 to domain SATA_ALLBCK.

2017-10-18 15:01:54,029+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5) [394f8568] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VDSM host1 command HSMGetAllTasksStatusesVDS failed: Cannot remove Logical Volume: (['Cannot remove Logical Volume: (u\'c5d360f5-cd90-4b3e-bdef-924649bfbc5f\', "[\'2255c544-165b-4126-86ee-70e6d1edef9a\']")'],)

2017-10-18 15:01:54,029+02 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler5) [394f8568] SPMAsyncTask::PollTask: Polling task '8446a883-9bb0-40a5-b1eb-8a919defe4c0' (Parent Command 'RemoveImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanSuccess'.

2017-10-18 15:01:54,029+02 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler5) [394f8568] BaseAsyncTask::logEndTaskFailure: Task '8446a883-9bb0-40a5-b1eb-8a919defe4c0' (Parent Command 'RemoveImage', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed in vdscommand to HSMGetAllTasksStatusesVDS, error = Cannot remove Logical Volume: (['Cannot remove Logical Volume: (u\'c5d360f5-cd90-4b3e-bdef-924649bfbc5f\', "[\'2255c544-165b-4126-86ee-70e6d1edef9a\']")'],)',
-- Exception: 'VDSGenericException: VDSErrorException: Failed in vdscommand to HSMGetAllTasksStatusesVDS, error = Cannot remove Logical Volume: (['Cannot remove Logical Volume: (u\'c5d360f5-cd90-4b3e-bdef-924649bfbc5f\', "[\'2255c544-165b-4126-86ee-70e6d1edef9a\']")'],)'
Is there any command I can run to consistently remove the stale LVs? As oVirt doesn't use clvmd, I don't think it is sufficient to issue the lvremove from the SPM node. How will it "synchronize" the information on the other node's side?
Thanks, Gianluca

On Tue, Oct 24, 2017 at 8:29 AM, Idan Shaby <ishaby@redhat.com> wrote:
Hi Gianluca,
Can you try to reproduce it with FC? Maybe it's somehow related to that.
Just to make it clear - I know that the bug originally occurred on FC, and it would be helpful to have the full logs if we want to understand its root cause.
Did you try to lvremove the old LVs, or are they still in use? Trying to do this from the SPM should be ok. What do you mean by "synchronize the information at the other node side"? In any case, this is a disk that you don't need anymore, isn't it? You said that the copy part of the move operation went well.
Thanks, Idan

On Tue, Oct 24, 2017 at 7:29 AM, Idan Shaby <ishaby@redhat.com> wrote:
Hi Gianluca,
Can you try to reproduce it with FC? Maybe it's somehow related to that. Did you try to lvremove the old LVs, or are they still in use?
Do you mean directly from the OS, using the lvremove command?
Trying to do this from the SPM should be ok.
OK. I can try it on SPM node if you confirm
What do you mean synchronize the information at the other node side?
I mean that in the past I worked with RHCS clusters in both kinds of LVM configuration: 1) CLVMD and 2) HA-LVM.
In 1) the LVM layer is cluster aware, so I'm confident running commands that modify LVM metadata (such as adding, removing, or extending an LV). In 2), as LVM itself is not cluster aware, it is dangerous to perform LVM metadata operations while both nodes access the shared volumes, so in that case I typically executed these steps:
- relocate the services so that they are all on one node
- power down the other one
- execute the required LVM changes
- power on the second node to have it join the cluster again
- eventually relocate part of the cluster services back to the second node
If I run the "lvs" command on both oVirt nodes, they now both show the information regarding the not-removed LV. How does oVirt manage synchronization and coherence of LVM metadata between nodes? This is what I meant with my question...
In any case, this is a disk that you don't need anymore, isn't it? You said that the copy part of the move operation went well.
Regards, Idan
Yes, the move operation succeeded and I'm using the target LV, which is indeed marked as "open" (o) while the source is not (-).
Gianluca
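For reference, the "open" state mentioned here is the sixth character of the lv_attr field printed by lvs. A minimal sketch of checking it (plain bash, not part of oVirt):

```shell
# Return success if an lv_attr string (as printed by `lvs -o lv_attr`)
# has the "device open" flag set; that flag is the sixth character,
# 'o' when open, '-' when not.
is_lv_open() {
    local attr="$1"
    [ "${attr:5:1}" = "o" ]
}

# Example (run on a host with access to the storage domain VG):
# lvs --noheadings -o lv_name,lv_attr <vg> | while read -r name attr; do
#     if is_lv_open "$attr"; then echo "$name is open"; else echo "$name is not open"; fi
# done
```

A stale LV left behind by a failed remove should show '-' in that position on every host, which is what makes it safe to clean up.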

On Tue, Oct 24, 2017 at 10:35 AM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Tue, Oct 24, 2017 at 7:29 AM, Idan Shaby <ishaby@redhat.com> wrote:
Hi Gianluca,
Can you try to reproduce it with FC? Maybe it's somehow related to that. Did you try to lvremove the old LVs, or are they still in use?
Do you mean directly from the OS, using the lvremove command?
Yes
Trying to do this from the SPM should be ok.
OK. I can try it on SPM node if you confirm
Sure, go ahead.
What do you mean synchronize the information at the other node side?
I mean that in the past I worked with RHCS clusters in both kinds of LVM configuration: 1) CLVMD and 2) HA-LVM.
In 1) the LVM layer is cluster aware, so I'm confident running commands that modify LVM metadata (such as adding, removing, or extending an LV). In 2), as LVM itself is not cluster aware, it is dangerous to perform LVM metadata operations while both nodes access the shared volumes, so in that case I typically executed these steps:
- relocate the services so that they are all on one node
- power down the other one
- execute the required LVM changes
- power on the second node to have it join the cluster again
- eventually relocate part of the cluster services back to the second node
If I run "lvs" command on both oVirt nodes, they now both show the information regarding the not-removed LV. How does oVirt manage synchronization and coherence of LVM metadata between nodes?
oVirt (vdsm, to be precise) uses sanlock [1] to achieve that goal.
This is what I meant with my question...
Thanks for the explanation! Generally you're right: we do have to make sure that metadata operations on a logical volume are done by only one host, because we're indeed not using clvm. However, here we're talking about a stale LV that no one should access anymore and in which we have no further interest. In that case, we can just go ahead and lvremove it.
Regarding the bug itself, if you are able to reproduce it in any way, please send the full logs so that we can understand its cause.
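The manual cleanup described here boils down to deactivating and then removing the stale LV on the SPM host. A dry-run sketch (it only prints the commands so they can be reviewed before running; lvchange and lvremove are the standard LVM commands, the VG name is the storage domain UUID, and the LV name is the volume UUID):

```shell
# Print (do not execute) the manual cleanup commands for a stale LV.
# The printed commands are meant to be run on the SPM host only, after
# double-checking with `lvs -o lv_name,lv_attr` that the LV is not open.
print_stale_lv_cleanup() {
    local vg="$1" lv="$2"
    printf 'lvchange -an %s/%s\n' "$vg" "$lv"
    printf 'lvremove %s/%s\n' "$vg" "$lv"
}

# Example with the storage domain / volume UUIDs from the engine.log above:
# print_stale_lv_cleanup c5d360f5-cd90-4b3e-bdef-924649bfbc5f 2255c544-165b-4126-86ee-70e6d1edef9a
```

Other hosts will see the updated VG metadata the next time vdsm refreshes its LVM view; no clvmd-style propagation step is needed for an LV nobody uses.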
In any case, this is a disk that you don't need anymore, isn't it? You said that the copy part of the move operation went well.
Regards, Idan
Yes, the move operation succeeded and I'm using the target LV, which is indeed marked as "open" (o) while the source is not (-).
Gianluca
[1] https://www.ovirt.org/develop/developer-guide/vdsm/sanlock/
participants (2)
- Gianluca Cecchi
- Idan Shaby