Removing Direct Mapped LUNs

Hi List,

We need to add and remove directly mapped LUNs to multiple VMs in our Non-Production environment. The environment is backed by an iSCSI SAN. In testing, when removing a directly mapped LUN, it doesn't remove the underlying multipath device and block devices. Several questions:

1) Is this the expected behavior?
2) Are we supposed to go to each KVM host and manually remove the underlying multipath devices?
3) Is there a technical reason that oVirt doesn't do this as part of the steps for removing the storage?

This is something that was handled by the manager in the previous virtualization platform that we used, Oracle's Xen-based Oracle VM.

Thanks!
Ryan

On Friday, 23 April 2021 02:44:43 CEST Ryan Chewning wrote:
Hi List,
We need to add and remove directly mapped LUNs to multiple VMs in our Non-Production environment. The environment is backed by an iSCSI SAN. In testing, when removing a directly mapped LUN, it doesn't remove the underlying multipath device and block devices. Several questions:
1) Is this the expected behavior?
Yes. Before removing the multipath devices, you need to unzone the LUN on the storage server. As oVirt doesn't manage the storage server in the case of iSCSI, this has to be done by the storage server admin, and therefore oVirt cannot manage the whole flow.
2) Are we supposed to go to each KVM host and manually remove the underlying multipath devices?
oVirt provides an Ansible playbook for this:
https://github.com/oVirt/ovirt-ansible-collection/blob/master/examples/remove_mpath_device.yml
Usage is as follows:
ansible-playbook --extra-vars "lun=<LUN_ID>" remove_mpath_device.yml
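When several direct LUNs have to be removed at once (as comes up later in this thread), the playbook call can be wrapped in a small loop. A minimal sketch, not part of the playbook itself; the helper only prints the commands, and the WWIDs in the example call are made-up placeholders:

```shell
# Sketch only: run remove_mpath_device.yml once per LUN WWID.
# remove_luns prints each command; drop the leading "echo" to execute it.
remove_luns() {
    for lun in "$@"; do
        echo ansible-playbook --extra-vars "lun=$lun" remove_mpath_device.yml
    done
}

# Hypothetical placeholder WWIDs, one playbook run per LUN.
remove_luns 36090a0d800851c9d2195d5b837c9e300 36090a0d800851c9d2195d5b837c9e301
```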
3) Is there a technical reason that oVirt doesn't do this as part of the steps to removing the storage?
As mentioned above, oVirt doesn't manage the iSCSI server and cannot unzone the LUN from the server. For managed storage, oVirt does that.
This is something that was handled by the manager in the previous virtualization that we used, Oracle's Xen based Oracle VM.
Thanks!
Ryan

On Fri, Apr 23, 2021 at 6:25 AM Vojtech Juranek <vjuranek@redhat.com> wrote:
On Friday, 23 April 2021 02:44:43 CEST Ryan Chewning wrote:
Hi List,
We need to add and remove directly mapped LUNs to multiple VMs in our Non-Production environment. The environment is backed by an iSCSI SAN. In testing, when removing a directly mapped LUN, it doesn't remove the underlying multipath device and block devices. Several questions:
1) Is this the expected behavior?
Yes. Before removing the multipath devices, you need to unzone the LUN on the storage server. As oVirt doesn't manage the storage server in the case of iSCSI, this has to be done by the storage server admin, and therefore oVirt cannot manage the whole flow.
Thank you for the information. Perhaps you can expand, then, on how the volumes are picked up once they are mapped from the storage system? Traditionally, when mapping storage from an iSCSI or Fibre Channel array, we have to initiate a LIP or an iSCSI login. How is it that oVirt doesn't need to do this?
2) Are we supposed to go to each KVM host and manually remove the
underlying multipath devices?
oVirt provides an Ansible playbook for this:
https://github.com/oVirt/ovirt-ansible-collection/blob/master/examples/remove_mpath_device.yml
Usage is as follows:
ansible-playbook --extra-vars "lun=<LUN_ID>" remove_mpath_device.yml
We'll look into this. At least in our Non-Production environment, when we take down a development environment or refresh the data, there are at least 14 volumes that have to be removed and re-added.
3) Is there a technical reason that oVirt doesn't do this as part of the steps to removing the storage?
As mentioned above, oVirt doesn't manage the iSCSI server and cannot unzone the LUN from the server. For managed storage, oVirt does that.
I understand oVirt is not able to unzone the LUN, as that is managed on the storage system. However, oVirt does create the multipath device and underlying block devices. We expected those to be cleaned up when a LUN is deleted.
This is something that was handled by the manager in the previous virtualization that we used, Oracle's Xen based Oracle VM.
Thanks!
Ryan

On Fri, Apr 23, 2021 at 5:19 PM Ryan Chewning <ryan_chewning@trimble.com> wrote:
On Fri, Apr 23, 2021 at 6:25 AM Vojtech Juranek <vjuranek@redhat.com> wrote:
On Friday, 23 April 2021 02:44:43 CEST Ryan Chewning wrote:
Hi List,
We need to add and remove directly mapped LUNs to multiple VMs in our Non-Production environment. The environment is backed by an iSCSI SAN. In testing, when removing a directly mapped LUN, it doesn't remove the underlying multipath device and block devices. Several questions:
1) Is this the expected behavior?
Yes. Before removing the multipath devices, you need to unzone the LUN on the storage server. As oVirt doesn't manage the storage server in the case of iSCSI, this has to be done by the storage server admin, and therefore oVirt cannot manage the whole flow.
Thank you for the information. Perhaps you can expand, then, on how the volumes are picked up once they are mapped from the storage system? Traditionally, when mapping storage from an iSCSI or Fibre Channel array, we have to initiate a LIP or an iSCSI login. How is it that oVirt doesn't need to do this?
2) Are we supposed to go to each KVM host and manually remove the underlying multipath devices?
oVirt provides an Ansible playbook for this:
https://github.com/oVirt/ovirt-ansible-collection/blob/master/examples/remove_mpath_device.yml
Usage is as follows:
ansible-playbook --extra-vars "lun=<LUN_ID>" remove_mpath_device.yml
We'll look into this. At least in our Non-Production environment, when we take down a development environment or refresh the data, there are at least 14 volumes that have to be removed and re-added.
3) Is there a technical reason that oVirt doesn't do this as part of the steps to removing the storage?
As mentioned above, oVirt doesn't manage the iSCSI server and cannot unzone the LUN from the server. For managed storage, oVirt does that.
I understand oVirt is not able to unzone the LUN, as that is managed on the storage system. However, oVirt does create the multipath device and underlying block devices.
Not really; this is a common misunderstanding about how oVirt manages storage. oVirt does not have the concept of adding or removing a LUN. It is no coincidence that the oVirt UI does not have a LUNs tab: oVirt does not manage LUNs.

oVirt logs in to the iSCSI target, and the result of this is the creation of multipath devices for all LUNs from that target. This is done automatically by the system, mostly because oVirt configures multipath to grab all SCSI (or other) devices on the system. At this point oVirt does not know which devices will be discovered. On the host, vdsm reports the LUNs to oVirt engine. The admin may add LUNs to storage domains, or to VMs (direct LUN). LUNs that are not used by oVirt remain visible on the host and are not managed by oVirt.

Since vdsm does not know which LUNs are expected, it does a SCSI rescan in many flows to make sure all LUNs are visible on the host. For example, after resizing a LUN on the server (not controlled by oVirt), the new size may not be available on the host until the next SCSI rescan. Another example is a new LUN added on the server.

When a LUN is removed from a storage domain, or from a VM, oVirt does not remove it from the host. For example, you can remove a LUN from a storage domain and then add it as a direct LUN to a VM, or the other way around.
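As an aside, the set of multipath devices a host currently sees (whether oVirt uses them or not) can be listed by WWID. A rough sketch, parsing a captured sample of "multipath -ll"-style output rather than the live command; the sample text is modeled on the output shown later in this thread:

```shell
# Illustration only: the WWID is the first field of each top-level line
# in "multipath -ll" output. On a real host you would pipe the live
# command instead of this captured sample.
sample='364817197c52f98316900666e8c2b0b2b dm-13 EQLOGIC,100E-00
size=2.0T features=1 hwhandler=1 wp=rw
36090a0d800851c9d2195d5b837c9e328 dm-2 EQLOGIC,100E-00
size=5.0T features=1 hwhandler=1 wp=rw'

# Top-level lines start with the hex WWID followed by the dm-N name.
wwids=$(printf '%s\n' "$sample" | awk '/^3[0-9a-f]+ dm-/ {print $1}')
echo "$wwids"
```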
We expected that to be cleaned up when a LUN is deleted.
We don't have the concept of deleting a LUN in oVirt. This is done on the server by the storage admin, outside of oVirt. It would be nice if oVirt had a way to remove a specific LUN from the system using the UI, but this feature was never implemented. What we have now is the Ansible playbook, which should make this easy enough.

Note that the Ansible playbook is not a complete solution. If you remove the LUN from the host before un-zoning the LUN on the server side, the automatic SCSI rescan in oVirt will discover and add back the LUN right after you removed it.

Nir
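The ordering constraint described above can be summarized as a two-step procedure. A sketch with a hypothetical helper; step 1 is whatever your storage array's admin interface provides, not a host command, and the helper only prints the steps:

```shell
# Sketch of the safe removal order for a direct LUN (hypothetical helper).
remove_direct_lun() {
    lun=$1
    # Step 1: unzone/unmap the LUN on the storage server FIRST (outside
    # oVirt). If this is skipped, oVirt's automatic SCSI rescan will
    # rediscover the LUN right after it is removed from the host.
    echo "1: unzone $lun on the storage server"
    # Step 2: only then remove the stale multipath device from every host.
    echo "2: ansible-playbook --extra-vars lun=$lun remove_mpath_device.yml"
}

remove_direct_lun "<LUN_ID>"
```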

On Fri, Apr 23, 2021 at 7:15 PM Nir Soffer <nsoffer@redhat.com> wrote:
1) Is this the expected behavior?
Yes. Before removing the multipath devices, you need to unzone the LUN on the storage server. As oVirt doesn't manage the storage server in the case of iSCSI, this has to be done by the storage server admin, and therefore oVirt cannot manage the whole flow.
Thank you for the information. Perhaps you can expand, then, on how the volumes are picked up once they are mapped from the storage system? Traditionally, when mapping storage from an iSCSI or Fibre Channel array, we have to initiate a LIP or an iSCSI login. How is it that oVirt doesn't need to do this?
2) Are we supposed to go to each KVM host and manually remove the underlying multipath devices?
oVirt provides an Ansible playbook for this:
https://github.com/oVirt/ovirt-ansible-collection/blob/master/examples/remove_mpath_device.yml
Usage is as follows:
ansible-playbook --extra-vars "lun=<LUN_ID>" remove_mpath_device.yml
I had to decommission one iSCSI-based storage domain, after having added a new iSCSI one (with another portal) and moved all the objects into the new one (VM disks, template disks, ISO disks, leases). The environment is based on 4.4.6, with 3 hosts and an external engine. So I tried the Ansible playbook way to verify it.

The initial situation is below; the storage domain to decommission is ovsd3750, based on the 5 TB LUN.

$ sudo multipath -l
364817197c52f98316900666e8c2b0b2b dm-13 EQLOGIC,100E-00
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 16:0:0:0 sde 8:64 active undef running
  `- 17:0:0:0 sdf 8:80 active undef running
36090a0d800851c9d2195d5b837c9e328 dm-2 EQLOGIC,100E-00
size=5.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 13:0:0:0 sdb 8:16 active undef running
  `- 14:0:0:0 sdc 8:32 active undef running

Connections are using iSCSI multipathing (iscsi1 and iscsi2 in the web admin GUI), so I have two paths to each LUN:

$ sudo iscsiadm -m node
10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750
10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750
10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920
10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920

$ sudo iscsiadm -m session
tcp: [1] 10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750 (non-flash)
tcp: [2] 10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750 (non-flash)
tcp: [4] 10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920 (non-flash)
tcp: [5] 10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920 (non-flash)

One point not taken into consideration in the previously opened bugs, in my opinion, is the deletion of the iSCSI connections and nodes on the host side (probably to be done by the OS admin, but it could be taken care of by the Ansible playbook...). The bugs I'm referring to are:
Bug 1310330 - [RFE] Provide a way to remove stale LUNs from hypervisors
Bug 1928041 - Stale DM links after block SD removal

Actions done:
- put the storage domain into maintenance
- detach the storage domain
- remove the storage domain
- remove access from the Equallogic admin GUI

I have a group named ovirt in the Ansible inventory, composed of my 3 hosts: ov200, ov300 and ov301. I executed:

$ ansible-playbook -b -l ovirt --extra-vars "lun=36090a0d800851c9d2195d5b837c9e328" remove_mpath_device.yml

It went all OK on ov200 and ov300, but on ov301 I got:

fatal: [ov301]: FAILED! => {"changed": true, "cmd": "multipath -f \"36090a0d800851c9d2195d5b837c9e328\"", "delta": "0:00:00.009003", "end": "2021-07-15 11:17:37.340584", "msg": "non-zero return code", "rc": 1, "start": "2021-07-15 11:17:37.331581", "stderr": "Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use", "stderr_lines": ["Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use"], "stdout": "", "stdout_lines": []}

The complete output:

$ ansible-playbook -b -l ovirt --extra-vars "lun=36090a0d800851c9d2195d5b837c9e328" remove_mpath_device.yml

PLAY [Cleanly remove unzoned storage devices (LUNs)] *************************************************************

TASK [Gathering Facts] *******************************************************************************************
ok: [ov200]
ok: [ov300]
ok: [ov301]

TASK [Get underlying disks (paths) for a multipath device and turn them into a list.] ****************************
changed: [ov300]
changed: [ov200]
changed: [ov301]

TASK [Remove from multipath device.] *****************************************************************************
changed: [ov200]
changed: [ov300]
fatal: [ov301]: FAILED! => {"changed": true, "cmd": "multipath -f \"36090a0d800851c9d2195d5b837c9e328\"", "delta": "0:00:00.009003", "end": "2021-07-15 11:17:37.340584", "msg": "non-zero return code", "rc": 1, "start": "2021-07-15 11:17:37.331581", "stderr": "Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use", "stderr_lines": ["Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use"], "stdout": "", "stdout_lines": []}

TASK [Remove each path from the SCSI subsystem.] *****************************************************************
changed: [ov300] => (item=sdc)
changed: [ov300] => (item=sdb)
changed: [ov200] => (item=sdc)
changed: [ov200] => (item=sdb)

PLAY RECAP *******************************************************************************************************
ov200 : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
ov300 : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
ov301 : ok=2 changed=1 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0

Indeed, going to the host I get:

[root@ov301 ~]# multipath -f 36090a0d800851c9d2195d5b837c9e328
Jul 15 11:24:37 | 36090a0d800851c9d2195d5b837c9e328: map in use
[root@ov301 ~]#

The dm device under the multipath one is dm-2:

[root@ov301 ~]# ll /dev/dm-2
brw-rw----. 1 root disk 253, 2 Jul 15 11:28 /dev/dm-2

With

[root@ov301 ~]# lsof | grep "253,2"

I get no lines, only other devices with a minor beginning with 2 (e.g. 24, 25, 27...):

qemu-kvm 10638 10653 vnc_worke qemu 84u BLK 253,24 0t0 112027277 /dev/dm-24
qemu-kvm 11479 qemu 43u BLK 253,27 0t0 112135384 /dev/dm-27
qemu-kvm 11479 qemu 110u BLK 253,25 0t0 112140523 /dev/dm-25

So nothing for dm-2. What can I do to crosscheck what is using the device and preventing the "-f" from completing?

Now I get:

# multipath -l
364817197c52f98316900666e8c2b0b2b dm-14 EQLOGIC,100E-00
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 16:0:0:0 sde 8:64 active undef running
  `- 17:0:0:0 sdf 8:80 active undef running
36090a0d800851c9d2195d5b837c9e328 dm-2 ##,##
size=5.0T features='0' hwhandler='0' wp=rw

Another thing to perhaps improve in the Ansible playbook: usually, when I remove FC or iSCSI LUNs under multipath on a Linux system, after the "multipath -f" command and before the "echo 1 > .../device/delete" one, I also run, for safety:

blockdev --flushbufs /dev/$i

where $i loops over the devices composing the multipath.

I see that in the web admin GUI, under Datacenter --> iSCSI Multipath, iscsi1 and iscsi2 no longer have the connection to the removed SD. But on the host side nothing changed from the iSCSI point of view. So I logged out from the sessions:

[root@ov300 ~]# iscsiadm -m session -r 1 -u
Logging out of session [sid: 1, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260]
Logout of [sid: 1, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260] successful.
[root@ov300 ~]# iscsiadm -m session -r 2 -u
Logging out of session [sid: 2, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260]
Logout of [sid: 2, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260] successful.
[root@ov300 ~]#

and then removed the node:

[root@ov300 ~]# iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750 -o delete
[root@ov300 ~]# ll /var/lib/iscsi/nodes/
total 4
drw-------. 3 root root 4096 Jul 13 11:18 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920
[root@ov300 ~]#

while previously I had:

[root@ov300 ~]# ll /var/lib/iscsi/nodes/
total 8
drw-------. 3 root root 4096 Jan 12 2021 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750
drw-------. 3 root root 4096 Jul 13 11:18 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920
[root@ov300 ~]#

Otherwise, I think that at reboot the host will try to reconnect to the no-longer-existing portal...

Comments welcome,
Gianluca

On Thu, Jul 15, 2021 at 3:50 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Apr 23, 2021 at 7:15 PM Nir Soffer <nsoffer@redhat.com> wrote:
1) Is this the expected behavior?
Yes. Before removing the multipath devices, you need to unzone the LUN on the storage server. As oVirt doesn't manage the storage server in the case of iSCSI, this has to be done by the storage server admin, and therefore oVirt cannot manage the whole flow.
Thank you for the information. Perhaps you can expand, then, on how the volumes are picked up once they are mapped from the storage system? Traditionally, when mapping storage from an iSCSI or Fibre Channel array, we have to initiate a LIP or an iSCSI login. How is it that oVirt doesn't need to do this?
2) Are we supposed to go to each KVM host and manually remove the underlying multipath devices?
oVirt provides an Ansible playbook for this:
https://github.com/oVirt/ovirt-ansible-collection/blob/master/examples/remove_mpath_device.yml
Usage is as follows:
ansible-playbook --extra-vars "lun=<LUN_ID>" remove_mpath_device.yml
I had to decommission one iSCSI-based storage domain, after having added a new iSCSI one (with another portal) and moved all the objects into the new one (VM disks, template disks, ISO disks, leases). The environment is based on 4.4.6, with 3 hosts and an external engine. So I tried the Ansible playbook way to verify it.

The initial situation is below; the storage domain to decommission is ovsd3750, based on the 5 TB LUN.
$ sudo multipath -l
364817197c52f98316900666e8c2b0b2b dm-13 EQLOGIC,100E-00
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 16:0:0:0 sde 8:64 active undef running
  `- 17:0:0:0 sdf 8:80 active undef running
36090a0d800851c9d2195d5b837c9e328 dm-2 EQLOGIC,100E-00
size=5.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 13:0:0:0 sdb 8:16 active undef running
  `- 14:0:0:0 sdc 8:32 active undef running
Connections are using iSCSI multipathing (iscsi1 and iscsi2 in the web admin GUI), so I have two paths to each LUN:
$ sudo iscsiadm -m node
10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750
10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750
10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920
10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920

$ sudo iscsiadm -m session
tcp: [1] 10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750 (non-flash)
tcp: [2] 10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750 (non-flash)
tcp: [4] 10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920 (non-flash)
tcp: [5] 10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920 (non-flash)
One point not taken into consideration in the previously opened bugs, in my opinion, is the deletion of the iSCSI connections and nodes on the host side (probably to be done by the OS admin, but it could be taken care of by the Ansible playbook...). The bugs I'm referring to are:
Bug 1310330 - [RFE] Provide a way to remove stale LUNs from hypervisors
Bug 1928041 - Stale DM links after block SD removal
Actions done:
- put the storage domain into maintenance
- detach the storage domain
- remove the storage domain
- remove access from the Equallogic admin GUI
I have a group named ovirt in the Ansible inventory, composed of my 3 hosts: ov200, ov300 and ov301. I executed:

$ ansible-playbook -b -l ovirt --extra-vars "lun=36090a0d800851c9d2195d5b837c9e328" remove_mpath_device.yml

It went all OK on ov200 and ov300, but on ov301 I got:

fatal: [ov301]: FAILED! => {"changed": true, "cmd": "multipath -f \"36090a0d800851c9d2195d5b837c9e328\"", "delta": "0:00:00.009003", "end": "2021-07-15 11:17:37.340584", "msg": "non-zero return code", "rc": 1, "start": "2021-07-15 11:17:37.331581", "stderr": "Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use", "stderr_lines": ["Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use"], "stdout": "", "stdout_lines": []}
the complete output:
$ ansible-playbook -b -l ovirt --extra-vars "lun=36090a0d800851c9d2195d5b837c9e328" remove_mpath_device.yml
PLAY [Cleanly remove unzoned storage devices (LUNs)] *************************************************************
TASK [Gathering Facts] *******************************************************************************************
ok: [ov200]
ok: [ov300]
ok: [ov301]

TASK [Get underlying disks (paths) for a multipath device and turn them into a list.] ****************************
changed: [ov300]
changed: [ov200]
changed: [ov301]

TASK [Remove from multipath device.] *****************************************************************************
changed: [ov200]
changed: [ov300]
fatal: [ov301]: FAILED! => {"changed": true, "cmd": "multipath -f \"36090a0d800851c9d2195d5b837c9e328\"", "delta": "0:00:00.009003", "end": "2021-07-15 11:17:37.340584", "msg": "non-zero return code", "rc": 1, "start": "2021-07-15 11:17:37.331581", "stderr": "Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use", "stderr_lines": ["Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use"], "stdout": "", "stdout_lines": []}

TASK [Remove each path from the SCSI subsystem.] *****************************************************************
changed: [ov300] => (item=sdc)
changed: [ov300] => (item=sdb)
changed: [ov200] => (item=sdc)
changed: [ov200] => (item=sdb)

PLAY RECAP *******************************************************************************************************
ov200 : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
ov300 : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
ov301 : ok=2 changed=1 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
Indeed, going to the host I get:

[root@ov301 ~]# multipath -f 36090a0d800851c9d2195d5b837c9e328
Jul 15 11:24:37 | 36090a0d800851c9d2195d5b837c9e328: map in use
[root@ov301 ~]#

The dm device under the multipath one is dm-2:

[root@ov301 ~]# ll /dev/dm-2
brw-rw----. 1 root disk 253, 2 Jul 15 11:28 /dev/dm-2

With

[root@ov301 ~]# lsof | grep "253,2"

I get no lines, only other devices with a minor beginning with 2 (e.g. 24, 25, 27...):

qemu-kvm 10638 10653 vnc_worke qemu 84u BLK 253,24 0t0 112027277 /dev/dm-24
qemu-kvm 11479 qemu 43u BLK 253,27 0t0 112135384 /dev/dm-27
qemu-kvm 11479 qemu 110u BLK 253,25 0t0 112140523 /dev/dm-25
so nothing for dm-2
What can I do to crosscheck what is using the device and preventing the "-f" from completing?

Now I get:

# multipath -l
364817197c52f98316900666e8c2b0b2b dm-14 EQLOGIC,100E-00
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 16:0:0:0 sde 8:64 active undef running
  `- 17:0:0:0 sdf 8:80 active undef running
36090a0d800851c9d2195d5b837c9e328 dm-2 ##,##
size=5.0T features='0' hwhandler='0' wp=rw
Another thing to perhaps improve in the Ansible playbook: usually, when I remove FC or iSCSI LUNs under multipath on a Linux system, after the "multipath -f" command and before the "echo 1 > .../device/delete" one, I also run, for safety:

blockdev --flushbufs /dev/$i

where $i loops over the devices composing the multipath.
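Put together, the per-path cleanup described above would look roughly like this. A sketch only: sdb/sdc are the example path names from this thread, and the helper only prints the commands, since running them needs root and a live multipath map:

```shell
# Sketch: flush and delete each path after removing the multipath map.
# cleanup_paths only prints the commands; remove the "echo"s to run them.
cleanup_paths() {
    wwid=$1; shift
    echo "multipath -f $wwid"
    for i in "$@"; do
        # Flush buffered writes for the path device before deleting it.
        echo "blockdev --flushbufs /dev/$i"
        # Deleting the SCSI device removes the path from the kernel.
        echo "echo 1 > /sys/block/$i/device/delete"
    done
}

cleanup_paths 36090a0d800851c9d2195d5b837c9e328 sdb sdc
```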
I see that in the web admin GUI, under Datacenter --> iSCSI Multipath, iscsi1 and iscsi2 no longer have the connection to the removed SD. But on the host side nothing changed from the iSCSI point of view. So I executed the following.
I logged out from the sessions:

[root@ov300 ~]# iscsiadm -m session -r 1 -u
Logging out of session [sid: 1, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260]
Logout of [sid: 1, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260] successful.
[root@ov300 ~]# iscsiadm -m session -r 2 -u
Logging out of session [sid: 2, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260]
Logout of [sid: 2, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260] successful.
[root@ov300 ~]#

and then removed the node:

[root@ov300 ~]# iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750 -o delete
[root@ov300 ~]# ll /var/lib/iscsi/nodes/
total 4
drw-------. 3 root root 4096 Jul 13 11:18 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920
[root@ov300 ~]#

while previously I had:

[root@ov300 ~]# ll /var/lib/iscsi/nodes/
total 8
drw-------. 3 root root 4096 Jan 12 2021 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750
drw-------. 3 root root 4096 Jul 13 11:18 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920
[root@ov300 ~]#

Otherwise, I think that at reboot the host will try to reconnect to the no-longer-existing portal...
Comments welcome
@Vojtech Juranek can you look at this?

On Thursday, 15 July 2021 14:49:45 CEST Gianluca Cecchi wrote:
On Fri, Apr 23, 2021 at 7:15 PM Nir Soffer <nsoffer@redhat.com> wrote:
1) Is this the expected behavior?
Yes. Before removing the multipath devices, you need to unzone the LUN on the storage server. As oVirt doesn't manage the storage server in the case of iSCSI, this has to be done by the storage server admin, and therefore oVirt cannot manage the whole flow.
Thank you for the information. Perhaps you can expand, then, on how the volumes are picked up once they are mapped from the storage system? Traditionally, when mapping storage from an iSCSI or Fibre Channel array, we have to initiate a LIP or an iSCSI login. How is it that oVirt doesn't need to do this?
2) Are we supposed to go to each KVM host and manually remove the underlying multipath devices?
oVirt provides an Ansible playbook for this:
https://github.com/oVirt/ovirt-ansible-collection/blob/master/examples/remove_mpath_device.yml
Usage is as follows:
ansible-playbook --extra-vars "lun=<LUN_ID>" remove_mpath_device.yml
I had to decommission one iSCSI-based storage domain, after having added a new iSCSI one (with another portal) and moved all the objects into the new one (VM disks, template disks, ISO disks, leases). The environment is based on 4.4.6, with 3 hosts and an external engine. So I tried the Ansible playbook way to verify it.

The initial situation is below; the storage domain to decommission is ovsd3750, based on the 5 TB LUN.
$ sudo multipath -l
364817197c52f98316900666e8c2b0b2b dm-13 EQLOGIC,100E-00
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 16:0:0:0 sde 8:64 active undef running
  `- 17:0:0:0 sdf 8:80 active undef running
36090a0d800851c9d2195d5b837c9e328 dm-2 EQLOGIC,100E-00
size=5.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 13:0:0:0 sdb 8:16 active undef running
  `- 14:0:0:0 sdc 8:32 active undef running
Connections are using iSCSI multipathing (iscsi1 and iscsi2 in the web admin GUI), so I have two paths to each LUN:
$ sudo iscsiadm -m node
10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750
10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750
10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920
10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920

$ sudo iscsiadm -m session
tcp: [1] 10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750 (non-flash)
tcp: [2] 10.10.100.7:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750 (non-flash)
tcp: [4] 10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920 (non-flash)
tcp: [5] 10.10.100.9:3260,1 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920 (non-flash)
One point not taken into consideration in the previously opened bugs, in my opinion, is the deletion of the iSCSI connections and nodes on the host side (probably to be done by the OS admin, but it could be taken care of by the Ansible playbook...). The bugs I'm referring to are:
Bug 1310330 - [RFE] Provide a way to remove stale LUNs from hypervisors
Bug 1928041 - Stale DM links after block SD removal
Actions done:
- put the storage domain into maintenance
- detach the storage domain
- remove the storage domain
- remove access from the Equallogic admin GUI
I have a group named ovirt in the Ansible inventory, composed of my 3 hosts: ov200, ov300 and ov301. I executed:

$ ansible-playbook -b -l ovirt --extra-vars "lun=36090a0d800851c9d2195d5b837c9e328" remove_mpath_device.yml

It went all OK on ov200 and ov300, but on ov301 I got:

fatal: [ov301]: FAILED! => {"changed": true, "cmd": "multipath -f \"36090a0d800851c9d2195d5b837c9e328\"", "delta": "0:00:00.009003", "end": "2021-07-15 11:17:37.340584", "msg": "non-zero return code", "rc": 1, "start": "2021-07-15 11:17:37.331581", "stderr": "Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use", "stderr_lines": ["Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use"], "stdout": "", "stdout_lines": []}
the complete output:
$ ansible-playbook -b -l ovirt --extra-vars "lun=36090a0d800851c9d2195d5b837c9e328" remove_mpath_device.yml
PLAY [Cleanly remove unzoned storage devices (LUNs)] *************************************************************
TASK [Gathering Facts] *******************************************************************************************
ok: [ov200]
ok: [ov300]
ok: [ov301]

TASK [Get underlying disks (paths) for a multipath device and turn them into a list.] ****************************
changed: [ov300]
changed: [ov200]
changed: [ov301]

TASK [Remove from multipath device.] *****************************************************************************
changed: [ov200]
changed: [ov300]
fatal: [ov301]: FAILED! => {"changed": true, "cmd": "multipath -f \"36090a0d800851c9d2195d5b837c9e328\"", "delta": "0:00:00.009003", "end": "2021-07-15 11:17:37.340584", "msg": "non-zero return code", "rc": 1, "start": "2021-07-15 11:17:37.331581", "stderr": "Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use", "stderr_lines": ["Jul 15 11:17:37 | 36090a0d800851c9d2195d5b837c9e328: map in use"], "stdout": "", "stdout_lines": []}

TASK [Remove each path from the SCSI subsystem.] *****************************************************************
changed: [ov300] => (item=sdc)
changed: [ov300] => (item=sdb)
changed: [ov200] => (item=sdc)
changed: [ov200] => (item=sdb)

PLAY RECAP *******************************************************************************************************
ov200 : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
ov300 : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
ov301 : ok=2 changed=1 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
Indeed going to the server I get:
[root@ov301 ~]# multipath -f 36090a0d800851c9d2195d5b837c9e328
Jul 15 11:24:37 | 36090a0d800851c9d2195d5b837c9e328: map in use
[root@ov301 ~]#

The dm device under the multipath one is dm-2, and:

[root@ov301 ~]# ll /dev/dm-2
brw-rw----. 1 root disk 253, 2 Jul 15 11:28 /dev/dm-2
[root@ov301 ~]#
With

[root@ov301 ~]# lsof | grep "253,2"

I get no lines for the device, only other dm devices whose minor number begins with 2 (e.g. 24, 25, 27...):

qemu-kvm 10638 10653 vnc_worke qemu 84u BLK 253,24 0t0 112027277 /dev/dm-24
qemu-kvm 11479 qemu 43u BLK 253,27 0t0 112135384 /dev/dm-27
qemu-kvm 11479 qemu 110u BLK 253,25 0t0 112140523 /dev/dm-25

so nothing for dm-2.
What can I do to check what is using the device and preventing the "-f" from completing?
Can you try

dmsetup info /dev/mapper/36090a0d800851c9d2195d5b837c9e328

and check the "Open count" field to see if there is still anything open? Also, you can try

fuser /dev/dm-2

to see which process is using the device.
Now I get
# multipath -l
364817197c52f98316900666e8c2b0b2b dm-14 EQLOGIC,100E-00
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 16:0:0:0 sde 8:64 active undef running
  `- 17:0:0:0 sdf 8:80 active undef running
36090a0d800851c9d2195d5b837c9e328 dm-2 ##,##
size=5.0T features='0' hwhandler='0' wp=rw
Another thing that could perhaps be improved in the ansible playbook: in general, when I remove FC or iSCSI LUNs under multipath on a Linux system, after the "multipath -f" command and before the "echo 1 > ... /device/delete" one, I also run, for safety:

blockdev --flushbufs /dev/$i

where $i loops over the devices composing the multipath.
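The flush-then-delete loop described above could be sketched as a small host-side helper. This is only an illustration, not part of the oVirt playbook: the "run" wrapper, the DRY_RUN guard and the function name are my own, and the path device names (e.g. "sdb sdc") are assumed to have been collected beforehand from "multipath -l <wwid>" output.

```shell
# Hedged sketch: DRY_RUN defaults to 1, so commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

flush_and_delete_paths() {
    for dev in "$@"; do
        # Flush buffered data for the path device first, for safety...
        run blockdev --flushbufs "/dev/$dev"
        # ...then remove the path from the SCSI subsystem.
        run sh -c "echo 1 > /sys/block/$dev/device/delete"
    done
}

# Review the sequence first, then rerun with DRY_RUN=0 to actually execute it:
flush_and_delete_paths sdb sdc
```

Set DRY_RUN=0 only after the "would run" lines look right for your host.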
I see that inside the web admin GUI, under Datacenter -> iSCSI Multipath (iscsi1, iscsi2), the connection to the removed SD is no longer there. But on the host side nothing changed from the iSCSI point of view. So I executed:
Log out from the sessions:

[root@ov300 ~]# iscsiadm -m session -r 1 -u
Logging out of session [sid: 1, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260]
Logout of [sid: 1, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260] successful.
[root@ov300 ~]# iscsiadm -m session -r 2 -u
Logging out of session [sid: 2, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260]
Logout of [sid: 2, target: iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750, portal: 10.10.100.7,3260] successful.
[root@ov300 ~]#
and then removal of the node:

[root@ov300 ~]# iscsiadm -m node -T iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750 -o delete
[root@ov300 ~]# ll /var/lib/iscsi/nodes/
total 4
drw-------. 3 root root 4096 Jul 13 11:18 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920
[root@ov300 ~]#

while previously I had:

[root@ov300 ~]# ll /var/lib/iscsi/nodes/
total 8
drw-------. 3 root root 4096 Jan 12 2021 iqn.2001-05.com.equallogic:0-8a0906-9d1c8500d-28e3c937b8d59521-ovsd3750
drw-------. 3 root root 4096 Jul 13 11:18 iqn.2001-05.com.equallogic:4-771816-31982fc59-2b0b2b8c6e660069-ovsd3920
[root@ov301 ~]#
Otherwise I think that at reboot the host would try to reconnect to the no-longer-existing portal...
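The manual session-logout and node-removal steps above could be wrapped per target IQN like this. A sketch only: "iscsiadm" and its flags are from open-iscsi as used above, but the function name, the awk-based session-ID extraction and the DRY_RUN guard are my own illustration, not an oVirt-provided tool.

```shell
# Hedged sketch: DRY_RUN defaults to 1, so commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

remove_iscsi_target() {
    iqn="$1"
    # "iscsiadm -m session" lines look like:
    #   tcp: [1] 10.10.100.7:3260,1 iqn.2001-05.com.equallogic:... (non-flash)
    # so field 2 (brackets stripped) is the session id, field 4 the target IQN.
    for sid in $(iscsiadm -m session 2>/dev/null \
                 | awk -v iqn="$iqn" '$4 == iqn {gsub(/[][]/, "", $2); print $2}'); do
        run iscsiadm -m session -r "$sid" -u       # log out each session
    done
    run iscsiadm -m node -T "$iqn" -o delete        # drop the node record
}

remove_iscsi_target iqn.2001-05.com.example:target-test
```

With the node record deleted, the host should no longer try to reconnect to the dead portal at boot.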
Comments welcome
Gianluca

On Fri, Jul 16, 2021 at 11:15 AM Vojtech Juranek <vjuranek@redhat.com> wrote:
What can I do to check what is using the device and preventing the "-f" from completing?
can you try
dmsetup info /dev/mapper/36090a0d800851c9d2195d5b837c9e328
and check the "Open count" field to see if there is still anything open?
Also, you can try
fuser /dev/dm-2
to see which process is using the device
[root@ov301 ~]# dmsetup info /dev/mapper/36090a0d800851c9d2195d5b837c9e328
Name:              36090a0d800851c9d2195d5b837c9e328
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 2
Number of targets: 1
UUID: mpath-36090a0d800851c9d2195d5b837c9e328

[root@ov301 ~]# fuser /dev/dm-2
[root@ov301 ~]# echo $?
1
[root@ov301 ~]# ll /dev/dm-2
brw-rw----. 1 root disk 253, 2 Jul 15 11:28 /dev/dm-2

I'm still unable to remove it:

[root@ov301 ~]# multipath -f 36090a0d800851c9d2195d5b837c9e328
Jul 16 12:25:11 | 36090a0d800851c9d2195d5b837c9e328: map in use
[root@ov301 ~]#
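One more check that can help when fuser and lsof come up empty while "Open count" stays at 1: the kernel itself may hold the map open through another block device stacked on top of it (an LVM volume, a kpartx partition mapping, another dm map), and sysfs lists such holders directly. A sketch, with the function name being my own and dm-2 being the node from the output above:

```shell
check_dm_holders() {
    dm="$1"    # kernel name of the map's node, e.g. dm-2
    # Each entry here is a block device stacked on top of $dm;
    # empty output means no kernel-level holder via device stacking.
    ls "/sys/block/$dm/holders/" 2>/dev/null || true
}

check_dm_holders dm-2
```

"dmsetup ls --tree" gives the same dependency picture for all maps at once, which can be easier to read when several devices are stacked.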

On Friday, 16 July 2021 12:31:34 CEST Gianluca Cecchi wrote:
On Fri, Jul 16, 2021 at 11:15 AM Vojtech Juranek <vjuranek@redhat.com>
wrote:
What can I do to check what is using the device and preventing the "-f" from completing?
can you try
dmsetup info /dev/mapper/36090a0d800851c9d2195d5b837c9e328
and check the "Open count" field to see if there is still anything open?
Also, you can try
fuser /dev/dm-2
to see which process is using the device
[root@ov301 ~]# dmsetup info /dev/mapper/36090a0d800851c9d2195d5b837c9e328
Name:              36090a0d800851c9d2195d5b837c9e328
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
This means there's some open connection. As lsof or fuser doesn't show anything, I wonder how this could happen. Theoretically (not tested, as I actually don't know how to reproduce this) and at your own risk :-), you can try

dmsetup suspend /dev/mapper/36090a0d800851c9d2195d5b837c9e328
dmsetup clear /dev/mapper/36090a0d800851c9d2195d5b837c9e328
dmsetup wipe_table /dev/mapper/36090a0d800851c9d2195d5b837c9e328

which should remove any stale connection. After that, dmsetup info should show Open count 0 and

multipath -f 36090a0d800851c9d2195d5b837c9e328

should work.
Event number:      0
Major, minor:      253, 2
Number of targets: 1
UUID: mpath-36090a0d800851c9d2195d5b837c9e328

[root@ov301 ~]# fuser /dev/dm-2
[root@ov301 ~]# echo $?
1
[root@ov301 ~]# ll /dev/dm-2
brw-rw----. 1 root disk 253, 2 Jul 15 11:28 /dev/dm-2

I'm still unable to remove it:

[root@ov301 ~]# multipath -f 36090a0d800851c9d2195d5b837c9e328
Jul 16 12:25:11 | 36090a0d800851c9d2195d5b837c9e328: map in use
[root@ov301 ~]#

On Fri, Jul 16, 2021 at 1:59 PM Vojtech Juranek <vjuranek@redhat.com> wrote:
On Fri, Jul 16, 2021 at 11:15 AM Vojtech Juranek <vjuranek@redhat.com>
wrote:
On Friday, 16 July 2021 12:31:34 CEST Gianluca Cecchi wrote:
What can I do to check what is using the device and preventing the "-f" from completing?
can you try
dmsetup info /dev/mapper/36090a0d800851c9d2195d5b837c9e328
and check the "Open count" field to see if there is still anything open?
Also, you can try
fuser /dev/dm-2
to see which process is using the device
[root@ov301 ~]# dmsetup info /dev/mapper/36090a0d800851c9d2195d5b837c9e328
Name:              36090a0d800851c9d2195d5b837c9e328
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
This means there's some open connection. As lsof or fuser doesn't show anything I wonder how this could happen.
Theoretically (not tested, as I actually don't know how to reproduce this) and at your own risk :-), you can try

dmsetup suspend /dev/mapper/36090a0d800851c9d2195d5b837c9e328
dmsetup clear /dev/mapper/36090a0d800851c9d2195d5b837c9e328
dmsetup wipe_table /dev/mapper/36090a0d800851c9d2195d5b837c9e328

which should remove any stale connection. After that, dmsetup info should show Open count 0 and

multipath -f 36090a0d800851c9d2195d5b837c9e328

should work.
The host doesn't see the storage any more, and anyway it's a test system where I experiment with oVirt before going to production with oVirt itself or RHV.

[root@ov301 ~]# dmsetup suspend /dev/mapper/36090a0d800851c9d2195d5b837c9e328
[root@ov301 ~]# dmsetup clear /dev/mapper/36090a0d800851c9d2195d5b837c9e328
[root@ov301 ~]# dmsetup wipe_table /dev/mapper/36090a0d800851c9d2195d5b837c9e328

But still:

[root@ov301 ~]# dmsetup info /dev/mapper/36090a0d800851c9d2195d5b837c9e328
Name:              36090a0d800851c9d2195d5b837c9e328
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 2
Number of targets: 1
UUID: mpath-36090a0d800851c9d2195d5b837c9e328

Anyway, the removal operation now goes OK:

[root@ov301 ~]# multipath -f 36090a0d800851c9d2195d5b837c9e328
[root@ov301 ~]# echo $?
0

and no multipath device in my output:

[root@ov301 ~]# multipath -l
364817197c52f98316900666e8c2b0b2b dm-14 EQLOGIC,100E-00
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 16:0:0:0 sde 8:64 active undef running
  `- 17:0:0:0 sdf 8:80 active undef running
[root@ov301 ~]#

In /var/log/messages, during the sequence of commands above, I see:

Jul 16 14:08:20 ov301 multipathd[1580]: 36090a0d800851c9d2195d5b837c9e328: removing map by alias
Jul 16 14:08:20 ov301 multipath[2229532]: dm-2 is not a multipath map
Jul 16 14:09:03 ov301 multipathd[1580]: 36090a0d800851c9d2195d5b837c9e328: remove map (operator)
Jul 16 14:09:03 ov301 multipathd[1580]: 36090a0d800851c9d2195d5b837c9e328: devmap not registered, can't remove

Thanks for the moment. In the next weeks I'm going to do similar storage moves, with decommissioning of the old storage, for 4 other storage domains (two of them iSCSI -> iSCSI, two of them iSCSI -> FC) belonging to RHV environments (4.4.6 at the moment), so I will open a case for them if I find the same strange behavior.

Gianluca
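The sequence that finally worked could be condensed into a small helper for the remaining storage domains. A sketch under stated assumptions: the LUN is already unpresented on the storage side, the suspend/clear/wipe_table fallback is the untested last resort suggested earlier in the thread, and the function name and DRY_RUN guard are my own illustration.

```shell
# Hedged sketch: DRY_RUN defaults to 1, so commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

force_remove_mpath() {
    wwid="$1"                       # e.g. 36090a0d800851c9d2195d5b837c9e328
    map="/dev/mapper/$wwid"
    # Drop the live table to release the stale open count...
    run dmsetup suspend "$map"
    run dmsetup clear "$map"
    run dmsetup wipe_table "$map"
    # ...after which the flush should go through.
    run multipath -f "$wwid"
}

force_remove_mpath 36090a0d800851c9d2195d5b837c9e328
```

Only for a map that "multipath -f" refuses with "map in use" despite no visible users; try a plain "multipath -f <wwid>" first and use this path only when it fails.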

+Shani Leviim <sleviim@redhat.com> can you assist?

On Fri, 23 Apr 2021 at 03:46, Ryan Chewning <ryan_chewning@trimble.com> wrote:
Hi List,
We need to add and remove directly mapped LUNs to multiple VMs in our Non-Production environment. The environment is backed by an iSCSI SAN. In testing when removing a directly mapped LUN it doesn't remove the underlying multipath and devices. Several questions.
1) Is this the expected behavior?
2) Are we supposed to go to each KVM host and manually remove the underlying multipath devices?
3) Is there a technical reason that oVirt doesn't do this as part of the steps to removing the storage?
This is something that was handled by the manager in the previous virtualization that we used, Oracle's Xen based Oracle VM.
Thanks!
Ryan

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/SDROGZOTZNI2XQ...
-- Regards, Eyal Shenitzky

Please ignore, already replied by Vojtech and Nir.

On Sun, 25 Apr 2021 at 14:51, Eyal Shenitzky <eshenitz@redhat.com> wrote:
+Shani Leviim <sleviim@redhat.com> can you assist?
On Fri, 23 Apr 2021 at 03:46, Ryan Chewning <ryan_chewning@trimble.com> wrote:
Hi List,
We need to add and remove directly mapped LUNs to multiple VMs in our Non-Production environment. The environment is backed by an iSCSI SAN. In testing when removing a directly mapped LUN it doesn't remove the underlying multipath and devices. Several questions.
1) Is this the expected behavior?
2) Are we supposed to go to each KVM host and manually remove the underlying multipath devices?
3) Is there a technical reason that oVirt doesn't do this as part of the steps to removing the storage?
This is something that was handled by the manager in the previous virtualization that we used, Oracle's Xen based Oracle VM.
Thanks!
Ryan
-- Regards, Eyal Shenitzky
-- Regards, Eyal Shenitzky
participants (5)
- Eyal Shenitzky
- Gianluca Cecchi
- Nir Soffer
- Ryan Chewning
- Vojtech Juranek