[ovirt-users] best way to remove SAN lun
Nir Soffer
nsoffer at redhat.com
Sun Mar 5 12:44:53 UTC 2017
On Sun, Mar 5, 2017 at 2:40 PM, Pavel Gashev <Pax at acronis.com> wrote:
> Please also consider the case where a single iSCSI target has several LUNs and you remove one of them.
> In this case you should not log out.
Right. ovirt manages connections to storage. When you remove the last
usage of a connection, we should disconnect from the target.
If this is not the case, it is an ovirt-engine bug.
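
By the way, a quick way to verify this on a host is to list the active
iSCSI sessions once the last storage domain using that target is gone
(the IQN below is just a placeholder for your target):

  iscsiadm -m session
  # or, to check one specific target:
  iscsiadm -m session | grep iqn.XXXXXXXX

If the target still shows up there, the engine did not disconnect, and
that would be the bug.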
>
> -----Original Message-----
> From: <users-bounces at ovirt.org> on behalf of Nelson Lameiras <nelson.lameiras at lyra-network.com>
> Date: Friday, 3 March 2017 at 20:55
> To: Nir Soffer <nsoffer at redhat.com>
> Cc: users <users at ovirt.org>
> Subject: Re: [ovirt-users] best way to remove SAN lun
>
> Hello Nir,
>
> I think I was not clear in my explanations, so let me try again:
>
> we have an oVirt 4.0.5.5 cluster with multiple hosts (CentOS 7.2).
> In this cluster, we added a SAN volume (iSCSI) a few months ago, directly in the GUI.
> Later we had to remove a DATA volume (SAN iSCSI). Below are the steps we took:
>
> 1- we migrated all disks off the volume (oVirt)
> 2- we put the volume into maintenance (oVirt)
> 3- we detached the volume (oVirt)
> 4- we removed/destroyed the volume (oVirt)
>
> On the SAN:
> 5- we took the LUN offline on the SAN
> 6- we deleted it from the SAN
>
> We thought this would be enough, but later we had a serious incident when the log partition filled up (partially our fault):
> /var/log/messages was continuously logging that the hosts were still trying to reach the SAN volumes (we have since taken care of the log space issue => more aggressive logrotate, etc.)
>
> The real solution was to add two more steps, using a shell on ALL hosts:
> 4a - log out from the SAN: iscsiadm -m node --logout -T iqn.XXXXXXXX
> 4b - remove the iSCSI targets: rm -fr /var/lib/iscsi/nodes/iqn.XXXXXXXXX
>
> This effectively solved our problem, but it was tedious since we had to do it manually on all hosts (imagine if we had hundreds of hosts...); a rough sketch of scripting this over ssh follows below.
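>
> For example, something like this from a single admin machine (assuming passwordless ssh as root to every host; the host names and the IQN are placeholders):
>
> for host in host1 host2 host3; do
>     ssh root@${host} "iscsiadm -m node --logout -T iqn.XXXXXXXX; rm -fr /var/lib/iscsi/nodes/iqn.XXXXXXXX"
> done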
>
> So my question was: shouldn't it be oVirt's job to "logout" and "remove iscsi targets" automatically when a volume is removed from oVirt? Maybe not, and I'm missing something?
>
> cordialement, regards,
>
> Nelson LAMEIRAS
> Ingénieur Systèmes et Réseaux / Systems and Networks engineer
> Tel: +33 5 32 09 09 70
> nelson.lameiras at lyra-network.com
>
> www.lyra-network.com | www.payzen.eu
>
> Lyra Network, 109 rue de l'innovation, 31670 Labège, FRANCE
>
> ----- Original Message -----
> From: "Nir Soffer" <nsoffer at redhat.com>
> To: "Nelson Lameiras" <nelson.lameiras at lyra-network.com>
> Cc: "Gianluca Cecchi" <gianluca.cecchi at gmail.com>, "Adam Litke" <alitke at redhat.com>, "users" <users at ovirt.org>
> Sent: Wednesday, February 22, 2017 8:27:26 AM
> Subject: Re: [ovirt-users] best way to remove SAN lun
>
> On Wed, Feb 22, 2017 at 9:03 AM, Nelson Lameiras
> <nelson.lameiras at lyra-network.com> wrote:
>> Hello,
>>
>> Not sure it is the same issue, but we have had a "major" issue recently in our production system when removing an iSCSI volume from oVirt, and then removing it from the SAN.
>
> What version? OS version?
>
> The order must be:
>
> 1. remove the LUN from the storage domain
>    this will be available in the next 4.1 release; in older versions you
>    have to remove the storage domain
>
> 2. unzone the LUN on the server
>
> 3. remove the multipath devices and the paths on the nodes (a rough
>    sketch of this step follows below)
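>
> For step 3, roughly this on each node that still sees the LUN (the WWID
> and the sdX names are placeholders - take them from "multipath -ll" on
> that host, and remove any stale LVs first, see further down the thread):
>
> multipath -f 3600a0b80002999020000cd3c5501458f   # flush the multipath map (example WWID)
> for dev in sdb sdh sdg sdn; do                   # delete its path devices (example names)
>     echo 1 > /sys/block/${dev}/device/delete
> done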
>
>> The issue being that each host was still regularly trying to access the SAN volume, in spite of it not being completely removed from oVirt.
>
> What do you mean by "not being completely removed"?
>
> Who was accessing the volume?
>
>> This led to a massive increase in error logs, which completely filled the /var/log partition,
>
> Which log was full with errors?
>
>> which snowballed into crashing vdsm and other nasty consequences.
>
> You should have a big enough /var/log to avoid such issues.
>
>>
>> Anyway, the solution was to manually log out from the SAN (on each host) with iscsiadm and manually remove the iSCSI targets (again on each host). It was not difficult once the problem was found because we currently have only 3 hosts in this cluster, but I'm wondering what would happen if we had hundreds of hosts?
>>
>> Maybe I'm being naive, but shouldn't this be "oVirt's job"? Is there an RFE still waiting to be included on this subject, or should I write one?
>
> We have RFE for this here:
> https://bugzilla.redhat.com/1310330
>
> But you must understand that ovirt does not control your storage server;
> you are responsible for adding devices on the storage server and removing
> them. We are only consuming the devices.
>
> Even if we provide a way to remove devices on all hosts, you will have
> to remove the device on the storage server before removing it from the
> hosts. If not, ovirt will find the removed devices again in the next
> SCSI rescan, and we do a lot of these to support automatic discovery of
> new or resized devices.
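>
> For illustration, even a simple manual rescan of the existing sessions
> would bring a still-exported LUN right back:
>
> iscsiadm -m session --rescan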
>
> Nir
>
>>
>> cordialement, regards,
>>
>>
>> Nelson LAMEIRAS
>> Ingénieur Systèmes et Réseaux / Systems and Networks engineer
>> Tel: +33 5 32 09 09 70
>> nelson.lameiras at lyra-network.com
>>
>> www.lyra-network.com | www.payzen.eu
>>
>> Lyra Network, 109 rue de l'innovation, 31670 Labège, FRANCE
>>
>> ----- Original Message -----
>> From: "Nir Soffer" <nsoffer at redhat.com>
>> To: "Gianluca Cecchi" <gianluca.cecchi at gmail.com>, "Adam Litke" <alitke at redhat.com>
>> Cc: "users" <users at ovirt.org>
>> Sent: Tuesday, February 21, 2017 6:32:18 PM
>> Subject: Re: [ovirt-users] best way to remove SAN lun
>>
>> On Tue, Feb 21, 2017 at 7:25 PM, Gianluca Cecchi
>> <gianluca.cecchi at gmail.com> wrote:
>>> On Tue, Feb 21, 2017 at 6:10 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>>>>
>>>> This is caused by active LVs on the removed storage domain that were not
>>>> deactivated during the removal. This is a very old known issue.
>>>>
>>>> You have to remove the stale device mapper entries - you can see the devices
>>>> using:
>>>>
>>>> dmsetup status
>>>>
>>>> Then you can remove the mapping using:
>>>>
>>>> dmsetup remove device-name
>>>>
>>>> Once you have removed the stale LVs, you will be able to remove the multipath
>>>> device and the underlying paths, and lvm will not complain about read
>>>> errors.
>>>>
>>>> Nir
>>>
>>>
>>> OK Nir, thanks for the advice.
>>>
>>> So this is what I ran successfully on the 2 hosts:
>>>
>>> [root at ovmsrv05 vdsm]# for dev in $(dmsetup status | grep
>>> 900b1853--e192--4661--a0f9--7c7c396f6f49 | cut -d ":" -f 1)
>>> do
>>> dmsetup remove $dev
>>> done
>>> [root at ovmsrv05 vdsm]#
>>>
>>> and now I can run
>>>
>>> [root at ovmsrv05 vdsm]# multipath -f 3600a0b80002999020000cd3c5501458f
>>> [root at ovmsrv05 vdsm]#
>>>
>>> Also, note that the device names differ depending on the host.
>>>
>>> The previous multipath map and its single-path devices were, for example on ovmsrv05:
>>>
>>> 3600a0b80002999020000cd3c5501458f dm-4 IBM ,1814 FAStT
>>> size=2.0T features='2 pg_init_retries 50' hwhandler='1 rdac' wp=rw
>>> |-+- policy='service-time 0' prio=0 status=enabled
>>> | |- 0:0:0:2 sdb 8:16 failed undef running
>>> | `- 1:0:0:2 sdh 8:112 failed undef running
>>> `-+- policy='service-time 0' prio=0 status=enabled
>>> |- 0:0:1:2 sdg 8:96 failed undef running
>>> `- 1:0:1:2 sdn 8:208 failed undef running
>>>
>>> And removal of single path devices:
>>>
>>> [root at ovmsrv05 root]# for dev in sdb sdh sdg sdn
>>> do
>>> echo 1 > /sys/block/${dev}/device/delete
>>> done
>>> [root at ovmsrv05 vdsm]#
>>>
>>> All clean now... ;-)
>>
>> Great!
>>
>> I think we should have a script doing all these steps.
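>>
>> Something along these lines, for example - it just strings together the
>> commands from this thread, and the storage domain UUID, the WWID, the sdX
>> paths and the IQN are placeholders that differ per setup:
>>
>> sd="900b1853--e192--4661--a0f9--7c7c396f6f49"   # storage domain uuid, as dmsetup shows it (placeholder)
>> wwid="3600a0b80002999020000cd3c5501458f"        # multipath wwid of the removed LUN (placeholder)
>>
>> # 1. remove the stale device-mapper entries left by the removed storage domain
>> for dev in $(dmsetup status | grep ${sd} | cut -d ":" -f 1); do
>>     dmsetup remove $dev
>> done
>>
>> # 2. flush the multipath map and delete its path devices (names from "multipath -ll")
>> multipath -f ${wwid}
>> for dev in sdb sdh sdg sdn; do
>>     echo 1 > /sys/block/${dev}/device/delete
>> done
>>
>> # 3. only if no other LUN uses this target: log out and forget it
>> iscsiadm -m node --logout -T iqn.XXXXXXXX
>> rm -fr /var/lib/iscsi/nodes/iqn.XXXXXXXX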
>>
>> Nir
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>