[ovirt-users] best way to remove SAN lun

Yaniv Kaul ykaul at redhat.com
Wed Feb 22 08:45:10 UTC 2017


On Wed, Feb 22, 2017 at 9:27 AM Nir Soffer <nsoffer at redhat.com> wrote:

> On Wed, Feb 22, 2017 at 9:03 AM, Nelson Lameiras
> <nelson.lameiras at lyra-network.com> wrote:
> > Hello,
> >
> > Not sure it is the same issue, but we have had a "major" issue recently
> in our production system when removing an iSCSI volume from oVirt, and then
> removing it from SAN.
>
> What version? OS version?
>
> The order must be:
>
> 1. remove the LUN from the storage domain
>     will be available in the next 4.1 release; in older versions you have
> to remove the storage domain
>
> 2. unzone the LUN on the server
>
> 3. remove the multipath devices and the paths on the nodes
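[Editor's note: for step 3, a minimal per-host sketch; the WWID and path names below are placeholders, the real ones come from multipath -ll on each host:

    # Flush the multipath map of the unzoned LUN (placeholder WWID)
    multipath -f 36001405abcdef0123456789abcdef01

    # Delete its SCSI paths so the kernel forgets them (placeholder names)
    for dev in sdc sdd; do
        echo 1 > /sys/block/${dev}/device/delete
    done
]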
>
> > The issue being that each host was still regularly trying to access the
> SAN volume in spite of it not being completely removed from oVirt.
>
> What do you mean by "not being completely removed"?
>
> Who was accessing the volume?
>
> > This led to a massive increase in error logs, which completely filled the
> /var/log partition,
>
> Which log was full with errors?
>
> > which snowballed into crashing vdsm and other nasty consequences.
>
> You should have a /var/log partition big enough to avoid such issues.
>

- Log rotation should be configured better so it does not consume excessive
amounts of space.
I'm seeing /etc/vdsm/logrotate/vdsm - not sure why it's not under
/etc/logrotate.d. Looking at the file, it seems there's a 15M limit and
100 files, which translates to 1.5GB - and it is supposed to be compressed
(not sure XZ is a good choice - it's very CPU intensive).
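[Editor's note: a minimal sketch of what such a logrotate configuration could look like, given the limits described above; illustrative only, not the actual file shipped by vdsm:

    /var/log/vdsm/*.log {
        rotate 100
        size 15M
        compress
        compresscmd /usr/bin/xz
        uncompresscmd /usr/bin/unxz
        compressext .xz
        missingok
        copytruncate
    }
]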

Others (Gluster?) do not seem to have a size limit, just weekly rotation. We
need to look at other components as well.
- At least on ovirt-node, we'd like to separate some directories into
different partitions, so that, for example, core dumps (which should be
limited as well) on /var/core do not fill the same partition as /var/log and
thus render the host unusable.
And again, looking at the file, we have a 'size 0' on /var/log/core/*.dump -
and 'rotate 1' - not sure what that means - but it should not be in
/var/log/core, but /var/core, I reckon.
Y.
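[Editor's note: in logrotate, 'size 0' rotates a log whenever it is larger than 0 bytes (effectively, any non-empty dump), and 'rotate 1' keeps only one rotated copy. A sketch of such a stanza, using the path mentioned above; illustrative only:

    /var/log/core/*.dump {
        rotate 1
        size 0
        missingok
        notifempty
    }
]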


> >
> > Anyway, the solution was to manually log out from the SAN (on each host)
> with iscsiadm and manually remove the iSCSI targets (again on each host).
> It was not difficult once the problem was found, because currently we only
> have 3 hosts in this cluster, but I'm wondering what would happen if we had
> hundreds of hosts?
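[Editor's note: a minimal sketch of the per-host manual cleanup described above; the target IQN and portal below are hypothetical, the real ones come from iscsiadm -m session:

    # List the active sessions to identify the target and portal
    iscsiadm -m session

    # Log out of the target on this host (hypothetical IQN and portal)
    iscsiadm -m node -T iqn.2017-02.com.example:storage.lun1 -p 10.0.0.10:3260 --logout

    # Delete the node record so the host does not try to reconnect
    iscsiadm -m node -T iqn.2017-02.com.example:storage.lun1 -p 10.0.0.10:3260 -o delete
]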
> >
> > Maybe I'm being naive, but shouldn't this be oVirt's job? Is there an RFE
> still open on this subject, or should I write one?
>
> We have RFE for this here:
> https://bugzilla.redhat.com/1310330
>
> But you must understand that oVirt does not control your storage server;
> you are responsible for adding devices on the storage server and removing
> them. We are only consuming the devices.
>
> Even if we provide a way to remove devices on all hosts, you will have
> to remove the device on the storage server before removing it from the
> hosts. If not, oVirt will find the removed devices again in the next
> SCSI rescan, and we do a lot of these to support automatic discovery of
> new devices or resized devices.
>
> Nir
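[Editor's note: to illustrate the point about rescans, these are the kinds of host-side rescans that would rediscover a device still presented by the storage server (a sketch; oVirt/vdsm triggers similar rescans when looking for new or resized devices):

    # Rescan all active iSCSI sessions on this host
    iscsiadm -m session --rescan

    # Rescan a SCSI host adapter (repeat for each hostN present on the system)
    echo "- - -" > /sys/class/scsi_host/host0/scan
]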
>
> >
> > cordialement, regards,
> >
> >
> > Nelson LAMEIRAS
> > Ingénieur Systèmes et Réseaux / Systems and Networks engineer
> > Tel: +33 5 32 09 09 70
> > nelson.lameiras at lyra-network.com
> >
> > www.lyra-network.com | www.payzen.eu
> >
> >
> >
> >
> >
> > Lyra Network, 109 rue de l'innovation, 31670 Labège, FRANCE
> >
> > ----- Original Message -----
> > From: "Nir Soffer" <nsoffer at redhat.com>
> > To: "Gianluca Cecchi" <gianluca.cecchi at gmail.com>, "Adam Litke" <
> alitke at redhat.com>
> > Cc: "users" <users at ovirt.org>
> > Sent: Tuesday, February 21, 2017 6:32:18 PM
> > Subject: Re: [ovirt-users] best way to remove SAN lun
> >
> > On Tue, Feb 21, 2017 at 7:25 PM, Gianluca Cecchi
> > <gianluca.cecchi at gmail.com> wrote:
> >> On Tue, Feb 21, 2017 at 6:10 PM, Nir Soffer <nsoffer at redhat.com> wrote:
> >>>
> >> This is caused by active LVs on the removed storage domain that were not
> >> deactivated during the removal. This is a very old known issue.
> >>>
> >> You have to remove the stale device mapper entries - you can see the
> >> devices using:
> >>>
> >>>     dmsetup status
> >>>
> >>> Then you can remove the mapping using:
> >>>
> >>>     dmsetup remove device-name
> >>>
> >> Once you have removed the stale LVs, you will be able to remove the
> >> multipath device and the underlying paths, and LVM will not complain
> >> about read errors.
> >>>
> >>> Nir
> >>
> >>
> >> OK Nir, thanks for advising.
> >>
> >> So this is what I ran successfully on the 2 hosts:
> >>
> >> [root at ovmsrv05 vdsm]# for dev in $(dmsetup status | grep
> >> 900b1853--e192--4661--a0f9--7c7c396f6f49 | cut -d ":" -f 1)
> >> do
> >>    dmsetup remove $dev
> >> done
> >> [root at ovmsrv05 vdsm]#
> >>
> >> and now I can run
> >>
> >> [root at ovmsrv05 vdsm]# multipath -f 3600a0b80002999020000cd3c5501458f
> >> [root at ovmsrv05 vdsm]#
> >>
> >> Also, with device names depending on the host, the previous maps to
> >> single path devices were, for example on ovmsrv05:
> >>
> >> 3600a0b80002999020000cd3c5501458f dm-4 IBM     ,1814      FAStT
> >> size=2.0T features='2 pg_init_retries 50' hwhandler='1 rdac' wp=rw
> >> |-+- policy='service-time 0' prio=0 status=enabled
> >> | |- 0:0:0:2 sdb        8:16  failed undef running
> >> | `- 1:0:0:2 sdh        8:112 failed undef running
> >> `-+- policy='service-time 0' prio=0 status=enabled
> >>   |- 0:0:1:2 sdg        8:96  failed undef running
> >>   `- 1:0:1:2 sdn        8:208 failed undef running
> >>
> >> And removal of single path devices:
> >>
> >> [root at ovmsrv05 root]# for dev in sdb sdh sdg sdn
> >> do
> >>   echo 1 > /sys/block/${dev}/device/delete
> >> done
> >> [root at ovmsrv05 vdsm]#
> >>
> >> All clean now... ;-)
> >
> > Great!
> >
> > I think we should have a script doing all these steps.
> >
> > Nir
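[Editor's note: a rough sketch of what such a script could look like, combining the steps from this thread (remove the stale DM mappings, flush the multipath map, delete the SCSI paths). The script name and arguments are hypothetical; it would be run on each host only after the LUN has been removed from oVirt and unzoned on the storage server:

    #!/bin/bash
    # Sketch only - not an official oVirt tool.
    # Usage: cleanup_lun.sh <multipath-wwid> <storage-domain-uuid-pattern>

    wwid="$1"
    sd_pattern="$2"

    # Collect the SCSI paths of the multipath map before flushing it
    paths=$(multipath -ll "$wwid" | grep -o 'sd[a-z]*' | sort -u)

    # 1. Remove stale device-mapper entries (LVs of the removed storage domain)
    for dev in $(dmsetup status | grep "$sd_pattern" | cut -d ":" -f 1); do
        dmsetup remove "$dev"
    done

    # 2. Flush the multipath map
    multipath -f "$wwid"

    # 3. Delete the underlying SCSI paths so the kernel forgets the device
    for path in $paths; do
        echo 1 > "/sys/block/${path}/device/delete"
    done
]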
> > _______________________________________________
> > Users mailing list
> > Users at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>

