[ovirt-users] Fwd: Is there a general strategy for an iSCSI storage domain?

Trey Dockendorf treydock at gmail.com
Thu Oct 23 14:48:36 UTC 2014


Also, if performance is the key and you don't need the maximum space, try
using 8x mirrors.  If your zvol is serving MySQL or any other database, set
the primarycache and secondarycache properties to metadata only.  The
caching in ZFS can sometimes hurt when the application does its own caching.
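As a minimal sketch of the cache tuning above, assuming a hypothetical zvol named tank/mysql (the dataset name is a placeholder):

```shell
# Cache only metadata in ARC/L2ARC for a database zvol; InnoDB already
# buffers data pages itself, so double-caching mostly wastes RAM.
# "tank/mysql" is a placeholder dataset name.
zfs set primarycache=metadata tank/mysql    # ARC (RAM) cache
zfs set secondarycache=metadata tank/mysql  # L2ARC (SSD) cache

# Verify the properties took effect:
zfs get primarycache,secondarycache tank/mysql
```

The properties apply to new reads right away; data already cached simply ages out of the ARC on its own.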

Having a separate ZIL/slog device is good; in my situation I have neither a
slog nor an L2ARC, because my zpool is pure SSD.

Maximizing the number of striped raidz vdevs helps a lot with ZFS; as Karli
mentioned, you'll see better performance with 2x8 than with 1x16.
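The 2x8 layout can be sketched roughly as follows (pool and device names are placeholders; a real setup would use stable /dev/disk/by-id paths rather than sdX names):

```shell
# Two striped 8-disk raidz2 vdevs (2x8) instead of one 16-disk vdev (1x16).
# ZFS stripes writes across top-level vdevs, so two vdevs give roughly
# twice the IOPS of one, at the cost of 4 parity disks instead of 2.
zpool create tank \
  raidz2 sda sdb sdc sdd sde sdf sdg sdh \
  raidz2 sdi sdj sdk sdl sdm sdn sdo sdp

# Optionally add a mirrored slog on a pair of fast, power-loss-safe SSDs:
zpool add tank log mirror sdq sdr
```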

I too have seen RAID cards outperform a plain HBA + ZFS in some
situations.  This is usually because the cards do their own internal
caching and other "dangerous" (in my opinion) things.  Doing ZFS on
those cards is semi-dangerous.  A few months ago I threw away 8 Areca cards
after we lost a 30TB RAID set due to errors in the Areca.  I have yet to
lose a single bit on ZFS.  I have chosen to trade some performance for
stability and data integrity.  To me it was worth it.

For what it's worth, I use ZFS on Linux as the backing file system for our
HPC cluster's parallel filesystem storage nodes, and after upgrading from
0.6.2 to 0.6.3 I saw overall throughput to all storage systems roughly
double.  I've also had to tweak things like prefetch and cache tunables
in ZFS to get better performance.

Right now my oVirt instance that's backed by ZFS has done well for MySQL
with the iSCSI-backed data domain.  I believe the numbers were ~250
transactions per second on a small-ish (2 core, 8GB RAM) virtual machine
using sysbench 0.5 against MariaDB.  I was formatting the iSCSI-backed VM
disks as ext4 mounted with "nobarrier".  I can post specifics about my zvol
and zpool setup if it'll help; I just don't want to flood the oVirt list
with too much ZFS stuff :)
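For reference, the formatting and benchmark above can be sketched like this (device names, paths, and credentials are placeholders; the sysbench 0.5 invocation in particular varies with where its lua test scripts are installed):

```shell
# Inside the VM: format the iSCSI-backed disk and mount without barriers.
# "nobarrier" trades crash safety for speed, so it is only reasonable when
# the backing storage guarantees write ordering (e.g. a power-safe slog).
mkfs.ext4 /dev/vdb
mount -o nobarrier /dev/vdb /var/lib/mysql

# sysbench 0.5 OLTP read/write run against MariaDB (placeholder values):
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
  --mysql-user=sbtest --mysql-password=secret --mysql-db=sbtest \
  --oltp-table-size=1000000 --num-threads=4 --max-time=300 run
```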

- Trey

On Thu, Oct 23, 2014 at 4:34 AM, Karli Sjöberg <Karli.Sjoberg at slu.se> wrote:

> On Thu, 2014-10-23 at 11:09 +0200, Arman Khalatyan wrote:
> > yes + 1x SSD cache
> >
> >         NAME                                            STATE     READ WRITE CKSUM
> >         tank                                            ONLINE       0     0     0
> >           raidz2-0                                      ONLINE       0     0     0
> >             scsi-35000cca22be96bed                      ONLINE       0     0     0
> >             scsi-35000cca22bc5a20e                      ONLINE       0     0     0
> >             scsi-35000cca22bc515ee                      ONLINE       0     0     0
> >             ata-Hitachi_HUS724030ALE640_PK2A31PAG9VJXW  ONLINE       0     0     0
> >             scsi-35000cca22bc1f9cf                      ONLINE       0     0     0
> >             scsi-35000cca22be68899                      ONLINE       0     0     0
> >             scsi-35000cca22bc58e1b                      ONLINE       0     0     0
> >             scsi-35000cca22bc4dc6b                      ONLINE       0     0     0
> >             scsi-35000cca22bc394ee                      ONLINE       0     0     0
> >             scsi-35000cca22bc10d97                      ONLINE       0     0     0
> >             scsi-35000cca22bc605d1                      ONLINE       0     0     0
> >             scsi-35000cca22bc412bf                      ONLINE       0     0     0
> >             scsi-35000cca22bc3f9ad                      ONLINE       0     0     0
> >             scsi-35000cca22bc53004                      ONLINE       0     0     0
> >             scsi-35000cca22bc5b8e2                      ONLINE       0     0     0
> >             scsi-35000cca22bc3beb3                      ONLINE       0     0     0
> >         cache
> >           sdc                                           ONLINE       0     0     0
>
> OK, two things:
>
> 1) Redo the pool layout into 2x8-disk raidz2
> 2) Add two really fast SSDs as mirrored log devices, e.g. a pair of
> 200GB Intel DC S3700s
>
> Do this and it may provide even better performance than the HW RAID.
>
> But that depends on the specs of the rest of the HW, mostly CPU and RAM;
> you can never have enough RAM with ZFS ;)
>
> /K
>
> >
> >
> >
> > On Thu, Oct 23, 2014 at 10:57 AM, Karli Sjöberg <Karli.Sjoberg at slu.se>
> > wrote:
> >         On Thu, 2014-10-23 at 10:11 +0200, Arman Khalatyan wrote:
> >         >
> >         > ---------- Forwarded message ----------
> >         > From: Arman Khalatyan <arm2arm at gmail.com>
> >         > Date: Thu, Oct 23, 2014 at 10:11 AM
> >         > Subject: Re: [ovirt-users] Is there a general strategy for
> >         > an iSCSI storage domain?
> >         > To: Trey Dockendorf <treydock at gmail.com>
> >         >
> >         >
> >         > Thank you Trey for sharing your setup.
> >         >
> >         > I also have one test system with a zvol exported via iSCSI
> >         > over 10G.
> >         > Unfortunately the performance gap between ZFS and the RAID
> >         > controller is huge, particularly with a VM running MySQL. I
> >         > did not try HBAs yet; I only have LSI/Adaptec/Areca RAID
> >         > controllers, and they don't have IT mode. Maybe that is the
> >         > reason.
> >         >
> >         > For sure, one always needs to find the sweet spot between
> >         > performance and reliability.
> >         >
> >         > Just for comparison with yours, on random IO
> >         > (zvol/16Disks/Raid2/tgtd/iscsi/10G) I get ~100-150MB/s with
> >         > multiple rsyncs on a VM.  On the same HW but with the disks
> >         > behind Areca RAID6: a stable 650MB/s, and even more in some
> >         > cases.
> >
> >         Did you have separate log devices attached to the pool?
> >
> >         The pool's name was '16Disks'. Did you have 16 disks in one
> >         raidz2 vdev?
> >
> >         /K
> >
> >         >
> >         > The best performance I got was on FDR iSER: 80% of bare
> >         > metal performance, 1500MB/s, but oVirt goes mad, claiming
> >         > that the network and disk devices are saturated. My VM goes
> >         > into a paused state from time to time. This is because
> >         > oVirt treats all IB devices as 10Gbit cards (in terms of
> >         > speed). :(
> >         >
> >         >
> >         > On Thu, Oct 23, 2014 at 8:30 AM, Trey Dockendorf
> >         <treydock at gmail.com>
> >         > wrote:
> >         >         Not sure if it's a solution for you, but ZFS.  My
> >         domains are
> >         >         all ZFS (using ZFS on Linux in EL6.5) and my backup
> >         server
> >         >         receives incremental snapshots from primary storage
> >         which
> >         >         includes both NFS exports and iSCSI.  ZFS makes
> >         creating block
> >         >         devices for iSCSI very easy, and they are included
> >         in snapshot
> >         >         replication.  The replication is not HA but disaster
> >         recovery
> >         >         and off site.
> >         >
> >         >         I've hit 300MB/s using ZFS send over IPoIB on my DDR
> >         fabric,
> >         >         which isn't amazing but not terrible for an old DDR
> >         fabric.
> >         >
> >         >         ZFS is probably not an easy solution, as it
> >         >         requires rebuilding your storage, but maybe for
> >         >         future use, or for other readers, it will give
> >         >         some useful ideas.
> >         >
> >         >         - Trey
> >         >
> >         >         On Oct 22, 2014 11:56 AM, "Arman Khalatyan"
> >         >         <arm2arm at gmail.com> wrote:
> >         >
> >         >                 Hi,
> >         >                 I have 2x40TB domains, each exported with
> >         >                 iSER/iSCSI over IB and 10Gb interfaces.
> >         >
> >         >                 They are RAID6 storage, so somewhat safe
> >         >                 against failure.  But I was wondering if
> >         >                 there is any way to back up those domains,
> >         >                 particularly the master one.
> >         >
> >         >
> >         >                 I was thinking of some DRBD-based
> >         >                 replication, with LVM snapshots etc., but
> >         >                 it looks like overkill.
> >         >
> >         >                 It would be nice to deploy a replicated/HA
> >         >                 master domain, with the ability to back up
> >         >                 to tape as well.
> >         >
> >         >
> >         >                 Any ideas are welcome.
> >         >
> >         >                 Thanks,
> >         >
> >         >                 Arman.
> >         >
> >         >
> >         >
> >         >
> >          _______________________________________________
> >         >                 Users mailing list
> >         >                 Users at ovirt.org
> >         >
> >          http://lists.ovirt.org/mailman/listinfo/users
> >         >
> >         >
> >         >
> >         >
> >         >
> >
> >
> >
> >
> >         --
> >
> >         Med Vänliga Hälsningar
> >
> >
> >         -------------------------------------------------------------------------------
> >         Karli Sjöberg
> >         Swedish University of Agricultural Sciences Box 7079 (Visiting
> >         Address
> >         Kronåsvägen 8)
> >         S-750 07 Uppsala, Sweden
> >         Phone:  +46-(0)18-67 15 66
> >         karli.sjoberg at slu.se
> >
> >
>
>
>
>