[ovirt-users] storage issues with oVirt 3.5.1 + Nexenta NFS

Karli Sjöberg karli.sjoberg at slu.se
Wed Apr 22 10:01:03 UTC 2015


On Wed, 2015-04-22 at 11:54 +0200, Maikel vd Mosselaar wrote:
> Yes, we are aware of that; the problem is that it's running in
> production, so it's not easy to change the pool.
> 
> On 04/22/2015 11:48 AM, InterNetX - Juergen Gotteswinter wrote:
> > I expect you are aware of the fact that you only get the write
> > performance of a single disk in that configuration? I would drop that
> > pool configuration, drop the spare drives and go for a mirror pool.

^ What he said :)

That, or if you have room to add another two disks, use them plus the
spare drives to add a second raidz(1|2|3) vdev.
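
Purely as an illustration, and assuming the pool layout from the status
output quoted below (the two new device names are placeholders), growing
the pool with a second 6-disk raidz1 vdev would look roughly like:

  # zpool remove z2pool c0t5000C50041A6B737d0 c0t5000C50041AC3F07d0 \
      c0t5000C50041AD48DBd0 c0t5000C50041ADD727d0
  # zpool add z2pool raidz1 c0t5000C50041A6B737d0 c0t5000C50041AC3F07d0 \
      c0t5000C50041AD48DBd0 c0t5000C50041ADD727d0 <new-disk-1> <new-disk-2>

That gives two raidz1 data vdevs to stripe writes across instead of one.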

What drives do you use for data, log and cache?

/K

> >
> > On 22.04.2015 at 11:39, Maikel vd Mosselaar wrote:
> >>    pool: z2pool
> >>   state: ONLINE
> >>   scan: scrub canceled on Sun Apr 12 16:33:38 2015
> >> config:
> >>
> >>          NAME                       STATE     READ WRITE CKSUM
> >>          z2pool                     ONLINE       0     0     0
> >>            raidz1-0                 ONLINE       0     0     0
> >>              c0t5000C5004172A87Bd0  ONLINE       0     0     0
> >>              c0t5000C50041A59027d0  ONLINE       0     0     0
> >>              c0t5000C50041A592AFd0  ONLINE       0     0     0
> >>              c0t5000C50041A660D7d0  ONLINE       0     0     0
> >>              c0t5000C50041A69223d0  ONLINE       0     0     0
> >>              c0t5000C50041A6ADF3d0  ONLINE       0     0     0
> >>          logs
> >>            c0t5001517BB2845595d0    ONLINE       0     0     0
> >>          cache
> >>            c0t5001517BB2847892d0    ONLINE       0     0     0
> >>          spares
> >>            c0t5000C50041A6B737d0    AVAIL
> >>            c0t5000C50041AC3F07d0    AVAIL
> >>            c0t5000C50041AD48DBd0    AVAIL
> >>            c0t5000C50041ADD727d0    AVAIL
> >>
> >> errors: No known data errors
> >>
> >>
> >> On 04/22/2015 11:17 AM, Karli Sjöberg wrote:
> >>> On Wed, 2015-04-22 at 11:12 +0200, Maikel vd Mosselaar wrote:
> >>>> Our pool is configured as raidz1 with a ZIL (a regular SSD); the sync
> >>>> parameter is at its default setting (standard), so "sync" is on.
> >>> # zpool status ?
> >>>
> >>> /K
> >>>
> >>>> When the issue happens, the oVirt event viewer does indeed show latency
> >>>> warnings. Not always, but most of the time this is followed by an I/O
> >>>> storage error linked to random VMs, which are then paused.
> >>>>
> >>>> All the nodes use mode 4 bonding. The interfaces on the nodes don't show
> >>>> any drops or errors. I checked two of the VMs that got paused the last
> >>>> time it happened; they do have dropped packets on their interfaces.
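
For what it's worth, a quick way to watch those counters on the hosts
(interface names here are only examples) is:

  # ip -s link show bond0
  # ethtool -S em1 | grep -iE 'drop|err'

Running the same "ip -s link" inside one of the affected VMs before and
during an event would show whether the drops only appear while the
storage is stalling.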
> >>>>
> >>>> We don't have a subscription with Nexenta (anymore).
> >>>>
> >>>> On 04/21/2015 04:41 PM, InterNetX - Juergen Gotteswinter wrote:
> >>>>> On 21.04.2015 at 16:19, Maikel vd Mosselaar wrote:
> >>>>>> Hi Juergen,
> >>>>>>
> >>>>>> The load on the nodes rises to well over 200 during the event. Load
> >>>>>> on the Nexenta stays normal, and there is nothing strange in the
> >>>>>> logging.
> >>>>> ZFS + NFS could still be the root of this. Is your pool configuration
> >>>>> raidzX or mirror, with or without a ZIL? Is the sync parameter of the
> >>>>> exported ZFS subvolume kept at its default of "standard"?
> >>>>>
> >>>>> http://christopher-technicalmusings.blogspot.de/2010/09/zfs-and-nfs-performance-with-zil.html
> >>>>>
> >>>>>
> >>>>> Since oVirt is very sensitive to storage latency (it throws VMs into
> >>>>> an unresponsive or unknown state), it might be worth trying "zfs set
> >>>>> sync=disabled pool/volume" to see if this changes things. But be aware
> >>>>> that this makes the NFS export vulnerable to data loss in case of
> >>>>> power loss etc., comparable to async NFS in Linux.
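
If you do try this, a minimal test sequence (the dataset name is an
assumption, substitute the filesystem that is actually exported) would be:

  # zfs get sync z2pool/<exported-fs>           (should currently say "standard")
  # zfs set sync=disabled z2pool/<exported-fs>
  ... reproduce the load and watch the oVirt events ...
  # zfs set sync=standard z2pool/<exported-fs>  (revert once the test is done)

The sync property only affects new writes, so it can be flipped back at
any time.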
> >>>>>
> >>>>> If disabling the sync setting helps, and you don't use a separate ZIL
> >>>>> flash drive yet, adding one would very likely help get rid of this.
> >>>>>
> >>>>> Also, if you run a subscribed version of Nexenta, it might be helpful
> >>>>> to involve them.
> >>>>>
> >>>>> Do you see any messages about high latency in the oVirt events panel?
> >>>>>
> >>>>>> For the storage interfaces on our nodes we use bonding in mode 4
> >>>>>> (802.3ad), 2x 1Gb. The Nexenta has a 4x 1Gb bond in mode 4 as well.
> >>>>> This should be fine, as long as no node uses mode 0 / round-robin,
> >>>>> which would lead to out-of-order TCP packets. The interfaces themselves
> >>>>> don't show any drops or errors - on the VM hosts as well as on the
> >>>>> switch itself?
> >>>>>
> >>>>> Jumbo Frames?
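
For completeness (the bond interface name is assumed), the negotiated
mode and the MTU can be verified on each host with:

  # grep -i 'bonding mode' /proc/net/bonding/bond0
  # ip link show bond0 | grep mtu

If jumbo frames are enabled anywhere, the MTU has to match end to end
(hosts, switch ports and the Nexenta), otherwise large NFS packets get
fragmented or dropped.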
> >>>>>
> >>>>>> Kind regards,
> >>>>>>
> >>>>>> Maikel
> >>>>>>
> >>>>>>
> >>>>>> On 04/21/2015 02:51 PM, InterNetX - Juergen Gotteswinter wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> how about load, latency, or strange dmesg messages on the Nexenta?
> >>>>>>> Are you using bonded Gbit networking? If yes, which mode?
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>>
> >>>>>>> Juergen
> >>>>>>>
> >>>>>>> On 20.04.2015 at 14:25, Maikel vd Mosselaar wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> We are running oVirt 3.5.1 with 3 nodes and a separate engine.
> >>>>>>>>
> >>>>>>>> All on CentOS 6.6:
> >>>>>>>> 3 x nodes
> >>>>>>>> 1 x engine
> >>>>>>>>
> >>>>>>>> 1 x Nexenta storage with NFS
> >>>>>>>>
> >>>>>>>> For multiple weeks we have been experiencing issues where our nodes
> >>>>>>>> cannot access the storage at random moments (at least that's what
> >>>>>>>> the nodes think).
> >>>>>>>>
> >>>>>>>> When the nodes are complaining about unavailable storage, the load
> >>>>>>>> rises to over 200 on all three nodes, which makes all running VMs
> >>>>>>>> inaccessible. During this the oVirt event viewer shows some I/O
> >>>>>>>> storage error messages; when this happens, random VMs get paused
> >>>>>>>> and are not resumed anymore (this happens almost every time, but
> >>>>>>>> not all the VMs get paused).
> >>>>>>>>
> >>>>>>>> During the event we tested the accessibility from the nodes to the
> >>>>>>>> storage, and it looks like it is working normally; at least we can
> >>>>>>>> do a normal "ls" on the storage without any delay in showing the
> >>>>>>>> contents.
> >>>>>>>>
> >>>>>>>> We have tried multiple things that we thought might be causing this
> >>>>>>>> issue, but nothing has worked so far:
> >>>>>>>> * rebooting storage / nodes / engine.
> >>>>>>>> * disabling offsite rsync backups.
> >>>>>>>> * moving the biggest VMs with the highest load to a different
> >>>>>>>> platform outside of oVirt.
> >>>>>>>> * checking the wsize and rsize on the NFS mounts; storage and nodes
> >>>>>>>> are correct according to the "NFS troubleshooting page" on ovirt.org
> >>>>>>>> (see the check below).
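
The options actually negotiated by the client can be double-checked on
each node with:

  # nfsstat -m
  # grep nfs /proc/mounts

Both show the effective rsize/wsize, timeo and NFS version, which can
differ from what is configured on the storage side.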
> >>>>>>>>
> >>>>>>>> The environment is running in production, so we are not free to
> >>>>>>>> test everything.
> >>>>>>>>
> >>>>>>>> I can provide log files if needed.
> >>>>>>>>
> >>>>>>>> Kind Regards,
> >>>>>>>>
> >>>>>>>> Maikel
> >>>>>>>>
> >>>>>>>>