Yes, we are aware of that; the problem is it's running in production, so it's
not easy to change the pool.
On 04/22/2015 11:48 AM, InterNetX - Juergen Gotteswinter wrote:
I expect you are aware that you only get the write performance of a single
disk in that configuration? I would drop that pool layout, drop the spare
drives, and go for a mirror pool.
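For illustration only, a rebuilt layout along those lines, reusing the device
names from the pool status quoted below, might look roughly like this. This
cannot be done in place; the data would have to be migrated elsewhere first:

```shell
# Sketch: recreate the pool as three 2-way mirrors instead of raidz1,
# keeping the SSDs as log and cache devices. Write IOPS then scale with
# the number of mirror vdevs rather than being limited to roughly one
# disk's worth. This destroys existing data -- only after migrating it off.
zpool create z2pool \
  mirror c0t5000C5004172A87Bd0 c0t5000C50041A59027d0 \
  mirror c0t5000C50041A592AFd0 c0t5000C50041A660D7d0 \
  mirror c0t5000C50041A69223d0 c0t5000C50041A6ADF3d0 \
  log c0t5001517BB2845595d0 \
  cache c0t5001517BB2847892d0
```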
On 22.04.2015 at 11:39, Maikel vd Mosselaar wrote:
> pool: z2pool
> state: ONLINE
> scan: scrub canceled on Sun Apr 12 16:33:38 2015
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         z2pool                     ONLINE       0     0     0
>           raidz1-0                 ONLINE       0     0     0
>             c0t5000C5004172A87Bd0  ONLINE       0     0     0
>             c0t5000C50041A59027d0  ONLINE       0     0     0
>             c0t5000C50041A592AFd0  ONLINE       0     0     0
>             c0t5000C50041A660D7d0  ONLINE       0     0     0
>             c0t5000C50041A69223d0  ONLINE       0     0     0
>             c0t5000C50041A6ADF3d0  ONLINE       0     0     0
>         logs
>           c0t5001517BB2845595d0    ONLINE       0     0     0
>         cache
>           c0t5001517BB2847892d0    ONLINE       0     0     0
>         spares
>           c0t5000C50041A6B737d0    AVAIL
>           c0t5000C50041AC3F07d0    AVAIL
>           c0t5000C50041AD48DBd0    AVAIL
>           c0t5000C50041ADD727d0    AVAIL
>
> errors: No known data errors
>
>
> On 04/22/2015 11:17 AM, Karli Sjöberg wrote:
>> On Wed, 2015-04-22 at 11:12 +0200, Maikel vd Mosselaar wrote:
>>> Our pool is configured as raidz1 with a ZIL (an ordinary SSD); the sync
>>> parameter is at the default setting (standard), so sync is on.
>> # zpool status ?
>>
>> /K
>>
>>> When the issue happens, the oVirt event viewer does indeed show latency
>>> warnings. Not always, but most of the time this is followed by an i/o
>>> storage error linked to random VMs, which are then paused.
>>>
>>> All the nodes use mode 4 bonding. The interfaces on the nodes don't show
>>> any drops or errors. I checked 2 of the VMs that got paused the last time
>>> it happened; they do have dropped packets on their interfaces.
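The dropped-packet counters inside an affected guest can be read like this
(eth0 is an assumed interface name; substitute the guest's actual NIC):

```shell
# Sketch: inside a paused/resumed VM, check the RX/TX "dropped" and
# "errors" counters (interface name eth0 is an assumption).
ip -s link show eth0
```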
>>>
>>> We don't have a subscription with nexenta (anymore).
>>>
>>> On 04/21/2015 04:41 PM, InterNetX - Juergen Gotteswinter wrote:
>>>> On 21.04.2015 at 16:19, Maikel vd Mosselaar wrote:
>>>>> Hi Juergen,
>>>>>
>>>>> The load on the nodes rises to well over 200 during the event. The load
>>>>> on the nexenta stays normal, and there is nothing strange in its logs.
>>>> ZFS + NFS could still be the root of this. Is your pool configured as
>>>> raidzX or mirror, with or without a ZIL? And is the sync parameter of
>>>> the exported ZFS subvolume left at the default ("standard")?
>>>>
>>>>
http://christopher-technicalmusings.blogspot.de/2010/09/zfs-and-nfs-perfo...
>>>>
>>>>
>>>> Since oVirt reacts very sensitively to storage latency (it throws VMs
>>>> into an unresponsive or unknown state), it might be worth trying "zfs
>>>> set sync=disabled pool/volume" to see if this changes things. But be
>>>> aware that this makes the NFS export vulnerable to data loss in case of
>>>> a power failure etc., comparable to async NFS on Linux.
>>>>
>>>> If disabling the sync setting helps and you don't use a separate ZIL
>>>> flash drive yet, adding one would very likely help to get rid of this.
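A minimal sketch of that experiment, assuming the exported dataset is named
z2pool/nfs (a hypothetical name; "pool/volume" stands for whatever dataset is
actually exported):

```shell
# Sketch, assuming the exported dataset is z2pool/nfs (hypothetical name).
zfs get sync z2pool/nfs            # default is "standard"
zfs set sync=disabled z2pool/nfs   # test only: risks data loss on power failure
# ... observe whether the latency events and VM pauses stop ...
zfs set sync=standard z2pool/nfs   # revert after the test
```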
>>>>
>>>> Also, if you run a subscribed version of Nexenta, it might be helpful
>>>> to involve them.
>>>>
>>>> Do you see any messages about high latency in the Ovirt Events Panel?
>>>>
>>>>> For the storage interfaces on our nodes we use bonding in mode 4
>>>>> (802.3ad), 2x 1Gb. The nexenta has a 4x 1Gb bond in mode 4 as well.
>>>> This should be fine, as long as no node uses mode 0 / round-robin, which
>>>> would lead to out-of-order TCP packets. The interfaces themselves don't
>>>> show any drops or errors - on the VM hosts as well as on the switch
>>>> itself?
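Those checks can be done on each host roughly like this (bond0 is an assumed
interface name):

```shell
# Sketch, assuming the bond interface on the hosts is named bond0.
# Mode 4 should report "IEEE 802.3ad Dynamic link aggregation" and
# every slave should show "MII Status: up".
grep -E 'Bonding Mode|Slave Interface|MII Status' /proc/net/bonding/bond0

# RX/TX "errors" and "dropped" counters on the bond itself:
ip -s link show bond0
```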
>>>>
>>>> Jumbo Frames?
>>>>
>>>>> Kind regards,
>>>>>
>>>>> Maikel
>>>>>
>>>>>
>>>>> On 04/21/2015 02:51 PM, InterNetX - Juergen Gotteswinter wrote:
>>>>>> Hi,
>>>>>>
>>>>>> how about load, latency, strange dmesg messages on the Nexenta? Are
>>>>>> you using bonded Gbit networking? If yes, which mode?
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Juergen
>>>>>>
>>>>>> On 20.04.2015 at 14:25, Maikel vd Mosselaar wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We are running oVirt 3.5.1 with 3 nodes and a separate engine.
>>>>>>>
>>>>>>> All on CentOS 6.6:
>>>>>>> 3 x nodes
>>>>>>> 1 x engine
>>>>>>>
>>>>>>> 1 x storage nexenta with NFS
>>>>>>>
>>>>>>> For multiple weeks we have been experiencing issues where our nodes
>>>>>>> cannot access the storage at random moments (at least, that's what
>>>>>>> the nodes think).
>>>>>>>
>>>>>>> When the nodes complain about unavailable storage, the load rises to
>>>>>>> over 200 on all three nodes, which makes all running VMs
>>>>>>> inaccessible. During this, the oVirt event viewer shows some i/o
>>>>>>> storage error messages; when that happens, random VMs get paused and
>>>>>>> will not be resumed anymore (this happens almost every time, but not
>>>>>>> all the VMs get paused).
>>>>>>>
>>>>>>> During the event we tested accessibility from the nodes to the
>>>>>>> storage, and it appears to work normally; at least we can do a
>>>>>>> normal "ls" on the storage without any delay in showing the contents.
>>>>>>>
>>>>>>> We tried multiple things that we thought might be causing this
>>>>>>> issue, but nothing has worked so far:
>>>>>>> * rebooting storage / nodes / engine.
>>>>>>> * disabling offsite rsync backups.
>>>>>>> * moving the biggest VMs with the highest load to a different
>>>>>>> platform outside of oVirt.
>>>>>>> * checking the wsize and rsize on the NFS mounts; storage and nodes
>>>>>>> are correct according to the "NFS troubleshooting page" on ovirt.org.
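For reference, the rsize/wsize (and other options) the clients actually
negotiated can be read on a node roughly like this:

```shell
# Sketch: show the effective NFS mount options as negotiated by the client.
nfsstat -m
# or, directly from the kernel's mount table:
grep ' nfs' /proc/mounts
```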
>>>>>>>
>>>>>>> The environment is running in production, so we are not free to
>>>>>>> test everything.
>>>>>>>
>>>>>>> I can provide log files if needed.
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>>
>>>>>>> Maikel
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list
>>>>>>> Users(a)ovirt.org
>>>>>>> http://lists.ovirt.org/mailman/listinfo/users