  pool: z2pool
 state: ONLINE
  scan: scrub canceled on Sun Apr 12 16:33:38 2015
config:

        NAME                       STATE     READ WRITE CKSUM
        z2pool                     ONLINE       0     0     0
          raidz1-0                 ONLINE       0     0     0
            c0t5000C5004172A87Bd0  ONLINE       0     0     0
            c0t5000C50041A59027d0  ONLINE       0     0     0
            c0t5000C50041A592AFd0  ONLINE       0     0     0
            c0t5000C50041A660D7d0  ONLINE       0     0     0
            c0t5000C50041A69223d0  ONLINE       0     0     0
            c0t5000C50041A6ADF3d0  ONLINE       0     0     0
        logs
          c0t5001517BB2845595d0    ONLINE       0     0     0
        cache
          c0t5001517BB2847892d0    ONLINE       0     0     0
        spares
          c0t5000C50041A6B737d0    AVAIL
          c0t5000C50041AC3F07d0    AVAIL
          c0t5000C50041AD48DBd0    AVAIL
          c0t5000C50041ADD727d0    AVAIL

errors: No known data errors
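
(For the sync question quoted below, the current value can be read back with
something like the following; the pool name is taken from the status output
above, while "z2pool/<dataset>" is only a placeholder for the exported dataset,
which is not named in this thread.)

# zfs get sync z2pool
# zfs get sync z2pool/<dataset>
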
On 04/22/2015 11:17 AM, Karli Sjöberg wrote:
On Wed, 2015-04-22 at 11:12 +0200, Maikel vd Mosselaar wrote:
> Our pool is configured as RAIDZ1 with a ZIL (normal SSD). The sync parameter
> is on the default setting (standard), so "sync" is on.
# zpool status ?
/K
> When the issue happens, the oVirt event viewer does indeed show latency
> warnings. Not always, but most of the time this is followed by an i/o storage
> error linked to random VMs, which are then paused.
>
> All the nodes use mode 4 bonding. The interfaces on the nodes don't show
> any drops or errors. I checked 2 of the VMs that got paused the last
> time it happened; they do have dropped packets on their interfaces.
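>
> For completeness, this is the kind of check meant on the node side
> (assuming the storage bond is named bond0, which is only a guess):
>
> # cat /proc/net/bonding/bond0             (802.3ad/LACP state of the slaves)
> # ip -s link show bond0                   (RX/TX errors and drops on the bond)
> # ethtool -S eth0 | grep -iE 'err|drop'   (per-NIC counters)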
>
> We don't have a subscription with nexenta (anymore).
>
> On 04/21/2015 04:41 PM, InterNetX - Juergen Gotteswinter wrote:
>> On 21.04.2015 at 16:19, Maikel vd Mosselaar wrote:
>>> Hi Juergen,
>>>
>>> The load on the nodes rises to well over 200 during the event. Load on the
>>> nexenta stays normal and there is nothing strange in its logging.
>> ZFS + NFS could still be the root of this. Is your pool configuration
>> RAIDZx or mirror, with or without a ZIL? Is the sync parameter of the ZFS
>> dataset that gets exported kept at its default, "standard"?
>>
>>
>> http://christopher-technicalmusings.blogspot.de/2010/09/zfs-and-nfs-perfo...
>>
>> Since oVirt reacts very sensitively to storage latency (it throws VMs into
>> an unresponsive or unknown state), it might be worth a try to do "zfs set
>> sync=disabled pool/volume" to see if this changes things. But be aware
>> that this makes the NFS export vulnerable to data loss in case of
>> power loss etc., comparable to async NFS in Linux.
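>>
>> A minimal sketch of that test, sticking with the "pool/volume" placeholder
>> above (substitute the real exported dataset):
>>
>> # zfs get sync pool/volume               (current value)
>> # zfs set sync=disabled pool/volume      (test; data loss risk on power failure)
>> # zfs set sync=standard pool/volume      (revert to the default afterwards)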
>>
>> If disabling the sync setting helps and you don't use a separate ZIL
>> flash drive yet, adding one would very likely help to get rid of this.
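>>
>> (A dedicated log device would be added with something along the lines of
>> "zpool add <pool> log <device>"; the z2pool status further up in this thread
>> already shows an SSD log vdev, so this is only for reference.)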
>>
>> Also, if you run a subscribed version of Nexenta it might be helpful to
>> involve them.
>>
>> Do you see any messages about high latency in the oVirt Events panel?
>>
>>> For the storage interfaces on our nodes we use bonding in mode 4
>>> (802.3ad), 2x 1 Gb. The nexenta has a 4x 1 Gb bond in mode 4 as well.
>> This should be fine, as long as no node uses mode 0 / round-robin, which
>> would lead to out-of-order TCP packets. The interfaces themselves don't
>> show any drops or errors, on the VM hosts as well as on the switch itself?
>>
>> Jumbo Frames?
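>>
>> (If jumbo frames are in use, the MTU has to match on the hosts, the Nexenta
>> and every switch port in between; a quick host-side check is something like
>> "ip link show bond0 | grep mtu", with bond0 again only assumed as the name.)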
>>
>>> Kind regards,
>>>
>>> Maikel
>>>
>>>
>>> On 04/21/2015 02:51 PM, InterNetX - Juergen Gotteswinter wrote:
>>>> Hi,
>>>>
>>>> how about load, latency or strange dmesg messages on the Nexenta? Are you
>>>> using bonded Gbit networking? If yes, which mode?
>>>>
>>>> Cheers,
>>>>
>>>> Juergen
>>>>
>>>> On 20.04.2015 at 14:25, Maikel vd Mosselaar wrote:
>>>>> Hi,
>>>>>
>>>>> We are running oVirt 3.5.1 with 3 nodes and a separate engine.
>>>>>
>>>>> All on CentOS 6.6:
>>>>> 3 x nodes
>>>>> 1 x engine
>>>>>
>>>>> 1 x storage nexenta with NFS
>>>>>
>>>>> For multiple weeks we have been experiencing issues where our nodes cannot
>>>>> access the storage at random moments (at least that's what the nodes
>>>>> think).
>>>>>
>>>>> When the nodes are complaining about unavailable storage, the load
>>>>> rises up to +200 on all three nodes, which makes all running VMs
>>>>> inaccessible. During this the oVirt event viewer shows some i/o
>>>>> storage error messages; when this happens random VMs get paused and
>>>>> will not be resumed anymore (this happens almost every time, but not
>>>>> all the VMs get paused).
>>>>>
>>>>> During the event we tested the accessibility from the nodes to the
>>>>> storage and it looks like it is working normally; at least we can do a
>>>>> normal "ls" on the storage without any delay in showing the contents.
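>>>>>
>>>>> Since a plain "ls" only exercises metadata reads, a small synchronous
>>>>> write from a node would be a closer match to what the VMs see; for
>>>>> example (the mount point path is only a placeholder):
>>>>>
>>>>> # time dd if=/dev/zero of=/path/to/nfs/mount/testfile bs=4k count=1000 oflag=sync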
>>>>>
>>>>> We tried multiple things that we thought might be causing this issue,
>>>>> but nothing has worked so far:
>>>>> * rebooting storage / nodes / engine.
>>>>> * disabling offsite rsync backups.
>>>>> * moving the biggest VMs with the highest load to a different platform
>>>>> outside of oVirt.
>>>>> * checking the wsize and rsize on the NFS mounts; storage and nodes are
>>>>> correct according to the "NFS troubleshooting page" on ovirt.org (see
>>>>> the check sketched below).
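>>>>>
>>>>> For that last check, the effective mount options can be read on a node with:
>>>>>
>>>>> # nfsstat -m
>>>>> # grep nfs /proc/mounts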
>>>>>
>>>>> The environment is running in production so we are not free to test
>>>>> everything.
>>>>>
>>>>> I can provide log files if needed.
>>>>>
>>>>> Kind Regards,
>>>>>
>>>>> Maikel
>>>>>
>>>>>
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users