[ovirt-users] storage issue's with oVirt 3.5.1 + Nexenta NFS

InterNetX - Juergen Gotteswinter jg at internetx.com
Tue Apr 21 14:41:46 UTC 2015


Am 21.04.2015 um 16:19 schrieb Maikel vd Mosselaar:
> Hi Juergen,
> 
> The load on the nodes rises to well over 200 during the event. The load
> on the Nexenta stays normal and there is nothing strange in the logs.

ZFS + NFS could still be the root cause of this. Is your pool configuration
RaidzX or mirror, with or without a ZIL? Is the sync parameter of the
exported ZFS subvolume kept at its default of "standard"?

http://christopher-technicalmusings.blogspot.de/2010/09/zfs-and-nfs-performance-with-zil.html
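
To check this, something along these lines should work on the Nexenta
(pool and dataset names are just placeholders here):

  zpool status tank             # vdev layout and any log (ZIL/SLOG) devices
  zfs get sync tank/vmstore     # whether sync is standard, always or disabled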

Since oVirt is very sensitive to storage latency (it throws VMs into an
unresponsive or unknown state), it might be worth trying "zfs set
sync=disabled pool/volume" to see if this changes anything. But be aware
that this makes the NFS export vulnerable to data loss in case of a power
failure etc., comparable to async NFS on Linux.
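
Roughly like this, assuming "tank/vmstore" stands in for your exported
dataset; the change takes effect immediately and can be reverted the same
way:

  zfs get sync tank/vmstore            # note the current value first
  zfs set sync=disabled tank/vmstore   # test setting
  zfs set sync=standard tank/vmstore   # revert once you have your answer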

If disabling the sync setting helps and you don't use a separate ZIL
flash drive yet, adding one would very likely help to get rid of this.
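
Just as a sketch, adding a mirrored SLOG on the Nexenta would look roughly
like this (the device names are made up, use your own SSDs):

  zpool add tank log mirror c1t4d0 c1t5d0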

Also, if you run a subscribed version of Nexenta, it might be helpful to
involve their support.

Do you see any messages about high latency in the oVirt events panel?

> 
> For the storage interfaces on our nodes we use bonding in mode 4
> (802.3ad), 2x 1 Gb. The Nexenta has a 4x 1 Gb bond in mode 4 as well.

This should be fine, as long as no node uses mode 0 / round-robin, which
would lead to out-of-order TCP packets. The interfaces themselves don't
show any drops or errors, neither on the VM hosts nor on the switch?
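
On the CentOS 6 hosts you can check the bond state and error counters for
example like this (interface names are just examples):

  cat /proc/net/bonding/bond0     # aggregator state, link failures per slave
  ip -s link show bond0           # RX/TX errors and drops on the bond
  ethtool -S eth0 | grep -i err   # per-NIC error counters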

Are you using jumbo frames, and if so, are they enabled end to end (hosts,
switch, Nexenta)?
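
A quick way to verify that jumbo frames really work end to end, assuming
an MTU of 9000 and with <nexenta-ip> standing in for your storage address:

  ip link show bond0 | grep mtu    # MTU actually configured on the bond
  ping -M do -s 8972 <nexenta-ip>  # 9000 minus 28 bytes IP/ICMP headers, don't fragment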

> 
> Kind regards,
> 
> Maikel
> 
> 
> On 04/21/2015 02:51 PM, InterNetX - Juergen Gotteswinter wrote:
>> Hi,
>>
>> how about load, latency, or strange dmesg messages on the Nexenta? Are
>> you using bonded GBit networking? If yes, which mode?
>>
>> Cheers,
>>
>> Juergen
>>
>> Am 20.04.2015 um 14:25 schrieb Maikel vd Mosselaar:
>>> Hi,
>>>
>>> We are running oVirt 3.5.1 with 3 nodes and a separate engine.
>>>
>>> All on CentOS 6.6:
>>> 3 x nodes
>>> 1 x engine
>>>
>>> 1 x storage nexenta with NFS
>>>
>>> For multiple weeks we have been experiencing issues where our nodes
>>> cannot access the storage at random moments (at least that's what the
>>> nodes think).
>>>
>>> When the nodes complain about unavailable storage, the load rises to
>>> over 200 on all three nodes, which makes all running VMs inaccessible.
>>> During this the oVirt event viewer shows some storage I/O error
>>> messages; when this happens, random VMs get paused and are not resumed
>>> anymore (this happens almost every time, but not all of the VMs get
>>> paused).
>>>
>>> During the event we tested the accessibility of the storage from the
>>> nodes and it seems to work normally; at least we can do a normal "ls"
>>> on the storage without any delay in showing the contents.
>>>
>>> We tried multiple things that we thought might be causing this issue,
>>> but nothing has worked so far:
>>> * rebooting storage / nodes / engine.
>>> * disabling offsite rsync backups.
>>> * moving the biggest VMs with the highest load to a different platform
>>> outside of oVirt.
>>> * checking the wsize and rsize on the NFS mounts; storage and nodes are
>>> correct according to the "NFS troubleshooting page" on ovirt.org.
>>>
>>> The environment is running in production, so we are not free to test
>>> everything.
>>>
>>> I can provide log files if needed.
>>>
>>> Kind Regards,
>>>
>>> Maikel
>>>
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
> 



