[ovirt-users] storage issue's with oVirt 3.5.1 + Nexenta NFS

Maikel vd Mosselaar m.vandemosselaar at smoose.nl
Wed Apr 22 09:53:30 UTC 2015


Our current NFS settings:

listen_backlog=64
protocol=ALL
servers=1024
lockd_listen_backlog=64
lockd_servers=1024
lockd_retransmit_timeout=5
grace_period=90
server_versmin=2
server_versmax=4
client_versmin=2
client_versmax=4
server_delegation=on
nfsmapid_domain=
max_connections=-1




On 04/22/2015 11:32 AM, InterNetX - Juergen Gotteswinter wrote:
> On 22.04.2015 at 11:12, Maikel vd Mosselaar wrote:
>> Our pool is configured as Z1 with a ZIL (a normal SSD); the sync parameter
>> is at the default setting (standard), so "sync" is on.
> For testing, I would give "zfs set sync=disabled pool/vol" a shot. But as
> I already said, that's nothing you should keep in production.
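>
> For example ("pool/vol" here is just a placeholder for your exported dataset):
>
> zfs get sync pool/vol            # check the current value (default: standard)
> zfs set sync=disabled pool/vol   # test only, do not keep this in production
> zfs set sync=standard pool/vol   # revert after testing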
>
> What I have also seen in the past: the filer saturated the maximum lockd/nfs
> processes (which are quite low in their default settings; don't worry about
> pushing the nfs threads up to 512+, same goes for lockd).
>
> To get your current values:
>
> sharectl get nfs
>
> For example, one of my filers, which is hammered pretty heavily over NFS most
> of the time, uses these settings:
>
> servers=1024
> lockd_listen_backlog=32
> lockd_servers=1024
> lockd_retransmit_timeout=5
> grace_period=90
> server_versmin=2
> server_versmax=3
> client_versmin=2
> client_versmax=4
> server_delegation=on
> nfsmapid_domain=
> max_connections=-1
> protocol=ALL
> listen_backlog=32
> device=
> mountd_listen_backlog=64
> mountd_max_threads=16
>
>
>
> To change them, use sharectl or put the settings into /etc/system.
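>
> With sharectl that would look something like this (the property names and
> values are just the ones from the list above):
>
> sharectl set -p servers=1024 nfs
> sharectl set -p lockd_servers=1024 nfs
> sharectl set -p lockd_listen_backlog=32 nfs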
>
>
> set rpcmod:clnt_max_conns = 8
> set rpcmod:maxdupreqs=8192
> set rpcmod:cotsmaxdupreqs=8192
>
>
> set nfs:nfs3_max_threads=1024
> set nfs:nfs3_nra=128
> set nfs:nfs3_bsize=1048576
> set nfs:nfs3_max_transfer_size=1048576
>
> -> reboot
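>
> After the reboot you can verify that the values were picked up, e.g. with mdb
> (assuming the nfs module is loaded so the symbols resolve):
>
> echo "nfs3_max_threads/D" | mdb -k
> echo "nfs3_bsize/D" | mdb -k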
>
>> When the issue happens, the oVirt event viewer does indeed show latency
>> warnings. Not always, but most of the time this is followed by an i/o storage
>> error linked to random VMs, which are then paused.
>>
>> All the nodes use mode 4 bonding. The interfaces on the nodes don't show any
>> drops or errors. I checked 2 of the VMs that got paused the last time it
>> happened; they do have dropped packets on their interfaces.
>>
>> We don't have a subscription with Nexenta (anymore).
>>
>> On 04/21/2015 04:41 PM, InterNetX - Juergen Gotteswinter wrote:
>>> On 21.04.2015 at 16:19, Maikel vd Mosselaar wrote:
>>>> Hi Juergen,
>>>>
>>>> The load on the nodes rises to well over 200 during the event. Load on the
>>>> Nexenta stays normal and there is nothing strange in the logging.
>>> ZFS + NFS could still be the root of this. Is your pool configuration
>>> raidzX or mirror, with or without a ZIL? Is the sync parameter of the ZFS
>>> subvolume that gets exported kept at its default of "standard"?
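>>>
>>> (For example, "zpool status" and "zfs get sync pool/volume" will show both,
>>> "pool/volume" being whatever dataset you export.)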
>>>
>>> http://christopher-technicalmusings.blogspot.de/2010/09/zfs-and-nfs-performance-with-zil.html
>>>
>>>
>>> Since oVirt is very sensitive about storage latency (it throws VMs into an
>>> unresponsive or unknown state), it might be worth a try to do "zfs set
>>> sync=disabled pool/volume" to see if this changes things. But be aware that
>>> this makes the NFS export vulnerable to data loss in case of power loss
>>> etc., comparable to async NFS on Linux.
>>>
>>> If disabling the sync setting helps, and you don't use a separate ZIL flash
>>> drive yet, adding one would very likely help to get rid of this.
>>>
>>> Also, if you run a subscribed version of Nexenta, it might be helpful to
>>> involve them.
>>>
>>> Do you see any messages about high latency in the oVirt events panel?
>>>
>>>> For the storage interfaces on our nodes we use bonding in mode 4 (802.3ad),
>>>> 2x 1Gb. The Nexenta has a 4x 1Gb bond in mode 4 as well.
>>> This should be fine, as long as no node uses mode 0 / round robin, which
>>> would lead to out-of-order TCP packets. The interfaces themselves don't show
>>> any drops or errors - on the VM hosts as well as on the switch itself?
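>>>
>>> (On the hosts you can check that with something like "cat
>>> /proc/net/bonding/bond0" for the 802.3ad state and "ip -s link show bond0"
>>> for the error/drop counters, assuming the bond device is named bond0.)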
>>>
>>> Jumbo Frames?
>>>
>>>> Kind regards,
>>>>
>>>> Maikel
>>>>
>>>>
>>>> On 04/21/2015 02:51 PM, InterNetX - Juergen Gotteswinter wrote:
>>>>> Hi,
>>>>>
>>>>> How about load, latency, or strange dmesg messages on the Nexenta? Are you
>>>>> using bonded Gbit networking? If yes, which mode?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Juergen
>>>>>
>>>>> On 20.04.2015 at 14:25, Maikel vd Mosselaar wrote:
>>>>>> Hi,
>>>>>>
>>>>>> We are running oVirt 3.5.1 with 3 nodes and a separate engine.
>>>>>>
>>>>>> All on CentOS 6.6:
>>>>>> 3 x nodes
>>>>>> 1 x engine
>>>>>>
>>>>>> 1 x storage nexenta with NFS
>>>>>>
>>>>>> For multiple weeks we have been experiencing issues where our nodes cannot
>>>>>> access the storage at random moments (at least that's what the nodes
>>>>>> think).
>>>>>>
>>>>>> When the nodes complain about unavailable storage, the load rises to 200+
>>>>>> on all three nodes, which makes all running VMs inaccessible. During this
>>>>>> the oVirt event viewer shows some i/o storage error messages; when that
>>>>>> happens, random VMs get paused and will not be resumed anymore (this
>>>>>> happens almost every time, but not all the VMs get paused).
>>>>>>
>>>>>> During the event we tested the accessibility of the storage from the nodes
>>>>>> and it looks like it is working normally; at least we can do a normal "ls"
>>>>>> on the storage without any delay in showing the contents.
>>>>>>
>>>>>> We tried multiple things that we thought might be causing this issue, but
>>>>>> nothing has worked so far:
>>>>>> * rebooting storage / nodes / engine.
>>>>>> * disabling offsite rsync backups.
>>>>>> * moving the biggest VMs with the highest load to a different platform
>>>>>>   outside of oVirt.
>>>>>> * checking the wsize and rsize on the NFS mounts; storage and nodes are
>>>>>>   correct according to the "NFS troubleshooting page" on ovirt.org.
>>>>>>
>>>>>> The environment is running in production so we are not free to test
>>>>>> everything.
>>>>>>
>>>>>> I can provide log files if needed.
>>>>>>
>>>>>> Kind Regards,
>>>>>>
>>>>>> Maikel
>>>>>>
>>>>>>
