
See attached. The event seems to have begun around 06:00:00 on 2014-01-26. I was unable to get the single-node cluster back online, so I provisioned another node and added it to the cluster; that node became the SPM. Adding the second node worked, and I had to power cycle the node that hung because sanlock was in a zombie state.

This is my first attempt at production use of NFS over RDMA and I'd like to rule out that being the cause. Since the issue I've changed 'nfs_mount_options' in /etc/vdsm/vdsm.conf to 'soft,nosharecache,rdma,port=20049'; during the crash the options were only 'rdma,port=20049'. I am also forcing NFSv3 by setting 'Nfsvers=3' in /etc/nfsmount.conf, which was in place during the crash and is still in place. A sketch of the current settings is below.
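For reference, here is roughly what the relevant stanzas look like now. This is a sketch from memory rather than a verbatim copy of the files; in particular, I'm assuming nfs_mount_options sits in the [irs] section of vdsm.conf and that the global section of nfsmount.conf is named [ NFSMount_Global_Options ].

    # /etc/vdsm/vdsm.conf (section name assumed to be [irs])
    [irs]
    nfs_mount_options = soft,nosharecache,rdma,port=20049

    # /etc/nfsmount.conf (standard global section per nfsmount.conf(5))
    [ NFSMount_Global_Options ]
    Nfsvers=3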
Thanks - Trey

On Tue, Jan 28, 2014 at 2:45 AM, Maor Lipchuk <mlipchuk@redhat.com> wrote:
Hi Trey,
Can you please also attach the engine/vdsm logs?
Thanks, Maor
On 01/27/2014 06:12 PM, Trey Dockendorf wrote:
I set up my first oVirt instance since 3.0 a few days ago and it went very well, and I left the single-host cluster running with 1 VM over the weekend. Today I came back and the primary data storage is marked as unresponsive. The logs are full of entries [1] that look very similar to those in a knowledge base article on RHEL's website [2].
This setup is using NFS over RDMA, and so far the IB interfaces report no errors (via `ibcheckerrs -v <LID> 1`). Based on a doc on the oVirt site [3], it seems this could be due to response problems. The storage system is a new purchase and not yet in production, so any advice on how to track down the cause would be very helpful. Please let me know what additional information would be useful, as it's been about a year since I've been active in the oVirt community.
Thanks - Trey
[1]: http://pastebin.com/yRpSLKxJ
[2]: https://access.redhat.com/site/solutions/400463
[3]: http://www.ovirt.org/SANLock