<div dir="ltr">Hi All,<div><br></div><div>We're seeing some weird issues in our oVirt setup. We have 4 nodes connected to an NFS (v3) filestore (FreeBSD/ZFS).</div><div><br></div><div>Once in a while, seemingly at random, a node loses its connection to storage and recovers it a minute later. The other nodes usually don't lose their storage at that moment; it's just one or two at a time.</div>
<div><br></div><div>We've set up extra tooling to verify the storage performance at those moments, as well as its availability to other systems. The storage is always online; only the nodes think it isn't.</div><div><br></div>
<div>The engine tells me this:</div><div><br></div><div><div>2014-02-18 11:48:03,598 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-6-thread-48) domain d88764c8-ecc3-4f22-967e-2ce225ac4498:Export in problem. vds: hv5</div>
<div>2014-02-18 11:48:18,909 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-6-thread-48) domain e9f70496-f181-4c9b-9ecb-d7f780772b04:Data in problem. vds: hv5</div><div>2014-02-18 11:48:45,021 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-18) [46683672] Failed to refresh VDS , vds = 66e6aace-e51d-4006-bb2f-d85c2f1fd8d2 : hv5, VDS Network Error, continuing.</div>
<div>2014-02-18 11:48:45,070 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-41) [2ef1a894] Correlation ID: 2ef1a894, Call Stack: null, Custom Event ID: -1, Message: Invalid status on Data Center GS. Setting Data Center status to Non Responsive (On host hv5, Error: Network error during communication with the Host.).</div>
<div><br></div><div>The export and data domains live on NFS. There's another domain, ISO, that lives on the engine machine and is also shared over NFS. That domain doesn't have any issues at all.</div><div><br></div><div>
Attached are the log files for the relevant time period from both the engine server and the node. The node, by the way, is a deployment of the node ISO, not a full-blown installation.</div><div><br></div><div>Any clues on where to begin searching? The NFS server shows no issues, and there is nothing in its logs. I did notice that the statd and lockd daemons were not running, but I wonder whether that has anything to do with the issue.</div>
<div><br></div>-- <br>Met vriendelijke groeten / With kind regards,<br>Johan Kooijman<br><br><a href="mailto:mail@johankooijman.com">mail@johankooijman.com</a>
</div></div>