ovirt 3.5.2 issues with nodes becoming "Non Operational"

Wednesday, 12 August 2015

Hi,
   I'm new to oVirt and recently built a 10 node ovirt 3.5 DC with shared
storage using gluster configured as distributed-replicated (replication =
2).  Shortly after 7 of the 10 nodes dropped, one at a time over a few
hours, into "Non Operational" state.  Attempting to activate one of these
nodes gives the error:  "Failed to connect Host ovirt-node260 to Storage
Pool LADC-TBX". Attempting to put the node into Maintenance eaves the node
stuck in "Preparing For maintenance".

When I rebooted one of the nodes I see this in the nodes event list:

"Host ovirt-node269 reports about one of the Active Storage Domains as
Problematic."

I see many of these errors in the vdsm log from the failed nodes:

Thread-10000::ERROR::2015-08-12
...
 10:01:17,748::__init__::506::jsonrpc.JsonRpcServer::(_serveRequest)
 Internal server error

 Traceback (most recent call last):

   File "/usr/lib/python2.6/site-packages/yajsonrpc/__init__.py", line 501,
 in _serveRequest

     res = method(**params)

   File "/usr/share/vdsm/rpc/Bridge.py", line 267, in _dynamicMethod

     result = fn(*methodArgs)

   File "/usr/share/vdsm/API.py", line 1330, in getStats

     stats.update(self._cif.mom.getKsmStats())

   File "/usr/share/vdsm/momIF.py", line 60, in getKsmStats

     stats = self._mom.getStatistics()['host']

   File "/usr/lib/python2.6/site-packages/mom/MOMFuncs.py", line 75, in
 getStatistics

     host_stats = self.threads['host_monitor'].interrogate().statistics[-1]

 AttributeError: 'NoneType' object has no attribute 'statistics'
 Any help here is appreciated.

    -- Chris

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011