On Tue, Dec 16, 2014 at 08:48:48AM -0600, Chris Adams wrote:
I have an oVirt setup with three nodes, all running CentOS 7, and a
hosted engine running CentOS 6. Two of the nodes (node8 and node9) are
configured for the hosted engine, and the third (node2) is just a "regular"
node (as you might guess from the names, more nodes are coming as I
migrate VMs to oVirt).
On one node, node8, vdsm periodically loses its connection to libvirt,
which causes vdsm to restart. There's no trigger that I can see (it
doesn't seem related to time of day, load, etc.). The engine VM is
up and running on node8 (I don't know if that has anything to do with it).
I get some entries in /var/log/messages repeated continuously: the
"ovirt-ha-broker: sending ioctl 5401 to a partition" message I mentioned
before, and the following:
Dec 15 20:56:23 node8 journal: User record for user '107' was not found: No such file or directory
Dec 15 20:56:23 node8 journal: Group record for user '107' was not found: No such file or directory
I don't think those have any relevance (I don't know where they come
from; see the quick check after the next excerpt); filtering those out, I see:
Dec 15 20:56:33 node8 journal: End of file while reading data: Input/output error
Dec 15 20:56:33 node8 journal: Tried to close invalid fd 0
Dec 15 20:56:38 node8 journal: vdsm root WARNING connection to libvirt broken. ecode: 1 edom: 7
Dec 15 20:56:38 node8 journal: vdsm root CRITICAL taking calling process down.
Dec 15 20:56:38 node8 journal: vdsm vds ERROR libvirt error
Dec 15 20:56:38 node8 journal: ovirt-ha-broker mgmt_bridge.MgmtBridge ERROR Failed to getVdsCapabilities: Error 16 from getVdsCapabilities: Unexpected exception
Dec 15 20:56:45 node8 journal: End of file while reading data: Input/output error
Dec 15 20:56:45 node8 vdsmd_init_common.sh: vdsm: Running run_final_hooks
Dec 15 20:56:45 node8 systemd: Starting Virtual Desktop Server Manager...
<and then all the normal-looking vdsm startup>
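(On the uid/gid 107 messages above: uid 107 is typically the qemu user on
EL hosts, so one quick sanity check, just a guess at their origin, is
whether that uid actually resolves on the host:

    getent passwd 107
    getent group 107

Empty output would mean the qemu user/group entries are missing, which
might explain the journal noise, though probably not the disconnects.)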
It is happening about once a day, but not at any regular interval or
time (was 02:23 Sunday, then 20:56 Monday).
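To pull all the restart timestamps in one go and look for a pattern,
something like this should work (a sketch, assuming the vdsmd unit name
and that journald still has the history):

    journalctl -u vdsmd | grep 'Starting Virtual Desktop Server Manager'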
vdsm.log has this at that time:
Thread-601576::DEBUG::2014-12-15 20:56:38,715::BindingXMLRPC::1132::vds::(wrapper) client [127.0.0.1]::call getCapabilities with () {}
Thread-601576::DEBUG::2014-12-15 20:56:38,718::utils::738::root::(execCmd) /sbin/ip route show to 0.0.0.0/0 table all (cwd None)
Thread-601576::DEBUG::2014-12-15 20:56:38,746::utils::758::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-601576::WARNING::2014-12-15 20:56:38,754::libvirtconnection::135::root::(wrapper) connection to libvirt broken. ecode: 1 edom: 7
Thread-601576::CRITICAL::2014-12-15 20:56:38,754::libvirtconnection::137::root::(wrapper) taking calling process down.
MainThread::DEBUG::2014-12-15 20:56:38,754::vdsm::58::vds::(sigtermHandler) Received signal 15
Thread-601576::DEBUG::2014-12-15 20:56:38,755::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 1 edom: 7 level: 2 message: internal error: client socket is closed
MainThread::DEBUG::2014-12-15 20:56:38,755::protocoldetector::135::vds.MultiProtocolAcceptor::(stop) Stopping Acceptor
MainThread::INFO::2014-12-15 20:56:38,755::__init__::563::jsonrpc.JsonRpcServer::(stop) Stopping JsonRPC Server
Detector thread::DEBUG::2014-12-15 20:56:38,756::protocoldetector::106::vds.MultiProtocolAcceptor::(_cleanup) Cleaning Acceptor
MainThread::INFO::2014-12-15 20:56:38,757::vmchannels::188::vds::(stop) VM channels listener was stopped.
MainThread::INFO::2014-12-15 20:56:38,758::momIF::91::MOM::(stop) Shutting down MOM
MainThread::DEBUG::2014-12-15 20:56:38,759::task::595::Storage.TaskManager.Task::(_updateState) Task=`26c7680c-23e2-42bb-964c-272e778a168a`::moving from state init -> state preparing
MainThread::INFO::2014-12-15 20:56:38,759::logUtils::44::dispatcher::(wrapper) Run and protect: prepareForShutdown(options=None)
Thread-601576::ERROR::2014-12-15 20:56:38,755::BindingXMLRPC::1142::vds::(wrapper) libvirt error
Traceback (most recent call last):
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 1135, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 463, in getCapabilities
    ret = api.getCapabilities()
  File "/usr/share/vdsm/API.py", line 1245, in getCapabilities
    c = caps.get()
  File "/usr/share/vdsm/caps.py", line 615, in get
    caps.update(netinfo.get())
  File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 812, in get
    nets = networks()
  File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 119, in networks
    allNets = ((net, net.name()) for net in conn.listAllNetworks(0))
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 129, in wrapper
    __connections.get(id(target)).pingLibvirt()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3642, in getLibVersion
    if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
libvirtError: internal error: client socket is closed
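The "ecode: 1 edom: 7" pair in those messages decodes to
VIR_ERR_INTERNAL_ERROR raised from the VIR_FROM_RPC domain, i.e. the RPC
socket to libvirtd went away; you can confirm the mapping against the
bindings on the host (assuming EL7's libvirt-python):

    python -c 'import libvirt; print libvirt.VIR_ERR_INTERNAL_ERROR, libvirt.VIR_FROM_RPC'

That is consistent with libvirtd itself dying or restarting underneath vdsm.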
Does anything suspicious show up in libvirtd.log?
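If libvirtd.log is empty or sparse, libvirtd's own logging can be turned
up first; a minimal sketch, assuming stock EL7 paths (log_level 1 is full
debug and very verbose; 3 = warnings is usually enough to catch a crash):

    # append to /etc/libvirt/libvirtd.conf, then restart the daemon
    cat >> /etc/libvirt/libvirtd.conf <<'EOF'
    log_level = 3
    log_outputs = "3:file:/var/log/libvirt/libvirtd.log"
    EOF
    systemctl restart libvirtd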
I heard a report about a libvirt segmentation fault on CentOS 7 - could
this be it, too? Does the output of `ps -o stime -p $(pgrep libvirtd)`
match the time of the reported libvirt "internal error"?
Would you provide the precise versions of libvirt and vdsm?
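E.g. (package names assumed for a stock CentOS 7 host with the oVirt repos):

    rpm -q vdsm libvirt libvirt-daemon libvirt-daemon-kvm qemu-kvm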
Can you think of anything that makes node8 "special"?
Dan.