[ovirt-users] vdsm losing connection to libvirt

Dan Kenigsberg danken at redhat.com
Tue Dec 16 17:29:36 EST 2014


On Tue, Dec 16, 2014 at 08:48:48AM -0600, Chris Adams wrote:
> I have a oVirt setup that has three nodes, all running CentOS 7, with a
> hosted engine running CentOS 6.  Two of the nodes (node8 and node9) are
> configured for hosted engine, and the third (node2) is just a "regular"
> node (as you might guess from the names, more nodes are coming as I
> migrate VMs to oVirt).
> 
> On one node, node8, vdsm periodically loses its connection to libvirt,
> which causes vdsm to restart.  There doesn't appear to be any trigger
> that I can see (not time of day, load, etc. related).  The engine VM is
> up and running on node8 (don't know if that has anything to do with it).
> 
> I get some entries in /var/log/messages repeated continuously; the
> "ovirt-ha-broker: sending ioctl 5401 to a partition" I mentioned before,
> and the following:
> 
> Dec 15 20:56:23 node8 journal: User record for user '107' was not found: No such file or directory
> Dec 15 20:56:23 node8 journal: Group record for user '107' was not found: No such file or directory
> 
> I don't think those have any relevance (don't know where they come
> from); filtering those out, I see:
> 
> Dec 15 20:56:33 node8 journal: End of file while reading data: Input/output error
> Dec 15 20:56:33 node8 journal: Tried to close invalid fd 0
> Dec 15 20:56:38 node8 journal: vdsm root WARNING connection to libvirt broken. ecode: 1 edom: 7
> Dec 15 20:56:38 node8 journal: vdsm root CRITICAL taking calling process down.
> Dec 15 20:56:38 node8 journal: vdsm vds ERROR libvirt error
> Dec 15 20:56:38 node8 journal: ovirt-ha-broker mgmt_bridge.MgmtBridge ERROR Failed to getVdsCapabilities: Error 16 from getVdsCapabilities: Unexpected exception
> Dec 15 20:56:45 node8 journal: End of file while reading data: Input/output error
> Dec 15 20:56:45 node8 vdsmd_init_common.sh: vdsm: Running run_final_hooks
> Dec 15 20:56:45 node8 systemd: Starting Virtual Desktop Server Manager...
> <and then all the normal-looking vdsm startup>
> 
> It is happening about once a day, but not at any regular interval or
> time (was 02:23 Sunday, then 20:56 Monday).
> 
> vdsm.log has this at that time:
> 
> Thread-601576::DEBUG::2014-12-15 20:56:38,715::BindingXMLRPC::1132::vds::(wrapper) client [127.0.0.1]::call getCapabilities with () {}
> Thread-601576::DEBUG::2014-12-15 20:56:38,718::utils::738::root::(execCmd) /sbin/ip route show to 0.0.0.0/0 table all (cwd None)
> Thread-601576::DEBUG::2014-12-15 20:56:38,746::utils::758::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
> Thread-601576::WARNING::2014-12-15 20:56:38,754::libvirtconnection::135::root::(wrapper) connection to libvirt broken. ecode: 1 edom: 7
> Thread-601576::CRITICAL::2014-12-15 20:56:38,754::libvirtconnection::137::root::(wrapper) taking calling process down.
> MainThread::DEBUG::2014-12-15 20:56:38,754::vdsm::58::vds::(sigtermHandler) Received signal 15
> Thread-601576::DEBUG::2014-12-15 20:56:38,755::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 1 edom: 7 level: 2 message: internal error: client socket is closed
> MainThread::DEBUG::2014-12-15 20:56:38,755::protocoldetector::135::vds.MultiProtocolAcceptor::(stop) Stopping Acceptor
> MainThread::INFO::2014-12-15 20:56:38,755::__init__::563::jsonrpc.JsonRpcServer::(stop) Stopping JsonRPC Server
> Detector thread::DEBUG::2014-12-15 20:56:38,756::protocoldetector::106::vds.MultiProtocolAcceptor::(_cleanup) Cleaning Acceptor
> MainThread::INFO::2014-12-15 20:56:38,757::vmchannels::188::vds::(stop) VM channels listener was stopped.
> MainThread::INFO::2014-12-15 20:56:38,758::momIF::91::MOM::(stop) Shutting down MOM
> MainThread::DEBUG::2014-12-15 20:56:38,759::task::595::Storage.TaskManager.Task::(_updateState) Task=`26c7680c-23e2-42bb-964c-272e778a168a`::moving from state init -> state preparing
> MainThread::INFO::2014-12-15 20:56:38,759::logUtils::44::dispatcher::(wrapper) Run and protect: prepareForShutdown(options=None)
> Thread-601576::ERROR::2014-12-15 20:56:38,755::BindingXMLRPC::1142::vds::(wrapper) libvirt error
> Traceback (most recent call last):
>   File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 1135, in wrapper
>     res = f(*args, **kwargs)
>   File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 463, in getCapabilities
>     ret = api.getCapabilities()
>   File "/usr/share/vdsm/API.py", line 1245, in getCapabilities
>     c = caps.get()
>   File "/usr/share/vdsm/caps.py", line 615, in get
>     caps.update(netinfo.get())
>   File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 812, in get
>     nets = networks()
>   File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 119, in networks
>     allNets = ((net, net.name()) for net in conn.listAllNetworks(0))
>   File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 129, in wrapper
>     __connections.get(id(target)).pingLibvirt()
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3642, in getLibVersion
>     if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
> libvirtError: internal error: client socket is closed

Does anything suspecious show up on libvirtd.log?

I heard a report about a libvirt segmentation fault on centos7 - could
this be it, too? Does `ps -o stime `pgrep libvirtd`` match with reported
libvirt "internal error"?

Would you provide the precise version of libvirt and vdsm?

Can you think of anything that makes node8 "special"?

Dan.


More information about the Users mailing list