[Users] Vdsmd is respawning trying to sample NICs

Dan Kenigsberg danken at redhat.com
Mon Jun 25 12:24:33 UTC 2012


On Mon, Jun 25, 2012 at 01:15:08PM +0100, jose garcia wrote:
> On 06/25/2012 12:30 PM, Dan Kenigsberg wrote:
> >On Mon, Jun 25, 2012 at 12:11:37PM +0100, jose garcia wrote:
> >>On 06/25/2012 11:37 AM, Dan Kenigsberg wrote:
> >>>On Mon, Jun 25, 2012 at 10:57:47AM +0100, jose garcia wrote:
> >>>>Good Monday morning,
> >>>>
> >>>>I installed Fedora 17 and tried to add the node to a 3.1 engine.
> >>>>
> >>>>I'm getting a VDS Network exception on the engine side:
> >>>>
> >>>>in /var/log/ovirt-engine/engine:
> >>>>
> >>>>2012-06-25 10:15:34,132 WARN
> >>>>[org.ovirt.engine.core.vdsbroker.VdsManager]
> >>>>(QuartzScheduler_Worker-96)
> >>>>ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds
> >>>>= 2e9929c6-bea6-11e1-bfdd-ff11f39c80eb :
> >>>>ovirt-node2.smb.eurotux.local, VDS Network Error, continuing.
> >>>>VDSNetworkException:
> >>>>2012-06-25 10:15:36,143 ERROR
> >>>>[org.ovirt.engine.core.vdsbroker.VdsManager]
> >>>>(QuartzScheduler_Worker-20) VDS::handleNetworkException Server
> >>>>failed to respond,  vds_id = 2e9929c6-bea6-11e1-bfdd-ff11f39c80eb,
> >>>>vds_name = ovirt-node2.smb.eurotux.local, error =
> >>>>VDSNetworkException:
> >>>>2012-06-25 10:15:36,181 INFO
> >>>>[org.ovirt.engine.core.bll.VdsEventListener] (pool-3-thread-49)
> >>>>ResourceManager::vdsNotResponding entered for Host
> >>>>2e9929c6-bea6-11e1-bfdd-ff11f39c80eb, 10.10.30.177
> >>>>2012-06-25 10:15:36,214 ERROR
> >>>>[org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand]
> >>>>(pool-3-thread-49) [1afd4b89] Failed to run Fence script on
> >>>>vds:ovirt-node2.smb.eurotux.local, VMs moved to UnKnown instead.
> >>>>
> >>>>Meanwhile, on the node, vdsmd fails to sample NICs:
> >>>>
> >>>>in /var/log/vdsm/vdsm.log:
> >>>>
> >>>>    nf = netinfo.NetInfo()
> >>>>   File "/usr/share/vdsm/netinfo.py", line 268, in __init__
> >>>>     _netinfo = get()
> >>>>   File "/usr/share/vdsm/netinfo.py", line 220, in get
> >>>>     for nic in nics() ])
> >>>>KeyError: 'p36p1'
> >>>>
> >>>>MainThread::INFO::2012-06-25 10:45:09,110::vdsm::76::vds::(run) VDSM
> >>>>main thread ended. Waiting for 1 other threads...
> >>>>MainThread::INFO::2012-06-25 10:45:09,111::vdsm::79::vds::(run)
> >>>><_MainThread(MainThread, started 140567823243072)>
> >>>>MainThread::INFO::2012-06-25 10:45:09,111::vdsm::79::vds::(run)
> >>>><Thread(libvirtEventLoop, started daemon 140567752681216)>
> >>>>
> >>>>in /var/log/messages there are many "died too quickly" entries:
> >>>>
> >>>>Jun 25 10:45:08 ovirt-node2 respawn: slave '/usr/share/vdsm/vdsm'
> >>>>died too quickly, respawning slave
> >>>>Jun 25 10:45:08 ovirt-node2 respawn: slave '/usr/share/vdsm/vdsm'
> >>>>died too quickly, respawning slave
> >>>>Jun 25 10:45:09 ovirt-node2 respawn: slave '/usr/share/vdsm/vdsm'
> >>>>died too quickly for more than 30 seconds, master sleeping for 900
> >>>>seconds
> >>>>
> >>>>I don't know why Fedora 17 renamed what was eth0 in Fedora 16 to
> >>>>p36p1. I tried to configure an ovirtmgmt bridge, but the only
> >>>>difference is that the KeyError becomes 'ovirtmgmt'.
> >>>The nic renaming may have happened due to biosdevname. Do you have it
> >>>installed? Does any of the /etc/sysconfig/network-scripts/ifcfg-* refer
> >>>to an old nic name?
> >>>
> >>>Which version of vdsm are you running? It seems that it is
> >>>pre-v4.9.4-61-g24f8627, which is too old to run on f17: the output of
> >>>ifconfig has changed. Please retry with the latest beta version
> >>>https://koji.fedoraproject.org/koji/buildinfo?buildID=327015
> >>>
> >>>If the problem persists, could you run vdsm manually, with
> >>># su - vdsm -s /bin/bash
> >>># cd /usr/share/vdsm
> >>># ./vdsm
> >>>maybe it would give a hint about the crash.
> >>>
> >>>regards,
> >>>Dan.
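The ifcfg-* check suggested above can be scripted. What follows is a minimal Python 3 sketch, assuming the Fedora initscripts layout (/etc/sysconfig/network-scripts/ifcfg-*) and that the kernel lists existing interfaces under /sys/class/net. The helpers ifcfg_device and stale_ifcfg_devices are hypothetical, not part of vdsm. They report every ifcfg file whose DEVICE= value names an interface that does not exist, such as an eth0 left behind after a rename to p36p1.

```python
# Hypothetical helpers (not part of vdsm): find ifcfg-* files that refer to
# NIC names the kernel does not currently have, e.g. after a biosdevname rename.
# Assumes the Fedora initscripts layout and /sys/class/net.
import glob
import os
import re

# Matches DEVICE=eth0, DEVICE="eth0" or DEVICE='eth0'; commented lines do not match.
DEVICE_RE = re.compile(r"""^\s*DEVICE\s*=\s*["']?([^"'\s#]+)""")


def ifcfg_device(path):
    """Return the DEVICE= value of one ifcfg file, or None if it has none."""
    with open(path) as f:
        for line in f:
            m = DEVICE_RE.match(line)
            if m:
                return m.group(1)
    return None


def stale_ifcfg_devices(scripts_dir="/etc/sysconfig/network-scripts",
                        sys_net_dir="/sys/class/net"):
    """Return sorted (file name, device) pairs whose device is not present."""
    present = set(os.listdir(sys_net_dir))
    stale = []
    for path in sorted(glob.glob(os.path.join(scripts_dir, "ifcfg-*"))):
        dev = ifcfg_device(path)
        # Files without a DEVICE= line are skipped rather than guessed at.
        if dev is not None and dev not in present:
            stale.append((os.path.basename(path), dev))
    return stale
```

On the node, print(stale_ifcfg_devices()) lists the candidates. A bridge such as ovirtmgmt only shows up under /sys/class/net once it has been created, so a hit on it is a hint to investigate rather than proof of a rename.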
> >>Well, thank you. I have updated Vdsm to version 4.10. Now the
> >>problem is with SSL and XMLRPC.
> >>
> >>This is the error on the engine side:
> >>
> >>/var/log/ovirt-engine/engine.log
> >>
> >>ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand]
> >>(QuartzScheduler_Worker-52) XML RPC error in command
> >>GetCapabilitiesVDS ( Vds: ovirt-node2.smb.eurotux.local ), the error
> >>was: java.util.concurrent.ExecutionException:
> >>java.lang.reflect.InvocationTargetException,
> >>NoHttpResponseException: The server 10.10.30.177 failed to respond.
> >>
> >>On the node side, there seems to be an authentication problem:
> >>
> >>/var/log/vdsm/vdsm.log
> >>
> >>SSLError: [Errno 1] _ssl.c:504: error:1407609C:SSL
> >>routines:SSL23_GET_CLIENT_HELLO:http request
> >>Thread-810::ERROR::2012-06-25
> >>12:02:46,351::SecureXMLRPCServer::73::root::(handle_error) client
> >>('10.10.30.101', 58605)
> >>Traceback (most recent call last):
> >>   File "/usr/lib64/python2.7/SocketServer.py", line 582, in
> >>process_request_thread
> >>     self.finish_request(request, client_address)
> >>   File
> >>"/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line
> >>66, in finish_request
> >>     request.do_handshake()
> >>   File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
> >>     self._sslobj.do_handshake()
> >>SSLError: [Errno 1] _ssl.c:504: error:1407609C:SSL
> >>routines:SSL23_GET_CLIENT_HELLO:http request
> >>
> >>In /var/log/messages there is an entry:
> >>
> >>vdsm [5834]: vdsm root ERROR client ()
> >>
> >>with the IP address of the engine.
> >Hmm... Do you have ssl=true in your /etc/vdsm/vdsm.conf?
> >Does vdsm respond locally to
> >
> >     vdsClient -s 0 getVdsCaps
> >
> >(Maybe your local certificates and key were corrupted, and you will have
> >to re-install the host from Engine in order to create a new set.)
> >
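Whether vdsm is configured to serve TLS can be read from its config file. The SSL23_GET_CLIENT_HELLO:http request error quoted above means the node's TLS socket received a plain HTTP request, so engine and host disagree about SSL. This Python 3 sketch assumes that the option sits in the [vars] section of /etc/vdsm/vdsm.conf and that vdsm treats a missing or commented-out ssl line as true; both assumptions should be checked against the installed vdsm version. vdsm_ssl_enabled is a hypothetical helper.

```python
# Hypothetical helper: report the effective 'ssl' setting in vdsm.conf.
# Assumptions: the option lives in [vars], and vdsm's built-in default is true.
import configparser

ASSUMED_DEFAULT = True  # assumption: value vdsm uses when 'ssl' is not set


def vdsm_ssl_enabled(path="/etc/vdsm/vdsm.conf"):
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is silently ignored
    # Full-line '#' comments are skipped, so "# ssl = true" counts as unset.
    if not parser.has_option("vars", "ssl"):
        return ASSUMED_DEFAULT
    return parser.getboolean("vars", "ssl")
```

If the default really is true, commenting out ssl=true would not switch SSL off on the node; only an explicit ssl = false would.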
> 
> I have recreated the db and run engine-setup again. I have tried
> with ssl=true both commented and uncommented on the node.
> vdsClient -s 0 getVdsCaps works locally and returns the host's
> information, but something seems to be preventing it from reaching
> the engine. I am still getting the same error. The installer is not
> beginning. I can ssh as root to the host, and vdsmd is alive.

What do you mean by "The installer is not beginning"?
Could you review the certificates in your /etc/pki/vdsm/certs/ and check
that they were generated by *your* engine? Is the cacert the same as the one on the
Engine machine?
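One way to answer the cacert question is to fingerprint both files and compare. This Python 3 sketch hashes the DER bytes of the first PEM certificate in each file with SHA-256. The default paths are assumptions: /etc/pki/vdsm/certs/cacert.pem on the host and /etc/pki/ovirt-engine/ca.pem on the engine. Copy one file to the other machine first, or pass both paths explicitly. cert_fingerprint and same_ca are hypothetical helpers.

```python
# Hypothetical helpers: compare the host's CA certificate with the engine's.
# The default paths below are assumptions, not confirmed by the thread.
import base64
import hashlib
import re

HOST_CACERT = "/etc/pki/vdsm/certs/cacert.pem"  # assumed host-side path
ENGINE_CA = "/etc/pki/ovirt-engine/ca.pem"       # assumed engine-side path

PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\s*(.+?)\s*-----END CERTIFICATE-----",
    re.DOTALL)


def cert_fingerprint(path):
    """SHA-256 hex digest of the DER bytes of the first certificate in a PEM file."""
    with open(path) as f:
        m = PEM_RE.search(f.read())
    if m is None:
        raise ValueError("no PEM certificate found in %s" % path)
    # Drop line breaks inside the base64 body before decoding it to DER.
    der = base64.b64decode("".join(m.group(1).split()))
    return hashlib.sha256(der).hexdigest()


def same_ca(host_cacert=HOST_CACERT, engine_ca=ENGINE_CA):
    """True if both files contain the same first certificate."""
    return cert_fingerprint(host_cacert) == cert_fingerprint(engine_ca)
```

Running openssl x509 -noout -fingerprint -sha256 -in FILE on each machine gives a comparable answer without copying files. A mismatch would mean the host's certificates were not issued by this engine, and re-installing the host from Engine would be the way to regenerate them.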

Dan.



More information about the Users mailing list