[Users] Vdsmd is respawning trying to sample NICs

jose garcia johnny.cummings at gmail.com
Mon Jun 25 11:11:37 UTC 2012


On 06/25/2012 11:37 AM, Dan Kenigsberg wrote:
> On Mon, Jun 25, 2012 at 10:57:47AM +0100, jose garcia wrote:
>> Good Monday morning,
>>
>> I installed Fedora 17 and tried to install the node on a 3.1 engine.
>>
>> I'm getting a VDS network exception on the engine side:
>>
>> in /var/log/ovirt-engine/engine.log:
>>
>> 2012-06-25 10:15:34,132 WARN
>> [org.ovirt.engine.core.vdsbroker.VdsManager]
>> (QuartzScheduler_Worker-96)
>> ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds
>> = 2e9929c6-bea6-11e1-bfdd-ff11f39c80eb :
>> ovirt-node2.smb.eurotux.local, VDS Network Error, continuing.
>> VDSNetworkException:
>> 2012-06-25 10:15:36,143 ERROR
>> [org.ovirt.engine.core.vdsbroker.VdsManager]
>> (QuartzScheduler_Worker-20) VDS::handleNetworkException Server
>> failed to respond,  vds_id = 2e9929c6-bea6-11e1-bfdd-ff11f39c80eb,
>> vds_name = ovirt-node2.smb.eurotux.local, error =
>> VDSNetworkException:
>> 2012-06-25 10:15:36,181 INFO
>> [org.ovirt.engine.core.bll.VdsEventListener] (pool-3-thread-49)
>> ResourceManager::vdsNotResponding entered for Host
>> 2e9929c6-bea6-11e1-bfdd-ff11f39c80eb, 10.10.30.177
>> 2012-06-25 10:15:36,214 ERROR
>> [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand]
>> (pool-3-thread-49) [1afd4b89] Failed to run Fence script on
>> vds:ovirt-node2.smb.eurotux.local, VMs moved to UnKnown instead.
>>
>> Meanwhile, on the node, vdsmd fails to sample the NICs:
>>
>> in /var/log/vdsm/vdsm.log:
>>
>>     nf = netinfo.NetInfo()
>>    File "/usr/share/vdsm/netinfo.py", line 268, in __init__
>>      _netinfo = get()
>>    File "/usr/share/vdsm/netinfo.py", line 220, in get
>>      for nic in nics() ])
>> KeyError: 'p36p1'
>>
>> MainThread::INFO::2012-06-25 10:45:09,110::vdsm::76::vds::(run) VDSM
>> main thread ended. Waiting for 1 other threads...
>> MainThread::INFO::2012-06-25 10:45:09,111::vdsm::79::vds::(run)
>> <_MainThread(MainThread, started 140567823243072)>
>> MainThread::INFO::2012-06-25 10:45:09,111::vdsm::79::vds::(run)
>> <Thread(libvirtEventLoop, started daemon 140567752681216)>
>>
>> in /var/log/messages there are many "died too quickly" entries for vdsm:
>>
>> Jun 25 10:45:08 ovirt-node2 respawn: slave '/usr/share/vdsm/vdsm'
>> died too quickly, respawning slave
>> Jun 25 10:45:08 ovirt-node2 respawn: slave '/usr/share/vdsm/vdsm'
>> died too quickly, respawning slave
>> Jun 25 10:45:09 ovirt-node2 respawn: slave '/usr/share/vdsm/vdsm'
>> died too quickly for more than 30 seconds, master sleeping for 900
>> seconds
>>
>> I don't know why Fedora 17 gives the name p36p1 to what was eth0 in
>> Fedora 16, but I tried to configure an ovirtmgmt bridge, and the only
>> difference is that the KeyError becomes 'ovirtmgmt'.
> The nic renaming may have happened due to biosdevname. Do you have it
> installed? Does any of the /etc/sysconfig/network-scripts/ifcfg-* refer
> to an old nic name?
>
> Which version of vdsm are you running? It seems that it is
> pre-v4.9.4-61-g24f8627 which is too old for f17 to run - the output of
> ifconfig has changed. Please retry with latest beta version
> https://koji.fedoraproject.org/koji/buildinfo?buildID=327015
>
> If the problem persists, could you run vdsm manually, with
> # su - vdsm -s /bin/bash
> # cd /usr/share/vdsm
> # ./vdsm
> maybe it would give a hint about the crash.
>
> regards,
> Dan.
Thank you. I have updated vdsm to version 4.10. Now the problem is with
SSL and XML-RPC.

This is the error on the engine side:

/var/log/ovirt-engine/engine.log

ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-52) XML RPC error in command GetCapabilitiesVDS ( Vds: ovirt-node2.smb.eurotux.local ), the error was: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException, NoHttpResponseException: The server 10.10.30.177 failed to respond.

On the node side, there seems to be an authentication problem:

/var/log/vdsm/vdsm.log

SSLError: [Errno 1] _ssl.c:504: error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request
Thread-810::ERROR::2012-06-25 12:02:46,351::SecureXMLRPCServer::73::root::(handle_error) client ('10.10.30.101', 58605)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 582, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
    request.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:504: error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request
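
A note on reading this trace: the "http request" reason at the end of the OpenSSL error usually means that the client sent a plain-text HTTP request to a port where the server expected an SSL/TLS handshake. That may point to an SSL on/off mismatch between the engine and vdsm rather than to a certificate problem, though the log alone does not prove it. Below is a minimal Python sketch of the distinction the server is making. The names classify_first_bytes and HTTP_METHODS and the /RPC2 path are hypothetical and chosen for illustration; they are not taken from vdsm or the engine.

```python
# Illustrative only: these names are hypothetical, not vdsm or engine code.
HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ", b"OPTIONS ")


def classify_first_bytes(data):
    """Guess which protocol a client speaks from the first bytes it sends."""
    # A TLS record carrying a handshake starts with content type 0x16
    # followed by protocol major version 0x03.
    if len(data) >= 2 and data[0:1] == b"\x16" and data[1:2] == b"\x03":
        return "tls"
    # A plain-text HTTP request starts with an ASCII method name.
    if data.startswith(HTTP_METHODS):
        return "http"
    # Anything else (including SSLv2-style hellos) is not classified here.
    return "unknown"


if __name__ == "__main__":
    # XML-RPC calls are HTTP POSTs, so a client without SSL starts like this.
    # The /RPC2 path is an assumption used only as sample input.
    print(classify_first_bytes(b"POST /RPC2 HTTP/1.1\r\n"))
    # First bytes of a typical TLS ClientHello record.
    print(classify_first_bytes(b"\x16\x03\x01\x00\xa5\x01"))
```

If the engine's first bytes would classify as "http" while vdsm has SSL enabled, a handshake failure like the one above is the expected result, so it may be worth comparing the SSL settings on both sides.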

In /var/log/messages there is an entry like this, with the IP address of
the engine as the client:

vdsm [5834]: vdsm root ERROR client ()

Kind regards.



