On Mon, Jun 25, 2012 at 03:15:47PM +0100, jose garcia wrote:
On 06/25/2012 01:24 PM, Dan Kenigsberg wrote:
>On Mon, Jun 25, 2012 at 01:15:08PM +0100, jose garcia wrote:
>>On 06/25/2012 12:30 PM, Dan Kenigsberg wrote:
>>>On Mon, Jun 25, 2012 at 12:11:37PM +0100, jose garcia wrote:
>>>>On 06/25/2012 11:37 AM, Dan Kenigsberg wrote:
>>>>>On Mon, Jun 25, 2012 at 10:57:47AM +0100, jose garcia wrote:
>>>>>>Good Monday morning,
>>>>>>
>>>>>>Installed Fedora 17 and tried to install the node to a 3.1 engine.
>>>>>>
>>>>>>I'm getting a VDS Network exception on the engine side:
>>>>>>
>>>>>>in /var/log/ovirt-engine/engine:
>>>>>>
>>>>>>2012-06-25 10:15:34,132 WARN
>>>>>>[org.ovirt.engine.core.vdsbroker.VdsManager]
>>>>>>(QuartzScheduler_Worker-96)
>>>>>>ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds
>>>>>>= 2e9929c6-bea6-11e1-bfdd-ff11f39c80eb :
>>>>>>ovirt-node2.smb.eurotux.local, VDS Network Error, continuing.
>>>>>>VDSNetworkException:
>>>>>>2012-06-25 10:15:36,143 ERROR
>>>>>>[org.ovirt.engine.core.vdsbroker.VdsManager]
>>>>>>(QuartzScheduler_Worker-20) VDS::handleNetworkException Server
>>>>>>failed to respond, vds_id = 2e9929c6-bea6-11e1-bfdd-ff11f39c80eb,
>>>>>>vds_name = ovirt-node2.smb.eurotux.local, error =
>>>>>>VDSNetworkException:
>>>>>>2012-06-25 10:15:36,181 INFO
>>>>>>[org.ovirt.engine.core.bll.VdsEventListener] (pool-3-thread-49)
>>>>>>ResourceManager::vdsNotResponding entered for Host
>>>>>>2e9929c6-bea6-11e1-bfdd-ff11f39c80eb, 10.10.30.177
>>>>>>2012-06-25 10:15:36,214 ERROR
>>>>>>[org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand]
>>>>>>(pool-3-thread-49) [1afd4b89] Failed to run Fence script on
>>>>>>vds:ovirt-node2.smb.eurotux.local, VMs moved to UnKnown instead.
>>>>>>
>>>>>>On the node, meanwhile, vdsmd fails while sampling the NICs:
>>>>>>
>>>>>>in /var/log/vdsm/vdsm.log:
>>>>>>
>>>>>> nf = netinfo.NetInfo()
>>>>>> File "/usr/share/vdsm/netinfo.py", line 268, in __init__
>>>>>> _netinfo = get()
>>>>>> File "/usr/share/vdsm/netinfo.py", line 220, in get
>>>>>> for nic in nics() ])
>>>>>>KeyError: 'p36p1'
>>>>>>
>>>>>>MainThread::INFO::2012-06-25 10:45:09,110::vdsm::76::vds::(run) VDSM
>>>>>>main thread ended. Waiting for 1 other threads...
>>>>>>MainThread::INFO::2012-06-25 10:45:09,111::vdsm::79::vds::(run)
>>>>>><_MainThread(MainThread, started 140567823243072)>
>>>>>>MainThread::INFO::2012-06-25 10:45:09,111::vdsm::79::vds::(run)
>>>>>><Thread(libvirtEventLoop, started daemon 140567752681216)>
>>>>>>
>>>>>>In /var/log/messages there are a lot of "died too quickly" messages for vdsmd:
>>>>>>
>>>>>>Jun 25 10:45:08 ovirt-node2 respawn: slave '/usr/share/vdsm/vdsm' died too quickly, respawning slave
>>>>>>Jun 25 10:45:08 ovirt-node2 respawn: slave '/usr/share/vdsm/vdsm' died too quickly, respawning slave
>>>>>>Jun 25 10:45:09 ovirt-node2 respawn: slave '/usr/share/vdsm/vdsm' died too quickly for more than 30 seconds, master sleeping for 900 seconds
>>>>>>
>>>>>>I don't know why Fedora 17 names p36p1 what was eth0 in Fedora
>>>>>>16. I also tried configuring an ovirtmgmt bridge, and the only
>>>>>>difference is that the KeyError becomes 'ovirtmgmt'.
>>>>>The NIC renaming may be due to biosdevname. Do you have it
>>>>>installed? Do any of the /etc/sysconfig/network-scripts/ifcfg-* files
>>>>>refer to an old NIC name?
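A quick way to answer the ifcfg question is to compare each DEVICE= value in the ifcfg files against the interfaces the kernel actually exposes under /sys/class/net. This is a hypothetical, read-only helper (not part of vdsm or initscripts); it assumes each ifcfg file carries the usual DEVICE=name line. Note that a bridge such as ovirtmgmt that has not been brought up yet will also be reported.

```python
import glob
import os
import re

# Matches DEVICE=eth0 or DEVICE="eth0"; commented-out lines are skipped.
DEVICE_RE = re.compile(r'^\s*DEVICE\s*=\s*["\']?([^"\'\s]+)', re.M)


def configured_devices(scripts_dir="/etc/sysconfig/network-scripts"):
    """Map each ifcfg-* file name to the DEVICE= name it configures."""
    devices = {}
    for path in sorted(glob.glob(os.path.join(scripts_dir, "ifcfg-*"))):
        with open(path) as f:
            m = DEVICE_RE.search(f.read())
        if m:
            devices[os.path.basename(path)] = m.group(1)
    return devices


def stale_devices(scripts_dir="/etc/sysconfig/network-scripts",
                  sys_net="/sys/class/net"):
    """Return ifcfg files whose DEVICE= is not a current kernel interface."""
    present = set(os.listdir(sys_net))
    return {fname: dev
            for fname, dev in configured_devices(scripts_dir).items()
            if dev not in present}


if __name__ == "__main__":
    for fname, dev in sorted(stale_devices().items()):
        print(f"{fname}: DEVICE={dev} not found (renamed by biosdevname?)")
```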
>>>>>
>>>>>Which version of vdsm are you running? It seems to predate
>>>>>v4.9.4-61-g24f8627, which is too old to run on F17 because the output
>>>>>of ifconfig has changed. Please retry with the latest beta version:
>>>>>https://koji.fedoraproject.org/koji/buildinfo?buildID=327015
>>>>>
>>>>>If the problem persists, could you run vdsm manually with
>>>>># su - vdsm -s /bin/bash
>>>>># cd /usr/share/vdsm
>>>>># ./vdsm
>>>>>It might give a hint about the crash.
>>>>>
>>>>>regards,
>>>>>Dan.
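Dan's point about the changed ifconfig output can be illustrated with a toy parser. This is not vdsm's netinfo.py; it is a hypothetical sketch that assumes two header styles: the older net-tools one (`eth0      Link encap:...`) and the newer one Dan refers to (`p36p1: flags=...`). A parser that only knows the old style finds no NICs at all, so a later lookup by a kernel interface name raises KeyError, much like the traceback above.

```python
import re

# Hypothetical illustration of the failure mode, not vdsm's code.
OLD_HEADER = re.compile(r"^(\S+)\s+Link encap:")  # "eth0      Link encap:Ethernet ..."
NEW_HEADER = re.compile(r"^(\S+):\s+flags=")       # "p36p1: flags=4163<UP,...>  mtu 1500"


def parse_old_only(output):
    """Interface names, recognising only the old header style."""
    return [m.group(1) for line in output.splitlines()
            if (m := OLD_HEADER.match(line))]


def parse_both(output):
    """Interface names, recognising either header style."""
    names = []
    for line in output.splitlines():
        m = OLD_HEADER.match(line) or NEW_HEADER.match(line)
        if m:
            names.append(m.group(1))
    return names


if __name__ == "__main__":
    new_style = "p36p1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500\n"
    info = {name: {} for name in parse_old_only(new_style)}
    try:
        info["p36p1"]  # the kernel knows this NIC, the old-style parser did not
    except KeyError as err:
        print("KeyError:", err)
    print("format-aware parser:", parse_both(new_style))
```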
>>>>Well, thank you. I have updated Vdsm to version 4.10. Now the
>>>>problem is with SSL and XMLRPC.
>>>>
>>>>This is the error on the engine side:
>>>>
>>>>/var/log/ovirt-engine/engine.log
>>>>
>>>>ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand]
>>>>(QuartzScheduler_Worker-52) XML RPC error in command
>>>>GetCapabilitiesVDS ( Vds: ovirt-node2.smb.eurotux.local ), the error
>>>>was: java.util.concurrent.ExecutionException:
>>>>java.lang.reflect.InvocationTargetException,
>>>>NoHttpResponseException: The server 10.10.30.177 failed to respond.
>>>>
>>>>On the node side, there seems to be an authentication problem:
>>>>
>>>>/var/log/vdsm/vdsm.log
>>>>
>>>>SSLError: [Errno 1] _ssl.c:504: error:1407609C:SSL
>>>>routines:SSL23_GET_CLIENT_HELLO:http request
>>>>Thread-810::ERROR::2012-06-25
>>>>12:02:46,351::SecureXMLRPCServer::73::root::(handle_error) client
>>>>('10.10.30.101', 58605)
>>>>Traceback (most recent call last):
>>>> File "/usr/lib64/python2.7/SocketServer.py", line 582, in
>>>>process_request_thread
>>>> self.finish_request(request, client_address)
>>>> File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
>>>> request.do_handshake()
>>>> File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
>>>> self._sslobj.do_handshake()
>>>>SSLError: [Errno 1] _ssl.c:504: error:1407609C:SSL
>>>>routines:SSL23_GET_CLIENT_HELLO:http request
>>>>
>>>>In /var/log/messages there is an entry:
>>>>
>>>>vdsm [5834]: vdsm root ERROR client ()
>>>>
>>>>with the IP address of the engine.
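The `SSL23_GET_CLIENT_HELLO:http request` error means OpenSSL saw bytes on vdsm's SSL port that looked like a plain HTTP request (such as an XML-RPC POST) rather than a TLS ClientHello, so one side is speaking SSL and the other is not. The sketch below approximates that check; the function name is hypothetical, and the list of HTTP method prefixes is an assumption modeled on OpenSSL's SSLv23 server logic, not a copy of it.

```python
# Hypothetical helper: guess what a client sent as its first bytes on a TLS port.
# Assumption: prefixes modeled on OpenSSL's "http request" detection.
HTTP_PREFIXES = (b"GET ", b"POST", b"HEAD", b"PUT ")


def classify_first_bytes(data):
    if data[:1] == b"\x16":
        return "tls"          # TLS record content type 22 = handshake (ClientHello)
    if data.startswith(HTTP_PREFIXES):
        return "plain-http"   # the case OpenSSL reports as "http request"
    if data[:1] and data[0] & 0x80:
        return "sslv2-hello"  # SSLv2-compatible ClientHello: 2-byte header, top bit set
    return "unknown"


if __name__ == "__main__":
    for sample in (b"\x16\x03\x01\x00\xa5\x01", b"POST / HTTP/1.1\r\n"):
        print(sample, "->", classify_first_bytes(sample))
```

If the node is receiving "plain-http", that suggests checking that the engine and vdsm agree on whether SSL is used.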
>>>Hmm... Do you have ssl=true in your /etc/vdsm/vdsm.conf?
>>>Does vdsm respond locally to
>>>
>>> vdsClient -s 0 getVdsCaps
>>>
>>>(Maybe your local certificates and key were corrupted, in which case you
>>>will have to re-install the host from Engine to create a new set.)
>>>
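The ssl=true question can also be answered with a small configparser check. This sketch assumes vdsm.conf is INI-style with the option under a `[vars]` section and that vdsm treats a missing option as true; both are assumptions to verify against the installed vdsm's defaults. A commented-out `#ssl = ...` line is ignored, so it falls back to the default.

```python
import configparser
import os


def vdsm_ssl_enabled(conf_text, default=True):
    """Return the effective `ssl` setting from vdsm.conf text.

    Assumption: the option lives under [vars]; a missing section or option
    (including a commented-out '#ssl = ...' line) yields `default`.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getboolean("vars", "ssl", fallback=default)


if __name__ == "__main__":
    path = "/etc/vdsm/vdsm.conf"
    text = ""
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    print("vdsm ssl enabled:", vdsm_ssl_enabled(text))
```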
>>I have recreated the DB and run engine-setup again. I have tried
>>with ssl=true both commented and uncommented on the node.
>>vdsClient -s 0 getVdsCaps works locally and returns the host's
>>information, but something seems to be preventing it from reaching the
>>engine. I am still getting the same error. The installer does not start.
>>I can ssh as root to the host, and vdsmd is alive.
>What do you mean by "The installer is not beginning"?
>Could you review the certificates in /etc/pki/vdsm/certs/ and check that
>they were generated by *your* engine? Is the cacert the same as the one
>on the Engine machine?
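To compare the CA on both machines, fingerprint the two PEM files and check that the digests match (copy one of them across first, e.g. with scp). A minimal standard-library sketch; the script name and the file locations in the usage comment (/etc/pki/vdsm/certs/cacert.pem on the host, /etc/pki/ovirt-engine/ca.pem on the engine) are assumptions about a typical layout, so adjust them to what is actually present.

```python
import base64
import hashlib
import re
import sys

PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----(.+?)-----END CERTIFICATE-----", re.S)


def der_from_pem(pem_text):
    """Decode the first certificate found in a PEM string to DER bytes."""
    m = PEM_RE.search(pem_text)
    if m is None:
        raise ValueError("no PEM certificate found")
    return base64.b64decode("".join(m.group(1).split()))


def fingerprint(pem_text):
    """SHA-256 of the DER bytes, as colon-separated uppercase hex."""
    digest = hashlib.sha256(der_from_pem(pem_text)).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def same_ca(pem_a, pem_b):
    return fingerprint(pem_a) == fingerprint(pem_b)


if __name__ == "__main__":
    # Usage (hypothetical): python3 compare_ca.py HOST_CACERT ENGINE_CACERT
    with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
        pem_a, pem_b = a.read(), b.read()
    print(sys.argv[1], fingerprint(pem_a))
    print(sys.argv[2], fingerprint(pem_b))
    print("same CA" if same_ca(pem_a, pem_b) else "DIFFERENT CA")
```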
>
>Dan.
On the engine server I have a self-signed certificate. I set up a
cacert and server.pem to stop libvirtd from complaining about gssapi.
The one on the node is issued by the VDSM Certificate Authority, so
I suppose it was set up by the vdsm package installation.
So here lies the problem. When Vdsm is first installed, it generates its
own self-signed certificate. This, by definition, does not help Engine
identify it.
When you add a host to a data center, Engine logs into the host and
asks it to produce a *new* key. Engine's CA then signs the key
and puts the cert back under /etc/pki/vdsm/certs/.
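One way to tell whether the host still has the install-time certificate rather than one signed by Engine's CA is to look at its issuer. The sketch below shells out to `openssl x509 -noout -issuer -subject` and applies two heuristics taken from this thread: the issuer equals the subject (self-signed), or the issuer mentions "VDSM Certificate Authority". The default certificate path /etc/pki/vdsm/certs/vdsmcert.pem is an assumption.

```python
import subprocess
import sys

DEFAULT_ISSUER = "VDSM Certificate Authority"  # issuer name quoted in this thread


def parse_openssl_names(output):
    """Parse 'issuer=...' and 'subject=...' lines printed by openssl x509."""
    names = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in ("issuer", "subject"):
            names[key.strip()] = value.strip()
    return names


def looks_like_install_time_cert(names):
    """Heuristic: self-signed, or issued by vdsm's own default CA."""
    issuer = names.get("issuer", "")
    return DEFAULT_ISSUER in issuer or issuer == names.get("subject")


def read_names(cert_path):
    result = subprocess.run(
        ["openssl", "x509", "-noout", "-issuer", "-subject", "-in", cert_path],
        capture_output=True, text=True, check=True)
    return parse_openssl_names(result.stdout)


if __name__ == "__main__":
    # Assumed location of the host certificate; pass another path as argv[1].
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/pki/vdsm/certs/vdsmcert.pem"
    names = read_names(path)
    print(names)
    if looks_like_install_time_cert(names):
        print("still the default vdsm certificate, not re-issued by Engine's CA")
    else:
        print("certificate appears to be issued by another CA")
```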
Something has gone wrong in this process. If you re-install the host, it
*should* overwrite the current keys; if not, it is a bug. Could you look
at the installation logs (they sit in a random dir under /tmp)? Maybe
there's a clue there as to why you keep the default
good-for-almost-nothing keys that come with vdsm.
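Since the installer logs land in a randomly named directory under /tmp, a small helper can list the newest candidates and pull out the lines that mention errors. The `*.log` pattern and the case-insensitive "error" match are assumptions; the real file names depend on the installer version.

```python
import glob
import os
import sys


def recent_tmp_logs(base="/tmp", pattern="*.log", limit=10):
    """Return up to `limit` log files one directory below `base`, newest first."""
    # Assumption: the installer writes *.log files into one random subdirectory.
    paths = glob.glob(os.path.join(base, "*", pattern))
    paths.sort(key=os.path.getmtime, reverse=True)
    return paths[:limit]


def error_lines(path):
    """Return the lines of `path` that mention 'error' (case-insensitive)."""
    with open(path, errors="replace") as f:
        return [line.rstrip("\n") for line in f if "error" in line.lower()]


if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "/tmp"
    for path in recent_tmp_logs(base):
        print("==", path)
        for line in error_lines(path):
            print("  ", line)
```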
Dan.