[Users] Vdsmd is respawning trying to sample NICs


jose garcia

25 Jun 2012

11:57 a.m.

Good Monday morning,

I installed Fedora 17 and tried to install the node to a 3.1 engine.

I'm getting a VDS network exception on the engine side.

In /var/log/ovirt-engine/engine.log:

2012-06-25 10:15:34,132 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-96) ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds = 2e9929c6-bea6-11e1-bfdd-ff11f39c80eb : ovirt-node2.smb.eurotux.local, VDS Network Error, continuing. VDSNetworkException:
2012-06-25 10:15:36,143 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-20) VDS::handleNetworkException Server failed to respond, vds_id = 2e9929c6-bea6-11e1-bfdd-ff11f39c80eb, vds_name = ovirt-node2.smb.eurotux.local, error = VDSNetworkException:
2012-06-25 10:15:36,181 INFO [org.ovirt.engine.core.bll.VdsEventListener] (pool-3-thread-49) ResourceManager::vdsNotResponding entered for Host 2e9929c6-bea6-11e1-bfdd-ff11f39c80eb, 10.10.30.177
2012-06-25 10:15:36,214 ERROR [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-3-thread-49) [1afd4b89] Failed to run Fence script on vds:ovirt-node2.smb.eurotux.local, VMs moved to UnKnown instead.

Meanwhile, on the node, vdsmd fails to sample the nics.

In /var/log/vdsm/vdsm.log:

    nf = netinfo.NetInfo()
  File "/usr/share/vdsm/netinfo.py", line 268, in __init__
    _netinfo = get()
  File "/usr/share/vdsm/netinfo.py", line 220, in get
    for nic in nics() ])
KeyError: 'p36p1'

MainThread::INFO::2012-06-25 10:45:09,110::vdsm::76::vds::(run) VDSM main thread ended. Waiting for 1 other threads...
MainThread::INFO::2012-06-25 10:45:09,111::vdsm::79::vds::(run) <_MainThread(MainThread, started 140567823243072)>
MainThread::INFO::2012-06-25 10:45:09,111::vdsm::79::vds::(run) <Thread(libvirtEventLoop, started daemon 140567752681216)>

In /var/log/messages there are a lot of "died too quickly" entries for vdsm:

Jun 25 10:45:08 ovirt-node2 respawn: slave '/usr/share/vdsm/vdsm' died too quickly, respawning slave
Jun 25 10:45:08 ovirt-node2 respawn: slave '/usr/share/vdsm/vdsm' died too quickly, respawning slave
Jun 25 10:45:09 ovirt-node2 respawn: slave '/usr/share/vdsm/vdsm' died too quickly for more than 30 seconds, master sleeping for 900 seconds

I don't know why Fedora 17 gives the name p36p1 to what was eth0 in Fedora 16, but I tried to configure a bridge ovirtmgmt and the only difference is that the KeyError becomes 'ovirtmgmt'.

Regards,
Jose Garcia


Dan Kenigsberg

25 Jun 2012

12:37 p.m.

The nic renaming may have happened due to biosdevname. Do you have it installed? Do any of the /etc/sysconfig/network-scripts/ifcfg-* files refer to an old nic name?

Which version of vdsm are you running? It seems to be older than v4.9.4-61-g24f8627, which is too old to run on f17: the output of ifconfig has changed. Please retry with the latest beta version:
https://koji.fedoraproject.org/koji/buildinfo?buildID=327015

If the problem persists, could you run vdsm manually, with

# su - vdsm -s /bin/bash
# cd /usr/share/vdsm
# ./vdsm

Maybe it would give a hint about the crash.

Regards,
Dan.
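The ifcfg question above can be checked by hand, but a small script makes it repeatable. The following is only a sketch: the function names are hypothetical, it assumes the interface name is given on a `DEVICE=` line, and it assumes the live interfaces are listed under /sys/class/net.

```python
import re
from pathlib import Path


def live_nic_names(sys_net="/sys/class/net"):
    """Return the interface names the kernel currently exposes."""
    return {entry.name for entry in Path(sys_net).iterdir()}


def stale_ifcfg_devices(ifcfg_dir, live_nics):
    """Map ifcfg file name to its DEVICE value when that device is not live."""
    stale = {}
    for path in sorted(Path(ifcfg_dir).glob("ifcfg-*")):
        for line in path.read_text().splitlines():
            # Accept DEVICE=eth0 as well as DEVICE="eth0"; skip commented lines.
            match = re.match(r'\s*DEVICE\s*=\s*"?([^"\s]+)"?', line)
            if match and match.group(1) not in live_nics:
                stale[path.name] = match.group(1)
    return stale
```

On the node one might call `stale_ifcfg_devices("/etc/sysconfig/network-scripts", live_nic_names())`. A bridge that is configured but not yet up would also be reported, so the result is a hint rather than a verdict.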

jose garcia

1:11 p.m.

Well, thank you. I have updated Vdsm to version 4.10. Now the problem is with SSL and XMLRPC.

This is the error on the engine side, in /var/log/ovirt-engine/engine.log:

ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-52) XML RPC error in command GetCapabilitiesVDS ( Vds: ovirt-node2.smb.eurotux.local ), the error was: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException, NoHttpResponseException: The server 10.10.30.177 failed to respond.

On the node side, there seems to be an authentication problem. In /var/log/vdsm/vdsm.log:

SSLError: [Errno 1] _ssl.c:504: error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request
Thread-810::ERROR::2012-06-25 12:02:46,351::SecureXMLRPCServer::73::root::(handle_error) client ('10.10.30.101', 58605)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 582, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
    request.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:504: error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request

In /var/log/messages there is an:

vdsm [5834]: vdsm root ERROR client ()

with the IP address of the engine.

Kind regards.
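For context, OpenSSL generally reports "SSL23_GET_CLIENT_HELLO:http request" when a server waiting for a TLS handshake receives bytes that look like a plain HTTP request instead. That would be consistent with the client at 10.10.30.101 talking unencrypted HTTP to an SSL-enabled vdsm, though the logs above do not prove it. The sketch below illustrates the distinction; the function name and the list of HTTP methods are illustrative choices, not anything taken from vdsm or OpenSSL.

```python
# Common HTTP request-line prefixes (an illustrative, non-exhaustive list).
HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ", b"OPTIONS ")


def classify_first_bytes(data):
    """Guess whether a client's opening bytes are plain HTTP or a TLS handshake."""
    if data.startswith(HTTP_METHODS):
        return "plain-http"
    # A TLS record begins with content type 0x16 (handshake), then major version 0x03.
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return "tls-handshake"
    return "unknown"
```

An XML-RPC call sent without SSL starts with a `POST` request line, which falls in the first branch.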

Dan Kenigsberg

1:30 p.m.

Hmm... Do you have ssl=true in your /etc/vdsm/vdsm.conf? Does vdsm respond locally to

vdsClient -s 0 getVdsCaps

(Maybe your local certificates and key were corrupted, and you will have to re-install the host from Engine in order to create a new set.)
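The ssl setting can also be read programmatically. This is a minimal sketch under two assumptions that are not stated in the thread: that the option lives in a `[vars]` section of vdsm.conf, and that vdsm treats SSL as enabled when the line is absent or commented out. The function name is hypothetical.

```python
import configparser


def vdsm_ssl_enabled(conf_text, default=True):
    """Return the effective ssl setting from the text of a vdsm.conf file.

    Assumption: the option is [vars] ssl, and `default` is what vdsm uses
    when the option is missing or commented out.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getboolean("vars", "ssl", fallback=default)
```

On the node this could be fed with `open("/etc/vdsm/vdsm.conf").read()`. Lines starting with `#` are treated as comments by `configparser`, so a commented-out `ssl = true` falls back to the default.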

jose garcia

2:15 p.m.

I have recreated the db and run engine-setup again. I have tried with ssl=true both commented and uncommented on the node. vdsClient -s 0 getVdsCaps works locally and provides the information of the host, but something seems to be preventing it from reaching the engine. I am still getting the same error. The installer is not beginning. I can ssh as root to the host and vdsmd is alive.

Dan Kenigsberg

2:24 p.m.

What do you mean by "The installer is not beginning"? Could you review your /etc/pki/vdsm/certs/ and check that the certificates there have been generated by *your* engine? Is the cacert the same as the one on the Engine machine?

Dan.
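One way to answer the "is the cacert the same" question is to compare fingerprints of the two files. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes each file holds a single PEM certificate with nothing before the BEGIN line.

```python
import hashlib
import ssl


def pem_fingerprint(pem_text):
    """Return the SHA-256 hex digest of a single PEM-encoded certificate."""
    # Decode to DER first so that line wrapping and trailing newlines do not matter.
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    return hashlib.sha256(der).hexdigest()


def same_certificate(pem_a, pem_b):
    """True when both PEM texts encode the same certificate bytes."""
    return pem_fingerprint(pem_a) == pem_fingerprint(pem_b)
```

A plausible use would be to copy the engine's CA file to the node and compare it with the cacert under /etc/pki/vdsm/certs/. The exact file names on each machine are an assumption to verify locally.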


jose garcia

4:15 p.m.

On the engine server I have a self-signed certificate. I set up a cacert and server.pem to avoid libvirtd complaining about gssapi. The one on the node is issued by the VDSM Certificate Authority, so I suppose it is set up by the vdsm package installation.

Dan Kenigsberg

5:17 p.m.

On Mon, Jun 25, 2012 at 03:15:47PM +0100, jose garcia wrote:

> In the engine server I have a self-signed certificate. I set up a cacert and server.pem to avoid libvirtd complaining about gssapi. The one in the node is issued by the VDSM Certificate Authority, so I suppose it is set up by the vdsm package installation.

So here lies the problem. When Vdsm is first installed, it generates its own self-signed certificate. This, by definition, does not help to identify it for Engine.

When you add a host to a data center, Engine logs into the host and asks the host to produce a *new* key. Engine's CA then signs the key and puts the cert back under /etc/pki/vdsm/certs.

Something has gone wrong in this process. If you re-install the host it *should* override the current keys. If not, it is a bug. Could you look at the installation logs (they sit in a random dir under /tmp)? Maybe there's a clue there as to why you keep the default good-for-almost-nothing keys that come with vdsm.

Dan.
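One way to see whether the node still carries the default keys is to compare the issuer of the node certificate with the subject of the Engine CA. The sketch below is an illustration only: it shells out to `openssl x509 -noout -subject -issuer`, and the file names in the usage part (vdsmcert.pem, cacert.pem, and the local copy of the Engine CA) are assumptions that are not given anywhere in this thread. Matching names are only a hint, not a cryptographic check of the signature.

```python
import subprocess


def parse_names(openssl_output):
    """Parse 'subject=...' and 'issuer=...' lines into a dict."""
    names = {}
    for line in openssl_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in ("subject", "issuer"):
            names[key.strip()] = value.strip()
    return names


def read_names(pem_path):
    """Ask the openssl CLI for the subject and issuer of a PEM certificate."""
    result = subprocess.run(
        ["openssl", "x509", "-noout", "-subject", "-issuer", "-in", pem_path],
        check=True, capture_output=True, text=True,
    )
    return parse_names(result.stdout)


def is_self_signed(names):
    """A certificate whose issuer equals its subject signed itself."""
    return names["subject"] == names["issuer"]


def issued_by(cert_names, ca_names):
    """True when the certificate names the given CA as its issuer."""
    return cert_names["issuer"] == ca_names["subject"]


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever the files actually live.
    node = read_names("/etc/pki/vdsm/certs/vdsmcert.pem")
    engine_ca = read_names("engine-ca.pem")  # copied over from the Engine machine
    print("node cert self-signed:", is_self_signed(node))
    print("node cert issued by Engine CA:", issued_by(node, engine_ca))
```

If the second line prints False after the host has been added, the certificates were most likely never replaced by the Engine during installation.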

...

jose garcia

6 p.m.

On 06/25/2012 04:17 PM, Dan Kenigsberg wrote:

...

Yeah, there lies the problem. There is no installation process that I am aware of. The error seems to be raised in the first transaction, getting the capabilities of the node. As there is no do_handshake, or whatever it is called by vdsm, the installation process does not begin and the host is considered unresponsive and stored with the info I provided, hostname and IP address, no more.

The host just reports:

  File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:504: error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request

and a series of not-very-promising news appears in /var/log/messages:

Jun 25 16:53:19 ovirt-node2 vdsm root ERROR client ('10.10.30.101', 57035)
Jun 25 16:53:21 ovirt-node2 vdsm root ERROR client ('10.10.30.101', 35856)
Jun 25 16:53:21 ovirt-node2 vdsm root ERROR client ('10.10.30.101', 33413)
Jun 25 16:53:21 ovirt-node2 vdsm root ERROR client ('10.10.30.101', 60822)
Jun 25 16:53:21 ovirt-node2 vdsm root ERROR client ('10.10.30.101', 61000)

Regards.
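The "http request" part of the SSLError is what OpenSSL generally reports when plain HTTP arrives on a socket where a ClientHello was expected, which would suggest that the engine at 10.10.30.101 is talking to vdsm without SSL. The difference is visible in the very first bytes a client sends. The sketch below is not vdsm code; `classify_first_bytes` is a hypothetical helper that only illustrates the distinction.

```python
# ASCII prefixes of common HTTP request lines.
HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ", b"OPTIONS ")


def classify_first_bytes(data):
    """Guess whether a client opened with a TLS handshake or plain HTTP.

    A TLS record carrying a ClientHello starts with content type 0x16
    (handshake) followed by protocol major version 0x03. A plaintext
    HTTP request starts with an ASCII method name.
    """
    if len(data) >= 3 and data[0] == 0x16 and data[1] == 0x03:
        return "tls"
    if data.startswith(HTTP_METHODS):
        return "http"
    return "unknown"
```

An XML-RPC call sent without SSL begins with "POST ", so it would fall into the "http" branch here; that is the situation the error string in vdsm.log appears to describe.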

Itamar Heim

8:23 p.m.

On 06/25/2012 12:00 PM, jose garcia wrote:

...

On 06/25/2012 04:17 PM, Dan Kenigsberg wrote:

...

> Yeah, there lies the problem. There is no installation process that I am aware of. The error seems to be raised in the first transaction, getting the capabilities of the node. As there is no do_handshake, or whatever it is called by vdsm, the installation process does not begin and the host is considered unresponsive and stored with the info I provided, hostname and IP address, no more.

Getting the capabilities is already *post* the installation, which happens when you "add host". Please remove and re-add the host to engine to re-create the certificates (assuming it's a clean install of engine, and you didn't change its config to installVds=false).



participants (3)

Dan Kenigsberg
Itamar Heim
jose garcia