On 12/09/2014 05:33 AM, Alex Crow wrote:
Hi,
Will the vdsm patches that properly enable libgfapi storage for VMs (and
the matching refactored code in the hosted-engine setup scripts) make it
into 3.5.1? They don't seem to be in the snapshots yet.
I notice the support is in the master/3.6 snapshot, but something stops the
HA services in self-hosted setups from connecting storage. From my master
test setup:
/var/log/ovirt-hosted-engine-ha/broker.log
MainThread::INFO::2014-12-08
19:22:56,287::hosted_engine::222::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
Found certificate common name: 172.17.10.50
MainThread::WARNING::2014-12-08
19:22:56,395::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:23:11,501::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:23:26,610::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:23:41,717::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:23:56,824::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::ERROR::2014-12-08
19:24:11,840::hosted_engine::500::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed trying to connect storage:
MainThread::ERROR::2014-12-08
19:24:11,840::agent::173::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: 'Failed trying to connect storage' - trying to restart agent
MainThread::WARNING::2014-12-08
19:24:16,845::agent::176::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Restarting agent, attempt '8'
MainThread::INFO::2014-12-08
19:24:16,855::hosted_engine::222::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
Found certificate common name: 172.17.10.50
MainThread::WARNING::2014-12-08
19:24:16,962::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:24:32,069::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:24:47,181::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:25:02,288::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:25:17,389::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::ERROR::2014-12-08
19:25:32,404::hosted_engine::500::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed trying to connect storage:
MainThread::ERROR::2014-12-08
19:25:32,404::agent::173::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: 'Failed trying to connect storage' - trying to restart agent
MainThread::WARNING::2014-12-08
19:25:37,409::agent::176::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Restarting agent, attempt '9'
MainThread::ERROR::2014-12-08
19:25:37,409::agent::178::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Too many errors occurred, giving up. Please review the log and consider
filing a bug.
MainThread::INFO::2014-12-08
19:25:37,409::agent::118::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent
shutting down
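
In other words, the agent behaves roughly like the sketch below. The counts
and the wait are read off the timestamps above, not taken from the
ovirt-hosted-engine-ha source, so treat it as illustrative only:

import time

RETRY_WAIT = 15     # seconds between attempts, per the log
MAX_RETRIES = 5     # connect attempts before the agent gives up and restarts
MAX_RESTARTS = 9    # restarts before "Too many errors occurred, giving up"

def connect_storage():
    # placeholder for the real vdsm connectStorageServer() call
    raise RuntimeError("Failed to connect storage")

def run_agent():
    for _ in range(MAX_RETRIES):
        try:
            connect_storage()
            return
        except RuntimeError:
            time.sleep(RETRY_WAIT)  # "waiting '15' seconds before the next attempt"
    raise RuntimeError("Failed trying to connect storage")

for restart in range(1, MAX_RESTARTS + 1):
    try:
        run_agent()
        break
    except RuntimeError:
        continue  # "Restarting agent, attempt '<restart>'"
else:
    print("Too many errors occurred, giving up.")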
vdsm.log:
Detector thread::DEBUG::2014-12-08
19:20:45,458::protocoldetector::214::vds.MultiProtocolAcceptor::(_remove_connection)
Removing connection 127.0.0.1:53083
Detector thread::DEBUG::2014-12-08
19:20:45,458::BindingXMLRPC::1193::XmlDetector::(handleSocket) xml over
http detected from ('127.0.0.1', 53083)
Thread-44::DEBUG::2014-12-08
19:20:45,459::BindingXMLRPC::318::vds::(wrapper) client [127.0.0.1]
Thread-44::DEBUG::2014-12-08
19:20:45,460::task::592::Storage.TaskManager.Task::(_updateState)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::moving from state init ->
state preparing
Thread-44::INFO::2014-12-08
19:20:45,460::logUtils::48::dispatcher::(wrapper) Run and protect:
connectStorageServer(domType=1,
spUUID='ab2b5ee7-9aa7-426f-9d58-5e7d3840ad81', conList=[{'connection':
'zebulon.ifa.net:/engine', 'iqn': ',', 'protocol_version': '3',
'kvm': 'password', '=': 'user', ',': '='}], options=None)
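
Note the conList entry above: the keys themselves are mangled (',' and '='
appear as keys) and there is no 'id' field at all. For comparison, as far as
I understand vdsm, a well-formed NFS entry should look roughly like this
(the UUID and the exact field set are my assumption, not taken from a real
request):

{'id': '00000000-0000-0000-0000-000000000000',   # the field that's missing
 'connection': 'zebulon.ifa.net:/engine',
 'user': '',
 'password': '',
 'iqn': '',
 'port': '',
 'protocol_version': '3'}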
Thread-44::DEBUG::2014-12-08
19:20:45,461::hsm::2384::Storage.HSM::(__prefetchDomains) nfs local
path: /rhev/data-center/mnt/zebulon.ifa.net:_engine
Thread-44::DEBUG::2014-12-08
19:20:45,462::hsm::2408::Storage.HSM::(__prefetchDomains) Found SD
uuids: (u'd3240928-dae9-4ed0-8a28-7ab552455063',)
Thread-44::DEBUG::2014-12-08
19:20:45,463::hsm::2464::Storage.HSM::(connectStorageServer) knownSDs:
{d3240928-dae9-4ed0-8a28-7ab552455063: storage.nfsSD.findDomain}
Thread-44::ERROR::2014-12-08
19:20:45,463::task::863::Storage.TaskManager.Task::(_setError)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 870, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 2466, in
connectStorageServer
res.append({'id': conDef["id"], 'status': status})
KeyError: 'id'
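
That KeyError is easy to reproduce in isolation: hsm.py subscripts each
connection definition for 'id', and the entry parsed from this request never
got one. A minimal stand-alone repro (simplified, not the actual vdsm code):

conList = [{'connection': 'zebulon.ifa.net:/engine',
            'protocol_version': '3'}]   # note: no 'id' key

res = []
for conDef in conList:
    status = 0                          # placeholder status code
    res.append({'id': conDef['id'], 'status': status})   # KeyError: 'id'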
Thread-44::DEBUG::2014-12-08
19:20:45,463::task::882::Storage.TaskManager.Task::(_run)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Task._run:
b5accf8f-014a-412d-9fb8-9e9447d49b72 (1,
'ab2b5ee7-9aa7-426f-9d58-5e7d3840ad81', [{'kvm': 'password',
',': '=', 'connection': 'zebulon.ifa.net:/engine', 'iqn': ',',
'protocol_version': '3', '=': 'user'}]) {} failed - stopping task
Thread-44::DEBUG::2014-12-08
19:20:45,463::task::1214::Storage.TaskManager.Task::(stop)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::stopping in state preparing
(force False)
Thread-44::DEBUG::2014-12-08
19:20:45,463::task::990::Storage.TaskManager.Task::(_decref)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::ref 1 aborting True
Thread-44::INFO::2014-12-08
19:20:45,463::task::1168::Storage.TaskManager.Task::(prepare)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::aborting: Task is aborted:
u"'id'" - code 100
Thread-44::DEBUG::2014-12-08
19:20:45,463::task::1173::Storage.TaskManager.Task::(prepare)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Prepare: aborted: 'id'
Thread-44::DEBUG::2014-12-08
19:20:45,463::task::990::Storage.TaskManager.Task::(_decref)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::ref 0 aborting True
Thread-44::DEBUG::2014-12-08
19:20:45,463::task::925::Storage.TaskManager.Task::(_doAbort)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Task._doAbort: force False
Thread-44::DEBUG::2014-12-08
19:20:45,463::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-44::DEBUG::2014-12-08
19:20:45,463::task::592::Storage.TaskManager.Task::(_updateState)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::moving from state preparing
-> state aborting
Thread-44::DEBUG::2014-12-08
19:20:45,464::task::547::Storage.TaskManager.Task::(__state_aborting)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::_aborting: recover policy none
Thread-44::DEBUG::2014-12-08
19:20:45,464::task::592::Storage.TaskManager.Task::(_updateState)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::moving from state aborting
-> state failed
Thread-44::DEBUG::2014-12-08
19:20:45,464::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
Thread-44::DEBUG::2014-12-08
19:20:45,464::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-44::ERROR::2014-12-08
19:20:45,464::dispatcher::79::Storage.Dispatcher::(wrapper) 'id'
Traceback (most recent call last):
File "/usr/share/vdsm/storage/dispatcher.py", line 71, in wrapper
result = ctask.prepare(func, *args, **kwargs)
File "/usr/share/vdsm/storage/task.py", line 103, in wrapper
return m(self, *a, **kw)
File "/usr/share/vdsm/storage/task.py", line 1176, in prepare
raise self.error
KeyError: 'id'
clientIFinit::ERROR::2014-12-08
19:20:48,190::clientIF::460::vds::(_recoverExistingVms) Vm's recovery
failed
Traceback (most recent call last):
File "/usr/share/vdsm/clientIF.py", line 404, in _recoverExistingVms
caps.CpuTopology().cores())
File "/usr/share/vdsm/caps.py", line 200, in __init__
self._topology = _getCpuTopology(capabilities)
File "/usr/share/vdsm/caps.py", line 232, in _getCpuTopology
capabilities = _getFreshCapsXMLStr()
File "/usr/share/vdsm/caps.py", line 222, in _getFreshCapsXMLStr
return libvirtconnection.get().getCapabilities()
File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py",
line 157, in get
passwd)
File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py",
line 102, in open_connection
return utils.retry(libvirtOpen, timeout=10, sleep=0.2)
File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 935, in
retry
return func()
File "/usr/lib64/python2.7/site-packages/libvirt.py", line 102, in
openAuth
if ret is None:raise libvirtError('virConnectOpenAuth() failed')
libvirtError: authentication failed: polkit:
polkit\56retains_authorization_after_challenge=1
Authorization requires authentication but no agent is available.
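
For reference, the failing call path there is just libvirt's openAuth(); a
minimal sketch of the same call is below (the URI and credentials are made
up, vdsm reads the real SASL credentials from its own config). When libvirtd
is set up to authenticate via polkit instead of SASL, a daemon has no polkit
agent to answer the challenge, hence "no agent is available":

import libvirt

def _request_cred(credentials, user_data):
    # fill in each requested credential; cred[4] is the result slot
    for cred in credentials:
        if cred[0] == libvirt.VIR_CRED_AUTHNAME:
            cred[4] = 'vdsm@ovirt'      # hypothetical SASL username
        elif cred[0] == libvirt.VIR_CRED_PASSPHRASE:
            cred[4] = 'shibboleth'      # hypothetical passphrase
    return 0

auth = [[libvirt.VIR_CRED_AUTHNAME, libvirt.VIR_CRED_PASSPHRASE],
        _request_cred, None]
conn = libvirt.openAuth('qemu:///system', auth, 0)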
Not sure about the above error, but libgfapi is still in out-of-tree testing
mode. IIRC, Federico created a job that keeps building vdsm RPMs with the
patch for easy consumption.