Hi,
Does anyone know if this is due to work correctly in the next iteration of 3.5?
Thanks
Alex
On 09/12/14 10:33, Alex Crow wrote:
Hi,
Will the vdsm patches to properly enable libgfapi storage for VMs (and
the matching refactored code in the hosted-engine setup scripts) make
it into 3.5.1? It doesn't seem to be in the snapshots yet.
I notice it's in the master/3.6 snapshot, but something stops the HA
components in self-hosted setups from connecting storage.
From a master test setup:
/var/log/ovirt-hosted-engine-ha/broker.log
MainThread::INFO::2014-12-08
19:22:56,287::hosted_engine::222::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
Found certificate common name: 172.17.10.50
MainThread::WARNING::2014-12-08
19:22:56,395::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:23:11,501::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:23:26,610::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:23:41,717::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:23:56,824::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::ERROR::2014-12-08
19:24:11,840::hosted_engine::500::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed trying to connect storage:
MainThread::ERROR::2014-12-08
19:24:11,840::agent::173::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: 'Failed trying to connect storage' - trying to restart agent
MainThread::WARNING::2014-12-08
19:24:16,845::agent::176::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Restarting agent, attempt '8'
MainThread::INFO::2014-12-08
19:24:16,855::hosted_engine::222::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
Found certificate common name: 172.17.10.50
MainThread::WARNING::2014-12-08
19:24:16,962::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:24:32,069::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:24:47,181::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:25:02,288::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08
19:25:17,389::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::ERROR::2014-12-08
19:25:32,404::hosted_engine::500::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Failed trying to connect storage:
MainThread::ERROR::2014-12-08
19:25:32,404::agent::173::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Error: 'Failed trying to connect storage' - trying to restart agent
MainThread::WARNING::2014-12-08
19:25:37,409::agent::176::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Restarting agent, attempt '9'
MainThread::ERROR::2014-12-08
19:25:37,409::agent::178::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Too many errors occurred, giving up. Please review the log and
consider filing a bug.
MainThread::INFO::2014-12-08
19:25:37,409::agent::118::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
Agent shutting down
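(For anyone skimming the above: the pattern seems to be that the agent retries the storage connection every 15 seconds, gives up on a round after a handful of failures, restarts itself, and bails out completely after too many restarts. Purely as an illustration of the behaviour visible in the log, not the actual ovirt-hosted-engine-ha code, and with the numbers read off the timestamps rather than the source:

import time

RETRY_DELAY = 15        # "waiting '15' seconds" between attempts in the log
ATTEMPTS_PER_ROUND = 5  # five warnings before each "Failed trying to connect storage"
MAX_RESTARTS = 9        # the log gives up after "Restarting agent, attempt '9'"

def connect_storage():
    """Stand-in for the real storage connection call; always fails here."""
    raise RuntimeError("Failed to connect storage")

def run_agent_round():
    for _ in range(ATTEMPTS_PER_ROUND):
        try:
            connect_storage()
            return
        except RuntimeError:
            time.sleep(RETRY_DELAY)
    raise RuntimeError("Failed trying to connect storage")

for restart in range(1, MAX_RESTARTS + 1):
    try:
        run_agent_round()
        break
    except RuntimeError:
        continue  # "trying to restart agent"
else:
    print("Too many errors occurred, giving up.")

So the agent behaviour looks sane; the underlying question is why the storage connection keeps failing, which the vdsm log below shows.)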
vdsm.log:
Detector thread::DEBUG::2014-12-08
19:20:45,458::protocoldetector::214::vds.MultiProtocolAcceptor::(_remove_connection)
Removing connection 127.0.0.1:53083
Detector thread::DEBUG::2014-12-08
19:20:45,458::BindingXMLRPC::1193::XmlDetector::(handleSocket) xml
over http detected from ('127.0.0.1', 53083)
Thread-44::DEBUG::2014-12-08
19:20:45,459::BindingXMLRPC::318::vds::(wrapper) client [127.0.0.1]
Thread-44::DEBUG::2014-12-08
19:20:45,460::task::592::Storage.TaskManager.Task::(_updateState)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::moving from state init ->
state preparing
Thread-44::INFO::2014-12-08
19:20:45,460::logUtils::48::dispatcher::(wrapper) Run and protect:
connectStorageServer(domType=1,
spUUID='ab2b5ee7-9aa7-426f-9d58-5e7d3840ad81', conList=[{'connection':
'zebulon.ifa.net:/engine', 'iqn': ',', 'protocol_version': '3',
'kvm': 'password', '=': 'user', ',': '='}], options=None)
Thread-44::DEBUG::2014-12-08
19:20:45,461::hsm::2384::Storage.HSM::(__prefetchDomains) nfs local
path: /rhev/data-center/mnt/zebulon.ifa.net:_engine
Thread-44::DEBUG::2014-12-08
19:20:45,462::hsm::2408::Storage.HSM::(__prefetchDomains) Found SD
uuids: (u'd3240928-dae9-4ed0-8a28-7ab552455063',)
Thread-44::DEBUG::2014-12-08
19:20:45,463::hsm::2464::Storage.HSM::(connectStorageServer) knownSDs:
{d3240928-dae9-4ed0-8a28-7ab552455063: storage.nfsSD.findDomain}
Thread-44::ERROR::2014-12-08
19:20:45,463::task::863::Storage.TaskManager.Task::(_setError)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 870, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 2466, in
connectStorageServer
res.append({'id': conDef["id"], 'status': status})
KeyError: 'id'
Thread-44::DEBUG::2014-12-08
19:20:45,463::task::882::Storage.TaskManager.Task::(_run)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Task._run:
b5accf8f-014a-412d-9fb8-9e9447d49b72 (1,
'ab2b5ee7-9aa7-426f-9d58-5e7d3840ad81', [{'kvm': 'password', ',': '=',
'connection': 'zebulon.ifa.net:/engine', 'iqn': ',',
'protocol_version': '3', '=': 'user'}]) {} failed - stopping task
Thread-44::DEBUG::2014-12-08
19:20:45,463::task::1214::Storage.TaskManager.Task::(stop)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::stopping in state
preparing (force False)
Thread-44::DEBUG::2014-12-08
19:20:45,463::task::990::Storage.TaskManager.Task::(_decref)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::ref 1 aborting True
Thread-44::INFO::2014-12-08
19:20:45,463::task::1168::Storage.TaskManager.Task::(prepare)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::aborting: Task is
aborted: u"'id'" - code 100
Thread-44::DEBUG::2014-12-08
19:20:45,463::task::1173::Storage.TaskManager.Task::(prepare)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Prepare: aborted: 'id'
Thread-44::DEBUG::2014-12-08
19:20:45,463::task::990::Storage.TaskManager.Task::(_decref)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::ref 0 aborting True
Thread-44::DEBUG::2014-12-08
19:20:45,463::task::925::Storage.TaskManager.Task::(_doAbort)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Task._doAbort: force False
Thread-44::DEBUG::2014-12-08
19:20:45,463::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-44::DEBUG::2014-12-08
19:20:45,463::task::592::Storage.TaskManager.Task::(_updateState)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::moving from state
preparing -> state aborting
Thread-44::DEBUG::2014-12-08
19:20:45,464::task::547::Storage.TaskManager.Task::(__state_aborting)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::_aborting: recover policy
none
Thread-44::DEBUG::2014-12-08
19:20:45,464::task::592::Storage.TaskManager.Task::(_updateState)
Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::moving from state
aborting -> state failed
Thread-44::DEBUG::2014-12-08
19:20:45,464::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
Thread-44::DEBUG::2014-12-08
19:20:45,464::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-44::ERROR::2014-12-08
19:20:45,464::dispatcher::79::Storage.Dispatcher::(wrapper) 'id'
Traceback (most recent call last):
File "/usr/share/vdsm/storage/dispatcher.py", line 71, in wrapper
result = ctask.prepare(func, *args, **kwargs)
File "/usr/share/vdsm/storage/task.py", line 103, in wrapper
return m(self, *a, **kw)
File "/usr/share/vdsm/storage/task.py", line 1176, in prepare
raise self.error
KeyError: 'id'
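(The KeyError above looks fairly mechanical: hsm.py builds its reply with conDef["id"], but the connection dictionary shown in the "Run and protect" line has no 'id' key at all, and its other keys look mangled ('kvm', '=', ','). A tiny illustration of that failure mode, with the dict copied from the log and a made-up helper name standing in for the hsm.py code:

# The dict below is copied from the "Run and protect: connectStorageServer"
# line in vdsm.log; note there is no 'id' key and the key names look mangled.
con_def = {
    'connection': 'zebulon.ifa.net:/engine',
    'iqn': ',',
    'protocol_version': '3',
    'kvm': 'password',
    '=': 'user',
    ',': '=',
}

def build_status_entry(con_def, status=0):
    # Mirrors the failing line in hsm.py:
    #   res.append({'id': conDef["id"], 'status': status})
    return {'id': con_def['id'], 'status': status}

try:
    build_status_entry(con_def)
except KeyError as err:
    print("KeyError: %s" % err)  # prints KeyError: 'id', same as the traceback

So the real question is why the hosted-engine agent's connectStorageServer call carries no 'id' and such odd-looking keys in the first place.)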
clientIFinit::ERROR::2014-12-08
19:20:48,190::clientIF::460::vds::(_recoverExistingVms) Vm's recovery
failed
Traceback (most recent call last):
File "/usr/share/vdsm/clientIF.py", line 404, in _recoverExistingVms
caps.CpuTopology().cores())
File "/usr/share/vdsm/caps.py", line 200, in __init__
self._topology = _getCpuTopology(capabilities)
File "/usr/share/vdsm/caps.py", line 232, in _getCpuTopology
capabilities = _getFreshCapsXMLStr()
File "/usr/share/vdsm/caps.py", line 222, in _getFreshCapsXMLStr
return libvirtconnection.get().getCapabilities()
File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py",
line 157, in get
passwd)
File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py",
line 102, in open_connection
return utils.retry(libvirtOpen, timeout=10, sleep=0.2)
File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 935, in
retry
return func()
File "/usr/lib64/python2.7/site-packages/libvirt.py", line 102, in
openAuth
if ret is None:raise libvirtError('virConnectOpenAuth() failed')
libvirtError: authentication failed: polkit:
polkit\56retains_authorization_after_challenge=1
Authorization requires authentication but no agent is available.
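(This second failure looks unrelated to the storage one: vdsm can't open a libvirt connection because polkit wants interactive authentication and no agent is available. Assuming the Python libvirt bindings are installed, which they clearly are from the traceback, a quick way to check whether libvirtd answers at all outside of vdsm is something like the snippet below; the read-only open is only a diagnostic and says nothing about vdsm's own auth setup:

# Diagnostic only: check that libvirtd is reachable, independent of vdsm.
import libvirt

try:
    # A read-only connection is often permitted without extra polkit
    # authorisation; run this as the same user vdsm runs under.
    conn = libvirt.openReadOnly('qemu:///system')
    print('libvirt reachable, hypervisor type: %s' % conn.getType())
    conn.close()
except libvirt.libvirtError as err:
    print('libvirt connection failed: %s' % err)

If that also fails with a polkit error, the problem is in the host's libvirt/polkit configuration rather than in vdsm itself.)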