[ovirt-users] gfapi, 3.5.1
Itamar Heim
iheim at redhat.com
Fri Dec 12 13:36:04 UTC 2014
On 12/09/2014 05:33 AM, Alex Crow wrote:
> Hi,
>
> Will the vdsm patches to properly enable libgfapi storage for VMs (and
> the matching refactored code in the hosted-engine setup scripts) make
> it into 3.5.1? It's not in the snapshots yet, it seems.
>
> I notice it's in the master/3.6 snapshot, but something stops the HA
> stuff in self-hosted setups from connecting storage:
>
> From the master test setup:
> /var/log/ovirt-hosted-engine-ha/broker.log
>
> MainThread::INFO::2014-12-08
> 19:22:56,287::hosted_engine::222::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
> Found certificate common name: 172.17.10.50
> MainThread::WARNING::2014-12-08
> 19:22:56,395::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> Failed to connect storage, waiting '15' seconds before the next attempt
> MainThread::WARNING::2014-12-08
> 19:23:11,501::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> Failed to connect storage, waiting '15' seconds before the next attempt
> MainThread::WARNING::2014-12-08
> 19:23:26,610::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> Failed to connect storage, waiting '15' seconds before the next attempt
> MainThread::WARNING::2014-12-08
> 19:23:41,717::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> Failed to connect storage, waiting '15' seconds before the next attempt
> MainThread::WARNING::2014-12-08
> 19:23:56,824::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> Failed to connect storage, waiting '15' seconds before the next attempt
> MainThread::ERROR::2014-12-08
> 19:24:11,840::hosted_engine::500::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> Failed trying to connect storage:
> MainThread::ERROR::2014-12-08
> 19:24:11,840::agent::173::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
> Error: 'Failed trying to connect storage' - trying to restart agent
> MainThread::WARNING::2014-12-08
> 19:24:16,845::agent::176::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
> Restarting agent, attempt '8'
> MainThread::INFO::2014-12-08
> 19:24:16,855::hosted_engine::222::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
> Found certificate common name: 172.17.10.50
> MainThread::WARNING::2014-12-08
> 19:24:16,962::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> Failed to connect storage, waiting '15' seconds before the next attempt
> MainThread::WARNING::2014-12-08
> 19:24:32,069::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> Failed to connect storage, waiting '15' seconds before the next attempt
> MainThread::WARNING::2014-12-08
> 19:24:47,181::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> Failed to connect storage, waiting '15' seconds before the next attempt
> MainThread::WARNING::2014-12-08
> 19:25:02,288::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> Failed to connect storage, waiting '15' seconds before the next attempt
> MainThread::WARNING::2014-12-08
> 19:25:17,389::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> Failed to connect storage, waiting '15' seconds before the next attempt
> MainThread::ERROR::2014-12-08
> 19:25:32,404::hosted_engine::500::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
> Failed trying to connect storage:
> MainThread::ERROR::2014-12-08
> 19:25:32,404::agent::173::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
> Error: 'Failed trying to connect storage' - trying to restart agent
> MainThread::WARNING::2014-12-08
> 19:25:37,409::agent::176::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
> Restarting agent, attempt '9'
> MainThread::ERROR::2014-12-08
> 19:25:37,409::agent::178::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
> Too many errors occurred, giving up. Please review the log and consider
> filing a bug.
> MainThread::INFO::2014-12-08
> 19:25:37,409::agent::118::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent
> shutting down
>
> vdsm.log:
>
> Detector thread::DEBUG::2014-12-08
> 19:20:45,458::protocoldetector::214::vds.MultiProtocolAcceptor::(_remove_connection)
> Removing connection 127.0.0.1:53083
> Detector thread::DEBUG::2014-12-08
> 19:20:45,458::BindingXMLRPC::1193::XmlDetector::(handleSocket) xml over
> http detected from ('127.0.0.1', 53083)
> Thread-44::DEBUG::2014-12-08
> 19:20:45,459::BindingXMLRPC::318::vds::(wrapper) client [127.0.0.1]
> Thread-44::DEBUG::2014-12-08
> 19:20:45,460::task::592::Storage.TaskManager.Task::(_updateState)
> Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::moving from state init ->
> state preparing
> Thread-44::INFO::2014-12-08
> 19:20:45,460::logUtils::48::dispatcher::(wrapper) Run and protect:
> connectStorageServer(domType=1,
> spUUID='ab2b5ee7-9aa7-426f-9d58-5e7d3840ad81', conList=[{'connection':
> 'zebulon.ifa.net:/engine', 'iqn': ',', 'protocol_version': '3'
> , 'kvm': 'password', '=': 'user', ',': '='}], options=None)
> Thread-44::DEBUG::2014-12-08
> 19:20:45,461::hsm::2384::Storage.HSM::(__prefetchDomains) nfs local
> path: /rhev/data-center/mnt/zebulon.ifa.net:_engine
> Thread-44::DEBUG::2014-12-08
> 19:20:45,462::hsm::2408::Storage.HSM::(__prefetchDomains) Found SD
> uuids: (u'd3240928-dae9-4ed0-8a28-7ab552455063',)
> Thread-44::DEBUG::2014-12-08
> 19:20:45,463::hsm::2464::Storage.HSM::(connectStorageServer) knownSDs:
> {d3240928-dae9-4ed0-8a28-7ab552455063: storage.nfsSD.findDomain}
> Thread-44::ERROR::2014-12-08
> 19:20:45,463::task::863::Storage.TaskManager.Task::(_setError)
> Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Unexpected error
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 870, in _run
>     return fn(*args, **kargs)
>   File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
>     res = f(*args, **kwargs)
>   File "/usr/share/vdsm/storage/hsm.py", line 2466, in connectStorageServer
>     res.append({'id': conDef["id"], 'status': status})
> KeyError: 'id'
> Thread-44::DEBUG::2014-12-08
> 19:20:45,463::task::882::Storage.TaskManager.Task::(_run)
> Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Task._run:
> b5accf8f-014a-412d-9fb8-9e9447d49b72 (1,
> 'ab2b5ee7-9aa7-426f-9d58-5e7d3840ad81', [{'kvm': 'password', ',': '=',
> 'connection': 'zebulon.ifa.net:/engine', 'iqn': ',', 'protocol_version': '3',
> '=': 'user'}]) {} failed - stopping task
> Thread-44::DEBUG::2014-12-08
> 19:20:45,463::task::1214::Storage.TaskManager.Task::(stop)
> Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::stopping in state preparing
> (force False)
> Thread-44::DEBUG::2014-12-08
> 19:20:45,463::task::990::Storage.TaskManager.Task::(_decref)
> Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::ref 1 aborting True
> Thread-44::INFO::2014-12-08
> 19:20:45,463::task::1168::Storage.TaskManager.Task::(prepare)
> Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::aborting: Task is aborted:
> u"'id'" - code 100
> Thread-44::DEBUG::2014-12-08
> 19:20:45,463::task::1173::Storage.TaskManager.Task::(prepare)
> Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Prepare: aborted: 'id'
> Thread-44::DEBUG::2014-12-08
> 19:20:45,463::task::990::Storage.TaskManager.Task::(_decref)
> Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::ref 0 aborting True
> Thread-44::DEBUG::2014-12-08
> 19:20:45,463::task::925::Storage.TaskManager.Task::(_doAbort)
> Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Task._doAbort: force False
> Thread-44::DEBUG::2014-12-08
> 19:20:45,463::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
> Owner.cancelAll requests {}
> Thread-44::DEBUG::2014-12-08
> 19:20:45,463::task::592::Storage.TaskManager.Task::(_updateState)
> Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::moving from state preparing
> -> state aborting
> Thread-44::DEBUG::2014-12-08
> 19:20:45,464::task::547::Storage.TaskManager.Task::(__state_aborting)
> Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::_aborting: recover policy none
> Thread-44::DEBUG::2014-12-08
> 19:20:45,464::task::592::Storage.TaskManager.Task::(_updateState)
> Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::moving from state aborting
> -> state failed
> Thread-44::DEBUG::2014-12-08
> 19:20:45,464::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll)
> Owner.releaseAll requests {} resources {}
> Thread-44::DEBUG::2014-12-08
> 19:20:45,464::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
> Owner.cancelAll requests {}
> Thread-44::ERROR::2014-12-08
> 19:20:45,464::dispatcher::79::Storage.Dispatcher::(wrapper) 'id'
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/dispatcher.py", line 71, in wrapper
>     result = ctask.prepare(func, *args, **kwargs)
>   File "/usr/share/vdsm/storage/task.py", line 103, in wrapper
>     return m(self, *a, **kw)
>   File "/usr/share/vdsm/storage/task.py", line 1176, in prepare
>     raise self.error
> KeyError: 'id'
> clientIFinit::ERROR::2014-12-08
> 19:20:48,190::clientIF::460::vds::(_recoverExistingVms) Vm's recovery
> failed
> Traceback (most recent call last):
>   File "/usr/share/vdsm/clientIF.py", line 404, in _recoverExistingVms
>     caps.CpuTopology().cores())
>   File "/usr/share/vdsm/caps.py", line 200, in __init__
>     self._topology = _getCpuTopology(capabilities)
>   File "/usr/share/vdsm/caps.py", line 232, in _getCpuTopology
>     capabilities = _getFreshCapsXMLStr()
>   File "/usr/share/vdsm/caps.py", line 222, in _getFreshCapsXMLStr
>     return libvirtconnection.get().getCapabilities()
>   File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 157, in get
>     passwd)
>   File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 102, in open_connection
>     return utils.retry(libvirtOpen, timeout=10, sleep=0.2)
>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 935, in retry
>     return func()
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 102, in openAuth
>     if ret is None:raise libvirtError('virConnectOpenAuth() failed')
> libvirtError: authentication failed: polkit:
> polkit\56retains_authorization_after_challenge=1
> Authorization requires authentication but no agent is available.
>
>
Not sure about the above error, but libgfapi is still in out-of-tree
testing mode. IIRC, Federico created a job which keeps building vdsm
RPMs with the patch for easy consumption.
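
For reference, the KeyError: 'id' in the quoted vdsm.log looks like it comes
from the connection list itself: the conDef dict logged in the Task._run
arguments has garbled keys (',', '=', 'kvm') and no 'id' key at all, so the
result line shown in the traceback cannot be built. A minimal sketch of that
failure follows; the dict is copied from the log, while the "well-formed"
values in the trailing comment are only assumptions, not what hosted-engine
actually sends:

# Minimal reproduction of the KeyError from the vdsm.log above.
# con_def is copied verbatim from the Task._run arguments in the log;
# it has no 'id' key, so the per-connection result cannot be built,
# exactly like hsm.py line 2466 in the traceback. Illustrative sketch
# only, not vdsm code.
con_def = {
    'connection': 'zebulon.ifa.net:/engine',
    'iqn': ',',
    'protocol_version': '3',
    'kvm': 'password',
    '=': 'user',
    ',': '=',
}

res = []
status = 0
try:
    # Same shape as the failing line in the traceback:
    #   res.append({'id': conDef["id"], 'status': status})
    res.append({'id': con_def['id'], 'status': status})
except KeyError as err:
    print("KeyError raised: %s" % err)   # -> KeyError raised: 'id'

# A well-formed entry would carry explicit 'id', 'user' and 'password'
# keys; the values below are hypothetical placeholders:
# {'connection': 'zebulon.ifa.net:/engine', 'id': '<some-uuid>',
#  'user': 'kvm', 'password': '', 'iqn': '', 'protocol_version': '3'}

In other words, whatever builds the conList appears to mangle the connection
parameters before they reach vdsm, so the storage connect fails before
gluster/libgfapi is even involved.

Separately, the VM recovery failure at the end of the vdsm.log is a libvirt
authentication problem: libvirt appears to fall back to polkit, and no polkit
agent is available. To check whether libvirtd accepts SASL credentials
independently of vdsm, a quick test along these lines can help; the URI and
the credentials are assumptions and have to be replaced with whatever vdsm is
configured to use on the host:

import libvirt

def request_cred(credentials, user_data):
    # Supply username/password when libvirt asks for them.
    for cred in credentials:
        if cred[0] == libvirt.VIR_CRED_AUTHNAME:
            cred[4] = 'vdsm@ovirt'   # assumed SASL user, adjust locally
        elif cred[0] == libvirt.VIR_CRED_PASSPHRASE:
            cred[4] = 'changeme'     # placeholder password
    return 0

auth = [[libvirt.VIR_CRED_AUTHNAME, libvirt.VIR_CRED_PASSPHRASE],
        request_cred, None]

conn = libvirt.openAuth('qemu:///system', auth, 0)
print(conn.getHostname())
conn.close()

If that fails the same way, the problem is in the libvirt/SASL setup on the
host rather than in the HA agent.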