
On 12/09/2014 05:33 AM, Alex Crow wrote:
Hi,
Will the vdsm patches that properly enable libgfapi storage for VMs (and the matching refactored code in the hosted-engine setup scripts) make it into 3.5.1? They don't seem to be in the snapshots yet.
I notice they're in the master/3.6 snapshot, but something stops the HA services in self-hosted setups from connecting storage:
From the master test setup, /var/log/ovirt-hosted-engine-ha/broker.log:
MainThread::INFO::2014-12-08 19:22:56,287::hosted_engine::222::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 172.17.10.50
MainThread::WARNING::2014-12-08 19:22:56,395::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08 19:23:11,501::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08 19:23:26,610::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08 19:23:41,717::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08 19:23:56,824::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::ERROR::2014-12-08 19:24:11,840::hosted_engine::500::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed trying to connect storage:
MainThread::ERROR::2014-12-08 19:24:11,840::agent::173::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed trying to connect storage' - trying to restart agent
MainThread::WARNING::2014-12-08 19:24:16,845::agent::176::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '8'
MainThread::INFO::2014-12-08 19:24:16,855::hosted_engine::222::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: 172.17.10.50
MainThread::WARNING::2014-12-08 19:24:16,962::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08 19:24:32,069::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08 19:24:47,181::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08 19:25:02,288::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::WARNING::2014-12-08 19:25:17,389::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed to connect storage, waiting '15' seconds before the next attempt
MainThread::ERROR::2014-12-08 19:25:32,404::hosted_engine::500::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) Failed trying to connect storage:
MainThread::ERROR::2014-12-08 19:25:32,404::agent::173::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed trying to connect storage' - trying to restart agent
MainThread::WARNING::2014-12-08 19:25:37,409::agent::176::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '9'
MainThread::ERROR::2014-12-08 19:25:37,409::agent::178::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Too many errors occurred, giving up. Please review the log and consider filing a bug.
MainThread::INFO::2014-12-08 19:25:37,409::agent::118::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
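For reference, the pattern in that log is two nested bounded loops: _initialize_vdsm retries the storage connection a few times, 15 seconds apart, and _run_agent restarts the whole agent when that fails, giving up after enough restarts. A rough Python sketch of that behaviour — the function names mirror the log, but the structure, the counts, and the connect_storage callable are my reconstruction from the log, not the actual agent code:

import time

STORAGE_RETRIES = 5  # five "Failed to connect storage" warnings per cycle above
RETRY_WAIT = 15      # "waiting '15' seconds before the next attempt"
MAX_RESTARTS = 9     # the log gives up after "Restarting agent, attempt '9'"

def initialize_vdsm(connect_storage):
    # Inner loop: bounded storage-connection retries, as in _initialize_vdsm.
    for _ in range(STORAGE_RETRIES):
        if connect_storage():
            return True
        print("Failed to connect storage, waiting '%d' seconds "
              "before the next attempt" % RETRY_WAIT)
        time.sleep(RETRY_WAIT)
    return False

def run_agent(connect_storage):
    # Outer loop: restart the agent on failure, as in _run_agent.
    for attempt in range(1, MAX_RESTARTS + 1):
        if initialize_vdsm(connect_storage):
            return
        print("Error: 'Failed trying to connect storage' - trying to restart agent")
        print("Restarting agent, attempt '%d'" % attempt)
    print("Too many errors occurred, giving up.")

So the agent never diagnoses the failure itself; it just keeps retrying until the restart budget is exhausted, and the real cause has to come from vdsm.log below.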
vdsm.log:
Detector thread::DEBUG::2014-12-08 19:20:45,458::protocoldetector::214::vds.MultiProtocolAcceptor::(_remove_connection) Removing connection 127.0.0.1:53083
Detector thread::DEBUG::2014-12-08 19:20:45,458::BindingXMLRPC::1193::XmlDetector::(handleSocket) xml over http detected from ('127.0.0.1', 53083)
Thread-44::DEBUG::2014-12-08 19:20:45,459::BindingXMLRPC::318::vds::(wrapper) client [127.0.0.1]
Thread-44::DEBUG::2014-12-08 19:20:45,460::task::592::Storage.TaskManager.Task::(_updateState) Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::moving from state init -> state preparing
Thread-44::INFO::2014-12-08 19:20:45,460::logUtils::48::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=1, spUUID='ab2b5ee7-9aa7-426f-9d58-5e7d3840ad81', conList=[{'connection': 'zebulon.ifa.net:/engine', 'iqn': ',', 'protocol_version': '3', 'kvm': 'password', '=': 'user', ',': '='}], options=None)
Thread-44::DEBUG::2014-12-08 19:20:45,461::hsm::2384::Storage.HSM::(__prefetchDomains) nfs local path: /rhev/data-center/mnt/zebulon.ifa.net:_engine
Thread-44::DEBUG::2014-12-08 19:20:45,462::hsm::2408::Storage.HSM::(__prefetchDomains) Found SD uuids: (u'd3240928-dae9-4ed0-8a28-7ab552455063',)
Thread-44::DEBUG::2014-12-08 19:20:45,463::hsm::2464::Storage.HSM::(connectStorageServer) knownSDs: {d3240928-dae9-4ed0-8a28-7ab552455063: storage.nfsSD.findDomain}
Thread-44::ERROR::2014-12-08 19:20:45,463::task::863::Storage.TaskManager.Task::(_setError) Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 870, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2466, in connectStorageServer
    res.append({'id': conDef["id"], 'status': status})
KeyError: 'id'
Thread-44::DEBUG::2014-12-08 19:20:45,463::task::882::Storage.TaskManager.Task::(_run) Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Task._run: b5accf8f-014a-412d-9fb8-9e9447d49b72 (1, 'ab2b5ee7-9aa7-426f-9d58-5e7d3840ad81', [{'kvm': 'password', ',': '=', 'connection': 'zebulon.ifa.net:/engine', 'iqn': ',', 'protocol_version': '3', '=': 'user'}]) {} failed - stopping task
Thread-44::DEBUG::2014-12-08 19:20:45,463::task::1214::Storage.TaskManager.Task::(stop) Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::stopping in state preparing (force False)
Thread-44::DEBUG::2014-12-08 19:20:45,463::task::990::Storage.TaskManager.Task::(_decref) Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::ref 1 aborting True
Thread-44::INFO::2014-12-08 19:20:45,463::task::1168::Storage.TaskManager.Task::(prepare) Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::aborting: Task is aborted: u"'id'" - code 100
Thread-44::DEBUG::2014-12-08 19:20:45,463::task::1173::Storage.TaskManager.Task::(prepare) Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Prepare: aborted: 'id'
Thread-44::DEBUG::2014-12-08 19:20:45,463::task::990::Storage.TaskManager.Task::(_decref) Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::ref 0 aborting True
Thread-44::DEBUG::2014-12-08 19:20:45,463::task::925::Storage.TaskManager.Task::(_doAbort) Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::Task._doAbort: force False
Thread-44::DEBUG::2014-12-08 19:20:45,463::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-44::DEBUG::2014-12-08 19:20:45,463::task::592::Storage.TaskManager.Task::(_updateState) Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::moving from state preparing -> state aborting
Thread-44::DEBUG::2014-12-08 19:20:45,464::task::547::Storage.TaskManager.Task::(__state_aborting) Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::_aborting: recover policy none
Thread-44::DEBUG::2014-12-08 19:20:45,464::task::592::Storage.TaskManager.Task::(_updateState) Task=`b5accf8f-014a-412d-9fb8-9e9447d49b72`::moving from state aborting -> state failed
Thread-44::DEBUG::2014-12-08 19:20:45,464::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-44::DEBUG::2014-12-08 19:20:45,464::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-44::ERROR::2014-12-08 19:20:45,464::dispatcher::79::Storage.Dispatcher::(wrapper) 'id'
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 71, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 103, in wrapper
    return m(self, *a, **kw)
  File "/usr/share/vdsm/storage/task.py", line 1176, in prepare
    raise self.error
KeyError: 'id'
clientIFinit::ERROR::2014-12-08 19:20:48,190::clientIF::460::vds::(_recoverExistingVms) Vm's recovery failed
Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 404, in _recoverExistingVms
    caps.CpuTopology().cores())
  File "/usr/share/vdsm/caps.py", line 200, in __init__
    self._topology = _getCpuTopology(capabilities)
  File "/usr/share/vdsm/caps.py", line 232, in _getCpuTopology
    capabilities = _getFreshCapsXMLStr()
  File "/usr/share/vdsm/caps.py", line 222, in _getFreshCapsXMLStr
    return libvirtconnection.get().getCapabilities()
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 157, in get
    passwd)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 102, in open_connection
    return utils.retry(libvirtOpen, timeout=10, sleep=0.2)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 935, in retry
    return func()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 102, in openAuth
    if ret is None:raise libvirtError('virConnectOpenAuth() failed')
libvirtError: authentication failed: polkit: polkit\56retains_authorization_after_challenge=1 Authorization requires authentication but no agent is available.
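The KeyError itself is easy to read off the first traceback: hsm.py (line 2466 there) builds its result list with res.append({'id': conDef["id"], 'status': status}), assuming every connection definition carries an 'id' key, and the conList logged above has been mangled into nonsense keys ('=' and ',' appear as dict keys, 'id' is missing entirely), so the lookup raises. A stripped-down illustration of just that step — the real connectStorageServer does much more, and connect_storage_server here is my simplification, with the mangled dict copied from the log:

# Reproduces only the crashing result-building step from the traceback.
def connect_storage_server(con_list):
    res = []
    for conDef in con_list:
        status = 0  # pretend the connection itself succeeded
        res.append({'id': conDef["id"], 'status': status})  # assumes 'id' exists
    return res

# The conList as logged above: the parameters were mis-parsed somewhere
# upstream ('=' and ',' became dict keys) and no 'id' was passed at all.
mangled = [{'connection': 'zebulon.ifa.net:/engine', 'iqn': ',',
            'protocol_version': '3', 'kvm': 'password', '=': 'user', ',': '='}]

try:
    connect_storage_server(mangled)
except KeyError as e:
    print("KeyError: %s" % e)  # KeyError: 'id', matching the traceback

So the storage failure looks like a bad caller-side construction of the connection dictionary rather than a problem in the mount itself (the NFS prefetch just above it succeeds and finds the storage domain).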
Not sure about the above error, but libgfapi is still in out-of-tree testing mode. IIRC, Federico created a job that keeps building vdsm RPMs with the patch for easy consumption.