Hi,
VDSM error 99 means RecoveryInProgress. Recovery can take some time,
depending on how many VMs there are, so I suggest you wait a bit longer
for now and see what happens.
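If it helps while waiting, here is a rough Python sketch for checking
whether vdsm.log is still reporting error 99. It only relies on the message
formats visible in the excerpt quoted below; the function name and the
returned fields are made up for illustration and are not part of VDSM.

```python
import re
from collections import Counter

# Timestamp, then the "RPC call <method> failed (error 99)" message, as it
# appears in the vdsm.log excerpt in this thread.
ERROR_99_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::\S+ "
    r"RPC call (\S+) failed \(error 99\)"
)
POOL_WAIT_MSG = "recovery: waiting for storage pool to go up"

# Two entries copied from the thread, still wrapped the way the mail shows them.
SAMPLE = """\
jsonrpc.Executor/4::INFO::2017-02-02
09:13:42,088::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call
Host.getAllVmStats failed (error 99) in 0.00 seconds
jsonrpc.Executor/5::INFO::2017-02-02
09:13:42,115::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call
Host.getHardwareInfo failed (error 99) in 0.00 seconds
clientIFinit::INFO::2017-02-02
09:13:46,259::clientIF::558::vds::(_waitForStoragePool) recovery: waiting
for storage pool to go up
"""


def summarize_recovery(log_text):
    """Summarize error-99 rejections found in a chunk of vdsm.log text.

    Hypothetical helper: returns the per-method rejection count, the
    timestamp of the last rejection (or None), and how often VDSM logged
    that it is waiting for the storage pool.
    """
    # Collapse all whitespace so entries wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    matches = ERROR_99_RE.findall(flat)
    return {
        "rejected": Counter(method for _, method in matches),
        "last_rejection": matches[-1][0] if matches else None,
        "pool_waits": flat.count(POOL_WAIT_MSG),
    }


if __name__ == "__main__":
    print(summarize_recovery(SAMPLE))
```

If the last rejection timestamp stops advancing between runs, recovery has
most likely finished. If the storage pool wait count keeps growing, VDSM is
still waiting for the storage pool to come up.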
Best regards
--
Martin Sivak
oVirt / SLA
On Thu, Feb 2, 2017 at 4:18 PM, Bryan Sockel <Bryan.Sockel(a)altn.com> wrote:
Hi,
I came into the office this morning to an issue with my oVirt setup. On one
of my hosts the / partition was completely full, causing the host to go into
an unknown state. I was able to clear out some space for the time being and
am attempting to recover that host. The VMs on the host are still running
and responding.
I am using Gluster volumes in my configuration, and had to restart the gluster
service on that host. I also restarted the ovirt-ha-agent service.
I am seeing this entry in my agent.log every two seconds:
MainThread::INFO::2017-02-02
09:11:19,606::util::214::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(connect_vdsm_json_rpc)
Waiting for VDSM hardware info
In my vdsm.log I am seeing this:
jsonrpc.Executor/4::INFO::2017-02-02
09:13:42,088::__init__::525::jsonrpc.JsonRpcServer::(_handle_request) In
recovery, ignoring 'Host.getAllVmStats' in bridge with {}
jsonrpc.Executor/4::INFO::2017-02-02
09:13:42,088::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call
Host.getAllVmStats failed (error 99) in 0.00 seconds
jsonrpc.Executor/5::INFO::2017-02-02
09:13:42,114::__init__::525::jsonrpc.JsonRpcServer::(_handle_request) In
recovery, ignoring 'Host.getHardwareInfo' in bridge with {}
jsonrpc.Executor/5::INFO::2017-02-02
09:13:42,115::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call
Host.getHardwareInfo failed (error 99) in 0.00 seconds
jsonrpc.Executor/6::INFO::2017-02-02
09:13:44,121::__init__::525::jsonrpc.JsonRpcServer::(_handle_request) In
recovery, ignoring 'Host.getHardwareInfo' in bridge with {}
jsonrpc.Executor/6::INFO::2017-02-02
09:13:44,122::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call
Host.getHardwareInfo failed (error 99) in 0.00 seconds
jsonrpc.Executor/7::INFO::2017-02-02
09:13:46,127::__init__::525::jsonrpc.JsonRpcServer::(_handle_request) In
recovery, ignoring 'Host.getHardwareInfo' in bridge with {}
jsonrpc.Executor/7::INFO::2017-02-02
09:13:46,127::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call
Host.getHardwareInfo failed (error 99) in 0.00 seconds
clientIFinit::DEBUG::2017-02-02
09:13:46,257::task::597::Storage.TaskManager.Task::(_updateState)
Task=`ba88b701-9f7a-488e-9c80-3d61cec38053`::moving from state init -> state
preparing
clientIFinit::INFO::2017-02-02
09:13:46,258::logUtils::49::dispatcher::(wrapper) Run and protect:
getConnectedStoragePoolsList(options=None)
clientIFinit::INFO::2017-02-02
09:13:46,258::logUtils::52::dispatcher::(wrapper) Run and protect:
getConnectedStoragePoolsList, Return response: {'poollist': []}
clientIFinit::DEBUG::2017-02-02
09:13:46,258::task::1193::Storage.TaskManager.Task::(prepare)
Task=`ba88b701-9f7a-488e-9c80-3d61cec38053`::finished: {'poollist': []}
clientIFinit::DEBUG::2017-02-02
09:13:46,258::task::597::Storage.TaskManager.Task::(_updateState)
Task=`ba88b701-9f7a-488e-9c80-3d61cec38053`::moving from state preparing ->
state finished
clientIFinit::DEBUG::2017-02-02
09:13:46,258::resourceManager::952::Storage.ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
clientIFinit::DEBUG::2017-02-02
09:13:46,259::resourceManager::989::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
clientIFinit::DEBUG::2017-02-02
09:13:46,259::task::995::Storage.TaskManager.Task::(_decref)
Task=`ba88b701-9f7a-488e-9c80-3d61cec38053`::ref 0 aborting False
clientIFinit::INFO::2017-02-02
09:13:46,259::clientIF::558::vds::(_waitForStoragePool) recovery: waiting
for storage pool to go up
jsonrpc.Executor/0::INFO::2017-02-02
09:13:48,133::__init__::525::jsonrpc.JsonRpcServer::(_handle_request) In
recovery, ignoring 'Host.getHardwareInfo' in bridge with {}
jsonrpc.Executor/0::INFO::2017-02-02
09:13:48,134::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call
Host.getHardwareInfo failed (error 99) in 0.00 seconds
jsonrpc.Executor/1::INFO::2017-02-02
09:13:50,140::__init__::525::jsonrpc.JsonRpcServer::(_handle_request) In
recovery, ignoring 'Host.getHardwareInfo' in bridge with {}
jsonrpc.Executor/1::INFO::2017-02-02
09:13:50,140::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call
Host.getHardwareInfo failed (error 99) in 0.00 seconds
clientIFinit::DEBUG::2017-02-02
09:13:51,265::task::597::Storage.TaskManager.Task::(_updateState)
Task=`e8c558b1-f5d0-49ea-ac92-51660d03636e`::moving from state init -> state
preparing
clientIFinit::INFO::2017-02-02
09:13:51,265::logUtils::49::dispatcher::(wrapper) Run and protect:
getConnectedStoragePoolsList(options=None)
clientIFinit::INFO::2017-02-02
09:13:51,265::logUtils::52::dispatcher::(wrapper) Run and protect:
getConnectedStoragePoolsList, Return response: {'poollist': []}
clientIFinit::DEBUG::2017-02-02
09:13:51,265::task::1193::Storage.TaskManager.Task::(prepare)
Task=`e8c558b1-f5d0-49ea-ac92-51660d03636e`::finished: {'poollist': []}
clientIFinit::DEBUG::2017-02-02
09:13:51,266::task::597::Storage.TaskManager.Task::(_updateState)
Task=`e8c558b1-f5d0-49ea-ac92-51660d03636e`::moving from state preparing ->
state finished
clientIFinit::DEBUG::2017-02-02
09:13:51,266::resourceManager::952::Storage.ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
clientIFinit::DEBUG::2017-02-02
09:13:51,266::resourceManager::989::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
clientIFinit::DEBUG::2017-02-02
09:13:51,266::task::995::Storage.TaskManager.Task::(_decref)
Task=`e8c558b1-f5d0-49ea-ac92-51660d03636e`::ref 0 aborting False
clientIFinit::INFO::2017-02-02
09:13:51,266::clientIF::558::vds::(_waitForStoragePool) recovery: waiting
for storage pool to go up
jsonrpc.Executor/2::INFO::2017-02-02
09:13:52,146::__init__::525::jsonrpc.JsonRpcServer::(_handle_request) In
recovery, ignoring 'Host.getHardwareInfo' in bridge with {}
jsonrpc.Executor/2::INFO::2017-02-02
09:13:52,147::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call
Host.getHardwareInfo failed (error 99) in 0.01 seconds
I attempted to move my host into maintenance mode, but it is unable to
migrate any of the VMs on that host over to another host. Is there a way to
get my host running again without a reboot, or to migrate my VMs via the CLI
so I can restart my host?
Thanks