[Users] hosted engine issues

René Koch rkoch at linuxland.at
Mon Mar 3 11:20:35 UTC 2014


On 03/03/2014 11:47 AM, Sandro Bonazzola wrote:
> On 03/03/2014 11:33, René Koch wrote:
>> Hi,
>>
>> I have some issues with hosted engine (oVirt 3.4 prerelease repo on CentOS 6.5).
>> My setup is the following:
>> 2 hosts (will be 4 in the future) with 4 GlusterFS shares:
>> - engine (for hosted engine)
>> - iso (for ISO domain)
>> - ovirt (oVirt storage domain)
>>
>> I had a split-brain situation today (after rebooting both nodes) on the hosted-engine.lockspace file on the engine GlusterFS volume, which I resolved.
>
> How did you solve it? By switching to NFS only?


I removed the file on host1 (directly on the brick) and ran "gluster 
volume heal engine full", which synced the file from host2 to host1.

>
>
>> The hosted engine uses the engine share via NFS (TCP), as GlusterFS isn't supported for oVirt hosted engine yet. I'll switch to GlusterFS as soon as oVirt
>> supports it (I hope this will be soon, as RHEV 3.3 already supports GlusterFS for hosted engine).
>>
>>
>> First of all ovirt-ha-agent fails to start on both nodes:
>>
>> # service ovirt-ha-agent start
>> Starting ovirt-ha-agent:                                   [  OK  ]
>> # service ovirt-ha-agent status
>> ovirt-ha-agent dead but subsys locked
>>
>>
>> MainThread::INFO::2014-03-03 11:20:39,539::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.1.0 started
>> MainThread::INFO::2014-03-03 11:20:39,590::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found
>> certificate common name: 10.0.200.101
>> MainThread::CRITICAL::2014-03-03 11:20:39,590::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
>> Traceback (most recent call last):
>>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
>>      self._run_agent()
>>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
>>      hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
>>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 152, in __init__
>>      "STOP_VM": self._stop_engine_vm
>>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 56, in __init__
>>      logger, actions)
>>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 93, in __init__
>>      self._logger = FSMLoggerAdapter(logger, self)
>>    File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 16, in __init__
>>      super(FSMLoggerAdapter, self).__init__(logger, None)
>> TypeError: super() argument 1 must be type, not classobj
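
A note on the traceback above: on Python 2.6, logging.LoggerAdapter appears to be an old-style class (it seems to have become a new-style class only in 2.7), which would explain why super() rejects it. A minimal sketch of a workaround, assuming that diagnosis is right, is to call the base-class initializer explicitly. Only the FSMLoggerAdapter name is taken from the traceback; the class body, the process() prefix, and the "fsm-demo" logger name are illustrative assumptions, not the actual ovirt-hosted-engine-ha code.

```python
import logging


class FSMLoggerAdapter(logging.LoggerAdapter):
    """Illustrative adapter; not the real ovirt-hosted-engine-ha class."""

    def __init__(self, logger, fsm):
        # An explicit base-class call avoids super(), so it works whether
        # LoggerAdapter is an old-style class (Python 2.6) or a new-style one.
        logging.LoggerAdapter.__init__(self, logger, None)
        self._fsm = fsm

    def process(self, msg, kwargs):
        # Hypothetical behaviour: prefix each message with the state machine.
        return "(%s) %s" % (self._fsm, msg), kwargs


adapter = FSMLoggerAdapter(logging.getLogger("fsm-demo"), "demo-fsm")
```

The explicit call is the usual way to initialise an old-style base class, and it behaves the same on newer interpreters.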
>>
>>
>>
>> When I try to start my hosted engine, I receive the following error in the vdsm logs, which makes absolutely no sense to me, as there is plenty of disk
>> space available:
>>
>> Thread-62::DEBUG::2014-03-03 11:24:46,282::libvirtconnection::124::root::(wrapper) Unknown libvirterror: ecode: 38 edom: 42 level: 2 message: Failed
>> to acquire lock: No space left on device
>
> Seems like a vdsm failure in starting to monitor the hosted engine storage domain.
> Can you attach vdsm logs?

Logs are quite big for an email (6.8MB).
I attached the last entries which show the information for vm-start.
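
For anyone who needs to cut a large vdsm.log down in the same way, here is a minimal Python sketch that keeps only ERROR and CRITICAL entries. The field layout (thread::LEVEL::timestamp::module::line::logger::message) is inferred from the lines quoted in this thread and is an assumption, not a documented format. The function names are hypothetical.

```python
def parse_vdsm_line(line):
    """Split one 'thread::LEVEL::timestamp::module::lineno::logger::rest' entry.

    Returns a dict, or None for continuation lines such as tracebacks.
    The layout is assumed from the log lines quoted in this thread.
    """
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7 or not parts[4].isdigit():
        return None
    keys = ("thread", "level", "timestamp", "module",
            "lineno", "logger", "message")
    return dict(zip(keys, parts))


def important_entries(lines, levels=("ERROR", "CRITICAL")):
    """Yield parsed entries whose level is in `levels`."""
    for line in lines:
        entry = parse_vdsm_line(line)
        if entry is not None and entry["level"] in levels:
            yield entry


sample = [
    "Thread-62::DEBUG::2014-03-03 11:24:46,282::vm::2252::vm.Vm::"
    "(_startUnderlyingVm) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::"
    "_ongoingCreations released",
    "Thread-62::ERROR::2014-03-03 11:24:46,283::vm::2278::vm.Vm::"
    "(_startUnderlyingVm) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::"
    "The vm start process failed",
    "Traceback (most recent call last):",
]
errors = list(important_entries(sample))
```

To use it on a real file, pass an open file object instead of the sample list, for example important_entries(open("vdsm.log")). Traceback lines are skipped here, so they would need separate handling if they are wanted.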

>
>
>
>> Thread-62::DEBUG::2014-03-03 11:24:46,282::vm::2252::vm.Vm::(_startUnderlyingVm) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
>> Thread-62::ERROR::2014-03-03 11:24:46,283::vm::2278::vm.Vm::(_startUnderlyingVm) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
>> Traceback (most recent call last):
>>    File "/usr/share/vdsm/vm.py", line 2238, in _startUnderlyingVm
>>      self._run()
>>    File "/usr/share/vdsm/vm.py", line 3159, in _run
>>      self._connection.createXML(domxml, flags),
>>    File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper
>>      ret = f(*args, **kwargs)
>>    File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
>>      if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
>> libvirtError: Failed to acquire lock: No space left on device
>> Thread-62::DEBUG::2014-03-03 11:24:46,286::vm::2720::vm.Vm::(setDownStatus) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down: Failed
>> to acquire lock: No space left on device
>>
>> # df -h | grep engine
>> ovirt-host01:/engine     281G   21G  261G   8% /rhev/data-center/mnt/ovirt-host01:_engine
>>
>> # sudo -u vdsm dd if=/dev/zero of=/rhev/data-center/mnt/ovirt-host01:_engine/2851af27-8744-445d-9fb1-a0d083c8dc82/images/0e4d270f-2f7e-4b2b-847f-f114a4ba9bdc/test bs=512 count=100
>> 100+0 records in
>> 100+0 records out
>> 51200 bytes (51 kB) copied, 0.0230566 s, 2.2 MB/s
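
The df and dd output suggests the filesystem itself is not full. One possible reading, which is an assumption and not confirmed in this thread, is that "No space left on device" is just the standard description of errno ENOSPC passed up from the lock manager through libvirt, so it may refer to the lock (for example a lockspace that was never acquired) and not to disk usage. A minimal Python sketch that puts the two side by side follows. The helper name is hypothetical and "." is a placeholder for the storage domain mount point.

```python
import errno
import os
import shutil


def describe_enospc(path):
    """Return (free_bytes, errno_text) to compare real free space with the
    generic ENOSPC message that appears in the libvirt error."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes, os.strerror(errno.ENOSPC)


# "." is a placeholder; point it at the storage domain mount to compare.
free_bytes, text = describe_enospc(".")
print("free bytes: %d, ENOSPC text: %s" % (free_bytes, text))
```

If the free-space figure is large while the error text is the same ENOSPC string, the locking layer is a more likely place to look than the filesystem.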
>>
>>
>> Could you give me some information on how to fix the ovirt-ha-agent and then hosted-engine storage issue? Thanks a lot.
>>
>> Btw, I had some issues during installation which I will explain in separate emails.
>>
>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vdsm.log
Type: text/x-log
Size: 25324 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20140303/f529caad/attachment-0001.bin>

