The agent is required. In fact it contains all the logic.

--
Martin Sivák
msivak(a)redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

----- Original Message -----
> Thanks a lot for the information.
> I patched the agent and now it is running fine:
>
> # service ovirt-ha-agent status
> ovirt-ha-agent (pid 12385) is running...
>
> Btw, after starting the ha-agent the broken lock-file was fixed, too.
> So hosted engine is working fine now.
>
> Martin, Didi, thanks a lot for your help!
>
> Regards,
> René
>
> On 03/03/2014 02:05 PM, Martin Sivak wrote:
>> Hi René,
>>
>>> # python --version
>>> Python 2.6.6
>>
>> Then I guess the traceback is my fault...
>>
>> See http://gerrit.ovirt.org/#/c/25269/ for the fix. I will try to get it
>> into the soonest release possible.
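
For reference, on Python 2.6 `logging.LoggerAdapter` is an old-style class, so `super()` cannot be used in a subclass of it. I have not inspected the Gerrit patch itself, but the usual workaround for this class of error is to call the base class `__init__` directly; a minimal sketch:

```python
import logging

class FSMLoggerAdapter(logging.LoggerAdapter):
    def __init__(self, logger, extra=None):
        # On Python 2.6, logging.LoggerAdapter is an old-style class, so
        # super(FSMLoggerAdapter, self).__init__(...) raises:
        #   TypeError: super() argument 1 must be type, not classobj
        # Calling the base class explicitly works for both old- and
        # new-style classes:
        logging.LoggerAdapter.__init__(self, logger, extra)

adapter = FSMLoggerAdapter(logging.getLogger("ovirt-ha-agent"))
adapter.warning("adapter constructed without super()")
```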
>
>
> Thanks. Do I have to patch the files manually or is ovirt-ha-agent not
> strictly required for hosted engine? Some features like restarting the
> engine on the 2nd node won't work if ovirt-ha-agent isn't working, I guess.
>
>>
>>> I can't see a full filesystem here:
>>>
>>
>> Me neither. Is everything Read-Write? Read-Only FS might report no space
>> left as well in some cases. Other than that, I do not know.
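
One quick way to rule out the read-only and hidden-capacity cases mentioned above is to check mounts, blocks, and inodes together; a generic sketch (no oVirt-specific paths assumed):

```shell
# A read-only filesystem can surface as ENOSPC; look for "ro" mount options.
mount | grep -w ro || echo "no read-only mounts found"

# Block usage per filesystem.
df -h

# Inode usage: a filesystem can be "full" with free blocks
# if it has run out of inodes.
df -i
```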
>
> No, I can write to all disks.
> Btw, the same error message occurs on both nodes...
>
>
> Regards,
> René
>
>
>>
>> Regards
>> --
>> Martin Sivák
>> msivak(a)redhat.com
>> Red Hat Czech
>> RHEV-M SLA / Brno, CZ
>>
>> ----- Original Message -----
>>> On 03/03/2014 12:05 PM, Martin Sivak wrote:
>>>> Hi René,
>>>>
>>>> thanks for the report.
>>>>
>>>>>> TypeError: super() argument 1 must be type, not classobj
>>>> What Python version are you using?
>>>
>>> # python --version
>>> Python 2.6.6
>>>
>>>>
>>>> You can debug a crash of this version of ha-agent using:
>>>>
>>>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon --pdb
>>>
>>> This gives me the same information as in vdsm.log
>>>
>>>>
>>>> But this exception is trying to tell you that
>>>> FSMLoggerAdapter(logging.LoggerAdapter) does not have object in the
>>>> ancestor list. And that is very weird.
>>>>
>>>> It can be related to the disk space issues.
>>>>
>>>>>> libvirtError: Failed to acquire lock: No space left on device
>>>>
>>>> Check the free space on all your devices, including /tmp and /var. Or
>>>> post the output of the "df -h" command here.
>>>
>>> I can't see a full filesystem here:
>>>
>>> # df -h
>>> Filesystem Size Used Avail Use% Mounted on
>>> /dev/mapper/vg0-lv_root 5.0G 1.1G 3.6G 24% /
>>> tmpfs 16G 0 16G 0% /dev/shm
>>> /dev/sda1 243M 45M 185M 20% /boot
>>> /dev/mapper/vg0-lv_data 281G 21G 261G 8% /data
>>> /dev/mapper/vg0-lv_tmp 2.0G 69M 1.9G 4% /tmp
>>> /dev/mapper/vg0-lv_var 5.0G 384M 4.3G 9% /var
>>> ovirt-host01:/engine 281G 21G 261G 8%
>>> /rhev/data-center/mnt/ovirt-host01:_engine
>>>
>>>
>>> Thanks,
>>> René
>>>
>>>
>>>>
>>>> Regards
>>>>
>>>> --
>>>> Martin Sivák
>>>> msivak(a)redhat.com
>>>> Red Hat Czech
>>>> RHEV-M SLA / Brno, CZ
>>>>
>>>> ----- Original Message -----
>>>>> On 03/03/2014 11:33, René Koch wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have some issues with hosted engine (oVirt 3.4 prerelease repo on
>>>>>> CentOS 6.5).
>>>>>> My setup is the following:
>>>>>> 2 hosts (will be 4 in the future) with 4 GlusterFS shares:
>>>>>> - engine (for hosted engine)
>>>>>> - iso (for ISO domain)
>>>>>> - ovirt (oVirt storage domain)
>>>>>>
>>>>>> I had a split-brain situation today (after rebooting both nodes) on the
>>>>>> hosted-engine.lockspace file on the engine GlusterFS volume, which I
>>>>>> resolved.
>>>>>
>>>>> How did you solve it? By switching to NFS only?
>>>>>
>>>>>
>>>>>> Hosted engine uses the engine share via NFS (TCP), as GlusterFS isn't
>>>>>> supported for oVirt hosted engine yet. I'll switch to GlusterFS as soon
>>>>>> as oVirt supports it (I hope this will be soon, as RHEV 3.3 already
>>>>>> supports GlusterFS for hosted engine).
>>>>>>
>>>>>>
>>>>>> First of all ovirt-ha-agent fails to start on both nodes:
>>>>>>
>>>>>> # service ovirt-ha-agent start
>>>>>> Starting ovirt-ha-agent: [  OK  ]
>>>>>> # service ovirt-ha-agent status
>>>>>> ovirt-ha-agent dead but subsys locked
>>>>>>
>>>>>>
>>>>>> MainThread::INFO::2014-03-03
>>>>>> 11:20:39,539::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
>>>>>> ovirt-hosted-engine-ha agent 1.1.0 started
>>>>>> MainThread::INFO::2014-03-03
>>>>>> 11:20:39,590::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
>>>>>> Found certificate common name: 10.0.200.101
>>>>>> MainThread::CRITICAL::2014-03-03
>>>>>> 11:20:39,590::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
>>>>>> Could not start ha-agent
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
>>>>>>     self._run_agent()
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
>>>>>>     hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 152, in __init__
>>>>>>     "STOP_VM": self._stop_engine_vm
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 56, in __init__
>>>>>>     logger, actions)
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 93, in __init__
>>>>>>     self._logger = FSMLoggerAdapter(logger, self)
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 16, in __init__
>>>>>>     super(FSMLoggerAdapter, self).__init__(logger, None)
>>>>>> TypeError: super() argument 1 must be type, not classobj
>>>>>>
>>>>>>
>>>>>>
>>>>>> If I want to start my hosted engine, I receive the following error in
>>>>>> the vdsm logs, which makes absolutely no sense to me, as there is
>>>>>> plenty of disk space available:
>>>>>>
>>>>>> Thread-62::DEBUG::2014-03-03
>>>>>> 11:24:46,282::libvirtconnection::124::root::(wrapper) Unknown
>>>>>> libvirterror: ecode: 38 edom: 42 level: 2 message: Failed
>>>>>> to acquire lock: No space left on device
>>>>>
>>>>> Seems like a vdsm failure in starting to monitor the hosted engine
>>>>> storage domain.
>>>>> Can you attach vdsm logs?
>>>>>
>>>>>
>>>>>
>>>>>> Thread-62::DEBUG::2014-03-03
>>>>>> 11:24:46,282::vm::2252::vm.Vm::(_startUnderlyingVm)
>>>>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
>>>>>> Thread-62::ERROR::2014-03-03
>>>>>> 11:24:46,283::vm::2278::vm.Vm::(_startUnderlyingVm)
>>>>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/share/vdsm/vm.py", line 2238, in _startUnderlyingVm
>>>>>>     self._run()
>>>>>>   File "/usr/share/vdsm/vm.py", line 3159, in _run
>>>>>>     self._connection.createXML(domxml, flags),
>>>>>>   File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper
>>>>>>     ret = f(*args, **kwargs)
>>>>>>   File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
>>>>>>     if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
>>>>>> libvirtError: Failed to acquire lock: No space left on device
>>>>>> Thread-62::DEBUG::2014-03-03
>>>>>> 11:24:46,286::vm::2720::vm.Vm::(setDownStatus)
>>>>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down:
>>>>>> Failed to acquire lock: No space left on device
>>>>>>
>>>>>> # df -h | grep engine
>>>>>> ovirt-host01:/engine 281G 21G 261G 8%
>>>>>> /rhev/data-center/mnt/ovirt-host01:_engine
>>>>>>
>>>>>> # sudo -u vdsm dd if=/dev/zero \
>>>>>>     of=/rhev/data-center/mnt/ovirt-host01:_engine/2851af27-8744-445d-9fb1-a0d083c8dc82/images/0e4d270f-2f7e-4b2b-847f-f114a4ba9bdc/test \
>>>>>>     bs=512 count=100
>>>>>> 100+0 records in
>>>>>> 100+0 records out
>>>>>> 51200 bytes (51 kB) copied, 0.0230566 s, 2.2 MB/s
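
Worth noting: the dd test shows the filesystem itself accepts writes, so the "No space left on device" most likely comes from sanlock failing to acquire the VM lease rather than from actual disk space (the earlier lockspace split-brain points the same way). A sketch of what could be checked, assuming the default hosted-engine lockspace path under the engine storage domain:

```shell
# Assumed path; adjust the mount point and UUID to your setup.
LOCKSPACE="/rhev/data-center/mnt/ovirt-host01:_engine/2851af27-8744-445d-9fb1-a0d083c8dc82/ha_agent/hosted-engine.lockspace"

# sanlock's view of lockspaces and resources (if sanlock is installed):
if command -v sanlock >/dev/null 2>&1; then
    sanlock client status
else
    echo "sanlock not installed on this host"
fi

# A truncated or zero-length lockspace file can also yield
# "Failed to acquire lock: No space left on device"; check its size:
ls -l "$LOCKSPACE" 2>/dev/null || echo "lockspace not found at assumed path"
```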
>>>>>>
>>>>>>
>>>>>> Could you give me some information on how to fix the ovirt-ha-agent and
>>>>>> the hosted-engine storage issue? Thanks a lot.
>>>>>>
>>>>>> Btw, I had some issues during installation which I will explain in
>>>>>> separate emails.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sandro Bonazzola
>>>>> Better technology. Faster innovation. Powered by community
>>>>> collaboration.
>>>>> See how it works at redhat.com
>>>>>
>>>
>