[Users] hosted engine issues

René Koch rkoch at linuxland.at
Mon Mar 3 17:10:38 UTC 2014


On 03/03/2014 05:27 PM, Martin Sivak wrote:
> The agent is required. In fact it contains all the logic.

Thanks a lot for the information.
I patched the agent and now it is running fine:

# service ovirt-ha-agent status
ovirt-ha-agent (pid 12385) is running...

Btw, after starting the ha-agent the broken lock-file was fixed, too.
So hosted engine is working fine now.
Martin, Didi, thanks a lot for your help!
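
For anyone hitting the same traceback: a minimal sketch of one way to avoid it, assuming the cause is that logging.LoggerAdapter is an old-style class on Python 2.6, so super() cannot be used on its subclasses. This is not the change from the gerrit link quoted below. The class body, the "fsm" attribute and the process() formatting are hypothetical stand-ins for illustration only.

```python
import logging


class FSMLoggerAdapter(logging.LoggerAdapter):
    """Illustrative stand-in, not the shipped ovirt-hosted-engine-ha class."""

    def __init__(self, logger, fsm):
        # Call the base initializer explicitly instead of going through
        # super(). This works whether LoggerAdapter is an old-style class
        # (assumed to be the case on Python 2.6) or a new-style one.
        logging.LoggerAdapter.__init__(self, logger, None)
        self.fsm = fsm

    def process(self, msg, kwargs):
        # Hypothetical formatting: prefix each message with the FSM state.
        state = getattr(self.fsm, "state", "unknown")
        return "(%s) %s" % (state, msg), kwargs
```

Calling the parent __init__ by name trades the flexibility of super() for compatibility with old-style base classes, which seems acceptable for a single-inheritance adapter like this one.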


Regards,
René


>
> --
> Martin Sivák
> msivak at redhat.com
> Red Hat Czech
> RHEV-M SLA / Brno, CZ
>
> ----- Original Message -----
>> On 03/03/2014 02:05 PM, Martin Sivak wrote:
>>> Hi René,
>>>
>>>> # python --version
>>>> Python 2.6.6
>>>
>>> Then I guess the traceback is my fault...
>>>
>>> See http://gerrit.ovirt.org/#/c/25269/ for the fix. I will try to get it
>>> into the soonest release possible.
>>
>>
>> Thanks. Do I have to patch the files manually or is ovirt-ha-agent not
>> strictly required for hosted engine? Some features like restarting
>> engine on 2nd node won't work if ovirt-ha-agent isn't working, I guess.
>>
>>>
>>>> I can't see a full filesystem here:
>>>>
>>>
>>> Me neither. Is everything Read-Write? Read-Only FS might report no space
>>> left as well in some cases. Other than that, I do not know.
>>
>> No, I can write to all disks.
>> Btw, the same error message occurs on both nodes...
>>
>>
>> Regards,
>> René
>>
>>
>>>
>>> Regards
>>> --
>>> Martin Sivák
>>> msivak at redhat.com
>>> Red Hat Czech
>>> RHEV-M SLA / Brno, CZ
>>>
>>> ----- Original Message -----
>>>> On 03/03/2014 12:05 PM, Martin Sivak wrote:
>>>>> Hi René,
>>>>>
>>>>> thanks for the report.
>>>>>
>>>>>>> TypeError: super() argument 1 must be type, not classobj
>>>>> What Python version are you using?
>>>>
>>>> # python --version
>>>> Python 2.6.6
>>>>
>>>>>
>>>>> You can debug a crash of this version of ha-agent using:
>>>>>
>>>>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon --pdb
>>>>
>>>> This gives me the same information as in vdsm.log
>>>>
>>>>>
>>>>> But this exception is trying to tell you that
>>>>> FSMLoggerAdapter(logging.LoggerAdapter) does not have object in the
>>>>> ancestor list. And that is very weird.
>>>>>
>>>>> It can be related to the disk space issues.
>>>>>
>>>>>>> libvirtError: Failed to acquire lock: No space left on device
>>>>>
>>>>> Check the free space on all your devices, including /tmp and /var, or
>>>>> post the output of the "df -h" command here.
>>>>
>>>> I can't see a full filesystem here:
>>>>
>>>> # df -h
>>>> Filesystem               Size  Used Avail Use% Mounted on
>>>> /dev/mapper/vg0-lv_root  5.0G  1.1G  3.6G  24% /
>>>> tmpfs                     16G     0   16G   0% /dev/shm
>>>> /dev/sda1                243M   45M  185M  20% /boot
>>>> /dev/mapper/vg0-lv_data  281G   21G  261G   8% /data
>>>> /dev/mapper/vg0-lv_tmp   2.0G   69M  1.9G   4% /tmp
>>>> /dev/mapper/vg0-lv_var   5.0G  384M  4.3G   9% /var
>>>> ovirt-host01:/engine     281G   21G  261G   8%
>>>> /rhev/data-center/mnt/ovirt-host01:_engine
>>>>
>>>>
>>>> Thanks,
>>>> René
>>>>
>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>> --
>>>>> Martin Sivák
>>>>> msivak at redhat.com
>>>>> Red Hat Czech
>>>>> RHEV-M SLA / Brno, CZ
>>>>>
>>>>> ----- Original Message -----
>>>>>> On 03/03/2014 11:33, René Koch wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have some issues with hosted engine (oVirt 3.4 prerelease repo on
>>>>>>> CentOS
>>>>>>> 6.5).
>>>>>>> My setup is the following:
>>>>>>> 2 hosts (will be 4 in the future) with 4 GlusterFS shares:
>>>>>>> - engine (for hosted engine)
>>>>>>> - iso (for ISO domain)
>>>>>>> - ovirt (oVirt storage domain)
>>>>>>>
>>>>>>> I had a split-brain situation today (after rebooting both nodes) on
>>>>>>> hosted-engine.lockspace file on engine GlusterFS volume which I
>>>>>>> resolved.
>>>>>>
>>>>>> How did you solve it? By switching to NFS only?
>>>>>>
>>>>>>
>>>>>>> Hosted engine uses the engine share via NFS (TCP), as GlusterFS isn't
>>>>>>> supported for oVirt hosted engine yet. I'll switch to GlusterFS as soon
>>>>>>> as oVirt supports it (I hope this will be soon, as RHEV 3.3 already
>>>>>>> supports GlusterFS for hosted engine).
>>>>>>>
>>>>>>>
>>>>>>> First of all ovirt-ha-agent fails to start on both nodes:
>>>>>>>
>>>>>>> # service ovirt-ha-agent start
>>>>>>> Starting ovirt-ha-agent:                                   [  OK  ]
>>>>>>> # service ovirt-ha-agent status
>>>>>>> ovirt-ha-agent dead but subsys locked
>>>>>>>
>>>>>>>
>>>>>>> MainThread::INFO::2014-03-03
>>>>>>> 11:20:39,539::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
>>>>>>> ovirt-hosted-engine-ha agent 1.1.0 started
>>>>>>> MainThread::INFO::2014-03-03
>>>>>>> 11:20:39,590::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
>>>>>>> Found
>>>>>>> certificate common name: 10.0.200.101
>>>>>>> MainThread::CRITICAL::2014-03-03
>>>>>>> 11:20:39,590::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
>>>>>>> Could not start ha-agent
>>>>>>> Traceback (most recent call last):
>>>>>>>      File
>>>>>>>      "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>>>>>>      line 97, in run
>>>>>>>        self._run_agent()
>>>>>>>      File
>>>>>>>      "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>>>>>>      line 154, in _run_agent
>>>>>>>        hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
>>>>>>>      File
>>>>>>>      "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>>>>>      line 152, in __init__
>>>>>>>        "STOP_VM": self._stop_engine_vm
>>>>>>>      File
>>>>>>>      "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py",
>>>>>>>      line 56, in __init__
>>>>>>>        logger, actions)
>>>>>>>      File
>>>>>>>      "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py",
>>>>>>>      line 93, in __init__
>>>>>>>        self._logger = FSMLoggerAdapter(logger, self)
>>>>>>>      File
>>>>>>>      "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py",
>>>>>>>      line 16, in __init__
>>>>>>>        super(FSMLoggerAdapter, self).__init__(logger, None)
>>>>>>> TypeError: super() argument 1 must be type, not classobj
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> When I try to start my hosted engine, I get the following error in the
>>>>>>> vdsm logs, which makes absolutely no sense to me, as there is plenty of
>>>>>>> disk space available:
>>>>>>>
>>>>>>> Thread-62::DEBUG::2014-03-03
>>>>>>> 11:24:46,282::libvirtconnection::124::root::(wrapper) Unknown
>>>>>>> libvirterror: ecode: 38 edom: 42 level: 2 message: Failed
>>>>>>> to acquire lock: No space left on device
>>>>>>
>>>>>> Seems like a vdsm failure when starting to monitor the hosted engine
>>>>>> storage domain.
>>>>>> Can you attach the vdsm logs?
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thread-62::DEBUG::2014-03-03
>>>>>>> 11:24:46,282::vm::2252::vm.Vm::(_startUnderlyingVm)
>>>>>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
>>>>>>> Thread-62::ERROR::2014-03-03
>>>>>>> 11:24:46,283::vm::2278::vm.Vm::(_startUnderlyingVm)
>>>>>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process
>>>>>>> failed
>>>>>>> Traceback (most recent call last):
>>>>>>>      File "/usr/share/vdsm/vm.py", line 2238, in _startUnderlyingVm
>>>>>>>        self._run()
>>>>>>>      File "/usr/share/vdsm/vm.py", line 3159, in _run
>>>>>>>        self._connection.createXML(domxml, flags),
>>>>>>>      File
>>>>>>>      "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py",
>>>>>>>      line
>>>>>>>      92, in wrapper
>>>>>>>        ret = f(*args, **kwargs)
>>>>>>>      File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in
>>>>>>>      createXML
>>>>>>>        if ret is None:raise libvirtError('virDomainCreateXML() failed',
>>>>>>>        conn=self)
>>>>>>> libvirtError: Failed to acquire lock: No space left on device
>>>>>>> Thread-62::DEBUG::2014-03-03
>>>>>>> 11:24:46,286::vm::2720::vm.Vm::(setDownStatus)
>>>>>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down:
>>>>>>> Failed
>>>>>>> to acquire lock: No space left on device
>>>>>>>
>>>>>>> # df -h | grep engine
>>>>>>> ovirt-host01:/engine     281G   21G  261G   8%
>>>>>>> /rhev/data-center/mnt/ovirt-host01:_engine
>>>>>>>
>>>>>>> # sudo -u vdsm dd if=/dev/zero
>>>>>>> of=/rhev/data-center/mnt/ovirt-host01:_engine/2851af27-8744-445d-9fb1-a0d083c8dc82/images/0e4d270f-2f7e-4b2b-847f-f114a4ba9bdc/test
>>>>>>> bs=512 count=100
>>>>>>> 100+0 records in
>>>>>>> 100+0 records out
>>>>>>> 51200 bytes (51 kB) copied, 0.0230566 s, 2.2 MB/s
>>>>>>>
>>>>>>>
>>>>>>> Could you give me some information on how to fix the ovirt-ha-agent and
>>>>>>> then hosted-engine storage issue? Thanks a lot.
>>>>>>>
>>>>>>> Btw, I had some issues during installation which I will explain in
>>>>>>> separate emails.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sandro Bonazzola
>>>>>> Better technology. Faster innovation. Powered by community
>>>>>> collaboration.
>>>>>> See how it works at redhat.com
>>>>>>
>>>>
>>
