[Users] hosted engine issues

Martin Sivak msivak at redhat.com
Mon Mar 3 16:27:47 UTC 2014


The agent is required. In fact it contains all the logic.
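
Regarding the TypeError in the traceback quoted below: as far as I know,
logging.LoggerAdapter is an old-style class on Python 2.6 and only derives
from object from 2.7 on, which would explain why super() rejects it. The
following is a minimal sketch of one possible workaround, not necessarily
what the gerrit patch does. It calls the base initializer directly, which
works for both class styles. DummyFSM and its "state" attribute are
hypothetical names used only for illustration.

```python
import logging


class FSMLoggerAdapter(logging.LoggerAdapter):
    """Illustrative adapter that prefixes log messages with the FSM state."""

    def __init__(self, logger, fsm):
        # Call the base initializer directly instead of going through
        # super(). On Python 2.6 logging.LoggerAdapter is an old-style
        # class, so super(FSMLoggerAdapter, self) raises TypeError there.
        logging.LoggerAdapter.__init__(self, logger, None)
        self.fsm = fsm

    def process(self, msg, kwargs):
        # "state" is a hypothetical attribute assumed for this sketch.
        return "(%s) %s" % (self.fsm.state, msg), kwargs


class DummyFSM(object):
    """Hypothetical stand-in for the real state machine."""
    state = "StartState"
```

With this in place, FSMLoggerAdapter(logging.getLogger("fsm"), DummyFSM())
can be constructed without the super() call that fails in the traceback.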

--
Martin Sivák
msivak at redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

----- Original Message -----
> On 03/03/2014 02:05 PM, Martin Sivak wrote:
> > Hi René,
> >
> >> # python --version
> >> Python 2.6.6
> >
> > Then I guess the traceback is my fault...
> >
> > See http://gerrit.ovirt.org/#/c/25269/ for the fix. I will try to get it
> > into the earliest possible release.
> 
> 
> Thanks. Do I have to patch the files manually, or is ovirt-ha-agent not
> strictly required for hosted engine? Some features, like restarting the
> engine on the 2nd node, won't work if ovirt-ha-agent isn't working, I guess.
> 
> >
> >> I can't see a full filesystem here:
> >>
> >
> > Me neither. Is everything Read-Write? Read-Only FS might report no space
> > left as well in some cases. Other than that, I do not know.
> 
> No, I can write to all disks.
> Btw, the same error message occurs on both nodes...
> 
> 
> Regards,
> René
> 
> 
> >
> > Regards
> > --
> > Martin Sivák
> > msivak at redhat.com
> > Red Hat Czech
> > RHEV-M SLA / Brno, CZ
> >
> > ----- Original Message -----
> >> On 03/03/2014 12:05 PM, Martin Sivak wrote:
> >>> Hi René,
> >>>
> >>> thanks for the report.
> >>>
> >>>>> TypeError: super() argument 1 must be type, not classobj
> >>> What Python version are you using?
> >>
> >> # python --version
> >> Python 2.6.6
> >>
> >>>
> >>> You can debug a crash of this version of ha-agent using:
> >>>
> >>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon --pdb
> >>
> >> This gives me the same information as in vdsm.log
> >>
> >>>
> >>> But this exception is trying to tell you that
> >>> FSMLoggerAdapter(logging.LoggerAdapter) does not have object in the
> >>> ancestor list. And that is very weird.
> >>>
> >>> It can be related to the disk space issues.
> >>>
> >>>>> libvirtError: Failed to acquire lock: No space left on device
> >>>
> >>> Check the free space on all your devices, including /tmp and /var, or
> >>> post the output of the "df -h" command here.
> >>
> >> I can't see a full filesystem here:
> >>
> >> # df -h
> >> Filesystem               Size  Used Avail Use% Mounted on
> >> /dev/mapper/vg0-lv_root  5.0G  1.1G  3.6G  24% /
> >> tmpfs                     16G     0   16G   0% /dev/shm
> >> /dev/sda1                243M   45M  185M  20% /boot
> >> /dev/mapper/vg0-lv_data  281G   21G  261G   8% /data
> >> /dev/mapper/vg0-lv_tmp   2.0G   69M  1.9G   4% /tmp
> >> /dev/mapper/vg0-lv_var   5.0G  384M  4.3G   9% /var
> >> ovirt-host01:/engine     281G   21G  261G   8%
> >> /rhev/data-center/mnt/ovirt-host01:_engine
> >>
> >>
> >> Thanks,
> >> René
> >>
> >>
> >>>
> >>> Regards
> >>>
> >>> --
> >>> Martin Sivák
> >>> msivak at redhat.com
> >>> Red Hat Czech
> >>> RHEV-M SLA / Brno, CZ
> >>>
> >>> ----- Original Message -----
> >>>> On 03/03/2014 11:33, René Koch wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I have some issues with hosted engine (oVirt 3.4 prerelease repo on
> >>>>> CentOS 6.5).
> >>>>> My setup is the following:
> >>>>> 2 hosts (will be 4 in the future) with 4 GlusterFS shares:
> >>>>> - engine (for hosted engine)
> >>>>> - iso (for ISO domain)
> >>>>> - ovirt (oVirt storage domain)
> >>>>>
> >>>>> I had a split-brain situation today (after rebooting both nodes) on
> >>>>> the hosted-engine.lockspace file on the engine GlusterFS volume, which
> >>>>> I resolved.
> >>>>
> >>>> How did you solve it? By switching to NFS only?
> >>>>
> >>>>
> >>>>> Hosted engine uses the engine share via NFS (TCP), as GlusterFS isn't
> >>>>> supported for oVirt hosted engine yet. I'll switch to GlusterFS as soon
> >>>>> as oVirt supports it (I hope this will be soon, as RHEV 3.3 already
> >>>>> supports GlusterFS for hosted engine).
> >>>>>
> >>>>>
> >>>>> First of all ovirt-ha-agent fails to start on both nodes:
> >>>>>
> >>>>> # service ovirt-ha-agent start
> >>>>> Starting ovirt-ha-agent:                                   [  OK  ]
> >>>>> # service ovirt-ha-agent status
> >>>>> ovirt-ha-agent dead but subsys locked
> >>>>>
> >>>>>
> >>>>> MainThread::INFO::2014-03-03
> >>>>> 11:20:39,539::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
> >>>>> ovirt-hosted-engine-ha agent 1.1.0 started
> >>>>> MainThread::INFO::2014-03-03
> >>>>> 11:20:39,590::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
> >>>>> Found
> >>>>> certificate common name: 10.0.200.101
> >>>>> MainThread::CRITICAL::2014-03-03
> >>>>> 11:20:39,590::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
> >>>>> Could not start ha-agent
> >>>>> Traceback (most recent call last):
> >>>>>     File
> >>>>>     "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> >>>>>     line 97, in run
> >>>>>       self._run_agent()
> >>>>>     File
> >>>>>     "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> >>>>>     line 154, in _run_agent
> >>>>>       hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
> >>>>>     File
> >>>>>     "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> >>>>>     line 152, in __init__
> >>>>>       "STOP_VM": self._stop_engine_vm
> >>>>>     File
> >>>>>     "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py",
> >>>>>     line 56, in __init__
> >>>>>       logger, actions)
> >>>>>     File
> >>>>>     "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py",
> >>>>>     line 93, in __init__
> >>>>>       self._logger = FSMLoggerAdapter(logger, self)
> >>>>>     File
> >>>>>     "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py",
> >>>>>     line 16, in __init__
> >>>>>       super(FSMLoggerAdapter, self).__init__(logger, None)
> >>>>> TypeError: super() argument 1 must be type, not classobj
> >>>>>
> >>>>>
> >>>>>
> >>>>> If I want to start my hosted engine, I receive the following error in
> >>>>> the vdsm logs, which makes absolutely no sense to me, as there is plenty
> >>>>> of disk space available:
> >>>>>
> >>>>> Thread-62::DEBUG::2014-03-03
> >>>>> 11:24:46,282::libvirtconnection::124::root::(wrapper) Unknown
> >>>>> libvirterror: ecode: 38 edom: 42 level: 2 message: Failed
> >>>>> to acquire lock: No space left on device
> >>>>
> >>>> Seems like a vdsm failure in starting to monitor the hosted engine
> >>>> storage domain.
> >>>> Can you attach vdsm logs?
> >>>>
> >>>>
> >>>>
> >>>>> Thread-62::DEBUG::2014-03-03
> >>>>> 11:24:46,282::vm::2252::vm.Vm::(_startUnderlyingVm)
> >>>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
> >>>>> Thread-62::ERROR::2014-03-03
> >>>>> 11:24:46,283::vm::2278::vm.Vm::(_startUnderlyingVm)
> >>>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process
> >>>>> failed
> >>>>> Traceback (most recent call last):
> >>>>>     File "/usr/share/vdsm/vm.py", line 2238, in _startUnderlyingVm
> >>>>>       self._run()
> >>>>>     File "/usr/share/vdsm/vm.py", line 3159, in _run
> >>>>>       self._connection.createXML(domxml, flags),
> >>>>>     File
> >>>>>     "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py",
> >>>>>     line
> >>>>>     92, in wrapper
> >>>>>       ret = f(*args, **kwargs)
> >>>>>     File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in
> >>>>>     createXML
> >>>>>       if ret is None:raise libvirtError('virDomainCreateXML() failed',
> >>>>>       conn=self)
> >>>>> libvirtError: Failed to acquire lock: No space left on device
> >>>>> Thread-62::DEBUG::2014-03-03
> >>>>> 11:24:46,286::vm::2720::vm.Vm::(setDownStatus)
> >>>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down:
> >>>>> Failed
> >>>>> to acquire lock: No space left on device
> >>>>>
> >>>>> # df -h | grep engine
> >>>>> ovirt-host01:/engine     281G   21G  261G   8%
> >>>>> /rhev/data-center/mnt/ovirt-host01:_engine
> >>>>>
> >>>>> # sudo -u vdsm dd if=/dev/zero
> >>>>> of=/rhev/data-center/mnt/ovirt-host01:_engine/2851af27-8744-445d-9fb1-a0d083c8dc82/images/0e4d270f-2f7e-4b2b-847f-f114a4ba9bdc/test
> >>>>> bs=512 count=100
> >>>>> 100+0 records in
> >>>>> 100+0 records out
> >>>>> 51200 bytes (51 kB) copied, 0.0230566 s, 2.2 MB/s
> >>>>>
> >>>>>
> >>>>> Could you give me some information on how to fix the ovirt-ha-agent and
> >>>>> then hosted-engine storage issue? Thanks a lot.
> >>>>>
> >>>>> Btw, I had some issues during installation which I will explain in
> >>>>> separate
> >>>>> emails.
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Sandro Bonazzola
> >>>> Better technology. Faster innovation. Powered by community
> >>>> collaboration.
> >>>> See how it works at redhat.com
> >>>>
> >>
> 


