Hi,
I'm using this version: ovirt-hosted-engine-ha-1.1.2-1.el6.noarch
For 3 days my hosted-engine HA worked perfectly, but then I tried to snapshot a VM and the HA service went defunct ==> 400% CPU!
agent.log and broker.log say nothing, but in vdsm.log I have errors:
Thread-9462::DEBUG::2014-04-28 07:23:58,994::libvirtconnection::124::root::(wrapper) Unknown libvirterror: ecode: 84 edom: 10 level: 2 message: Operation not supported: live disk snapshot not supported with this QEMU binary
Thread-9462::ERROR::2014-04-28 07:23:58,995::vm::4006::vm.Vm::(snapshot) vmId=`773f6e6d-c670-49f3-ae8c-dfbcfa22d0a5`::Unable to take snapshot
Thread-9352::DEBUG::2014-04-28 08:41:39,922::lvm::295::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'r|.*|\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name cc51143e-8ad7-4b0b-a4d2-9024dffc1188 ff98d346-4515-4349-8437-fb2f5e9eaadf' (cwd None)
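If I read the first error correctly, the snapshot failure itself comes from the qemu build rather than from the HA services. A quick check of which qemu is installed (just a guess at the usual EL6 package names):

# rpm -qa | grep -i qemu-kvm

As far as I know, the plain qemu-kvm shipped with EL6 does not support live disk snapshots; only the qemu-kvm-rhev build does.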
I'll try to reboot the node that runs the hosted engine.
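Before the reboot I will put the host into local maintenance so the HA agent does not fight me (syntax from memory, please correct me if it is wrong):

# hosted-engine --set-maintenance --mode=local

and set it back with --mode=none once the host is up again.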
2014-04-25 13:54 GMT+02:00 Martin Sivak <msivak(a)redhat.com>:
Hi Kevin,
Can you please tell us which version of hosted-engine you are running?
rpm -q ovirt-hosted-engine-ha
Also, do I understand correctly that the engine VM is running, but you see a bad status when you execute the hosted-engine --vm-status command?
If that is so, can you give us the current logs from /var/log/ovirt-hosted-engine-ha?
--
Martin Sivák
msivak(a)redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ
----- Original Message -----
> OK, I mounted the storage domain for the hosted engine manually and the agent came up.
>
> But vm-status shows:
>
> --== Host 2 status ==--
>
> Status up-to-date : False
> Hostname : 192.168.99.103
> Host ID : 2
> Engine status : unknown stale-data
> Score : 0
> Local maintenance : False
> Host timestamp : 1398333438
>
> And in my engine, HA is not active for host02.
>
>
> 2014-04-24 12:48 GMT+02:00 Kevin Tibi <kevintibi(a)hotmail.com>:
>
> > Hi,
> >
> > I tried to reboot my hosts and now [supervdsmServer] is <defunct>.
> >
> > /var/log/vdsm/supervdsm.log
> >
> >
> > MainProcess|Thread-120::DEBUG::2014-04-24 12:22:19,955::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None
> > MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,010::supervdsmServer::96::SuperVdsm.ServerCallback::(wrapper) call validateAccess with ('qemu', ('qemu', 'kvm'), '/rhev/data-center/mnt/host01.ovirt.lan:_home_export', 5) {}
> > MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,014::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None
> > MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,059::supervdsmServer::96::SuperVdsm.ServerCallback::(wrapper) call validateAccess with ('qemu', ('qemu', 'kvm'), '/rhev/data-center/mnt/host01.ovirt.lan:_home_iso', 5) {}
> > MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,063::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None
> >
> > And one host doesn't mount the NFS share used for the hosted engine.
> >
> > MainThread::CRITICAL::2014-04-24 12:36:16,603::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
> > Traceback (most recent call last):
> >   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
> >     self._run_agent()
> >   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
> >     hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
> >   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 299, in start_monitoring
> >     self._initialize_vdsm()
> >   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 418, in _initialize_vdsm
> >     self._sd_path = env_path.get_domain_path(self._config)
> >   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/env/path.py", line 40, in get_domain_path
> >     .format(sd_uuid, parent))
> > Exception: path to storage domain aea040f8-ab9d-435b-9ecf-ddd4272e592f not found in /rhev/data-center/mnt
> >
> >
> >
> > 2014-04-23 17:40 GMT+02:00 Kevin Tibi <kevintibi(a)hotmail.com>:
> >
> > top
> >> 1729 vdsm 20 0 0 0 0 Z 373.8 0.0 252:08.51 ovirt-ha-broker <defunct>
> >>
> >>
> >> [root@host01 ~]# ps axwu | grep 1729
> >> vdsm 1729 0.7 0.0 0 0 ? Zl Apr02 240:24 [ovirt-ha-broker] <defunct>
> >>
> >> [root@host01 ~]# ll /rhev/data-center/mnt/host01.ovirt.lan\:_home_NFS01/aea040f8-ab9d-435b-9ecf-ddd4272e592f/ha_agent/
> >> total 2028
> >> -rw-rw----. 1 vdsm kvm 1048576 23 avril 17:35 hosted-engine.lockspace
> >> -rw-rw----. 1 vdsm kvm 1028096 23 avril 17:35 hosted-engine.metadata
> >>
> >> cat /var/log/vdsm/vdsm.log
> >>
> >> Thread-120518::DEBUG::2014-04-23 17:38:02,299::task::1185::TaskManager.Task::(prepare) Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::finished: {'aea040f8-ab9d-435b-9ecf-ddd4272e592f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000410963', 'lastCheck': '3.4', 'valid': True}, '5ae613a4-44e4-42cb-89fc-7b5d34c1f30f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000412357', 'lastCheck': '6.8', 'valid': True}, 'cc51143e-8ad7-4b0b-a4d2-9024dffc1188': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.000455292', 'lastCheck': '1.2', 'valid': True}, 'ff98d346-4515-4349-8437-fb2f5e9eaadf': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00817113', 'lastCheck': '1.7', 'valid': True}}
> >> Thread-120518::DEBUG::2014-04-23 17:38:02,300::task::595::TaskManager.Task::(_updateState) Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::moving from state preparing -> state finished
> >> Thread-120518::DEBUG::2014-04-23 17:38:02,300::resourceManager::940::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
> >> Thread-120518::DEBUG::2014-04-23 17:38:02,300::resourceManager::977::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
> >> Thread-120518::DEBUG::2014-04-23 17:38:02,300::task::990::TaskManager.Task::(_decref) Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::ref 0 aborting False
> >> Thread-120518::ERROR::2014-04-23 17:38:02,302::brokerlink::72::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(connect) Failed to connect to broker: [Errno 2] No such file or directory
> >> Thread-120518::ERROR::2014-04-23 17:38:02,302::API::1612::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
> >> Traceback (most recent call last):
> >>   File "/usr/share/vdsm/API.py", line 1603, in _getHaInfo
> >>     stats = instance.get_all_stats()
> >>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 83, in get_all_stats
> >>     with broker.connection():
> >>   File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
> >>     return self.gen.next()
> >>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 96, in connection
> >>     self.connect()
> >>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 64, in connect
> >>     self._socket.connect(constants.BROKER_SOCKET_FILE)
> >>   File "<string>", line 1, in connect
> >> error: [Errno 2] No such file or directory
> >> Thread-78::DEBUG::2014-04-23 17:38:05,490::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_DATA/5ae613a4-44e4-42cb-89fc-7b5d34c1f30f/dom_md/metadata bs=4096 count=1' (cwd None)
> >> Thread-78::DEBUG::2014-04-23 17:38:05,523::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n545 bytes (545 B) copied, 0.000412209 s, 1.3 MB/s\n'; <rc> = 0
> >>
> >>
> >>
> >>
> >> 2014-04-23 17:27 GMT+02:00 Martin Sivak <msivak(a)redhat.com>:
> >>
> >> Hi Kevin,
> >>>
> >>> > Same problem.
> >>>
> >>> Are you missing the lockspace file as well while running on top of
> >>> GlusterFS?
> >>>
> >>> > ovirt-ha-broker has 400% CPU and is defunct. I can't kill it with -9.
> >>>
> >>> A defunct process eating four full cores? I wonder how that is possible...
> >>> What are the status flags of that process when you run ps axwu?
> >>>
> >>> Can you attach the log files please?
> >>>
> >>> --
> >>> Martin Sivák
> >>> msivak(a)redhat.com
> >>> Red Hat Czech
> >>> RHEV-M SLA / Brno, CZ
> >>>
> >>> ----- Original Message -----
> >>> > Same problem. ovirt-ha-broker has 400% CPU and is defunct. I can't kill it with -9.
> >>> >
> >>> >
> >>> > 2014-04-23 13:55 GMT+02:00 Martin Sivak <msivak(a)redhat.com>:
> >>> >
> >>> > > Hi,
> >>> > >
> >>> > > > Isn't this file created when hosted engine is started?
> >>> > >
> >>> > > The file is created by the setup script. If it got lost then there was
> >>> > > probably something bad happening in your NFS or Gluster storage.
> >>> > >
> >>> > > > Or how can I create this file manually?
> >>> > >
> >>> > > I can give you an experimental treatment for this. We do not have any
> >>> > > official way, as this is something that should never happen :)
> >>> > >
> >>> > > !! But before you do that make sure you do not have any nodes running
> >>> > > properly. This will destroy and reinitialize the lockspace database for
> >>> > > the whole hosted-engine environment (which you apparently lack, but..). !!
> >>> > >
> >>> > > You have to create the ha_agent/hosted-engine.lockspace file with the
> >>> > > expected size (1MB) and then tell sanlock to initialize it as a lockspace
> >>> > > using:
> >>> > >
> >>> > > # python
> >>> > > >>> import sanlock
> >>> > > >>> sanlock.write_lockspace(lockspace="hosted-engine",
> >>> > > ...     path="/rhev/data-center/mnt/<nfs>/<hosted engine storage domain>/ha_agent/hosted-engine.lockspace",
> >>> > > ...     offset=0)
> >>> > > >>>
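> >>> > > (The empty 1MB file itself can be created beforehand with something like
> >>> > > this -- untested, adjust the path and make sure the ownership matches the
> >>> > > other files, i.e. vdsm:kvm:
> >>> > >
> >>> > > # dd if=/dev/zero of=/rhev/data-center/mnt/<nfs>/<hosted engine storage domain>/ha_agent/hosted-engine.lockspace bs=1M count=1
> >>> > > # chown vdsm:kvm /rhev/data-center/mnt/<nfs>/<hosted engine storage domain>/ha_agent/hosted-engine.lockspace )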
> >>> > >
> >>> > > Then try starting the services (both broker and agent) again.
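> >>> > > (On EL6 that should be something like:
> >>> > >
> >>> > > # service ovirt-ha-broker start
> >>> > > # service ovirt-ha-agent start )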
> >>> > >
> >>> > > --
> >>> > > Martin Sivák
> >>> > > msivak(a)redhat.com
> >>> > > Red Hat Czech
> >>> > > RHEV-M SLA / Brno, CZ
> >>> > >
> >>> > >
> >>> > > ----- Original Message -----
> >>> > > > On 04/23/2014 11:08 AM, Martin Sivak wrote:
> >>> > > > > Hi René,
> >>> > > > >
> >>> > > > >>>> libvirtError: Failed to acquire lock:
No space left on
device
> >>> > > > >
> >>> > > > >>>> 2014-04-22 12:38:17+0200 654 [3093]: r2
cmd_acquire
2,9,5733
> >>> invalid
> >>> > > > >>>> lockspace found -1 failed 0 name
> >>> > > 2851af27-8744-445d-9fb1-a0d083c8dc82
> >>> > > > >
> >>> > > > > Can you please check the contents of
/rhev/data-center/<your
nfs
> >>> > > > > mount>/<nfs domain uuid>/ha_agent/?
> >>> > > > >
> >>> > > > > This is how it should look like:
> >>> > > > >
> >>> > > > > [root@dev-03 ~]# ls -al
> >>> > > > >
> >>> > >
> >>>
/rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
> >>> > > > > total 2036
> >>> > > > > drwxr-x---. 2 vdsm kvm 4096 Mar 19 18:46 .
> >>> > > > > drwxr-xr-x. 6 vdsm kvm 4096 Mar 19 18:46 ..
> >>> > > > > -rw-rw----. 1 vdsm kvm 1048576 Apr 23 11:05
> >>> hosted-engine.lockspace
> >>> > > > > -rw-rw----. 1 vdsm kvm 1028096 Mar 19 18:46
> >>> hosted-engine.metadata
> >>> > > > >
> >>> > > > > The errors seem to indicate that you somehow lost
the
lockspace
> >>> file.
> >>> > > >
> >>> > > > True :)
> >>> > > > Isn't this file created when hosted engine is started? Or how can I
> >>> > > > create this file manually?
> >>> > > >
> >>> > > > >
> >>> > > > > --
> >>> > > > > Martin Sivák
> >>> > > > > msivak(a)redhat.com
> >>> > > > > Red Hat Czech
> >>> > > > > RHEV-M SLA / Brno, CZ
> >>> > > > >
> >>> > > > > ----- Original Message -----
> >>> > > > >> On 04/23/2014 12:28 AM, Doron Fediuck wrote:
> >>> > > > >>> Hi Rene,
> >>> > > > >>> any idea what closed your ovirtmgmt
bridge?
> >>> > > > >>> as long as it is down vdsm may have issues
starting up
properly
> >>> > > > >>> and this is why you see the complaints on
the rpc server.
> >>> > > > >>>
> >>> > > > >>> Can you try manually fixing the network
part first and then
> >>> > > > >>> restart vdsm?
> >>> > > > >>> Once vdsm is happy hosted engine VM will
start.
> >>> > > > >>
> >>> > > > >> Thanks for your feedback, Doron.
> >>> > > > >>
> >>> > > > >> My ovirtmgmt bridge seems to be on or isn't
it:
> >>> > > > >> # brctl show ovirtmgmt
> >>> > > > >> bridge name bridge id STP
enabled
> >>> interfaces
> >>> > > > >> ovirtmgmt 8000.0025907587c2 no
> >>> eth0.200
> >>> > > > >>
> >>> > > > >> # ip a s ovirtmgmt
> >>> > > > >> 7: ovirtmgmt:
<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc
> >>> noqueue
> >>> > > > >> state UNKNOWN
> >>> > > > >> link/ether 00:25:90:75:87:c2 brd
ff:ff:ff:ff:ff:ff
> >>> > > > >> inet 10.0.200.102/24 brd 10.0.200.255
scope global
> >>> ovirtmgmt
> >>> > > > >> inet6 fe80::225:90ff:fe75:87c2/64 scope
link
> >>> > > > >> valid_lft forever preferred_lft
forever
> >>> > > > >>
> >>> > > > >> # ip a s eth0.200
> >>> > > > >> 6: eth0.200@eth0:
<BROADCAST,MULTICAST,UP,LOWER_UP> mtu
1500
> >>> qdisc
> >>> > > > >> noqueue state UP
> >>> > > > >> link/ether 00:25:90:75:87:c2 brd
ff:ff:ff:ff:ff:ff
> >>> > > > >> inet6 fe80::225:90ff:fe75:87c2/64 scope
link
> >>> > > > >> valid_lft forever preferred_lft
forever
> >>> > > > >>
> >>> > > > >> I tried the following yesterday:
> >>> > > > >> Copy virtual disk from GlusterFS storage to
local disk of
host
> >>> and
> >>> > > > >> create a new vm with virt-manager which loads
ovirtmgmt
disk. I
> >>> could
> >>> > > > >> reach my engine over the ovirtmgmt bridge (so
bridge must be
> >>> working).
> >>> > > > >>
> >>> > > > >> I also started libvirtd with Option -v and I
saw the
following
> >>> in
> >>> > > > >> libvirtd.log when trying to start ovirt
engine:
> >>> > > > >> 2014-04-22 14:18:25.432+0000: 8901: debug :
> >>> virCommandRunAsync:2250 :
> >>> > > > >> Command result 0, with PID 11491
> >>> > > > >> 2014-04-22 14:18:25.478+0000: 8901: debug :
virCommandRun:2045 :
> >>> > > Result
> >>> > > > >> exit status 255, stdout: '' stderr:
'iptables v1.4.7: goto
> >>> 'FO-vnet0'
> >>> > > is
> >>> > > > >> not a chain
> >>> > > > >>
> >>> > > > >> So it could be that something is broken in my
hosted-engine
> >>> network.
> >>> > > Do
> >>> > > > >> you have any clue how I can troubleshoot this?
> >>> > > > >>
> >>> > > > >>
> >>> > > > >> Thanks,
> >>> > > > >> René
> >>> > > > >>
> >>> > > > >>
> >>> > > > >>>
> >>> > > > >>> ----- Original Message -----
> >>> > > > >>>> From: "René Koch"
<rkoch(a)linuxland.at>
> >>> > > > >>>> To: "Martin Sivak"
<msivak(a)redhat.com>
> >>> > > > >>>> Cc: users(a)ovirt.org
> >>> > > > >>>> Sent: Tuesday, April 22, 2014 1:46:38
PM
> >>> > > > >>>> Subject: Re: [ovirt-users] hosted
engine health check
issues
> >>> > > > >>>>
> >>> > > > >>>> Hi,
> >>> > > > >>>>
> >>> > > > >>>> I rebooted one of my ovirt hosts today
and the result is
now
> >>> that I
> >>> > > > >>>> can't start hosted-engine anymore.
> >>> > > > >>>>
> >>> > > > >>>> ovirt-ha-agent isn't running
because the lockspace file is
> >>> missing
> >>> > > > >>>> (sanlock complains about it).
> >>> > > > >>>> So I tried to start hosted-engine with
--vm-start and I
get
> >>> the
> >>> > > > >>>> following errors:
> >>> > > > >>>>
> >>> > > > >>>> ==> /var/log/sanlock.log <==
> >>> > > > >>>> 2014-04-22 12:38:17+0200 654 [3093]: r2
cmd_acquire
2,9,5733
> >>> invalid
> >>> > > > >>>> lockspace found -1 failed 0 name
> >>> > > 2851af27-8744-445d-9fb1-a0d083c8dc82
> >>> > > > >>>>
> >>> > > > >>>> ==> /var/log/messages <==
> >>> > > > >>>> Apr 22 12:38:17 ovirt-host02
sanlock[3079]: 2014-04-22
> >>> > > 12:38:17+0200 654
> >>> > > > >>>> [3093]: r2 cmd_acquire 2,9,5733 invalid
lockspace found -1
> >>> failed 0
> >>> > > name
> >>> > > > >>>> 2851af27-8744-445d-9fb1-a0d083c8dc82
> >>> > > > >>>> Apr 22 12:38:17 ovirt-host02 kernel:
ovirtmgmt: port
2(vnet0)
> >>> > > entering
> >>> > > > >>>> disabled state
> >>> > > > >>>> Apr 22 12:38:17 ovirt-host02 kernel:
device vnet0 left
> >>> promiscuous
> >>> > > mode
> >>> > > > >>>> Apr 22 12:38:17 ovirt-host02 kernel:
ovirtmgmt: port
2(vnet0)
> >>> > > entering
> >>> > > > >>>> disabled state
> >>> > > > >>>>
> >>> > > > >>>> ==> /var/log/vdsm/vdsm.log <==
> >>> > > > >>>> Thread-21::DEBUG::2014-04-22
> >>> > > > >>>>
12:38:17,563::libvirtconnection::124::root::(wrapper)
Unknown
> >>> > > > >>>> libvirterror: ecode: 38 edom: 42 level:
2 message: Failed
to
> >>> acquire
> >>> > > > >>>> lock: No space left on device
> >>> > > > >>>> Thread-21::DEBUG::2014-04-22
> >>> > > > >>>>
12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
> >>> > > > >>>>
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations
> >>> > > released
> >>> > > > >>>> Thread-21::ERROR::2014-04-22
> >>> > > > >>>>
12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
> >>> > > > >>>>
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start
> >>> process
> >>> > > failed
> >>> > > > >>>> Traceback (most recent call last):
> >>> > > > >>>> File
"/usr/share/vdsm/vm.py", line 2249, in
> >>> _startUnderlyingVm
> >>> > > > >>>> self._run()
> >>> > > > >>>> File
"/usr/share/vdsm/vm.py", line 3170, in _run
> >>> > > > >>>>
self._connection.createXML(domxml, flags),
> >>> > > > >>>> File
> >>> > > > >>>>
> >>>
"/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py",
> >>> > > > >>>> line 92, in wrapper
> >>> > > > >>>> ret = f(*args, **kwargs)
> >>> > > > >>>> File
"/usr/lib64/python2.6/site-packages/libvirt.py",
> >>> line
> >>> > > 2665, in
> >>> > > > >>>> createXML
> >>> > > > >>>> if ret is None:raise
libvirtError('virDomainCreateXML()
> >>> > > failed',
> >>> > > > >>>> conn=self)
> >>> > > > >>>> libvirtError: Failed to acquire lock:
No space left on
device
> >>> > > > >>>>
> >>> > > > >>>> ==> /var/log/messages <==
> >>> > > > >>>> Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm
ERROR
> >>> > > > >>>>
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start
> >>> process
> >>> > > > >>>> failed#012Traceback (most recent call
last):#012 File
> >>> > > > >>>> "/usr/share/vdsm/vm.py", line
2249, in
_startUnderlyingVm#012
> >>> > > > >>>> self._run()#012 File
"/usr/share/vdsm/vm.py", line 3170,
in
> >>> > > _run#012
> >>> > > > >>>> self._connection.createXML(domxml,
flags),#012 File
> >>> > > > >>>>
> >>>
"/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py",
> >>> > > line 92,
> >>> > > > >>>> in wrapper#012 ret = f(*args,
**kwargs)#012 File
> >>> > > > >>>>
"/usr/lib64/python2.6/site-packages/libvirt.py", line
2665, in
> >>> > > > >>>> createXML#012 if ret is None:raise
> >>> > > libvirtError('virDomainCreateXML()
> >>> > > > >>>> failed',
conn=self)#012libvirtError: Failed to acquire
lock:
> >>> No
> >>> > > space
> >>> > > > >>>> left on device
> >>> > > > >>>>
> >>> > > > >>>> ==> /var/log/vdsm/vdsm.log <==
> >>> > > > >>>> Thread-21::DEBUG::2014-04-22
> >>> > > > >>>>
12:38:17,569::vm::2731::vm.Vm::(setDownStatus)
> >>> > > > >>>>
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed
state to
> >>> Down:
> >>> > > > >>>> Failed to acquire lock: No space left
on device
> >>> > > > >>>>
> >>> > > > >>>>
> >>> > > > >>>> No space left on device is nonsense as
there is enough
space
> >>> (I had
> >>> > > this
> >>> > > > >>>> issue last time as well where I had to
patch machine.py,
but
> >>> this
> >>> > > file
> >>> > > > >>>> is now Python 2.6.6 compatible.
> >>> > > > >>>>
> >>> > > > >>>> Any idea what prevents hosted-engine
from starting?
> >>> > > > >>>> ovirt-ha-broker, vdsmd and sanlock are
running btw.
> >>> > > > >>>>
> >>> > > > >>>> Btw, I can see in log that json rpc
server module is
missing
> >>> - which
> >>> > > > >>>> package is required for CentOS 6.5?
> >>> > > > >>>> Apr 22 12:37:14 ovirt-host02 vdsm vds
WARNING Unable to
load
> >>> the
> >>> > > json
> >>> > > > >>>> rpc server module. Please make sure it
is installed.
> >>> > > > >>>>
> >>> > > > >>>>
> >>> > > > >>>> Thanks,
> >>> > > > >>>> René
> >>> > > > >>>>
> >>> > > > >>>>
> >>> > > > >>>>
> >>> > > > >>>> On 04/17/2014 10:02 AM, Martin Sivak
wrote:
> >>> > > > >>>>> Hi,
> >>> > > > >>>>>
> >>> > > > >>>>>>>> How can I disable
notifications?
> >>> > > > >>>>>
> >>> > > > >>>>> The notification is configured in
> >>> > > > >>>>>
/etc/ovirt-hosted-engine-ha/broker.conf
> >>> > > > >>>>> section notification.
> >>> > > > >>>>> The email is sent when the key
state_transition exists
and
> >>> the
> >>> > > string
> >>> > > > >>>>> OldState-NewState contains the
(case insensitive) regexp
> >>> from the
> >>> > > > >>>>> value.
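> >>> > > > >>>>> So, to silence the mails, you can either remove the state_transition
> >>> > > > >>>>> key or give it a regexp that will never match any state name, roughly:
> >>> > > > >>>>>
> >>> > > > >>>>> [notification]
> >>> > > > >>>>> state_transition=ThisNeverMatches
> >>> > > > >>>>>
> >>> > > > >>>>> (section/key names as described above -- please double-check them
> >>> > > > >>>>> against your own broker.conf).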
> >>> > > > >>>>>
> >>> > > > >>>>>>>> Is it intended to send
out these messages and detect
that
> >>> ovirt
> >>> > > > >>>>>>>> engine
> >>> > > > >>>>>>>> is down (which is false
anyway), but not to restart
the
> >>> vm?
> >>> > > > >>>>>
> >>> > > > >>>>> Forget about emails for now and
check the
> >>> > > > >>>>>
/var/log/ovirt-hosted-engine-ha/agent.log and broker.log
(and
> >>> > > attach
> >>> > > > >>>>> them
> >>> > > > >>>>> as well btw).
> >>> > > > >>>>>
> >>> > > > >>>>>>>> oVirt hosts think that
hosted engine is down because
it
> >>> seems
> >>> > > that
> >>> > > > >>>>>>>> hosts
> >>> > > > >>>>>>>> can't write to
hosted-engine.lockspace due to
glusterfs
> >>> issues
> >>> > > (or
> >>> > > > >>>>>>>> at
> >>> > > > >>>>>>>> least I think so).
> >>> > > > >>>>>
> >>> > > > >>>>> The hosts think so or can't
really write there? The
> >>> lockspace is
> >>> > > > >>>>> managed
> >>> > > > >>>>> by
> >>> > > > >>>>> sanlock and our HA daemons do not
touch it at all. We
only
> >>> ask
> >>> > > sanlock
> >>> > > > >>>>> to
> >>> > > > >>>>> get make sure we have unique server
id.
> >>> > > > >>>>>
> >>> > > > >>>>>>>> Is is possible or
planned to make the whole ha feature
> >>> optional?
> >>> > > > >>>>>
> >>> > > > >>>>> Well the system won't perform
any automatic actions if
you
> >>> put the
> >>> > > > >>>>> hosted
> >>> > > > >>>>> engine to global maintenance and
only start/stop/migrate
the
> >>> VM
> >>> > > > >>>>> manually.
> >>> > > > >>>>> I would discourage you from
stopping agent/broker,
because
> >>> the
> >>> > > engine
> >>> > > > >>>>> itself has some logic based on the
reporting.
> >>> > > > >>>>>
> >>> > > > >>>>> Regards
> >>> > > > >>>>>
> >>> > > > >>>>> --
> >>> > > > >>>>> Martin Sivák
> >>> > > > >>>>> msivak(a)redhat.com
> >>> > > > >>>>> Red Hat Czech
> >>> > > > >>>>> RHEV-M SLA / Brno, CZ
> >>> > > > >>>>>
> >>> > > > >>>>> ----- Original Message -----
> >>> > > > >>>>>> On 04/15/2014 04:53 PM, Jiri
Moskovcak wrote:
> >>> > > > >>>>>>> On 04/14/2014 10:50 AM,
René Koch wrote:
> >>> > > > >>>>>>>> Hi,
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> I have some issues with
hosted engine status.
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> oVirt hosts think that
hosted engine is down because
it
> >>> seems
> >>> > > that
> >>> > > > >>>>>>>> hosts
> >>> > > > >>>>>>>> can't write to
hosted-engine.lockspace due to
glusterfs
> >>> issues
> >>> > > (or
> >>> > > > >>>>>>>> at
> >>> > > > >>>>>>>> least I think so).
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> Here's the output
of vm-status:
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> # hosted-engine
--vm-status
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> --== Host 1 status
==--
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> Status up-to-date
: False
> >>> > > > >>>>>>>> Hostname
: 10.0.200.102
> >>> > > > >>>>>>>> Host ID
: 1
> >>> > > > >>>>>>>> Engine status
: unknown
stale-data
> >>> > > > >>>>>>>> Score
: 2400
> >>> > > > >>>>>>>> Local maintenance
: False
> >>> > > > >>>>>>>> Host timestamp
: 1397035677
> >>> > > > >>>>>>>> Extra metadata (valid
at timestamp):
> >>> > > > >>>>>>>>
metadata_parse_version=1
> >>> > > > >>>>>>>>
metadata_feature_version=1
> >>> > > > >>>>>>>>
timestamp=1397035677 (Wed Apr 9 11:27:57
2014)
> >>> > > > >>>>>>>> host-id=1
> >>> > > > >>>>>>>> score=2400
> >>> > > > >>>>>>>>
maintenance=False
> >>> > > > >>>>>>>> state=EngineUp
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> --== Host 2 status
==--
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> Status up-to-date
: True
> >>> > > > >>>>>>>> Hostname
: 10.0.200.101
> >>> > > > >>>>>>>> Host ID
: 2
> >>> > > > >>>>>>>> Engine status
: {'reason': 'vm
not
> >>> running
> >>> > > on
> >>> > > > >>>>>>>> this
> >>> > > > >>>>>>>> host',
'health': 'bad', 'vm': 'down', 'detail':
'unknown'}
> >>> > > > >>>>>>>> Score
: 0
> >>> > > > >>>>>>>> Local maintenance
: False
> >>> > > > >>>>>>>> Host timestamp
: 1397464031
> >>> > > > >>>>>>>> Extra metadata (valid
at timestamp):
> >>> > > > >>>>>>>>
metadata_parse_version=1
> >>> > > > >>>>>>>>
metadata_feature_version=1
> >>> > > > >>>>>>>>
timestamp=1397464031 (Mon Apr 14 10:27:11
2014)
> >>> > > > >>>>>>>> host-id=2
> >>> > > > >>>>>>>> score=0
> >>> > > > >>>>>>>>
maintenance=False
> >>> > > > >>>>>>>>
state=EngineUnexpectedlyDown
> >>> > > > >>>>>>>> timeout=Mon Apr
14 10:35:05 2014
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> oVirt engine is sending
me 2 emails every 10 minutes
with
> >>> the
> >>> > > > >>>>>>>> following
> >>> > > > >>>>>>>> subjects:
> >>> > > > >>>>>>>> - ovirt-hosted-engine
state transition
> >>> EngineDown-EngineStart
> >>> > > > >>>>>>>> - ovirt-hosted-engine
state transition
> >>> EngineStart-EngineUp
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> In oVirt webadmin I can
see the following message:
> >>> > > > >>>>>>>> VM HostedEngine is
down. Exit message: internal error
> >>> Failed to
> >>> > > > >>>>>>>> acquire
> >>> > > > >>>>>>>> lock: error -243.
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> These messages are
really annoying as oVirt isn't
doing
> >>> anything
> >>> > > > >>>>>>>> with
> >>> > > > >>>>>>>> hosted engine - I have
an uptime of 9 days in my
engine
> >>> vm.
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> So my questions are
now:
> >>> > > > >>>>>>>> Is it intended to send
out these messages and detect
that
> >>> ovirt
> >>> > > > >>>>>>>> engine
> >>> > > > >>>>>>>> is down (which is false
anyway), but not to restart
the
> >>> vm?
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> How can I disable
notifications? I'm planning to
write a
> >>> Nagios
> >>> > > > >>>>>>>> plugin
> >>> > > > >>>>>>>> which parses the output
of hosted-engine --vm-status
and
> >>> only
> >>> > > Nagios
> >>> > > > >>>>>>>> should notify me, not
hosted-engine script.
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> Is is possible or
planned to make the whole ha feature
> >>> > > optional? I
> >>> > > > >>>>>>>> really really really
hate cluster software as it
causes
> >>> more
> >>> > > > >>>>>>>> troubles
> >>> > > > >>>>>>>> then standalone
machines and in my case the
hosted-engine
> >>> ha
> >>> > > feature
> >>> > > > >>>>>>>> really causes troubles
(and I didn't had a hardware or
> >>> network
> >>> > > > >>>>>>>> outage
> >>> > > > >>>>>>>> yet only issues with
hosted-engine ha agent). I don't
> >>> need any
> >>> > > ha
> >>> > > > >>>>>>>> feature for hosted
engine. I just want to run engine
> >>> > > virtualized on
> >>> > > > >>>>>>>> oVirt and if engine vm
fails (e.g. because of issues
with
> >>> a
> >>> > > host)
> >>> > > > >>>>>>>> I'll
> >>> > > > >>>>>>>> restart it on another
node.
> >>> > > > >>>>>>>
> >>> > > > >>>>>>> Hi, you can:
> >>> > > > >>>>>>> 1. edit
> >>> /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and
> >>> > > tweak
> >>> > > > >>>>>>> the logger as you like
> >>> > > > >>>>>>> 2. or kill ovirt-ha-broker
& ovirt-ha-agent services
> >>> > > > >>>>>>
> >>> > > > >>>>>> Thanks for the information.
> >>> > > > >>>>>> So engine is able to run when
ovirt-ha-broker and
> >>> ovirt-ha-agent
> >>> > > isn't
> >>> > > > >>>>>> running?
> >>> > > > >>>>>>
> >>> > > > >>>>>>
> >>> > > > >>>>>> Regards,
> >>> > > > >>>>>> René
> >>> > > > >>>>>>
> >>> > > > >>>>>>>
> >>> > > > >>>>>>> --Jirka
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> Thanks,
> >>> > > > >>>>>>>> René
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>
> >>> > > > >>>>>>
_______________________________________________
> >>> > > > >>>>>> Users mailing list
> >>> > > > >>>>>> Users(a)ovirt.org
> >>> > > > >>>>>>
http://lists.ovirt.org/mailman/listinfo/users
> >>> > > > >>>>>>
> >>> > > > >>>>
_______________________________________________
> >>> > > > >>>> Users mailing list
> >>> > > > >>>> Users(a)ovirt.org
> >>> > > > >>>>
http://lists.ovirt.org/mailman/listinfo/users
> >>> > > > >>>>
> >>> > > > >>
> >>> > > >
> >>> > > _______________________________________________
> >>> > > Users mailing list
> >>> > > Users(a)ovirt.org
> >>> > >
http://lists.ovirt.org/mailman/listinfo/users
> >>> > >
> >>> >
> >>> _______________________________________________
> >>> Users mailing list
> >>> Users(a)ovirt.org
> >>>
http://lists.ovirt.org/mailman/listinfo/users
> >>>
> >>
> >>
> >
>
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users