
I'm on CentOS 6.5 and this repo is for Fedora...
2014-04-28 12:16 GMT+02:00 Kevin Tibi <kevintibi@hotmail.com>:
Hi,
qemu-kvm-0.12.1.2-2.415.el6_5.8.x86_64
libvirt-0.10.2-29.el6_5.7.x86_64
vdsm-4.14.6-0.el6.x86_64
kernel-2.6.32-431.el6.x86_64
kernel-2.6.32-431.11.2.el6.x86_64
I added this repo and tried to update.
2014-04-28 11:57 GMT+02:00 Martin Sivak <msivak@redhat.com>:
Hi Kevin,
thanks for the information.
Agent.log and broker.log say nothing.
Can you please attach those files? I would like to see how the crashed QEMU process is reported to us and what state machine transitions cause the load.
07:23:58,994::libvirtconnection::124::root::(wrapper) Unknown libvirterror: ecode: 84 edom: 10 level: 2 message: Operation not supported: live disk snapshot not supported with this QEMU binary
What are the versions of vdsm, libvirt, qemu-kvm and kernel?
If you feel like it try updating virt packages from the virt-preview repository: http://fedoraproject.org/wiki/Virtualization_Preview_Repository
-- Martin Sivák msivak@redhat.com Red Hat Czech RHEV-M SLA / Brno, CZ
Hi,
I use this version: ovirt-hosted-engine-ha-1.1.2-1.el6.noarch
For 3 days my hosted-engine HA worked perfectly, but then I tried to snapshot a VM and the HA service went defunct ==> 400% CPU!!
Agent.log and broker.log say nothing, but in vdsm.log I have errors:
Thread-9462::DEBUG::2014-04-28 07:23:58,994::libvirtconnection::124::root::(wrapper) Unknown libvirterror: ecode: 84 edom: 10 level: 2 message: Operation not supported: live disk snapshot not supported with this QEMU binary
Thread-9462::ERROR::2014-04-28 07:23:58,995::vm::4006::vm.Vm::(snapshot) vmId=`773f6e6d-c670-49f3-ae8c-dfbcfa22d0a5`::Unable to take snapshot
Thread-9352::DEBUG::2014-04-28 08:41:39,922::lvm::295::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'r|.*|\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
cc51143e-8ad7-4b0b-a4d2-9024dffc1188 ff98d346-4515-4349-8437-fb2f5e9eaadf' (cwd None)
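(Side note on the "live disk snapshot not supported with this QEMU binary" error above: on EL6 this usually points at the qemu build itself rather than at vdsm or libvirt configuration. A quick, rough check of which package provides the emulator actually in use; the path below assumes a stock CentOS 6 layout:

readlink -f /usr/libexec/qemu-kvm    # resolve the emulator binary libvirt launches on EL6
rpm -qf /usr/libexec/qemu-kvm        # show which qemu-kvm package owns it

If that is the plain qemu-kvm package, the snapshot error is what you would typically expect; a qemu build with live snapshot support would be needed.)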
I'll try to reboot my node with hosted-engine.
2014-04-25 13:54 GMT+02:00 Martin Sivak <msivak@redhat.com>:
Hi Kevin,
can you please tell us what version of hosted-engine you are running?
rpm -q ovirt-hosted-engine-ha
Also, do I understand it correctly that the engine VM is running, but you see a bad status when you execute the hosted-engine --vm-status command?
If that is so, can you give us current logs from /var/log/ovirt-hosted-engine-ha?
-- Martin Sivák msivak@redhat.com Red Hat Czech RHEV-M SLA / Brno, CZ
----- Original Message -----
OK, I mounted the hosted-engine storage domain manually and the agent came up.
But vm-status shows:
--== Host 2 status ==--
Status up-to-date          : False
Hostname                   : 192.168.99.103
Host ID                    : 2
Engine status              : unknown stale-data
Score                      : 0
Local maintenance          : False
Host timestamp             : 1398333438
And in my engine, HA is not active for host02.
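(A side note here: "unknown stale-data" with a score of 0 usually just means the agent on that host has stopped refreshing its metadata. Once the hosted-engine storage is reachable again, restarting both HA services and re-checking the status is normally enough; the service names below are the ones referenced elsewhere in this thread:

service ovirt-ha-broker restart
service ovirt-ha-agent restart
hosted-engine --vm-status

If the score stays at 0 after that, agent.log on the affected host should say why.)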
2014-04-24 12:48 GMT+02:00 Kevin Tibi <kevintibi@hotmail.com>:
Hi,
I tried to reboot my hosts and now [supervdsmServer] is <defunct>.
/var/log/vdsm/supervdsm.log
MainProcess|Thread-120::DEBUG::2014-04-24 12:22:19,955::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None
MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,010::supervdsmServer::96::SuperVdsm.ServerCallback::(wrapper) call validateAccess with ('qemu', ('qemu', 'kvm'), '/rhev/data-center/mnt/host01.ovirt.lan:_home_export', 5) {}
MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,014::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None
MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,059::supervdsmServer::96::SuperVdsm.ServerCallback::(wrapper) call validateAccess with ('qemu', ('qemu', 'kvm'), '/rhev/data-center/mnt/host01.ovirt.lan:_home_iso', 5) {}
MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,063::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None
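(Quick aside: Martin's question further down about process status flags applies here as well; a minimal check, nothing oVirt-specific:

ps axwu | grep -E 'supervdsm|ovirt-ha'

A "Z" in the STAT column confirms the <defunct>, i.e. zombie, state.)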
and one host doesn't mount the NFS share used for the hosted engine.
MainThread::CRITICAL::2014-04-24 12:36:16,603::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
    self._run_agent()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
    hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 299, in start_monitoring
    self._initialize_vdsm()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 418, in _initialize_vdsm
    self._sd_path = env_path.get_domain_path(self._config)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/env/path.py", line 40, in get_domain_path
    .format(sd_uuid, parent))
Exception: path to storage domain aea040f8-ab9d-435b-9ecf-ddd4272e592f not found in /rhev/data-center/mnt
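(The exception above just means the hosted-engine NFS domain is not mounted under /rhev/data-center/mnt when the agent starts, which matches the manual-mount workaround mentioned earlier in the thread. A rough sketch of the check and the manual mount; the export path below is an assumption reconstructed from the mount-point name seen in the quoted logs, so substitute your real export:

ls /rhev/data-center/mnt/
mkdir -p '/rhev/data-center/mnt/host01.ovirt.lan:_home_NFS01'
mount -t nfs host01.ovirt.lan:/home/NFS01 '/rhev/data-center/mnt/host01.ovirt.lan:_home_NFS01'   # assumed export, derived from the _home_NFS01 mount-point name

Restarting ovirt-ha-agent afterwards should let it find the domain path again.)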
2014-04-23 17:40 GMT+02:00 Kevin Tibi <kevintibi@hotmail.com>:
top
1729 vdsm 20 0 0 0 0 Z 373.8 0.0 252:08.51 ovirt-ha-broker <defunct>

[root@host01 ~]# ps axwu | grep 1729
vdsm 1729 0.7 0.0 0 0 ? Zl Apr02 240:24 [ovirt-ha-broker] <defunct>

[root@host01 ~]# ll /rhev/data-center/mnt/host01.ovirt.lan\:_home_NFS01/aea040f8-ab9d-435b-9ecf-ddd4272e592f/ha_agent/
total 2028
-rw-rw----. 1 vdsm kvm 1048576 23 avril 17:35 hosted-engine.lockspace
-rw-rw----. 1 vdsm kvm 1028096 23 avril 17:35 hosted-engine.metadata

cat /var/log/vdsm/vdsm.log

Thread-120518::DEBUG::2014-04-23 17:38:02,299::task::1185::TaskManager.Task::(prepare) Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::finished: {'aea040f8-ab9d-435b-9ecf-ddd4272e592f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000410963', 'lastCheck': '3.4', 'valid': True}, '5ae613a4-44e4-42cb-89fc-7b5d34c1f30f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000412357', 'lastCheck': '6.8', 'valid': True}, 'cc51143e-8ad7-4b0b-a4d2-9024dffc1188': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.000455292', 'lastCheck': '1.2', 'valid': True}, 'ff98d346-4515-4349-8437-fb2f5e9eaadf': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00817113', 'lastCheck': '1.7', 'valid': True}}
Thread-120518::DEBUG::2014-04-23 17:38:02,300::task::595::TaskManager.Task::(_updateState) Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::moving from state preparing -> state finished
Thread-120518::DEBUG::2014-04-23 17:38:02,300::resourceManager::940::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-120518::DEBUG::2014-04-23 17:38:02,300::resourceManager::977::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-120518::DEBUG::2014-04-23 17:38:02,300::task::990::TaskManager.Task::(_decref) Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::ref 0 aborting False
Thread-120518::ERROR::2014-04-23 17:38:02,302::brokerlink::72::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(connect) Failed to connect to broker: [Errno 2] No such file or directory
Thread-120518::ERROR::2014-04-23 17:38:02,302::API::1612::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1603, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 83, in get_all_stats
    with broker.connection():
  File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 96, in connection
    self.connect()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 64, in connect
    self._socket.connect(constants.BROKER_SOCKET_FILE)
  File "<string>", line 1, in connect
error: [Errno 2] No such file or directory
Thread-78::DEBUG::2014-04-23 17:38:05,490::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_DATA/5ae613a4-44e4-42cb-89fc-7b5d34c1f30f/dom_md/metadata
> bs=4096 count=1' (cwd None) > Thread-78::DEBUG::2014-04-23 > 17:38:05,523::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: > <err> = '0+1 records in\n0+1 records out\n545 bytes (545 B) copied, > 0.000412209 s, 1.3 MB/s\n'; <rc> = 0 > > > > > 2014-04-23 17:27 GMT+02:00 Martin Sivak <msivak@redhat.com>: > > Hi Kevin, >> >> > same pb. >> >> Are you missing the lockspace file as well while running on top of >> GlusterFS? >> >> > ovirt-ha-broker have 400% cpu and is defunct. I can't kill with -9. >> >> Defunct process eating full four cores? I wonder how is that possible.. >> What are the status flags of that process when you do ps axwu? >> >> Can you attach the log files please? >> >> -- >> Martin Sivák >> msivak@redhat.com >> Red Hat Czech >> RHEV-M SLA / Brno, CZ >> >> ----- Original Message ----- >> > same pb. ovirt-ha-broker have 400% cpu and is defunct. I can't kill >> with -9. >> > >> > >> > 2014-04-23 13:55 GMT+02:00 Martin Sivak <msivak@redhat.com>: >> > >> > > Hi, >> > > >> > > > Isn't this file created when hosted engine is started? >> > > >> > > The file is created by the setup script. If it got lost then there >> was >> > > probably something bad happening in your NFS or Gluster storage. >> > > >> > > > Or how can I create this file manually? >> > > >> > > I can give you experimental treatment for this. We do not have any >> > > official way as this is something that should not ever happen :) >> > > >> > > !! But before you do that make sure you do not have any nodes running >> > > properly. This will destroy and reinitialize the lockspace database >> for the >> > > whole hosted-engine environment (which you apparently lack, but..). >> !! >> > > >> > > You have to create the ha_agent/hosted-engine.lockspace file with the >> > > expected size (1MB) and then tell sanlock to initialize it as a >> lockspace >> > > using: >> > > >> > > # python >> > > >>> import sanlock >> > > >>> sanlock.write_lockspace(lockspace="hosted-engine", >> > > ... path="/rhev/data-center/mnt/<nfs>/<hosted engine storage >> > > domain>/ha_agent/hosted-engine.lockspace", >> > > ... offset=0) >> > > >>> >> > > >> > > Then try starting the services (both broker and agent) again. >> > > >> > > -- >> > > Martin Sivák >> > > msivak@redhat.com >> > > Red Hat Czech >> > > RHEV-M SLA / Brno, CZ >> > > >> > > >> > > ----- Original Message ----- >> > > > On 04/23/2014 11:08 AM, Martin Sivak wrote: >> > > > > Hi René, >> > > > > >> > > > >>>> libvirtError: Failed to acquire lock: No space left on device >> > > > > >> > > > >>>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 >> invalid >> > > > >>>> lockspace found -1 failed 0 name >> > > 2851af27-8744-445d-9fb1-a0d083c8dc82 >> > > > > >> > > > > Can you please check the contents of /rhev/data-center/<your nfs >> > > > > mount>/<nfs domain uuid>/ha_agent/? >> > > > > >> > > > > This is how it should look like: >> > > > > >> > > > > [root@dev-03 ~]# ls -al >> > > > > >> > > >>
/rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
>> > > > >>>>>>>> host-id=1 >> > > > >>>>>>>> score=2400 >> > > > >>>>>>>> maintenance=False >> > > > >>>>>>>> state=EngineUp >> > > > >>>>>>>> >> > > > >>>>>>>> >> > > > >>>>>>>> --== Host 2 status ==-- >> > > > >>>>>>>> >> > > > >>>>>>>> Status up-to-date : True >> > > > >>>>>>>> Hostname : 10.0.200.101 >> > > > >>>>>>>> Host ID : 2 >> > > > >>>>>>>> Engine status : {'reason': 'vm not >> running >> > > on >> > > > >>>>>>>> this >> > > > >>>>>>>> host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'} >> > > > >>>>>>>> Score : 0 >> > > > >>>>>>>> Local maintenance : False >> > > > >>>>>>>> Host timestamp : 1397464031 >> > > > >>>>>>>> Extra metadata (valid at timestamp): >> > > > >>>>>>>> metadata_parse_version=1 >> > > > >>>>>>>> metadata_feature_version=1 >> > > > >>>>>>>> timestamp=1397464031 (Mon Apr 14 10:27:11
>> > > > > total 2036 >> > > > > drwxr-x---. 2 vdsm kvm 4096 Mar 19 18:46 . >> > > > > drwxr-xr-x. 6 vdsm kvm 4096 Mar 19 18:46 .. >> > > > > -rw-rw----. 1 vdsm kvm 1048576 Apr 23 11:05 >> hosted-engine.lockspace >> > > > > -rw-rw----. 1 vdsm kvm 1028096 Mar 19 18:46 >> hosted-engine.metadata >> > > > > >> > > > > The errors seem to indicate that you somehow lost the lockspace >> file. >> > > > >> > > > True :) >> > > > Isn't this file created when hosted engine is started? Or how can I >> > > > create this file manually? >> > > > >> > > > > >> > > > > -- >> > > > > Martin Sivák >> > > > > msivak@redhat.com >> > > > > Red Hat Czech >> > > > > RHEV-M SLA / Brno, CZ >> > > > > >> > > > > ----- Original Message ----- >> > > > >> On 04/23/2014 12:28 AM, Doron Fediuck wrote: >> > > > >>> Hi Rene, >> > > > >>> any idea what closed your ovirtmgmt bridge? >> > > > >>> as long as it is down vdsm may have issues starting up properly >> > > > >>> and this is why you see the complaints on the rpc server. >> > > > >>> >> > > > >>> Can you try manually fixing the network part first and then >> > > > >>> restart vdsm? >> > > > >>> Once vdsm is happy hosted engine VM will start. >> > > > >> >> > > > >> Thanks for your feedback, Doron. >> > > > >> >> > > > >> My ovirtmgmt bridge seems to be on or isn't it: >> > > > >> # brctl show ovirtmgmt >> > > > >> bridge name bridge id STP enabled >> interfaces >> > > > >> ovirtmgmt 8000.0025907587c2 no >> eth0.200 >> > > > >> >> > > > >> # ip a s ovirtmgmt >> > > > >> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc >> noqueue >> > > > >> state UNKNOWN >> > > > >> link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff >> > > > >> inet 10.0.200.102/24 brd 10.0.200.255 scope global >> ovirtmgmt >> > > > >> inet6 fe80::225:90ff:fe75:87c2/64 scope link >> > > > >> valid_lft forever preferred_lft forever >> > > > >> >> > > > >> # ip a s eth0.200 >> > > > >> 6: eth0.200@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 >> qdisc >> > > > >> noqueue state UP >> > > > >> link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff >> > > > >> inet6 fe80::225:90ff:fe75:87c2/64 scope link >> > > > >> valid_lft forever preferred_lft forever >> > > > >> >> > > > >> I tried the following yesterday: >> > > > >> Copy virtual disk from GlusterFS storage to local disk of host >> and >> > > > >> create a new vm with virt-manager which loads ovirtmgmt disk. I >> could >> > > > >> reach my engine over the ovirtmgmt bridge (so bridge must be >> working). >> > > > >> >> > > > >> I also started libvirtd with Option -v and I saw the following >> in >> > > > >> libvirtd.log when trying to start ovirt engine: >> > > > >> 2014-04-22 14:18:25.432+0000: 8901: debug : >> virCommandRunAsync:2250 : >> > > > >> Command result 0, with PID 11491 >> > > > >> 2014-04-22 14:18:25.478+0000: 8901: debug : virCommandRun:2045 : >> > > Result >> > > > >> exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto >> 'FO-vnet0' >> > > is >> > > > >> not a chain >> > > > >> >> > > > >> So it could be that something is broken in my hosted-engine >> network. >> > > Do >> > > > >> you have any clue how I can troubleshoot this? 
>> > > > >> >> > > > >> >> > > > >> Thanks, >> > > > >> René >> > > > >> >> > > > >> >> > > > >>> >> > > > >>> ----- Original Message ----- >> > > > >>>> From: "René Koch" <rkoch@linuxland.at> >> > > > >>>> To: "Martin Sivak" <msivak@redhat.com> >> > > > >>>> Cc: users@ovirt.org >> > > > >>>> Sent: Tuesday, April 22, 2014 1:46:38 PM >> > > > >>>> Subject: Re: [ovirt-users] hosted engine health check issues >> > > > >>>> >> > > > >>>> Hi, >> > > > >>>> >> > > > >>>> I rebooted one of my ovirt hosts today and the result is now >> that I >> > > > >>>> can't start hosted-engine anymore. >> > > > >>>> >> > > > >>>> ovirt-ha-agent isn't running because the lockspace file is >> missing >> > > > >>>> (sanlock complains about it). >> > > > >>>> So I tried to start hosted-engine with --vm-start and I get >> the >> > > > >>>> following errors: >> > > > >>>> >> > > > >>>> ==> /var/log/sanlock.log <== >> > > > >>>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 >> invalid >> > > > >>>> lockspace found -1 failed 0 name >> > > 2851af27-8744-445d-9fb1-a0d083c8dc82 >> > > > >>>> >> > > > >>>> ==> /var/log/messages <== >> > > > >>>> Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 >> > > 12:38:17+0200 654 >> > > > >>>> [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 >> failed 0 >> > > name >> > > > >>>> 2851af27-8744-445d-9fb1-a0d083c8dc82 >> > > > >>>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) >> > > entering >> > > > >>>> disabled state >> > > > >>>> Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left >> promiscuous >> > > mode >> > > > >>>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) >> > > entering >> > > > >>>> disabled state >> > > > >>>> >> > > > >>>> ==> /var/log/vdsm/vdsm.log <== >> > > > >>>> Thread-21::DEBUG::2014-04-22 >> > > > >>>> 12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown >> > > > >>>> libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to >> acquire >> > > > >>>> lock: No space left on device >> > > > >>>> Thread-21::DEBUG::2014-04-22 >> > > > >>>> 12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm) >> > > > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations >> > > released >> > > > >>>> Thread-21::ERROR::2014-04-22 >> > > > >>>> 12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm) >> > > > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start >> process >> > > failed >> > > > >>>> Traceback (most recent call last): >> > > > >>>> File "/usr/share/vdsm/vm.py", line 2249, in >> _startUnderlyingVm >> > > > >>>> self._run() >> > > > >>>> File "/usr/share/vdsm/vm.py", line 3170, in _run >> > > > >>>> self._connection.createXML(domxml, flags), >> > > > >>>> File >> > > > >>>> >> "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", >> > > > >>>> line 92, in wrapper >> > > > >>>> ret = f(*args, **kwargs) >> > > > >>>> File "/usr/lib64/python2.6/site-packages/libvirt.py", >> line >> > > 2665, in >> > > > >>>> createXML >> > > > >>>> if ret is None:raise libvirtError('virDomainCreateXML() >> > > failed', >> > > > >>>> conn=self) >> > > > >>>> libvirtError: Failed to acquire lock: No space left on device >> > > > >>>> >> > > > >>>> ==> /var/log/messages <== >> > > > >>>> Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR >> > > > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start >> process >> > > > >>>> failed#012Traceback (most recent call last):#012 File >> > > > >>>> "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm#012 >> > > > >>>> self._run()#012 
File "/usr/share/vdsm/vm.py", line 3170, in >> > > _run#012 >> > > > >>>> self._connection.createXML(domxml, flags),#012 File >> > > > >>>> >> "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", >> > > line 92, >> > > > >>>> in wrapper#012 ret = f(*args, **kwargs)#012 File >> > > > >>>> "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in >> > > > >>>> createXML#012 if ret is None:raise >> > > libvirtError('virDomainCreateXML() >> > > > >>>> failed', conn=self)#012libvirtError: Failed to acquire lock: >> No >> > > space >> > > > >>>> left on device >> > > > >>>> >> > > > >>>> ==> /var/log/vdsm/vdsm.log <== >> > > > >>>> Thread-21::DEBUG::2014-04-22 >> > > > >>>> 12:38:17,569::vm::2731::vm.Vm::(setDownStatus) >> > > > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to >> Down: >> > > > >>>> Failed to acquire lock: No space left on device >> > > > >>>> >> > > > >>>> >> > > > >>>> No space left on device is nonsense as there is enough space >> (I had >> > > this >> > > > >>>> issue last time as well where I had to patch machine.py, but >> this >> > > file >> > > > >>>> is now Python 2.6.6 compatible. >> > > > >>>> >> > > > >>>> Any idea what prevents hosted-engine from starting? >> > > > >>>> ovirt-ha-broker, vdsmd and sanlock are running btw. >> > > > >>>> >> > > > >>>> Btw, I can see in log that json rpc server module is missing >> - which >> > > > >>>> package is required for CentOS 6.5? >> > > > >>>> Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load >> the >> > > json >> > > > >>>> rpc server module. Please make sure it is installed. >> > > > >>>> >> > > > >>>> >> > > > >>>> Thanks, >> > > > >>>> René >> > > > >>>> >> > > > >>>> >> > > > >>>> >> > > > >>>> On 04/17/2014 10:02 AM, Martin Sivak wrote: >> > > > >>>>> Hi, >> > > > >>>>> >> > > > >>>>>>>> How can I disable notifications? >> > > > >>>>> >> > > > >>>>> The notification is configured in >> > > > >>>>> /etc/ovirt-hosted-engine-ha/broker.conf >> > > > >>>>> section notification. >> > > > >>>>> The email is sent when the key state_transition exists and >> the >> > > string >> > > > >>>>> OldState-NewState contains the (case insensitive) regexp >> from the >> > > > >>>>> value. >> > > > >>>>> >> > > > >>>>>>>> Is it intended to send out these messages and detect that >> ovirt >> > > > >>>>>>>> engine >> > > > >>>>>>>> is down (which is false anyway), but not to restart the >> vm? >> > > > >>>>> >> > > > >>>>> Forget about emails for now and check the >> > > > >>>>> /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and >> > > attach >> > > > >>>>> them >> > > > >>>>> as well btw). >> > > > >>>>> >> > > > >>>>>>>> oVirt hosts think that hosted engine is down because it >> seems >> > > that >> > > > >>>>>>>> hosts >> > > > >>>>>>>> can't write to hosted-engine.lockspace due to glusterfs >> issues >> > > (or >> > > > >>>>>>>> at >> > > > >>>>>>>> least I think so). >> > > > >>>>> >> > > > >>>>> The hosts think so or can't really write there? The >> lockspace is >> > > > >>>>> managed >> > > > >>>>> by >> > > > >>>>> sanlock and our HA daemons do not touch it at all. We only >> ask >> > > sanlock >> > > > >>>>> to >> > > > >>>>> get make sure we have unique server id. >> > > > >>>>> >> > > > >>>>>>>> Is is possible or planned to make the whole ha feature >> optional? 
>> > > > >>>>> >> > > > >>>>> Well the system won't perform any automatic actions if you >> put the >> > > > >>>>> hosted >> > > > >>>>> engine to global maintenance and only start/stop/migrate the >> VM >> > > > >>>>> manually. >> > > > >>>>> I would discourage you from stopping agent/broker, because >> the >> > > engine >> > > > >>>>> itself has some logic based on the reporting. >> > > > >>>>> >> > > > >>>>> Regards >> > > > >>>>> >> > > > >>>>> -- >> > > > >>>>> Martin Sivák >> > > > >>>>> msivak@redhat.com >> > > > >>>>> Red Hat Czech >> > > > >>>>> RHEV-M SLA / Brno, CZ >> > > > >>>>> >> > > > >>>>> ----- Original Message ----- >> > > > >>>>>> On 04/15/2014 04:53 PM, Jiri Moskovcak wrote: >> > > > >>>>>>> On 04/14/2014 10:50 AM, René Koch wrote: >> > > > >>>>>>>> Hi, >> > > > >>>>>>>> >> > > > >>>>>>>> I have some issues with hosted engine status. >> > > > >>>>>>>> >> > > > >>>>>>>> oVirt hosts think that hosted engine is down because it >> seems >> > > that >> > > > >>>>>>>> hosts >> > > > >>>>>>>> can't write to hosted-engine.lockspace due to glusterfs >> issues >> > > (or >> > > > >>>>>>>> at >> > > > >>>>>>>> least I think so). >> > > > >>>>>>>> >> > > > >>>>>>>> Here's the output of vm-status: >> > > > >>>>>>>> >> > > > >>>>>>>> # hosted-engine --vm-status >> > > > >>>>>>>> >> > > > >>>>>>>> >> > > > >>>>>>>> --== Host 1 status ==-- >> > > > >>>>>>>> >> > > > >>>>>>>> Status up-to-date : False >> > > > >>>>>>>> Hostname : 10.0.200.102 >> > > > >>>>>>>> Host ID : 1 >> > > > >>>>>>>> Engine status : unknown stale-data >> > > > >>>>>>>> Score : 2400 >> > > > >>>>>>>> Local maintenance : False >> > > > >>>>>>>> Host timestamp : 1397035677 >> > > > >>>>>>>> Extra metadata (valid at timestamp): >> > > > >>>>>>>> metadata_parse_version=1 >> > > > >>>>>>>> metadata_feature_version=1 >> > > > >>>>>>>> timestamp=1397035677 (Wed Apr 9 11:27:57
>> > > > >>>>>>>> host-id=2 >> > > > >>>>>>>> score=0 >> > > > >>>>>>>> maintenance=False >> > > > >>>>>>>> state=EngineUnexpectedlyDown >> > > > >>>>>>>> timeout=Mon Apr 14 10:35:05 2014 >> > > > >>>>>>>> >> > > > >>>>>>>> oVirt engine is sending me 2 emails every 10 minutes with >> the >> > > > >>>>>>>> following >> > > > >>>>>>>> subjects: >> > > > >>>>>>>> - ovirt-hosted-engine state transition >> EngineDown-EngineStart >> > > > >>>>>>>> - ovirt-hosted-engine state transition >> EngineStart-EngineUp >> > > > >>>>>>>> >> > > > >>>>>>>> In oVirt webadmin I can see the following message: >> > > > >>>>>>>> VM HostedEngine is down. Exit message: internal error >> Failed to >> > > > >>>>>>>> acquire >> > > > >>>>>>>> lock: error -243. >> > > > >>>>>>>> >> > > > >>>>>>>> These messages are really annoying as oVirt isn't doing >> anything >> > > > >>>>>>>> with >> > > > >>>>>>>> hosted engine - I have an uptime of 9 days in my engine >> vm. >> > > > >>>>>>>> >> > > > >>>>>>>> So my questions are now: >> > > > >>>>>>>> Is it intended to send out these messages and detect that >> ovirt >> > > > >>>>>>>> engine >> > > > >>>>>>>> is down (which is false anyway), but not to restart the >> vm? >> > > > >>>>>>>> >> > > > >>>>>>>> How can I disable notifications? I'm planning to write a >> Nagios >> > > > >>>>>>>> plugin >> > > > >>>>>>>> which parses the output of hosted-engine --vm-status and >> only >> > > Nagios >> > > > >>>>>>>> should notify me, not hosted-engine script. >> > > > >>>>>>>> >> > > > >>>>>>>> Is is possible or planned to make the whole ha feature >> > > optional? I >> > > > >>>>>>>> really really really hate cluster software as it causes >> more >> > > > >>>>>>>> troubles >> > > > >>>>>>>> then standalone machines and in my case the hosted-engine >> ha >> > > feature >> > > > >>>>>>>> really causes troubles (and I didn't had a hardware or >> network >> > > > >>>>>>>> outage >> > > > >>>>>>>> yet only issues with hosted-engine ha agent). I don't >> need any >> > > ha >> > > > >>>>>>>> feature for hosted engine. I just want to run engine >> > > virtualized on >> > > > >>>>>>>> oVirt and if engine vm fails (e.g. because of issues with >> a >> > > host) >> > > > >>>>>>>> I'll >> > > > >>>>>>>> restart it on another node. >> > > > >>>>>>> >> > > > >>>>>>> Hi, you can: >> > > > >>>>>>> 1. edit >> /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and >> > > tweak >> > > > >>>>>>> the logger as you like >> > > > >>>>>>> 2. or kill ovirt-ha-broker & ovirt-ha-agent services >> > > > >>>>>> >> > > > >>>>>> Thanks for the information. >> > > > >>>>>> So engine is able to run when ovirt-ha-broker and >> ovirt-ha-agent >> > > isn't >> > > > >>>>>> running? 
>> > > > >>>>>> >> > > > >>>>>> >> > > > >>>>>> Regards, >> > > > >>>>>> René >> > > > >>>>>> >> > > > >>>>>>> >> > > > >>>>>>> --Jirka >> > > > >>>>>>>> >> > > > >>>>>>>> Thanks, >> > > > >>>>>>>> René >> > > > >>>>>>>> >> > > > >>>>>>>> >> > > > >>>>>>> >> > > > >>>>>> _______________________________________________ >> > > > >>>>>> Users mailing list >> > > > >>>>>> Users@ovirt.org >> > > > >>>>>> http://lists.ovirt.org/mailman/listinfo/users >> > > > >>>>>> >> > > > >>>> _______________________________________________ >> > > > >>>> Users mailing list >> > > > >>>> Users@ovirt.org >> > > > >>>> http://lists.ovirt.org/mailman/listinfo/users >> > > > >>>> >> > > > >> >> > > > >> > > _______________________________________________ >> > > Users mailing list >> > > Users@ovirt.org >> > > http://lists.ovirt.org/mailman/listinfo/users >> > > >> > >> _______________________________________________ >> Users mailing list >> Users@ovirt.org >> http://lists.ovirt.org/mailman/listinfo/users >> > >
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users