I'm on CentOS 6.5 and this repo is for Fedora...
2014-04-28 12:16 GMT+02:00 Kevin Tibi <kevintibi(a)hotmail.com>:
Hi,
qemu-kvm-0.12.1.2-2.415.el6_5.8.x86_64
libvirt-0.10.2-29.el6_5.7.x86_64
vdsm-4.14.6-0.el6.x86_64
kernel-2.6.32-431.el6.x86_64
kernel-2.6.32-431.11.2.el6.x86_64
I added this repo and tried to update.
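For reference, roughly what "add the repo and try to update" amounts to on a yum-based host -- a hedged sketch only: the .repo file comes from the wiki page linked below, and as noted that repository carries Fedora builds, not EL6 ones:

  # 1. drop the .repo file from the virt-preview wiki page into /etc/yum.repos.d/
  #    (the repo only provides Fedora builds, so this does not help on CentOS 6.5)
  # 2. refresh metadata and update the virt stack
  yum clean metadata
  yum update qemu-kvm libvirt vdsm
  rpm -q qemu-kvm libvirt vdsm kernel   # confirm what was actually installed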
2014-04-28 11:57 GMT+02:00 Martin Sivak <msivak(a)redhat.com>:
Hi Kevin,
>
> thanks for the information.
>
> > Agent.log and broker.log say nothing.
>
> Can you please attach those files? I would like to see how the crashed
> Qemu process is reported to us and which state machine transitions
> cause the load.
>
> > 07:23:58,994::libvirtconnection::124::root::(wrapper) Unknown
> libvirterror:
> > ecode: 84 edom: 10 level: 2 message: Operation not supported: live disk
> > snapshot not supported with this QEMU binary
>
> What are the versions of vdsm, libvirt, qemu-kvm and kernel?
>
> If you feel like it, try updating the virt packages from the virt-preview
> repository:
>
http://fedoraproject.org/wiki/Virtualization_Preview_Repository
>
> --
> Martin Sivák
> msivak(a)redhat.com
> Red Hat Czech
> RHEV-M SLA / Brno, CZ
>
> ----- Original Message -----
> > Hi,
> >
> > I use this version : ovirt-hosted-engine-ha-1.1.2-1.el6.noarch
> >
> > For 3 days my engine-ha worked perfectly, but I tried to snapshot a VM
> > and the HA service went defunct ==> 400% CPU!
> >
> > Agent.log and broker.log say nothing, but in vdsm.log I have errors:
> >
> > Thread-9462::DEBUG::2014-04-28
> > 07:23:58,994::libvirtconnection::124::root::(wrapper) Unknown
> libvirterror:
> > ecode: 84 edom: 10 level: 2 message: Operation not supported: live disk
> > snapshot not supported with this QEMU binary
> >
> > Thread-9462::ERROR::2014-04-28 07:23:58,995::vm::4006::vm.Vm::(snapshot)
> > vmId=`773f6e6d-c670-49f3-ae8c-dfbcfa22d0a5`::Unable to take snapshot
> >
> >
> > Thread-9352::DEBUG::2014-04-28
> > 08:41:39,922::lvm::295::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n
> > /sbin/lvm vgs --config " devices { preferred_names =
> [\\"^/dev/mapper/\\"]
> > ignore_suspended_devices=1 write_cache_state=0
> disable_after_error_count=3
> > obtain_device_list_from_udev=0 filter = [ \'r|.*|\' ] } global {
> > locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup {
> > retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix
> > --separator | -o
> >
>
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
> > cc51143e-8ad7-4b0b-a4d2-9024dffc1188
> ff98d346-4515-4349-8437-fb2f5e9eaadf'
> > (cwd None)
> >
> > I'll try to reboot my node with hosted-engine.
> >
> >
> >
> > 2014-04-25 13:54 GMT+02:00 Martin Sivak <msivak(a)redhat.com>:
> >
> > > Hi Kevin,
> > >
> > > can you please tell us what version of hosted-engine you are running?
> > >
> > > rpm -q ovirt-hosted-engine-ha
> > >
> > > Also, do I understand it correctly that the engine VM is running, but
> you
> > > see bad status when you execute the hosted-engine --vm-status command?
> > >
> > > If that is so, can you give us current logs from
> > > /var/log/ovirt-hosted-engine-ha?
> > >
> > > --
> > > Martin Sivák
> > > msivak(a)redhat.com
> > > Red Hat Czech
> > > RHEV-M SLA / Brno, CZ
> > >
> > > ----- Original Message -----
> > > > OK, I mounted the domain for the hosted engine manually and the agent came up.
> > > >
> > > > But vm-status :
> > > >
> > > > --== Host 2 status ==--
> > > >
> > > > Status up-to-date : False
> > > > Hostname : 192.168.99.103
> > > > Host ID : 2
> > > > Engine status : unknown stale-data
> > > > Score : 0
> > > > Local maintenance : False
> > > > Host timestamp : 1398333438
> > > >
> > > > And in my engine, host02 HA is not active.
> > > >
> > > >
> > > > 2014-04-24 12:48 GMT+02:00 Kevin Tibi <kevintibi(a)hotmail.com>:
> > > >
> > > > > Hi,
> > > > >
> > > > > I tried to reboot my hosts and now [supervdsmServer] is <defunct>.
> > > > >
> > > > > /var/log/vdsm/supervdsm.log
> > > > >
> > > > >
> > > > > MainProcess|Thread-120::DEBUG::2014-04-24
> > > > >
> 12:22:19,955::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper)
> > > > > return validateAccess with None
> > > > > MainProcess|Thread-120::DEBUG::2014-04-24
> > > > >
> 12:22:20,010::supervdsmServer::96::SuperVdsm.ServerCallback::(wrapper)
> > > call
> > > > > validateAccess with ('qemu', ('qemu',
'kvm'),
> > > > > '/rhev/data-center/mnt/host01.ovirt.lan:_home_export', 5)
{}
> > > > > MainProcess|Thread-120::DEBUG::2014-04-24
> > > > >
> 12:22:20,014::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper)
> > > > > return validateAccess with None
> > > > > MainProcess|Thread-120::DEBUG::2014-04-24
> > > > >
> 12:22:20,059::supervdsmServer::96::SuperVdsm.ServerCallback::(wrapper)
> > > call
> > > > > validateAccess with ('qemu', ('qemu',
'kvm'),
> > > > > '/rhev/data-center/mnt/host01.ovirt.lan:_home_iso', 5)
{}
> > > > > MainProcess|Thread-120::DEBUG::2014-04-24
> > > > >
> 12:22:20,063::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper)
> > > > > return validateAccess with None
> > > > >
> > > > > and one host doesn't mount the NFS used for the hosted engine.
> > > > >
> > > > > MainThread::CRITICAL::2014-04-24
> > > > >
> > >
> 12:36:16,603::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
> > > > > Could not start ha-agent
> > > > > Traceback (most recent call last):
> > > > > File
> > > > >
> > >
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> > > > > line 97, in run
> > > > > self._run_agent()
> > > > > File
> > > > >
> > >
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> > > > > line 154, in _run_agent
> > > > >
> > > hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
> > > > > File
> > > > >
> > >
>
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> > > > > line 299, in start_monitoring
> > > > > self._initialize_vdsm()
> > > > > File
> > > > >
> > >
>
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> > > > > line 418, in _initialize_vdsm
> > > > > self._sd_path = env_path.get_domain_path(self._config)
> > > > > File
> > > > >
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/env/path.py",
> > > line
> > > > > 40, in get_domain_path
> > > > > .format(sd_uuid, parent))
> > > > > Exception: path to storage domain
> aea040f8-ab9d-435b-9ecf-ddd4272e592f
> > > not
> > > > > found in /rhev/data-center/mnt
> > > > >
> > > > >
> > > > >
> > > > > 2014-04-23 17:40 GMT+02:00 Kevin Tibi
<kevintibi(a)hotmail.com>:
> > > > >
> > > > > top
> > > > >> 1729 vdsm 20 0 0 0 0 Z 373.8 0.0 252:08.51
> > > > >> ovirt-ha-broker <defunct>
> > > > >>
> > > > >>
> > > > >> [root@host01 ~]# ps axwu | grep 1729
> > > > >> vdsm 1729 0.7 0.0 0 0 ? Zl Apr02
240:24
> > > > >> [ovirt-ha-broker] <defunct>
> > > > >>
> > > > >> [root@host01 ~]# ll
> > > > >>
> > >
>
/rhev/data-center/mnt/host01.ovirt.lan\:_home_NFS01/aea040f8-ab9d-435b-9ecf-ddd4272e592f/ha_agent/
> > > > >> total 2028
> > > > >> -rw-rw----. 1 vdsm kvm 1048576 23 avril 17:35
> hosted-engine.lockspace
> > > > >> -rw-rw----. 1 vdsm kvm 1028096 23 avril 17:35
> hosted-engine.metadata
> > > > >>
> > > > >> cat /var/log/vdsm/vdsm.log
> > > > >>
> > > > >> Thread-120518::DEBUG::2014-04-23
> > > > >> 17:38:02,299::task::1185::TaskManager.Task::(prepare)
> > > > >> Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::finished:
> > > > >> {'aea040f8-ab9d-435b-9ecf-ddd4272e592f':
{'code': 0, 'version':
> 3,
> > > > >> 'acquired': True, 'delay':
'0.000410963', 'lastCheck': '3.4',
> 'valid':
> > > > >> True}, '5ae613a4-44e4-42cb-89fc-7b5d34c1f30f':
{'code': 0,
> 'version':
> > > 3,
> > > > >> 'acquired': True, 'delay':
'0.000412357', 'lastCheck': '6.8',
> 'valid':
> > > > >> True}, 'cc51143e-8ad7-4b0b-a4d2-9024dffc1188':
{'code': 0,
> 'version':
> > > 0,
> > > > >> 'acquired': True, 'delay':
'0.000455292', 'lastCheck': '1.2',
> 'valid':
> > > > >> True}, 'ff98d346-4515-4349-8437-fb2f5e9eaadf':
{'code': 0,
> 'version':
> > > 0,
> > > > >> 'acquired': True, 'delay':
'0.00817113', 'lastCheck': '1.7',
> 'valid':
> > > > >> True}}
> > > > >> Thread-120518::DEBUG::2014-04-23
> > > > >> 17:38:02,300::task::595::TaskManager.Task::(_updateState)
> > > > >> Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::moving from
state
> > > preparing
> > > > >> ->
> > > > >> state finished
> > > > >> Thread-120518::DEBUG::2014-04-23
> > > > >>
> > >
> 17:38:02,300::resourceManager::940::ResourceManager.Owner::(releaseAll)
> > > > >> Owner.releaseAll requests {} resources {}
> > > > >> Thread-120518::DEBUG::2014-04-23
> > > > >>
> 17:38:02,300::resourceManager::977::ResourceManager.Owner::(cancelAll)
> > > > >> Owner.cancelAll requests {}
> > > > >> Thread-120518::DEBUG::2014-04-23
> > > > >> 17:38:02,300::task::990::TaskManager.Task::(_decref)
> > > > >> Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::ref 0 aborting
False
> > > > >> Thread-120518::ERROR::2014-04-23
> > > > >>
> > >
>
17:38:02,302::brokerlink::72::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(connect)
> > > > >> Failed to connect to broker: [Errno 2] No such file or
directory
> > > > >> Thread-120518::ERROR::2014-04-23
> > > > >> 17:38:02,302::API::1612::vds::(_getHaInfo) failed to
retrieve
> Hosted
> > > > >> Engine
> > > > >> HA info
> > > > >> Traceback (most recent call last):
> > > > >> File "/usr/share/vdsm/API.py", line 1603, in
_getHaInfo
> > > > >> stats = instance.get_all_stats()
> > > > >> File
> > > > >>
> > >
>
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py",
> > > > >> line 83, in get_all_stats
> > > > >> with broker.connection():
> > > > >> File "/usr/lib64/python2.6/contextlib.py", line
16, in
> __enter__
> > > > >> return self.gen.next()
> > > > >> File
> > > > >>
> > >
>
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
> > > > >> line 96, in connection
> > > > >> self.connect()
> > > > >> File
> > > > >>
> > >
>
"/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
> > > > >> line 64, in connect
> > > > >> self._socket.connect(constants.BROKER_SOCKET_FILE)
> > > > >> File "<string>", line 1, in connect
> > > > >> error: [Errno 2] No such file or directory
> > > > >> Thread-78::DEBUG::2014-04-23
> > > > >>
17:38:05,490::fileSD::225::Storage.Misc.excCmd::(getReadDelay)
> > > '/bin/dd
> > > > >> iflag=direct
> > > > >>
> > >
>
if=/rhev/data-center/mnt/host01.ovirt.lan:_home_DATA/5ae613a4-44e4-42cb-89fc-7b5d34c1f30f/dom_md/metadata
> > > > >> bs=4096 count=1' (cwd None)
> > > > >> Thread-78::DEBUG::2014-04-23
> > > > >>
17:38:05,523::fileSD::225::Storage.Misc.excCmd::(getReadDelay)
> > > SUCCESS:
> > > > >> <err> = '0+1 records in\n0+1 records out\n545 bytes
(545 B)
> copied,
> > > > >> 0.000412209 s, 1.3 MB/s\n'; <rc> = 0
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> 2014-04-23 17:27 GMT+02:00 Martin Sivak
<msivak(a)redhat.com>:
> > > > >>
> > > > >> Hi Kevin,
> > > > >>>
> > > > >>> > same problem.
> > > > >>>
> > > > >>> Are you missing the lockspace file as well while running on top of GlusterFS?
> > > > >>>
> > > > >>> > ovirt-ha-broker has 400% CPU and is defunct. I can't kill it with -9.
> > > > >>>
> > > > >>> A defunct process eating four full cores? I wonder how that is possible...
> > > > >>> What are the status flags of that process when you do ps axwu?
> > > > >>>
> > > > >>> Can you attach the log files please?
> > > > >>>
> > > > >>> --
> > > > >>> Martin Sivák
> > > > >>> msivak(a)redhat.com
> > > > >>> Red Hat Czech
> > > > >>> RHEV-M SLA / Brno, CZ
> > > > >>>
> > > > >>> ----- Original Message -----
> > > > >>> > same problem. ovirt-ha-broker has 400% CPU and is defunct. I can't kill it with -9.
> > > > >>> >
> > > > >>> >
> > > > >>> > 2014-04-23 13:55 GMT+02:00 Martin Sivak
<msivak(a)redhat.com>:
> > > > >>> >
> > > > >>> > > Hi,
> > > > >>> > >
> > > > >>> > > > Isn't this file created when hosted engine is started?
> > > > >>> > >
> > > > >>> > > The file is created by the setup script. If it got lost then there was
> > > > >>> > > probably something bad happening in your NFS or Gluster storage.
> > > > >>> > >
> > > > >>> > > > Or how can I create this file manually?
> > > > >>> > >
> > > > >>> > > I can give you an experimental treatment for this. We do not have any
> > > > >>> > > official way, as this is something that should not ever happen :)
> > > > >>> > >
> > > > >>> > > !! But before you do that make sure you do not have any nodes running
> > > > >>> > > properly. This will destroy and reinitialize the lockspace database for the
> > > > >>> > > whole hosted-engine environment (which you apparently lack, but..). !!
> > > > >>> > >
> > > > >>> > > You have to create the ha_agent/hosted-engine.lockspace file with the
> > > > >>> > > expected size (1MB) and then tell sanlock to initialize it as a lockspace
> > > > >>> > > using:
> > > > >>> > >
> > > > >>> > > # python
> > > > >>> > > >>> import sanlock
> > > > >>> > > >>> sanlock.write_lockspace(lockspace="hosted-engine",
> > > > >>> > > ...     path="/rhev/data-center/mnt/<nfs>/<hosted engine storage domain>/ha_agent/hosted-engine.lockspace",
> > > > >>> > > ...     offset=0)
> > > > >>> > > >>>
> > > > >>> > >
> > > > >>> > > Then try starting the services (both broker and agent) again.
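(For reference, a rough end-to-end sketch of the treatment above on an EL6 host. The mount point and storage-domain UUID are placeholders, the vdsm:kvm ownership is taken from the directory listings elsewhere in this thread, and -- as warned -- this should only be done with every HA node stopped:)

  LS="/rhev/data-center/mnt/<nfs>/<hosted engine sd uuid>/ha_agent/hosted-engine.lockspace"
  dd if=/dev/zero of="$LS" bs=1M count=1      # recreate the 1MB lockspace file
  chown vdsm:kvm "$LS"
  python -c "import sanlock; sanlock.write_lockspace(lockspace='hosted-engine', path='$LS', offset=0)"
  service ovirt-ha-broker restart
  service ovirt-ha-agent restart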
> > > > >>> > >
> > > > >>> > > --
> > > > >>> > > Martin Sivák
> > > > >>> > > msivak(a)redhat.com
> > > > >>> > > Red Hat Czech
> > > > >>> > > RHEV-M SLA / Brno, CZ
> > > > >>> > >
> > > > >>> > >
> > > > >>> > > ----- Original Message -----
> > > > >>> > > > On 04/23/2014 11:08 AM, Martin Sivak
wrote:
> > > > >>> > > > > Hi René,
> > > > >>> > > > >
> > > > >>> > > > >>>> libvirtError: Failed to
acquire lock: No space left
> on
> > > device
> > > > >>> > > > >
> > > > >>> > > > >>>> 2014-04-22 12:38:17+0200
654 [3093]: r2 cmd_acquire
> > > 2,9,5733
> > > > >>> invalid
> > > > >>> > > > >>>> lockspace found -1 failed
0 name
> > > > >>> > > 2851af27-8744-445d-9fb1-a0d083c8dc82
> > > > >>> > > > >
> > > > >>> > > > > Can you please check the contents of
> > > > >>> > > > > /rhev/data-center/<your nfs mount>/<nfs domain uuid>/ha_agent/?
> > > > >>> > > > >
> > > > >>> > > > > This is how it should look like:
> > > > >>> > > > >
> > > > >>> > > > > [root@dev-03 ~]# ls -al
> > > > >>> > > > >
> > > > >>> > >
> > > > >>>
> > >
>
/rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
> > > > >>> > > > > total 2036
> > > > >>> > > > > drwxr-x---. 2 vdsm kvm 4096 Mar 19
18:46 .
> > > > >>> > > > > drwxr-xr-x. 6 vdsm kvm 4096 Mar 19
18:46 ..
> > > > >>> > > > > -rw-rw----. 1 vdsm kvm 1048576 Apr 23
11:05
> > > > >>> hosted-engine.lockspace
> > > > >>> > > > > -rw-rw----. 1 vdsm kvm 1028096 Mar 19
18:46
> > > > >>> hosted-engine.metadata
> > > > >>> > > > >
> > > > >>> > > > > The errors seem to indicate that you somehow lost the lockspace file.
> > > > >>> > > >
> > > > >>> > > > True :)
> > > > >>> > > > Isn't this file created when hosted engine is started? Or how can I
> > > > >>> > > > create this file manually?
> > > > >>> > > >
> > > > >>> > > > >
> > > > >>> > > > > --
> > > > >>> > > > > Martin Sivák
> > > > >>> > > > > msivak(a)redhat.com
> > > > >>> > > > > Red Hat Czech
> > > > >>> > > > > RHEV-M SLA / Brno, CZ
> > > > >>> > > > >
> > > > >>> > > > > ----- Original Message -----
> > > > >>> > > > >> On 04/23/2014 12:28 AM, Doron
Fediuck wrote:
> > > > >>> > > > >>> Hi Rene,
> > > > >>> > > > >>> any idea what closed your ovirtmgmt bridge?
> > > > >>> > > > >>> as long as it is down vdsm may have issues starting up properly
> > > > >>> > > > >>> and this is why you see the complaints on the rpc server.
> > > > >>> > > > >>>
> > > > >>> > > > >>> Can you try manually fixing the network part first and then
> > > > >>> > > > >>> restart vdsm?
> > > > >>> > > > >>> Once vdsm is happy the hosted engine VM will start.
> > > > >>> > > > >>
> > > > >>> > > > >> Thanks for your feedback, Doron.
> > > > >>> > > > >>
> > > > >>> > > > >> My ovirtmgmt bridge seems to be up, or isn't it:
> > > > >>> > > > >> # brctl show ovirtmgmt
> > > > >>> > > > >> bridge name bridge id
STP enabled
> > > > >>> interfaces
> > > > >>> > > > >> ovirtmgmt
8000.0025907587c2 no
> > > > >>> eth0.200
> > > > >>> > > > >>
> > > > >>> > > > >> # ip a s ovirtmgmt
> > > > >>> > > > >> 7: ovirtmgmt:
<BROADCAST,MULTICAST,UP,LOWER_UP> mtu
> 1500
> > > qdisc
> > > > >>> noqueue
> > > > >>> > > > >> state UNKNOWN
> > > > >>> > > > >> link/ether
00:25:90:75:87:c2 brd
> ff:ff:ff:ff:ff:ff
> > > > >>> > > > >> inet 10.0.200.102/24 brd
10.0.200.255 scope
> global
> > > > >>> ovirtmgmt
> > > > >>> > > > >> inet6
fe80::225:90ff:fe75:87c2/64 scope link
> > > > >>> > > > >> valid_lft forever
preferred_lft forever
> > > > >>> > > > >>
> > > > >>> > > > >> # ip a s eth0.200
> > > > >>> > > > >> 6: eth0.200@eth0:
<BROADCAST,MULTICAST,UP,LOWER_UP>
> mtu
> > > 1500
> > > > >>> qdisc
> > > > >>> > > > >> noqueue state UP
> > > > >>> > > > >> link/ether
00:25:90:75:87:c2 brd
> ff:ff:ff:ff:ff:ff
> > > > >>> > > > >> inet6
fe80::225:90ff:fe75:87c2/64 scope link
> > > > >>> > > > >> valid_lft forever
preferred_lft forever
> > > > >>> > > > >>
> > > > >>> > > > >> I tried the following yesterday:
> > > > >>> > > > >> Copy virtual disk from GlusterFS storage to local disk of host and
> > > > >>> > > > >> create a new vm with virt-manager which loads ovirtmgmt disk. I could
> > > > >>> > > > >> reach my engine over the ovirtmgmt bridge (so bridge must be working).
> > > > >>> > > > >>
> > > > >>> > > > >> I also started libvirtd with option -v and I saw the following in
> > > > >>> > > > >> libvirtd.log when trying to start ovirt engine:
> > > > >>> > > > >> 2014-04-22 14:18:25.432+0000: 8901: debug : virCommandRunAsync:2250 : Command result 0, with PID 11491
> > > > >>> > > > >> 2014-04-22 14:18:25.478+0000: 8901: debug : virCommandRun:2045 : Result exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is not a chain
> > > > >>> > > > >>
> > > > >>> > > > >> So it could be that something is broken in my hosted-engine network. Do
> > > > >>> > > > >> you have any clue how I can troubleshoot this?
> > > > >>> > > > >>
> > > > >>> > > > >>
> > > > >>> > > > >> Thanks,
> > > > >>> > > > >> René
> > > > >>> > > > >>
> > > > >>> > > > >>
> > > > >>> > > > >>>
> > > > >>> > > > >>> ----- Original Message -----
> > > > >>> > > > >>>> From: "René
Koch" <rkoch(a)linuxland.at>
> > > > >>> > > > >>>> To: "Martin
Sivak" <msivak(a)redhat.com>
> > > > >>> > > > >>>> Cc: users(a)ovirt.org
> > > > >>> > > > >>>> Sent: Tuesday, April 22,
2014 1:46:38 PM
> > > > >>> > > > >>>> Subject: Re:
[ovirt-users] hosted engine health check
> > > issues
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> Hi,
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> I rebooted one of my
ovirt hosts today and the
> result is
> > > now
> > > > >>> that I
> > > > >>> > > > >>>> can't start
hosted-engine anymore.
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> ovirt-ha-agent isn't
running because the lockspace
> file is
> > > > >>> missing
> > > > >>> > > > >>>> (sanlock complains about
it).
> > > > >>> > > > >>>> So I tried to start
hosted-engine with --vm-start
> and I
> > > get
> > > > >>> the
> > > > >>> > > > >>>> following errors:
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> ==>
/var/log/sanlock.log <==
> > > > >>> > > > >>>> 2014-04-22 12:38:17+0200
654 [3093]: r2 cmd_acquire
> > > 2,9,5733
> > > > >>> invalid
> > > > >>> > > > >>>> lockspace found -1 failed
0 name
> > > > >>> > > 2851af27-8744-445d-9fb1-a0d083c8dc82
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> ==> /var/log/messages
<==
> > > > >>> > > > >>>> Apr 22 12:38:17
ovirt-host02 sanlock[3079]:
> 2014-04-22
> > > > >>> > > 12:38:17+0200 654
> > > > >>> > > > >>>> [3093]: r2 cmd_acquire
2,9,5733 invalid lockspace
> found -1
> > > > >>> failed 0
> > > > >>> > > name
> > > > >>> > > > >>>>
2851af27-8744-445d-9fb1-a0d083c8dc82
> > > > >>> > > > >>>> Apr 22 12:38:17
ovirt-host02 kernel: ovirtmgmt: port
> > > 2(vnet0)
> > > > >>> > > entering
> > > > >>> > > > >>>> disabled state
> > > > >>> > > > >>>> Apr 22 12:38:17
ovirt-host02 kernel: device vnet0
> left
> > > > >>> promiscuous
> > > > >>> > > mode
> > > > >>> > > > >>>> Apr 22 12:38:17
ovirt-host02 kernel: ovirtmgmt: port
> > > 2(vnet0)
> > > > >>> > > entering
> > > > >>> > > > >>>> disabled state
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> ==>
/var/log/vdsm/vdsm.log <==
> > > > >>> > > > >>>>
Thread-21::DEBUG::2014-04-22
> > > > >>> > > > >>>>
12:38:17,563::libvirtconnection::124::root::(wrapper)
> > > Unknown
> > > > >>> > > > >>>> libvirterror: ecode: 38
edom: 42 level: 2 message:
> Failed
> > > to
> > > > >>> acquire
> > > > >>> > > > >>>> lock: No space left on
device
> > > > >>> > > > >>>>
Thread-21::DEBUG::2014-04-22
> > > > >>> > > > >>>>
12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
> > > > >>> > > > >>>>
> > > vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations
> > > > >>> > > released
> > > > >>> > > > >>>>
Thread-21::ERROR::2014-04-22
> > > > >>> > > > >>>>
12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
> > > > >>> > > > >>>>
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm
> start
> > > > >>> process
> > > > >>> > > failed
> > > > >>> > > > >>>> Traceback (most recent
call last):
> > > > >>> > > > >>>> File
"/usr/share/vdsm/vm.py", line 2249, in
> > > > >>> _startUnderlyingVm
> > > > >>> > > > >>>> self._run()
> > > > >>> > > > >>>> File
"/usr/share/vdsm/vm.py", line 3170, in _run
> > > > >>> > > > >>>>
self._connection.createXML(domxml, flags),
> > > > >>> > > > >>>> File
> > > > >>> > > > >>>>
> > > > >>>
"/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py",
> > > > >>> > > > >>>> line 92, in wrapper
> > > > >>> > > > >>>> ret = f(*args,
**kwargs)
> > > > >>> > > > >>>> File
> "/usr/lib64/python2.6/site-packages/libvirt.py",
> > > > >>> line
> > > > >>> > > 2665, in
> > > > >>> > > > >>>> createXML
> > > > >>> > > > >>>> if ret is
None:raise
> > > libvirtError('virDomainCreateXML()
> > > > >>> > > failed',
> > > > >>> > > > >>>> conn=self)
> > > > >>> > > > >>>> libvirtError: Failed to
acquire lock: No space left
> on
> > > device
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> ==> /var/log/messages
<==
> > > > >>> > > > >>>> Apr 22 12:38:17
ovirt-host02 vdsm vm.Vm ERROR
> > > > >>> > > > >>>>
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm
> start
> > > > >>> process
> > > > >>> > > > >>>> failed#012Traceback (most
recent call last):#012
> File
> > > > >>> > > > >>>>
"/usr/share/vdsm/vm.py", line 2249, in
> > > _startUnderlyingVm#012
> > > > >>> > > > >>>> self._run()#012 File
"/usr/share/vdsm/vm.py", line
> 3170,
> > > in
> > > > >>> > > _run#012
> > > > >>> > > > >>>>
self._connection.createXML(domxml, flags),#012
> File
> > > > >>> > > > >>>>
> > > > >>>
"/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py",
> > > > >>> > > line 92,
> > > > >>> > > > >>>> in wrapper#012 ret =
f(*args, **kwargs)#012 File
> > > > >>> > > > >>>>
"/usr/lib64/python2.6/site-packages/libvirt.py", line
> > > 2665, in
> > > > >>> > > > >>>> createXML#012 if ret
is None:raise
> > > > >>> > > libvirtError('virDomainCreateXML()
> > > > >>> > > > >>>> failed',
conn=self)#012libvirtError: Failed to
> acquire
> > > lock:
> > > > >>> No
> > > > >>> > > space
> > > > >>> > > > >>>> left on device
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> ==>
/var/log/vdsm/vdsm.log <==
> > > > >>> > > > >>>>
Thread-21::DEBUG::2014-04-22
> > > > >>> > > > >>>>
12:38:17,569::vm::2731::vm.Vm::(setDownStatus)
> > > > >>> > > > >>>>
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed
> > > state to
> > > > >>> Down:
> > > > >>> > > > >>>> Failed to acquire lock:
No space left on device
> > > > >>> > > > >>>>
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> No space left on device
is nonsense as there is
> enough
> > > space
> > > > >>> (I had
> > > > >>> > > this
> > > > >>> > > > >>>> issue last time as well
where I had to patch
> machine.py,
> > > but
> > > > >>> this
> > > > >>> > > file
> > > > >>> > > > >>>> is now Python 2.6.6
compatible.
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> Any idea what prevents
hosted-engine from starting?
> > > > >>> > > > >>>> ovirt-ha-broker, vdsmd
and sanlock are running btw.
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> Btw, I can see in log
that json rpc server module is
> > > missing
> > > > >>> - which
> > > > >>> > > > >>>> package is required for
CentOS 6.5?
> > > > >>> > > > >>>> Apr 22 12:37:14
ovirt-host02 vdsm vds WARNING Unable
> to
> > > load
> > > > >>> the
> > > > >>> > > json
> > > > >>> > > > >>>> rpc server module. Please
make sure it is installed.
> > > > >>> > > > >>>>
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> Thanks,
> > > > >>> > > > >>>> René
> > > > >>> > > > >>>>
> > > > >>> > > > >>>>
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> On 04/17/2014 10:02 AM,
Martin Sivak wrote:
> > > > >>> > > > >>>>> Hi,
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>>>>> How can I disable notifications?
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>> The notification is configured in
> > > > >>> > > > >>>>> /etc/ovirt-hosted-engine-ha/broker.conf, section notification.
> > > > >>> > > > >>>>> The email is sent when the key state_transition exists and the string
> > > > >>> > > > >>>>> OldState-NewState contains the (case insensitive) regexp from the value.
> > > > >>> > > > >>>>>
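A hedged sketch of turning those mails off based on that description -- it assumes the key appears as "state_transition=<regexp>" in broker.conf, and "none" is simply a regexp that matches no OldState-NewState string:

  sed -i 's/^state_transition=.*/state_transition=none/' /etc/ovirt-hosted-engine-ha/broker.conf
  service ovirt-ha-broker restart   # pick up the changed configuration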
> > > > >>> > > > >>>>>>>> Is it intended to send out these messages and detect that ovirt
> > > > >>> > > > >>>>>>>> engine is down (which is false anyway), but not to restart the vm?
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>> Forget about emails for now and check the
> > > > >>> > > > >>>>> /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach
> > > > >>> > > > >>>>> them as well btw).
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>>>>> oVirt hosts think that hosted engine is down because it seems that
> > > > >>> > > > >>>>>>>> hosts can't write to hosted-engine.lockspace due to glusterfs issues
> > > > >>> > > > >>>>>>>> (or at least I think so).
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>> The hosts think so or can't really write there? The lockspace is
> > > > >>> > > > >>>>> managed by sanlock and our HA daemons do not touch it at all. We only
> > > > >>> > > > >>>>> ask sanlock to make sure we have a unique server id.
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>>>>> Is it possible or planned to make the whole ha feature optional?
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>> Well the system won't perform any automatic actions if you put the
> > > > >>> > > > >>>>> hosted engine to global maintenance and only start/stop/migrate the VM
> > > > >>> > > > >>>>> manually.
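A hedged illustration of that mode with the hosted-engine CLI (option names as in the 3.4-era tool; check hosted-engine --help on your version):

  hosted-engine --set-maintenance --mode=global   # HA agents stop taking automatic actions
  hosted-engine --vm-status                       # verify the reported state
  hosted-engine --set-maintenance --mode=none     # leave global maintenance again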
> > > > >>> > > > >>>>> I would discourage you from stopping agent/broker, because the engine
> > > > >>> > > > >>>>> itself has some logic based on the reporting.
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>> Regards
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>> --
> > > > >>> > > > >>>>> Martin Sivák
> > > > >>> > > > >>>>> msivak(a)redhat.com
> > > > >>> > > > >>>>> Red Hat Czech
> > > > >>> > > > >>>>> RHEV-M SLA / Brno,
CZ
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>> ----- Original
Message -----
> > > > >>> > > > >>>>>> On 04/15/2014
04:53 PM, Jiri Moskovcak wrote:
> > > > >>> > > > >>>>>>> On 04/14/2014
10:50 AM, René Koch wrote:
> > > > >>> > > > >>>>>>>> Hi,
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> I have
some issues with hosted engine status.
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> oVirt
hosts think that hosted engine is down
> because
> > > it
> > > > >>> seems
> > > > >>> > > that
> > > > >>> > > > >>>>>>>> hosts
> > > > >>> > > > >>>>>>>> can't
write to hosted-engine.lockspace due to
> > > glusterfs
> > > > >>> issues
> > > > >>> > > (or
> > > > >>> > > > >>>>>>>> at
> > > > >>> > > > >>>>>>>> least I
think so).
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>>
Here's the output of vm-status:
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> #
hosted-engine --vm-status
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> --== Host
1 status ==--
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> Status
up-to-date : False
> > > > >>> > > > >>>>>>>> Hostname
: 10.0.200.102
> > > > >>> > > > >>>>>>>> Host ID
: 1
> > > > >>> > > > >>>>>>>> Engine
status : unknown
> > > stale-data
> > > > >>> > > > >>>>>>>> Score
: 2400
> > > > >>> > > > >>>>>>>> Local
maintenance : False
> > > > >>> > > > >>>>>>>> Host
timestamp : 1397035677
> > > > >>> > > > >>>>>>>> Extra
metadata (valid at timestamp):
> > > > >>> > > > >>>>>>>>
metadata_parse_version=1
> > > > >>> > > > >>>>>>>>
metadata_feature_version=1
> > > > >>> > > > >>>>>>>>
timestamp=1397035677 (Wed Apr 9 11:27:57
> > > 2014)
> > > > >>> > > > >>>>>>>>
host-id=1
> > > > >>> > > > >>>>>>>>
score=2400
> > > > >>> > > > >>>>>>>>
maintenance=False
> > > > >>> > > > >>>>>>>>
state=EngineUp
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> --== Host
2 status ==--
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> Status
up-to-date : True
> > > > >>> > > > >>>>>>>> Hostname
: 10.0.200.101
> > > > >>> > > > >>>>>>>> Host ID
: 2
> > > > >>> > > > >>>>>>>> Engine
status : {'reason':
> 'vm
> > > not
> > > > >>> running
> > > > >>> > > on
> > > > >>> > > > >>>>>>>> this
> > > > >>> > > > >>>>>>>>
host', 'health': 'bad', 'vm': 'down',
'detail':
> > > 'unknown'}
> > > > >>> > > > >>>>>>>> Score
: 0
> > > > >>> > > > >>>>>>>> Local
maintenance : False
> > > > >>> > > > >>>>>>>> Host
timestamp : 1397464031
> > > > >>> > > > >>>>>>>> Extra
metadata (valid at timestamp):
> > > > >>> > > > >>>>>>>>
metadata_parse_version=1
> > > > >>> > > > >>>>>>>>
metadata_feature_version=1
> > > > >>> > > > >>>>>>>>
timestamp=1397464031 (Mon Apr 14 10:27:11
> > > 2014)
> > > > >>> > > > >>>>>>>>
host-id=2
> > > > >>> > > > >>>>>>>>
score=0
> > > > >>> > > > >>>>>>>>
maintenance=False
> > > > >>> > > > >>>>>>>>
state=EngineUnexpectedlyDown
> > > > >>> > > > >>>>>>>>
timeout=Mon Apr 14 10:35:05 2014
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> oVirt
engine is sending me 2 emails every 10
> minutes
> > > with
> > > > >>> the
> > > > >>> > > > >>>>>>>>
following
> > > > >>> > > > >>>>>>>>
subjects:
> > > > >>> > > > >>>>>>>> -
ovirt-hosted-engine state transition
> > > > >>> EngineDown-EngineStart
> > > > >>> > > > >>>>>>>> -
ovirt-hosted-engine state transition
> > > > >>> EngineStart-EngineUp
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> In oVirt
webadmin I can see the following
> message:
> > > > >>> > > > >>>>>>>> VM
HostedEngine is down. Exit message: internal
> error
> > > > >>> Failed to
> > > > >>> > > > >>>>>>>> acquire
> > > > >>> > > > >>>>>>>> lock:
error -243.
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> These
messages are really annoying as oVirt isn't
> > > doing
> > > > >>> anything
> > > > >>> > > > >>>>>>>> with
> > > > >>> > > > >>>>>>>> hosted
engine - I have an uptime of 9 days in my
> > > engine
> > > > >>> vm.
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> So my
questions are now:
> > > > >>> > > > >>>>>>>> Is it
intended to send out these messages and
> detect
> > > that
> > > > >>> ovirt
> > > > >>> > > > >>>>>>>> engine
> > > > >>> > > > >>>>>>>> is down
(which is false anyway), but not to
> restart
> > > the
> > > > >>> vm?
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> How can I disable notifications? I'm planning to write a Nagios
> > > > >>> > > > >>>>>>>> plugin which parses the output of hosted-engine --vm-status and only
> > > > >>> > > > >>>>>>>> Nagios should notify me, not hosted-engine script.
> > > > >>> > > > >>>>>>>>
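A hedged sketch of such a check -- it assumes a healthy engine shows 'health': 'good' in the --vm-status output (mirroring the 'health': 'bad' seen above) and uses the usual Nagios exit codes:

  #!/bin/sh
  # check_hosted_engine.sh -- hypothetical plugin name
  if hosted-engine --vm-status | grep -q "'health': 'good'"; then
      echo "OK - hosted engine VM reported healthy"
      exit 0
  else
      echo "CRITICAL - hosted engine VM not reported healthy"
      exit 2
  fi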
> > > > >>> > > > >>>>>>>> Is it possible or planned to make the whole ha feature optional? I
> > > > >>> > > > >>>>>>>> really
really really hate cluster software as it
> > > causes
> > > > >>> more
> > > > >>> > > > >>>>>>>> troubles
> > > > >>> > > > >>>>>>>> then
standalone machines and in my case the
> > > hosted-engine
> > > > >>> ha
> > > > >>> > > feature
> > > > >>> > > > >>>>>>>> really
causes troubles (and I didn't had a
> hardware or
> > > > >>> network
> > > > >>> > > > >>>>>>>> outage
> > > > >>> > > > >>>>>>>> yet only
issues with hosted-engine ha agent). I
> don't
> > > > >>> need any
> > > > >>> > > ha
> > > > >>> > > > >>>>>>>> feature
for hosted engine. I just want to run
> engine
> > > > >>> > > virtualized on
> > > > >>> > > > >>>>>>>> oVirt and
if engine vm fails (e.g. because of
> issues
> > > with
> > > > >>> a
> > > > >>> > > host)
> > > > >>> > > > >>>>>>>> I'll
> > > > >>> > > > >>>>>>>> restart
it on another node.
> > > > >>> > > > >>>>>>>
> > > > >>> > > > >>>>>>> Hi, you can:
> > > > >>> > > > >>>>>>> 1. edit
> > > > >>> /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and
> > > > >>> > > tweak
> > > > >>> > > > >>>>>>> the logger as
you like
> > > > >>> > > > >>>>>>> 2. or kill
ovirt-ha-broker & ovirt-ha-agent
> services
> > > > >>> > > > >>>>>>
> > > > >>> > > > >>>>>> Thanks for the
information.
> > > > >>> > > > >>>>>> So engine is able
to run when ovirt-ha-broker and
> > > > >>> ovirt-ha-agent
> > > > >>> > > isn't
> > > > >>> > > > >>>>>> running?
> > > > >>> > > > >>>>>>
> > > > >>> > > > >>>>>>
> > > > >>> > > > >>>>>> Regards,
> > > > >>> > > > >>>>>> René
> > > > >>> > > > >>>>>>
> > > > >>> > > > >>>>>>>
> > > > >>> > > > >>>>>>> --Jirka
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> Thanks,
> > > > >>> > > > >>>>>>>> René
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>
> > > > >>> > > > >>>>>>
_______________________________________________
> > > > >>> > > > >>>>>> Users mailing
list
> > > > >>> > > > >>>>>> Users(a)ovirt.org
> > > > >>> > > > >>>>>>
http://lists.ovirt.org/mailman/listinfo/users
> > > > >>> > > > >>>>>>
> > > > >>> > > > >>>>
_______________________________________________
> > > > >>> > > > >>>> Users mailing list
> > > > >>> > > > >>>> Users(a)ovirt.org
> > > > >>> > > > >>>>
http://lists.ovirt.org/mailman/listinfo/users
> > > > >>> > > > >>>>
> > > > >>> > > > >>
> > > > >>> > > >
> > > > >>> > >
_______________________________________________
> > > > >>> > > Users mailing list
> > > > >>> > > Users(a)ovirt.org
> > > > >>> > >
http://lists.ovirt.org/mailman/listinfo/users
> > > > >>> > >
> > > > >>> >
> > > > >>> _______________________________________________
> > > > >>> Users mailing list
> > > > >>> Users(a)ovirt.org
> > > > >>>
http://lists.ovirt.org/mailman/listinfo/users
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > > >
> > > _______________________________________________
> > > Users mailing list
> > > Users(a)ovirt.org
> > >
http://lists.ovirt.org/mailman/listinfo/users
> > >
> >
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
>
http://lists.ovirt.org/mailman/listinfo/users
>