
I'm on CentOS 6.5 and this repo is for Fedora...
2014-04-28 12:16 GMT+02:00 Kevin Tibi <kevintibi@hotmail.com>:
Hi,
qemu-kvm-0.12.1.2-2.415.el6_5.8.x86_64
libvirt-0.10.2-29.el6_5.7.x86_64
vdsm-4.14.6-0.el6.x86_64
kernel-2.6.32-431.el6.x86_64
kernel-2.6.32-431.11.2.el6.x86_64
I added this repo and tried to update.
2014-04-28 11:57 GMT+02:00 Martin Sivak <msivak@redhat.com>:
Hi Kevin,
thanks for the information.
Agent.log and broker.log say nothing.
Can you please attach those files? I would like to see how the crashed QEMU process is reported to us and what state machine transitions cause the load.
07:23:58,994::libvirtconnection::124::root::(wrapper) Unknown libvirterror: ecode: 84 edom: 10 level: 2 message: Operation not supported: live disk snapshot not supported with this QEMU binary
What are the versions of vdsm, libvirt, qemu-kvm and kernel?
If you feel like it try updating virt packages from the virt-preview repository: http://fedoraproject.org/wiki/Virtualization_Preview_Repository
-- Martin Sivák msivak@redhat.com Red Hat Czech RHEV-M SLA / Brno, CZ
Hi,
I use this version: ovirt-hosted-engine-ha-1.1.2-1.el6.noarch
For 3 days my hosted-engine HA worked perfectly, but then I tried to snapshot a VM and the HA service went defunct ==> 400% CPU!!
Agent.log and broker.log say nothing, but in vdsm.log I have errors:
Thread-9462::DEBUG::2014-04-28 07:23:58,994::libvirtconnection::124::root::(wrapper) Unknown libvirterror: ecode: 84 edom: 10 level: 2 message: Operation not supported: live disk snapshot not supported with this QEMU binary
Thread-9462::ERROR::2014-04-28 07:23:58,995::vm::4006::vm.Vm::(snapshot) vmId=`773f6e6d-c670-49f3-ae8c-dfbcfa22d0a5`::Unable to take snapshot
Thread-9352::DEBUG::2014-04-28 08:41:39,922::lvm::295::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'r|.*|\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
cc51143e-8ad7-4b0b-a4d2-9024dffc1188 ff98d346-4515-4349-8437-fb2f5e9eaadf' (cwd None)
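(Side note on the "live disk snapshot not supported with this QEMU binary" error above: on EL6 this usually points at the qemu build itself rather than at vdsm or libvirt configuration. A quick, rough check of which package provides the emulator actually in use; the path below assumes a stock CentOS 6 layout:

readlink -f /usr/libexec/qemu-kvm    # resolve the emulator binary libvirt launches on EL6
rpm -qf /usr/libexec/qemu-kvm        # show which qemu-kvm package owns it

If that is the plain qemu-kvm package, the snapshot error is what you would typically expect; a qemu build with live snapshot support would be needed.)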
I'll try to reboot my node with hosted-engine.
2014-04-25 13:54 GMT+02:00 Martin Sivak <msivak@redhat.com>:
Hi Kevin,
can you please tell us what version of hosted-engine you are running?
rpm -q ovirt-hosted-engine-ha
Also, do I understand it correctly that the engine VM is running, but you see a bad status when you execute the hosted-engine --vm-status command?
If that is so, can you give us current logs from /var/log/ovirt-hosted-engine-ha?
-- Martin Sivák msivak@redhat.com Red Hat Czech RHEV-M SLA / Brno, CZ
----- Original Message -----
OK, I mounted the hosted-engine storage domain manually and the agent came up.
But vm-status shows:
--== Host 2 status ==--
Status up-to-date          : False
Hostname                   : 192.168.99.103
Host ID                    : 2
Engine status              : unknown stale-data
Score                      : 0
Local maintenance          : False
Host timestamp             : 1398333438
And in my engine, HA is not active for host02.
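(A side note here: "unknown stale-data" with a score of 0 usually just means the agent on that host has stopped refreshing its metadata. Once the hosted-engine storage is reachable again, restarting both HA services and re-checking the status is normally enough; the service names below are the ones referenced elsewhere in this thread:

service ovirt-ha-broker restart
service ovirt-ha-agent restart
hosted-engine --vm-status

If the score stays at 0 after that, agent.log on the affected host should say why.)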
2014-04-24 12:48 GMT+02:00 Kevin Tibi <kevintibi@hotmail.com>:
Hi,
I tried to reboot my hosts and now [supervdsmServer] is <defunct>.
/var/log/vdsm/supervdsm.log
MainProcess|Thread-120::DEBUG::2014-04-24 12:22:19,955::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None
MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,010::supervdsmServer::96::SuperVdsm.ServerCallback::(wrapper) call validateAccess with ('qemu', ('qemu', 'kvm'), '/rhev/data-center/mnt/host01.ovirt.lan:_home_export', 5) {}
MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,014::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None
MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,059::supervdsmServer::96::SuperVdsm.ServerCallback::(wrapper) call validateAccess with ('qemu', ('qemu', 'kvm'), '/rhev/data-center/mnt/host01.ovirt.lan:_home_iso', 5) {}
MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,063::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None
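(Quick aside: Martin's question further down about process status flags applies here as well; a minimal check, nothing oVirt-specific:

ps axwu | grep -E 'supervdsm|ovirt-ha'

A "Z" in the STAT column confirms the <defunct>, i.e. zombie, state.)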
and one host doesn't mount the NFS share used for the hosted engine.
MainThread::CRITICAL::2014-04-24 12:36:16,603::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
    self._run_agent()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
    hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 299, in start_monitoring
    self._initialize_vdsm()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 418, in _initialize_vdsm
    self._sd_path = env_path.get_domain_path(self._config)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/env/path.py", line 40, in get_domain_path
    .format(sd_uuid, parent))
Exception: path to storage domain aea040f8-ab9d-435b-9ecf-ddd4272e592f not found in /rhev/data-center/mnt
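(The exception above just means the hosted-engine NFS domain is not mounted under /rhev/data-center/mnt when the agent starts, which matches the manual-mount workaround mentioned earlier in the thread. A rough sketch of the check and the manual mount; the export path below is an assumption reconstructed from the mount-point name seen in the quoted logs, so substitute your real export:

ls /rhev/data-center/mnt/
mkdir -p '/rhev/data-center/mnt/host01.ovirt.lan:_home_NFS01'
mount -t nfs host01.ovirt.lan:/home/NFS01 '/rhev/data-center/mnt/host01.ovirt.lan:_home_NFS01'   # assumed export, derived from the _home_NFS01 mount-point name

Restarting ovirt-ha-agent afterwards should let it find the domain path again.)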
2014-04-23 17:40 GMT+02:00 Kevin Tibi <kevintibi@hotmail.com>:
top
1729 vdsm 20 0 0 0 0 Z 373.8 0.0 252:08.51 ovirt-ha-broker <defunct>

[root@host01 ~]# ps axwu | grep 1729
vdsm 1729 0.7 0.0 0 0 ? Zl Apr02 240:24 [ovirt-ha-broker] <defunct>

[root@host01 ~]# ll /rhev/data-center/mnt/host01.ovirt.lan\:_home_NFS01/aea040f8-ab9d-435b-9ecf-ddd4272e592f/ha_agent/
total 2028
-rw-rw----. 1 vdsm kvm 1048576 23 avril 17:35 hosted-engine.lockspace
-rw-rw----. 1 vdsm kvm 1028096 23 avril 17:35 hosted-engine.metadata

cat /var/log/vdsm/vdsm.log

Thread-120518::DEBUG::2014-04-23 17:38:02,299::task::1185::TaskManager.Task::(prepare) Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::finished: {'aea040f8-ab9d-435b-9ecf-ddd4272e592f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000410963', 'lastCheck': '3.4', 'valid': True}, '5ae613a4-44e4-42cb-89fc-7b5d34c1f30f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000412357', 'lastCheck': '6.8', 'valid': True}, 'cc51143e-8ad7-4b0b-a4d2-9024dffc1188': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.000455292', 'lastCheck': '1.2', 'valid': True}, 'ff98d346-4515-4349-8437-fb2f5e9eaadf': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00817113', 'lastCheck': '1.7', 'valid': True}}
Thread-120518::DEBUG::2014-04-23 17:38:02,300::task::595::TaskManager.Task::(_updateState) Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::moving from state preparing -> state finished
Thread-120518::DEBUG::2014-04-23 17:38:02,300::resourceManager::940::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-120518::DEBUG::2014-04-23 17:38:02,300::resourceManager::977::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-120518::DEBUG::2014-04-23 17:38:02,300::task::990::TaskManager.Task::(_decref) Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::ref 0 aborting False
Thread-120518::ERROR::2014-04-23 17:38:02,302::brokerlink::72::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(connect) Failed to connect to broker: [Errno 2] No such file or directory
Thread-120518::ERROR::2014-04-23 17:38:02,302::API::1612::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1603, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 83, in get_all_stats
    with broker.connection():
  File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 96, in connection
    self.connect()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 64, in connect
    self._socket.connect(constants.BROKER_SOCKET_FILE)
  File "<string>", line 1, in connect
error: [Errno 2] No such file or directory
Thread-78::DEBUG::2014-04-23 17:38:05,490::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_DATA/5ae613a4-44e4-42cb-89fc-7b5d34c1f30f/dom_md/metadata
> bs=4096 count=1' (cwd None) > Thread-78::DEBUG::2014-04-23 > 17:38:05,523::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: > <err> = '0+1 records in\n0+1 records out\n545 bytes (545 B) copied, > 0.000412209 s, 1.3 MB/s\n'; <rc> = 0 > > > > > 2014-04-23 17:27 GMT+02:00 Martin Sivak <msivak@redhat.com>: > > Hi Kevin, >> >> > same pb. >> >> Are you missing the lockspace file as well while running on top of >> GlusterFS? >> >> > ovirt-ha-broker have 400% cpu and is defunct. I can't kill with -9. >> >> Defunct process eating full four cores? I wonder how is that possible.. >> What are the status flags of that process when you do ps axwu? >> >> Can you attach the log files please? >> >> -- >> Martin Sivák >> msivak@redhat.com >> Red Hat Czech >> RHEV-M SLA / Brno, CZ >> >> ----- Original Message ----- >> > same pb. ovirt-ha-broker have 400% cpu and is defunct. I can't kill >> with -9. >> > >> > >> > 2014-04-23 13:55 GMT+02:00 Martin Sivak <msivak@redhat.com>: >> > >> > > Hi, >> > > >> > > > Isn't this file created when hosted engine is started? >> > > >> > > The file is created by the setup script. If it got lost then there >> was >> > > probably something bad happening in your NFS or Gluster storage. >> > > >> > > > Or how can I create this file manually? >> > > >> > > I can give you experimental treatment for this. We do not have any >> > > official way as this is something that should not ever happen :) >> > > >> > > !! But before you do that make sure you do not have any nodes running >> > > properly. This will destroy and reinitialize the lockspace database >> for the >> > > whole hosted-engine environment (which you apparently lack, but..). >> !! >> > > >> > > You have to create the ha_agent/hosted-engine.lockspace file with the >> > > expected size (1MB) and then tell sanlock to initialize it as a >> lockspace >> > > using: >> > > >> > > # python >> > > >>> import sanlock >> > > >>> sanlock.write_lockspace(lockspace="hosted-engine", >> > > ... path="/rhev/data-center/mnt/<nfs>/<hosted engine storage >> > > domain>/ha_agent/hosted-engine.lockspace", >> > > ... offset=0) >> > > >>> >> > > >> > > Then try starting the services (both broker and agent) again. >> > > >> > > -- >> > > Martin Sivák >> > > msivak@redhat.com >> > > Red Hat Czech >> > > RHEV-M SLA / Brno, CZ >> > > >> > > >> > > ----- Original Message ----- >> > > > On 04/23/2014 11:08 AM, Martin Sivak wrote: >> > > > > Hi René, >> > > > > >> > > > >>>> libvirtError: Failed to acquire lock: No space left on device >> > > > > >> > > > >>>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 >> invalid >> > > > >>>> lockspace found -1 failed 0 name >> > > 2851af27-8744-445d-9fb1-a0d083c8dc82 >> > > > > >> > > > > Can you please check the contents of /rhev/data-center/<your nfs >> > > > > mount>/<nfs domain uuid>/ha_agent/? >> > > > > >> > > > > This is how it should look like: >> > > > > >> > > > > [root@dev-03 ~]# ls -al >> > > > > >> > > >>
/rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
>> > > > >>>>>>>> host-id=1 >> > > > >>>>>>>> score=2400 >> > > > >>>>>>>> maintenance=False >> > > > >>>>>>>> state=EngineUp >> > > > >>>>>>>> >> > > > >>>>>>>> >> > > > >>>>>>>> --== Host 2 status ==-- >> > > > >>>>>>>> >> > > > >>>>>>>> Status up-to-date : True >> > > > >>>>>>>> Hostname : 10.0.200.101 >> > > > >>>>>>>> Host ID : 2 >> > > > >>>>>>>> Engine status : {'reason': 'vm not >> running >> > > on >> > > > >>>>>>>> this >> > > > >>>>>>>> host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'} >> > > > >>>>>>>> Score : 0 >> > > > >>>>>>>> Local maintenance : False >> > > > >>>>>>>> Host timestamp : 1397464031 >> > > > >>>>>>>> Extra metadata (valid at timestamp): >> > > > >>>>>>>> metadata_parse_version=1 >> > > > >>>>>>>> metadata_feature_version=1 >> > > > >>>>>>>> timestamp=1397464031 (Mon Apr 14 10:27:11
>> > > > > total 2036 >> > > > > drwxr-x---. 2 vdsm kvm 4096 Mar 19 18:46 . >> > > > > drwxr-xr-x. 6 vdsm kvm 4096 Mar 19 18:46 .. >> > > > > -rw-rw----. 1 vdsm kvm 1048576 Apr 23 11:05 >> hosted-engine.lockspace >> > > > > -rw-rw----. 1 vdsm kvm 1028096 Mar 19 18:46 >> hosted-engine.metadata >> > > > > >> > > > > The errors seem to indicate that you somehow lost the lockspace >> file. >> > > > >> > > > True :) >> > > > Isn't this file created when hosted engine is started? Or how can I >> > > > create this file manually? >> > > > >> > > > > >> > > > > -- >> > > > > Martin Sivák >> > > > > msivak@redhat.com >> > > > > Red Hat Czech >> > > > > RHEV-M SLA / Brno, CZ >> > > > > >> > > > > ----- Original Message ----- >> > > > >> On 04/23/2014 12:28 AM, Doron Fediuck wrote: >> > > > >>> Hi Rene, >> > > > >>> any idea what closed your ovirtmgmt bridge? >> > > > >>> as long as it is down vdsm may have issues starting up properly >> > > > >>> and this is why you see the complaints on the rpc server. >> > > > >>> >> > > > >>> Can you try manually fixing the network part first and then >> > > > >>> restart vdsm? >> > > > >>> Once vdsm is happy hosted engine VM will start. >> > > > >> >> > > > >> Thanks for your feedback, Doron. >> > > > >> >> > > > >> My ovirtmgmt bridge seems to be on or isn't it: >> > > > >> # brctl show ovirtmgmt >> > > > >> bridge name bridge id STP enabled >> interfaces >> > > > >> ovirtmgmt 8000.0025907587c2 no >> eth0.200 >> > > > >> >> > > > >> # ip a s ovirtmgmt >> > > > >> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc >> noqueue >> > > > >> state UNKNOWN >> > > > >> link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff >> > > > >> inet 10.0.200.102/24 brd 10.0.200.255 scope global >> ovirtmgmt >> > > > >> inet6 fe80::225:90ff:fe75:87c2/64 scope link >> > > > >> valid_lft forever preferred_lft forever >> > > > >> >> > > > >> # ip a s eth0.200 >> > > > >> 6: eth0.200@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 >> qdisc >> > > > >> noqueue state UP >> > > > >> link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff >> > > > >> inet6 fe80::225:90ff:fe75:87c2/64 scope link >> > > > >> valid_lft forever preferred_lft forever >> > > > >> >> > > > >> I tried the following yesterday: >> > > > >> Copy virtual disk from GlusterFS storage to local disk of host >> and >> > > > >> create a new vm with virt-manager which loads ovirtmgmt disk. I >> could >> > > > >> reach my engine over the ovirtmgmt bridge (so bridge must be >> working). >> > > > >> >> > > > >> I also started libvirtd with Option -v and I saw the following >> in >> > > > >> libvirtd.log when trying to start ovirt engine: >> > > > >> 2014-04-22 14:18:25.432+0000: 8901: debug : >> virCommandRunAsync:2250 : >> > > > >> Command result 0, with PID 11491 >> > > > >> 2014-04-22 14:18:25.478+0000: 8901: debug : virCommandRun:2045 : >> > > Result >> > > > >> exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto >> 'FO-vnet0' >> > > is >> > > > >> not a chain >> > > > >> >> > > > >> So it could be that something is broken in my hosted-engine >> network. >> > > Do >> > > > >> you have any clue how I can troubleshoot this? 
>> > > > >> >> > > > >> >> > > > >> Thanks, >> > > > >> René >> > > > >> >> > > > >> >> > > > >>> >> > > > >>> ----- Original Message ----- >> > > > >>>> From: "René Koch" <rkoch@linuxland.at> >> > > > >>>> To: "Martin Sivak" <msivak@redhat.com> >> > > > >>>> Cc: users@ovirt.org >> > > > >>>> Sent: Tuesday, April 22, 2014 1:46:38 PM >> > > > >>>> Subject: Re: [ovirt-users] hosted engine health check issues >> > > > >>>> >> > > > >>>> Hi, >> > > > >>>> >> > > > >>>> I rebooted one of my ovirt hosts today and the result is now >> that I >> > > > >>>> can't start hosted-engine anymore. >> > > > >>>> >> > > > >>>> ovirt-ha-agent isn't running because the lockspace file is >> missing >> > > > >>>> (sanlock complains about it). >> > > > >>>> So I tried to start hosted-engine with --vm-start and I get >> the >> > > > >>>> following errors: >> > > > >>>> >> > > > >>>> ==> /var/log/sanlock.log <== >> > > > >>>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 >> invalid >> > > > >>>> lockspace found -1 failed 0 name >> > > 2851af27-8744-445d-9fb1-a0d083c8dc82 >> > > > >>>> >> > > > >>>> ==> /var/log/messages <== >> > > > >>>> Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 >> > > 12:38:17+0200 654 >> > > > >>>> [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 >> failed 0 >> > > name >> > > > >>>> 2851af27-8744-445d-9fb1-a0d083c8dc82 >> > > > >>>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) >> > > entering >> > > > >>>> disabled state >> > > > >>>> Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left >> promiscuous >> > > mode >> > > > >>>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) >> > > entering >> > > > >>>> disabled state >> > > > >>>> >> > > > >>>> ==> /var/log/vdsm/vdsm.log <== >> > > > >>>> Thread-21::DEBUG::2014-04-22 >> > > > >>>> 12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown >> > > > >>>> libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to >> acquire >> > > > >>>> lock: No space left on device >> > > > >>>> Thread-21::DEBUG::2014-04-22 >> > > > >>>> 12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm) >> > > > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations >> > > released >> > > > >>>> Thread-21::ERROR::2014-04-22 >> > > > >>>> 12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm) >> > > > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start >> process >> > > failed >> > > > >>>> Traceback (most recent call last): >> > > > >>>> File "/usr/share/vdsm/vm.py", line 2249, in >> _startUnderlyingVm >> > > > >>>> self._run() >> > > > >>>> File "/usr/share/vdsm/vm.py", line 3170, in _run >> > > > >>>> self._connection.createXML(domxml, flags), >> > > > >>>> File >> > > > >>>> >> "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", >> > > > >>>> line 92, in wrapper >> > > > >>>> ret = f(*args, **kwargs) >> > > > >>>> File "/usr/lib64/python2.6/site-packages/libvirt.py", >> line >> > > 2665, in >> > > > >>>> createXML >> > > > >>>> if ret is None:raise libvirtError('virDomainCreateXML() >> > > failed', >> > > > >>>> conn=self) >> > > > >>>> libvirtError: Failed to acquire lock: No space left on device >> > > > >>>> >> > > > >>>> ==> /var/log/messages <== >> > > > >>>> Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR >> > > > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start >> process >> > > > >>>> failed#012Traceback (most recent call last):#012 File >> > > > >>>> "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm#012 >> > > > >>>> self._run()#012 
File "/usr/share/vdsm/vm.py", line 3170, in >> > > _run#012 >> > > > >>>> self._connection.createXML(domxml, flags),#012 File >> > > > >>>> >> "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", >> > > line 92, >> > > > >>>> in wrapper#012 ret = f(*args, **kwargs)#012 File >> > > > >>>> "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in >> > > > >>>> createXML#012 if ret is None:raise >> > > libvirtError('virDomainCreateXML() >> > > > >>>> failed', conn=self)#012libvirtError: Failed to acquire lock: >> No >> > > space >> > > > >>>> left on device >> > > > >>>> >> > > > >>>> ==> /var/log/vdsm/vdsm.log <== >> > > > >>>> Thread-21::DEBUG::2014-04-22 >> > > > >>>> 12:38:17,569::vm::2731::vm.Vm::(setDownStatus) >> > > > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to >> Down: >> > > > >>>> Failed to acquire lock: No space left on device >> > > > >>>> >> > > > >>>> >> > > > >>>> No space left on device is nonsense as there is enough space >> (I had >> > > this >> > > > >>>> issue last time as well where I had to patch machine.py, but >> this >> > > file >> > > > >>>> is now Python 2.6.6 compatible. >> > > > >>>> >> > > > >>>> Any idea what prevents hosted-engine from starting? >> > > > >>>> ovirt-ha-broker, vdsmd and sanlock are running btw. >> > > > >>>> >> > > > >>>> Btw, I can see in log that json rpc server module is missing >> - which >> > > > >>>> package is required for CentOS 6.5? >> > > > >>>> Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load >> the >> > > json >> > > > >>>> rpc server module. Please make sure it is installed. >> > > > >>>> >> > > > >>>> >> > > > >>>> Thanks, >> > > > >>>> René >> > > > >>>> >> > > > >>>> >> > > > >>>> >> > > > >>>> On 04/17/2014 10:02 AM, Martin Sivak wrote: >> > > > >>>>> Hi, >> > > > >>>>> >> > > > >>>>>>>> How can I disable notifications? >> > > > >>>>> >> > > > >>>>> The notification is configured in >> > > > >>>>> /etc/ovirt-hosted-engine-ha/broker.conf >> > > > >>>>> section notification. >> > > > >>>>> The email is sent when the key state_transition exists and >> the >> > > string >> > > > >>>>> OldState-NewState contains the (case insensitive) regexp >> from the >> > > > >>>>> value. >> > > > >>>>> >> > > > >>>>>>>> Is it intended to send out these messages and detect that >> ovirt >> > > > >>>>>>>> engine >> > > > >>>>>>>> is down (which is false anyway), but not to restart the >> vm? >> > > > >>>>> >> > > > >>>>> Forget about emails for now and check the >> > > > >>>>> /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and >> > > attach >> > > > >>>>> them >> > > > >>>>> as well btw). >> > > > >>>>> >> > > > >>>>>>>> oVirt hosts think that hosted engine is down because it >> seems >> > > that >> > > > >>>>>>>> hosts >> > > > >>>>>>>> can't write to hosted-engine.lockspace due to glusterfs >> issues >> > > (or >> > > > >>>>>>>> at >> > > > >>>>>>>> least I think so). >> > > > >>>>> >> > > > >>>>> The hosts think so or can't really write there? The >> lockspace is >> > > > >>>>> managed >> > > > >>>>> by >> > > > >>>>> sanlock and our HA daemons do not touch it at all. We only >> ask >> > > sanlock >> > > > >>>>> to >> > > > >>>>> get make sure we have unique server id. >> > > > >>>>> >> > > > >>>>>>>> Is is possible or planned to make the whole ha feature >> optional? 
>> > > > >>>>> >> > > > >>>>> Well the system won't perform any automatic actions if you >> put the >> > > > >>>>> hosted >> > > > >>>>> engine to global maintenance and only start/stop/migrate the >> VM >> > > > >>>>> manually. >> > > > >>>>> I would discourage you from stopping agent/broker, because >> the >> > > engine >> > > > >>>>> itself has some logic based on the reporting. >> > > > >>>>> >> > > > >>>>> Regards >> > > > >>>>> >> > > > >>>>> -- >> > > > >>>>> Martin Sivák >> > > > >>>>> msivak@redhat.com >> > > > >>>>> Red Hat Czech >> > > > >>>>> RHEV-M SLA / Brno, CZ >> > > > >>>>> >> > > > >>>>> ----- Original Message ----- >> > > > >>>>>> On 04/15/2014 04:53 PM, Jiri Moskovcak wrote: >> > > > >>>>>>> On 04/14/2014 10:50 AM, René Koch wrote: >> > > > >>>>>>>> Hi, >> > > > >>>>>>>> >> > > > >>>>>>>> I have some issues with hosted engine status. >> > > > >>>>>>>> >> > > > >>>>>>>> oVirt hosts think that hosted engine is down because it >> seems >> > > that >> > > > >>>>>>>> hosts >> > > > >>>>>>>> can't write to hosted-engine.lockspace due to glusterfs >> issues >> > > (or >> > > > >>>>>>>> at >> > > > >>>>>>>> least I think so). >> > > > >>>>>>>> >> > > > >>>>>>>> Here's the output of vm-status: >> > > > >>>>>>>> >> > > > >>>>>>>> # hosted-engine --vm-status >> > > > >>>>>>>> >> > > > >>>>>>>> >> > > > >>>>>>>> --== Host 1 status ==-- >> > > > >>>>>>>> >> > > > >>>>>>>> Status up-to-date : False >> > > > >>>>>>>> Hostname : 10.0.200.102 >> > > > >>>>>>>> Host ID : 1 >> > > > >>>>>>>> Engine status : unknown stale-data >> > > > >>>>>>>> Score : 2400 >> > > > >>>>>>>> Local maintenance : False >> > > > >>>>>>>> Host timestamp : 1397035677 >> > > > >>>>>>>> Extra metadata (valid at timestamp): >> > > > >>>>>>>> metadata_parse_version=1 >> > > > >>>>>>>> metadata_feature_version=1 >> > > > >>>>>>>> timestamp=1397035677 (Wed Apr 9 11:27:57
>> > > > >>>>>>>> host-id=2 >> > > > >>>>>>>> score=0 >> > > > >>>>>>>> maintenance=False >> > > > >>>>>>>> state=EngineUnexpectedlyDown >> > > > >>>>>>>> timeout=Mon Apr 14 10:35:05 2014 >> > > > >>>>>>>> >> > > > >>>>>>>> oVirt engine is sending me 2 emails every 10 minutes with >> the >> > > > >>>>>>>> following >> > > > >>>>>>>> subjects: >> > > > >>>>>>>> - ovirt-hosted-engine state transition >> EngineDown-EngineStart >> > > > >>>>>>>> - ovirt-hosted-engine state transition >> EngineStart-EngineUp >> > > > >>>>>>>> >> > > > >>>>>>>> In oVirt webadmin I can see the following message: >> > > > >>>>>>>> VM HostedEngine is down. Exit message: internal error >> Failed to >> > > > >>>>>>>> acquire >> > > > >>>>>>>> lock: error -243. >> > > > >>>>>>>> >> > > > >>>>>>>> These messages are really annoying as oVirt isn't doing >> anything >> > > > >>>>>>>> with >> > > > >>>>>>>> hosted engine - I have an uptime of 9 days in my engine >> vm. >> > > > >>>>>>>> >> > > > >>>>>>>> So my questions are now: >> > > > >>>>>>>> Is it intended to send out these messages and detect that >> ovirt >> > > > >>>>>>>> engine >> > > > >>>>>>>> is down (which is false anyway), but not to restart the >> vm? >> > > > >>>>>>>> >> > > > >>>>>>>> How can I disable notifications? I'm planning to write a >> Nagios >> > > > >>>>>>>> plugin >> > > > >>>>>>>> which parses the output of hosted-engine --vm-status and >> only >> > > Nagios >> > > > >>>>>>>> should notify me, not hosted-engine script. >> > > > >>>>>>>> >> > > > >>>>>>>> Is is possible or planned to make the whole ha feature >> > > optional? I >> > > > >>>>>>>> really really really hate cluster software as it causes >> more >> > > > >>>>>>>> troubles >> > > > >>>>>>>> then standalone machines and in my case the hosted-engine >> ha >> > > feature >> > > > >>>>>>>> really causes troubles (and I didn't had a hardware or >> network >> > > > >>>>>>>> outage >> > > > >>>>>>>> yet only issues with hosted-engine ha agent). I don't >> need any >> > > ha >> > > > >>>>>>>> feature for hosted engine. I just want to run engine >> > > virtualized on >> > > > >>>>>>>> oVirt and if engine vm fails (e.g. because of issues with >> a >> > > host) >> > > > >>>>>>>> I'll >> > > > >>>>>>>> restart it on another node. >> > > > >>>>>>> >> > > > >>>>>>> Hi, you can: >> > > > >>>>>>> 1. edit >> /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and >> > > tweak >> > > > >>>>>>> the logger as you like >> > > > >>>>>>> 2. or kill ovirt-ha-broker & ovirt-ha-agent services >> > > > >>>>>> >> > > > >>>>>> Thanks for the information. >> > > > >>>>>> So engine is able to run when ovirt-ha-broker and >> ovirt-ha-agent >> > > isn't >> > > > >>>>>> running? 
>> > > > >>>>>> >> > > > >>>>>> >> > > > >>>>>> Regards, >> > > > >>>>>> René >> > > > >>>>>> >> > > > >>>>>>> >> > > > >>>>>>> --Jirka >> > > > >>>>>>>> >> > > > >>>>>>>> Thanks, >> > > > >>>>>>>> René >> > > > >>>>>>>> >> > > > >>>>>>>> >> > > > >>>>>>> >> > > > >>>>>> _______________________________________________ >> > > > >>>>>> Users mailing list >> > > > >>>>>> Users@ovirt.org >> > > > >>>>>> http://lists.ovirt.org/mailman/listinfo/users >> > > > >>>>>> >> > > > >>>> _______________________________________________ >> > > > >>>> Users mailing list >> > > > >>>> Users@ovirt.org >> > > > >>>> http://lists.ovirt.org/mailman/listinfo/users >> > > > >>>> >> > > > >> >> > > > >> > > _______________________________________________ >> > > Users mailing list >> > > Users@ovirt.org >> > > http://lists.ovirt.org/mailman/listinfo/users >> > > >> > >> _______________________________________________ >> Users mailing list >> Users@ovirt.org >> http://lists.ovirt.org/mailman/listinfo/users >> > >
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users