Hi,
I’m trying to set up oVirt 3.6.3 with a self-hosted engine on 4 servers (vmhost-03, vmhost-04, vmhost-05 for compute; stor-01 for storage). The storage server runs GlusterFS 3.7.6; all the servers are on the same network and are also connected through InfiniBand DDR.
The network is OK, RDMA is working, IPoIB has been configured, and GlusterFS volumes can be mounted manually on each vmhost. firewalld and SELinux are disabled. The ovirtmgmt network is on Ethernet.
The problem is that, after installing the hosted engine, I can connect to the oVirt admin panel, but:
- Datacenter is marked as down
- The only host is NOT recognized as an SPM
- In the storage tab there is no storage domain for the hosted engine (I only see a detached ISO domain and oVirt repo)
- When I try to create a storage domain, an error shows up (it’s an “Uncaught exception”)
- When I try to import a storage domain, an error shows up (it’s about the datacenter being down and the SPM not being available)
- Also, the Virtual Machines tab shows no VMs (not even the hosted engine, which is obviously up and is reported as up by the command “hosted-engine --vm-status”)
So basically it is not possible to do anything.
After setting the host in maintenance mode and rebooting, I cannot start the engine VM anymore:
[root@SRV-VMHOST-05 ~]# hosted-engine --vm-start
VM exists and is down, destroying it
Machine destroyed
429eec6e-2126-4740-9911-9c5ad482e09f
Status = WaitForLaunch
nicModel = rtl8139,pv
statusTime = 4300834920
emulatedMachine = pc
pid = 0
vmName = HostedEngine
devices = [{'index': '2', 'iface': 'ide', 'specParams': {}, 'readonly': 'true', 'deviceId': '1c2205da-17c6-4ffe-9408-602a998d90dc', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'}, 'device': 'cdrom', 'shared': 'false', 'path': '', 'type': 'disk'}, {'index': '0', 'iface': 'virtio', 'format': 'raw', 'bootOrder': '1', 'poolID': '00000000-0000-0000-0000-000000000000', 'volumeID': 'fe82ba21-942d-48cc-9bdb-f41c0f172dde', 'imageID': '131460bc-4599-4326-a026-e9e224e4bb5f', 'specParams': {}, 'readonly': 'false', 'domainID': '162fc2e5-1897-46fb-b382-195c11ab8546', 'optional': 'false', 'deviceId': '131460bc-4599-4326-a026-e9e224e4bb5f', 'address': {'slot': '0x06', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'disk', 'shared': 'exclusive', 'propagateErrors': 'off', 'type': 'disk'}, {'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller'}, {'nicModel': 'pv', 'macAddr': '00:16:3e:30:a9:6e', 'linkActive': 'true', 'network': 'ovirtmgmt', 'filter': 'vdsm-no-mac-spoofing', 'specParams': {}, 'deviceId': '3d3259a3-19a8-42c3-a50c-6724b475c1ab', 'address': {'slot': '0x03', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'bridge', 'type': 'interface'}, {'device': 'console', 'specParams': {}, 'type': 'console', 'deviceId': '885cca16-2b59-42e4-a57c-0a89a0e823e8', 'alias': 'console0'}]
guestDiskMapping = {}
vmType = kvm
clientIp =
displaySecurePort = -1
memSize = 8192
displayPort = -1
cpuType = Nehalem
spiceSecureChannels = smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
smp = 4
displayIp = 0
display = vnc
but the status remains {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
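As an aside for anyone scripting this check: the status string above is plain JSON, so it can be tested programmatically. This is a minimal sketch, not part of the original setup. The helper name engine_is_healthy is hypothetical, and it assumes that a healthy engine reports "vm": "up" and "health": "good".

```python
import json


def engine_is_healthy(status_json):
    """Return True only if the engine VM is up and its health is good.

    Assumption: a healthy engine reports "vm": "up" and "health": "good".
    """
    status = json.loads(status_json)
    return status.get("vm") == "up" and status.get("health") == "good"


# The status string reported above.
reported = '{"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}'
print(engine_is_healthy(reported))
```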
We tried both the rdma and tcp transports for the engine volume; nothing changed.
In /var/log/ovirt-hosted-engine-ha/agent.log, these are the only errors we found:
MainThread::WARNING::2016-02-08 18:17:23,160::ovf_store::105::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
MainThread::ERROR::2016-02-08 18:17:23,161::config::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
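For reference, the agent.log entries above use "::" as a field separator, so warnings and errors can be pulled out with a few lines of Python. This is only a sketch based on the two lines shown; the field layout (thread, level, timestamp, module, line number, logger, message) is inferred from them, and problem_lines is a hypothetical helper name.

```python
def problem_lines(lines, levels=("WARNING", "ERROR")):
    """Yield (timestamp, level, message) for entries at the given levels.

    Assumed layout, inferred from the two entries quoted above:
    thread::level::timestamp::module::lineno::logger::message
    """
    for line in lines:
        parts = line.rstrip("\n").split("::", 6)
        if len(parts) < 7:
            continue  # not in the expected layout (e.g. a continuation line)
        _thread, level, timestamp, _module, _lineno, _logger, message = parts
        if level in levels:
            yield timestamp, level, message


# The two entries quoted above.
sample = [
    "MainThread::WARNING::2016-02-08 18:17:23,160::ovf_store::105::"
    "ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE",
    "MainThread::ERROR::2016-02-08 18:17:23,161::config::234::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::"
    "(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, "
    "falling back to initial vm.conf",
]

for entry in problem_lines(sample):
    print(entry)
```

To scan the real file, pass an open file object instead of the sample list, for example open("/var/log/ovirt-hosted-engine-ha/agent.log").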
In vdsm.log I see:
Thread-16399::INFO::2016-02-09 14:54:39,478::xmlrpc::84::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:39823 started
Thread-16399::DEBUG::2016-02-09 14:54:39,478::bindingxmlrpc::1257::vds::(wrapper) client [127.0.0.1]::call vmGetStats with ('429eec6e-2126-4740-9911-9c5ad482e09f',) {}
Thread-16399::DEBUG::2016-02-09 14:54:39,479::bindingxmlrpc::1264::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'exitMessage': 'Failed to acquire lock: No space left on device', 'statusTime': '4302636100', 'vmId': '429eec6e-2126-4740-9911-9c5ad482e09f', 'exitReason': 1, 'exitCode': 1}]}
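The exit message is buried in the vmGetStats reply above. The following sketch extracts it from such a line. It assumes the logged reply is a valid Python literal, as it is in the line shown, and it would not work for replies containing non-literal values. MARKER and exit_messages are hypothetical names used only for illustration.

```python
import ast

# Text that precedes the reply in the log line quoted above.
MARKER = "return vmGetStats with "


def exit_messages(line):
    """Return the exitMessage of every VM listed in a vmGetStats reply line."""
    _, found, payload = line.partition(MARKER)
    if not found:
        return []
    # Assumption: the reply was logged as a plain Python literal.
    reply = ast.literal_eval(payload.strip())
    return [vm["exitMessage"]
            for vm in reply.get("statsList", [])
            if "exitMessage" in vm]


line = (
    "Thread-16399::DEBUG::2016-02-09 14:54:39,479::bindingxmlrpc::1264::vds::(wrapper) "
    "return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': "
    "[{'status': 'Down', 'exitMessage': 'Failed to acquire lock: No space left on device', "
    "'statusTime': '4302636100', 'vmId': '429eec6e-2126-4740-9911-9c5ad482e09f', "
    "'exitReason': 1, 'exitCode': 1}]}"
)
print(exit_messages(line))
```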
When executing hosted-engine --vm-start, this appears in vdsm.log:
Thread-16977::ERROR::2016-02-09 14:59:12,146::vm::759::virt.vm::(_startUnderlyingVm) vmId=`429eec6e-2126-4740-9911-9c5ad482e09f`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 703, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1941, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3611, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Failed to acquire lock: No space left on device
But there is plenty of free space on every filesystem:
[root@SRV-VMHOST-05 vdsm]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos_srv--vmhost--05-root 50G 2.8G 48G 6% /
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 105M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/mapper/centos_srv--vmhost--05-home 84G 33M 84G 1% /home
/dev/sda1 497M 178M 319M 36% /boot
srv-stor-01:/ovirtengine 3.7T 3.0G 3.7T 1% /rhev/data-center/mnt/glusterSD/srv-stor-01:_ovirtengine
tmpfs 3.2G 0 3.2G 0% /run/user/0
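One thing df -h does not show is inode usage, and a filesystem that has run out of inodes also returns "No space left on device". The sketch below uses os.statvfs from the Python standard library to report both free bytes and free inodes for a path. The paths in the loop are the mounts shown above and are only examples; space_report is a hypothetical helper name. This only rules out a genuinely full filesystem and says nothing about where the lock error itself comes from.

```python
import os


def space_report(path):
    """Return free-space figures (bytes and inodes) for the filesystem holding path."""
    st = os.statvfs(path)
    return {
        "path": path,
        "free_bytes": st.f_bavail * st.f_frsize,
        "total_bytes": st.f_blocks * st.f_frsize,
        "free_inodes": st.f_favail,
        # May be 0 on filesystems that do not report inode counts.
        "total_inodes": st.f_files,
    }


if __name__ == "__main__":
    # Example paths taken from the df output above; adjust as needed.
    for mount in ("/", "/rhev/data-center/mnt/glusterSD/srv-stor-01:_ovirtengine"):
        if os.path.exists(mount):
            print(space_report(mount))
```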
I also verified that Gluster storage was correctly mounted:
[root@SRV-VMHOST-05 ~]# mount | grep gluster
srv-stor-01:/ovirtengine on /rhev/data-center/mnt/glusterSD/srv-stor-01:_ovirtengine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
(if I create a file in that folder, it appears on the gluster server).
On the engine VM in /var/log/ovirt-engine/engine.log I found the following:
2016-02-09 11:55:41,165 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler_Worker-93) [] START, FullListVDSCommand(HostName = , FullListVDSCommandParameters:{runAsync='true', hostId='13ce38e6-f4b6-42fa-bb8c-5ec84ad00ce0', vds='Host[,13ce38e6-f4b6-42fa-bb8c-5ec84ad00ce0]', vmIds='[429eec6e-2126-4740-9911-9c5ad482e09f]'}), log id: 61eda464
2016-02-09 11:55:42,169 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler_Worker-93) [] FINISH, FullListVDSCommand, return: [{status=Up, nicModel=rtl8139,pv, emulatedMachine=pc, guestDiskMapping={}, vmId=429eec6e-2126-4740-9911-9c5ad482e09f, pid=11133, devices=[Ljava.lang.Object;@2099d011, smp=4, vmType=kvm, displayIp=0, display=vnc, displaySecurePort=-1, memSize=8192, displayPort=5900, cpuType=Nehalem, spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir, statusTime=4364469020, vmName=HostedEngine, clientIp=, pauseCode=NOERR}], log id: 61eda464
2016-02-09 11:55:42,173 INFO [org.ovirt.engine.core.bll.storage.GetExistingStorageDomainListQuery] (org.ovirt.thread.pool-8-thread-35) [] START, GetExistingStorageDomainListQuery(GetExistingStorageDomainListParameters:{refresh='true', filtered='false'}), log id: 5611a666
2016-02-09 11:55:42,173 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainsListVDSCommand] (org.ovirt.thread.pool-8-thread-35) [] START, HSMGetStorageDomainsListVDSCommand(HostName = srv-vmhost-05, HSMGetStorageDomainsListVDSCommandParameters:{runAsync='true', hostId='13ce38e6-f4b6-42fa-bb8c-5ec84ad00ce0', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='null', storageDomainType='Data', path='null'}), log id: 63695be3
2016-02-09 11:55:43,298 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainsListVDSCommand] (org.ovirt.thread.pool-8-thread-35) [] FINISH, HSMGetStorageDomainsListVDSCommand, return: [162fc2e5-1897-46fb-b382-195c11ab8546], log id: 63695be3
2016-02-09 11:55:43,365 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainInfoVDSCommand] (org.ovirt.thread.pool-8-thread-35) [] START, HSMGetStorageDomainInfoVDSCommand(HostName = srv-vmhost-05, HSMGetStorageDomainInfoVDSCommandParameters:{runAsync='true', hostId='13ce38e6-f4b6-42fa-bb8c-5ec84ad00ce0', storageDomainId='162fc2e5-1897-46fb-b382-195c11ab8546'}), log id: 7e520f35
2016-02-09 11:55:44,377 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainInfoVDSCommand] (org.ovirt.thread.pool-8-thread-35) [] FINISH, HSMGetStorageDomainInfoVDSCommand, return: <StorageDomainStatic:{name='EngineStorage', id='162fc2e5-1897-46fb-b382-195c11ab8546'}, null>, log id: 7e520f35
2016-02-09 11:55:44,377 INFO [org.ovirt.engine.core.bll.storage.GetExistingStorageDomainListQuery] (org.ovirt.thread.pool-8-thread-35) [] FINISH, GetExistingStorageDomainListQuery, log id: 5611a666
2016-02-09 11:55:44,378 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-35) [23427de7] Lock Acquired to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
2016-02-09 11:55:44,379 WARN [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-35) [23427de7] CanDoAction of action 'ImportHostedEngineStorageDomain' failed for user SYSTEM. Reasons: VAR__ACTION__ADD,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_MASTER_STORAGE_DOMAIN_NOT_ACTIVE
2016-02-09 11:55:44,379 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-35) [23427de7] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
2016-02-09 11:55:46,625 INFO [org.ovirt.engine.core.bll.UpdateVdsGroupCommand] (default task-26) [5118b768] Running command: UpdateVdsGroupCommand internal: false. Entities affected : ID: 00000002-0002-0002-0002-0000000000d9 Type: VdsGroupsAction group EDIT_CLUSTER_CONFIGURATION with role type ADMIN
2016-02-09 11:55:46,765 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-26) [5118b768] Correlation ID: 5118b768, Call Stack: null, Custom Event ID: -1, Message: Host cluster Default was updated by admin@internal
2016-02-09 11:55:46,932 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (default task-6) [] START, GlusterServersListVDSCommand(HostName = srv-vmhost-05, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='13ce38e6-f4b6-42fa-bb8c-5ec84ad00ce0'}), log id: 559ab127
2016-02-09 11:55:47,503 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (default task-13) [] START, GlusterServersListVDSCommand(HostName = srv-vmhost-05, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='13ce38e6-f4b6-42fa-bb8c-5ec84ad00ce0'}), log id: 62d703e5
2016-02-09 11:55:47,510 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (default task-6) [] FINISH, GlusterServersListVDSCommand, log id: 559ab127
2016-02-09 11:55:47,511 ERROR [org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery] (default task-6) [] Query 'GetAddedGlusterServersQuery' failed: null
2016-02-09 11:55:47,511 ERROR [org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery] (default task-6) [] Exception: java.lang.NullPointerException
at org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery.getAddedGlusterServers(GetAddedGlusterServersQuery.java:54) [bll.jar:]
at org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery.executeQueryCommand(GetAddedGlusterServersQuery.java:45) [bll.jar:]
at org.ovirt.engine.core.bll.QueriesCommandBase.executeCommand(QueriesCommandBase.java:82) [bll.jar:]
at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
at org.ovirt.engine.core.bll.Backend.runQueryImpl(Backend.java:537) [bll.jar:]
at org.ovirt.engine.core.bll.Backend.runQuery(Backend.java:511) [bll.jar:]
at sun.reflect.GeneratedMethodAccessor98.invoke(Unknown Source) [:1.8.0_71]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_71]
at java.lang.reflect.Method.invoke(Method.java:497) [rt.jar:1.8.0_71]
at org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:309)
at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:63)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:309)
at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:407)
at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.delegateInterception(Jsr299BindingsInterceptor.java:70) [wildfly-weld-8.2.1.Final.jar:8.2.1.Final]
at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.doMethodInterception(Jsr299BindingsInterceptor.java:80) [wildfly-weld-8.2.1.Final.jar:8.2.1.Final]
at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.processInvocation(Jsr299BindingsInterceptor.java:93) [wildfly-weld-8.2.1.Final.jar:8.2.1.Final]
at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:63)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:309)
at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:407)
at org.ovirt.engine.core.bll.interceptors.CorrelationIdTrackerInterceptor.aroundInvoke(CorrelationIdTrackerInterceptor.java:13) [bll.jar:]
at sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source) [:1.8.0_71]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_71]
....
2016-02-09 11:55:47,985 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (default task-14) [] START, GlusterServersListVDSCommand(HostName = srv-vmhost-05, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='13ce38e6-f4b6-42fa-bb8c-5ec84ad00ce0'}), log id: 61100c4d
2016-02-09 11:55:47,986 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (default task-13) [] FINISH, GlusterServersListVDSCommand, log id: 62d703e5
2016-02-09 11:55:47,986 ERROR [org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery] (default task-13) [] Query 'GetAddedGlusterServersQuery' failed: null
2016-02-09 11:55:47,987 ERROR [org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery] (default task-13) [] Exception: java.lang.NullPointerException
at org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery.getAddedGlusterServers(GetAddedGlusterServersQuery.java:54) [bll.jar:]
at org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery.executeQueryCommand(GetAddedGlusterServersQuery.java:45) [bll.jar:]
at org.ovirt.engine.core.bll.QueriesCommandBase.executeCommand(QueriesCommandBase.java:82) [bll.jar:]
at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
at org.ovirt.engine.core.bll.Backend.runQueryImpl(Backend.java:537) [bll.jar:]
at org.ovirt.engine.core.bll.Backend.runQuery(Backend.java:511) [bll.jar:]
at sun.reflect.GeneratedMethodAccessor98.invoke(Unknown Source) [:1.8.0_71]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_71]
at java.lang.reflect.Method.invoke(Method.java:497) [rt.jar:1.8.0_71]
at org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:309)
at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
....
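For anyone sifting through a longer engine.log, the failed CanDoAction checks and their reason codes can be pulled out with a short script. This is only an illustrative sketch: the regular expression is based on the single WARN entry quoted above, it assumes each log entry sits on one line as in the engine.log file itself, and PATTERN and failed_actions are hypothetical names.

```python
import re

# Pattern inferred from the single WARN entry quoted above.
PATTERN = re.compile(
    r"CanDoAction of action '(?P<action>[^']+)' failed.*?Reasons: (?P<reasons>\S+)"
)


def failed_actions(lines):
    """Yield (action, [reason codes]) for every failed CanDoAction entry."""
    for line in lines:
        match = PATTERN.search(line)
        if match:
            yield match.group("action"), match.group("reasons").split(",")


sample = [
    "2016-02-09 11:55:44,379 WARN [org.ovirt.engine.core.bll."
    "ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-35) "
    "[23427de7] CanDoAction of action 'ImportHostedEngineStorageDomain' failed "
    "for user SYSTEM. Reasons: VAR__ACTION__ADD,VAR__TYPE__STORAGE__DOMAIN,"
    "ACTION_TYPE_FAILED_MASTER_STORAGE_DOMAIN_NOT_ACTIVE",
]

for action, reasons in failed_actions(sample):
    print(action, reasons)
```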
Do you have any ideas about what I should do?
Thanks,
Giuseppe
--
Giuseppe Berellini
PTV SISTeMA
Phone +39 06 993 444 15
Mobile +39 349 3241969
Fax +39 06 993 348 72
Via Ruggero Bonghi, 11/B – 00184 Roma