[ovirt-users] Problem installing oVirt 3.6.3-rc with GlusterFS over InfiniBand

Giuseppe Berellini Giuseppe.Berellini at ptvgroup.com
Tue Feb 9 14:41:45 UTC 2016


Hi,

I'm trying to set up oVirt 3.6.3 with a self-hosted engine on 4 servers (vmhost-03, vmhost-04, vmhost-05 for compute; stor-01 for storage). The storage server runs GlusterFS 3.7.6; all the servers are on the same network and are also connected through InfiniBand DDR.

The network is OK, RDMA is working, IPoIB has been configured, and it is possible to manually mount GlusterFS volumes on each vmhost. firewalld and SELinux are disabled. The ovirtmgmt network is on Ethernet.

The problem is that, after installing the hosted engine, I can connect to the oVirt admin panel, but:
- The datacenter is marked as down
- The only host is NOT recognized as SPM
- In the Storage tab there is no storage domain for the hosted engine (I only see a detached ISO domain and the oVirt repo)
- When I try to create a storage domain, an error shows up (an "Uncaught exception")
- When I try to import a storage domain, an error shows up (about the datacenter being down and the SPM not being available)
- Also, the Virtual Machines tab shows no VMs (not even the hosted engine, which is obviously up and is reported as up by the command "hosted-engine --vm-status")

So basically it is not possible to do anything.
After setting the host in maintenance mode and rebooting, I cannot start the engine VM anymore:

[root@SRV-VMHOST-05 ~]# hosted-engine --vm-start
VM exists and is down, destroying it
Machine destroyed

429eec6e-2126-4740-9911-9c5ad482e09f
        Status = WaitForLaunch
        nicModel = rtl8139,pv
        statusTime = 4300834920
        emulatedMachine = pc
        pid = 0
        vmName = HostedEngine
        devices = [{'index': '2', 'iface': 'ide', 'specParams': {}, 'readonly': 'true', 'deviceId': '1c2205da-17c6-4ffe-9408-602a998d90dc', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'}, 'device': 'cdrom', 'shared': 'false', 'path': '', 'type': 'disk'}, {'index': '0', 'iface': 'virtio', 'format': 'raw', 'bootOrder': '1', 'poolID': '00000000-0000-0000-0000-000000000000', 'volumeID': 'fe82ba21-942d-48cc-9bdb-f41c0f172dde', 'imageID': '131460bc-4599-4326-a026-e9e224e4bb5f', 'specParams': {}, 'readonly': 'false', 'domainID': '162fc2e5-1897-46fb-b382-195c11ab8546', 'optional': 'false', 'deviceId': '131460bc-4599-4326-a026-e9e224e4bb5f', 'address': {'slot': '0x06', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'disk', 'shared': 'exclusive', 'propagateErrors': 'off', 'type': 'disk'}, {'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller'}, {'nicModel': 'pv', 'macAddr': '00:16:3e:30:a9:6e', 'linkActive': 'true', 'network': 'ovirtmgmt', 'filter': 'vdsm-no-mac-spoofing', 'specParams': {}, 'deviceId': '3d3259a3-19a8-42c3-a50c-6724b475c1ab', 'address': {'slot': '0x03', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'bridge', 'type': 'interface'}, {'device': 'console', 'specParams': {}, 'type': 'console', 'deviceId': '885cca16-2b59-42e4-a57c-0a89a0e823e8', 'alias': 'console0'}]
        guestDiskMapping = {}
        vmType = kvm
        clientIp =
        displaySecurePort = -1
        memSize = 8192
        displayPort = -1
        cpuType = Nehalem
        spiceSecureChannels = smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
        smp = 4
        displayIp = 0
        display = vnc

but the status remains {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
For the engine volume we tried both rdma and tcp as the transport; nothing changed.
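
For anyone who wants to poll that status from a script, here is a minimal sketch in Python. It assumes the status is available as a JSON string like the one quoted above, and that a healthy engine reports "vm": "up" and "health": "good"; the function name is made up for illustration.

```python
import json


def engine_vm_is_up(status_json):
    """Return True only if the status reports the VM up with good health.

    Assumption: a healthy engine reports "vm": "up" and "health": "good".
    """
    status = json.loads(status_json)
    return status.get("vm") == "up" and status.get("health") == "good"


# The status string reported on this host.
status = '{"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}'
print(engine_vm_is_up(status))
```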

In /var/log/ovirt-hosted-engine-ha/agent.log, these are the only errors we found:

MainThread::WARNING::2016-02-08 18:17:23,160::ovf_store::105::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
MainThread::ERROR::2016-02-08 18:17:23,161::config::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
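
To pull only the warnings and errors out of a log in this format, something like the following sketch can help. It assumes every record is a single line with seven "::"-separated fields, as in the two lines above (thread, level, timestamp, module, line number, logger, message); the names LogRecord, parse_record and problems are illustrative only.

```python
from collections import namedtuple

# Field layout inferred from the agent.log lines shown above (an assumption).
LogRecord = namedtuple(
    "LogRecord", "thread level timestamp module lineno logger message"
)


def parse_record(line):
    """Split one '::'-delimited log line; return None if it does not fit."""
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None  # e.g. a traceback continuation line
    return LogRecord(*parts)


def problems(lines, levels=("WARNING", "ERROR", "CRITICAL")):
    """Return the parsed records whose level is one of `levels`."""
    records = (parse_record(line) for line in lines)
    return [r for r in records if r is not None and r.level in levels]


sample = [
    "MainThread::WARNING::2016-02-08 18:17:23,160::ovf_store::105::"
    "ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE",
]
for record in problems(sample):
    print(record.timestamp, record.level, record.message)
```

On a real host the list of lines would come from opening the log file and iterating over it.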

In vdsm.log I see:
Thread-16399::INFO::2016-02-09 14:54:39,478::xmlrpc::84::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:39823 started
Thread-16399::DEBUG::2016-02-09 14:54:39,478::bindingxmlrpc::1257::vds::(wrapper) client [127.0.0.1]::call vmGetStats with ('429eec6e-2126-4740-9911-9c5ad482e09f',) {}
Thread-16399::DEBUG::2016-02-09 14:54:39,479::bindingxmlrpc::1264::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'exitMessage': 'Failed to acquire lock: No space left on device', 'statusTime': '4302636100', 'vmId': '429eec6e-2126-4740-9911-9c5ad482e09f', 'exitReason': 1, 'exitCode': 1}]}
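
The interesting part of that reply is the exitMessage field. A rough sketch for extracting it from such lines follows; it assumes the text after "return vmGetStats with " is a valid Python literal, as it is in the line above, and the function name is made up for illustration.

```python
import ast

MARKER = "return vmGetStats with "


def exit_messages(line):
    """Return (vmId, exitMessage) pairs found in one vmGetStats reply line."""
    _, found, payload = line.partition(MARKER)
    if not found:
        return []
    # literal_eval only parses literals; it does not execute code.
    reply = ast.literal_eval(payload.strip())
    return [
        (vm.get("vmId"), vm.get("exitMessage"))
        for vm in reply.get("statsList", [])
        if "exitMessage" in vm
    ]
```

Applied to the DEBUG line above, it yields the VM id together with "Failed to acquire lock: No space left on device".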

When executing hosted-engine --vm-start, this appears in vdsm.log:
Thread-16977::ERROR::2016-02-09 14:59:12,146::vm::759::virt.vm::(_startUnderlyingVm) vmId=`429eec6e-2126-4740-9911-9c5ad482e09f`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 703, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 1941, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3611, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Failed to acquire lock: No space left on device

But the filesystems are far from full:
[root@SRV-VMHOST-05 vdsm]# df -h
Filesystem                               Size  Used Avail Use% Mounted on
/dev/mapper/centos_srv--vmhost--05-root   50G  2.8G   48G   6% /
devtmpfs                                  16G     0   16G   0% /dev
tmpfs                                     16G     0   16G   0% /dev/shm
tmpfs                                     16G  105M   16G   1% /run
tmpfs                                     16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/centos_srv--vmhost--05-home   84G   33M   84G   1% /home
/dev/sda1                                497M  178M  319M  36% /boot
srv-stor-01:/ovirtengine                 3.7T  3.0G  3.7T   1% /rhev/data-center/mnt/glusterSD/srv-stor-01:_ovirtengine
tmpfs                                    3.2G     0  3.2G   0% /run/user/0
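
Since every filesystem has free space, the "No space left on device" text is probably not about a full disk. That string is simply the standard message for errno ENOSPC, so one possible reading (an assumption, not something these logs confirm) is that the lock manager returned that error code for a reason unrelated to disk usage. A small sketch to map such a message back to errno names and to double-check free space on a mount point; both function names are illustrative.

```python
import errno
import os


def errnos_named_in(message):
    """Return the symbolic errno names whose standard text occurs in message."""
    return sorted(
        errno.errorcode[code]
        for code in errno.errorcode
        if os.strerror(code) in message
    )


def free_bytes(path):
    """Free space available to unprivileged users on the filesystem at path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


print(errnos_named_in("libvirtError: Failed to acquire lock: No space left on device"))
print(free_bytes("/"))
```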


I also verified that Gluster storage was correctly mounted:
[root@SRV-VMHOST-05 ~]# mount | grep gluster
srv-stor-01:/ovirtengine on /rhev/data-center/mnt/glusterSD/srv-stor-01:_ovirtengine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

(if I create a file in that folder, it appears on the gluster server).
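
The same check can be scripted for all three vmhosts. This is only a sketch: it assumes the text comes from the "mount" command in the format shown above, and gluster_mounts is a made-up helper name.

```python
import re

# One line of `mount` output: "<source> on <target> type <fstype> (<options>)"
MOUNT_RE = re.compile(
    r"^(?P<source>\S+) on (?P<target>.+) type (?P<fstype>\S+) \((?P<options>[^)]*)\)$"
)


def gluster_mounts(mount_output):
    """Return (source, target, options) for every fuse.glusterfs mount."""
    mounts = []
    for line in mount_output.splitlines():
        match = MOUNT_RE.match(line.strip())
        if match and match.group("fstype") == "fuse.glusterfs":
            mounts.append(
                (
                    match.group("source"),
                    match.group("target"),
                    match.group("options").split(","),
                )
            )
    return mounts
```

An empty result on a host would mean the engine volume is not mounted there.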



On the engine VM in /var/log/ovirt-engine/engine.log I found the following:
2016-02-09 11:55:41,165 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler_Worker-93) [] START, FullListVDSCommand(HostName = , FullListVDSCommandParameters:{runAsync='true', hostId='13ce38e6-f4b6-42fa-bb8c-5ec84ad00ce0', vds='Host[,13ce38e6-f4b6-42fa-bb8c-5ec84ad00ce0]', vmIds='[429eec6e-2126-4740-9911-9c5ad482e09f]'}), log id: 61eda464
2016-02-09 11:55:42,169 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler_Worker-93) [] FINISH, FullListVDSCommand, return: [{status=Up, nicModel=rtl8139,pv, emulatedMachine=pc, guestDiskMapping={}, vmId=429eec6e-2126-4740-9911-9c5ad482e09f, pid=11133, devices=[Ljava.lang.Object;@2099d011, smp=4, vmType=kvm, displayIp=0, display=vnc, displaySecurePort=-1, memSize=8192, displayPort=5900, cpuType=Nehalem, spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir, statusTime=4364469020, vmName=HostedEngine, clientIp=, pauseCode=NOERR}], log id: 61eda464
2016-02-09 11:55:42,173 INFO  [org.ovirt.engine.core.bll.storage.GetExistingStorageDomainListQuery] (org.ovirt.thread.pool-8-thread-35) [] START, GetExistingStorageDomainListQuery(GetExistingStorageDomainListParameters:{refresh='true', filtered='false'}), log id: 5611a666
2016-02-09 11:55:42,173 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainsListVDSCommand] (org.ovirt.thread.pool-8-thread-35) [] START, HSMGetStorageDomainsListVDSCommand(HostName = srv-vmhost-05, HSMGetStorageDomainsListVDSCommandParameters:{runAsync='true', hostId='13ce38e6-f4b6-42fa-bb8c-5ec84ad00ce0', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='null', storageDomainType='Data', path='null'}), log id: 63695be3
2016-02-09 11:55:43,298 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainsListVDSCommand] (org.ovirt.thread.pool-8-thread-35) [] FINISH, HSMGetStorageDomainsListVDSCommand, return: [162fc2e5-1897-46fb-b382-195c11ab8546], log id: 63695be3
2016-02-09 11:55:43,365 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainInfoVDSCommand] (org.ovirt.thread.pool-8-thread-35) [] START, HSMGetStorageDomainInfoVDSCommand(HostName = srv-vmhost-05, HSMGetStorageDomainInfoVDSCommandParameters:{runAsync='true', hostId='13ce38e6-f4b6-42fa-bb8c-5ec84ad00ce0', storageDomainId='162fc2e5-1897-46fb-b382-195c11ab8546'}), log id: 7e520f35
2016-02-09 11:55:44,377 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainInfoVDSCommand] (org.ovirt.thread.pool-8-thread-35) [] FINISH, HSMGetStorageDomainInfoVDSCommand, return: <StorageDomainStatic:{name='EngineStorage', id='162fc2e5-1897-46fb-b382-195c11ab8546'}, null>, log id: 7e520f35
2016-02-09 11:55:44,377 INFO  [org.ovirt.engine.core.bll.storage.GetExistingStorageDomainListQuery] (org.ovirt.thread.pool-8-thread-35) [] FINISH, GetExistingStorageDomainListQuery, log id: 5611a666
2016-02-09 11:55:44,378 INFO  [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-35) [23427de7] Lock Acquired to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
2016-02-09 11:55:44,379 WARN  [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-35) [23427de7] CanDoAction of action 'ImportHostedEngineStorageDomain' failed for user SYSTEM. Reasons: VAR__ACTION__ADD,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_MASTER_STORAGE_DOMAIN_NOT_ACTIVE
2016-02-09 11:55:44,379 INFO  [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-35) [23427de7] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
2016-02-09 11:55:46,625 INFO  [org.ovirt.engine.core.bll.UpdateVdsGroupCommand] (default task-26) [5118b768] Running command: UpdateVdsGroupCommand internal: false. Entities affected :  ID: 00000002-0002-0002-0002-0000000000d9 Type: VdsGroupsAction group EDIT_CLUSTER_CONFIGURATION with role type ADMIN
2016-02-09 11:55:46,765 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-26) [5118b768] Correlation ID: 5118b768, Call Stack: null, Custom Event ID: -1, Message: Host cluster Default was updated by admin@internal
2016-02-09 11:55:46,932 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (default task-6) [] START, GlusterServersListVDSCommand(HostName = srv-vmhost-05, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='13ce38e6-f4b6-42fa-bb8c-5ec84ad00ce0'}), log id: 559ab127
2016-02-09 11:55:47,503 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (default task-13) [] START, GlusterServersListVDSCommand(HostName = srv-vmhost-05, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='13ce38e6-f4b6-42fa-bb8c-5ec84ad00ce0'}), log id: 62d703e5
2016-02-09 11:55:47,510 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (default task-6) [] FINISH, GlusterServersListVDSCommand, log id: 559ab127
2016-02-09 11:55:47,511 ERROR [org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery] (default task-6) [] Query 'GetAddedGlusterServersQuery' failed: null
2016-02-09 11:55:47,511 ERROR [org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery] (default task-6) [] Exception: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery.getAddedGlusterServers(GetAddedGlusterServersQuery.java:54) [bll.jar:]
        at org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery.executeQueryCommand(GetAddedGlusterServersQuery.java:45) [bll.jar:]
        at org.ovirt.engine.core.bll.QueriesCommandBase.executeCommand(QueriesCommandBase.java:82) [bll.jar:]
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
        at org.ovirt.engine.core.bll.Backend.runQueryImpl(Backend.java:537) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runQuery(Backend.java:511) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor98.invoke(Unknown Source) [:1.8.0_71]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_71]
        at java.lang.reflect.Method.invoke(Method.java:497) [rt.jar:1.8.0_71]
        at org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:309)
        at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
        at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:63)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:309)
        at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:407)
        at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.delegateInterception(Jsr299BindingsInterceptor.java:70) [wildfly-weld-8.2.1.Final.jar:8.2.1.Final]
        at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.doMethodInterception(Jsr299BindingsInterceptor.java:80) [wildfly-weld-8.2.1.Final.jar:8.2.1.Final]
        at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.processInvocation(Jsr299BindingsInterceptor.java:93) [wildfly-weld-8.2.1.Final.jar:8.2.1.Final]
        at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:63)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:309)
        at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:407)
        at org.ovirt.engine.core.bll.interceptors.CorrelationIdTrackerInterceptor.aroundInvoke(CorrelationIdTrackerInterceptor.java:13) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source) [:1.8.0_71]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_71]
                               ....
2016-02-09 11:55:47,985 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (default task-14) [] START, GlusterServersListVDSCommand(HostName = srv-vmhost-05, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='13ce38e6-f4b6-42fa-bb8c-5ec84ad00ce0'}), log id: 61100c4d
2016-02-09 11:55:47,986 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (default task-13) [] FINISH, GlusterServersListVDSCommand, log id: 62d703e5
2016-02-09 11:55:47,986 ERROR [org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery] (default task-13) [] Query 'GetAddedGlusterServersQuery' failed: null
2016-02-09 11:55:47,987 ERROR [org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery] (default task-13) [] Exception: java.lang.NullPointerException
        at org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery.getAddedGlusterServers(GetAddedGlusterServersQuery.java:54) [bll.jar:]
        at org.ovirt.engine.core.bll.gluster.GetAddedGlusterServersQuery.executeQueryCommand(GetAddedGlusterServersQuery.java:45) [bll.jar:]
        at org.ovirt.engine.core.bll.QueriesCommandBase.executeCommand(QueriesCommandBase.java:82) [bll.jar:]
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
        at org.ovirt.engine.core.bll.Backend.runQueryImpl(Backend.java:537) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runQuery(Backend.java:511) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor98.invoke(Unknown Source) [:1.8.0_71]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_71]
        at java.lang.reflect.Method.invoke(Method.java:497) [rt.jar:1.8.0_71]
        at org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:309)
        at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
                               ....
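
Apart from the NullPointerException, the only WARN entry in that excerpt is the failed ImportHostedEngineStorageDomain action, whose reason ACTION_TYPE_FAILED_MASTER_STORAGE_DOMAIN_NOT_ACTIVE seems to say that the hosted-engine storage domain cannot be imported while no master data domain is active. A minimal sketch for listing such failures from engine.log text; the regular expression is derived only from the one WARN line above, and failed_actions is an illustrative name.

```python
import re

# Matches "CanDoAction of action '<name>' failed for user <user>. Reasons: a,b,c".
# \s+ after "failed" tolerates a line break at that position.
CANDO_RE = re.compile(
    r"CanDoAction of action '(?P<action>[^']+)' failed\s+"
    r"for user (?P<user>\S+)\. Reasons: (?P<reasons>\S+)"
)


def failed_actions(log_text):
    """Return (action, [reasons]) for every failed CanDoAction entry."""
    return [
        (match.group("action"), match.group("reasons").split(","))
        for match in CANDO_RE.finditer(log_text)
    ]
```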



Do you have any ideas about what I should do?

Thanks,
        Giuseppe


--
Giuseppe Berellini
PTV SISTeMA
Phone +39 06 993 444 15
Mobile +39 349 3241969
Fax +39 06 993 348 72
Via Ruggero Bonghi, 11/B - 00184 Roma
giuseppe.berellini at ptvgroup.com
www.sistemaits.com<http://www.sistemaits.com/>
facebook.com/sistemaits<https://www.facebook.com/sistemaits>
linkedin.com/SISTeMA<https://www.linkedin.com/company/sistema-soluzioni-per-l-ingegneria-dei-sistemi-di-trasporto-e-l-infomobilit-s-r-l->
