Re: Single instance scaleup.
by Strahil
Hi Leo,
As you do not have a distributed volume, you can easily switch to replica 2 arbiter 1 or replica 3 volumes.
You can use the following for adding the bricks:
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Ad...
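A rough sketch of the kind of commands involved (hedged - the new hostnames and brick paths below are placeholders based on the layout you quoted, and the documentation above is the authoritative procedure):

# grow the single-brick "engine" volume into a full replica 3
gluster volume add-brick engine replica 3 \
    192.168.80.192:/gluster_bricks/engine/engine \
    192.168.80.193:/gluster_bricks/engine/engine

# grow "ssd-samsung" into a replica 3 with an arbiter brick; if a direct
# conversion from a single brick is refused, going to replica 2 first and
# then adding the arbiter is the usual fallback
gluster volume add-brick ssd-samsung replica 3 arbiter 1 \
    192.168.80.192:/gluster_bricks/sdc/data \
    192.168.80.193:/gluster_bricks/sdc/data

# then wait for the heals to finish
gluster volume heal engine full
gluster volume heal ssd-samsung full
gluster volume heal engine info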
Best Regards,
Strahil Nikolov

On May 26, 2019 10:54, Leo David <leoalex(a)gmail.com> wrote:
>
> Hi Strahil,
> Thank you so much for your input!
>
> gluster volume info
>
>
> Volume Name: engine
> Type: Distribute
> Volume ID: d7449fc2-cc35-4f80-a776-68e4a3dbd7e1
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.80.191:/gluster_bricks/engine/engine
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard: on
> performance.low-prio-threads: 32
> performance.strict-o-direct: off
> network.remote-dio: off
> network.ping-timeout: 30
> user.cifs: off
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> cluster.eager-lock: enable
> Volume Name: ssd-samsung
> Type: Distribute
> Volume ID: 76576cc6-220b-4651-952d-99846178a19e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.80.191:/gluster_bricks/sdc/data
> Options Reconfigured:
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> user.cifs: off
> network.ping-timeout: 30
> network.remote-dio: off
> performance.strict-o-direct: on
> performance.low-prio-threads: 32
> features.shard: on
> storage.owner-gid: 36
> storage.owner-uid: 36
> transport.address-family: inet
> nfs.disable: on
>
> The other two hosts will be 192.168.80.192/193 - this is a dedicated gluster network over a 10Gb SFP+ switch.
> - host 2 will have an identical hardware configuration to host 1 (each disk is actually a RAID 0 array)
> - host 3 has:
>    - 1 SSD for the OS
>    - 1 SSD for adding to the engine volume in a full replica 3
>    - 2 SSDs in a RAID 1 array to be added as the arbiter for the data volume (ssd-samsung)
> So the plan is to have "engine" scaled to a full replica 3, and "ssd-samsung" scaled to an arbitrated replica 3.
>
>
>
>
> On Sun, May 26, 2019 at 10:34 AM Strahil <hunter86_bg(a)yahoo.com> wrote:
>>
>> Hi Leo,
>>
>> Gluster is quite smart, but in order to provide any hints, can you share the output of 'gluster volume info <glustervol>'?
>> If you have 2 more systems, keep in mind that it is best to mirror the storage on the second replica (2 disks on 1 machine -> 2 disks on the new machine), while for the arbiter this is not necessary.
>>
>> What are your network and NICs? Based on my experience, I can recommend at least 10 Gbit/s interface(s).
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On May 26, 2019 07:52, Leo David <leoalex(a)gmail.com> wrote:
>>>
>>> Hello Everyone,
>>> Can someone help me to clarify this ?
>>> I have a single-node 4.2.8 installation (only two gluster storage domains - distributed single-drive volumes). Now I just got two identical servers and I would like to go for a 3-node setup.
>>> Is it possible (after joining the new nodes to the cluster) to expand the existing volumes across the new nodes and change them to replica 3 arbitrated?
>>> If so, could you share with me what the procedure would be?
>>> Thank you very much !
>>>
>>> Leo
>
>
>
> --
> Best regards, Leo David
Failed to add storage domain
by thunderlight1@gmail.com
Hi!
I have installed oVirt using the ISO ovirt-node-ng-installer-4.3.2-2019031908.el7. I then ran the Hosted Engine deployment through Cockpit.
I got an error when it tried to create the storage domain. It successfully mounted the NFS share on the host. Below is the error I got:
2019-04-14 10:40:38,967+0200 INFO ansible skipped {'status': 'SKIPPED', 'ansible_task': u'Check storage domain free space', 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml', 'ansible_type': 'task'}
2019-04-14 10:40:38,967+0200 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7fb6918ad9d0> kwargs
2019-04-14 10:40:39,516+0200 INFO ansible task start {'status': 'OK', 'ansible_task': u'ovirt.hosted_engine_setup : Activate storage domain', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml', 'ansible_type': 'task'}
2019-04-14 10:40:39,516+0200 DEBUG ansible on_any args TASK: ovirt.hosted_engine_setup : Activate storage domain kwargs is_conditional:False
2019-04-14 10:40:41,923+0200 DEBUG var changed: host "localhost" var "otopi_storage_domain_details" type "<type 'dict'>" value: "{
"changed": false,
"exception": "Traceback (most recent call last):\n File \"/tmp/ansible_ovirt_storage_domain_payload_xSFxOp/__main__.py\", line 664, in main\n storage_domains_module.post_create_check(sd_id)\n File \"/tmp/ansible_ovirt_storage_domain_payload_xSFxOp/__main__.py\", line 526, in post_create_check\n id=storage_domain.id,\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py\", line 3053, in add\n return self._internal_add(storage_domain, headers, query, wait)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 232, in _internal_add\n return future.wait() if wait else future\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 55, in wait\n return self._code(response)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 229, in callback\n self._check_fault(response)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 132, in _check_fault\n self._raise_error(response
, body)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 118, in _raise_error\n raise error\nError: Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400.\n",
"failed": true,
"msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400."
}"
2019-04-14 10:40:41,924+0200 DEBUG var changed: host "localhost" var "ansible_play_hosts" type "<type 'list'>" value: "[]"
2019-04-14 10:40:41,924+0200 DEBUG var changed: host "localhost" var "play_hosts" type "<type 'list'>" value: "[]"
2019-04-14 10:40:41,924+0200 DEBUG var changed: host "localhost" var "ansible_play_batch" type "<type 'list'>" value: "[]"
2019-04-14 10:40:41,924+0200 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u'Activate storage domain', 'ansible_result': u'type: <type \'dict\'>\nstr: {\'_ansible_parsed\': True, u\'exception\': u\'Traceback (most recent call last):\\n File "/tmp/ansible_ovirt_storage_domain_payload_xSFxOp/__main__.py", line 664, in main\\n storage_domains_module.post_create_check(sd_id)\\n File "/tmp/ansible_ovirt_storage_domain_payload_xSFxOp/__main__.py", line 526', 'task_duration': 2, 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml'}
2019-04-14 10:40:41,924+0200 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7fb691843190> kwargs ignore_errors:None
2019-04-14 10:40:41,928+0200 INFO ansible stats {
"ansible_playbook": "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml",
"ansible_playbook_duration": "00:37 Minutes",
"ansible_result": "type: <type 'dict'>\nstr: {u'localhost': {'unreachable': 0, 'skipped': 6, 'ok': 23, 'changed': 1, 'failures': 1}}",
"ansible_type": "finish",
"status": "FAILED"
}
2019-04-14 10:40:41,928+0200 INFO SUMMARY:
Duration Task Name
-------- --------
[ < 1 sec ] Execute just a specific set of steps
[ 00:01 ] Force facts gathering
[ 00:01 ] Check local VM dir stat
[ 00:01 ] Obtain SSO token using username/password credentials
[ 00:01 ] Fetch host facts
[ < 1 sec ] Fetch cluster ID
[ 00:01 ] Fetch cluster facts
[ 00:01 ] Fetch Datacenter facts
[ < 1 sec ] Fetch Datacenter ID
[ < 1 sec ] Fetch Datacenter name
[ 00:02 ] Add NFS storage domain
[ 00:01 ] Get storage domain details
[ 00:01 ] Find the appliance OVF
[ 00:01 ] Parse OVF
[ < 1 sec ] Get required size
[ FAILED ] Activate storage domain
2019-04-14 10:40:41,928+0200 DEBUG ansible on_any args <ansible.executor.stats.AggregateStats object at 0x7fb69404eb90> kwargs
Any suggestions on how to fix this?
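For reference, a common first check with NFS storage domains (not something confirmed by the log above, just a hedged suggestion): vdsm expects the export to be owned by vdsm:kvm, i.e. 36:36. Paths below are placeholders:

# on the host, once the share is mounted under /rhev/data-center/mnt/
ls -ld /rhev/data-center/mnt/<nfs_server>:_<export_path>
# on the NFS server, if the ownership or mode is wrong
chown 36:36 /path/to/export
chmod 0755 /path/to/export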
How to connect to a guest with vGPU ?
by Josep Manel Andrés Moscardó
Hi,
I got vGPU through mdev working, but I am wondering how I should connect to the guest and make use of the GPU. So far I have tried to access the console through SPICE, and at some point in the boot process it switches to the GPU and I cannot see anything else.
Thanks.
--
Josep Manel Andrés Moscardó
Systems Engineer, IT Operations
EMBL Heidelberg
T +49 6221 387-8394
Vm suddenly paused with error "vm has paused due to unknown storage error"
by Jasper Siero
Hi all,
Since we upgraded our oVirt nodes to CentOS 7, a VM (not a specific one, but never more than one at a time) will sometimes suddenly pause with the error "VM ... has paused due to unknown storage error". It has now happened twice in a month.
The oVirt node uses SAN storage for the VMs running on it. When a specific VM pauses with the error, the other VMs keep running without problems.
The VM runs without problems after unpausing it.
Versions:
CentOS Linux release 7.1.1503
vdsm-4.14.17-0
libvirt-daemon-1.2.8-16
vdsm.log:
VM Channels Listener::DEBUG::2015-10-25 07:43:54,382::vmChannels::95::vds::(_handle_timeouts) Timeout on fileno 78.
libvirtEventLoop::INFO::2015-10-25 07:43:56,177::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother
libvirtEventLoop::DEBUG::2015-10-25 07:43:56,178::vm::5204::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::event Suspended detail 2 opaque None
libvirtEventLoop::INFO::2015-10-25 07:43:56,178::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother
...........
libvirtEventLoop::INFO::2015-10-25 07:43:56,180::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother
specific error part in libvirt vm log:
block I/O error in device 'drive-virtio-disk0': Unknown error 32758 (32758)
...........
block I/O error in device 'drive-virtio-disk0': Unknown error 32758 (32758)
engine.log:
2015-10-25 07:44:48,945 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-40) [a43dcc8] VM diataal-prod-cas1 77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb moved from
Up --> Paused
2015-10-25 07:44:49,003 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-40) [a43dcc8] Correlation ID: null, Call Stack: null, Custom Event
ID: -1, Message: VM diataal-prod-cas1 has paused due to unknown storage error.
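For reference, a hedged sketch of the host-side checks that usually help narrow this down (generic commands, nothing specific to this SAN or thread):

# on the host that was running the paused VM
multipath -ll                                        # any failed/faulty SAN paths?
dmesg -T | grep -i 'i/o error'                       # kernel-level I/O errors around the pause time
grep -i 'abnormal vm stop' /var/log/vdsm/vdsm.log    # correlate timestamps with the engine log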
Has anyone experienced the same problem or does anyone know a way to solve this?
Kind regards,
Jasper
Ovirt-engine-ha cannot to see live status of Hosted Engine
by asm@pioner.kz
Good day to all.
I have some issues with oVirt 4.2.6, but here is the main one:
I have two CentOS 7 nodes with the same configuration and the latest oVirt 4.2.6, with a HostedEngine whose disk is on NFS storage.
Some of the virtual machines are also working fine.
When the HostedEngine is running on one node (srv02.local), everything is fine.
After migrating it to the other node (srv00.local), I see that the agent cannot check the liveliness of the HostedEngine. After a few minutes the HostedEngine reboots and after some time I see the same situation. After migrating to the other node (srv00.local), all looks OK.
hosted-engine --vm-status command output when the HostedEngine is on the srv00 node:
--== Host 1 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : srv02.local
Host ID : 1
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down_unexpected", "detail": "unknown"}
Score : 0
stopped : False
Local maintenance : False
crc32 : ecc7ad2d
local_conf_timestamp : 78328
Host timestamp : 78328
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=78328 (Tue Sep 18 12:44:18 2018)
host-id=1
score=0
vm_conf_refresh_time=78328 (Tue Sep 18 12:44:18 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineUnexpectedlyDown
stopped=False
timeout=Fri Jan 2 03:49:58 1970
--== Host 2 status ==--
conf_on_shared_storage : True
Status up-to-date : True
Hostname : srv00.local
Host ID : 2
Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 1d62b106
local_conf_timestamp : 326288
Host timestamp : 326288
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=326288 (Tue Sep 18 12:44:21 2018)
host-id=2
score=3400
vm_conf_refresh_time=326288 (Tue Sep 18 12:44:21 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineStarting
stopped=False
Log agent.log from srv00.local:
MainThread::INFO::2018-09-18 12:40:51,749::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:40:52,052::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:01,066::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:01,374::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::174::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host srv02.local.pioner.kz (id 1): {'conf_on_shared_storage': True, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=78128 (Tue Sep 18 12:40:58 2018)\nhost-id=1\nscore=0\nvm_conf_refresh_time=78128 (Tue Sep 18 12:40:58 2018)\nconf_on_shared_storage=True\nmaintenance=False\nstate=EngineUnexpectedlyDown\nstopped=False\ntimeout=Fri Jan 2 03:49:58 1970\n', 'hostname': 'srv02.local.pioner.kz', 'alive': True, 'host-id': 1, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down_unexpected', 'detail': 'unknown'}, 'score': 0, 'stopped': False, 'maintenance': False, 'crc32': 'e18e3f22', 'local_conf_timestamp': 78128, 'host-ts': 78128}
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 2): {'engine-health': {'reason': 'failed liveliness check', 'health': 'bad', 'vm': 'up', 'detail': 'Up'}, 'bridge': True, 'mem-free': 12763.0, 'maintenance': False, 'cpu-load': 0.0364, 'gateway': 1.0, 'storage-domain': True}
MainThread::INFO::2018-09-18 12:41:11,393::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:11,703::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:21,716::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:22,020::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:31,033::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:31,344::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
As we can see, the agent thinks that the HostedEngine is just powering up. I cannot do anything with it. I have already reinstalled the srv00 node many times without success.
One time I even had to uninstall the ovirt* and vdsm* software. One interesting point: after installing just "yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release42.rpm" on this node, I tried to install the node from the engine web interface with the "Deploy" action. The installation was unsuccessful until I had installed ovirt-hosted-engine-ha on the node; I don't see in the documentation that this is needed before installing new hosts, but I mention it for information and checking. After installing ovirt-hosted-engine-ha, the node was installed with HostedEngine support. But the main issue has not changed.
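For reference, my understanding is that the agent's liveliness check is just an HTTP probe of the engine's health page, so a minimal check from the host currently running the HostedEngine VM could look like this (the FQDN is a placeholder, and restarting the HA services is a common first step rather than a guaranteed fix):

curl http://<engine-fqdn>/ovirt-engine/services/health
systemctl restart ovirt-ha-broker ovirt-ha-agent
hosted-engine --vm-status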
Thanks in advance for help.
BR,
Alexandr
Hyperconverged setup - storage architecture - scaling
by Leo David
Hello Everyone,
Reading through the document:
"Red Hat Hyperconverged Infrastructure for Virtualization 1.5
Automating RHHI for Virtualization deployment"
Regarding storage scaling, i see the following statements:
*2.7. SCALING
Red Hat Hyperconverged Infrastructure for Virtualization is supported for one node, and for clusters of 3, 6, 9, and 12 nodes. The initial deployment is either 1 or 3 nodes. There are two supported methods of horizontally scaling Red Hat Hyperconverged Infrastructure for Virtualization:*
*1. Add new hyperconverged nodes to the cluster, in sets of three, up to the maximum of 12 hyperconverged nodes.*
*2. Create new Gluster volumes using new disks on existing hyperconverged nodes. You cannot create a volume that spans more than 3 nodes, or expand an existing volume so that it spans across more than 3 nodes at a time.*
*2.9.1. Prerequisites for geo-replication
Be aware of the following requirements and limitations when configuring geo-replication:
One geo-replicated volume only: Red Hat Hyperconverged Infrastructure for Virtualization (RHHI for Virtualization) supports only one geo-replicated volume. Red Hat recommends backing up the volume that stores the data of your virtual machines, as this usually contains the most valuable data.*
------
Also, in the oVirt Engine UI, when I add a brick to an existing volume I get the following warning:
*"Expanding gluster volume in a hyper-converged setup is not recommended as
it could lead to degraded performance. To expand storage for cluster, it is
advised to add additional gluster volumes." *
These things raise a couple of questions that are maybe easy for some of you guys to answer, but for me they create a bit of confusion...
I am also referring to the Red Hat product documentation, because I treat oVirt as being as production-ready as RHHI.
*1.* Is there any reason for not going to distributed-replicated volumes (i.e. spreading one volume across 6, 9, or 12 nodes)?
- i.e. it is recommended that in a 9-node scenario I should have 3 separate volumes, but then how should I deal with the following question:
*2.* If only one geo-replicated volume can be configured, how should I deal with replicating the 2nd and 3rd volumes for disaster recovery?
*3.* If the limit of hosts per datacenter is 250, then (in theory) the recommended way to reach this threshold would be to create about 20 separate oVirt logical clusters with 12 nodes each (with the datacenter managed from one HA engine)?
*4.* At present, I have one 9-node cluster, with all hosts contributing 2 disks each to a single replica 3 distributed-replicated volume. They were added to the volume in the following order:
node1 - disk1
node2 - disk1
......
node9 - disk1
node1 - disk2
node2 - disk2
......
node9 - disk2
At the moment the volume is arbitrated, but I intend to go for a full distributed replica 3.
Is this a bad setup? Why?
It obviously breaks the Red Hat recommended rules...
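For context on question 4: as far as I understand, in a distributed-replicated volume gluster groups every consecutive set of <replica count> bricks, in the order they are listed, into one replica set. A hedged illustration with placeholder hosts and paths:

# every consecutive group of 3 bricks becomes one replica set
gluster volume create data replica 3 \
    node1:/bricks/b1 node2:/bricks/b1 node3:/bricks/b1 \
    node4:/bricks/b1 node5:/bricks/b1 node6:/bricks/b1
# -> replica set 1: node1/node2/node3 ; replica set 2: node4/node5/node6

So with the add order above, node1/node2/node3 disk1 would form one replica set, node4/node5/node6 disk1 the next, and so on.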
Is there anyone kind enough to discuss these things?
Thank you very much!
Leo
--
Best regards, Leo David
disk locked after export as OVA
by adam_xu@adagene.com.cn
Hello, everyone. I tried to export a VM as OVA and got an error in the web UI:
VDSM ovirt1.ntbaobei.com command HSMGetAllTasksStatusesVDS failed: Volume Group not big enough: (u'Not enough free extents for extending LV d9f2378f-92f0-4bc1-96a8-20a0f2c575cb/c515a3f9-0590-4ebe-81c5-9d0993f1fec9 (free=848, needed=872)',)
It seems there is not enough free space in the storage domain.
I deleted the VM which I wanted to export as OVA before, but I saw a disk with id 8140d2e7-6908-438e-a646-23e58a33913e left in the storage domain, and its status is locked.
Here is some log output from engine.log:
2018-09-17 14:27:03,467+08 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CopyImageVDSCommand] (EE-ManagedThreadFactory-engine-Thread-107849) [a407f6a9-445d-4ccc-b998-f9dfc6dc67ec] START, CopyImageVDSCommand( CopyImageVDSCommandParameters:{storagePoolId='79432500-ad45-11e8-98f3-00163e188641', ignoreFailoverLimit='false', storageDomainId='d9f2378f-92f0-4bc1-96a8-20a0f2c575cb', imageGroupId='ee55bbe7-8001-4c82-b1a6-0cac7710704b', imageId='701a7472-8c00-4b9f-aa87-4837eb128695', dstImageGroupId='8140d2e7-6908-438e-a646-23e58a33913e', vmId='b3aa623e-3072-430e-87d6-b81a3b07f466', dstImageId='c515a3f9-0590-4ebe-81c5-9d0993f1fec9', imageDescription='', dstStorageDomainId='d9f2378f-92f0-4bc1-96a8-20a0f2c575cb', copyVolumeType='LeafVol', volumeFormat='COW', preallocate='Sparse', postZero='false', discard='false', force='true'}), log id: 164e3046
2018-09-17 14:27:03,467+08 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CopyImageVDSCommand] (EE-ManagedThreadFactory-engine-Thread-107849) [a407f6a9-445d-4ccc-b998-f9dfc6dc67ec] ++ dstImageGUID=8140d2e7-6908-438e-a646-23e58a33913e
I think maybe some command was not executed after I exported the VM as OVA. I can find a similar case here:
https://access.redhat.com/solutions/1298893
I have used the unlock_entity.sh tool but cannot find any disk that is locked.
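For reference, this is roughly the kind of invocation I mean, plus a lower-level check (the direct database query is an assumption about the images table schema and only applies if the engine database is local):

# on the engine machine
cd /usr/share/ovirt-engine/setup/dbutils
./unlock_entity.sh -t disk -q          # list disks the tool considers locked
./unlock_entity.sh -t snapshot -q      # the OVA export flow also creates a snapshot
./unlock_entity.sh -t disk 8140d2e7-6908-438e-a646-23e58a33913e    # unlock by id if it shows up

# assumption: look at the image status directly (2 = locked) if the tool reports nothing
sudo -u postgres psql engine -c "select image_guid, imagestatus from images where image_group_id='8140d2e7-6908-438e-a646-23e58a33913e';"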
So what should I do to delete the disk that is locked?
Yours, Adam
oVirt 4.3.5 glusterfs 6.3 performance tunning
by adrianquintero@gmail.com
Hello,
I have a hyperconverged setup using oVirt 4.3.5, and the "optimize for ovirt store" option seems to fail on gluster volumes.
I am seeing poor performance and I am trying to work out how I should tune gluster for better performance.
Can you provide any suggestions on the following volume settings (parameters)? (There is also a short sketch of what I understand to be the usual baseline after the settings dump.)
option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize on
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.lock-migration off
cluster.force-migration off
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.switch-pattern (null)
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 8
cluster.metadata-self-heal off
cluster.data-self-heal off
cluster.entry-self-heal off
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm full
cluster.eager-lock enable
disperse.eager-lock on
disperse.other-eager-lock on
disperse.eager-lock-timeout 1
disperse.other-eager-lock-timeout 1
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.choose-local off
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.heal-wait-queue-length 128
cluster.favorite-child-policy none
cluster.full-lock yes
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
diagnostics.stats-dump-interval 0
diagnostics.fop-sample-interval 0
diagnostics.stats-dump-format json
diagnostics.fop-sample-buf-size 65535
diagnostics.stats-dnscache-ttl-sec 86400
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 16
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 32
performance.least-prio-threads 1
performance.enable-least-priority on
performance.iot-watchdog-secs (null)
performance.iot-cleanup-disconnected-reqs off
performance.iot-pass-through false
performance.io-cache-pass-through false
performance.cache-size 128MB
performance.qr-cache-timeout 1
performance.cache-invalidation false
performance.ctime-invalidation false
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.resync-failed-syncs-after-fsync off
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct on
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.write-behind-trickling-writes on
performance.aggregate-size 128KB
performance.nfs.write-behind-trickling-writes on
performance.lazy-open yes
performance.read-after-open yes
performance.open-behind-pass-through false
performance.read-ahead-page-count 4
performance.read-ahead-pass-through false
performance.readdir-ahead-pass-through false
performance.md-cache-pass-through false
performance.md-cache-timeout 1
performance.cache-swift-metadata true
performance.cache-samba-metadata false
performance.cache-capability-xattrs true
performance.cache-ima-xattrs true
performance.md-cache-statfs off
performance.xattr-cache-list
performance.nl-cache-pass-through false
features.encryption off
network.frame-timeout 1800
network.ping-timeout 30
network.tcp-window-size (null)
client.ssl off
network.remote-dio off
client.event-threads 4
client.tcp-user-timeout 0
client.keepalive-time 20
client.keepalive-interval 2
client.keepalive-count 9
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive 1
server.allow-insecure on
server.root-squash off
server.all-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
server.ssl off
auth.ssl-allow *
server.manage-gids off
server.dynamic-auth on
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 4
server.tcp-user-timeout 42
server.keepalive-time 20
server.keepalive-interval 2
server.keepalive-count 9
transport.listen-backlog 1024
transport.address-family inet
performance.write-behind on
performance.read-ahead off
performance.readdir-ahead on
performance.io-cache off
performance.open-behind on
performance.quick-read off
performance.nl-cache off
performance.stat-prefetch on
performance.client-io-threads on
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
performance.cache-invalidation false
performance.global-cache-invalidation true
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
features.tag-namespaces off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.disable on
features.read-only off
features.worm off
features.worm-file-level off
features.worm-files-deletable on
features.default-retention-period 120
features.retention-mode relax
features.auto-commit-period 180
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid 36
storage.owner-gid 36
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.gfid2path on
storage.gfid2path-separator :
storage.reserve 1
storage.health-check-timeout 10
storage.fips-mode-rchecksum off
storage.force-create-mode 0
storage.force-directory-mode 0
storage.create-mask 777
storage.create-directory-mask 777
storage.max-hardlinks 100
features.ctime on
config.gfproxyd off
cluster.server-quorum-type server
cluster.server-quorum-ratio 0
changelog.changelog off
changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
locks.trace off
locks.mandatory-locking off
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
features.shard on
features.shard-block-size 64MB
features.shard-lru-limit 16384
features.shard-deletion-rate 100
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60
features.leases off
features.lease-lock-recall-timeout 60
disperse.background-heals 8
disperse.heal-wait-qlength 128
cluster.heal-timeout 600
dht.force-readdirp on
disperse.read-policy gfid-hash
cluster.shd-max-threads 8
cluster.shd-wait-qlength 10000
cluster.locking-scheme granular
cluster.granular-entry-heal enable
features.locks-revocation-secs 0
features.locks-revocation-clear-all false
features.locks-revocation-max-blocked 0
features.locks-monkey-unlocking false
features.locks-notify-contention no
features.locks-notify-contention-delay 5
disperse.shd-max-threads 1
disperse.shd-wait-qlength 1024
disperse.cpu-extensions auto
disperse.self-heal-window-size 1
cluster.use-compound-fops off
performance.parallel-readdir off
performance.rda-request-size 131072
performance.rda-low-wmark 4096
performance.rda-high-wmark 128KB
performance.rda-cache-limit 10MB
performance.nl-cache-positive-entry false
performance.nl-cache-limit 10MB
performance.nl-cache-timeout 60
cluster.brick-multiplex off
cluster.max-bricks-per-process 250
disperse.optimistic-change-log on
disperse.stripe-cache 4
cluster.halo-enabled False
cluster.halo-shd-max-latency 99999
cluster.halo-nfsd-max-latency 5
cluster.halo-max-latency 5
cluster.halo-max-replicas 99999
cluster.halo-min-replicas 2
features.selinux on
cluster.daemon-log-level INFO
debug.delay-gen off
delay-gen.delay-percentage 10%
delay-gen.delay-duration 100000
delay-gen.enable
disperse.parallel-writes on
features.sdfs off
features.cloudsync off
features.ctime on
ctime.noatime on
feature.cloudsync-storetype (null)
features.enforce-mandatory-lock off
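For context, the sketch I mentioned above: my understanding is that the usual baseline for oVirt data volumes is gluster's built-in "virt" group profile plus the vdsm ownership options, roughly as follows (the volume name is a placeholder, and this is an assumption about sensible defaults rather than tested tuning advice):

gluster volume set data group virt
gluster volume set data storage.owner-uid 36
gluster volume set data storage.owner-gid 36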
thank you!
oVirt 4.3.5 WARN no gluster network found in cluster
by adrianquintero@gmail.com
Hi,
I have a 4.3.5 hyperconverged setup with 3 hosts, each host has 2x10G NIC ports
Host1:
NIC1: 192.168.1.11
NIC2: 192.168.0.67 (Gluster)
Host2:
NIC1: 10.10.1.12
NIC2: 192.168.0.68 (Gluster)
Host3:
NIC1: 10.10.1.13
NIC2: 192.168.0.69 (Gluster)
I am able to ping all the gluster IPs from within the hosts, i.e. from host1 I can ping 192.168.0.68 and 192.168.0.69.
However, from the HostedEngine VM I can't ping any of those IPs:
[root@ovirt-engine ~]# ping 192.168.0.9
PING 192.168.0.60 (192.168.0.60) 56(84) bytes of data.
From 10.10.255.5 icmp_seq=1 Time to live exceeded
And on the HostedEngine I see the following WARNINGs (only for host1), which make me think that I am not using a separate network exclusively for gluster:
2019-08-21 21:04:34,215-04 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler10) [397b6038] Could not associate brick 'host1.example.com:/gluster_bricks/engine/engine' of volume 'ac1f73ce-cdf0-4bb9-817d-xxxxxxcxxx' with correct network as no gluster network found in cluster 'xxxxxxxx11e9-b8d3-00163e5d860d'
Any ideas?
thank you!!
2019-08-21 21:04:34,220-04 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler10) [397b6038] Could not associate brick 'vmm01.virt.ord1d:/gluster_bricks/data/data' of volume 'bc26633a-9a0b-49de-b714-97e76f222a02' with correct network as no gluster network found in cluster 'e98e2c16-c31e-11e9-b8d3-00163e5d860d'
2019-08-21 21:04:34,224-04 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler10) [397b6038] Could not associate brick 'host1:/gluster_bricks/vmstore/vmstore' of volume 'xxxxxxxxx-ca96-45cc-9e0f-649055e0e07b' with correct network as no gluster network found in cluster 'e98e2c16-c31e-11e9-b8d3xxxxxxxxxxxxxxx'
ovirt-imagio-proxy upload speed slow
by Dev Ops
I am working on integrating a backup solution for our oVirt environment and am having issues with the time it takes to back up the VMs. The backup solution simply takes a snapshot, makes a clone, and backs the clone up to a backup server.
A VM that is 100 GB takes 52 minutes to back up. The same VM doing a file backup using the same product, bypassing their RHV plugin, takes 14 minutes. So the throughput is there, but the ovirt-imageio-proxy process seems to be what manages how images are uploaded and it is officially my bottleneck. Load is not high on the engine or the KVM hosts.
I bumped up the upload image chunk size from 100 MB to 10 GB weeks ago and that didn't seem to help.
[root@blah-lab-engine ~]# engine-config -a |grep Upload
UploadImageChunkSizeKB: 10240000 version: general
[root@bgl-vms-engine ~]# rpm -qa |grep ovirt-image
ovirt-imageio-proxy-1.4.6-1.el7.noarch
ovirt-imageio-common-1.4.6-1.el7.x86_64
ovirt-imageio-proxy-setup-1.4.6-1.el7.noarch
I have seen bugs reported to Red Hat about this, but I am running above the affected releases.
The engine software is 4.2.8.2-1.el7.
Any idea what we can tweak to open up this bottleneck?
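A hedged sketch of where I would look next (the service names come from the 4.2-era split between the engine-side proxy and the host-side daemon; the log locations are my assumption for 1.4.x and may differ):

# on the engine machine
systemctl status ovirt-imageio-proxy
tail -f /var/log/ovirt-imageio-proxy/image-proxy.log     # per-transfer activity while a backup runs
# on the host performing the transfer
systemctl status ovirt-imageio-daemon
tail -f /var/log/ovirt-imageio-daemon/daemon.log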