September 2019 - Users - oVirt List Archives

deprecating export domain?
by Charles Kozler 30 Aug '20


Hello,

I recently read on this list, in a post from a Red Hat member, that the export domain is either being deprecated or being considered for deprecation. Can you share details, or any notes, postings, or BZs that document this? I would expect something like this to be discussed with a larger audience. This seems like a fairly significant change, and I am curious when it is scheduled.

Currently, a lot of my backups rely explicitly on an export domain for online snapshots, so I'd like to plan accordingly.

Thanks!


Re: Single instance scaleup.
by Strahil 05 Jun '20


Hi Leo,

As you do not have a distributed volume, you can easily switch to replica 2 arbiter 1 or replica 3 volumes.

You can use the following for adding the bricks:
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Admi…

Best Regards,
Strahil Nikolov

On May 26, 2019 10:54, Leo David <leoalex(a)gmail.com> wrote:
>
> Hi Strahil,
> Thank you so much for your input!
>
> gluster volume info
>
> Volume Name: engine
> Type: Distribute
> Volume ID: d7449fc2-cc35-4f80-a776-68e4a3dbd7e1
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.80.191:/gluster_bricks/engine/engine
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard: on
> performance.low-prio-threads: 32
> performance.strict-o-direct: off
> network.remote-dio: off
> network.ping-timeout: 30
> user.cifs: off
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> cluster.eager-lock: enable
>
> Volume Name: ssd-samsung
> Type: Distribute
> Volume ID: 76576cc6-220b-4651-952d-99846178a19e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.80.191:/gluster_bricks/sdc/data
> Options Reconfigured:
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> user.cifs: off
> network.ping-timeout: 30
> network.remote-dio: off
> performance.strict-o-direct: on
> performance.low-prio-threads: 32
> features.shard: on
> storage.owner-gid: 36
> storage.owner-uid: 36
> transport.address-family: inet
> nfs.disable: on
>
> The other two hosts will be 192.168.80.192/193 - this is a dedicated Gluster network over a 10 Gb SFP+ switch.
> - host 2 will have an identical hardware configuration to host 1 (each disk is actually a RAID 0 array)
> - host 3 has:
>   - 1 SSD for the OS
>   - 1 SSD to be added to the engine volume in a full replica 3
>   - 2 SSDs in a RAID 1 array to be added as arbiter for the data volume (ssd-samsung)
> So the plan is to have "engine" scaled to a full replica 3, and "ssd-samsung" scaled to a replica 3 with arbiter.
>
> On Sun, May 26, 2019 at 10:34 AM Strahil <hunter86_bg(a)yahoo.com> wrote:
>>
>> Hi Leo,
>>
>> Gluster is quite smart, but in order to provide any hints, can you provide the output of 'gluster volume info <glustervol>'?
>> If you have 2 more systems, keep in mind that it is best to mirror the storage on the second replica (2 disks on 1 machine -> 2 disks on the new machine), while for the arbiter this is not necessary.
>>
>> What is your network and what are your NICs? Based on my experience, I can recommend at least 10 Gbit/s interface(s).
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On May 26, 2019 07:52, Leo David <leoalex(a)gmail.com> wrote:
>>>
>>> Hello Everyone,
>>> Can someone help me to clarify this?
>>> I have a single-node 4.2.8 installation (only two Gluster storage domains - distributed single-drive volumes). Now I just got two identical servers and I would like to go for a 3-node bundle.
>>> Is it possible (after joining the new nodes to the cluster) to expand the existing volumes across the new nodes and change them to replica 3 with arbiter?
>>> If so, could you share with me what the procedure would be?
>>> Thank you very much!
>>>
>>> Leo
>
> --
> Best regards, Leo David
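
The plan discussed in this thread can be sketched as follows. This is a minimal sketch, not a tested procedure: it only builds `gluster volume add-brick` command lines so they can be reviewed before anyone runs them by hand. The brick paths on 192.168.80.192 and 192.168.80.193 are assumptions (the thread gives only the addresses), the helper name is hypothetical, and the sketch assumes the installed Gluster release accepts `replica 3 arbiter 1` when bricks are added to a single-brick volume. With an arbiter, the last brick listed is the one used as arbiter.

```python
def add_brick_command(volume, new_bricks, replica, arbiter=None):
    """Build (but do not run) a `gluster volume add-brick` command line."""
    cmd = ["gluster", "volume", "add-brick", volume, "replica", str(replica)]
    if arbiter is not None:
        cmd += ["arbiter", str(arbiter)]
    return cmd + list(new_bricks)


# Assumed brick paths on the two new hosts; only the IP addresses come from the thread.
engine_cmd = add_brick_command(
    "engine",
    ["192.168.80.192:/gluster_bricks/engine/engine",
     "192.168.80.193:/gluster_bricks/engine/engine"],
    replica=3,
)

# Host 3 (192.168.80.193) is listed last so that it becomes the arbiter brick.
data_cmd = add_brick_command(
    "ssd-samsung",
    ["192.168.80.192:/gluster_bricks/sdc/data",
     "192.168.80.193:/gluster_bricks/sdc/data"],
    replica=3,
    arbiter=1,
)

for cmd in (engine_cmd, data_cmd):
    print(" ".join(cmd))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; the new hosts would still have to be joined to the cluster and the bricks prepared before anything like this is run.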


Failed to add storage domain
by thunderlight1＠gmail.com 31 May '20


Hi!

I have installed oVirt using the ISO ovirt-node-ng-installer-4.3.2-2019031908.el7. I then ran the Hosted Engine deployment through Cockpit. I got an error when it tried to create the storage domain. It successfully mounted the NFS share on the host. Below is the error I got:

2019-04-14 10:40:38,967+0200 INFO ansible skipped {'status': 'SKIPPED', 'ansible_task': u'Check storage domain free space', 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml', 'ansible_type': 'task'}
2019-04-14 10:40:38,967+0200 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7fb6918ad9d0> kwargs
2019-04-14 10:40:39,516+0200 INFO ansible task start {'status': 'OK', 'ansible_task': u'ovirt.hosted_engine_setup : Activate storage domain', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml', 'ansible_type': 'task'}
2019-04-14 10:40:39,516+0200 DEBUG ansible on_any args TASK: ovirt.hosted_engine_setup : Activate storage domain kwargs is_conditional:False
2019-04-14 10:40:41,923+0200 DEBUG var changed: host "localhost" var "otopi_storage_domain_details" type "<type 'dict'>" value: "{ "changed": false, "exception": "Traceback (most recent call last):\n File \"/tmp/ansible_ovirt_storage_domain_payload_xSFxOp/__main__.py\", line 664, in main\n storage_domains_module.post_create_check(sd_id)\n File \"/tmp/ansible_ovirt_storage_domain_payload_xSFxOp/__main__.py\", line 526, in post_create_check\n id=storage_domain.id,\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py\", line 3053, in add\n return self._internal_add(storage_domain, headers, query, wait)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 232, in _internal_add\n return future.wait() if wait else future\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 55, in wait\n return self._code(response)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 229, in callback\n self._check_fault(response)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 132, in _check_fault\n self._raise_error(response , body)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 118, in _raise_error\n raise error\nError: Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400.\n", "failed": true, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400." }"
2019-04-14 10:40:41,924+0200 DEBUG var changed: host "localhost" var "ansible_play_hosts" type "<type 'list'>" value: "[]"
2019-04-14 10:40:41,924+0200 DEBUG var changed: host "localhost" var "play_hosts" type "<type 'list'>" value: "[]"
2019-04-14 10:40:41,924+0200 DEBUG var changed: host "localhost" var "ansible_play_batch" type "<type 'list'>" value: "[]"
2019-04-14 10:40:41,924+0200 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u'Activate storage domain', 'ansible_result': u'type: <type \'dict\'>\nstr: {\'_ansible_parsed\': True, u\'exception\': u\'Traceback (most recent call last):\\n File "/tmp/ansible_ovirt_storage_domain_payload_xSFxOp/__main__.py", line 664, in main\\n storage_domains_module.post_create_check(sd_id)\\n File "/tmp/ansible_ovirt_storage_domain_payload_xSFxOp/__main__.py", line 526', 'task_duration': 2, 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml'}
2019-04-14 10:40:41,924+0200 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7fb691843190> kwargs ignore_errors:None
2019-04-14 10:40:41,928+0200 INFO ansible stats { "ansible_playbook": "/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml", "ansible_playbook_duration": "00:37 Minutes", "ansible_result": "type: <type 'dict'>\nstr: {u'localhost': {'unreachable': 0, 'skipped': 6, 'ok': 23, 'changed': 1, 'failures': 1}}", "ansible_type": "finish", "status": "FAILED" }
2019-04-14 10:40:41,928+0200 INFO SUMMARY:
Duration      Task Name
--------      --------
[ < 1 sec ]   Execute just a specific set of steps
[ 00:01 ]     Force facts gathering
[ 00:01 ]     Check local VM dir stat
[ 00:01 ]     Obtain SSO token using username/password credentials
[ 00:01 ]     Fetch host facts
[ < 1 sec ]   Fetch cluster ID
[ 00:01 ]     Fetch cluster facts
[ 00:01 ]     Fetch Datacenter facts
[ < 1 sec ]   Fetch Datacenter ID
[ < 1 sec ]   Fetch Datacenter name
[ 00:02 ]     Add NFS storage domain
[ 00:01 ]     Get storage domain details
[ 00:01 ]     Find the appliance OVF
[ 00:01 ]     Parse OVF
[ < 1 sec ]   Get required size
[ FAILED ]    Activate storage domain
2019-04-14 10:40:41,928+0200 DEBUG ansible on_any args <ansible.executor.stats.AggregateStats object at 0x7fb69404eb90> kwargs

Any suggestions on how to fix this?


How to connect to a guest with vGPU ?
by Josep Manel Andrés Moscardó 29 May '20


Hi,

I got vGPU through mdev working, but I am wondering how I would connect to the guest and make use of the GPU. So far I have tried to access the console through SPICE, but at some point in the boot process it switches to the GPU and I can no longer see anything.

Thanks.

--
Josep Manel Andrés Moscardó
Systems Engineer, IT Operations
EMBL Heidelberg
T +49 6221 387-8394


ovirt-guest-agent for CentOS 8
by Eduardo Mayoral 30 Mar '20


Hi,

Just like many of you, I am testing my first CentOS 8 VMs on top of oVirt. I cannot find the package ovirt-guest-agent.noarch. The closest I can find is qemu-guest-agent.x86_64. After installing and starting it, I do see information reported on the "Guest info" tab.

Can anybody confirm whether this is indeed the agent we should be using? Is there, or will there be, a more specific package for oVirt guests?

Thanks!

--
Eduardo Mayoral Jimeno
Systems engineer, platform department. Arsys Internet.
emayoral(a)arsys.es - +34 941 620 105 - ext 2153


Vm suddenly paused with error "vm has paused due to unknown storage error"
by Jasper Siero 18 Feb '20


Hi all,

Since we upgraded our oVirt nodes to CentOS 7, a VM (not a specific one, but never more than one at a time) will sometimes pause suddenly with the error "VM ... has paused due to unknown storage error". It has now happened twice in a month.

The oVirt node uses SAN storage for the VMs running on it. When a specific VM pauses with this error, the other VMs keep running without problems. The VM runs without problems after unpausing it.

Versions:
CentOS Linux release 7.1.1503
vdsm-4.14.17-0
libvirt-daemon-1.2.8-16

vdsm.log:
VM Channels Listener::DEBUG::2015-10-25 07:43:54,382::vmChannels::95::vds::(_handle_timeouts) Timeout on fileno 78.
libvirtEventLoop::INFO::2015-10-25 07:43:56,177::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother
libvirtEventLoop::DEBUG::2015-10-25 07:43:56,178::vm::5204::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::event Suspended detail 2 opaque None
libvirtEventLoop::INFO::2015-10-25 07:43:56,178::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother
...........
libvirtEventLoop::INFO::2015-10-25 07:43:56,180::vm::4602::vm.Vm::(_onIOError) vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother

Specific error part in the libvirt VM log:
block I/O error in device 'drive-virtio-disk0': Unknown error 32758 (32758)
...........
block I/O error in device 'drive-virtio-disk0': Unknown error 32758 (32758)

engine.log:
2015-10-25 07:44:48,945 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-40) [a43dcc8] VM diataal-prod-cas1 77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb moved from Up --> Paused
2015-10-25 07:44:49,003 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-40) [a43dcc8] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM diataal-prod-cas1 has paused due to unknown storage error.

Has anyone experienced the same problem, or does anyone know a way to solve it?

Kind regards,
Jasper
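
When comparing several pause events, it can help to tally the `abnormal vm stop` entries per VM, device, and error code. The following is a minimal sketch that assumes vdsm.log lines look exactly like the excerpts in this message; the regular expression and the function name are illustrative and are not part of VDSM.

```python
import re
from collections import Counter

# Matches the _onIOError entries shown in the vdsm.log excerpt above.
ABNORMAL_STOP = re.compile(
    r"vmId=`(?P<vm_id>[0-9a-f-]+)`::abnormal vm stop device "
    r"(?P<device>\S+) error (?P<error>\S+)"
)


def count_abnormal_stops(log_lines):
    """Count abnormal-stop events per (vm_id, device, error) tuple."""
    counts = Counter()
    for line in log_lines:
        match = ABNORMAL_STOP.search(line)
        if match:
            key = (match.group("vm_id"), match.group("device"), match.group("error"))
            counts[key] += 1
    return counts


sample = [
    "libvirtEventLoop::INFO::2015-10-25 07:43:56,177::vm::4602::vm.Vm::(_onIOError) "
    "vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother",
    "libvirtEventLoop::DEBUG::2015-10-25 07:43:56,178::vm::5204::vm.Vm::(_onLibvirtLifecycleEvent) "
    "vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::event Suspended detail 2 opaque None",
    "libvirtEventLoop::INFO::2015-10-25 07:43:56,178::vm::4602::vm.Vm::(_onIOError) "
    "vmId=`77f07ae0-cc3e-4ae2-90ec-7fba7b11deeb`::abnormal vm stop device virtio-disk0 error eother",
]

stops = count_abnormal_stops(sample)
for (vm_id, device, error), count in stops.items():
    print(vm_id, device, error, count)
```

In practice the function would be fed the lines of the real log file; the tally only shows which VMs and disks are affected and says nothing about the underlying storage cause.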


Ovirt-engine-ha cannot to see live status of Hosted Engine
by asm＠pioner.kz 01 Feb '20


Good day to all.

I have some issues with oVirt 4.2.6. Here is the main one:

I have two CentOS 7 nodes with the same configuration and the latest oVirt 4.2.6, with HostedEngine and its disk on NFS storage. Some of the virtual machines are also working well.

When HostedEngine runs on one node (srv02.local), everything is fine. After migrating it to the other node (srv00.local), I see that the agent cannot check the liveliness of HostedEngine. After a few minutes HostedEngine reboots, and after some time I see the same situation. After migration to another node (srv00.local) all looks OK.

Output of the hosted-engine --vm-status command when HostedEngine is on the srv00 node:

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : srv02.local
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down_unexpected", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : ecc7ad2d
local_conf_timestamp               : 78328
Host timestamp                     : 78328
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=78328 (Tue Sep 18 12:44:18 2018)
    host-id=1
    score=0
    vm_conf_refresh_time=78328 (Tue Sep 18 12:44:18 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUnexpectedlyDown
    stopped=False
    timeout=Fri Jan 2 03:49:58 1970

--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : srv00.local
Host ID                            : 2
Engine status                      : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 1d62b106
local_conf_timestamp               : 326288
Host timestamp                     : 326288
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=326288 (Tue Sep 18 12:44:21 2018)
    host-id=2
    score=3400
    vm_conf_refresh_time=326288 (Tue Sep 18 12:44:21 2018)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False

agent.log from srv00.local:

MainThread::INFO::2018-09-18 12:40:51,749::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:40:52,052::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:01,066::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:01,374::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::174::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host srv02.local.pioner.kz (id 1): {'conf_on_shared_storage': True, 'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=78128 (Tue Sep 18 12:40:58 2018)\nhost-id=1\nscore=0\nvm_conf_refresh_time=78128 (Tue Sep 18 12:40:58 2018)\nconf_on_shared_storage=True\nmaintenance=False\nstate=EngineUnexpectedlyDown\nstopped=False\ntimeout=Fri Jan 2 03:49:58 1970\n', 'hostname': 'srv02.local.pioner.kz', 'alive': True, 'host-id': 1, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down_unexpected', 'detail': 'unknown'}, 'score': 0, 'stopped': False, 'maintenance': False, 'crc32': 'e18e3f22', 'local_conf_timestamp': 78128, 'host-ts': 78128}
MainThread::INFO::2018-09-18 12:41:11,393::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 2): {'engine-health': {'reason': 'failed liveliness check', 'health': 'bad', 'vm': 'up', 'detail': 'Up'}, 'bridge': True, 'mem-free': 12763.0, 'maintenance': False, 'cpu-load': 0.0364, 'gateway': 1.0, 'storage-domain': True}
MainThread::INFO::2018-09-18 12:41:11,393::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:11,703::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:21,716::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:22,020::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-09-18 12:41:31,033::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-09-18 12:41:31,344::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)

As we can see, the agent thinks that HostedEngine is still powering up. I cannot do anything about it. I have already reinstalled the srv00 node many times without success. One time I even had to uninstall the ovirt* and vdsm* packages.

One more interesting point: after running just "yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release42.rpm" on this node, I tried to install the node from the engine web interface with the "Deploy" action. The installation was unsuccessful until I installed ovirt-hosted-engine-ha on the node. I don't see in the documentation that this is needed before installing new hosts; I mention it for information and checking. After installing ovirt-hosted-engine-ha, the node was installed with HostedEngine support.

But the main issue has not changed. Thanks in advance for your help.

BR,
Alexandr
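
The state described in this message (VM reported as up, health reported as bad) can be picked out of the status output mechanically. The following is a minimal sketch that parses the plain-text output of `hosted-engine --vm-status` as pasted in this message. The function names are hypothetical, and the sketch assumes a healthy engine reports `"health": "good"`.

```python
import json
import re

HOSTNAME = re.compile(r"^Hostname\s*:\s*(\S+)", re.MULTILINE)
ENGINE_STATUS = re.compile(r"^Engine status\s*:\s*(\{.*\})\s*$", re.MULTILINE)


def engine_status_by_host(status_text):
    """Map each hostname to its parsed 'Engine status' JSON object."""
    hostnames = HOSTNAME.findall(status_text)
    statuses = [json.loads(raw) for raw in ENGINE_STATUS.findall(status_text)]
    return dict(zip(hostnames, statuses))


def vm_up_but_unhealthy(statuses):
    """Hosts where the engine VM runs but the health check does not pass."""
    return [
        host for host, status in statuses.items()
        if status.get("vm") == "up" and status.get("health") != "good"
    ]


SAMPLE = """\
--== Host 1 status ==--
Hostname                           : srv02.local
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down_unexpected", "detail": "unknown"}
--== Host 2 status ==--
Hostname                           : srv00.local
Engine status                      : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
"""

statuses = engine_status_by_host(SAMPLE)
suspects = vm_up_but_unhealthy(statuses)
print(suspects)
```

For the output in this message the list contains only srv00.local, which matches the agent.log excerpt: the VM is up on that host but never passes the liveliness check.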


Hyperconverged setup - storage architecture - scaling
by Leo David 10 Jan '20


Hello Everyone,

Reading through the document "Red Hat Hyperconverged Infrastructure for Virtualization 1.5 Automating RHHI for Virtualization deployment", I see the following statements regarding storage scaling:

*2.7. SCALING
Red Hat Hyperconverged Infrastructure for Virtualization is supported for one node, and for clusters of 3, 6, 9, and 12 nodes. The initial deployment is either 1 or 3 nodes. There are two supported methods of horizontally scaling Red Hat Hyperconverged Infrastructure for Virtualization:*
*1 Add new hyperconverged nodes to the cluster, in sets of three, up to the maximum of 12 hyperconverged nodes.*
*2 Create new Gluster volumes using new disks on existing hyperconverged nodes. You cannot create a volume that spans more than 3 nodes, or expand an existing volume so that it spans across more than 3 nodes at a time*

*2.9.1. Prerequisites for geo-replication
Be aware of the following requirements and limitations when configuring geo-replication:
One geo-replicated volume only
Red Hat Hyperconverged Infrastructure for Virtualization (RHHI for Virtualization) supports only one geo-replicated volume. Red Hat recommends backing up the volume that stores the data of your virtual machines, as this usually contains the most valuable data.*

------

Also, in the oVirt Engine UI, when I add a brick to an existing volume I get the following warning:

*"Expanding gluster volume in a hyper-converged setup is not recommended as it could lead to degraded performance. To expand storage for cluster, it is advised to add additional gluster volumes."*

These things raise a couple of questions that may be easy to answer for some of you, but for me they create a bit of confusion. I am also referring to the Red Hat product documentation because I treat oVirt as being as production-ready as RHHI is.

*1.* Is there any reason for not going to distributed-replicated volumes (i.e. spreading one volume across 6, 9, or 12 nodes)? In other words, is it recommended that in a 9-node scenario I should have 3 separate volumes? If so, how should I deal with the following question:

*2.* If only one geo-replicated volume can be configured, how should I deal with replication of the 2nd and 3rd volumes for disaster recovery?

*3.* If the limit of hosts per datacenter is 250, then (in theory) would the recommended way of reaching this threshold be to create 20 separate oVirt logical clusters with 12 nodes each (and the datacenter managed from one HA engine)?

*4.* At present, I have the following: one 9-node cluster, with all hosts contributing 2 disks each to a single replica 3 distributed-replicated volume. They were added to the volume in the following order:

node1 - disk1
node2 - disk1
......
node9 - disk1
node1 - disk2
node2 - disk2
......
node9 - disk2

At the moment the volume is arbitrated, but I intend to go for a full distributed replica 3. Is this a bad setup? Why? It obviously breaks the Red Hat recommended rules...

Is anyone kind enough to discuss these things?

Thank you very much!

Leo

--
Best regards, Leo David
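
For question 4, it may help to see how the brick order maps onto replica sets. The following is a minimal sketch that assumes the documented Gluster behaviour for distributed-replicated volumes: consecutive bricks, in the order they were given, form one replica set. The `nodeN:diskM` labels stand in for the real brick paths, and the helper names are hypothetical.

```python
def replica_sets(bricks, replica=3):
    """Group bricks into replica sets in the order they were added."""
    if len(bricks) % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def nodes_in(replica_set):
    """Distinct nodes that hold a copy within one replica set."""
    return {brick.split(":")[0] for brick in replica_set}


# Order from the message: disk1 of node1..node9, then disk2 of node1..node9.
bricks = [f"node{n}:disk{d}" for d in (1, 2) for n in range(1, 10)]

sets = replica_sets(bricks)
# Replica sets that would keep two copies on the same node (ideally none).
shared = [rs for rs in sets if len(nodes_in(rs)) < len(rs)]

for rs in sets:
    print(rs)
print("sets with two copies on one node:", shared)
```

With this ordering every replica set spans three different nodes, so no set keeps two copies on one host. The sketch only illustrates brick placement; it says nothing about the RHHI support limits or the performance warning quoted in the message.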


hyperconverged single node with SSD cache fails gluster creation
by thomas＠hoberg.net 01 Dec '19


I am seeing more successes than failures at creating single- and triple-node hyperconverged setups after some weeks of experimentation, so I am branching out to additional features: in this case, the ability to use SSDs as cache media for hard disks.

I tried first with a single node that combined caching and compression, and that fails during the creation of the LVM volumes. I tried again without the VDO compression, but the results were identical, whilst VDO compression without the LV cache worked OK.

I tried various combinations, using less space etc., but the results are always the same and unfortunately rather cryptic (I substituted the physical disk label with {disklabel}):

TASK [gluster.infra/roles/backend_setup : Extend volume group] *****************
failed: [{hostname}] (item={u'vgname': u'gluster_vg_{disklabel}p1', u'cachethinpoolname': u'gluster_thinpool_gluster_vg_{disklabel}p1', u'cachelvname': u'cachelv_gluster_thinpool_gluster_vg_{disklabel}p1', u'cachedisk': u'/dev/sda4', u'cachemetalvname': u'cache_gluster_thinpool_gluster_vg_{disklabel}p1', u'cachemode': u'writeback', u'cachemetalvsize': u'70G', u'cachelvsize': u'630G'}) => {"ansible_loop_var": "item", "changed": false, "err": " Physical volume \"/dev/mapper/vdo_{disklabel}p1\" still in use\n", "item": {"cachedisk": "/dev/sda4", "cachelvname": "cachelv_gluster_thinpool_gluster_vg_{disklabel}p1", "cachelvsize": "630G", "cachemetalvname": "cache_gluster_thinpool_gluster_vg_{disklabel}p1", "cachemetalvsize": "70G", "cachemode": "writeback", "cachethinpoolname": "gluster_thinpool_gluster_vg_{disklabel}p1", "vgname": "gluster_vg_{disklabel}p1"}, "msg": "Unable to reduce gluster_vg_{disklabel}p1 by /dev/dm-15.", "rc": 5}

Somewhere within that I see something that points to a race condition ("still in use").

Unfortunately I have not been able to pinpoint the raw logs that are used at that stage, and I wasn't able to obtain more info.

At this point quite a bit of storage setup is already done, so rolling back for a clean new attempt can be a bit complicated, with reboots to reconcile the kernel with the data on disk.

I don't actually believe it's related to single node, and I'd be quite happy to move the creation of the SSD cache to a later stage, but in a VDO setup this looks slightly complex to someone without intimate knowledge of LVM-with-cache-and-perhaps-thin/VDO/Gluster all thrown into one.

Needless to say, the feature set (SSD caching and compressed dedup) sounds terribly attractive, but when things don't just work, it's more terrifying.


How to pass parameters between VDSM Hooks domxml in single run
by Vrgotic, Marko 06 Nov '19


Dear oVirt,

A while ago we discussed ways to change or update the content of domxml parameters during certain actions.

As I mentioned before, we have added the VDSM hook 60_nsupdate, which removes the DNS record entries when a VM is destroyed:

…
domxml = hooking.read_domxml()
name = domxml.getElementsByTagName('name')[0]
name = " ".join(name.nodeValue for name in name.childNodes if name.nodeType == name.TEXT_NODE)
nsupdate_commands = """server {server_ip}
update delete {vm_name}.example.com a
update delete {vm_name}.example.com aaaa
update delete {vm_name}.example.com txt
send
""".format(server_ip="172.16.1.10", vm_name=name)
…

The goal:

However, we did not want to remove the DNS records when a VM is only migrated. Since a migration is considered a “destroy” action, we took the following approach:

* In the state “before_vm_migrate_source”, add a hook which will write the flag “is_migration” to the domxml
* Once the VM is scheduled for migration, this hook should add the flag “is_migration” to the domxml
* Once 60_nsupdate is triggered, it will check for the flag and, if it is there, skip the DNS record action and only remove the flag “is_migration” from the domxml of the VM

…
domxml = hooking.read_domxml()
migration = domxml.createElement("is_migration")
domxml.getElementsByTagName("domain")[0].appendChild(migration)
logging.info("domxml_updated {}".format(domxml.toprettyxml()))
hooking.write_domxml(domxml)
…

When executing this for the first time, we observed that the flag “is_migration” is added to the domxml:

<name>hookiesvm</name>
<uuid>fcfa66cb-b251-43a3-8e2b-f33b3024a749</uuid>
<metadata xmlns:ns0="http://ovirt.org/vm/tune/1.0" xmlns:ns1="http://ovirt.org/vm/1.0">
  <ns0:qos/>
  <ovirt-vm:vm xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
    <ovirt-vm:clusterVersion>4.3</ovirt-vm:clusterVersion>
    <ovirt-vm:destroy_on_reboot type="bool">False</ovirt-vm:destroy_on_reboot>
    <ovirt-vm:launchPaused>false</ovirt-vm:launchPaused>
    <ovirt-vm:memGuaranteedSize type="int">1024</ovirt-vm:memGuaranteedSize>
    <ovirt-vm:minGuaranteedMemoryMb type="int">1024</ovirt-vm:minGuaranteedMemoryMb>
...skipping...
      <address bus="0x00" domain="0x0000" function="0x0" slot="0x09" type="pci"/>
    </rng>
  </devices>
  <seclabel model="selinux" relabel="yes" type="dynamic">
    <label>system_u:system_r:svirt_t:s0:c169,c575</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c169,c575</imagelabel>
  </seclabel>
  <seclabel model="dac" relabel="yes" type="dynamic">
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
  <is_migration/>
</domain>

but it was not present once the 60_nsupdate hook was executed.

The question:

How do we make sure that, when the domxml is updated, the update is visible to and usable by the following hook, in a single run? How do we pass these changes between hooks?

Kindly awaiting your reply.

--
Met vriendelijke groet / Kind regards,
Marko Vrgotic
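
The flag handling described in this message can be sketched without VDSM. The following is a minimal sketch using only `xml.dom.minidom` from the Python standard library on an XML string; in a real hook the document would come from `hooking.read_domxml()` and be written back with `hooking.write_domxml()`, as in the snippets above. The helper names `set_flag` and `pop_flag` are hypothetical. The sketch does not answer the open question of whether a change written at before_vm_migrate_source is still visible at a later hook point.

```python
from xml.dom import minidom

FLAG = "is_migration"


def set_flag(domxml):
    """Append <is_migration/> to <domain> unless it is already present."""
    if not domxml.getElementsByTagName(FLAG):
        domain = domxml.getElementsByTagName("domain")[0]
        domain.appendChild(domxml.createElement(FLAG))


def pop_flag(domxml):
    """Remove every <is_migration/> element; return True if one was found."""
    flags = list(domxml.getElementsByTagName(FLAG))
    for flag in flags:
        flag.parentNode.removeChild(flag)
    return len(flags) > 0


# Stand-in for hooking.read_domxml(); the VM name is taken from the message.
domxml = minidom.parseString("<domain><name>hookiesvm</name></domain>")

set_flag(domxml)                 # what the migration-source hook would do
was_migration = pop_flag(domxml)  # what 60_nsupdate would check first
second_check = pop_flag(domxml)   # the flag is gone after the first check

if was_migration:
    print("migration detected: skip the DNS record removal")
```

Returning a boolean from `pop_flag` lets the DNS hook decide in one call whether to skip the nsupdate step, and removing the flag in the same call keeps it from leaking into a later, genuine destroy.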
