Ok, so only for the HA engine eh? Been meaning to ask about that, since my hosted engine wasn’t using it. Tolerable, live disk migrations are way less valuable to me than better disk performance :) Other comments inline:
From: Sahina Bose <sabose@redhat.com>
Subject: Re: [ovirt-users] [ANN] oVirt 4.1.9 Release is now available
Date: January 25, 2018 at 12:24:34 AM CST
To: Darrell Budic
Cc: Lev Veyde; users
- it doesn’t seem to affect my HA vms, I’ve seen my 4.1.8 system properly restart systems using it (node/libvirtd crash that seems to have been related to spectre/meltdown firmwares)HA is an issue only when the gluster server used to provide volume information is down. For instance, if you have provided the "serverA:/volumeA" in your storage domain path , and if the other servers in replica are up but serverA is down, VM cannot restart. Have you tested this?I’m using a DNS based methods & backup server mount options to ensure the volume info is always available, so this particular problem shouldn’t affect me (and hasn’t in my testing).Can you share any performance improvement numbers that you have seen after turning on libgfapi access, also information about your workload would be helpful.thanks!Some not-very-scientific testing (can’t arrange dedicated disk time on this system) on a VM that hadn’t been covered yet gives me:Before using gfapi:]# dd if=/dev/urandom of=test.file bs=1M count=10241024+0 records in1024+0 records out1073741824 bytes (1.1 GB) copied, 90.1843 s, 11.9 MB/s# echo 3 > /proc/sys/vm/drop_caches# dd if=test.file of=/dev/null2097152+0 records in2097152+0 records out1073741824 bytes (1.1 GB) copied, 3.94715 s, 272 MB/s# hdparm -tT /dev/vda/dev/vda:Timing cached reads: 17322 MB in 2.00 seconds = 8673.49 MB/secTiming buffered disk reads: 996 MB in 3.00 seconds = 331.97 MB/sec#bonnie++ -d . -s 8G -n 0 -m pre-glapi -f -b -u rootVersion 1.97 ------Sequential Output------ --Sequential Input- --Random-Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CPpre-glapi 8G 196245 30 105331 15 962775 49 1638 34Latency 1578ms 1383ms 201ms 301msVersion 1.97 ------Sequential Output------ --Sequential Input- --Random-Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CPpre-glapi 8G 155937 27 102899 14 1030285 54 1763 45Latency 694ms 1333ms 114ms 229ms(note, sequential reads seem to have been influenced by caching somewhere…)After switching to gfapi:# dd if=/dev/urandom of=test.file bs=1M count=10241024+0 records in1024+0 records out1073741824 bytes (1.1 GB) copied, 80.8317 s, 13.3 MB/s# echo 3 > /proc/sys/vm/drop_caches# dd if=test.file of=/dev/null2097152+0 records in2097152+0 records out1073741824 bytes (1.1 GB) copied, 3.3473 s, 321 MB/s# hdparm -tT /dev/vda/dev/vda:Timing cached reads: 17112 MB in 2.00 seconds = 8568.86 MB/secTiming buffered disk reads: 1406 MB in 3.01 seconds = 467.70 MB/sec#bonnie++ -d . -s 8G -n 0 -m glapi -f -b -u rootVersion 1.97 ------Sequential Output------ --Sequential Input- --Random-Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CPglapi 8G 359100 59 185289 24 489575 31 2079 67Latency 160ms 355ms 36041us 185msVersion 1.97 ------Sequential Output------ --Sequential Input- --Random-Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CPglapi 8G 341307 57 180546 24 472572 35 2655 61Latency 153ms 394ms 101ms 116msSo excellent improvement in write throughput, but the significant improvement in latency is what was most noticed by users. Anecdotal reports of 2x+ performance improvements, with one remarking that it’s like having dedicated disks :)This system is on my production cluster, so it’s not getting exclusive disk access, but this VM is not doing anything else itself. The cluster is 3 xeon E5-2609 v3 @ 1.90GHz servers w/ 64G ram, SATA2 disks; 2 with 9x spindles each, 1 with 8x slightly faster disks (all spinners). Using ZFS stripes with lz4 compression and 10G connectivity to 8 hosts. Running gluster 3.12.3 at the moment. The cluster itself has about 70 running VMs in varying states of switching to gfapi use, but my main sql servers are using their own volumes and not competing for this one. These have not yet had the spectre/meltdown patches applied.This will be skewed because I forced it to not steal all the ram on the server (reads will certainly be cached), but an idea of what it can do disk wise, on the volume used above:# bonnie++ -d . -s 8G -n 0 -m zfs-server -f -b -u root -r 4096Version 1.97 ------Sequential Output------ --Sequential Input- --Random-Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CPzfs-server 8G 604940 79 510410 87 1393862 99 3164 91Latency 99545us 100ms 247us 152msJust for fun from one of the servers showing base load and this testing: