On Tue, Jul 6, 2021 at 10:21 AM Gianluca Cecchi
<gianluca.cecchi(a)gmail.com> wrote:
On Mon, Jul 5, 2021 at 5:06 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>
>
> qemu-img is busy in posix_fallocate(), wiring one byte to every 4k block.
>
> If you add -tt -T (as I suggested), we can see how much time each write takes,
> which may explain why this takes so much time.
>
> strace -f -p 14342 --tt -T
>
It seems I missed part of your suggestion... i didn't get the "-tt -T" (or
I didn't see it...)
With it I get this during the export (in networking of host console 4 mbit/s....):
# strace -f -p 25243 -tt -T
strace: Process 25243 attached with 2 threads
[pid 25243] 09:17:32.503907 ppoll([{fd=9, events=POLLIN|POLLERR|POLLHUP}], 1, NULL, NULL,
8 <unfinished ...>
[pid 25244] 09:17:32.694207 pwrite64(12, "\0", 1, 3773509631) = 1
<0.000059>
[pid 25244] 09:17:32.694412 pwrite64(12, "\0", 1, 3773513727) = 1
<0.000078>
[pid 25244] 09:17:32.694608 pwrite64(12, "\0", 1, 3773517823) = 1
<0.000056>
[pid 25244] 09:17:32.694729 pwrite64(12, "\0", 1, 3773521919) = 1
<0.000024>
[pid 25244] 09:17:32.694796 pwrite64(12, "\0", 1, 3773526015) = 1
<0.000020>
[pid 25244] 09:17:32.694855 pwrite64(12, "\0", 1, 3773530111) = 1
<0.000015>
[pid 25244] 09:17:32.694908 pwrite64(12, "\0", 1, 3773534207) = 1
<0.000014>
[pid 25244] 09:17:32.694950 pwrite64(12, "\0", 1, 3773538303) = 1
<0.000016>
[pid 25244] 09:17:32.694993 pwrite64(12, "\0", 1, 3773542399) = 1
<0.200032>
[pid 25244] 09:17:32.895140 pwrite64(12, "\0", 1, 3773546495) = 1
<0.000034>
[pid 25244] 09:17:32.895227 pwrite64(12, "\0", 1, 3773550591) = 1
<0.000029>
[pid 25244] 09:17:32.895296 pwrite64(12, "\0", 1, 3773554687) = 1
<0.000024>
[pid 25244] 09:17:32.895353 pwrite64(12, "\0", 1, 3773558783) = 1
<0.000016>
[pid 25244] 09:17:32.895400 pwrite64(12, "\0", 1, 3773562879) = 1
<0.000015>
[pid 25244] 09:17:32.895443 pwrite64(12, "\0", 1, 3773566975) = 1
<0.000015>
[pid 25244] 09:17:32.895485 pwrite64(12, "\0", 1, 3773571071) = 1
<0.000015>
[pid 25244] 09:17:32.895527 pwrite64(12, "\0", 1, 3773575167) = 1
<0.000017>
[pid 25244] 09:17:32.895570 pwrite64(12, "\0", 1, 3773579263) = 1
<0.199493>
[pid 25244] 09:17:33.095147 pwrite64(12, "\0", 1, 3773583359) = 1
<0.000031>
[pid 25244] 09:17:33.095262 pwrite64(12, "\0", 1, 3773587455) = 1
<0.000061>
[pid 25244] 09:17:33.095378 pwrite64(12, "\0", 1, 3773591551) = 1
<0.000027>
[pid 25244] 09:17:33.095445 pwrite64(12, "\0", 1, 3773595647) = 1
<0.000021>
[pid 25244] 09:17:33.095498 pwrite64(12, "\0", 1, 3773599743) = 1
<0.000016>
[pid 25244] 09:17:33.095542 pwrite64(12, "\0", 1, 3773603839) = 1
<0.000014>
Most writes are pretty fast, but from time to time there is a very slow write.
From the small sample you posted, we have:
awk '{print $11}' strace.out | sed -e "s/<//" -e "s/>//"
| awk
'{sum+=$1; if ($1 < 0.1) {fast+=$1; fast_nr++} else {slow+=$1;
slow_nr++}} END{printf "average: %.6f slow: %.6f fast: %.6f\n",
sum/NR, slow/slow_nr, fast/fast_nr}'
average: 0.016673 slow: 0.199763 fast: 0.000028
Preallocating a 300 GiB disk will take about 15 days :-)
>> 300*1024**3 / 4096 * 0.016673 / 3600 / 24
15.17613511111111
If all writes would be fast, it will take less than an hour:
>> 300*1024**3 / 4096 * 0.000028 / 3600
0.6116693333333333
. . .
BTW: it seems my NAS appliance doesn't support 4.2 version of NFS, because if I force
it, I then get an error in mount and in engine.log this error for both nodes as they try
to mount:
2021-07-05 17:01:56,082+02 ERROR
[org.ovirt.engine.core.bll.storage.connection.FileStorageHelper]
(EE-ManagedThreadFactory-engine-Thread-2554190) [642eb6be] The connection with details
'172.16.1.137:/nas/EXPORT-DOMAIN' failed because of error code '477' and
error message is: problem while trying to mount target
and in vdsm.log:
MountError: (32, ';mount.nfs: Protocol not supported\n')
Too bad.
You can evaluate how ovirt 4.4. will work with this appliance using
this dd command:
dd if=/dev/zero bs=8M count=38400 of=/path/to/new/disk
oflag=direct conv=fsync
We don't use dd for this, but the operation is the same on NFS < 4.2.
Based on the 50 MiB/s rate you reported earlier, I guess you have a
1Gbit network to
this appliance, so zeroing can do up to 128 MiB/s, which will take
about 40 minutes
for 300G.
Using NFS 4.2, fallocate will complete in less than a second.
Here is example from my test system, creating 90g raw preallocated volume:
2021-07-06 15:46:40,382+0300 INFO (tasks/1) [storage.Volume] Request
to create RAW volume /rhev/data-center/mnt/storage2:_exp
ort_00/a600ba04-34f9-4793-a5dc-6d4150716d14/images/bcf7c623-8fd8-47b3-aaee-a65c0872536d/82def38d-b41b-4126-826e-0513d669f1b5
with capacity = 96636764160 (fileVolume:493)
...
2021-07-06 15:46:40,447+0300 INFO (tasks/1) [storage.Volume]
Preallocating volume
/rhev/data-center/mnt/storage2:_export_00/a600ba04-34f9-4793-a5dc-6d4150716d14/images/bcf7c623-8fd8-47b3-aaee-a65c0872536d/82def38d-b41b-4126-826e-0513d669f1b5:
0.05 seconds (utils:390)
Nir