
On Mon, Jul 5, 2021 at 3:36 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Jul 5, 2021 at 2:13 PM Nir Soffer <nsoffer@redhat.com> wrote:
vdsm 14342 3270 0 11:17 ? 00:00:03 /usr/bin/qemu-img convert -p -t none -T none -f raw /rhev/data-center/mnt/blockSD/679c0725-75fb-4af7-bff1-7c447c5d789c/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/d2a89b5e-7d62-4695-96d8-b762ce52b379 -O raw -o preallocation=falloc /rhev/data-center/mnt/172.16.1.137:_nas_EXPORT-DOMAIN/20433d5d-9d82-4079-9252-0e746ce54106/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/d2a89b5e-7d62-4695-96d8-b762ce52b379
-o preallocation + NFS 4.0 + very slow NFS is your problem.
qemu-img is using posix_fallocate() to preallocate the entire image at the start of the copy. With NFS 4.2 this uses the Linux-specific fallocate() syscall, which allocates the space very efficiently, in no time. With older NFS versions it falls back to a very slow loop, writing one byte to every 4k block.
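If you want to see the difference outside qemu-img, something like this with util-linux's fallocate tool should show it (the path is just a placeholder for a scratch file on the NFS export, and --posix needs a reasonably recent util-linux):

# time fallocate -l 1G /path/on/nfs/export/test.img
  (native fallocate(): returns almost immediately on NFS 4.2, should fail with "Operation not supported" on older NFS)
# time fallocate --posix -l 1G /path/on/nfs/export/test.img
  (posix_fallocate() behaviour: on NFS older than 4.2 glibc emulates it with the slow one-byte-per-4k-block writes)
# rm /path/on/nfs/export/test.img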
If you see -o preallocation, it means you are using an old vdsm version; we stopped using -o preallocation in 4.4.2, see https://bugzilla.redhat.com/1850267.
OK. As I said at the beginning, the environment is the latest 4.3. We are going to upgrade to 4.4 and are making some complementary backups, to be safe.
On the hypervisor the ls command pretty much hangs, so from another hypervisor I can see that the disk size seems to stay at 4 GB even though the timestamp updates...
# ll /rhev/data-center/mnt/172.16.1.137\:_nas_EXPORT-DOMAIN/20433d5d-9d82-4079-9252-0e746ce54106/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/
total 4260941
-rw-rw----. 1 nobody nobody 4363202560 Jul  5 11:23 d2a89b5e-7d62-4695-96d8-b762ce52b379
-rw-r--r--. 1 nobody nobody        261 Jul  5 11:17 d2a89b5e-7d62-4695-96d8-b762ce52b379.meta
On the host console I see a throughput of 4 Mbit/s...
# strace -p 14342
This shows only the main thread; use -f to show all threads.
# strace -f -p 14342
strace: Process 14342 attached with 2 threads
[pid 14342] ppoll([{fd=9, events=POLLIN|POLLERR|POLLHUP}], 1, NULL, NULL, 8 <unfinished ...>
[pid 14343] pwrite64(12, "\0", 1, 16474968063) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474972159) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474976255) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474980351) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474984447) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474988543) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474992639) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474996735) = 1
[pid 14343] pwrite64(12, "\0", 1, 16475000831) = 1
[pid 14343] pwrite64(12, "\0", 1, 16475004927) = 1
qemu-img is busy in posix_fallocate(), writing one byte to every 4k block. If you add -tt -T (as I suggested), we can see how much time each write takes, which may explain why this takes so much time:

# strace -f -tt -T -p 14342
. . . and so on . . .
This is a test oVirt env, so I can wait and possibly test something... Let me know your suggestions.
I would start by changing the NFS storage domain to version 4.2.
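You can check which version the host actually negotiated for that mount with standard NFS tooling, for example:

# nfsstat -m
# grep EXPORT-DOMAIN /proc/mounts

Look for vers=4.0 vs vers=4.2 in the mount options.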
I'm going to try. Right now it is set to the default of auto-negotiate...
1. Kill the hung qemu-img (it probably cannot be killed, but it is worth trying)
2. Deactivate the storage domain
3. Fix the ownership on the storage domain (should be vdsm:kvm, not nobody:nobody)
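For steps 1 and 3, roughly something like this from the host shell (vdsm:kvm is uid/gid 36:36 on oVirt hosts; if the export squashes everything to nobody, the chown may have to be done, or anonuid/anongid adjusted, on the NFS server side instead; step 2 is done from the Administration Portal):

# kill 14342
# chown -R 36:36 /rhev/data-center/mnt/172.16.1.137\:_nas_EXPORT-DOMAIN/20433d5d-9d82-4079-9252-0e746ce54106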
Unfortunately it is an appliance. I have asked the guys in charge of it whether we can change these settings. Thanks for the other concepts you explained.
Gianluca