
On Mon, Jul 5, 2021 at 3:36 PM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Mon, Jul 5, 2021 at 2:13 PM Nir Soffer <nsoffer@redhat.com> wrote:
vdsm 14342 3270 0 11:17 ? 00:00:03 /usr/bin/qemu-img convert -p -t none -T none -f raw /rhev/data-center/mnt/blockSD/679c0725-75fb-4af7-bff1-7c447c5d789c/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/d2a89b5e-7d62-4695-96d8-b762ce52b379 -O raw -o preallocation=falloc /rhev/data-center/mnt/172.16.1.137:_nas_EXPORT-DOMAIN/20433d5d-9d82-4079-9252-0e746ce54106/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/d2a89b5e-7d62-4695-96d8-b762ce52b379
-o preallocation + NFS 4.0 + very slow NFS is your problem.
qemu-img is using posix_fallocate() to preallocate the entire image at the start of the copy. With NFS 4.2 this uses the Linux-specific fallocate() syscall, which allocates the space very efficiently, in no time. With older NFS versions it falls back to a very slow loop, writing one byte to every 4k block.
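If you want to see the difference outside qemu-img, something like this with util-linux's fallocate tool should show it (the path is just a placeholder for a scratch file on the NFS export, and --posix needs a reasonably recent util-linux):

# time fallocate -l 1G /path/on/nfs/export/test.img
  (native fallocate(): returns almost immediately on NFS 4.2, should fail with "Operation not supported" on older NFS)
# time fallocate --posix -l 1G /path/on/nfs/export/test.img
  (posix_fallocate() behaviour: on NFS older than 4.2 glibc emulates it with the slow one-byte-per-4k-block writes)
# rm /path/on/nfs/export/test.img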
If you see -o preallocation, it means you are using an old vdsm version; we stopped using -o preallocation in 4.4.2, see https://bugzilla.redhat.com/1850267.
OK. As I said at the beginning, the environment is the latest 4.3. We are going to upgrade to 4.4 and are making some complementary backups, to be safe.
On the hypervisor the ls command pretty much hangs, so from another hypervisor I can see that the disk size seems to stay at 4 GB even though the timestamp updates...
# ll /rhev/data-center/mnt/172.16.1.137\:_nas_EXPORT-DOMAIN/20433d5d-9d82-4079-9252-0e746ce54106/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/
total 4260941
-rw-rw----. 1 nobody nobody 4363202560 Jul  5 11:23 d2a89b5e-7d62-4695-96d8-b762ce52b379
-rw-r--r--. 1 nobody nobody        261 Jul  5 11:17 d2a89b5e-7d62-4695-96d8-b762ce52b379.meta
On the host console I see a throughput of 4 Mbit/s...
# strace -p 14342
This shows only the main thread; use -f to show all threads.
# strace -f -p 14342
strace: Process 14342 attached with 2 threads
[pid 14342] ppoll([{fd=9, events=POLLIN|POLLERR|POLLHUP}], 1, NULL, NULL, 8 <unfinished ...>
[pid 14343] pwrite64(12, "\0", 1, 16474968063) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474972159) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474976255) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474980351) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474984447) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474988543) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474992639) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474996735) = 1
[pid 14343] pwrite64(12, "\0", 1, 16475000831) = 1
[pid 14343] pwrite64(12, "\0", 1, 16475004927) = 1
qemu-img is busy in posix_fallocate(), writing one byte to every 4k block. If you add -tt -T (as I suggested), we can see how much time each write takes, which may explain why this takes so much time:

# strace -f -tt -T -p 14342
. . . and so on . . .
This is a test oVirt env, so I can wait and possibly test something... Let me know your suggestions.
I would start by changing the NFS storage domain to version 4.2.
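You can check which version the host actually negotiated for that mount with standard NFS tooling, for example:

# nfsstat -m
# grep EXPORT-DOMAIN /proc/mounts

Look for vers=4.0 vs vers=4.2 in the mount options.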
I'm going to try. Right now it is set to the default of auto-negotiate...
1. Kill the hung qemu-img (it probably cannot be killed, but it is worth trying)
2. Deactivate the storage domain
3. Fix the ownership on the storage domain (should be vdsm:kvm, not nobody:nobody)
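For steps 1 and 3, roughly something like this from the host shell (vdsm:kvm is uid/gid 36:36 on oVirt hosts; if the export squashes everything to nobody, the chown may have to be done, or anonuid/anongid adjusted, on the NFS server side instead; step 2 is done from the Administration Portal):

# kill 14342
# chown -R 36:36 /rhev/data-center/mnt/172.16.1.137\:_nas_EXPORT-DOMAIN/20433d5d-9d82-4079-9252-0e746ce54106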
Unfortunately it is an appliance. I have asked the guys in charge of it whether we can change these settings. Thanks for the other concepts you explained.
Gianluca