On Mon, Jul 5, 2021 at 3:36 PM Gianluca Cecchi
<gianluca.cecchi(a)gmail.com> wrote:
On Mon, Jul 5, 2021 at 2:13 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>
>
> >
> > vdsm 14342 3270 0 11:17 ? 00:00:03 /usr/bin/qemu-img convert -p -t none -T none -f raw
> > /rhev/data-center/mnt/blockSD/679c0725-75fb-4af7-bff1-7c447c5d789c/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/d2a89b5e-7d62-4695-96d8-b762ce52b379
> > -O raw -o preallocation=falloc
> > /rhev/data-center/mnt/172.16.1.137:_nas_EXPORT-DOMAIN/20433d5d-9d82-4079-9252-0e746ce54106/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/d2a89b5e-7d62-4695-96d8-b762ce52b379
>
> -o preallocation + NFS 4.0 + very slow NFS is your problem.
>
> qemu-img is using posix_fallocate() to preallocate the entire image at
> the start of the copy. With NFS 4.2 this uses the Linux-specific
> fallocate() syscall, which allocates the space very efficiently in
> no time. With older NFS versions this degrades to a very slow loop,
> writing one byte for every 4k block.
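That fallback loop can be sketched in Python. This is an illustrative emulation of what the posix_fallocate() fallback path does, not the actual glibc code:

```python
import os
import tempfile

BLOCK = 4096  # one byte is written at the end of every 4k block

def fallback_fallocate(fd, length):
    # Illustrative emulation of the posix_fallocate() fallback path:
    # without a working fallocate() syscall (NFS < 4.2), allocation
    # degrades to one synchronous 1-byte pwrite per block, matching
    # the pwrite64(fd, "\0", 1, offset) pattern visible in strace.
    for offset in range(BLOCK - 1, length, BLOCK):
        os.pwrite(fd, b"\0", offset)

with tempfile.TemporaryFile() as f:
    fallback_fallocate(f.fileno(), 1024 * 1024)   # 1 MiB -> 256 writes
    print(os.fstat(f.fileno()).st_size)           # 1048576
```

Over a high-latency NFS mount every one of those tiny synchronous writes pays a full round trip, which is why the copy appears stuck.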
>
> If you see -o preallocation, it means you are using an old vdsm
> version; we stopped using -o preallocation in 4.4.2, see
> https://bugzilla.redhat.com/1850267.
OK. As I said at the beginning, the environment is the latest 4.3.
We are going to upgrade to 4.4 and we are making some complementary backups,
for safety.
>
> > On the hypervisor the ls command basically hangs, so from another hypervisor
> > I see that the disk size seems to remain at 4 GB even if the timestamp updates...
> >
> > # ll /rhev/data-center/mnt/172.16.1.137\:_nas_EXPORT-DOMAIN/20433d5d-9d82-4079-9252-0e746ce54106/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/
> > total 4260941
> > -rw-rw----. 1 nobody nobody 4363202560 Jul 5 11:23 d2a89b5e-7d62-4695-96d8-b762ce52b379
> > -rw-r--r--. 1 nobody nobody 261 Jul 5 11:17 d2a89b5e-7d62-4695-96d8-b762ce52b379.meta
> >
> > On the host console I see a throughput of 4 Mbit/s...
> >
> > # strace -p 14342
>
> This shows only the main thread; use -f to show all threads.
# strace -f -p 14342
strace: Process 14342 attached with 2 threads
[pid 14342] ppoll([{fd=9, events=POLLIN|POLLERR|POLLHUP}], 1, NULL, NULL, 8
<unfinished ...>
[pid 14343] pwrite64(12, "\0", 1, 16474968063) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474972159) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474976255) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474980351) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474984447) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474988543) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474992639) = 1
[pid 14343] pwrite64(12, "\0", 1, 16474996735) = 1
[pid 14343] pwrite64(12, "\0", 1, 16475000831) = 1
[pid 14343] pwrite64(12, "\0", 1, 16475004927) = 1
qemu-img is busy in posix_fallocate(), writing one byte to every 4k block.
If you add -tt -T (as I suggested), we can see how much time each write takes,
which may explain why this takes so much time.
strace -f -tt -T -p 14342
. . . and so on . . .
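With -T, each syscall line gains a trailing <seconds> duration, and summing the pwrite64 durations shows where the time goes. A rough parsing sketch; the sample lines and durations below are invented for illustration, not taken from the real trace:

```python
import re

# strace -T appends each syscall's duration as <seconds>.
DURATION = re.compile(r'pwrite64\(.*<([\d.]+)>')

# Hypothetical strace -f -tt -T output lines (durations invented):
sample = '''\
[pid 14343] pwrite64(12, "\\0", 1, 16474968063) = 1 <0.004123>
[pid 14343] pwrite64(12, "\\0", 1, 16474972159) = 1 <0.003987>
'''

times = [float(m.group(1)) for m in map(DURATION.search, sample.splitlines()) if m]
avg = sum(times) / len(times)
# At ~4 ms per 4k block, preallocating 1 GiB means 262144 writes,
# i.e. on the order of 17 minutes spent in the fallocate emulation alone.
print(f"avg pwrite64: {avg * 1000:.2f} ms")
```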
> >
> > This is a test oVirt env so I can wait and eventually test something...
> > Let me know your suggestions
>
> I would start by changing the NFS storage domain to version 4.2.
I'm going to try. Right now I have set it to the default of autonegotiated...
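To confirm what was actually negotiated, you can check the vers= mount option on the host. A small sketch; the mount line below is an example, not taken from the real host:

```python
def nfs_version(options):
    """Return the value of the vers= mount option, or None."""
    for opt in options.split(","):
        if opt.startswith("vers="):
            return opt[len("vers="):]
    return None

# Example /proc/mounts line (illustrative, not from the real host):
line = ("172.16.1.137:/nas/EXPORT-DOMAIN /rhev/data-center/mnt/example "
        "nfs4 rw,relatime,vers=4.0,rsize=1048576 0 0")
opts = line.split()[3]
print(nfs_version(opts))  # vers < 4.2 -> no fallocate(), slow fallback
```

Anything below 4.2 means the server cannot honor fallocate() and the slow one-byte-per-block path is taken.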
>
> 1. kill the hung qemu-img (it probably cannot be killed, but worth trying)
> 2. deactivate the storage domain
> 3. fix the ownership on the storage domain (should be vdsm:kvm, not
> nobody:nobody)
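The nobody:nobody ownership usually means the NFS server is squashing or failing to map the client uids. On a plain Linux NFS server this is set in the export options, e.g. mapping anonymous access to vdsm:kvm (uid/gid 36). An illustrative /etc/exports entry; the path is hypothetical and the appliance may expose this differently:

```
/nas/EXPORT-DOMAIN  *(rw,sync,no_subtree_check,anonuid=36,anongid=36)
```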
Unfortunately it is an appliance. I have asked the people who have it in charge
whether we can set that.
Thanks for the other concepts explained.
Gianluca