On Thu, Sep 28, 2017 at 2:34 PM, Kevin Wolf <kwolf@redhat.com> wrote:
On 28.09.2017 at 12:44, Nir Soffer wrote:
> On Thu, Sep 28, 2017 at 12:03 PM Gianluca Cecchi <gianluca.cecchi@gmail.com>
> wrote:
>
> > Hello,
> > I'm on 4.1.5 and I'm cloning a snapshot of a VM with 3 disks for a total
> > of about 200Gb to copy
> > The target I choose is on a different domain than the source one.
> > They are both FC storage domains, with the source on SSD disks and the
> > target on SAS disks.
 
[snip]
 
> >
> > but despite capabilities it seems it is copying using very low system
> > resources.
> >
>
> We run qemu-img convert (and other storage related commands) with:
>
>     nice -n 19 ionice -c 3 qemu-img ...
>
> ionice should not have any effect unless you use the CFQ I/O scheduler.
>
> The intent is to limit the effect of virtual machines.
>

Ah, ok.
The hypervisor is oVirt Node based on CentOS 7, so the default I/O scheduler should be deadline unless customized in the node.
And in fact, in /sys/block/sd*/queue/scheduler I see only [deadline] selected; for the dm-* block devices, where the scheduler is not none, it is deadline too.
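Just for reference, the quick check I mean is something like this (standard sysfs paths; the value in brackets is the active scheduler for each device):

    # print the scheduler line for every sd* and dm-* block device
    grep . /sys/block/sd*/queue/scheduler /sys/block/dm-*/queue/scheduler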
 
 
>
> > I see this both using iotop and vmstat
> >
> > vmstat 3 gives:
> > ----io---- -system-- ------cpu-----
> > bi    bo   in   cs us sy id wa st
> > 2527   698 3771 29394  1  0 89 10  0
> >
>
> us 94% also seems very high - maybe this hypervisor is overloaded with
> other workloads?
> wa 89% seems very high

The alignment in the table is a bit off, but us is 1%. The 94 you saw is
part of cs=29394. A high percentage for wait is generally a good sign
because that means that the system is busy with actual I/O work.
Obviously, this I/O work is rather slow, but at least qemu-img is making
requests to the kernel instead of doing other work, otherwise user would
be much higher.

Kevin


Yes, probably a misalignment of the output I truncated. Actually, a sampling of about 300 lines (one every 3 seconds) gives these occurrence counts and the related percentages:

user time (count  value):
    195  0%
     95  1%

So user time is indeed quite low 

wait time (count  value):
    105  7%
     58  8%
     33  9%
     21  6%
     17  10%
     16  5%
     16%
     12  11%
      9  12%
      7  14%
      6  13%
      2  15%
      1  4%
      1  16%
      1  0%

so the wait time averages around 7%
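For completeness, a tally like the ones above can be produced with something along these lines (a sketch; field numbers assume the default vmstat column order, where us is field 13 and wa is field 16):

    # ~300 samples, one every 3 seconds; count how often each wa value appears
    vmstat 3 300 | awk 'NR > 2 { print $16 }' | sort -n | uniq -c | sort -rn

The same pipeline with $13 gives the user time distribution.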

At the end, the overall copy throughput has been around 30 MB/s, which is probably to be expected given how the qemu-img process is run; at ~30 MB/s, the ~200 GB copy works out to roughly two hours.
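For what it's worth, on a future run the niceness and I/O class of the qemu-img process could be double-checked with something like:

    # show the nice value of the running qemu-img, then its I/O scheduling class
    ps -o pid,ni,cmd -C qemu-img
    ionice -p $(pidof qemu-img)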
What about the events I reported instead?

Gianluca