Thanks, I really hope someone can help, because right now I'm afraid to
thin provision any large volume due to this. I forgot to mention that
when I do the export to the NAS, the node that is SPM at that moment is
running the qemu-img convert process, and the VDSM process on that node
runs wild (400% CPU load from time to time), while qemu-img convert
never occupies more than one thread (100%). A few (4-5) nfsd processes
fluctuate between 40% and 90%. The engine admin panel shows abnormally
high CPU load (40-60%) for the VMs running on the SPM node, even though
they are usually idle at that time. So:
- Running the export when the SPM node does not have any VMs running is
fast (thin or fully provisioned, doesn't matter)
- Running the export when the SPM node has 3-4 VMs running is painfully
slow for thin provisioned VMs
- Running the export when the SPM node has 3-4 VMs running, but the VM
being exported is not thin provisioned, is fast enough, even though the
VMs on the SPM show an increased CPU load as well
I have yet to measure I/O utilization in all scenarios, but I'm positive
about what I wrote for the thin provisioned volume when there are VMs
running on the SPM: it reads 15 MB/s at most, and only in bursts every
three or four seconds (I measure this on the SPM node because the
qemu-img convert runs on that node, even though the VM resides on
another node, with its thin provisioned disk on that node's local
storage shared via NFS).
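For completeness, here is roughly what I plan to run on the SPM node
next time to capture the I/O numbers (standard iotop/pidstat/nfsstat
invocations, nothing oVirt specific; the grep pattern is just my guess
at the process name):

  # only processes actually doing I/O, batch mode, per-process view
  iotop -obP

  # per-process disk read/write rates, sampled every 2 seconds
  pidstat -d 2 | grep qemu-img

  # NFS client counters on the SPM node (the NAS export is mounted here)
  nfsstat -c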
On Thu, 2014-06-12 at 13:31 +0000, Sven Kieske wrote:
CC'ing the devel list, maybe some VDSM and storage people can
explain this?
On 10.06.2014 at 12:24, combuster wrote:
> /etc/libvirt/libvirtd.conf and /etc/vdsm/logger.conf.
>
> Unfortunately, maybe I've jumped to conclusions: last weekend that very
> same thin provisioned VM was running a simple export for 3 hrs before I
> killed the process. But I wondered:
>
> 1. The process that runs behind the export is qemu-img convert (from raw
> to raw), and running iotop shows that every three or four seconds it
> reads 10-13 MB/s and then idles for a few seconds. Run the numbers on
> 100 GB (why it copies the entire 100 GB when only 15 GB of the thin
> volume is used, I still don't get) and you land precisely on the 3-4 hrs
> of estimated time remaining.
> 2. When I run the export with the SPM on a node that doesn't have any
> VMs running, the export finishes in approx. 30 min (iotop shows a
> constant 40-70 MB/s read speed).
> 3. Renicing the I/O priority of the qemu-img process, as well as its CPU
> priority, gave no results; it was still running slow beyond any
> explanation.
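> (For the record, the renicing in point 3 was done roughly along these
> lines; <PID> stands for the qemu-img convert process spawned by VDSM on
> the SPM node:)
>
>   # find the convert process
>   pgrep -lf "qemu-img convert"
>
>   # best-effort I/O class, highest priority within that class
>   ionice -c2 -n0 -p <PID>
>
>   # raise its CPU priority as well
>   renice -n -5 -p <PID>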
>
> Debug logs showed nothing of interest, so I disabled everything more
> verbose than warning, and it suddenly accelerated the export, so I had
> connected the wrong dots.
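> In case anyone wants to replicate the log level change, the edits were
> roughly the following (setting names quoted from memory, so please
> double-check them against your own config files):
>
>   # /etc/libvirt/libvirtd.conf (3 corresponds to warning)
>   log_level = 3
>
>   # /etc/vdsm/logger.conf: raise the level of the verbose loggers,
>   # e.g. in the [logger_root] and [logger_vds] sections
>   level=WARNING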
>
> On 06/10/2014 11:18 AM, Andrew Lau wrote:
>> Interesting, which files did you modify to lower the log levels?
>>
>> On Tue, Jun 3, 2014 at 12:38 AM, <combuster(a)archlinux.us> wrote:
>>> One word of caution so far: when exporting any VM, the node that acts
>>> as SPM is stressed out to the max. I relieved the stress by a certain
>>> margin by lowering the libvirtd and vdsm log levels to WARNING. That
>>> shortened the export procedure by at least a factor of five. But the
>>> vdsm process on the SPM node still shows high CPU usage, so it's best
>>> to leave the SPM node a decent amount of CPU time to spare. Also,
>>> exporting a VM with a large vdisk capacity and thin provisioning
>>> enabled (let's say 14 GB used of 100 GB defined) took around 50 min
>>> over a 10Gb ethernet interface to a 1Gb export NAS device that was not
>>> stressed at all by other processes. When I did that export with debug
>>> log levels it took 5 hrs :(
>>>
>>> So lowering the log levels is a must in a production environment. I've
>>> deleted the LUN that I exported on the storage (removed it first from
>>> oVirt) and next weekend I am planning to add a new one, export it
>>> again to all the nodes and start a few fresh VM installations. The
>>> things I'm going to look at are partition alignment and running the
>>> installations from different nodes in the cluster at the same time. I
>>> just hope that not all I/O is going to pass through the SPM; this is
>>> the one thing that bothers me the most.
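>>> (For the alignment check I'll probably just use parted, something like
>>> the following, with the device path adjusted to the actual LUN:)
>>>
>>>   # print the partition table in sectors so the offsets are visible
>>>   parted -s /dev/mapper/<alias> unit s print
>>>
>>>   # ask parted whether partition 1 starts on an optimal boundary
>>>   parted -s /dev/mapper/<alias> align-check optimal 1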
>>>
>>> I'll report back on these results next week, but if anyone has
>>> experience with this kind of thing or can point me to some
>>> documentation, that would be great.
>>>
>>> On Monday, 2. June 2014. 18.51.52 you wrote:
>>>> I'm curious to hear what other comments arise, as we're analyzing a
>>>> production setup shortly.
>>>>
>>>> On Sun, Jun 1, 2014 at 10:11 PM, <combuster(a)archlinux.us> wrote:
>>>>> I need to scratch Gluster off because the setup is based on CentOS
>>>>> 6.5, so essential prerequisites like qemu 1.3 and libvirt 1.0.1 are
>>>>> not met.
>>>> Gluster would still work with EL6; afaik it just won't use libgfapi
>>>> and will instead use a standard mount.
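>>>> (Meaning the data path would go through the FUSE client, i.e. a mount
>>>> roughly like this, with server, volume and mount point as placeholders:)
>>>>
>>>>   mount -t glusterfs <gluster-server>:/<volname> /mnt/<volname>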
>>>>
>>>>> Any info regarding FC storage domain would be appreciated though.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Ivan
>>>>>
>>>>> On Sunday, 1. June 2014. 11.44.33 combuster(a)archlinux.us wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have a 4 node cluster setup, and my storage options right now are
>>>>>> an FC based storage, one partition per node on a local drive
>>>>>> (~200GB each), and an NFS based NAS device. I want to set up the
>>>>>> export and ISO domains on the NAS, and there are no issues or
>>>>>> questions regarding those two. I wasn't aware of any other options
>>>>>> at the time for utilizing local storage (since this is a shared
>>>>>> storage datacenter), so I exported a directory from each partition
>>>>>> via NFS, and it works. But I am a little in the dark with the
>>>>>> following:
>>>>>>
>>>>>> 1. Are there any advantages to switching from NFS based local
>>>>>> storage to a Gluster based domain with a brick on each partition? I
>>>>>> guess it can only be performance wise, but maybe I'm wrong. If
>>>>>> there are advantages, are there any tips regarding XFS mount
>>>>>> options etc.?
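>>>>>> (If it comes to that, the usual suggestions I've seen for XFS-backed
>>>>>> Gluster bricks are along these lines; treat them as a starting point
>>>>>> rather than a recommendation from me:)
>>>>>>
>>>>>>   # larger inodes keep Gluster's extended attributes inline
>>>>>>   mkfs.xfs -i size=512 /dev/<brick-device>
>>>>>>
>>>>>>   # typical brick mount options
>>>>>>   mount -o noatime,inode64 /dev/<brick-device> /bricks/brick1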
>>>>>>
>>>>>> 2. I've created a volume on the FC based storage and presented it
>>>>>> to all of the nodes in the cluster on the storage side. I've
>>>>>> configured multipathing correctly and added an alias for the WWID
>>>>>> of the LUN so I can distinguish this one and any future volumes
>>>>>> more easily. At first I created a partition on it, but since oVirt
>>>>>> saw only the whole LUN as a raw device, I erased it before adding
>>>>>> the LUN as the FC master storage domain. I've imported a few VMs
>>>>>> and pointed them to the FC storage domain. This setup works, but:
>>>>>>
>>>>>> - All of the nodes see a device with the alias for the WWID of the
>>>>>> volume, but only the node which is currently the SPM for the
>>>>>> cluster can see the logical volumes inside. Also, when I set up
>>>>>> high availability for VMs residing on the FC storage and select
>>>>>> that they can start on any node in the cluster, they always start
>>>>>> on the SPM. Can multiple nodes run different VMs on the same FC
>>>>>> storage at the same time? (The logical thing would be that they
>>>>>> can, but I wanted to be sure first.) I am not familiar with the
>>>>>> logic oVirt utilizes to lock a VM's logical volumes to prevent
>>>>>> corruption.
>>>>>>
>>>>>> - Fdisk shows that the logical volumes on the LUN of the FC volume
>>>>>> are misaligned (the partition doesn't end on a cylinder boundary),
>>>>>> so I wonder if this is because I imported VMs with disks that were
>>>>>> created on local storage before, and whether any _new_ VMs with
>>>>>> disks on the FC storage would be properly aligned.
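>>>>>> (These are the checks I plan to run from a non-SPM node; the VG name
>>>>>> and the multipath alias are placeholders for the storage domain's
>>>>>> actual identifiers:)
>>>>>>
>>>>>>   # does a regular node see the domain's VG and its LVs at all?
>>>>>>   vgs <storage-domain-vg>
>>>>>>   lvs -o lv_name,lv_size,lv_attr <storage-domain-vg>
>>>>>>
>>>>>>   # where does the first physical extent start on the LUN (alignment)?
>>>>>>   pvs -o pv_name,pe_start /dev/mapper/<alias>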
>>>>>>
>>>>>> This is a new setup with oVirt 3.4 (I did an export of all the VMs
>>>>>> on 3.3 and, after a fresh installation of 3.4, imported them back
>>>>>> again). I have room to experiment a little with 2 of the 4 nodes
>>>>>> because they are currently free of running VMs, but I have limited
>>>>>> room for anything else that would cause unplanned downtime for the
>>>>>> four virtual machines running on the other two nodes in the cluster
>>>>>> (currently highly available, with their drives on the FC storage
>>>>>> domain). All in all I have 12 VMs running, and I'm asking the list
>>>>>> for advice and guidance before I make any changes.
>>>>>>
>>>>>> Just trying to find as much info regarding all of this as possible
>>>>>> before acting on anything.
>>>>>>
>>>>>> Thank you in advance,
>>>>>>
>>>>>> Ivan
--
Mit freundlichen Grüßen / Regards
Sven Kieske
Systemadministrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen