[ovirt-users] VM has been paused due to no Storage space error.

Nir Soffer nsoffer at redhat.com
Thu Apr 14 10:29:53 UTC 2016


On Thu, Apr 14, 2016 at 1:23 PM,  <nicolas at devels.es> wrote:
> Hi Nir,
>
> On 2016-04-14 11:02, Nir Soffer wrote:
>>
>> On Thu, Apr 14, 2016 at 12:38 PM, Fred Rolland <frolland at redhat.com>
>> wrote:
>>>
>>> Nir,
>>> See attached the repoplot output.
>>
>>
>> So we have about one concurrent lvm command without any disk operations, and
>> everything seems snappy.
>>
>> Nicolás, maybe the storage or the host is overloaded by the VMs? Are your
>> VMs doing a lot of I/O?
>>
>
> Not that I know of; it should actually have been a "calm" time slot as far as
> I/O goes, and the storage was not overloaded at that time either. If I'm not
> mistaken, in the repoplot report I see two LVM operations running at the same
> time; maybe that has something to do with it?

The operation that took about 50 seconds started at the same time as another
operation, but that does not explain why several other lvm commands took
about 15 seconds each.
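
If it helps to dig further, here is a rough standalone sketch (not part of
vdsm; the log path and the line format are assumptions based on the excerpts
quoted below) that pairs each Storage.Misc.excCmd "(cmd)" line with its
SUCCESS/FAILED line by task id and prints how long each lvm call took:

#!/usr/bin/env python
# Rough sketch: estimate lvm command durations from a vdsm log by pairing
# the "(cmd)" start line with the SUCCESS/FAILED line of the same task id.
import re
from datetime import datetime

LOG = '/var/log/vdsm/vdsm.log'   # assumed default vdsm log location
PAT = re.compile(r'^(?P<task>\S+?)::DEBUG::'
                 r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})'
                 r'::.*Storage\.Misc\.excCmd::\(cmd\) (?P<rest>.*)')

def parse_ts(s):
    return datetime.strptime(s, '%Y-%m-%d %H:%M:%S,%f')

pending = {}  # task id -> (start time, command line)
with open(LOG) as f:
    for line in f:
        m = PAT.match(line)
        if not m:
            continue
        task = m.group('task')
        when = parse_ts(m.group('ts'))
        rest = m.group('rest')
        if rest.startswith(('SUCCESS:', 'FAILED:')):
            start, cmd = pending.pop(task, (None, None))
            if start is not None and '/lvm ' in cmd:
                print('%6.1fs  %s...' % ((when - start).total_seconds(), cmd[:80]))
        else:
            pending[task] = (when, rest)

Repoplot already graphs this, but the raw per-command numbers sometimes make it
easier to see whether only lvextend is slow or all lvm child processes are.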

> (although as you say, the lvextend is just a
> metadata change...)
>
>
>> An lvextend operation should be very fast; it is just a metadata change,
>> allocating a couple of extents to that LV.
>>
>> Zdenek, how do you suggest debugging slow lvm commands?
>>
>> See the attached pdf, lvm commands took 15-50 seconds.
>>
>>>
>>> On Thu, Apr 14, 2016 at 12:18 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>>>>
>>>>
>>>> On Thu, Apr 14, 2016 at 12:02 PM, Fred Rolland <frolland at redhat.com>
>>>> wrote:
>>>> > From the log, we can see that the lvextend command took 18 sec, which is
>>>> > quite long.
>>>>
>>>> Fred, can you run repoplot on this log file? It may explain why this lvm
>>>> call took 18 seconds.
>>>>
>>>> Nir
>>>>
>>>> >
>>>> > 60decf0c-6d9a-4c3b-bee6-de9d2ff05e85::DEBUG::2016-04-13
>>>> > 10:52:06,759::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset
>>>> > --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm lvextend --config '
>>>> > devices {
>>>> > preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1
>>>> > write_cache_state=0 disable_after_error_count=3 filter = [
>>>> > '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'',
>>>> > '\''r|.*|'\''
>>>> > ] }
>>>> > global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1
>>>> > use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } '
>>>> > --autobackup
>>>> > n --size 6016m
>>>> >
>>>> >
>>>> > 5de4a000-a9c4-489c-8eee-10368647c413/721d09bc-60e7-4310-9ba2-522d2a4b03d0
>>>> > (cwd None)
>>>> > ....
>>>> > 60decf0c-6d9a-4c3b-bee6-de9d2ff05e85::DEBUG::2016-04-13
>>>> > 10:52:22,217::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '
>>>> > WARNING: lvmetad is running but disabled. Restart lvmetad before
>>>> > enabling
>>>> > it!\n  WARNING: This metadata update is NOT backed up\n'; <rc> = 0
>>>> >
>>>> >
>>>> > The watermark can be configured by the following value:
>>>> >
>>>> > 'volume_utilization_percent', '50',
>>>> >     'Together with volume_utilization_chunk_mb, set the minimal free '
>>>> >     'space before a thin provisioned block volume is extended. Use '
>>>> >     'lower values to extend earlier.')
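
Side note on tuning: if you ever want to change these, they would typically be
overridden on the hosts in /etc/vdsm/vdsm.conf, restarting vdsm afterwards.
A minimal sketch, assuming the settings live in the [irs] section and that the
chunk-size default is 1024 MB (neither assumption is verified in this thread):

[irs]
# Per the description quoted above, lower values extend earlier.
volume_utilization_percent = 25
# Assumed: size in MB of each extension step; a larger chunk leaves the VM
# more headroom while the next extension request is being processed.
volume_utilization_chunk_mb = 2048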
>>>> >
>>>> > On Thu, Apr 14, 2016 at 11:42 AM, Michal Skrivanek
>>>> > <michal.skrivanek at redhat.com> wrote:
>>>> >>
>>>> >>
>>>> >> > On 14 Apr 2016, at 09:57, nicolas at devels.es wrote:
>>>> >> >
>>>> >> > Ok, that makes sense, thanks for the insight both Alex and Fred.
>>>> >> > I'm
>>>> >> > attaching the VDSM log of the SPM node at the time of the pause. I
>>>> >> > couldn't
>>>> >> > find anything that would clearly identify the problem, but maybe
>>>> >> > you'll be
>>>> >> > able to.
>>>> >>
>>>> >> In extreme conditions it will happen. When your storage is slow to respond
>>>> >> to an extension request and your write rate is very high, it may happen, as
>>>> >> it is happening to you, that you run out of space sooner than the extension
>>>> >> finishes. You can change the watermark value, I guess (right, Fred?), but it
>>>> >> would be better to plan a bit more ahead and either use preallocated disks,
>>>> >> or create thin disks and then allocate the expected size in advance before
>>>> >> the operation that causes it (typically it only happens while untarring
>>>> >> gigabytes of data, or during a huge database dump/restore).
>>>> >> Even then, the VM should always be automatically resumed once the disk
>>>> >> space is allocated.
>>>> >>
>>>> >> Thanks,
>>>> >> michal
>>>> >>
>>>> >> >
>>>> >> > Thanks.
>>>> >> >
>>>> >> > Regards.
>>>> >> >
>>>> >> > On 2016-04-13 13:09, Fred Rolland wrote:
>>>> >> >> Hi,
>>>> >> >> Yes, just as Alex explained, if the disk has been created as thin
>>>> >> >> provisioned, vdsm will extend it once a watermark is reached.
>>>> >> >> Usually it should not get to the state where the VM is paused.
>>>> >> >> From the log, you can see that the extension request was sent
>>>> >> >> before the VM got to the No Space error.
>>>> >> >> Later, we can see the VM resuming.
>>>> >> >> INFO::2016-04-13
>>>> >> >> 10:52:04,182::vm::1026::virt.vm::(extendDrivesIfNeeded)
>>>> >> >> vmId=`f9cd282e-110a-4896-98d3-6d320662744d`::Requesting extension
>>>> >> >> for volume
>>>> >> >> ....
>>>> >> >> INFO::2016-04-13 10:52:29,360::vm::3728::virt.vm::(onIOError)
>>>> >> >> vmId=`f9cd282e-110a-4896-98d3-6d320662744d`::abnormal vm stop
>>>> >> >> device virtio-disk0 error enospc
>>>> >> >> ....
>>>> >> >> INFO::2016-04-13
>>>> >> >> 10:52:54,317::vm::5084::virt.vm::(_logGuestCpuStatus)
>>>> >> >> vmId=`f9cd282e-110a-4896-98d3-6d320662744d`::CPU running: onResume
>>>> >> >> Note that the extension is done on the SPM host, so it would be
>>>> >> >> interesting to see the vdsm log from the host that was in the SPM role
>>>> >> >> during this timeframe.
>>>> >> >> Regards,
>>>> >> >> Fred
>>>> >> >> On Wed, Apr 13, 2016 at 2:43 PM, Alex Crow
>>>> >> >> <acrow at integrafin.co.uk>
>>>> >> >> wrote:
>>>> >> >>> Hi,
>>>> >> >>> If you have set up VM disks as Thin Provisioned, the VM has to
>>>> >> >>> pause when the disk image needs to expand. You won't see this on
>>>> >> >>> VMs
>>>> >> >>> with preallocated storage.
>>>> >> >>> It's not the SAN that's running out of space, it's the VM image
>>>> >> >>> needing to be expanded incrementally each time.
>>>> >> >>> Cheers
>>>> >> >>> Alex
>>>> >> >>> On 13/04/16 12:04, nicolas at devels.es wrote:
>>>> >> >>> Hi Fred,
>>>> >> >>> This is iSCSI storage. I'm attaching the VDSM logs from the host
>>>> >> >>> where this machine has been running. Should you need any further
>>>> >> >>> info, don't hesitate to ask.
>>>> >> >>> Thanks.
>>>> >> >>> Regards.
>>>> >> >>> On 2016-04-13 11:54, Fred Rolland wrote:
>>>> >> >>> Hi,
>>>> >> >>> What kind of storage do you have? (iSCSI, FC, NFS...)
>>>> >> >>> Can you provide the vdsm logs from the host where this VM runs ?
>>>> >> >>> Thanks,
>>>> >> >>> Freddy
>>>> >> >>> On Wed, Apr 13, 2016 at 1:02 PM, <nicolas at devels.es> wrote:
>>>> >> >>> Hi,
>>>> >> >>> We're running oVirt 3.6.4.1-1. Lately we're seeing a bunch of
>>>> >> >>> events like these:
>>>> >> >>> 2016-04-13 10:52:30,735 INFO
>>>> >> >>> [org.ovirt.engine.core.vdsbroker.VmAnalyzer]
>>>> >> >>> (DefaultQuartzScheduler_Worker-86) [60dea18f] VM
>>>> >> >>> 'f9cd282e-110a-4896-98d3-6d320662744d'(vm.domain.com) moved from
>>>> >> >>> 'Up' --> 'Paused'
>>>> >> >>> 2016-04-13 10:52:30,815 INFO
>>>> >> >>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>> >> >>> (DefaultQuartzScheduler_Worker-86) [60dea18f] Correlation ID: null,
>>>> >> >>> Call Stack: null, Custom Event ID: -1, Message: VM vm.domain.com
>>>> >> >>> has been paused.
>>>> >> >>> 2016-04-13 10:52:30,898 ERROR
>>>> >> >>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>> >> >>> (DefaultQuartzScheduler_Worker-86) [60dea18f] Correlation ID: null,
>>>> >> >>> Call Stack: null, Custom Event ID: -1, Message: VM vm.domain.com
>>>> >> >>> has been paused due to no Storage space error.
>>>> >> >>> 2016-04-13 10:52:52,320 WARN
>>>> >> >>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
>>>> >> >>> (org.ovirt.thread.pool-8-thread-38) [] domain
>>>> >> >>> '5de4a000-a9c4-489c-8eee-10368647c413:iscsi01' in problem. vds:
>>>> >> >>> 'host6.domain.com'
>>>> >> >>> 2016-04-13 10:52:55,183 INFO
>>>> >> >>> [org.ovirt.engine.core.vdsbroker.VmAnalyzer]
>>>> >> >>> (DefaultQuartzScheduler_Worker-70) [3da0f3d4] VM
>>>> >> >>> 'f9cd282e-110a-4896-98d3-6d320662744d'(vm.domain.com) moved from
>>>> >> >>> 'Paused' --> 'Up'
>>>> >> >>> 2016-04-13 10:52:55,318 INFO
>>>> >> >>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>>> >> >>> (DefaultQuartzScheduler_Worker-70) [3da0f3d4] Correlation ID: null,
>>>> >> >>> Call Stack: null, Custom Event ID: -1, Message: VM vm.domain.com
>>>> >> >>> has recovered from paused back to up.
>>>> >> >>> The storage domain is far from being full, though (400+ GB available
>>>> >> >>> right now). Could this be related to this other issue [1]? If not,
>>>> >> >>> how could I debug what's going on?
>>>> >> >>> Thanks.
>>>> >> >>>  [1]: https://www.mail-archive.com/users@ovirt.org/msg32079.html
>>>> >> > <vdsm.log.gz>
>>>> >>
>>>> >
>>>> >
>>>> >
>>>
>>>
>>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users



More information about the Users mailing list