[ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition

Sun Oct 23 15:37:05 UTC 2016

Do you know when .34 will be released?

http://mirror.centos.org/centos/7/virt/x86_64/ovirt-3.6/
Latest version is:
vdsm-cli-4.17.32-1.el7.noarch.rpm 08-Aug-2016 17:36
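
For anyone checking a mirror listing or a host against the fixed version, here is a minimal sketch. It assumes the usual name-version-release RPM file naming; the function names are made up for illustration, and 4.17.34 is the version Francesco names below.

```python
import re

# Version that carries the fix, per Francesco's reply quoted below.
FIXED_IN = (4, 17, 34)

def vdsm_version(rpm_name):
    """Return the version of a vdsm RPM file name as a tuple of ints.

    Assumes names like 'vdsm-cli-4.17.32-1.el7.noarch.rpm'.
    """
    m = re.match(r"vdsm(?:-[a-z]+)*-(\d+(?:\.\d+)*)-", rpm_name)
    if m is None:
        raise ValueError("not a recognised vdsm RPM name: %r" % rpm_name)
    return tuple(int(part) for part in m.group(1).split("."))

def has_fix(rpm_name):
    """True if the package version is at least FIXED_IN."""
    return vdsm_version(rpm_name) >= FIXED_IN

if __name__ == "__main__":
    print(has_fix("vdsm-cli-4.17.32-1.el7.noarch.rpm"))
```

Comparing tuples of integers avoids the string-comparison trap where "4.17.9" would sort after "4.17.34".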

On Fri, Oct 14, 2016 at 1:11 AM, Francesco Romani <fromani at redhat.com>
wrote:

>
> ----- Original Message -----
> > From: "Simone Tiraboschi" <stirabos at redhat.com>
> > To: "Steve Dainard" <sdainard at spd1.com>, "Francesco Romani" <fromani at redhat.com>
> > Cc: "users" <users at ovirt.org>
> > Sent: Friday, October 14, 2016 9:59:49 AM
> > Subject: Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition
> >
> > On Fri, Oct 14, 2016 at 1:12 AM, Steve Dainard <sdainard at spd1.com> wrote:
> >
> > > Hello,
> > >
> > > I had a hypervisor semi-crash this week: 4 of ~10 VMs continued to
> > > run, but the others were killed off somehow, and all VMs running on
> > > this host had '?' status in the oVirt UI.
> > >
> > > This appears to have been caused by vdsm logs filling up disk space
> > > on the logging partition.
> > >
> > > I've attached the log file vdsm.log.27.xz which shows this error:
> > >
> > > vdsm.Scheduler::DEBUG::2016-10-11
> > > 16:42:09,318::executor::216::Executor::(_discard)
> > > Worker discarded: <Worker name=periodic/3017 running <Operation
> > > action=<VmDispatcher operation=<class
> > > 'virt.periodic.DriveWatermarkMonitor'>
> > > at 0x7f8e90021210> at 0x7f8e90021250> discarded at 0x7f8dd123e850>
> > >
> > > which happens more and more frequently throughout the log.
> > >
> > > It was a bit difficult to understand what caused the failure, but
> > > the logs were getting really large and then being xz'd, which
> > > compressed 11G+ into a few MB. Once this happened the disk space
> > > would be freed, so nagios wouldn't hit the 3rd check and throw a
> > > warning until pretty much right at the crash.
> > >
> > > I was able to restart vdsmd to resolve the issue, but I still need
> > > to know why these logs started to stack up so I can avoid this issue
> > > in the future.
> > >
> >
> > We had this one: https://bugzilla.redhat.com/show_bug.cgi?id=1383259
> > but in your case the logs are rotating.
> > Francesco?
>
> Hi,
>
> yes, it is a different issue. Here the log messages are caused by the
> Worker threads of the periodic subsystem, which are leaking[1].
> This was a bug in Vdsm (insufficient protection against rogue domains),
> but the real problem is that some of your domains are unresponsive at
> the hypervisor level. The most likely cause, in turn, is unresponsive
> storage.
>
> Fixes have been committed and ship with Vdsm 4.17.34.
>
> See: https://bugzilla.redhat.com/1364925
>
> HTH,
>
> +++
>
> [1] Actually, they are replaced too quickly, leading to unbounded growth.
> So they aren't really "leaking"; Vdsm is just overzealous in handling one
> error condition, making things worse than before.
> Still a serious issue, no doubt, but with quite a different cause.
>
> --
> Francesco Romani
> Red Hat Engineering Virtualization R & D
> Phone: 8261328
> IRC: fromani
>
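
To see how fast the "Worker discarded" messages quoted above pile up, a rough sketch like the following counts them per hour in a rotated, xz-compressed log. It assumes each entry sits on a single line in the real file (the sample above was wrapped by the mail client) and that the timestamp layout matches that sample; the function names are hypothetical and not part of Vdsm.

```python
import lzma
import re
from collections import Counter

# Captures the date and hour from a line such as
# "vdsm.Scheduler::DEBUG::2016-10-11 16:42:09,318::executor::...Worker discarded: ..."
ENTRY = re.compile(
    r"::(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d+::.*Worker discarded"
)

def discards_per_hour(lines):
    """Count 'Worker discarded' entries, keyed by 'YYYY-MM-DD HH'."""
    counts = Counter()
    for line in lines:
        m = ENTRY.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

def discards_in_file(path):
    """Same count, read straight from an .xz file such as vdsm.log.27.xz."""
    with lzma.open(path, "rt", errors="replace") as fh:
        return discards_per_hour(fh)

if __name__ == "__main__":
    import sys
    for hour, n in sorted(discards_in_file(sys.argv[1]).items()):
        print(hour, n)
```

A steadily rising hourly count would match the "more and more frequently" pattern described in the original report, and reading the file through lzma avoids unpacking 11G+ onto the same partition.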