[ovirt-devel] [vdsm] logging levels and noise in the log

Dan Kenigsberg danken at redhat.com
Thu Jul 10 16:07:09 UTC 2014


On Thu, Jul 10, 2014 at 10:15:23AM -0400, Nir Soffer wrote:
> ----- Original Message -----
> > From: "Martin Sivak" <msivak at redhat.com>
> > To: devel at ovirt.org
> > Sent: Thursday, July 10, 2014 1:09:14 PM
> > Subject: [ovirt-devel] [vdsm] logging levels and noise in the log
> > 
> > Hi,
> > 
> > we discussed the right amount of logging with Nir and Francesco while
> > reviewing my patches. Francesco was against one DEBUG log message that could
> > potentially flood the logs. But the message in question was important for me
> > and SLA, because it was logging the VDSM changes in response to MoM command.
> > 
> > Since DEBUG really is meant to be used for debug messages, I have one
> > proposal:
> > 
> > 1) Change the default log level of vdsm.log to INFO
> 
> Not good enough - our log are full or tracebacks logged in error
> and warning levels, which are not helpful to the administrator.
> 
> So this will create log that is not useful for anything.
> 
> > 2) Log (only) DEBUG messages to a separate file vdsm-debug.log
> 
> The debug log should include *all* log levels, basically being the same as
> the current vdsm log. When you need the logs, you want all the information in
> the same file.
> 
> > 3) Make the vdsm-debug.log rotate faster (every hour, keep only last couple
> > of hours?) so it does not grow too much
> 
> I just debugged a problem in production, where the mailbox stopped working
> 86 logs ago - so having only couple of hours is too little.
> 
> However when using proper logging, This issue should be clear even
> in info log level, which show error, warnings, and info message such
> as mailbox stopping.
> 
> > 
> > This way the customer would be able to monitor INFO logs (much smaller)
> > without all the noise and we would be able to collect the DEBUG part in case
> > something happens.
> 
> I think you got it right with the dual log approach. We should have two logs:
> 
> 1. Admin log (events.log?)
> 
>    This log should show important events and errors, but it should not contain
>    any tracebacks, thread name, file names etc. Basically, something like the
>    events log in the engine.
> 
>    If we take storage for example, this log can contain storage domain state
>    change, or configuration changes triggered by the engine, or events like
>    disk that becoming too full.

I have recently introduced a connectivity.log, that tracks changes in
host network devices. Adding parallel logs for other important
admin-understandable changes makes sense.

I'd be happy to hear which bits of information do oVirt admins wish to
see in this "events.log".

> 
> 2. Developer log (vdsm.log)
> 
>    There is lot of work to to here:
> 
>    - Disable library logs - we have lot of infrastructure code that log useless
>      messages. This code should be verified by tests, not by looking at the logs. 
> 
>    - Removing junk log messages:
>      - new ioprocess logs: https://bugzilla.redhat.com/1114908
>      - new jsonrpc logs: same issue - no bug yet
> 
>    - Adding useful info like VDSM changes in response to MoM command.
>      I did not look at the patch, but if vdsm changes state, it must be logged
>  
>    - Improving log format
>      http://gerrit.ovirt.org/28869

The only problem is to agree which format is better. I personally like
condensed format, that is possibly machine-readable. Spaces, "in
function" and repeating module name and its full path, are suboptimal
imo.

> 
>    - Removing duplicate errors/tracebacks for same error, caused by code that does
>      
>      try:
>         something()
>      except:
>         log.error('something failed...')
>         raise

+1, I hate those.

>      
>      Should be:
> 
>      something()
> 
>      And let the upper level handle the failure, logging a trackback.
> 
>      Or:
> 
>      try:
>          something()
>      Except LowLevelError:
>          raise FlowError("description...")
> 
>     FlowError can log a traceback of both errors by using sys.exec_info.
>     See http://hg.moinmo.in/moin/1.9/file/274a7f675261/MoinMoin/error.py
> 
>    - Disabling this log by default?
>      Only if an admin can easily enable or disable the log, ideally from the ui.

It all too often that a user comes up with a bug or a question. Seeing
his logs usually gives a good hint regarding the solution, or at least
who is to blame. It is important to me to keep this tool handy.

> 
> So for example, when flow fails, the event log will contain the a single error message
> related to the flow, while the debug log will contains the same error message with
> a traceback and useful debugging messages during the flow.
> 
> Lets make this a feature for 3.6?

Vdsm has a setLogLevel; it must have rotton out of disuse. But I'd love
to see an Engine option for "cluster debug level", that is overridable
per-host. Each admin should decide what does they care more about:
debugablility by developers, or the bliss of quiet logs.

Dan.




More information about the Devel mailing list