[ovirt-devel] [vdsm] logging levels and noise in the log

Michal Skrivanek michal.skrivanek at redhat.com
Fri Jul 11 07:13:27 UTC 2014


On Jul 10, 2014, at 18:07 , Dan Kenigsberg <danken at redhat.com> wrote:

> On Thu, Jul 10, 2014 at 10:15:23AM -0400, Nir Soffer wrote:
>> ----- Original Message -----
>>> From: "Martin Sivak" <msivak at redhat.com>
>>> To: devel at ovirt.org
>>> Sent: Thursday, July 10, 2014 1:09:14 PM
>>> Subject: [ovirt-devel] [vdsm] logging levels and noise in the log
>>> 
>>> Hi,
>>> 
>>> we discussed the right amount of logging with Nir and Francesco while
>>> reviewing my patches. Francesco was against one DEBUG log message that could
>>> potentially flood the logs. But the message in question was important for me
>>> and SLA, because it was logging the VDSM changes in response to MoM command.

well, that's an easy decision…if it can flood, you have to fix it so it doesn't;-)
also likely the DEBUG level is not right if you deem it's important

>>> 
>>> Since DEBUG really is meant to be used for debug messages, I have one
>>> proposal:
>>> 
>>> 1) Change the default log level of vdsm.log to INFO
>> 
>> Not good enough - our log are full or tracebacks logged in error
>> and warning levels, which are not helpful to the administrator.
>> 
>> So this will create log that is not useful for anything.
>> 
>>> 2) Log (only) DEBUG messages to a separate file vdsm-debug.log
>> 
>> The debug log should include *all* log levels, basically being the same as
>> the current vdsm log. When you need the logs, you want all the information in
>> the same file.
>> 
>>> 3) Make the vdsm-debug.log rotate faster (every hour, keep only last couple
>>> of hours?) so it does not grow too much
>> 
>> I just debugged a problem in production, where the mailbox stopped working
>> 86 logs ago - so having only couple of hours is too little.

exactly. Let's also not forget the omnipresent ENOSPACE on node…due to logs
If it's debug, it should be likely triggered only while troubleshooting. We should have a run-time setting for debug levels

>> 
>> However when using proper logging, This issue should be clear even
>> in info log level, which show error, warnings, and info message such
>> as mailbox stopping.
>> 
>>> 
>>> This way the customer would be able to monitor INFO logs (much smaller)
>>> without all the noise and we would be able to collect the DEBUG part in case
>>> something happens.
>> 
>> I think you got it right with the dual log approach. We should have two logs:
>> 
>> 1. Admin log (events.log?)
>> 
>>   This log should show important events and errors, but it should not contain
>>   any tracebacks, thread name, file names etc. Basically, something like the
>>   events log in the engine.
>> 
>>   If we take storage for example, this log can contain storage domain state
>>   change, or configuration changes triggered by the engine, or events like
>>   disk that becoming too full.
> 
> I have recently introduced a connectivity.log, that tracks changes in
> host network devices. Adding parallel logs for other important
> admin-understandable changes makes sense.

I wonder if this is the right approach…IMHO it's not, as you have to now take care of the additional file in the context of your code. It's easy to diverge then.
Why not follow the syslog-like way with config, splitting/limiting files based on e.g. subpackage prefix, etc…

I'm sure it was brought up many times before….why don't we just send it to syslog?:)

> 
> I'd be happy to hear which bits of information do oVirt admins wish to
> see in this "events.log".
> 
>> 
>> 2. Developer log (vdsm.log)
>> 
>>   There is lot of work to to here:
>> 
>>   - Disable library logs - we have lot of infrastructure code that log useless
>>     messages. This code should be verified by tests, not by looking at the logs. 
>> 
>>   - Removing junk log messages:
>>     - new ioprocess logs: https://bugzilla.redhat.com/1114908
>>     - new jsonrpc logs: same issue - no bug yet
>> 
>>   - Adding useful info like VDSM changes in response to MoM command.
>>     I did not look at the patch, but if vdsm changes state, it must be logged
>> 
>>   - Improving log format
>>     http://gerrit.ovirt.org/28869
> 
> The only problem is to agree which format is better. I personally like
> condensed format, that is possibly machine-readable. Spaces, "in
> function" and repeating module name and its full path, are suboptimal
> imo.
+1

although that was also brought up before IIRC….journald?:)

> 
>> 
>>   - Removing duplicate errors/tracebacks for same error, caused by code that does
>> 
>>     try:
>>        something()
>>     except:
>>        log.error('something failed...')
>>        raise
> 
> +1, I hate those.
> 
>> 
>>     Should be:
>> 
>>     something()
>> 
>>     And let the upper level handle the failure, logging a trackback.
>> 
>>     Or:
>> 
>>     try:
>>         something()
>>     Except LowLevelError:
>>         raise FlowError("description...")
>> 
>>    FlowError can log a traceback of both errors by using sys.exec_info.
>>    See http://hg.moinmo.in/moin/1.9/file/274a7f675261/MoinMoin/error.py
>> 
>>   - Disabling this log by default?
>>     Only if an admin can easily enable or disable the log, ideally from the ui.
> 
> It all too often that a user comes up with a bug or a question. Seeing
> his logs usually gives a good hint regarding the solution, or at least
> who is to blame. It is important to me to keep this tool handy.
> 
>> 
>> So for example, when flow fails, the event log will contain the a single error message
>> related to the flow, while the debug log will contains the same error message with
>> a traceback and useful debugging messages during the flow.
>> 
>> Lets make this a feature for 3.6?
> 
> Vdsm has a setLogLevel; it must have rotton out of disuse. But I'd love
> to see an Engine option for "cluster debug level", that is overridable
> per-host. Each admin should decide what does they care more about:
> debugablility by developers, or the bliss of quiet logs.

per host sounds good for vdsm (btw engine suffers from the same)
I wouldn't bother with UI though….I kind of never understood the tools which have "enable debug" checkbox in UI but then you anyway have no idea what is it now creating, what's the impact on the system, where to look for it.
I think debug should remain a purely troubleshooting measure, hence requiring you to log to host and do something, look/analyze/copy the big dump of info…and then turn it back. 
High debug levels typically bring some caveats like less performance…it shouldn't be something I can blindly check and forgot about it in UI.

Thanks,
michal

> 
> Dan.
> 
> _______________________________________________
> Devel mailing list
> Devel at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/devel




More information about the Devel mailing list