[Users] Excessive syslog logging from vdsm/sampling.py

Wed Dec 18 15:10:36 UTC 2013

----- Original Message -----
> From: "Sander Grendelman" <sander at grendelman.com>
> To: "Nir Soffer" <nsoffer at redhat.com>
> Cc: "Michal Skrivanek" <mskrivan at redhat.com>, users at ovirt.org, "Ayal Baron" <abaron at redhat.com>
> Sent: Wednesday, December 18, 2013 4:00:21 PM
> Subject: Re: [Users] Excessive syslog logging from vdsm/sampling.py
> 
> On Wed, Dec 18, 2013 at 1:01 PM, Nir Soffer <nsoffer at redhat.com> wrote:
> >> > We should check why vmDrive does not have the format attribute in your
> >> > case.
> >>
> >> Probably because this was checked during migration it's a race between
> >> migration finish and monitoring checking this.
> >> You can see the error repeated in other cases as well (in bugzilla):
> >> 988047,
> >> 994534
> >
> > Then this patch should fix your problem:
> > http://gerrit.ovirt.org/22518
> 
> I applied this patch and restarted vdsmd on both nodes
> (maintenance->restart->activate).
> The errors still occur, but it is limited to fewer machines ( see [1] ).
> 
> - Almost all VMs were imported with virt-v2v (not only the VMs with errors).
> - Almost all VMs were migrated to another storage domain (offline).
> - Migration to another host was done with all VMs
> 
> The current vdsm logs for both are attached to this e-mail.
> Please let me know if you need any additional information.
> 
> [1]: unique sampling.py errors on both nodes.
> 
> #### Node 1
> ## Before patch
> [root at gnkvm01 ~]# tail -n 2000 /var/log/messages-20131215 | awk
> '/sampling.py/ {print $8}' | sort -u
> vmId=`0ae3a3d7-ead9-4c0d-9df0-3901b6e6859c`::Stats
> vmId=`22654002-cbef-454d-b001-7823da5f592f`::Stats
> vmId=`3e481c73-57df-4dc3-8b1c-421a74308a5e`::Stats
> vmId=`57dbe688-4e18-4358-aa3e-f3f6022ef9b3`::Stats
> vmId=`66aa5555-2299-4d93-931d-b7a2e421b7e9`::Stats
> vmId=`6df65698-4995-4c75-9433-75affe9b9c38`::Stats
> vmId=`9260c69c-93a2-4f8a-b5e9-eaab5e4f4708`::Stats
> vmId=`9edb3e08-f098-4633-a122-e5ba29ae12ea`::Stats
> vmId=`c6f56584-1ccd-4c02-be94-897a4e747d34`::Stats
> vmId=`d3dae626-279b-4bcf-afc4-7a3c198a3035`::Stats
> 
> ## After patch
> [root at gnkvm01 ~]# tail -n 2000 /var/log/messages | awk '/sampling.py/
> {print $8}' | sort -u
> vmId=`007ca72e-d0d0-4477-87d4-fb60328cd882`::Stats
> vmId=`1075a178-a4c6-4a8f-a199-56401cd0652f`::Stats
> 
> #### Node 2
> ## Before patch
> [root at gnkvm02 ~]# tail -n 2000 /var/log/messages-20131215 | awk
> '/sampling.py/ {print $8}' | sort -u
> vmId=`00317758-16fe-4ac6-b9fd-d522c9908861`::Stats
> vmId=`007ca72e-d0d0-4477-87d4-fb60328cd882`::Stats
> vmId=`06405f12-d763-4bd6-b5e5-997e3f6bb1f6`::Stats
> vmId=`1075a178-a4c6-4a8f-a199-56401cd0652f`::Stats
> vmId=`1bba8930-9c04-4c5c-8b15-c9fe14022cb5`::Stats
> vmId=`2036c21d-e0a4-4d55-a9a7-4cd9dd9d250d`::Stats
> vmId=`5fff0cc7-24e4-4e4a-b220-ba49f9145060`::Stats
> vmId=`86708f62-fcc6-4d0f-978a-3788a61f9775`::Stats
> vmId=`9b8e6d07-295c-404d-a672-efc94a24b6bc`::Stats
> vmId=`aa0445b6-8ca5-4557-9f9b-ee543d6435df`::Stats
> 
> ## After patch
> [root at gnkvm02 ~]# tail -n 2000 /var/log/messages | awk '/sampling.py/
> {print $8}' | sort -u
> vmId=`d3dae626-279b-4bcf-afc4-7a3c198a3035`::Stats

Well in node1.log, we have 7687 errors:
$ grep 'has no attribute' vdsm-node1.log | wc -l
7687

But no such errors in vdsm-node2.log:
$ grep 'has no attribute' vdsm-node2.log | wc -l
0

Can you explain what is the difference between node1.log and node2.log?

Can you send before and after log files, or point to the time in the log where you started the version with the patch?

Thanks,
Nir