Re: [Users] Excessive syslog logging from vdsm/sampling.py

18 Dec 2013

----- Original Message -----
...
From: "Sander Grendelman" <sander@grendelman.com>
To: "Nir Soffer" <nsoffer@redhat.com>
Cc: "Michal Skrivanek" <mskrivan@redhat.com>, users@ovirt.org, "Ayal Baron" <abaron@redhat.com>
Sent: Wednesday, December 18, 2013 4:00:21 PM
Subject: Re: [Users] Excessive syslog logging from vdsm/sampling.py
On Wed, Dec 18, 2013 at 1:01 PM, Nir Soffer <nsoffer@redhat.com> wrote:
...
We should check why vmDrive does not have the format attribute in your
case.
This probably happens because the attribute was checked during migration: it's
a race between the migration finishing and monitoring checking the drive.
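
To illustrate the kind of race described above, here is a minimal sketch. It is not the code from vdsm and not the content of any patch mentioned in this thread; the `Drive` class, the `needs_monitoring` helper and the assumption that only 'cow' drives are of interest are all hypothetical. It only shows how a monitoring check can tolerate an attribute that has not been set yet instead of raising AttributeError on every sample.

```python
class Drive:
    """Hypothetical stand-in for a VM drive object; not vdsm's real class."""

    def __init__(self, name):
        self.name = name
        # Note: 'format' is deliberately NOT set here. In this sketch it is
        # assigned later, e.g. once a migration has finished.


def needs_monitoring(drive):
    """Return True only if the drive already has a format and it is 'cow'.

    Assumption for illustration: only 'cow' drives are monitored.
    getattr() with a default avoids AttributeError while 'format' is unset.
    """
    return getattr(drive, "format", None) == "cow"


drive = Drive("vda")
early = needs_monitoring(drive)   # attribute missing: skipped, no exception
drive.format = "cow"              # attribute appears later
late = needs_monitoring(drive)    # now the drive is picked up
print(early, late)
```

Whether skipping the drive is the right behaviour depends on why the attribute is missing, which is exactly what needs checking here.
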
You can see the same error repeated in other cases as well (in Bugzilla):
988047, 994534
Then this patch should fix your problem:
http://gerrit.ovirt.org/22518
I applied this patch and restarted vdsmd on both nodes
(maintenance->restart->activate).
The errors still occur, but they are limited to fewer machines (see [1]).
- Almost all VMs were imported with virt-v2v (not only the VMs with errors).
- Almost all VMs were migrated to another storage domain (offline).
- All VMs were migrated to another host.
The current vdsm logs for both are attached to this e-mail.
Please let me know if you need any additional information.
[1]: unique sampling.py errors on both nodes.
#### Node 1
## Before patch
[root@gnkvm01 ~]# tail -n 2000 /var/log/messages-20131215 | awk '/sampling.py/ {print $8}' | sort -u
vmId=`0ae3a3d7-ead9-4c0d-9df0-3901b6e6859c`::Stats
vmId=`22654002-cbef-454d-b001-7823da5f592f`::Stats
vmId=`3e481c73-57df-4dc3-8b1c-421a74308a5e`::Stats
vmId=`57dbe688-4e18-4358-aa3e-f3f6022ef9b3`::Stats
vmId=`66aa5555-2299-4d93-931d-b7a2e421b7e9`::Stats
vmId=`6df65698-4995-4c75-9433-75affe9b9c38`::Stats
vmId=`9260c69c-93a2-4f8a-b5e9-eaab5e4f4708`::Stats
vmId=`9edb3e08-f098-4633-a122-e5ba29ae12ea`::Stats
vmId=`c6f56584-1ccd-4c02-be94-897a4e747d34`::Stats
vmId=`d3dae626-279b-4bcf-afc4-7a3c198a3035`::Stats
## After patch
[root@gnkvm01 ~]# tail -n 2000 /var/log/messages | awk '/sampling.py/ {print $8}' | sort -u
vmId=`007ca72e-d0d0-4477-87d4-fb60328cd882`::Stats
vmId=`1075a178-a4c6-4a8f-a199-56401cd0652f`::Stats
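
The awk pipeline above relies on the vmId token being the 8th whitespace-separated field. A rough Python equivalent that matches the token by pattern instead is sketched below. The function name `unique_vm_ids` and the sample lines are made up for illustration; only the vmId=`...`::Stats token mirrors the real output shown here.

```python
import re

# Matches the vmId=`<uuid>` token as it appears in the listings above.
VM_ID = re.compile(r"vmId=`([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})`")


def unique_vm_ids(lines, marker="sampling.py"):
    """Return the sorted unique VM ids found on lines containing `marker`."""
    ids = set()
    for line in lines:
        if marker not in line:
            continue
        match = VM_ID.search(line)
        if match:
            ids.add(match.group(1))
    return sorted(ids)


# Made-up sample lines, NOT real syslog output.
sample = [
    "sampling.py vmId=`007ca72e-d0d0-4477-87d4-fb60328cd882`::Stats (made-up line)",
    "sampling.py vmId=`1075a178-a4c6-4a8f-a199-56401cd0652f`::Stats (made-up line)",
    "sampling.py vmId=`007ca72e-d0d0-4477-87d4-fb60328cd882`::Stats (made-up line)",
    "other.py vmId=`d3dae626-279b-4bcf-afc4-7a3c198a3035`::Stats (made-up line)",
]
print(unique_vm_ids(sample))
```

Matching by pattern keeps working if the number of fields before the token changes, which a fixed `$8` does not.
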
#### Node 2
## Before patch
[root@gnkvm02 ~]# tail -n 2000 /var/log/messages-20131215 | awk '/sampling.py/ {print $8}' | sort -u
vmId=`00317758-16fe-4ac6-b9fd-d522c9908861`::Stats
vmId=`007ca72e-d0d0-4477-87d4-fb60328cd882`::Stats
vmId=`06405f12-d763-4bd6-b5e5-997e3f6bb1f6`::Stats
vmId=`1075a178-a4c6-4a8f-a199-56401cd0652f`::Stats
vmId=`1bba8930-9c04-4c5c-8b15-c9fe14022cb5`::Stats
vmId=`2036c21d-e0a4-4d55-a9a7-4cd9dd9d250d`::Stats
vmId=`5fff0cc7-24e4-4e4a-b220-ba49f9145060`::Stats
vmId=`86708f62-fcc6-4d0f-978a-3788a61f9775`::Stats
vmId=`9b8e6d07-295c-404d-a672-efc94a24b6bc`::Stats
vmId=`aa0445b6-8ca5-4557-9f9b-ee543d6435df`::Stats
## After patch
[root@gnkvm02 ~]# tail -n 2000 /var/log/messages | awk '/sampling.py/ {print $8}' | sort -u
vmId=`d3dae626-279b-4bcf-afc4-7a3c198a3035`::Stats
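
As a rough cross-check of the listings above, the before/after sets can be compared with a few set operations. The ids below are copied from the listings but shortened to their first 8 characters for readability (the prefixes are unique within this data); the variable names are made up for illustration.

```python
# VM ids from the listings above, shortened to their first 8 characters.
before = {
    "node1": {"0ae3a3d7", "22654002", "3e481c73", "57dbe688", "66aa5555",
              "6df65698", "9260c69c", "9edb3e08", "c6f56584", "d3dae626"},
    "node2": {"00317758", "007ca72e", "06405f12", "1075a178", "1bba8930",
              "2036c21d", "5fff0cc7", "86708f62", "9b8e6d07", "aa0445b6"},
}
after = {
    "node1": {"007ca72e", "1075a178"},
    "node2": {"d3dae626"},
}

all_before = before["node1"] | before["node2"]
all_after = after["node1"] | after["node2"]

no_longer_seen = all_before - all_after   # logged before, not after
newly_seen = all_after - all_before       # logged after, not before

# VMs still logging errors, but on the other node than before.
moved = {
    "node1": after["node1"] & before["node2"],
    "node2": after["node2"] & before["node1"],
}

print(len(all_before), len(all_after), len(no_longer_seen), sorted(newly_seen))
print(sorted(moved["node1"]), sorted(moved["node2"]))
```

On this data, all three VMs that still log errors also did so before the patch, each on the other node. Note that every listing only covers the last 2000 lines of the file, so a VM missing from the "after" set has not necessarily stopped failing.
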
Well, in vdsm-node1.log we have 7687 errors:
$ grep 'has no attribute' vdsm-node1.log | wc -l
7687

But no such errors in vdsm-node2.log:
$ grep 'has no attribute' vdsm-node2.log | wc -l
0
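
For comparing more than two files, the same count can be scripted. This is only a sketch of the `grep ... | wc -l` check above; the helper name `count_matches` and the sample lines are made up and are not taken from the attached logs.

```python
def count_matches(lines, needle="has no attribute"):
    """Count lines containing `needle`, like `grep needle | wc -l`."""
    return sum(1 for line in lines if needle in line)


# Made-up sample lines, NOT taken from the real vdsm logs.
node1_sample = [
    "made-up line: object has no attribute 'format'",
    "made-up line: unrelated message",
    "made-up line: object has no attribute 'format'",
]
node2_sample = [
    "made-up line: unrelated message",
]

print(count_matches(node1_sample), count_matches(node2_sample))

# With real files, pass the open file object instead of a list:
# with open("vdsm-node1.log") as log:
#     print(count_matches(log))
```
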

Can you explain the difference between vdsm-node1.log and vdsm-node2.log?

Can you send before and after log files, or point to the time in the log where you started the version with the patch?

Thanks,
Nir