[Users] Excessive syslog logging from vdsm/sampling.py

Nir Soffer nsoffer at redhat.com
Wed Dec 18 15:46:37 UTC 2013


----- Original Message -----
> From: "Sander Grendelman" <sander at grendelman.com>
> To: "Nir Soffer" <nsoffer at redhat.com>
> Cc: "Michal Skrivanek" <mskrivan at redhat.com>, users at ovirt.org, "Ayal Baron" <abaron at redhat.com>
> Sent: Wednesday, December 18, 2013 4:00:21 PM
> Subject: Re: [Users] Excessive syslog logging from vdsm/sampling.py
> 
> On Wed, Dec 18, 2013 at 1:01 PM, Nir Soffer <nsoffer at redhat.com> wrote:
> >> > We should check why vmDrive does not have the format attribute in your
> >> > case.
> >>
> >> Probably because this was checked during migration it's a race between
> >> migration finish and monitoring checking this.
> >> You can see the error repeated in other cases as well (in bugzilla):
> >> 988047,
> >> 994534
> >
> > Then this patch should fix your problem:
> > http://gerrit.ovirt.org/22518
> 
> I applied this patch and restarted vdsmd on both nodes
> (maintenance->restart->activate).
> The errors still occur, but they are limited to fewer machines (see [1]).
> 
> - Almost all VMs were imported with virt-v2v (not only the VMs with errors).
> - Almost all VMs were migrated to another storage domain (offline).
> - All VMs were migrated to another host.
> 
> The current vdsm logs for both nodes are attached to this e-mail.
> Please let me know if you need any additional information.
> 
> [1]: unique sampling.py errors on both nodes.
> 
> #### Node 1
> ## Before patch
> [root@gnkvm01 ~]# tail -n 2000 /var/log/messages-20131215 | awk '/sampling.py/ {print $8}' | sort -u
> vmId=`0ae3a3d7-ead9-4c0d-9df0-3901b6e6859c`::Stats
> vmId=`22654002-cbef-454d-b001-7823da5f592f`::Stats
> vmId=`3e481c73-57df-4dc3-8b1c-421a74308a5e`::Stats
> vmId=`57dbe688-4e18-4358-aa3e-f3f6022ef9b3`::Stats
> vmId=`66aa5555-2299-4d93-931d-b7a2e421b7e9`::Stats
> vmId=`6df65698-4995-4c75-9433-75affe9b9c38`::Stats
> vmId=`9260c69c-93a2-4f8a-b5e9-eaab5e4f4708`::Stats
> vmId=`9edb3e08-f098-4633-a122-e5ba29ae12ea`::Stats
> vmId=`c6f56584-1ccd-4c02-be94-897a4e747d34`::Stats
> vmId=`d3dae626-279b-4bcf-afc4-7a3c198a3035`::Stats
> 
> ## After patch
> [root@gnkvm01 ~]# tail -n 2000 /var/log/messages | awk '/sampling.py/ {print $8}' | sort -u
> vmId=`007ca72e-d0d0-4477-87d4-fb60328cd882`::Stats
> vmId=`1075a178-a4c6-4a8f-a199-56401cd0652f`::Stats
> 
> #### Node 2
> ## Before patch
> [root@gnkvm02 ~]# tail -n 2000 /var/log/messages-20131215 | awk '/sampling.py/ {print $8}' | sort -u
> vmId=`00317758-16fe-4ac6-b9fd-d522c9908861`::Stats
> vmId=`007ca72e-d0d0-4477-87d4-fb60328cd882`::Stats
> vmId=`06405f12-d763-4bd6-b5e5-997e3f6bb1f6`::Stats
> vmId=`1075a178-a4c6-4a8f-a199-56401cd0652f`::Stats
> vmId=`1bba8930-9c04-4c5c-8b15-c9fe14022cb5`::Stats
> vmId=`2036c21d-e0a4-4d55-a9a7-4cd9dd9d250d`::Stats
> vmId=`5fff0cc7-24e4-4e4a-b220-ba49f9145060`::Stats
> vmId=`86708f62-fcc6-4d0f-978a-3788a61f9775`::Stats
> vmId=`9b8e6d07-295c-404d-a672-efc94a24b6bc`::Stats
> vmId=`aa0445b6-8ca5-4557-9f9b-ee543d6435df`::Stats
> 
> ## After patch
> [root@gnkvm02 ~]# tail -n 2000 /var/log/messages | awk '/sampling.py/ {print $8}' | sort -u
> vmId=`d3dae626-279b-4bcf-afc4-7a3c198a3035`::Stats

I think I understand your logs:

On node1 you started vdsm with the patch at 2013-12-18 13:33:18.

The error "object has no attribute 'format'" is gone with this patch.

On node2 I don't see this error at all, and I don't see a restart of vdsm. Maybe the entire log was generated with the patch already applied?
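One way to tell when vdsm was (re)started is to look for its startup banner; assuming the standard banner line is present in your log, something like this should find it:

$ grep 'I am the actual vdsm' vdsm-node2.log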

It would be nice if you opened a bug for this issue.

So one problem is gone, but you have other problems:

Thread-361::DEBUG::2013-12-18 14:41:42,512::libvirtconnection::108::libvirtconnection::(wrapper) Unknown libvirterror: ecode: 8 edom: 10 level: 2 message: invalid argument: invalid path /rhev/data-center/mnt/blockSD/b526b148-b810-47c6-9bdd-4fd8d8226855/images/c928f2c7-9ba0-496f-8be9-b8804fdc1a6d/a2da1070-8de7-4e1f-b736-4c88d089a5cc not assigned to domain

$ grep 'not assigned to domain' vdsm-node1.log | wc -l
11488
$ grep 'not assigned to domain' vdsm-node2.log | wc -l
3326

$ grep 'not assigned to domain' vdsm-node1.log | awk '{print $6}' | sort -u
/rhev/data-center/mnt/blockSD/b526b148-b810-47c6-9bdd-4fd8d8226855/images/347f2238-c018-4370-94df-bd1e81f8b854/9e5dad95-73ea-4e5c-aa13-522efd9bad11
/rhev/data-center/mnt/blockSD/b526b148-b810-47c6-9bdd-4fd8d8226855/images/47015859-1995-4ce2-880c-a3c7068a67dd/341b32d6-4276-454d-b3f0-789b705c99cc
/rhev/data-center/mnt/blockSD/b526b148-b810-47c6-9bdd-4fd8d8226855/images/c928f2c7-9ba0-496f-8be9-b8804fdc1a6d/a2da1070-8de7-4e1f-b736-4c88d089a5cc

$ grep 'not assigned to domain' vdsm-node2.log | awk '{print $6}' | sort -u
/rhev/data-center/mnt/blockSD/b526b148-b810-47c6-9bdd-4fd8d8226855/images/347f2238-c018-4370-94df-bd1e81f8b854/9e5dad95-73ea-4e5c-aa13-522efd9bad11
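If you want to see how often each path repeats, the same pipeline with a count should do it:

$ grep 'not assigned to domain' vdsm-node1.log | awk '{print $6}' | sort | uniq -c | sort -rn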

This issue deserves its own bug.


