[ovirt-users] 3.4: VDSM Memory consumption

Daniel Helgenberger daniel.helgenberger at m-box.de
Wed Oct 1 11:30:31 UTC 2014


On 30.09.2014 17:09, Sandro Bonazzola wrote:
> Il 30/09/2014 17:03, Dan Kenigsberg ha scritto:
>> On Tue, Sep 30, 2014 at 10:23:47AM +0000, Daniel Helgenberger wrote:
>>> On 30.09.2014 11:57, Piotr Kliczewski wrote:
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "Daniel Helgenberger" <daniel.helgenberger at m-box.de>
>>>>> To: "Piotr Kliczewski" <pkliczew at redhat.com>, "Dan Kenigsberg" <danken at redhat.com>
>>>>> Cc: "Francesco Romani" <fromani at redhat.com>, users at ovirt.org
>>>>> Sent: Tuesday, September 30, 2014 11:50:28 AM
>>>>> Subject: Re: [ovirt-users] 3.4: VDSM Memory consumption
>>>>>
>>>>> Hello Piotr,
>>>>>
>>>>> On 30.09.2014 08:37, Piotr Kliczewski wrote:
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Dan Kenigsberg" <danken at redhat.com>
>>>>>>> To: "Daniel Helgenberger" <daniel.helgenberger at m-box.de>,
>>>>>>> pkliczew at redhat.com
>>>>>>> Cc: "Francesco Romani" <fromani at redhat.com>, users at ovirt.org
>>>>>>> Sent: Tuesday, September 30, 2014 1:11:42 AM
>>>>>>> Subject: Re: [ovirt-users] 3.4: VDSM Memory consumption
>>>>>>>
>>>>>>> On Mon, Sep 29, 2014 at 09:02:19PM +0000, Daniel Helgenberger wrote:
>>>>>>>> Hello Francesco,
>>>>>>>>
>>>>>>>>> On 29.09.2014, at 22:19, Francesco Romani <fromani at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> From: "Daniel Helgenberger" <daniel.helgenberger at m-box.de>
>>>>>>>>>> To: "Francesco Romani" <fromani at redhat.com>
>>>>>>>>>> Cc: "Dan Kenigsberg" <danken at redhat.com>, users at ovirt.org
>>>>>>>>>> Sent: Monday, September 29, 2014 2:54:13 PM
>>>>>>>>>> Subject: Re: [ovirt-users]    3.4: VDSM Memory consumption
>>>>>>>>>>
>>>>>>>>>> Hello Francesco,
>>>>>>>>>>
>>>>>>>>>>> On 29.09.2014 13:55, Francesco Romani wrote:
>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>> From: "Daniel Helgenberger" <daniel.helgenberger at m-box.de>
>>>>>>>>>>>> To: "Dan Kenigsberg" <danken at redhat.com>
>>>>>>>>>>>> Cc: users at ovirt.org
>>>>>>>>>>>> Sent: Monday, September 29, 2014 12:25:22 PM
>>>>>>>>>>>> Subject: Re: [ovirt-users]    3.4: VDSM Memory consumption
>>>>>>>>>>>>
>>>>>>>>>>>> Dan,
>>>>>>>>>>>>
>>>>>>>>>>>> I just reply to the list since I do not want to clutter BZ:
>>>>>>>>>>>>
>>>>>>>>>>>> While migrating VMs is easy (and the sampling is already running),
>>>>>>>>>>>> can
>>>>>>>>>>>> someone tell me the correct polling port to block with iptables?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>
>>>>>>>>>>> there is indeed a memory profiling patch under discussion:
>>>>>>>>>>> http://gerrit.ovirt.org/#/c/32019/
>>>>>>>>>>>
>>>>>>>>>>> but for your case we'll need a backport to 3.4.x and clearer install
>>>>>>>>>>> instructions,
>>>>>>>>>>> which I'll prepare as soon as possible.
>>>>>>>>>> I updated the BZ (and am now blocking 54321/tcp on one of my hosts)
>>>>>>>>>> and verified it is not reachable. As general info: this system is my
>>>>>>>>>> LAB / test / eval setup for a final production deployment of oVirt
>>>>>>>>>> (3.5 by then), so it will go away some time in the future (a few
>>>>>>>>>> weeks / months). If I am the only one experiencing this problem, you
>>>>>>>>>> might be better off allocating resources elsewhere ;)
>>>>>>>>> Thanks for your understanding :)
>>>>>>>>>
>>>>>>>>> Unfortunately it is true that developer resources aren't so abundant,
>>>>>>>>> but it is also true that memleaks should never be discarded easily and
>>>>>>>>> without
>>>>>>>>> due investigation, considering the nature and the role of VDSM.
>>>>>>>>>
>>>>>>>>> So, I'm all in for further investigation regarding this issue.
>>>>>>>>>
>>>>>>>>>>> As for your question: if I understood correctly what you are asking
>>>>>>>>>>> (still catching up on the thread), and you want to rule out the stats
>>>>>>>>>>> polling done by Engine as the cause of this bad leak, one simple way
>>>>>>>>>>> to test is just to shut down Engine and let the VDSMs run unguarded
>>>>>>>>>>> on the hypervisors. You'll still be able to command these VDSMs using
>>>>>>>>>>> vdsClient, or by restarting Engine.
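>>>>>>>>>>> For instance, something along these lines should do:
>>>>>>>>>>>
>>>>>>>>>>>   # vdsClient -s 0 list table     (running VMs)
>>>>>>>>>>>   # vdsClient -s 0 getVdsStats    (host statistics)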
>>>>>>>>>> As I said in my BZ comment this is not an option right now, but if I
>>>>>>>>>> understand the matter correctly an iptables REJECT should ultimately
>>>>>>>>>> do the same?
>>>>>>>>> Definitely yes! Just do whatever is more convenient for you.
>>>>>>>>>
>>>>>>>> As you might have already seen in the BZ comment, the leak stopped after
>>>>>>>> blocking the port. This is clearly no permanent option, though - please
>>>>>>>> let me know if I can be of any more assistance!
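>>>>>>>> (For the record, the blocking rule is nothing fancy - something along
>>>>>>>> these lines on the affected host:
>>>>>>>>
>>>>>>>>   # iptables -I INPUT -p tcp --dport 54321 -j REJECT
>>>>>>>>
>>>>>>>> i.e. reject anything hitting the VDSM port.)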
>>>>>>> The immediate suspect in this situation is M2Crypto. Could you verify
>>>>>>> that by re-opening the firewall and setting ssl=False in vdsm.conf?
>>>>>>>
>>>>>>> You should disable ssl on Engine side and restart both Engine and Vdsm
>>>>>>> (too bad I do not recall how that's done on Engine: Piotr, can you help?).
>>>>>>>
>>>>>> In vdc_options table there is option EncryptHostCommunication.
>>>>> Please confirm the following procedure is correct:
>>>>>
>>>>> 1. Change Postgres table value:
>>>>> # sudo -u postgres psql -U postgres engine -c "update vdc_options set
>>>>> option_value = 'false' where option_name = 'EncryptHostCommunication';"
>>>>> engine=# SELECT * from vdc_options where
>>>>> option_name='EncryptHostCommunication';
>>>>>  option_id |       option_name        | option_value | version
>>>>> -----------+--------------------------+--------------+---------
>>>>>        335 | EncryptHostCommunication | false        | general
>>>>> (1 row)
>>>>>
>>>>> 2. Restart engine
>>>>> 3. On the hosts:
>>>>> grep ssl /etc/vdsm/vdsm.conf
>>>>> #ssl = true
>>>>> ssl = false
>>>>>
>>>>> 4. restart VDSM
>>>>>
>>>>> I assume I have to set 'ssl = false' on all hosts?
>>>>>> Please set it to false and restart the engine.
>>>>>>
>>>> I believe that you need to update a bit more on the vdsm side.
>>>> Please follow [1], section "Configure ovirt-engine and vdsm to work in non-secure mode".
>>>>
>>>> Note that the option name given there is wrong; it should be EncryptHostCommunication.
>>>>
>>>> [1] http://www.ovirt.org/Developers_All_In_One
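>>>>
>>>> (Roughly, on the vdsm side that boils down to ssl = false in
>>>> /etc/vdsm/vdsm.conf on every host; and since migrations will then use
>>>> qemu+tcp instead of qemu+tls, libvirtd has to accept unencrypted TCP as
>>>> well, typically something like this in /etc/libvirt/libvirtd.conf:
>>>>
>>>>   listen_tcp = 1
>>>>   auth_tcp = "none"
>>>>
>>>> followed by a restart of libvirtd and vdsmd - the wiki page has the
>>>> exact steps.)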
>>> I forgot; I suppose hosted-engine-ha is out of order because of disabled
>>> ssl?
>> Indeed. And for hosted-engine, too, I need someone else's help (Sandro?)
>> to tell us how to disable ssl.
> in /etc/ovirt-hosted-engine/hosted-engine.conf just change:
> 	vdsm_use_ssl=true
> to
> 	vdsm_use_ssl=false
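> (and restart the HA services on the host afterwards so the new value is
> picked up, e.g. something like:
> 	service ovirt-ha-agent restart
> 	service ovirt-ha-broker restart
> on an el6 host)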
>
Hello Sandro,

although the engine works with the hosts, I cannot migrate VMs anymore
because libvirt cannot connect to the other host. First I had a libvirt
connection error for qemu+tcp; after stopping iptables I get:

vdsm.log
Thread-68935::ERROR::2014-10-01 10:50:18,099::vm::266::vm.Vm::(_recover)
vmId=`e68a11c8-1251-4c13-9e3b-3847bbb4fa3d`::internal error Attempt to
migrate guest to the same host 45d7fabc-7e2e-4288-92c9-bd3713ce3eb4
Thread-68935::ERROR::2014-10-01 10:50:18,433::vm::365::vm.Vm::(run)
vmId=`e68a11c8-1251-4c13-9e3b-3847bbb4fa3d`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 351, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/vm.py", line 433, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/vm.py", line 928, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py",
line 92, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in
migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed',
dom=self)
libvirtError: internal error Attempt to migrate guest to the same host
45d7fabc-7e2e-4288-92c9-bd3713ce3eb4

relevant engine.log:
2014-10-01 13:26:11,520 INFO 
[org.ovirt.engine.core.bll.MigrateVmToServerCommand]
(ajp--127.0.0.1-8702-5) [2123e886] Lock Acquired to object EngineLock
[exclusiveLocks= key: e68a11c8-1251-4c13-9e3b-3847bbb4fa3d value: VM
, sharedLocks= ]
2014-10-01 13:26:11,582 INFO 
[org.ovirt.engine.core.bll.MigrateVmToServerCommand]
(org.ovirt.thread.pool-6-thread-49) [2123e886] Running command:
MigrateVmToServerCommand internal: false. Entities affected :  ID:
e68a11c8-1251-4c13-9e3b-3847bbb4fa3d Type: VM
2014-10-01 13:26:11,604 INFO 
[org.ovirt.engine.core.vdsbroker.MigrateVDSCommand]
(org.ovirt.thread.pool-6-thread-49) [2123e886] START,
MigrateVDSCommand(HostName = node-hv02, HostId =
fb17dc51-f7e7-4236-bde6-3779fd84c4d6,
vmId=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d, srcHost=192.168.50.202,
dstVdsId=d2d47535-991a-444b-9acd-1efcc70b1ea6,
dstHost=192.168.50.201:54321, migrationMethod=ONLINE,
tunnelMigration=false, migrationDowntime=0), log id: 695e7366
2014-10-01 13:26:11,605 INFO 
[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand]
(org.ovirt.thread.pool-6-thread-49) [2123e886]
VdsBroker::migrate::Entered
(vm_guid=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d, srcHost=192.168.50.202,
dstHost=192.168.50.201:54321,  method=online
2014-10-01 13:26:11,607 INFO 
[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand]
(org.ovirt.thread.pool-6-thread-49) [2123e886] START,
MigrateBrokerVDSCommand(HostName = node-hv02, HostId =
fb17dc51-f7e7-4236-bde6-3779fd84c4d6,
vmId=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d, srcHost=192.168.50.202,
dstVdsId=d2d47535-991a-444b-9acd-1efcc70b1ea6,
dstHost=192.168.50.201:54321, migrationMethod=ONLINE,
tunnelMigration=false, migrationDowntime=0), log id: 1a598c99
2014-10-01 13:26:11,614 INFO 
[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand]
(org.ovirt.thread.pool-6-thread-49) [2123e886] FINISH,
MigrateBrokerVDSCommand, log id: 1a598c99
2014-10-01 13:26:11,620 INFO 
[org.ovirt.engine.core.vdsbroker.MigrateVDSCommand]
(org.ovirt.thread.pool-6-thread-49) [2123e886] FINISH,
MigrateVDSCommand, return: MigratingFrom, log id: 695e7366
2014-10-01 13:26:11,657 INFO 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(org.ovirt.thread.pool-6-thread-49) [2123e886] Correlation ID: 2123e886,
Job ID: 75055580-a972-4366-b4a8-6ec4b9f661e6, Call Stack: null, Custom
Event ID: -1, Message: Migration started (VM: HostedEngine, Source:
node-hv02, Destination: node-hv01, User: daniel).
2014-10-01 13:26:15,017 INFO 
[org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
(DefaultQuartzScheduler_Worker-66) VM HostedEngine
e68a11c8-1251-4c13-9e3b-3847bbb4fa3d moved from MigratingFrom --> Up
2014-10-01 13:26:15,017 INFO 
[org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
(DefaultQuartzScheduler_Worker-66) Adding VM
e68a11c8-1251-4c13-9e3b-3847bbb4fa3d to re-run list
2014-10-01 13:26:15,051 ERROR
[org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
(DefaultQuartzScheduler_Worker-66) Rerun vm
e68a11c8-1251-4c13-9e3b-3847bbb4fa3d. Called from vds node-hv02
2014-10-01 13:26:15,057 INFO 
[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand]
(org.ovirt.thread.pool-6-thread-7) START,
MigrateStatusVDSCommand(HostName = node-hv02, HostId =
fb17dc51-f7e7-4236-bde6-3779fd84c4d6,
vmId=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d), log id: 4a71cde4
2014-10-01 13:26:15,061 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand]
(org.ovirt.thread.pool-6-thread-7) Failed in MigrateStatusVDS method
2014-10-01 13:26:15,061 INFO 
[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand]
(org.ovirt.thread.pool-6-thread-7) Command
org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand return
value
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=12,
mMessage=Fatal error during migration]]
2014-10-01 13:26:15,062 INFO 
[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand]
(org.ovirt.thread.pool-6-thread-7) HostName = node-hv02
2014-10-01 13:26:15,063 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand]
(org.ovirt.thread.pool-6-thread-7) Command
MigrateStatusVDSCommand(HostName = node-hv02, HostId =
fb17dc51-f7e7-4236-bde6-3779fd84c4d6,
vmId=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d) execution failed. Exception:
VDSErrorException: VDSGenericException: VDSErrorException: Failed to
MigrateStatusVDS, error = Fatal error during migration, code = 12
2014-10-01 13:26:15,064 INFO 
[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand]
(org.ovirt.thread.pool-6-thread-7) FINISH, MigrateStatusVDSCommand, log
id: 4a71cde4
2014-10-01 13:26:15,071 INFO 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(org.ovirt.thread.pool-6-thread-7) Correlation ID: 2123e886, Job ID:
75055580-a972-4366-b4a8-6ec4b9f661e6, Call Stack: null, Custom Event ID:
-1, Message: Migration failed due to Error: Fatal error during migration
(VM: HostedEngine, Source: node-hv02, Destination: node-hv01).
2014-10-01 13:26:15,079 INFO 
[org.ovirt.engine.core.bll.MigrateVmToServerCommand]
(org.ovirt.thread.pool-6-thread-7) Lock freed to object EngineLock
[exclusiveLocks= key: e68a11c8-1251-4c13-9e3b-3847bbb4fa3d va

This is definitely not the same host...
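
If I read the libvirt error right, it thinks both nodes report the same
host UUID (45d7fabc-7e2e-4288-92c9-bd3713ce3eb4). Something along these
lines on both hosts should show whether that is really the case:

# dmidecode -s system-uuid
# grep host_uuid /etc/libvirt/libvirtd.conf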


>
>>> hosted-engine --connect-storage
>>> Connecting Storage Server
>>> Traceback (most recent call last):
>>>   File "/usr/share/vdsm/vdsClient.py", line 2578, in <module>
>>>     code, message = commands[command][0](commandArgs)
>>>   File "/usr/share/vdsm/vdsClient.py", line 712, in connectStorageServer
>>>     res = self.s.connectStorageServer(serverType, spUUID, conList)
>>>   File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
>>>     return self.__send(self.__name, args)
>>>   File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
>>>     verbose=self.__verbose
>>>   File "/usr/lib64/python2.6/xmlrpclib.py", line 1235, in request
>>>     self.send_content(h, request_body)
>>>   File "/usr/lib64/python2.6/xmlrpclib.py", line 1349, in send_content
>>>     connection.endheaders()
>>>   File "/usr/lib64/python2.6/httplib.py", line 908, in endheaders
>>>     self._send_output()
>>>   File "/usr/lib64/python2.6/httplib.py", line 780, in _send_output
>>>     self.send(msg)
>>>   File "/usr/lib64/python2.6/httplib.py", line 739, in send
>>>     self.connect()
>>>   File "/usr/lib64/python2.6/site-packages/vdsm/SecureXMLRPCServer.py",
>>> line 195, in connect
>>>     cert_reqs=self.cert_reqs)
>>>   File "/usr/lib64/python2.6/ssl.py", line 342, in wrap_socket
>>>     suppress_ragged_eofs=suppress_ragged_eofs)
>>>   File "/usr/lib64/python2.6/ssl.py", line 120, in __init__
>>>     self.do_handshake()
>>>   File "/usr/lib64/python2.6/ssl.py", line 279, in do_handshake
>>>     self._sslobj.do_handshake()
>>> SSLError: [Errno 8] _ssl.c:492: EOF occurred in violation of protocol
>

-- 
Daniel Helgenberger
m box bewegtbild GmbH

P: +49/30/2408781-22
F: +49/30/2408781-10

ACKERSTR. 19
D-10115 BERLIN


www.m-box.de  www.monkeymen.tv

Geschäftsführer: Martin Retschitzegger / Michaela Göllner
Handelsregister: Amtsgericht Charlottenburg / HRB 112767



