
On 30.09.2014 17:09, Sandro Bonazzola wrote:
On 30/09/2014 17:03, Dan Kenigsberg wrote:
On Tue, Sep 30, 2014 at 10:23:47AM +0000, Daniel Helgenberger wrote:
On 30.09.2014 11:57, Piotr Kliczewski wrote:
----- Original Message -----
From: "Daniel Helgenberger" <daniel.helgenberger@m-box.de>
To: "Piotr Kliczewski" <pkliczew@redhat.com>, "Dan Kenigsberg" <danken@redhat.com>
Cc: "Francesco Romani" <fromani@redhat.com>, users@ovirt.org
Sent: Tuesday, September 30, 2014 11:50:28 AM
Subject: Re: [ovirt-users] 3.4: VDSM Memory consumption
Hello Piotr,
On 30.09.2014 08:37, Piotr Kliczewski wrote:
----- Original Message -----
> From: "Dan Kenigsberg" <danken@redhat.com>
> To: "Daniel Helgenberger" <daniel.helgenberger@m-box.de>, pkliczew@redhat.com
> Cc: "Francesco Romani" <fromani@redhat.com>, users@ovirt.org
> Sent: Tuesday, September 30, 2014 1:11:42 AM
> Subject: Re: [ovirt-users] 3.4: VDSM Memory consumption
>
> On Mon, Sep 29, 2014 at 09:02:19PM +0000, Daniel Helgenberger wrote:
>> Hello Francesco,
>>
>>> On 29.09.2014, at 22:19, Francesco Romani <fromani@redhat.com> wrote:
>>>
>>> ----- Original Message -----
>>>> From: "Daniel Helgenberger" <daniel.helgenberger@m-box.de>
>>>> To: "Francesco Romani" <fromani@redhat.com>
>>>> Cc: "Dan Kenigsberg" <danken@redhat.com>, users@ovirt.org
>>>> Sent: Monday, September 29, 2014 2:54:13 PM
>>>> Subject: Re: [ovirt-users] 3.4: VDSM Memory consumption
>>>>
>>>> Hello Francesco,
>>>>
>>>>> On 29.09.2014 13:55, Francesco Romani wrote:
>>>>> ----- Original Message -----
>>>>>> From: "Daniel Helgenberger" <daniel.helgenberger@m-box.de>
>>>>>> To: "Dan Kenigsberg" <danken@redhat.com>
>>>>>> Cc: users@ovirt.org
>>>>>> Sent: Monday, September 29, 2014 12:25:22 PM
>>>>>> Subject: Re: [ovirt-users] 3.4: VDSM Memory consumption
>>>>>>
>>>>>> Dan,
>>>>>>
>>>>>> I just reply to the list since I do not want to clutter the BZ:
>>>>>>
>>>>>> While migrating VMs is easy (and the sampling is already running),
>>>>>> can someone tell me the correct polling port to block with iptables?
>>>>>>
>>>>>> Thanks,
>>>>> Hi Daniel,
>>>>>
>>>>> there is indeed a memory profiling patch under discussion:
>>>>> http://gerrit.ovirt.org/#/c/32019/
>>>>>
>>>>> but for your case we'll need a backport to 3.4.x and clearer install
>>>>> instructions, which I'll prepare as soon as possible.
>>>> I updated the BZ (and am now blocking 54321/tcp on one of my hosts)
>>>> and verified it is not reachable. As general info: this system is my
>>>> LAB / test / eval setup for a final deployment of oVirt (then 3.5) in
>>>> production, so it will go away some time in the future (a few weeks /
>>>> months). If I am the only one experiencing this problem then you might
>>>> be better off allocating resources elsewhere ;)
>>> Thanks for your understanding :)
>>>
>>> Unfortunately it is true that developer resources aren't so abundant,
>>> but it is also true that memleaks should never be discarded easily and
>>> without due investigation, considering the nature and the role of VDSM.
>>>
>>> So, I'm all in for further investigation regarding this issue.
>>>
>>>>> As for your question: if I understood correctly what you are asking
>>>>> (still catching up the thread), if you are trying to rule out the
>>>>> stats polling made by Engine as the cause of this bad leak, one
>>>>> simple way to test is just to shut down Engine and let the VDSMs run
>>>>> unguarded on the hypervisors. You'll be able to command these VDSMs
>>>>> using vdsClient or by restarting Engine.
>>>> As I said in my BZ comment this is not an option right now, but if I
>>>> understand the matter correctly an iptables REJECT should ultimately
>>>> do the same?
>>> Definitely yes! Just do whatever is more convenient for you.
>>>
>> As you might have already seen in the BZ comment, the leak stopped after
>> blocking the port. Though this is clearly no permanent option - please
>> let me know if I can be of any more assistance!
> The immediate suspect in this situation is M2Crypto. Could you verify
> that by re-opening the firewall and setting ssl=False in vdsm.conf?
>
> You should disable ssl on the Engine side and restart both Engine and
> Vdsm (too bad I do not recall how that's done on Engine: Piotr, can you
> help?).
In the vdc_options table there is the option EncryptHostCommunication.
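For reference, the port block described above could look like the following. This is only a sketch; the exact chain and policy depend on the host's existing firewall configuration, and 54321/tcp is the VDSM port mentioned in the thread:

```shell
# Hypothetical rules, assuming plain iptables with the default filter
# table; 54321/tcp is the port the Engine uses to reach VDSM.
iptables -I INPUT -p tcp --dport 54321 -j REJECT   # block Engine polling

# Later, to restore connectivity, delete the same rule:
iptables -D INPUT -p tcp --dport 54321 -j REJECT
```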
Please confirm the following procedure is correct:
1. Change the Postgres table value:
# sudo -u postgres psql -U postgres engine -c "update vdc_options set option_value = 'false' where option_name = 'EncryptHostCommunication';"
engine=# SELECT * from vdc_options where option_name='EncryptHostCommunication';
 option_id |       option_name        | option_value | version
-----------+--------------------------+--------------+---------
       335 | EncryptHostCommunication | false        | general
(1 row)
2. Restart the engine
3. On the hosts:
# grep ssl /etc/vdsm/vdsm.conf
#ssl = true
ssl = false
4. Restart VDSM
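Step 3 above can be scripted. The sketch below assumes the stanza shown (a commented default plus an active "ssl = true" line) and runs against a throwaway copy, so nothing real is touched; on a host you would point it at /etc/vdsm/vdsm.conf instead:

```shell
# Work on a temporary copy of vdsm.conf to illustrate the edit.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[vars]
#ssl = true
ssl = true
EOF

# Flip only the active (uncommented) setting, leaving the comment alone.
sed -i 's/^ssl = true$/ssl = false/' "$conf"

grep '^ssl' "$conf"   # prints: ssl = false
```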
I assume I have to set 'ssl = false' on all hosts?
Please set it to false and restart the engine.
I believe that you need to update a bit more on the vdsm side. Please follow [1], section "Configure ovirt-engine and vdsm to work in non-secure mode".
The option name above is wrong; it should be EncryptHostCommunication.
> I forgot; I suppose hosted-engine-ha is out of order because of disabled ssl?
Indeed. And for hosted-engine, too, I need someone else's help (Sandro?) to tell how to disable ssl.
In /etc/ovirt-hosted-engine/hosted-engine.conf just change:
vdsm_use_ssl=true
to
vdsm_use_ssl=false
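The hosted-engine change above can be sketched the same way, assuming the key=value format shown for hosted-engine.conf; again this runs against a temporary copy rather than the live file:

```shell
# Temporary stand-in for /etc/ovirt-hosted-engine/hosted-engine.conf
# (minimal hypothetical excerpt, only the key we care about).
conf=$(mktemp)
cat > "$conf" <<'EOF'
vdsm_use_ssl=true
EOF

sed -i 's/^vdsm_use_ssl=true$/vdsm_use_ssl=false/' "$conf"

grep '^vdsm_use_ssl' "$conf"   # prints: vdsm_use_ssl=false
```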
Hello Sandro,

although the engine works with the hosts, I cannot migrate VMs anymore because libvirt cannot connect to the other host. First I had a libvirt connection error for qemu+tcp; after stopping iptables I get:

vdsm.log
Thread-68935::ERROR::2014-10-01 10:50:18,099::vm::266::vm.Vm::(_recover) vmId=`e68a11c8-1251-4c13-9e3b-3847bbb4fa3d`::internal error Attempt to migrate guest to the same host 45d7fabc-7e2e-4288-92c9-bd3713ce3eb4
Thread-68935::ERROR::2014-10-01 10:50:18,433::vm::365::vm.Vm::(run) vmId=`e68a11c8-1251-4c13-9e3b-3847bbb4fa3d`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 351, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/vm.py", line 433, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/vm.py", line 928, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: internal error Attempt to migrate guest to the same host 45d7fabc-7e2e-4288-92c9-bd3713ce3eb4

relevant engine.log:
2014-10-01 13:26:11,520 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (ajp--127.0.0.1-8702-5) [2123e886] Lock Acquired to object EngineLock [exclusiveLocks= key: e68a11c8-1251-4c13-9e3b-3847bbb4fa3d value: VM , sharedLocks= ]
2014-10-01 13:26:11,582 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (org.ovirt.thread.pool-6-thread-49) [2123e886] Running command: MigrateVmToServerCommand internal: false. Entities affected : ID: e68a11c8-1251-4c13-9e3b-3847bbb4fa3d Type: VM
2014-10-01 13:26:11,604 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (org.ovirt.thread.pool-6-thread-49) [2123e886] START, MigrateVDSCommand(HostName = node-hv02, HostId = fb17dc51-f7e7-4236-bde6-3779fd84c4d6, vmId=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d, srcHost=192.168.50.202, dstVdsId=d2d47535-991a-444b-9acd-1efcc70b1ea6, dstHost=192.168.50.201:54321, migrationMethod=ONLINE, tunnelMigration=false, migrationDowntime=0), log id: 695e7366
2014-10-01 13:26:11,605 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (org.ovirt.thread.pool-6-thread-49) [2123e886] VdsBroker::migrate::Entered (vm_guid=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d, srcHost=192.168.50.202, dstHost=192.168.50.201:54321, method=online
2014-10-01 13:26:11,607 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (org.ovirt.thread.pool-6-thread-49) [2123e886] START, MigrateBrokerVDSCommand(HostName = node-hv02, HostId = fb17dc51-f7e7-4236-bde6-3779fd84c4d6, vmId=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d, srcHost=192.168.50.202, dstVdsId=d2d47535-991a-444b-9acd-1efcc70b1ea6, dstHost=192.168.50.201:54321, migrationMethod=ONLINE, tunnelMigration=false, migrationDowntime=0), log id: 1a598c99
2014-10-01 13:26:11,614 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (org.ovirt.thread.pool-6-thread-49) [2123e886] FINISH, MigrateBrokerVDSCommand, log id: 1a598c99
2014-10-01 13:26:11,620 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (org.ovirt.thread.pool-6-thread-49) [2123e886] FINISH, MigrateVDSCommand, return: MigratingFrom, log id: 695e7366
2014-10-01 13:26:11,657 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-49) [2123e886] Correlation ID: 2123e886, Job ID: 75055580-a972-4366-b4a8-6ec4b9f661e6, Call Stack: null, Custom Event ID: -1, Message: Migration started (VM: HostedEngine, Source: node-hv02, Destination: node-hv01, User: daniel).
2014-10-01 13:26:15,017 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-66) VM HostedEngine e68a11c8-1251-4c13-9e3b-3847bbb4fa3d moved from MigratingFrom --> Up
2014-10-01 13:26:15,017 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-66) Adding VM e68a11c8-1251-4c13-9e3b-3847bbb4fa3d to re-run list
2014-10-01 13:26:15,051 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-66) Rerun vm e68a11c8-1251-4c13-9e3b-3847bbb4fa3d. Called from vds node-hv02
2014-10-01 13:26:15,057 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-6-thread-7) START, MigrateStatusVDSCommand(HostName = node-hv02, HostId = fb17dc51-f7e7-4236-bde6-3779fd84c4d6, vmId=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d), log id: 4a71cde4
2014-10-01 13:26:15,061 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-6-thread-7) Failed in MigrateStatusVDS method
2014-10-01 13:26:15,061 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-6-thread-7) Command org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand return value StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=12, mMessage=Fatal error during migration]]
2014-10-01 13:26:15,062 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-6-thread-7) HostName = node-hv02
2014-10-01 13:26:15,063 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-6-thread-7) Command MigrateStatusVDSCommand(HostName = node-hv02, HostId = fb17dc51-f7e7-4236-bde6-3779fd84c4d6, vmId=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d) execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to MigrateStatusVDS, error = Fatal error during migration, code = 12
2014-10-01 13:26:15,064 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-6-thread-7) FINISH, MigrateStatusVDSCommand, log id: 4a71cde4
2014-10-01 13:26:15,071 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-7) Correlation ID: 2123e886, Job ID: 75055580-a972-4366-b4a8-6ec4b9f661e6, Call Stack: null, Custom Event ID: -1, Message: Migration failed due to Error: Fatal error during migration (VM: HostedEngine, Source: node-hv02, Destination: node-hv01).
2014-10-01 13:26:15,079 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (org.ovirt.thread.pool-6-thread-7) Lock freed to object EngineLock [exclusiveLocks= key: e68a11c8-1251-4c13-9e3b-3847bbb4fa3d va

This is definitely not the same host...
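One diagnostic idea (an assumption on my side, not something established in this thread): libvirt decides "same host" by comparing host UUIDs, so if both nodes were cloned from the same image they may report identical UUIDs even though they are clearly different machines. Worth comparing on both nodes:

```shell
# Run on each node and compare the outputs; identical values on both
# nodes would explain "Attempt to migrate guest to the same host".
dmidecode -s system-uuid                           # SMBIOS UUID (needs root)
grep -i '^ *host_uuid' /etc/libvirt/libvirtd.conf  # optional libvirtd override
```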
hosted-engine --connect-storage
Connecting Storage Server
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsClient.py", line 2578, in <module>
    code, message = commands[command][0](commandArgs)
  File "/usr/share/vdsm/vdsClient.py", line 712, in connectStorageServer
    res = self.s.connectStorageServer(serverType, spUUID, conList)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1235, in request
    self.send_content(h, request_body)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1349, in send_content
    connection.endheaders()
  File "/usr/lib64/python2.6/httplib.py", line 908, in endheaders
    self._send_output()
  File "/usr/lib64/python2.6/httplib.py", line 780, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.6/httplib.py", line 739, in send
    self.connect()
  File "/usr/lib64/python2.6/site-packages/vdsm/SecureXMLRPCServer.py", line 195, in connect
    cert_reqs=self.cert_reqs)
  File "/usr/lib64/python2.6/ssl.py", line 342, in wrap_socket
    suppress_ragged_eofs=suppress_ragged_eofs)
  File "/usr/lib64/python2.6/ssl.py", line 120, in __init__
    self.do_handshake()
  File "/usr/lib64/python2.6/ssl.py", line 279, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 8] _ssl.c:492: EOF occurred in violation of protocol
--
Daniel Helgenberger
m box bewegtbild GmbH

P: +49/30/2408781-22
F: +49/30/2408781-10

ACKERSTR. 19
D-10115 BERLIN

www.m-box.de  www.monkeymen.tv

Geschäftsführer: Martin Retschitzegger / Michaela Göllner
Handelsregister: Amtsgericht Charlottenburg / HRB 112767