On 30.09.2014 17:09, Sandro Bonazzola wrote:
On 30/09/2014 17:03, Dan Kenigsberg wrote:
> On Tue, Sep 30, 2014 at 10:23:47AM +0000, Daniel Helgenberger wrote:
>> On 30.09.2014 11:57, Piotr Kliczewski wrote:
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Daniel Helgenberger"
<daniel.helgenberger(a)m-box.de>
>>>> To: "Piotr Kliczewski" <pkliczew(a)redhat.com>, "Dan
Kenigsberg" <danken(a)redhat.com>
>>>> Cc: "Francesco Romani" <fromani(a)redhat.com>,
users(a)ovirt.org
>>>> Sent: Tuesday, September 30, 2014 11:50:28 AM
>>>> Subject: Re: [ovirt-users]?3.4: VDSM Memory consumption
>>>>
>>>> Hello Piotr,
>>>>
>>>> On 30.09.2014 08:37, Piotr Kliczewski wrote:
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Dan Kenigsberg" <danken(a)redhat.com>
>>>>>> To: "Daniel Helgenberger"
<daniel.helgenberger(a)m-box.de>,
>>>>>> pkliczew(a)redhat.com
>>>>>> Cc: "Francesco Romani" <fromani(a)redhat.com>,
users(a)ovirt.org
>>>>>> Sent: Tuesday, September 30, 2014 1:11:42 AM
>>>>>> Subject: Re: [ovirt-users]?3.4: VDSM Memory consumption
>>>>>>
>>>>>> On Mon, Sep 29, 2014 at 09:02:19PM +0000, Daniel Helgenberger wrote:
>>>>>>> Hello Francesco,
>>>>>>>
>>>>>>>> On 29.09.2014, at 22:19, Francesco Romani <fromani(a)redhat.com> wrote:
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: "Daniel Helgenberger" <daniel.helgenberger(a)m-box.de>
>>>>>>>>> To: "Francesco Romani" <fromani(a)redhat.com>
>>>>>>>>> Cc: "Dan Kenigsberg" <danken(a)redhat.com>, users(a)ovirt.org
>>>>>>>>> Sent: Monday, September 29, 2014 2:54:13 PM
>>>>>>>>> Subject: Re: [ovirt-users] 3.4: VDSM Memory consumption
>>>>>>>>>
>>>>>>>>> Hello Francesco,
>>>>>>>>>
>>>>>>>>>> On 29.09.2014 13:55, Francesco Romani wrote:
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> From: "Daniel Helgenberger" <daniel.helgenberger(a)m-box.de>
>>>>>>>>>>> To: "Dan Kenigsberg" <danken(a)redhat.com>
>>>>>>>>>>> Cc: users(a)ovirt.org
>>>>>>>>>>> Sent: Monday, September 29, 2014 12:25:22 PM
>>>>>>>>>>> Subject: Re: [ovirt-users] 3.4: VDSM Memory consumption
>>>>>>>>>>>
>>>>>>>>>>> Dan,
>>>>>>>>>>>
>>>>>>>>>>> I am just replying to the list since I do not want to clutter the BZ:
>>>>>>>>>>>
>>>>>>>>>>> While migrating VMs is easy (and the sampling is already running), can someone tell me the correct polling port to block with iptables?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>> there is indeed a memory profiling patch under discussion:
>>>>>>>>>>
>>>>>>>>>> http://gerrit.ovirt.org/#/c/32019/
>>>>>>>>>>
>>>>>>>>>> but for your case we'll need a backport to 3.4.x and clearer install instructions, which I'll prepare as soon as possible.
>>>>>>>>> I updated the BZ (and am now blocking 54321/tcp on one of my hosts) and verified it is not reachable. As general info: this system is my lab / test / eval setup for a final production deployment of ovirt (3.5 by then); so it will go away some time in the future (a few weeks / months). If I am the only one experiencing this problem then you might be better off allocating resources elsewhere ;)
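>>>>>>>>>
>>>>>>>>> For reference, the rule I used was along these lines (just a sketch, assuming the default VDSM port 54321; adapt to your firewall layout):
>>>>>>>>>
>>>>>>>>> # reject Engine's polling connections on the VDSM port
>>>>>>>>> iptables -I INPUT -p tcp --dport 54321 -j REJECT
>>>>>>>>> # then verify from the Engine machine that the port is unreachable
>>>>>>>>> telnet <host> 54321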
>>>>>>>> Thanks for your understanding :)
>>>>>>>>
>>>>>>>> Unfortunately it is true that developer resources aren't so abundant, but it is also true that memory leaks should never be dismissed lightly and without due investigation, considering the nature and the role of VDSM.
>>>>>>>>
>>>>>>>> So, I'm all in for further investigation regarding this issue.
>>>>>>>>
>>>>>>>>>> As for your question: if I understood correctly what you are asking (still catching up on the thread), if you are trying to rule out the stats polling made by Engine as the cause of this bad leak, one simple way to test is just to shut down Engine and let the VDSMs run unguarded on the hypervisors. You'll be able to command these VDSMs using vdsClient, or by restarting Engine.
>>>>>>>>> As I said in my BZ comment this is not an option right now, but if I understand the matter correctly an iptables reject should ultimately do the same?
>>>>>>>> Definitely yes! Just do whatever is more convenient for you.
>>>>>>>>
>>>>>>> As you might have already seen in the BZ comment, the leak stopped after blocking the port. Though this is clearly not a permanent option - please let me know if I can be of any more assistance!
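>>>>>>>
>>>>>>> (For the record, I am watching VDSM's resident memory with a simple loop along these lines - only a sketch; pgrep may also match the respawn wrapper, so pick the right PID on your host:
>>>>>>>
>>>>>>> # sample the vdsm daemon's RSS (kB) once a minute
>>>>>>> while true; do date; ps -o rss= -p "$(pgrep -f /usr/share/vdsm/vdsm | tail -1)"; sleep 60; done
>>>>>>> )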
>>>>>> The immediate suspect in this situation is M2Crypto. Could you verify that by re-opening the firewall and setting ssl=False in vdsm.conf?
>>>>>>
>>>>>> You should disable ssl on the Engine side and restart both Engine and Vdsm (too bad I do not recall how that's done on Engine: Piotr, can you help?).
>>>>>>
>>>>> There is an option EncryptHostCommunication in the vdc_options table.
>>>> Please confirm the following procedure is correct:
>>>>
>>>> 1. Change the Postgres table value:
>>>> # sudo -u postgres psql -U postgres engine -c "update vdc_options set option_value = 'false' where option_name = 'EncryptHostCommunication';"
>>>> engine=# SELECT * from vdc_options where option_name='EncryptHostCommunication';
>>>>  option_id |       option_name        | option_value | version
>>>> -----------+--------------------------+--------------+---------
>>>>        335 | EncryptHostCommunication | false        | general
>>>> (1 row)
>>>>
>>>> 2. Restart the engine
>>>> 3. On the hosts:
>>>> # grep ssl /etc/vdsm/vdsm.conf
>>>> #ssl = true
>>>> ssl = false
>>>>
>>>> 4. Restart VDSM
>>>>
>>>> I assume I have to set 'ssl = false' on all hosts?
>>>>> Please set it to false and restart the engine.
>>>>>
>>> I believe that you need to update a bit more on the vdsm side.
>>> Please follow [1], section "Configure ovirt-engine and vdsm to work in non-secure mode".
>>>
>>> The option name given there is wrong; it should be EncryptHostCommunication.
>>>
>>> [1] http://www.ovirt.org/Developers_All_In_One
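>>>
>>> (For reference, the vdsm-side part is roughly the following - a sketch from memory, assuming EL6 paths; the wiki page above is authoritative:
>>>
>>> # /etc/vdsm/vdsm.conf
>>> ssl = false
>>>
>>> # /etc/libvirt/libvirtd.conf - accept plain TCP, since migrations then use qemu+tcp
>>> listen_tls = 0
>>> listen_tcp = 1
>>> auth_tcp = "none"
>>>
>>> # /etc/sysconfig/libvirtd - libvirtd must be started with --listen
>>> LIBVIRTD_ARGS="--listen"
>>>
>>> then restart libvirtd and vdsmd on each host.)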
>> I forgot to ask; I suppose hosted-engine-ha is out of order because of the disabled ssl?
> Indeed. And for hosted-engine, too, I need someone else's help (Sandro?) to tell us how to disable ssl.
In /etc/ovirt-hosted-engine/hosted-engine.conf just change:
vdsm_use_ssl=true
to
vdsm_use_ssl=false
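
A one-liner per host should do it - just a sketch, assuming the default path and the EL6 service names:

# flip the flag and restart the HA services so they pick it up
sed -i 's/^vdsm_use_ssl=true/vdsm_use_ssl=false/' /etc/ovirt-hosted-engine/hosted-engine.conf
service ovirt-ha-broker restart && service ovirt-ha-agent restart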
Hello Sandro,
although the engine now works with the hosts, I cannot migrate VMs anymore because libvirt cannot connect to the other host. First I had a libvirt connection error for qemu+tcp; after stopping iptables I get:

vdsm.log:
Thread-68935::ERROR::2014-10-01 10:50:18,099::vm::266::vm.Vm::(_recover) vmId=`e68a11c8-1251-4c13-9e3b-3847bbb4fa3d`::internal error Attempt to migrate guest to the same host 45d7fabc-7e2e-4288-92c9-bd3713ce3eb4
Thread-68935::ERROR::2014-10-01 10:50:18,433::vm::365::vm.Vm::(run) vmId=`e68a11c8-1251-4c13-9e3b-3847bbb4fa3d`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 351, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/vm.py", line 433, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/vm.py", line 928, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: internal error Attempt to migrate guest to the same host 45d7fabc-7e2e-4288-92c9-bd3713ce3eb4
relevant engine.log:
2014-10-01 13:26:11,520 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (ajp--127.0.0.1-8702-5) [2123e886] Lock Acquired to object EngineLock [exclusiveLocks= key: e68a11c8-1251-4c13-9e3b-3847bbb4fa3d value: VM , sharedLocks= ]
2014-10-01 13:26:11,582 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (org.ovirt.thread.pool-6-thread-49) [2123e886] Running command: MigrateVmToServerCommand internal: false. Entities affected : ID: e68a11c8-1251-4c13-9e3b-3847bbb4fa3d Type: VM
2014-10-01 13:26:11,604 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (org.ovirt.thread.pool-6-thread-49) [2123e886] START, MigrateVDSCommand(HostName = node-hv02, HostId = fb17dc51-f7e7-4236-bde6-3779fd84c4d6, vmId=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d, srcHost=192.168.50.202, dstVdsId=d2d47535-991a-444b-9acd-1efcc70b1ea6, dstHost=192.168.50.201:54321, migrationMethod=ONLINE, tunnelMigration=false, migrationDowntime=0), log id: 695e7366
2014-10-01 13:26:11,605 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (org.ovirt.thread.pool-6-thread-49) [2123e886] VdsBroker::migrate::Entered (vm_guid=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d, srcHost=192.168.50.202, dstHost=192.168.50.201:54321, method=online
2014-10-01 13:26:11,607 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (org.ovirt.thread.pool-6-thread-49) [2123e886] START, MigrateBrokerVDSCommand(HostName = node-hv02, HostId = fb17dc51-f7e7-4236-bde6-3779fd84c4d6, vmId=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d, srcHost=192.168.50.202, dstVdsId=d2d47535-991a-444b-9acd-1efcc70b1ea6, dstHost=192.168.50.201:54321, migrationMethod=ONLINE, tunnelMigration=false, migrationDowntime=0), log id: 1a598c99
2014-10-01 13:26:11,614 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (org.ovirt.thread.pool-6-thread-49) [2123e886] FINISH, MigrateBrokerVDSCommand, log id: 1a598c99
2014-10-01 13:26:11,620 INFO [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (org.ovirt.thread.pool-6-thread-49) [2123e886] FINISH, MigrateVDSCommand, return: MigratingFrom, log id: 695e7366
2014-10-01 13:26:11,657 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-49) [2123e886] Correlation ID: 2123e886, Job ID: 75055580-a972-4366-b4a8-6ec4b9f661e6, Call Stack: null, Custom Event ID: -1, Message: Migration started (VM: HostedEngine, Source: node-hv02, Destination: node-hv01, User: daniel).
2014-10-01 13:26:15,017 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-66) VM HostedEngine e68a11c8-1251-4c13-9e3b-3847bbb4fa3d moved from MigratingFrom --> Up
2014-10-01 13:26:15,017 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-66) Adding VM e68a11c8-1251-4c13-9e3b-3847bbb4fa3d to re-run list
2014-10-01 13:26:15,051 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-66) Rerun vm e68a11c8-1251-4c13-9e3b-3847bbb4fa3d. Called from vds node-hv02
2014-10-01 13:26:15,057 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-6-thread-7) START, MigrateStatusVDSCommand(HostName = node-hv02, HostId = fb17dc51-f7e7-4236-bde6-3779fd84c4d6, vmId=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d), log id: 4a71cde4
2014-10-01 13:26:15,061 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-6-thread-7) Failed in MigrateStatusVDS method
2014-10-01 13:26:15,061 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-6-thread-7) Command org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand return value
StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=12, mMessage=Fatal error during migration]]
2014-10-01 13:26:15,062 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-6-thread-7) HostName = node-hv02
2014-10-01 13:26:15,063 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-6-thread-7) Command MigrateStatusVDSCommand(HostName = node-hv02, HostId = fb17dc51-f7e7-4236-bde6-3779fd84c4d6, vmId=e68a11c8-1251-4c13-9e3b-3847bbb4fa3d) execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to MigrateStatusVDS, error = Fatal error during migration, code = 12
2014-10-01 13:26:15,064 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (org.ovirt.thread.pool-6-thread-7) FINISH, MigrateStatusVDSCommand, log id: 4a71cde4
2014-10-01 13:26:15,071 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-7) Correlation ID: 2123e886, Job ID: 75055580-a972-4366-b4a8-6ec4b9f661e6, Call Stack: null, Custom Event ID: -1, Message: Migration failed due to Error: Fatal error during migration (VM: HostedEngine, Source: node-hv02, Destination: node-hv01).
2014-10-01 13:26:15,079 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (org.ovirt.thread.pool-6-thread-7) Lock freed to object EngineLock [exclusiveLocks= key: e68a11c8-1251-4c13-9e3b-3847bbb4fa3d va
This is definitely not the same host...
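
As far as I understand, libvirt decides "same host" by comparing the hosts' UUIDs, so my suspicion is that both nodes report the same SMBIOS UUID (e.g. cloned installs). A quick check - a sketch, run on both hosts and compare the output:

# the two hosts must report different UUIDs
dmidecode -s system-uuid

If they are identical, setting a distinct host_uuid in /etc/libvirt/libvirtd.conf on one host (and restarting libvirtd) should work around it.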
>> hosted-engine --connect-storage
>> Connecting Storage Server
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/vdsClient.py", line 2578, in <module>
>>     code, message = commands[command][0](commandArgs)
>>   File "/usr/share/vdsm/vdsClient.py", line 712, in connectStorageServer
>>     res = self.s.connectStorageServer(serverType, spUUID, conList)
>>   File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
>>     return self.__send(self.__name, args)
>>   File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
>>     verbose=self.__verbose
>>   File "/usr/lib64/python2.6/xmlrpclib.py", line 1235, in request
>>     self.send_content(h, request_body)
>>   File "/usr/lib64/python2.6/xmlrpclib.py", line 1349, in send_content
>>     connection.endheaders()
>>   File "/usr/lib64/python2.6/httplib.py", line 908, in endheaders
>>     self._send_output()
>>   File "/usr/lib64/python2.6/httplib.py", line 780, in _send_output
>>     self.send(msg)
>>   File "/usr/lib64/python2.6/httplib.py", line 739, in send
>>     self.connect()
>>   File "/usr/lib64/python2.6/site-packages/vdsm/SecureXMLRPCServer.py", line 195, in connect
>>     cert_reqs=self.cert_reqs)
>>   File "/usr/lib64/python2.6/ssl.py", line 342, in wrap_socket
>>     suppress_ragged_eofs=suppress_ragged_eofs)
>>   File "/usr/lib64/python2.6/ssl.py", line 120, in __init__
>>     self.do_handshake()
>>   File "/usr/lib64/python2.6/ssl.py", line 279, in do_handshake
>>     self._sslobj.do_handshake()
>> SSLError: [Errno 8] _ssl.c:492: EOF occurred in violation of protocol
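
(Presumably this is the HA tooling still speaking SSL to a vdsm that now expects plaintext; the vdsm_use_ssl=false change above should cover it. As a quick sanity check - vdsClient without -s connects in the clear, while -s uses SSL:

vdsClient 0 list
)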
--
Daniel Helgenberger
m box bewegtbild GmbH
P: +49/30/2408781-22
F: +49/30/2408781-10
ACKERSTR. 19
D-10115 BERLIN
www.m-box.de www.monkeymen.tv
Geschäftsführer: Martin Retschitzegger / Michaela Göllner
Handelsregister: Amtsgericht Charlottenburg / HRB 112767