Hosted Engine fails after upgrading to 3.5

Hi.

After upgrading from 3.4 to 3.5 (I've followed the official RHEV documentation) the Hosted Engine VM cannot boot up anymore.

"hosted-engine --vm-status" says "Engine status: unknown stale-data"
agent.log says: Error: ''NoneType' object has no attribute 'iteritems''

Some logs:
- agent on node ov0h21: http://fpaste.org/144822/
- agent on node ov0h21: http://fpaste.org/144824/
- "hosted-engine --vm-status": http://fpaste.org/144825/

Thank you,
-- 
Stefano Stagnaro
Prisma Engineering S.r.l.
Via Petrocchi, 4
20127 Milano – Italy
Tel. 02 26113507 int 339
e-mail: stefanos@prisma-eng.com
skype: stefano.stagnaro

On 10/24/2014 01:11 PM, Stefano Stagnaro wrote:
> Hi. After upgrading from 3.4 to 3.5 (I've followed the official RHEV documentation) the Hosted Engine VM cannot boot up anymore.
>
> "hosted-engine --vm-status" says "Engine status: unknown stale-data"
> agent.log says: Error: ''NoneType' object has no attribute 'iteritems''
>
> Some logs:
> - agent on node ov0h21: http://fpaste.org/144822/
> - agent on node ov0h21: http://fpaste.org/144824/
> - "hosted-engine --vm-status": http://fpaste.org/144825/

Hi Stefano,
can you please also provide the broker log?

Thank you,
Jirka

On 10/24/2014 01:30 PM, Jiri Moskovcak wrote:
> Hi Stefano,
> can you please also provide the broker log?

- and also this file:
/rhev/data-center/mnt/ov0nfs:_engine/e4e8282e-6bde-4332-ad68-313287b4fc65/ha_agent/hosted-engine.metadata

Thanks,
Jirka

Hi Jirka,
thank you for the reply. I've uploaded all the relevant logs here:
https://www.dropbox.com/sh/qh2rbews45ky2g8/AAC4_4_j94cw6sI_hfaSFg-Fa?dl=0

Thank you,
-- 
Stefano Stagnaro

On Fri, 2014-10-24 at 13:30 +0200, Jiri Moskovcak wrote:
> Hi Stefano,
> can you please also provide the broker log?
>
> Thank you,
> Jirka

On 10/24/2014 02:12 PM, Stefano Stagnaro wrote:
> Hi Jirka,
> thank you for the reply. I've uploaded all the relevant logs here:
> https://www.dropbox.com/sh/qh2rbews45ky2g8/AAC4_4_j94cw6sI_hfaSFg-Fa?dl=0
>
> Thank you,

Hi Stefano,
I'd say the agent is not able to parse the metadata from the previous version, so as a workaround until I fix it you can try to zero out the metadata file (back up the original first, just in case):

1. stop the agent and broker on all hosts
2. truncate the file

This should do the trick:

$ service ovirt-ha-agent stop; service ovirt-ha-broker stop
$ truncate --size 0 /rhev/data-center/mnt/ov0nfs:_engine/e4e8282e-6bde-4332-ad68-313287b4fc65/ha_agent/hosted-engine.metadata
$ truncate --size 1M /rhev/data-center/mnt/ov0nfs:_engine/e4e8282e-6bde-4332-ad68-313287b4fc65/ha_agent/hosted-engine.metadata
$ service ovirt-ha-broker start; service ovirt-ha-agent start

--Jirka
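
The workaround above mentions backing up the original file but the listed commands do not include that step. A minimal sketch of the file part of the procedure with the backup added might look like the following. The function name reset_ha_metadata and the ".bak" backup suffix are assumptions made for illustration, not part of any oVirt tool, and the sketch relies on GNU coreutils truncate as used in the thread.

```shell
# Hypothetical helper (not an oVirt command): back up a hosted-engine
# metadata file, then zero it out the same way the workaround does.
reset_ha_metadata() {
    metadata_file=$1
    backup_file="${metadata_file}.bak"   # assumed backup location

    # Refuse to continue if the file is missing.
    if [ ! -f "$metadata_file" ]; then
        echo "no such file: $metadata_file" >&2
        return 1
    fi

    # Keep a copy of the original, as recommended in the thread.
    cp -p "$metadata_file" "$backup_file" || return 1

    # Same two truncate calls as in the workaround: drop all content,
    # then grow the file back to 1 MiB, which reads back as zero bytes.
    truncate --size 0 "$metadata_file" || return 1
    truncate --size 1M "$metadata_file"
}

# Intended use, with agent and broker stopped on all hosts first:
#   service ovirt-ha-agent stop; service ovirt-ha-broker stop
#   reset_ha_metadata /rhev/data-center/mnt/ov0nfs:_engine/e4e8282e-6bde-4332-ad68-313287b4fc65/ha_agent/hosted-engine.metadata
#   service ovirt-ha-broker start; service ovirt-ha-agent start
```

Keeping the copy next to the original is only one option; on shared storage it may be preferable to copy it somewhere outside the ha_agent directory.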

Hi Jirka,
after truncating the metadata file the Engine is running again.

Unfortunately, HA migration no longer works. If I put the host running the Engine into local maintenance, the HA goes through EngineMigratingAway -> ReinitializeFSM -> LocalMaintenance, but the VM never migrates.

I can see this error in agent.log:

MainThread::ERROR::2014-10-27 18:02:51,053::hosted_engine::867::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitor_migration) Failed to migrate
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 863, in _monitor_migration
    vm_id,
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/vds_client.py", line 85, in run_vds_client_cmd
    response['status']['message'])
DetailedError: Error 12 from migrateStatus: Fatal error during migration

Thank you,
-- 
Stefano Stagnaro

On Fri, 2014-10-24 at 15:00 +0200, Jiri Moskovcak wrote:
> Hi Stefano,
> I'd say the agent is not able to parse the metadata from the previous version, so as a workaround until I fix it you can try to zero out the metadata file (back up the original first, just in case):
>
> 1. stop the agent and broker on all hosts
> 2. truncate the file
>
> This should do the trick:
>
> $ service ovirt-ha-agent stop; service ovirt-ha-broker stop
> $ truncate --size 0 /rhev/data-center/mnt/ov0nfs:_engine/e4e8282e-6bde-4332-ad68-313287b4fc65/ha_agent/hosted-engine.metadata
> $ truncate --size 1M /rhev/data-center/mnt/ov0nfs:_engine/e4e8282e-6bde-4332-ad68-313287b4fc65/ha_agent/hosted-engine.metadata
> $ service ovirt-ha-broker start; service ovirt-ha-agent start
>
> --Jirka
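
When checking whether the same migration error recurs, it can help to pull only the relevant entries out of the agent log. The sketch below assumes the log contains the literal text "Failed to migrate" followed by the traceback lines, as in the excerpt quoted in this thread. The function name show_migration_failures is hypothetical, and the log path is passed in by the caller because the thread does not state where agent.log is stored.

```shell
# Hypothetical helper: print each "Failed to migrate" entry from an agent
# log together with the lines that follow it (the Python traceback).
show_migration_failures() {
    log_file=$1
    context_lines=${2:-8}   # lines to print after each match, default 8

    # grep exits non-zero when the log contains no matching entry.
    grep -A "$context_lines" 'Failed to migrate' "$log_file"
}
```

Called as show_migration_failures /path/to/agent.log, it prints nothing and returns a non-zero status when no failed migration was logged.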

On 10/27/2014 06:07 PM, Stefano Stagnaro wrote:
> Hi Jirka,
> after truncating the metadata file the Engine is running again.
>
> Unfortunately, HA migration no longer works. If I put the host running the Engine into local maintenance, the HA goes through EngineMigratingAway -> ReinitializeFSM -> LocalMaintenance, but the VM never migrates.
>
> I can see this error in agent.log:
>
> MainThread::ERROR::2014-10-27 18:02:51,053::hosted_engine::867::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitor_migration) Failed to migrate
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 863, in _monitor_migration
>     vm_id,
>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/vds_client.py", line 85, in run_vds_client_cmd
>     response['status']['message'])
> DetailedError: Error 12 from migrateStatus: Fatal error during migration
>
> Thank you,

Hi,
to debug the migration failure I'm going to need the engine.log.

Thanks,
Jirka

Hi,
please find the new logs at the same place:
https://www.dropbox.com/sh/qh2rbews45ky2g8/AAC4_4_j94cw6sI_hfaSFg-Fa?dl=0

Thank you,
-- 
Stefano Stagnaro

On Wed, 2014-10-29 at 08:19 +0100, Jiri Moskovcak wrote:
> Hi,
> to debug the migration failure I'm going to need the engine.log.
>
> Thanks,
> Jirka

On 10/29/2014 11:32 AM, Stefano Stagnaro wrote:
> Hi,
> please find the new logs at the same place:
> https://www.dropbox.com/sh/qh2rbews45ky2g8/AAC4_4_j94cw6sI_hfaSFg-Fa?dl=0
>
> Thank you,

Hi,
I can't see any exception related to the migration process, but the log is full of exceptions about getting storage data from the database. Nir, can you please take a look and check whether it's something critical that might influence the migration process?

Thanks,
Jirka

On 10/24/2014 02:12 PM, Stefano Stagnaro wrote:
> Hi Jirka,
> thank you for the reply. I've uploaded all the relevant logs here:
> https://www.dropbox.com/sh/qh2rbews45ky2g8/AAC4_4_j94cw6sI_hfaSFg-Fa?dl=0
>
> Thank you,

After a closer look I noticed that there is 'null' as the engine-state in the metadata. This shouldn't happen, but we should also be able to recover when it does.

--Jirka

participants (2)
- Jiri Moskovcak
- Stefano Stagnaro