[ovirt-users] Moving a Hosted Engine from Fedora 20 to CentOS 7

John Florian jflorian at doubledog.org
Wed Nov 11 23:55:52 UTC 2015


On 11/10/2015 03:45 AM, Alon Bar-Lev wrote:
>
> ----- Original Message -----
>> From: "Yedidyah Bar David" <didi at redhat.com>
>> To: "John Florian" <jflorian at doubledog.org>, "Alon Bar-Lev" <alonbl at redhat.com>, "Roy Golan" <rgolan at redhat.com>,
>> "Eli Mesika" <emesika at redhat.com>
>> Cc: "users" <users at ovirt.org>
>> Sent: Tuesday, November 10, 2015 10:39:27 AM
>> Subject: Re: [ovirt-users] Moving a Hosted Engine from Fedora 20 to CentOS 7
>>
>> On Tue, Nov 10, 2015 at 2:16 AM, John Florian <jflorian at doubledog.org> wrote:
>>> On 11/09/2015 06:25 PM, John Florian wrote:
>>>> I don't think it has anything to do with name resolution either. I
>>>> believe the telltale clue is this bit... 2015-11-09 18:22:31,738 WARN
>>>> [org.apache.sshd.client.session.ClientSessionImpl] (pool-20-thread-3)
>>>> Exception caught: java.lang.IllegalStateException: Unable to negotiate
>>>> key exchange for kex algorithms (client: diffie-hellman-group1-sha1 /
>>>> server:
>>>> curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1)
>>>> As mentioned, I can ssh from my engine to the host just fine. It
>>>> appears that the Java-based ssh client however cannot.
>>> I got past the above problem by adding the following line to the
>>> /etc/ssh/sshd_config of the new F22 host:
>>>
>>> KexAlgorithms curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
>>>
>>> This represents the defaults for F22 -- at least according to
>>> sshd_config(5) -- but with the addition of diffie-hellman-group1-sha1
>>> that the Java-based ssh client seems insistent on using.
>>>
>> Adding Alon for this. Not sure if we can configure the java ssh client
>> and how.
>>
> I think that versions newer than apache-sshd-0.14 altered this behavior. Can you please try downgrading to apache-sshd-0.13 and see if it helps? If it does, we will enforce it. Only with apache-sshd-1.1.0 (unreleased) will we be able to migrate properly (I hope).

This is my fault.  I see now that this was mentioned in the 3.6.0
release notes as a Fedora 22 specific issue (see also
https://bugzilla.redhat.com/show_bug.cgi?id=1225531).  The workaround
is exactly what I described above; I figured it out first and only then
read the release notes. Doh!
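
For anyone else who hits this, here is a minimal Python sketch of why the
negotiation fails and why the extra KexAlgorithms entry fixes it. It is a
simplified model of SSH key-exchange selection (the first algorithm on the
client's list that the server also offers), not the actual code of the
engine's ssh client or of sshd. The function name and list names are made
up for illustration; the algorithm lists are the ones quoted in the log
message above.

```python
def negotiate_kex(client_algos, server_algos):
    """Return the first client algorithm the server also offers, else None.

    Simplified model of SSH kex selection; illustrative only.
    """
    offered = set(server_algos)
    for algo in client_algos:
        if algo in offered:
            return algo
    return None


# Server list as reported in the engine log for the F22 host.
F22_DEFAULT = [
    "curve25519-sha256@libssh.org",
    "ecdh-sha2-nistp256",
    "ecdh-sha2-nistp384",
    "ecdh-sha2-nistp521",
    "diffie-hellman-group-exchange-sha256",
    "diffie-hellman-group14-sha1",
]

# Client list as reported in the same log message.
ENGINE_CLIENT = ["diffie-hellman-group1-sha1"]

# With the default server list there is no overlap, so negotiation fails.
before = negotiate_kex(ENGINE_CLIENT, F22_DEFAULT)

# After appending the legacy algorithm to KexAlgorithms there is a match.
after = negotiate_kex(ENGINE_CLIENT, F22_DEFAULT + ["diffie-hellman-group1-sha1"])

print(before, after)
```

The regular OpenSSH client on the engine offers a much longer list, which
is consistent with a plain ssh from the engine to the host working while
the Java-based client does not.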

I don't even see an apache-sshd package installed on my engine (or hosts).
>>> However, all is not rosy.  The deploy script ground to a halt with:
>>> [ INFO  ] Waiting for the host to become operational in the engine. This
>>> may take several minutes...
>>>           The host hosted_engine_2 is in non-operational state.
>>>           Please try to activate it via the engine webadmin UI.
>>>           Retry checking host status or ignore this and continue (Retry,
>>> Ignore)[Retry]?
>>>
>>> So I did as suggested and tried to activate the host from the webadmin
>>> UI.  That didn't work either.  The status message at the bottom of the
>>> browser page shows:
>>>
>>> Host hosted_engine_2 is installed with VDSM version (<UNKNOWN>) and
>>> cannot join cluster Default which is compatible with VDSM versions
>>> [4.13, 4.14, 4.9, 4.16, 4.11, 4.15, 4.12, 4.10].
>>>
>>> The attempt to activate the host via the web UI also caused the
>>> following to be logged on the engine:
>>>
>>> 2015-11-09 19:12:39,828 INFO
>>> [org.ovirt.engine.core.bll.ActivateVdsCommand] (ajp--127.0.0.1-8702-7)
>>> [4bf460e8] Lock Acquired to object EngineLock [exclusiveLocks= key:
>>> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df value: VDS
>>> , sharedLocks= ]
>>> 2015-11-09 19:12:39,838 INFO
>>> [org.ovirt.engine.core.bll.ActivateVdsCommand]
>>> (org.ovirt.thread.pool-8-thread-49) [4bf460e8] Running command:
>>> ActivateVdsCommand internal: false. Entities affected :  ID:
>>> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDSAction group
>>> MANIPULATE_HOST with role type ADMIN
>>> 2015-11-09 19:12:39,851 INFO
>>> [org.ovirt.engine.core.bll.ActivateVdsCommand]
>>> (org.ovirt.thread.pool-8-thread-49) [4bf460e8] Before acquiring lock in
>>> order to prevent monitoring for host hosted_engine_2 from data-center
>>> Default
>>> 2015-11-09 19:12:39,856 INFO
>>> [org.ovirt.engine.core.bll.ActivateVdsCommand]
>>> (org.ovirt.thread.pool-8-thread-49) [4bf460e8] Lock acquired, from now a
>>> monitoring of host will be skipped for host hosted_engine_2 from
>>> data-center Default
>>> 2015-11-09 19:12:39,861 INFO
>>> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>>> (org.ovirt.thread.pool-8-thread-49) [4bf460e8] START,
>>> SetVdsStatusVDSCommand(HostName = hosted_engine_2, HostId =
>>> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df, status=Unassigned,
>>> nonOperationalReason=NONE, stopSpmFailureLogged=false), log id: 1d206899
>>> 2015-11-09 19:12:39,870 INFO
>>> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>>> (org.ovirt.thread.pool-8-thread-49) [4bf460e8] FINISH,
>>> SetVdsStatusVDSCommand, log id: 1d206899
>>> 2015-11-09 19:12:39,888 INFO
>>> [org.ovirt.engine.core.bll.ActivateVdsCommand]
>>> (org.ovirt.thread.pool-8-thread-49) Activate finished. Lock released.
>>> Monitoring can run now for host hosted_engine_2 from data-center Default
>>> 2015-11-09 19:12:39,892 INFO
>>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>> (org.ovirt.thread.pool-8-thread-49) Correlation ID: 4bf460e8, Job ID:
>>> 08a2b1ad-1c1c-425c-b657-7739df72b764, Call Stack: null, Custom Event ID:
>>> -1, Message: Host hosted_engine_2 was activated by admin@internal.
>>> 2015-11-09 19:12:39,895 INFO
>>> [org.ovirt.engine.core.bll.ActivateVdsCommand]
>>> (org.ovirt.thread.pool-8-thread-49) Lock freed to object EngineLock
>>> [exclusiveLocks= key: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df value: VDS
>>> , sharedLocks= ]
>>> 2015-11-09 19:12:40,263 INFO
>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand]
>>> (DefaultQuartzScheduler_Worker-12) [79b24fed] START,
>>> GetHardwareInfoVDSCommand(HostName = hosted_engine_2, HostId =
>>> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df,
>>> vds=Host[hosted_engine_2,fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df]), log id:
>>> 7be846bb
>>> 2015-11-09 19:12:40,298 INFO
>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand]
>>> (DefaultQuartzScheduler_Worker-12) [79b24fed] FINISH,
>>> GetHardwareInfoVDSCommand, log id: 7be846bb
>>> 2015-11-09 19:12:40,326 INFO
>>> [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
>>> (DefaultQuartzScheduler_Worker-12) [5569d8a6] Running command:
>>> SetNonOperationalVdsCommand internal: true. Entities affected :  ID:
>>> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS
>>> 2015-11-09 19:12:40,328 INFO
>>> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>>> (DefaultQuartzScheduler_Worker-12) [5569d8a6] START,
>>> SetVdsStatusVDSCommand(HostName = hosted_engine_2, HostId =
>>> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df, status=NonOperational,
>>> nonOperationalReason=NETWORK_UNREACHABLE, stopSpmFailureLogged=false),
>>> log id: 56456697
>>> 2015-11-09 19:12:40,330 INFO
>>> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>>> (DefaultQuartzScheduler_Worker-12) [5569d8a6] FINISH,
>>> SetVdsStatusVDSCommand, log id: 56456697
>>> 2015-11-09 19:12:40,332 ERROR
>>> [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
>>> (DefaultQuartzScheduler_Worker-12) [5569d8a6] Host hosted_engine_2 is
>>> set to Non-Operational, it is missing the following networks: ovirtmgmt
>>> 2015-11-09 19:12:40,335 WARN
>>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>> (DefaultQuartzScheduler_Worker-12) [5569d8a6] Correlation ID: 5569d8a6,
>>> Job ID: 98718a7b-4f64-4a3d-bb72-e6d46100edc5, Call Stack: null, Custom
>>> Event ID: -1, Message: Host hosted_engine_2 does not comply with the
>>> cluster Default networks, the following networks are missing on host:
>>> 'ovirtmgmt'
>> Didn't you see this one anywhere? There was some problem creating the
>> bridge, or something else making the engine think so.
>>
>> If it seems ok to you, perhaps check also vdsm logs.
>>
>>> 2015-11-09 19:12:40,341 INFO
>>> [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand]
>>> (DefaultQuartzScheduler_Worker-12) [a4459ff] Running command:
>>> HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities
>>> affected :  ID: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS
>>> 2015-11-09 19:12:40,383 INFO
>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand]
>>> (DefaultQuartzScheduler_Worker-12) [a4459ff] START,
>>> GetHardwareInfoVDSCommand(HostName = hosted_engine_2, HostId =
>>> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df,
>>> vds=Host[hosted_engine_2,fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df]), log id:
>>> 3cf70a78
>>> 2015-11-09 19:12:40,387 INFO
>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand]
>>> (DefaultQuartzScheduler_Worker-12) [a4459ff] FINISH,
>>> GetHardwareInfoVDSCommand, log id: 3cf70a78
>>> 2015-11-09 19:12:40,407 INFO
>>> [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
>>> (DefaultQuartzScheduler_Worker-12) [3031cc4e] Running command:
>>> SetNonOperationalVdsCommand internal: true. Entities affected :  ID:
>>> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS
>>> 2015-11-09 19:12:40,408 INFO
>>> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>>> (DefaultQuartzScheduler_Worker-12) [3031cc4e] START,
>>> SetVdsStatusVDSCommand(HostName = hosted_engine_2, HostId =
>>> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df, status=NonOperational,
>>> nonOperationalReason=NETWORK_UNREACHABLE, stopSpmFailureLogged=false),
>>> log id: 17fa8ac0
>>> 2015-11-09 19:12:40,411 INFO
>>> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>>> (DefaultQuartzScheduler_Worker-12) [3031cc4e] FINISH,
>>> SetVdsStatusVDSCommand, log id: 17fa8ac0
>>> 2015-11-09 19:12:40,413 ERROR
>>> [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
>>> (DefaultQuartzScheduler_Worker-12) [3031cc4e] Host hosted_engine_2 is
>>> set to Non-Operational, it is missing the following networks: ovirtmgmt
>>> 2015-11-09 19:12:40,418 INFO
>>> [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand]
>>> (DefaultQuartzScheduler_Worker-12) [5e83f7aa] Running command:
>>> HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities
>>> affected :  ID: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS
>>> 2015-11-09 19:12:40,439 INFO
>>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>> (DefaultQuartzScheduler_Worker-12) [5e83f7aa] Correlation ID: null, Call
>>> Stack: null, Custom Event ID: -1, Message: Status of host
>>> hosted_engine_2 was set to NonOperational.
>>> 2015-11-09 19:12:40,443 ERROR
>>> [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
>>> (DefaultQuartzScheduler_Worker-12) [5e83f7aa]
>>> ResourceManager::refreshVdsRunTimeInfo: Error:
>>> DataIntegrityViolationException: CallableStatementCallback; SQL [{call
>>> updatevdsdynamic(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
>>> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
>>> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}]; ERROR: value too long for type
>>> character varying(255)
>> That's ugly too, not sure why it happened and how reproducible, but
>> I think you should first take care of the ovirtmgmt network.
>>
>> Adding Eli anyway.
>>
>>>   Where: SQL statement "UPDATE vds_dynamic
>>>       SET cpu_cores = v_cpu_cores,cpu_threads = v_cpu_threads,
>>>       cpu_model = v_cpu_model,cpu_speed_mh = v_cpu_speed_mh,
>>>       if_total_speed = v_if_total_speed,kvm_enabled = v_kvm_enabled,
>>>       mem_commited = v_mem_commited,physical_mem_mb = v_physical_mem_mb,
>>>       status = v_status,vm_active = v_vm_active,vm_count = v_vm_count,
>>>       vm_migrating = v_vm_migrating,reserved_mem = v_reserved_mem,
>>>       guest_overhead = v_guest_overhead,rpm_version = v_rpm_version,
>>> software_version = v_software_version,
>>>       version_name = v_version_name,build_name =
>>> v_build_name,previous_status = v_previous_status,
>>>       cpu_flags = v_cpu_flags,
>>>       vms_cores_count = v_vms_cores_count,pending_vcpus_count =
>>> v_pending_vcpus_count,
>>>       pending_vmem_size = v_pending_vmem_size,
>>>       cpu_sockets = v_cpu_sockets,net_config_dirty = v_net_config_dirty,
>>>       supported_cluster_levels = v_supported_cluster_levels,
>>>       supported_engines = v_supported_engines,host_os = v_host_os,
>>>       kvm_version = v_kvm_version,libvirt_version =
>>> v_libvirt_version,spice_version = v_spice_version,
>>>       gluster_version = v_gluster_version,
>>>       kernel_version = v_kernel_version,iscsi_initiator_name =
>>> v_iscsi_initiator_name,
>>>       transparent_hugepages_state = v_transparent_hugepages_state,
>>>       hooks = v_hooks,
>>>       _update_date = LOCALTIMESTAMP,non_operational_reason =
>>> v_non_operational_reason,
>>>       hw_manufacturer = v_hw_manufacturer, hw_product_name =
>>> v_hw_product_name,
>>>       hw_version = v_hw_version, hw_serial_number = v_hw_serial_number,
>>>       hw_uuid = v_hw_uuid, hw_family = v_hw_family, hbas = v_hbas,
>>> supported_emulated_machines = v_supported_emulated_machines,
>>>       kdump_status = v_kdump_status, selinux_enforce_mode =
>>> v_selinux_enforce_mode,
>>>       auto_numa_balancing = v_auto_numa_balancing,
>>>       is_numa_supported = v_is_numa_supported,
>>>       supported_rng_sources = v_supported_rng_sources,
>>>       is_live_snapshot_supported = v_is_live_snapshot_supported,
>>>       is_live_merge_supported = v_is_live_merge_supported,
>>>       online_cpus = v_online_cpus
>>>       WHERE vds_id = v_vds_id"
>>>
>>>
>>> The SQL error then seems to repeat several more times.
>>>
>>> -- John Florian
>> Thanks for the report.
>>
>> Best,
>> --
>> Didi
>>
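
Since the engine log above is long, here is a small Python sketch for
pulling out just the "missing networks" complaints. It is only a
convenience for reading excerpts like the one quoted in this thread. The
function name and regular expression are my own, and the assumption that
several network names would appear comma-separated without spaces is
unverified; the log only ever shows a single name (ovirtmgmt).

```python
import re

# Matches the engine message exactly as it is quoted in this thread.
MISSING_RE = re.compile(
    r"Host (?P<host>\S+) is set to Non-Operational, "
    r"it is missing the following networks: (?P<networks>\S+)"
)


def missing_networks(log_text):
    """Return {host: set of missing network names} found in engine log text."""
    # Collapse all whitespace so records wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    result = {}
    for match in MISSING_RE.finditer(flat):
        # Assumption: multiple names, if any, are comma-separated.
        names = {n for n in match.group("networks").split(",") if n}
        result.setdefault(match.group("host"), set()).update(names)
    return result


sample = """2015-11-09 19:12:40,332 ERROR
[org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
(DefaultQuartzScheduler_Worker-12) [5569d8a6] Host hosted_engine_2 is
set to Non-Operational, it is missing the following networks: ovirtmgmt
2015-11-09 19:12:40,335 WARN
"""

print(missing_networks(sample))
```

On the excerpt in this thread it reports only ovirtmgmt for
hosted_engine_2, which matches Didi's advice to sort out that network
first.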


-- 
John Florian



