
On Tue, Nov 10, 2015 at 2:16 AM, John Florian <jflorian@doubledog.org> wrote:
On 11/09/2015 06:25 PM, John Florian wrote:
I don't think it has anything to do with name resolution either. I believe the telltale clue is this bit... 2015-11-09 18:22:31,738 WARN [org.apache.sshd.client.session.ClientSessionImpl] (pool-20-thread-3) Exception caught: java.lang.IllegalStateException: Unable to negotiate key exchange for kex algorithms (client: diffie-hellman-group1-sha1 / server: curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1) As mentioned, I can ssh from my engine to the host just fine. It appears that the Java-based ssh client however cannot.
I got past the above problem by adding the following line to the /etc/ssh/sshd_config of the new F22 host:
KexAlgorithms curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
This represents the defaults for F22 -- at least according to sshd_config(5) -- but with the addition of diffie-hellman-group1-sha1 that the Java-based ssh client seems insistent on using.
Adding Alon for this. Not sure if we can configure the java ssh client and how.
However, all is not rosy. The deploy script ground to a halt with: [ INFO ] Waiting for the host to become operational in the engine. This may take several minutes... The host hosted_engine_2 is in non-operational state. Please try to activate it via the engine webadmin UI. Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]?
So I did as suggested and tried to activate the host from the webadmin UI. That didn't work either. The status message at the bottom of the browser page shows:
Host hosted_engine_2 is installed with VDSM version (<UNKNOWN>) and cannot join cluster Default which is compatible with VDSM versions [4.13, 4.14, 4.9, 4.16, 4.11, 4.15, 4.12, 4.10].
The attempt to activate the host via the web UI also caused the following to be logged on the engine:
2015-11-09 19:12:39,828 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (ajp--127.0.0.1-8702-7) [4bf460e8] Lock Acquired to object EngineLock [exclusiveLocks= key: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df value: VDS , sharedLocks= ] 2015-11-09 19:12:39,838 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (org.ovirt.thread.pool-8-thread-49) [4bf460e8] Running command: ActivateVdsCommand internal: false. Entities affected : ID: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDSAction group MANIPULATE_HOST with role type ADMIN 2015-11-09 19:12:39,851 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (org.ovirt.thread.pool-8-thread-49) [4bf460e8] Before acquiring lock in order to prevent monitoring for host hosted_engine_2 from data-center Default 2015-11-09 19:12:39,856 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (org.ovirt.thread.pool-8-thread-49) [4bf460e8] Lock acquired, from now a monitoring of host will be skipped for host hosted_engine_2 from data-center Default 2015-11-09 19:12:39,861 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-8-thread-49) [4bf460e8] START, SetVdsStatusVDSCommand(HostName = hosted_engine_2, HostId = fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df, status=Unassigned, nonOperationalReason=NONE, stopSpmFailureLogged=false), log id: 1d206899 2015-11-09 19:12:39,870 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-8-thread-49) [4bf460e8] FINISH, SetVdsStatusVDSCommand, log id: 1d206899 2015-11-09 19:12:39,888 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (org.ovirt.thread.pool-8-thread-49) Activate finished. Lock released. Monitoring can run now for host hosted_engine_2 from data-center Default 2015-11-09 19:12:39,892 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-49) Correlation ID: 4bf460e8, Job ID: 08a2b1ad-1c1c-425c-b657-7739df72b764, Call Stack: null, Custom Event ID: -1, Message: Host hosted_engine_2 was activated by admin@internal. 2015-11-09 19:12:39,895 INFO [org.ovirt.engine.core.bll.ActivateVdsCommand] (org.ovirt.thread.pool-8-thread-49) Lock freed to object EngineLock [exclusiveLocks= key: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df value: VDS , sharedLocks= ] 2015-11-09 19:12:40,263 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand] (DefaultQuartzScheduler_Worker-12) [79b24fed] START, GetHardwareInfoVDSCommand(HostName = hosted_engine_2, HostId = fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df, vds=Host[hosted_engine_2,fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df]), log id: 7be846bb 2015-11-09 19:12:40,298 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand] (DefaultQuartzScheduler_Worker-12) [79b24fed] FINISH, GetHardwareInfoVDSCommand, log id: 7be846bb 2015-11-09 19:12:40,326 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (DefaultQuartzScheduler_Worker-12) [5569d8a6] Running command: SetNonOperationalVdsCommand internal: true. Entities affected : ID: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS 2015-11-09 19:12:40,328 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (DefaultQuartzScheduler_Worker-12) [5569d8a6] START, SetVdsStatusVDSCommand(HostName = hosted_engine_2, HostId = fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df, status=NonOperational, nonOperationalReason=NETWORK_UNREACHABLE, stopSpmFailureLogged=false), log id: 56456697 2015-11-09 19:12:40,330 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (DefaultQuartzScheduler_Worker-12) [5569d8a6] FINISH, SetVdsStatusVDSCommand, log id: 56456697 2015-11-09 19:12:40,332 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (DefaultQuartzScheduler_Worker-12) [5569d8a6] Host hosted_engine_2 is set to Non-Operational, it is missing the following networks: ovirtmgmt 2015-11-09 19:12:40,335 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-12) [5569d8a6] Correlation ID: 5569d8a6, Job ID: 98718a7b-4f64-4a3d-bb72-e6d46100edc5, Call Stack: null, Custom Event ID: -1, Message: Host hosted_engine_2 does not comply with the cluster Default networks, the following networks are missing on host: 'ovirtmgmt'
Didn't you see this one anywhere? There was some problem creating the bridge, or something else making the engine think so. If it seems ok to you, perhaps check also vdsm logs.
2015-11-09 19:12:40,341 INFO [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (DefaultQuartzScheduler_Worker-12) [a4459ff] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected : ID: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS 2015-11-09 19:12:40,383 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand] (DefaultQuartzScheduler_Worker-12) [a4459ff] START, GetHardwareInfoVDSCommand(HostName = hosted_engine_2, HostId = fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df, vds=Host[hosted_engine_2,fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df]), log id: 3cf70a78 2015-11-09 19:12:40,387 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand] (DefaultQuartzScheduler_Worker-12) [a4459ff] FINISH, GetHardwareInfoVDSCommand, log id: 3cf70a78 2015-11-09 19:12:40,407 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (DefaultQuartzScheduler_Worker-12) [3031cc4e] Running command: SetNonOperationalVdsCommand internal: true. Entities affected : ID: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS 2015-11-09 19:12:40,408 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (DefaultQuartzScheduler_Worker-12) [3031cc4e] START, SetVdsStatusVDSCommand(HostName = hosted_engine_2, HostId = fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df, status=NonOperational, nonOperationalReason=NETWORK_UNREACHABLE, stopSpmFailureLogged=false), log id: 17fa8ac0 2015-11-09 19:12:40,411 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (DefaultQuartzScheduler_Worker-12) [3031cc4e] FINISH, SetVdsStatusVDSCommand, log id: 17fa8ac0 2015-11-09 19:12:40,413 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (DefaultQuartzScheduler_Worker-12) [3031cc4e] Host hosted_engine_2 is set to Non-Operational, it is missing the following networks: ovirtmgmt 2015-11-09 19:12:40,418 INFO [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand] (DefaultQuartzScheduler_Worker-12) [5e83f7aa] Running command: HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities affected : ID: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS 2015-11-09 19:12:40,439 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-12) [5e83f7aa] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Status of host hosted_engine_2 was set to NonOperational. 2015-11-09 19:12:40,443 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-12) [5e83f7aa] ResourceManager::refreshVdsRunTimeInfo: Error: DataIntegrityViolationException: CallableStatementCallback; SQL [{call updatevdsdynamic(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}]; ERROR: value too long for type character varying(255)
That's ugly too, not sure why it happened and how reproducible, but I think you should first take care of the ovirtmgmt network. Adding Eli anyway.
Where: SQL statement "UPDATE vds_dynamic SET cpu_cores = v_cpu_cores,cpu_threads = v_cpu_threads, cpu_model = v_cpu_model,cpu_speed_mh = v_cpu_speed_mh, if_total_speed = v_if_total_speed,kvm_enabled = v_kvm_enabled, mem_commited = v_mem_commited,physical_mem_mb = v_physical_mem_mb, status = v_status,vm_active = v_vm_active,vm_count = v_vm_count, vm_migrating = v_vm_migrating,reserved_mem = v_reserved_mem, guest_overhead = v_guest_overhead,rpm_version = v_rpm_version, software_version = v_software_version, version_name = v_version_name,build_name = v_build_name,previous_status = v_previous_status, cpu_flags = v_cpu_flags, vms_cores_count = v_vms_cores_count,pending_vcpus_count = v_pending_vcpus_count, pending_vmem_size = v_pending_vmem_size, cpu_sockets = v_cpu_sockets,net_config_dirty = v_net_config_dirty, supported_cluster_levels = v_supported_cluster_levels, supported_engines = v_supported_engines,host_os = v_host_os, kvm_version = v_kvm_version,libvirt_version = v_libvirt_version,spice_version = v_spice_version, gluster_version = v_gluster_version, kernel_version = v_kernel_version,iscsi_initiator_name = v_iscsi_initiator_name, transparent_hugepages_state = v_transparent_hugepages_state, hooks = v_hooks, _update_date = LOCALTIMESTAMP,non_operational_reason = v_non_operational_reason, hw_manufacturer = v_hw_manufacturer, hw_product_name = v_hw_product_name, hw_version = v_hw_version, hw_serial_number = v_hw_serial_number, hw_uuid = v_hw_uuid, hw_family = v_hw_family, hbas = v_hbas, supported_emulated_machines = v_supported_emulated_machines, kdump_status = v_kdump_status, selinux_enforce_mode = v_selinux_enforce_mode, auto_numa_balancing = v_auto_numa_balancing, is_numa_supported = v_is_numa_supported, supported_rng_sources = v_supported_rng_sources, is_live_snapshot_supported = v_is_live_snapshot_supported, is_live_merge_supported = v_is_live_merge_supported, online_cpus = v_online_cpus WHERE vds_id = v_vds_id"
The SQL error then seems to repeat several more times.
-- John Florian
Thanks for the report. Best, -- Didi