On Tue, Nov 10, 2015 at 2:16 AM, John Florian <jflorian(a)doubledog.org> wrote:
On 11/09/2015 06:25 PM, John Florian wrote:
> I don't think it has anything to do with name resolution either. I
> believe the telltale clue is this bit... 2015-11-09 18:22:31,738 WARN
> [org.apache.sshd.client.session.ClientSessionImpl] (pool-20-thread-3)
> Exception caught: java.lang.IllegalStateException: Unable to negotiate
> key exchange for kex algorithms (client: diffie-hellman-group1-sha1 /
> server:
>
curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1)
> As mentioned, I can ssh from my engine to the host just fine. It
> appears that the Java-based ssh client however cannot.
I got past the above problem by adding the following line to the
/etc/ssh/sshd_config of the new F22 host:
KexAlgorithms
curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
This represents the defaults for F22 -- at least according to
sshd_config(5) -- but with the addition of diffie-hellman-group1-sha1
that the Java-based ssh client seems insistent on using.
Adding Alon for this. Not sure if we can configure the java ssh client
and how.
However, all is not rosy. The deploy script ground to a halt with:
[ INFO ] Waiting for the host to become operational in the engine. This
may take several minutes...
The host hosted_engine_2 is in non-operational state.
Please try to activate it via the engine webadmin UI.
Retry checking host status or ignore this and continue (Retry,
Ignore)[Retry]?
So I did as suggested and tried to activate the host from the webadmin
UI. That didn't work either. The status message at the bottom of the
browser page shows:
Host hosted_engine_2 is installed with VDSM version (<UNKNOWN>) and
cannot join cluster Default which is compatible with VDSM versions
[4.13, 4.14, 4.9, 4.16, 4.11, 4.15, 4.12, 4.10].
The attempt to activate the host via the web UI also caused the
following to be logged on the engine:
2015-11-09 19:12:39,828 INFO
[org.ovirt.engine.core.bll.ActivateVdsCommand] (ajp--127.0.0.1-8702-7)
[4bf460e8] Lock Acquired to object EngineLock [exclusiveLocks= key:
fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df value: VDS
, sharedLocks= ]
2015-11-09 19:12:39,838 INFO
[org.ovirt.engine.core.bll.ActivateVdsCommand]
(org.ovirt.thread.pool-8-thread-49) [4bf460e8] Running command:
ActivateVdsCommand internal: false. Entities affected : ID:
fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDSAction group
MANIPULATE_HOST with role type ADMIN
2015-11-09 19:12:39,851 INFO
[org.ovirt.engine.core.bll.ActivateVdsCommand]
(org.ovirt.thread.pool-8-thread-49) [4bf460e8] Before acquiring lock in
order to prevent monitoring for host hosted_engine_2 from data-center
Default
2015-11-09 19:12:39,856 INFO
[org.ovirt.engine.core.bll.ActivateVdsCommand]
(org.ovirt.thread.pool-8-thread-49) [4bf460e8] Lock acquired, from now a
monitoring of host will be skipped for host hosted_engine_2 from
data-center Default
2015-11-09 19:12:39,861 INFO
[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
(org.ovirt.thread.pool-8-thread-49) [4bf460e8] START,
SetVdsStatusVDSCommand(HostName = hosted_engine_2, HostId =
fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df, status=Unassigned,
nonOperationalReason=NONE, stopSpmFailureLogged=false), log id: 1d206899
2015-11-09 19:12:39,870 INFO
[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
(org.ovirt.thread.pool-8-thread-49) [4bf460e8] FINISH,
SetVdsStatusVDSCommand, log id: 1d206899
2015-11-09 19:12:39,888 INFO
[org.ovirt.engine.core.bll.ActivateVdsCommand]
(org.ovirt.thread.pool-8-thread-49) Activate finished. Lock released.
Monitoring can run now for host hosted_engine_2 from data-center Default
2015-11-09 19:12:39,892 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(org.ovirt.thread.pool-8-thread-49) Correlation ID: 4bf460e8, Job ID:
08a2b1ad-1c1c-425c-b657-7739df72b764, Call Stack: null, Custom Event ID:
-1, Message: Host hosted_engine_2 was activated by admin@internal.
2015-11-09 19:12:39,895 INFO
[org.ovirt.engine.core.bll.ActivateVdsCommand]
(org.ovirt.thread.pool-8-thread-49) Lock freed to object EngineLock
[exclusiveLocks= key: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df value: VDS
, sharedLocks= ]
2015-11-09 19:12:40,263 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand]
(DefaultQuartzScheduler_Worker-12) [79b24fed] START,
GetHardwareInfoVDSCommand(HostName = hosted_engine_2, HostId =
fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df,
vds=Host[hosted_engine_2,fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df]), log id:
7be846bb
2015-11-09 19:12:40,298 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand]
(DefaultQuartzScheduler_Worker-12) [79b24fed] FINISH,
GetHardwareInfoVDSCommand, log id: 7be846bb
2015-11-09 19:12:40,326 INFO
[org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
(DefaultQuartzScheduler_Worker-12) [5569d8a6] Running command:
SetNonOperationalVdsCommand internal: true. Entities affected : ID:
fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS
2015-11-09 19:12:40,328 INFO
[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
(DefaultQuartzScheduler_Worker-12) [5569d8a6] START,
SetVdsStatusVDSCommand(HostName = hosted_engine_2, HostId =
fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df, status=NonOperational,
nonOperationalReason=NETWORK_UNREACHABLE, stopSpmFailureLogged=false),
log id: 56456697
2015-11-09 19:12:40,330 INFO
[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
(DefaultQuartzScheduler_Worker-12) [5569d8a6] FINISH,
SetVdsStatusVDSCommand, log id: 56456697
2015-11-09 19:12:40,332 ERROR
[org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
(DefaultQuartzScheduler_Worker-12) [5569d8a6] Host hosted_engine_2 is
set to Non-Operational, it is missing the following networks: ovirtmgmt
2015-11-09 19:12:40,335 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler_Worker-12) [5569d8a6] Correlation ID: 5569d8a6,
Job ID: 98718a7b-4f64-4a3d-bb72-e6d46100edc5, Call Stack: null, Custom
Event ID: -1, Message: Host hosted_engine_2 does not comply with the
cluster Default networks, the following networks are missing on host:
'ovirtmgmt'
Didn't you see this one anywhere? There was some problem creating the
bridge, or something else making the engine think so.
If it seems ok to you, perhaps check also vdsm logs.
2015-11-09 19:12:40,341 INFO
[org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand]
(DefaultQuartzScheduler_Worker-12) [a4459ff] Running command:
HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities
affected : ID: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS
2015-11-09 19:12:40,383 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand]
(DefaultQuartzScheduler_Worker-12) [a4459ff] START,
GetHardwareInfoVDSCommand(HostName = hosted_engine_2, HostId =
fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df,
vds=Host[hosted_engine_2,fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df]), log id:
3cf70a78
2015-11-09 19:12:40,387 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand]
(DefaultQuartzScheduler_Worker-12) [a4459ff] FINISH,
GetHardwareInfoVDSCommand, log id: 3cf70a78
2015-11-09 19:12:40,407 INFO
[org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
(DefaultQuartzScheduler_Worker-12) [3031cc4e] Running command:
SetNonOperationalVdsCommand internal: true. Entities affected : ID:
fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS
2015-11-09 19:12:40,408 INFO
[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
(DefaultQuartzScheduler_Worker-12) [3031cc4e] START,
SetVdsStatusVDSCommand(HostName = hosted_engine_2, HostId =
fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df, status=NonOperational,
nonOperationalReason=NETWORK_UNREACHABLE, stopSpmFailureLogged=false),
log id: 17fa8ac0
2015-11-09 19:12:40,411 INFO
[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
(DefaultQuartzScheduler_Worker-12) [3031cc4e] FINISH,
SetVdsStatusVDSCommand, log id: 17fa8ac0
2015-11-09 19:12:40,413 ERROR
[org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
(DefaultQuartzScheduler_Worker-12) [3031cc4e] Host hosted_engine_2 is
set to Non-Operational, it is missing the following networks: ovirtmgmt
2015-11-09 19:12:40,418 INFO
[org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand]
(DefaultQuartzScheduler_Worker-12) [5e83f7aa] Running command:
HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities
affected : ID: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS
2015-11-09 19:12:40,439 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler_Worker-12) [5e83f7aa] Correlation ID: null, Call
Stack: null, Custom Event ID: -1, Message: Status of host
hosted_engine_2 was set to NonOperational.
2015-11-09 19:12:40,443 ERROR
[org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
(DefaultQuartzScheduler_Worker-12) [5e83f7aa]
ResourceManager::refreshVdsRunTimeInfo: Error:
DataIntegrityViolationException: CallableStatementCallback; SQL [{call
updatevdsdynamic(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}]; ERROR: value too long for type
character varying(255)
That's ugly too, not sure why it happened and how reproducible, but
I think you should first take care of the ovirtmgmt network.
Adding Eli anyway.
Where: SQL statement "UPDATE vds_dynamic
SET cpu_cores = v_cpu_cores,cpu_threads = v_cpu_threads,
cpu_model = v_cpu_model,cpu_speed_mh = v_cpu_speed_mh,
if_total_speed = v_if_total_speed,kvm_enabled = v_kvm_enabled,
mem_commited = v_mem_commited,physical_mem_mb = v_physical_mem_mb,
status = v_status,vm_active = v_vm_active,vm_count = v_vm_count,
vm_migrating = v_vm_migrating,reserved_mem = v_reserved_mem,
guest_overhead = v_guest_overhead,rpm_version = v_rpm_version,
software_version = v_software_version,
version_name = v_version_name,build_name =
v_build_name,previous_status = v_previous_status,
cpu_flags = v_cpu_flags,
vms_cores_count = v_vms_cores_count,pending_vcpus_count =
v_pending_vcpus_count,
pending_vmem_size = v_pending_vmem_size,
cpu_sockets = v_cpu_sockets,net_config_dirty = v_net_config_dirty,
supported_cluster_levels = v_supported_cluster_levels,
supported_engines = v_supported_engines,host_os = v_host_os,
kvm_version = v_kvm_version,libvirt_version =
v_libvirt_version,spice_version = v_spice_version,
gluster_version = v_gluster_version,
kernel_version = v_kernel_version,iscsi_initiator_name =
v_iscsi_initiator_name,
transparent_hugepages_state = v_transparent_hugepages_state,
hooks = v_hooks,
_update_date = LOCALTIMESTAMP,non_operational_reason =
v_non_operational_reason,
hw_manufacturer = v_hw_manufacturer, hw_product_name =
v_hw_product_name,
hw_version = v_hw_version, hw_serial_number = v_hw_serial_number,
hw_uuid = v_hw_uuid, hw_family = v_hw_family, hbas = v_hbas,
supported_emulated_machines = v_supported_emulated_machines,
kdump_status = v_kdump_status, selinux_enforce_mode =
v_selinux_enforce_mode,
auto_numa_balancing = v_auto_numa_balancing,
is_numa_supported = v_is_numa_supported,
supported_rng_sources = v_supported_rng_sources,
is_live_snapshot_supported = v_is_live_snapshot_supported,
is_live_merge_supported = v_is_live_merge_supported,
online_cpus = v_online_cpus
WHERE vds_id = v_vds_id"
The SQL error then seems to repeat several more times.
-- John Florian
Thanks for the report.
Best,
--
Didi