On 11/10/2015 03:39 AM, Yedidyah Bar David wrote:
On Tue, Nov 10, 2015 at 2:16 AM, John Florian
<jflorian(a)doubledog.org> wrote:
> On 11/09/2015 06:25 PM, John Florian wrote:
>> I don't think it has anything to do with name resolution either. I
>> believe the telltale clue is this bit... 2015-11-09 18:22:31,738 WARN
>> [org.apache.sshd.client.session.ClientSessionImpl] (pool-20-thread-3)
>> Exception caught: java.lang.IllegalStateException: Unable to negotiate
>> key exchange for kex algorithms (client: diffie-hellman-group1-sha1 /
>> server:
>>
curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1)
>> As mentioned, I can ssh from my engine to the host just fine. It
>> appears that the Java-based ssh client however cannot.
> I got past the above problem by adding the following line to the
> /etc/ssh/sshd_config of the new F22 host:
>
> KexAlgorithms
>
curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
>
> This represents the defaults for F22 -- at least according to
> sshd_config(5) -- but with the addition of diffie-hellman-group1-sha1
> that the Java-based ssh client seems insistent on using.
>
Adding Alon for this. Not sure if we can configure the java ssh client
and how.
> However, all is not rosy. The deploy script ground to a halt with:
> [ INFO ] Waiting for the host to become operational in the engine. This
> may take several minutes...
> The host hosted_engine_2 is in non-operational state.
> Please try to activate it via the engine webadmin UI.
> Retry checking host status or ignore this and continue (Retry,
> Ignore)[Retry]?
>
> So I did as suggested and tried to activate the host from the webadmin
> UI. That didn't work either. The status message at the bottom of the
> browser page shows:
>
> Host hosted_engine_2 is installed with VDSM version (<UNKNOWN>) and
> cannot join cluster Default which is compatible with VDSM versions
> [4.13, 4.14, 4.9, 4.16, 4.11, 4.15, 4.12, 4.10].
>
> The attempt to activate the host via the web UI also caused the
> following to be logged on the engine:
>
> 2015-11-09 19:12:39,828 INFO
> [org.ovirt.engine.core.bll.ActivateVdsCommand] (ajp--127.0.0.1-8702-7)
> [4bf460e8] Lock Acquired to object EngineLock [exclusiveLocks= key:
> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df value: VDS
> , sharedLocks= ]
> 2015-11-09 19:12:39,838 INFO
> [org.ovirt.engine.core.bll.ActivateVdsCommand]
> (org.ovirt.thread.pool-8-thread-49) [4bf460e8] Running command:
> ActivateVdsCommand internal: false. Entities affected : ID:
> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDSAction group
> MANIPULATE_HOST with role type ADMIN
> 2015-11-09 19:12:39,851 INFO
> [org.ovirt.engine.core.bll.ActivateVdsCommand]
> (org.ovirt.thread.pool-8-thread-49) [4bf460e8] Before acquiring lock in
> order to prevent monitoring for host hosted_engine_2 from data-center
> Default
> 2015-11-09 19:12:39,856 INFO
> [org.ovirt.engine.core.bll.ActivateVdsCommand]
> (org.ovirt.thread.pool-8-thread-49) [4bf460e8] Lock acquired, from now a
> monitoring of host will be skipped for host hosted_engine_2 from
> data-center Default
> 2015-11-09 19:12:39,861 INFO
> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
> (org.ovirt.thread.pool-8-thread-49) [4bf460e8] START,
> SetVdsStatusVDSCommand(HostName = hosted_engine_2, HostId =
> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df, status=Unassigned,
> nonOperationalReason=NONE, stopSpmFailureLogged=false), log id: 1d206899
> 2015-11-09 19:12:39,870 INFO
> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
> (org.ovirt.thread.pool-8-thread-49) [4bf460e8] FINISH,
> SetVdsStatusVDSCommand, log id: 1d206899
> 2015-11-09 19:12:39,888 INFO
> [org.ovirt.engine.core.bll.ActivateVdsCommand]
> (org.ovirt.thread.pool-8-thread-49) Activate finished. Lock released.
> Monitoring can run now for host hosted_engine_2 from data-center Default
> 2015-11-09 19:12:39,892 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (org.ovirt.thread.pool-8-thread-49) Correlation ID: 4bf460e8, Job ID:
> 08a2b1ad-1c1c-425c-b657-7739df72b764, Call Stack: null, Custom Event ID:
> -1, Message: Host hosted_engine_2 was activated by admin@internal.
> 2015-11-09 19:12:39,895 INFO
> [org.ovirt.engine.core.bll.ActivateVdsCommand]
> (org.ovirt.thread.pool-8-thread-49) Lock freed to object EngineLock
> [exclusiveLocks= key: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df value: VDS
> , sharedLocks= ]
> 2015-11-09 19:12:40,263 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand]
> (DefaultQuartzScheduler_Worker-12) [79b24fed] START,
> GetHardwareInfoVDSCommand(HostName = hosted_engine_2, HostId =
> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df,
> vds=Host[hosted_engine_2,fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df]), log id:
> 7be846bb
> 2015-11-09 19:12:40,298 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand]
> (DefaultQuartzScheduler_Worker-12) [79b24fed] FINISH,
> GetHardwareInfoVDSCommand, log id: 7be846bb
> 2015-11-09 19:12:40,326 INFO
> [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
> (DefaultQuartzScheduler_Worker-12) [5569d8a6] Running command:
> SetNonOperationalVdsCommand internal: true. Entities affected : ID:
> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS
> 2015-11-09 19:12:40,328 INFO
> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
> (DefaultQuartzScheduler_Worker-12) [5569d8a6] START,
> SetVdsStatusVDSCommand(HostName = hosted_engine_2, HostId =
> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df, status=NonOperational,
> nonOperationalReason=NETWORK_UNREACHABLE, stopSpmFailureLogged=false),
> log id: 56456697
> 2015-11-09 19:12:40,330 INFO
> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
> (DefaultQuartzScheduler_Worker-12) [5569d8a6] FINISH,
> SetVdsStatusVDSCommand, log id: 56456697
> 2015-11-09 19:12:40,332 ERROR
> [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
> (DefaultQuartzScheduler_Worker-12) [5569d8a6] Host hosted_engine_2 is
> set to Non-Operational, it is missing the following networks: ovirtmgmt
> 2015-11-09 19:12:40,335 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler_Worker-12) [5569d8a6] Correlation ID: 5569d8a6,
> Job ID: 98718a7b-4f64-4a3d-bb72-e6d46100edc5, Call Stack: null, Custom
> Event ID: -1, Message: Host hosted_engine_2 does not comply with the
> cluster Default networks, the following networks are missing on host:
> 'ovirtmgmt'
Didn't you see this one anywhere? There was some problem creating the
bridge, or something else making the engine think so.
If it seems ok to you, perhaps check also vdsm logs.
I might have missed that; I
don't recall for certain as there is a lot
of logging going on in various places, it's hard to know where to look
sometimes. Then there's all the errors that seem to happen that aren't
really problems AFAICT.
Anyway, I just ran through an attempted deploy again and I don't see
ovirtmgmt mentioned anywhere in /var/log/vdsm/vdsm.log, nor anything
else in that file looking majorly bad WRT networking. I do see a bunch
of errors related to storage though. Any hints what I should be search
for? Would it be best just to post the file somewhere -- it's a rather
portly 12MB ATM.
And just because I'm feeling stupid right now.... I don't need to do
anything special WRT configuring networking prior to running
hosted-engine --deploy do I? I mean its got working IP, DNS, etc. but
IIRC, the installation handles all the setup of ovirtmgmt, right?
> 2015-11-09 19:12:40,341 INFO
> [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand]
> (DefaultQuartzScheduler_Worker-12) [a4459ff] Running command:
> HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities
> affected : ID: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS
> 2015-11-09 19:12:40,383 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand]
> (DefaultQuartzScheduler_Worker-12) [a4459ff] START,
> GetHardwareInfoVDSCommand(HostName = hosted_engine_2, HostId =
> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df,
> vds=Host[hosted_engine_2,fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df]), log id:
> 3cf70a78
> 2015-11-09 19:12:40,387 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoVDSCommand]
> (DefaultQuartzScheduler_Worker-12) [a4459ff] FINISH,
> GetHardwareInfoVDSCommand, log id: 3cf70a78
> 2015-11-09 19:12:40,407 INFO
> [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
> (DefaultQuartzScheduler_Worker-12) [3031cc4e] Running command:
> SetNonOperationalVdsCommand internal: true. Entities affected : ID:
> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS
> 2015-11-09 19:12:40,408 INFO
> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
> (DefaultQuartzScheduler_Worker-12) [3031cc4e] START,
> SetVdsStatusVDSCommand(HostName = hosted_engine_2, HostId =
> fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df, status=NonOperational,
> nonOperationalReason=NETWORK_UNREACHABLE, stopSpmFailureLogged=false),
> log id: 17fa8ac0
> 2015-11-09 19:12:40,411 INFO
> [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
> (DefaultQuartzScheduler_Worker-12) [3031cc4e] FINISH,
> SetVdsStatusVDSCommand, log id: 17fa8ac0
> 2015-11-09 19:12:40,413 ERROR
> [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand]
> (DefaultQuartzScheduler_Worker-12) [3031cc4e] Host hosted_engine_2 is
> set to Non-Operational, it is missing the following networks: ovirtmgmt
> 2015-11-09 19:12:40,418 INFO
> [org.ovirt.engine.core.bll.HandleVdsCpuFlagsOrClusterChangedCommand]
> (DefaultQuartzScheduler_Worker-12) [5e83f7aa] Running command:
> HandleVdsCpuFlagsOrClusterChangedCommand internal: true. Entities
> affected : ID: fab55ebe-cc0f-4f95-87aa-fc3a5e08a5df Type: VDS
> 2015-11-09 19:12:40,439 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler_Worker-12) [5e83f7aa] Correlation ID: null, Call
> Stack: null, Custom Event ID: -1, Message: Status of host
> hosted_engine_2 was set to NonOperational.
> 2015-11-09 19:12:40,443 ERROR
> [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
> (DefaultQuartzScheduler_Worker-12) [5e83f7aa]
> ResourceManager::refreshVdsRunTimeInfo: Error:
> DataIntegrityViolationException: CallableStatementCallback; SQL [{call
> updatevdsdynamic(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}]; ERROR: value too long for type
> character varying(255)
That's ugly too, not sure why it happened and how reproducible, but
I think you should first take care of the ovirtmgmt network.
Adding Eli anyway.
> Where: SQL statement "UPDATE vds_dynamic
> SET cpu_cores = v_cpu_cores,cpu_threads = v_cpu_threads,
> cpu_model = v_cpu_model,cpu_speed_mh = v_cpu_speed_mh,
> if_total_speed = v_if_total_speed,kvm_enabled = v_kvm_enabled,
> mem_commited = v_mem_commited,physical_mem_mb = v_physical_mem_mb,
> status = v_status,vm_active = v_vm_active,vm_count = v_vm_count,
> vm_migrating = v_vm_migrating,reserved_mem = v_reserved_mem,
> guest_overhead = v_guest_overhead,rpm_version = v_rpm_version,
> software_version = v_software_version,
> version_name = v_version_name,build_name =
> v_build_name,previous_status = v_previous_status,
> cpu_flags = v_cpu_flags,
> vms_cores_count = v_vms_cores_count,pending_vcpus_count =
> v_pending_vcpus_count,
> pending_vmem_size = v_pending_vmem_size,
> cpu_sockets = v_cpu_sockets,net_config_dirty = v_net_config_dirty,
> supported_cluster_levels = v_supported_cluster_levels,
> supported_engines = v_supported_engines,host_os = v_host_os,
> kvm_version = v_kvm_version,libvirt_version =
> v_libvirt_version,spice_version = v_spice_version,
> gluster_version = v_gluster_version,
> kernel_version = v_kernel_version,iscsi_initiator_name =
> v_iscsi_initiator_name,
> transparent_hugepages_state = v_transparent_hugepages_state,
> hooks = v_hooks,
> _update_date = LOCALTIMESTAMP,non_operational_reason =
> v_non_operational_reason,
> hw_manufacturer = v_hw_manufacturer, hw_product_name =
> v_hw_product_name,
> hw_version = v_hw_version, hw_serial_number = v_hw_serial_number,
> hw_uuid = v_hw_uuid, hw_family = v_hw_family, hbas = v_hbas,
> supported_emulated_machines = v_supported_emulated_machines,
> kdump_status = v_kdump_status, selinux_enforce_mode =
> v_selinux_enforce_mode,
> auto_numa_balancing = v_auto_numa_balancing,
> is_numa_supported = v_is_numa_supported,
> supported_rng_sources = v_supported_rng_sources,
> is_live_snapshot_supported = v_is_live_snapshot_supported,
> is_live_merge_supported = v_is_live_merge_supported,
> online_cpus = v_online_cpus
> WHERE vds_id = v_vds_id"
>
>
> The SQL error then seems to repeat several more times.
>
> -- John Florian
Thanks for the report.
Best,
--
John Florian