[ovirt-users] [SOLVED] Re: VMs are not running/booting on one host

Mon Feb 20 09:40:06 UTC 2017

I ran oVirt 4.1 and hit the same issue.

So the problem was not oVirt related.

I tried the ELREPO kernel and that solved my issue. Kernel
4.9.11-1.el7.elrepo.x86_64 seems to be working without issues.

Conclusion: We triggered some bug in the 3.10.0 kernel, but I was unable to
find out which one. We have 3 identical hypervisors, but the problem occurred
only on the third one.
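
For anyone who wants to check a saved guest dmesg for the same symptoms
(the rcu_sched stall and the unstable TSC clocksource from the guest log
quoted further down in this thread), here is a minimal sketch. The helper
name find_stall_markers and the pattern names are my own illustration and
are not part of oVirt or vdsm; it assumes one log entry per line, not the
mail-wrapped form shown in the quote.

```python
import re

# Patterns for the two symptoms seen in the guest log quoted in this thread.
# The names here are illustrative only, not part of oVirt or vdsm.
PATTERNS = {
    "rcu_stall": re.compile(r"rcu_sched self-detected stall on CPU"),
    "tsc_unstable": re.compile(r"Clocksource tsc unstable \(delta = (-?\d+) ns\)"),
}
# Kernel timestamp prefix, e.g. "[  292.429622]".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_stall_markers(dmesg_text):
    """Return (seconds_since_boot, kind, detail) for each matching line."""
    hits = []
    for line in dmesg_text.splitlines():
        ts = TIMESTAMP.match(line)
        if not ts:
            continue  # skip lines without a kernel timestamp
        for kind, pattern in PATTERNS.items():
            m = pattern.search(line)
            if m:
                # Only the TSC pattern captures a value (the delta in ns).
                detail = m.group(1) if m.groups() else ""
                hits.append((float(ts.group(1)), kind, detail))
    return hits


if __name__ == "__main__":
    sample = (
        "[  292.429622] INFO: rcu_sched self-detected stall on CPU { 0}\n"
        "[  292.430305] NMI backtrace for cpu 0\n"
        "[  292.430579] Clocksource tsc unstable (delta = -289118137838 ns)\n"
    )
    for seconds, kind, detail in find_stall_markers(sample):
        print(seconds, kind, detail)
```

Running it against logs from guests on each hypervisor should show
whether only one host produces these markers.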

	Peter

On 16/02/2017 15:35, Peter Hudec wrote:
> memtest ran without issues.
> I'm thinking about installing the latest oVirt on this host and trying to run the VM.
> It may solve the issue if it's oVirt related and not hw/kvm.
> 
> latest stable version is 4.1.
> 
> 	Peter
> 
> I'm thinking of trying the latest oVirt
> On 15/02/2017 21:46, Peter Hudec wrote:
>> On 15/02/2017 21:20, Nir Soffer wrote:
>>> On Wed, Feb 15, 2017 at 10:05 PM, Peter Hudec <phudec at cnc.sk> wrote:
>>>> Hi,
>>>>
>>>> so the problem is a little bit different. When I wait for a long time, the
>>>> VM boots ;(
>>>
>>> Is this an issue only with old vms imported from the old setup, or
>>> also with new vms?
>> I do not have new VMs, so only with the old ones. But I did not import them
>> from the old setup.
>> I did the host OS upgrade following our docs: creating a new cluster, host
>> upgrade and VM migrations. There was no outage until now.
>>
>> I tried to install a new VM, but the installer hangs on that host.
>>
>>
>>>
>>>>
>>>> But ... /see the log/. I'm investigating the reason.
>>>> The difference between dipovirt0{1,2} and dipovirt03 is the
>>>> installation time. The first 2 were migrated last week, the last one
>>>> yesterday. There are some newer packages, but nothing related to KVM.
>>>>
>>>> [  292.429622] INFO: rcu_sched self-detected stall on CPU { 0}  (t=72280
>>>> jiffies g=393 c=392 q=35)
>>>> [  292.430294] sending NMI to all CPUs:
>>>> [  292.430305] NMI backtrace for cpu 0
>>>> [  292.430309] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64
>>>> #1 Debian 3.16.39-1
>>>> [  292.430311] Hardware name: oVirt oVirt Node, BIOS 0.5.1 01/01/2011
>>>> [  292.430313] task: ffffffff8181a460 ti: ffffffff81800000 task.ti:
>>>> ffffffff81800000
>>>> [  292.430315] RIP: 0010:[<ffffffff81052ae6>]  [<ffffffff81052ae6>]
>>>> native_write_msr_safe+0x6/0x10
>>>> [  292.430323] RSP: 0018:ffff88001fc03e08  EFLAGS: 00000046
>>>> [  292.430325] RAX: 0000000000000400 RBX: 0000000000000000 RCX:
>>>> 0000000000000830
>>>> [  292.430326] RDX: 0000000000000000 RSI: 0000000000000400 RDI:
>>>> 0000000000000830
>>>> [  292.430327] RBP: ffffffff818e2a80 R08: ffffffff818e2a80 R09:
>>>> 00000000000001e8
>>>> [  292.430329] R10: 0000000000000000 R11: ffff88001fc03b96 R12:
>>>> 0000000000000000
>>>> [  292.430330] R13: 000000000000a0ea R14: 0000000000000002 R15:
>>>> 0000000000080000
>>>> [  292.430335] FS:  0000000000000000(0000) GS:ffff88001fc00000(0000)
>>>> knlGS:0000000000000000
>>>> [  292.430337] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [  292.430339] CR2: 0000000001801000 CR3: 000000001c6de000 CR4:
>>>> 00000000000006f0
>>>> [  292.430343] Stack:
>>>> [  292.430344]  ffffffff8104b30d 0000000000000002 0000000000000082
>>>> ffff88001fc0d6a0
>>>> [  292.430347]  ffffffff81853800 0000000000000000 ffffffff818e2fe0
>>>> 0000000000000023
>>>> [  292.430349]  ffffffff81853800 ffffffff81047d63 ffff88001fc0d6a0
>>>> ffffffff810c73fa
>>>> [  292.430352] Call Trace:
>>>> [  292.430354]  <IRQ>
>>>>
>>>> [  292.430360]  [<ffffffff8104b30d>] ? __x2apic_send_IPI_mask+0xad/0xe0
>>>> [  292.430365]  [<ffffffff81047d63>] ?
>>>> arch_trigger_all_cpu_backtrace+0xc3/0x140
>>>> [  292.430369]  [<ffffffff810c73fa>] ? rcu_check_callbacks+0x42a/0x670
>>>> [  292.430373]  [<ffffffff8109bb1e>] ? account_process_tick+0xde/0x180
>>>> [  292.430376]  [<ffffffff810d1e00>] ? tick_sched_handle.isra.16+0x60/0x60
>>>> [  292.430381]  [<ffffffff81075fc0>] ? update_process_times+0x40/0x70
>>>> [  292.430404]  [<ffffffff810d1dc0>] ? tick_sched_handle.isra.16+0x20/0x60
>>>> [  292.430407]  [<ffffffff810d1e3c>] ? tick_sched_timer+0x3c/0x60
>>>> [  292.430410]  [<ffffffff8108c6a7>] ? __run_hrtimer+0x67/0x210
>>>> [  292.430412]  [<ffffffff8108caa9>] ? hrtimer_interrupt+0xe9/0x220
>>>> [  292.430416]  [<ffffffff8151dcab>] ? smp_apic_timer_interrupt+0x3b/0x50
>>>> [  292.430420]  [<ffffffff8151bd3d>] ? apic_timer_interrupt+0x6d/0x80
>>>> [  292.430422]  <EOI>
>>>>
>>>> [  292.430425]  [<ffffffff8109b2e5>] ? sched_clock_local+0x15/0x80
>>>> [  292.430428]  [<ffffffff8101da50>] ? mwait_idle+0xa0/0xa0
>>>> [  292.430431]  [<ffffffff81052c22>] ? native_safe_halt+0x2/0x10
>>>> [  292.430434]  [<ffffffff8101da69>] ? default_idle+0x19/0xd0
>>>> [  292.430437]  [<ffffffff810a9b74>] ? cpu_startup_entry+0x374/0x470
>>>> [  292.430440]  [<ffffffff81903076>] ? start_kernel+0x497/0x4a2
>>>> [  292.430442]  [<ffffffff81902a04>] ? set_init_arg+0x4e/0x4e
>>>> [  292.430445]  [<ffffffff81902120>] ? early_idt_handler_array+0x120/0x120
>>>> [  292.430447]  [<ffffffff8190271f>] ? x86_64_start_kernel+0x14d/0x15c
>>>> [  292.430448] Code: c2 48 89 d0 c3 89 f9 0f 32 31 c9 48 c1 e2 20 89 c0
>>>> 89 0e 48 09 c2 48 89 d0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 89 f0 89 f9
>>>> 0f 30 <31> c0 c3 0f 1f 80 00 00 00 00 89 f9 0f 33 48 c1 e2 20 89 c0 48
>>>> [  292.430579] Clocksource tsc unstable (delta = -289118137838 ns)
>>>>
>>>>
>>>> On 15/02/2017 20:39, Peter Hudec wrote:
>>>>> Hi,
>>>>>
>>>>> I did already, but did not find anything suspicious, see attached logs and the
>>>>> spice screenshot.
>>>>>
>>>>> Actually the VM is booting, but is stuck in some bad state.
>>>>> When migrating, the migration is successful, but the VM is not accessible
>>>>> /even on network/
>>>>>
>>>>> Right now I found one VM, which is working well.
>>>>>
>>>>> In logs look for diplci01 at 2017-02-15 20:23:00,420, the VM ID is
>>>>> 7ddf349b-fb9a-44f4-9e88-73e84625a44e
>>>>>
>>>>>       thanks
>>>>>               Peter
>>>>>
>>>>> On 15/02/2017 19:40, Nir Soffer wrote:
>>>>>> On Wed, Feb 15, 2017 at 8:11 PM, Peter Hudec <phudec at cnc.sk> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm preparing to migrate from 3.5 to 3.6
>>>>>>> The first step is the CentOS6 -> CentOS7 for hosts.
>>>>>>>
>>>>>>> setup:
>>>>>>>   - 3x hosts /dipovirt01, dipovirt02, dipovirt03/
>>>>>>>   - 1x hosted engine /on all 3 hosts/
>>>>>>>
>>>>>>> The upgrade of the first 2 hosts was OK, all VMs are running OK.
>>>>>>> When I upgraded the 3rd host /dipovirt03/, some VMs were not able to run
>>>>>>> or boot on this host. I tried a full reinstall of the host, but
>>>>>>> with the same result.
>>>>>>>
>>>>>>> In case of migration the VM will stop running after a while.
>>>>>>> In case of booting the VM will not boot, I see only 'Loading kernel ...'
>>>>>>>
>>>>>>> Almost all VMs are Debian 8 with guest tools, some CentOS 6/7
>>>>>>>
>>>>>>> The hosts were OK with CentOS6.
>>>>>>>
>>>>>>>
>>>>>>> Where should I start to investigate ?
>>>>>>
>>>>>> Sharing vdsm logs showing the failed attempts to run or migrate
>>>>>> a vm would be a good start.
>>>>>>
>>>>>>>
>>>>>>>         best regards
>>>>>>>                 Peter
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Peter Hudec*
>>>>>>> Infraštruktúrny architekt
>>>>>>> phudec at cnc.sk <mailto:phudec at cnc.sk>
>>>>>>>
>>>>>>> *CNC, a.s.*
>>>>>>> Borská 6, 841 04 Bratislava
>>>>>>> Recepcia: +421 2 35 000 100
>>>>>>>
>>>>>>> Mobil:+421 905 997 203
>>>>>>> *www.cnc.sk* <http:///www.cnc.sk>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list
>>>>>>> Users at ovirt.org
>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>
>>
> 
> 
