On 03/27/2013 10:38 AM, Nicolas Ecarnot wrote:
> On 26/03/2013 12:17, Nicolas Ecarnot wrote:
>> On 25/03/2013 12:10, Nicolas Ecarnot wrote:
>>> On 24/03/2013 09:53, Dafna Ron wrote:
>>>> Is the VM's disk preallocated or thin provisioned?
>>>
>>> This VM has 3 disks :
>>> - first disk to host the windows system : Thin provision
>>> - second disk to store some data : Preallocated
>>> - third disk to store some more data : Thin provision
>>>
>>> I'm realizing that amongst the 15 VMs, only this one and another one
>>> that is stopped are using preallocated disks.
>>> I'm regularly migrating some VMs (and stopping and starting and playing
>>> with them) with no issue, and they all are using thin provisioned
>>> disks!
>>>
>>> Could this be a common factor of the problem?
>>>
>>>>
>>>> also, can you please attach engine, vdsm, libvirt and the vm's qemu
>>>> logs?
>>>
>>> Relevant logs :
>>>
>>> ############
>>>
>>> Ok, I'm in the process of collecting the logs and posting them in a
>>> useable manner.
>>>
>>> More to come.
>>
>> Ok, once again, I ran a test and observed the relevant logs.
>> I tried to isolate the time frames, but the vdsm.log excerpt may still be long.
>>
>> Here they are :
>> * /var/log/libvirt/qemu/serv-chk-adm3.log
>>   http://pastebin.com/JVKMSmxD
>> * /var/log/libvirtd.log
>>   http://pastebin.com/sWGDCqNh
>> * /var/log/vdsm/vdsm.log (the BIG one)
>>   http://pastebin.com/bevTEhym
>>
>> What I can add to help you help me is that:
>> - I saw that all my VMs appear as tainted. I did not know what that meant
>> (but RTFMed since), and this does not appear to disturb the other VMs.
>> - Many VMs, including the problematic one, have been imported from
>> ovirt-v2v with no such issue.
>> - This particular VM was also imported, but the starting point was a
>> single vmdk or ova file.
>> - Two additional data disks were added.
>> - As I said, this is the only running VM stored as preallocated.
>>
>> Regards,
>>
>
> One suggestion: I see no obvious errors in the log files. Could this
> paused state be caused by a kernel panic in the VM?
>
is this still relevant?
It is!
Further investigation by my colleague showed the following facts:
- This VM has 3 disks. Only one of those disks is responsible for the
problem.
- On this disk, my coworker has found only 3 files (database files) that
he cannot touch without triggering the freeze.
- He tried to cat them into /dev/null, and this leads to the freeze.
- He tried to copy them to another disk -> freeze!
We see absolutely no evidence of a kernel panic.
Rather, this seems to be related to a network bottleneck between the
node and the iSCSI SAN, leaving oVirt unable to sustain sufficient
bandwidth and freezing the VM.
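
For anyone who wants to repeat the "cat into /dev/null" test with timings,
here is a minimal sketch in Python. It only reads a file sequentially,
discards the data, and records how long each chunk took, so a stall would
show up at a given offset. The function name timed_read and the 1 MiB chunk
size are my own illustrative choices, not anything from oVirt or VDSM, and
this is not the exact procedure my colleague used.

```python
import time


def timed_read(path, chunk_size=1024 * 1024):
    """Read `path` sequentially and discard the data.

    Returns (total_bytes, timings), where timings is a list of
    (offset, seconds) tuples, one per chunk that was read.
    Hypothetical helper: name and chunk size are illustrative only.
    """
    timings = []
    offset = 0
    with open(path, "rb") as f:
        while True:
            start = time.monotonic()
            chunk = f.read(chunk_size)
            elapsed = time.monotonic() - start
            if not chunk:
                break  # end of file reached
            timings.append((offset, elapsed))
            offset += len(chunk)
    return offset, timings


def slowest_chunk(timings):
    """Return the (offset, seconds) entry with the longest read time."""
    return max(timings, key=lambda entry: entry[1])
```

Calling timed_read() on one of the suspect files and then slowest_chunk()
on the result gives a rough idea of where the read slows down. If the VM
gets paused mid-read, the last offset reached is still a useful hint.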
Since then, we have moved to another solution, but for the sake of open source
debugging, we kept the faulty VM for your eyes only :)
--
Nicolas Ecarnot