[Users] oVirt 3.2 - Migration failed due to error: migrateerr
Martin Kletzander
mkletzan at redhat.com
Tue Jul 30 04:01:24 EDT 2013
On 07/29/2013 06:06 PM, Nicholas Kesick wrote:
>
>
>> Date: Mon, 29 Jul 2013 09:56:30 +0200
>> From: mkletzan at redhat.com
>> To: danken at redhat.com
>> CC: cybertimber2000 at hotmail.com; users at ovirt.org
>> Subject: Re: [Users] oVirt 3.2 - Migration failed due to error: migrateerr
>>
>> On 07/27/2013 09:50 PM, Dan Kenigsberg wrote:
>>> On Fri, Jul 26, 2013 at 02:03:28PM -0400, Nicholas Kesick wrote:
>>>>> Date: Fri, 26 Jul 2013 05:52:44 +0300
>>>>> From: iheim at redhat.com
>>>>> To: cybertimber2000 at hotmail.com
>>>>> CC: danken at redhat.com; users at ovirt.org
>>>>> Subject: Re: [Users] oVirt 3.2 - Migration failed due to error: migrateerr
>>>>>
>>>>> On 07/26/2013 05:40 AM, Nicholas Kesick wrote:
>>>>>>
>>>>>> Replies inline.
>>>>>> > Date: Thu, 25 Jul 2013 22:27:17 +0300
>>>>>> > From: danken at redhat.com
>>>>>> > To: cybertimber2000 at hotmail.com
>>>>>> > CC: users at ovirt.org
>>>>>> > Subject: Re: [Users] oVirt 3.2 - Migration failed due to error:
>>>>>> migrateerr
>>>>>> >
>>>>>> > On Thu, Jul 25, 2013 at 11:54:40AM -0400, Nicholas Kesick wrote:
>>>>>> > > When I try to migrate a VM, any VM, between my two hosts, I receive
>>>>>> an error that says Migration failed due to error: migrateerr. Looking in
>>>>>> the log I don't see anything that jumps out other than the final message
>>>>>> > >
>>>>>> > > VDSGenericException: VDSErrorException: Failed to MigrateStatusVDS,
>>>>>> error = Fatal error during migration
>>>>>> > >
>>>>>> > > Ovirt-engine is version 3.2.2-1.1.fc18.noarch, firewalld is
>>>>>> disabled, and selinux is permissive.
>>>>>> >
>>>>>> > Please do not say this in public, you're hurting Dan Walsh's feelings ;-)
>>>>>> >
>>>>>> I recall seeing his blog posts, and I agree. Not sure when I set it to
>>>>>> permissive... maybe to get the 3.2 install w/ Firewalld setup to
>>>>>> complete? I remember that was fixed in 3.2.1. I'll set it back to enforcing.
>>>>>> > >
>>>>>> > > ovirt-node version is 2.6.1 on both hosts.
>>>>>> > >
>>>>>> > > Any suggestions would be welcome!
>>>>>> > >
>>>>>> >
>>>>>> > I'd love to see /var/log/vdsm/vdsm.log from source and destination. The
>>>>>> > interesting parts start with vmMigrate at the source and with
>>>>>> > vmMigrationCreate at the destination.
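(Side note: once you have the logs, a quick way to pull out just those parts --
assuming vdsm.log lives at /var/log/vdsm/vdsm.log on your nodes -- is a python2
one-liner in the same spirit as the check further down:
python2 -c "print ''.join(l for l in open('/var/log/vdsm/vdsm.log') if 'vmMigrate' in l or 'vmMigrationCreate' in l)"
Plain grep works just as well, of course.)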
>>>>>> Hmm, I probably should have pulled that sooner. So, I cleared the active
>>>>>> vdsm.log (while nothing was running) and libvirtd.log, booted one VM, and
>>>>>> tried to migrate it. Attached are the logs. It looks like it boils down
>>>>>> to (from the source):
>>>>>> Traceback (most recent call last):
>>>>>> File "/usr/share/vdsm/vm.py", line 271, in run
>>>>>> File "/usr/share/vdsm/libvirtvm.py", line 505, in
>>>>>> _startUnderlyingMigration
>>>>>> File "/usr/share/vdsm/libvirtvm.py", line 541, in f
>>>>>> File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py",
>>>>>> line 111, in wrapper
>>>>>> File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1178, in
>>>>>> migrateToURI2
>>>>>> libvirtError: internal error Attempt to migrate guest to the same host
>>>>>> localhost
>>>>>> Does this mean my UUIDs are the same?
>>>>>> http://vaunaspada.babel.it/blog/?p=613
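(That blog post is about duplicate host UUIDs. If you want to rule that out,
vdsm should keep the host UUID in /etc/vdsm/vdsm.id -- path assumed here, it
may differ between versions -- so printing it on both hosts would tell you:
python2 -c "print open('/etc/vdsm/vdsm.id').read().strip()"
If host001 and host002 print the same UUID, that theory holds; as it turns out
below, though, the culprit here is the hostname.)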
>>>>>> As for the destination, I'm really not understanding what happens
>>>>>> there between "Destination VM creation succeeded" and ":destroy Called"
>>>>>> that would lead to it failing, except for what's after
>>>>>> the traceback:
>>>>>> Traceback (most recent call last):
>>>>>> File "/usr/share/vdsm/vm.py", line 696, in _startUnderlyingVm
>>>>>> File "/usr/share/vdsm/libvirtvm.py", line 1907, in
>>>>>> _waitForIncomingMigrationFinish
>>>>>> File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py",
>>>>>> line 111, in wrapper
>>>>>> File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2822, in
>>>>>> lookupByUUIDString
>>>>>> libvirtError: Domain not found: no domain with matching uuid
>>>>>> '50171e1b-cf21-41d8-80f3-88ab1b980091'
>>>>>> But that is the ID of the VM by the looks of it.
>>>>>> Sorry Itamar, nothing was written to libvirtd.log after I cleared it.
>>>
>>> It could be that libvirtd is still writing to the files that you removed
>>> from the filesystem. To make sure libvirtd writes to your new file,
>>> restart the service. There may be clues there on why libvirt thinks that
>>> the source and destination are one and the same.
>>>
>>
>> When clearing the logs, it should be enough to do '>
>> /path/to/libvirtd.log' (in bash).
>> Just checked and it seems some things were logged in there during my testing on Friday. I'll attach those.
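(The reason '>' works where removing the file does not: redirection truncates
the file in place, so the inode stays the same and libvirtd's open file
descriptor keeps writing into it; 'rm' leaves the daemon writing to a deleted
inode until it is restarted. A python2 equivalent, assuming the default log
path /var/log/libvirt/libvirtd.log:
python2 -c "open('/var/log/libvirt/libvirtd.log','w').close()"
Run as root, naturally.)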
>>>>>
>>>>> Thread-800::ERROR::2013-07-26 01:57:16,198::vm::198::vm.Vm::(_recover)
>>>>> vmId=`50171e1b-cf21-41d8-80f3-88ab1b980091`::internal error Attempt to
>>>>> migrate guest to the same host localhost
>>>>> Thread-800::ERROR::2013-07-26 01:57:16,377::vm::286::vm.Vm::(run)
>>>>> vmId=`50171e1b-cf21-41d8-80f3-88ab1b980091`::Failed to migrate
>>>>> Traceback (most recent call last):
>>>>> File "/usr/share/vdsm/vm.py", line 271, in run
>>>>> File "/usr/share/vdsm/libvirtvm.py", line 505, in
>>>>> _startUnderlyingMigration
>>>>> File "/usr/share/vdsm/libvirtvm.py", line 541, in f
>>>>> File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py",
>>>>> line 111, in wrapper
>>>>> File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1178, in
>>>>> migrateToURI2
>>>>> libvirtError: internal error Attempt to migrate guest to the same host
>>>>> localhost
>>>>>
>>>>> what are your hostnames?
>>>>
>>>> "host001" on 192.168.0.103 and "host002" on 192.168.0.104
>>>> Even tried changing it, no luck.
>>>>
>>
>> Are they resolving properly on those hosts? Is there a DNS or
>> /etc/hosts entry related to this?
>> There are /etc/hosts entries on both hosts pointing at each other, and "ping
>> host001" / "ping host002" resolve correctly. I do, however, note that the
>> terminal session says root@localhost. I wonder if running hostnamectl
>> set-hostname {name} will fix anything. ...and after running hostnamectl
>> set-hostname {name}, migrations are working!
>>
>> I think I may have found a bug: with the node in maintenance mode, from the
>> oVirt Node console (SSH or local), if you go to Network, change the hostname
>> ({newname}), and then go down to the configured system NIC and press Enter,
>> it says it is Setting Hostname. Now, if you press F2, the terminal will show
>> root@{newname}. If you reboot, however, Network will still say {newname},
>> but pressing F2 for the terminal will show root@localhost. If it's
>> localhost, it won't migrate.
>>
>> So it looks like the hostname isn't getting written persistently. Even a
>> hostnamectl set-hostname {name} gets lost on reboot. Or am I doing something
>> wrong?
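(If the hostname is indeed not persisted, a quick way to verify on the node is
to check what /etc/hostname holds before and after a reboot:
python2 -c "print open('/etc/hostname').read().strip()"
If it reverts, that points at the node's configuration persistence; ovirt-node
normally needs changed config files to be persisted explicitly (its 'persist'
mechanism), so this may well be worth a bug report. Just a guess on my side,
though.)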
>>>> Could it be because the oVirt Node - Network tab - does not have any DNS servers specified?
>>>
>>> I do not think so. We do not see "name resolution" errors, or name
>>> resolutions at all.
>>>
No name resolution errors, but name resolution is the problem.
Looking at the logs, the hostname the source libvirtd sends to the
destination is "localhost", which means it is unable to resolve its
own hostname properly. You should set up hostname resolution correctly.
It's good that the hostnames resolve to IP addresses properly on both
hosts, but when 'gethostname()' or 'hostname' (in a shell) returns
'localhost', the daemon can't determine its hostname. You can check
that by running:
python2 -c 'import socket; print socket.gethostname()'
This should return the proper hostname of the machine.
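As a further check -- this is just a sketch, the important part is really only
gethostname() -- you can also print the FQDN and what the name resolves to, to
make sure it is not a loopback address:
python2 -c "import socket; n = socket.gethostname(); print n, socket.getfqdn(n), socket.gethostbyname(n)"
If the last field is 127.0.0.1, libvirt on the peer may still conclude that
both ends are the same host.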
Martin