> Date: Mon, 29 Jul 2013 09:56:30 +0200
> From: mkletzan(a)redhat.com
> To: danken(a)redhat.com
> CC: cybertimber2000(a)hotmail.com; users(a)ovirt.org
> Subject: Re: [Users] oVirt 3.2 - Migration failed due to error: migrateerr
>
> On 07/27/2013 09:50 PM, Dan Kenigsberg wrote:
>> On Fri, Jul 26, 2013 at 02:03:28PM -0400, Nicholas Kesick wrote:
>>>> Date: Fri, 26 Jul 2013 05:52:44 +0300
>>>> From: iheim(a)redhat.com
>>>> To: cybertimber2000(a)hotmail.com
>>>> CC: danken(a)redhat.com; users(a)ovirt.org
>>>> Subject: Re: [Users] oVirt 3.2 - Migration failed due to error: migrateerr
>>>>
>>>> On 07/26/2013 05:40 AM, Nicholas Kesick wrote:
>>>>>
>>>>> Replies inline.
>>>>> > Date: Thu, 25 Jul 2013 22:27:17 +0300
>>>>> > From: danken(a)redhat.com
>>>>> > To: cybertimber2000(a)hotmail.com
>>>>> > CC: users(a)ovirt.org
>>>>> > Subject: Re: [Users] oVirt 3.2 - Migration failed due to error: migrateerr
>>>>> >
>>>>> > On Thu, Jul 25, 2013 at 11:54:40AM -0400, Nicholas Kesick wrote:
>>>>> > > When I try to migrate a VM, any VM, between my two hosts, I receive
>>>>> > > an error that says Migration failed due to error: migrateerr.
>>>>> > > Looking in the log I don't see anything that jumps out other than
>>>>> > > the final message
>>>>> > >
>>>>> > > VDSGenericException: VDSErrorException: Failed to MigrateStatusVDS,
>>>>> > > error = Fatal error during migration
>>>>> > >
>>>>> > > Ovirt-engine is version 3.2.2-1.1.fc18.noarch, firewalld is
>>>>> > > disabled, and selinux is permissive.
>>>>> >
>>>>> > Please do not say this in public, you're hurting Dan Walsh's
>>>>> > feelings ;-)
>>>>> >
>>>>> I recall seeing his blog posts, and I agree. Not sure when I set it to
>>>>> permissive... maybe to get the 3.2 install w/ Firewalld setup to
>>>>> complete? I remember that was fixed in 3.2.1. I'll set it back to
>>>>> enforcing.
>>>>> > >
>>>>> > > ovirt-node version is 2.6.1 on both hosts.
>>>>> > >
>>>>> > > Any suggestions would be welcome!
>>>>> > >
>>>>> >
>>>>> > I'd love to see /etc/vdsm/vdsm.log from source and destination. The
>>>>> > interesting parts start with vmMigrate at the source and with
>>>>> > vmMigrationCreate at the destination.
>>>>> Hmm, I probably should have pulled that sooner. So, I cleared the
>>>>> active VDSM log (while nothing was running) and libvirtd.log, booted
>>>>> one vm, and tried to migrate it. Attached are the logs. It looks like
>>>>> it boils down to (from the source):
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/share/vdsm/vm.py", line 271, in run
>>>>>   File "/usr/share/vdsm/libvirtvm.py", line 505, in _startUnderlyingMigration
>>>>>   File "/usr/share/vdsm/libvirtvm.py", line 541, in f
>>>>>   File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
>>>>>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1178, in migrateToURI2
>>>>> libvirtError: internal error Attempt to migrate guest to the same host
>>>>> localhost
>>>>> Does this mean my UUIDs are the same?
>>>>> http://vaunaspada.babel.it/blog/?p=613
>>>>> As far as the destination, I'm really not understanding what's going
>>>>> on on the destination between "Destination VM creation succeeded" and
>>>>> ":destroy Called" that would lead to it failing, except for what's
>>>>> after the traceback:
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/share/vdsm/vm.py", line 696, in _startUnderlyingVm
>>>>>   File "/usr/share/vdsm/libvirtvm.py", line 1907, in _waitForIncomingMigrationFinish
>>>>>   File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
>>>>>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2822, in lookupByUUIDString
>>>>> libvirtError: Domain not found: no domain with matching uuid
>>>>> '50171e1b-cf21-41d8-80f3-88ab1b980091'
>>>>> But that is the ID of the VM by the looks of it.
>>>>> Sorry Itamar, nothing was written to libvirtd.log after I cleared it.
>>
>> It could be that libvirtd is still writing to the files that you removed
>> from the filesystem. To make sure libvirtd writes to your new file,
>> restart the service. There may be clues there on why libvirt thinks that
>> the source and destination are one and the same.
>>
>
> When clearing the logs, it should be enough to do '> /path/to/libvirtd.log'
> (in bash).
> Just checked and it seems some things were logged in there during my
> testing on Friday. I'll attach those.
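Truncating in place works where deleting does not because the daemon keeps writing through its already-open file descriptor: unlinking only removes the name, while '>' empties the very inode that descriptor points at. A minimal Python sketch of the difference (temporary paths and a plain file handle standing in for libvirtd's open log):

```python
import os
import tempfile

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "daemon.log")

# A file handle standing in for libvirtd's already-open log.
daemon_log = open(log_path, "a")
daemon_log.write("old entry\n")
daemon_log.flush()

# Case 1: remove the file, as 'rm' would. The daemon keeps writing,
# but to an unlinked inode nobody can reach by name any more.
os.unlink(log_path)
daemon_log.write("lost entry\n")
daemon_log.flush()
file_gone = not os.path.exists(log_path)  # True: new entries are invisible

# Case 2: recreate the log, then truncate it in place, as '> file' in
# bash would. The inode stays the same, so new entries land in the
# (now empty) file the daemon already has open.
daemon_log.close()
daemon_log = open(log_path, "a")
daemon_log.write("old entry\n")
daemon_log.flush()
open(log_path, "w").close()  # the Python equivalent of '> daemon.log'
daemon_log.write("new entry\n")
daemon_log.flush()
with open(log_path) as f:
    contents = f.read()  # only "new entry\n" survives
daemon_log.close()
```

Restarting the service, as suggested above, achieves the same end by making the daemon reopen its log file.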
>>>>
>>>> Thread-800::ERROR::2013-07-26 01:57:16,198::vm::198::vm.Vm::(_recover)
>>>> vmId=`50171e1b-cf21-41d8-80f3-88ab1b980091`::internal error Attempt to
>>>> migrate guest to the same host localhost
>>>> Thread-800::ERROR::2013-07-26 01:57:16,377::vm::286::vm.Vm::(run)
>>>> vmId=`50171e1b-cf21-41d8-80f3-88ab1b980091`::Failed to migrate
>>>> Traceback (most recent call last):
>>>> File "/usr/share/vdsm/vm.py", line 271, in run
>>>> File "/usr/share/vdsm/libvirtvm.py", line 505, in _startUnderlyingMigration
>>>> File "/usr/share/vdsm/libvirtvm.py", line 541, in f
>>>> File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
>>>> File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1178, in migrateToURI2
>>>> libvirtError: internal error Attempt to migrate guest to the same host
>>>> localhost
>>>>
>>>> what are your hostnames?
>>>
>>> "host001" on 192.168.0.103 and "host002" on 192.168.0.104
>>> Even tried changing it, no luck.
>>>
>
> Are they resolving properly on those hosts? Is there a DNS or
> /etc/hosts entry related to this?
> There are /etc/hosts entries on both hosts to each other, and a "ping
> host001" / "ping host002" resolves correctly. I do however note that the
> terminal session says root@localhost. I wonder if running hostnamectl
> set-hostname {name} will fix anything.
>
> ...and after running hostnamectl set-hostname {name}, migrations are
> working! I think maybe I found a bug: with the node in maintenance mode,
> from the oVirt Node (SSH or local), if you go to Network, change the
> hostname ({newname}), and then go down to the configured System NIC and
> press enter, it says it is Setting Hostname. Now, if you press F2, the
> terminal will show root@{newname}. If you reboot, however, under Network
> it will say {newname}, but pressing F2 for the terminal will show
> root@localhost. If it's localhost, it won't migrate.
>
> So, it looks like the hostname isn't getting written persistently. Even
> a hostnamectl set-hostname {name} gets lost on reboot. Or am I doing
> something wrong?
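For what it's worth, the /etc/hosts side of this can be sanity-checked mechanically. The helper names below are invented for illustration; the point is that a host's own name must map to a real, non-loopback address for migration to work:

```python
def find_host_address(hosts_text, name):
    # Return the address an /etc/hosts-style text maps `name` to, or None.
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if name in names:
            return address
    return None

def usable_for_migration(hosts_text, name):
    # A hostname only helps migration if it maps to a non-loopback
    # address; a name resolving to 127.x.x.x is as bad as 'localhost'.
    address = find_host_address(hosts_text, name)
    return address is not None and not address.startswith("127.")

# Sample entries matching the setup described in this thread.
sample_hosts = """\
127.0.0.1      localhost
192.168.0.103  host001
192.168.0.104  host002
"""
```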
>>> Could it be because the oVirt Node - Network tab - does not have any
>>> DNS servers specified?
>>
>> I do not think so. We do not see "name resolution" errors, or name
>> resolutions at all.
>>
No name resolution errors, but name resolution is the problem. Looking at
the logs, the hostname the source libvirtd sends to the destination is
"localhost", which means it's unable to properly resolve its own hostname.
You should set up hostname resolution properly. It's good that hostnames
resolve to IP addresses properly on both hosts, but when 'gethostname()'
or 'hostname' (in shell) gives you 'localhost', the daemon can't get its
hostname properly. You can probably also check that by running:

python2 -c 'import socket; print socket.gethostname()'

This should return the proper hostname of the machine.
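Spelled out a bit more (the function name here is mine, not vdsm's), the same check plus a resolution test could look like:

```python
import socket

def check_migration_hostname(hostname=None):
    # Sketch of the check above: gethostname() must return something
    # other than 'localhost', and that name should resolve to a
    # routable (non-loopback) address, or libvirt ends up telling the
    # peer to migrate the guest to 'localhost'.
    if hostname is None:
        hostname = socket.gethostname()
    if hostname in ("localhost", "localhost.localdomain"):
        return (False, "hostname is " + hostname)
    try:
        address = socket.gethostbyname(hostname)
    except socket.gaierror:
        return (False, hostname + " does not resolve")
    if address.startswith("127."):
        return (False, hostname + " resolves to loopback " + address)
    return (True, hostname + " resolves to " + address)
```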
Martin