Hi Ivan,
Thanks for the in-depth reply.
I've only seen this happen twice, and only after I added a third host
to the HA cluster. I wonder if that's the root problem.
It shouldn't be, as long as the shared storage the VM resides on is
accessible by the third node in the cluster.
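If you want to double-check that, running something like this on the third
node should show whether it can reach the shared storage and what score it
reports (just a sketch, assuming the standard ovirt-hosted-engine-ha tools
are installed there and the default log paths):

    # ha agent's view of the cluster, including each host's score
    hosted-engine --vm-status

    # the hosted-engine storage domain should be mounted under vdsm's mount root
    ls /rhev/data-center/mnt/

    # the agent/broker logs usually say why a host can't see the storage
    # or why its score dropped
    tail -n 50 /var/log/ovirt-hosted-engine-ha/agent.log
    tail -n 50 /var/log/ovirt-hosted-engine-ha/broker.log
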
Have you seen this happen on all your installs, or only just after your
manual migration? It's a little frustrating that this is happening, as I
was hoping to get this into a production environment. It was all working
except for that log message :(
Just after the manual migration; then things went all to ...
My strong recommendation is not to use the self-hosted engine feature for
production purposes until the mentioned bug is resolved. But it would
really help to hear from someone on the dev team on this one.
Thanks,
Andrew
On Fri, Jun 6, 2014 at 3:20 PM, combuster <combuster(a)archlinux.us> wrote:
> Hi Andrew,
>
> this is something that I saw in my logs too, first on one node and then on
> the other three. When that happened on all four of them, the engine was
> corrupted beyond repair.
>
> First of all, I think that message is saying that sanlock can't get a lock
> on the shared storage that you defined for the hosted engine during
> installation. I got this error when I tried to manually migrate the hosted
> engine. There is an unresolved bug there, and I think it's related to this
> one:
>
> [Bug 1093366 - Migration of hosted-engine vm put target host score to zero]
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1093366
>
> This is a blocker bug (or it should be) for the self-hosted engine and,
> from my own experience with it, the feature shouldn't be used in a
> production environment (not until it's fixed).
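>
> If you want to confirm it's the same sanlock lease problem, something
> along these lines should show it (a rough sketch; the log path is the
> default one and may differ on your install):
>
>     # sanlock's own view of the lockspaces and resources it holds
>     sanlock client status
>
>     # the "Failed to acquire lock: error -243" event in the engine should
>     # have a matching entry around the same time in sanlock's log
>     grep -i 'acquire' /var/log/sanlock.log | tail -n 20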
>
> Nothing I did could fix the fact that the score for the target node was
> zero: I tried reinstalling the node, rebooting it, restarting several
> services, tailing tons of logs, etc., but to no avail. When only one node
> was left (the one actually running the hosted engine), I brought the
> engine's VM down gracefully (hosted-engine --vm-shutdown, I believe) and
> after that, when I tried to start the VM, it wouldn't load. VNC showed
> that the filesystem inside the VM was corrupted; even after I ran fsck and
> finally got it to start, it was too badly damaged. I did manage to start
> the engine itself (after repairing the postgresql service, which refused
> to start), but the database was damaged enough that it acted pretty weird
> (it showed that storage domains were down while the VMs were running fine,
> etc.). Luckily, I had already exported all of the VMs at the first sign of
> trouble, so I installed ovirt-engine on a dedicated server and attached
> the export domain.
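>
> For reference, the graceful shutdown itself can be done with something
> like this (double-check the exact flags against the docs for your version,
> I'm writing this from memory):
>
>     # put the cluster into global maintenance so the ha agents don't
>     # try to restart or migrate the engine vm while you work on it
>     hosted-engine --set-maintenance --mode=global
>
>     # shut the engine vm down cleanly and watch its state
>     hosted-engine --vm-shutdown
>     hosted-engine --vm-status
>
>     # once you're done, hand control back to the ha agents
>     hosted-engine --set-maintenance --mode=none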
>
> So while it's a really useful feature, and it works for the most part
> (i.e., automatic migration works), manually migrating the hosted-engine VM
> will lead to trouble.
>
> I hope that my experience with it will be of use to you. It happened to me
> two weeks ago; ovirt-engine was current (3.4.1) and there was no fix
> available.
>
> Regards,
>
> Ivan
>
> On 06/06/2014 05:12 AM, Andrew Lau wrote:
>
> Hi,
>
> I'm seeing this weird message in my engine log
>
> 2014-06-06 03:06:09,380 INFO
> [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
> (DefaultQuartzScheduler_Worker-79) RefreshVmList vm id
> 85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5 status = WaitForLaunch on vds
> ov-hv2-2a-08-23 ignoring it in the refresh until migration is done
> 2014-06-06 03:06:12,494 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
> (DefaultQuartzScheduler_Worker-89) START, DestroyVDSCommand(HostName =
> ov-hv2-2a-08-23, HostId = c04c62be-5d34-4e73-bd26-26f805b2dc60,
> vmId=85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5, force=false,
> secondsToWait=0, gracefully=false), log id: 62a9d4c1
> 2014-06-06 03:06:12,561 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
> (DefaultQuartzScheduler_Worker-89) FINISH, DestroyVDSCommand, log id:
> 62a9d4c1
> 2014-06-06 03:06:12,652 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler_
> Worker-89) Correlation ID: null, Call Stack:
> null, Custom Event ID: -1, Message: VM HostedEngine is down. Exit
> message: internal error Failed to acquire lock: error -243.
>
> It also appears to occur on the other hosts in the cluster, except the
> host which is running the hosted-engine. So right now, with 3 servers, it
> shows up twice in the engine UI.
>
> The engine VM continues to run peacefully, without any issues, on the
> host which doesn't have that error.
>
> Any ideas?
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
>
> http://lists.ovirt.org/mailman/listinfo/users
>
>