[ovirt-users] Fwd: Engine crash, storage won't activate, hosts won't shutdown, template locked, gpu passthrough failed

Kasturi Narra knarra at redhat.com
Mon Oct 9 12:21:03 UTC 2017


On Sat, Sep 30, 2017 at 7:50 PM M R <gr8nextmail at gmail.com> wrote:

> Hello!
>
> I have been using oVirt for the last four weeks, testing and trying to
> get things working.
>
> I have collected here the problems I have found. This might be a bit
> long, but help with any or all of them from several people would be
> wonderful.
>
> My version is oVirt Node 4.1.5 and 4.1.6, downloaded from the website as
> the latest stable release at the time. I also tested with CentOS minimal
> + the oVirt repo; in that case issue 3 is solved, but the other problems
> persist.
>
>
> 1. Power off host
> The first day after installing oVirt Node, the host was able to reboot
> and shut down cleanly. No problems at all. After a few days of using
> oVirt, I noticed that hosts are unable to shut down. I have tested this
> in several different ways and come to the following conclusion: if the
> engine has not been started after boot, all hosts are able to shut down
> cleanly. But if the engine has been started even once, none of the hosts
> are able to shut down anymore. The only way to power off is to pull the
> plug or hold the power button for a hard reset. I have failed to find a
> way to have the engine running and then shut down a host. This affects
> all hosts in the cluster.
>
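
On (1): I cannot say from here what is blocking the shutdown, but on
hosted-engine hosts it is often sanlock leases or storage mounts held by
vdsm and the HA services. A few purely diagnostic steps that may help
narrow it down (a sketch, assuming a standard hosted-engine host and the
service names shipped with oVirt 4.1):

    # check which leases are still held before shutting down
    sanlock client status

    # stopping the HA services and vdsm by hand before "shutdown -h now"
    # is only a workaround; putting the host into Maintenance from the
    # engine first achieves the same thing more cleanly
    systemctl stop ovirt-ha-agent ovirt-ha-broker
    systemctl stop vdsmd supervdsmd

If the host then shuts down normally, that points at one of those
services holding things up.
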
> 2. Glusterfs failed
> Every time I have booted the hosts, glusterfs has failed. For some
> reason it ends up in an inactive state even though I have run systemctl
> enable glusterd. Before that command it was just inactive; after it, it
> says "failed (inactive)". There is still a way to get glusterfs working:
> I have to run systemctl start glusterd manually and then everything
> starts working. Why do I have to run manual commands to start glusterfs?
> I have used gluster on CentOS before and never had this problem. Is the
> Node installer really that different from CentOS core?
>

You need to fix this first, since this may cause the issues you report
later about storage domains not activating.

You should ask about this on the gluster mailing list.
Adding Sahina to advise on this.

There is a known problem where glusterd fails to start on nodes because it
tries to come up before the network is up. You can find more info in this
bug: https://bugzilla.redhat.com/show_bug.cgi?id=1472267
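
Until that is fixed, a workaround that is sometimes used is to make
glusterd wait for the network to be fully online before starting. A
sketch (assuming systemd and that network-online.target is actually
reached on your nodes, e.g. via NetworkManager-wait-online):

    mkdir -p /etc/systemd/system/glusterd.service.d
    cat > /etc/systemd/system/glusterd.service.d/10-wait-for-network.conf <<'EOF'
    [Unit]
    Wants=network-online.target
    After=network-online.target
    EOF
    systemctl daemon-reload
    systemctl enable glusterd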


>
> 3. EPEL
> As I said, I have used CentOS before and I would like to be able to
> install some packages from the repo. But even if I install epel-release,
> it won't find packages such as nano or htop. I have read how to add EPEL
> to oVirt Node here: https://www.ovirt.org/release/4.1.1/#epel
> I have even tried editing the repo list manually, but it still fails to
> find normal EPEL packages. I have also set exclude=collectd* as guided
> in the link above; this doesn't make any difference. That said, I am
> able to install packages manually by downloading them on another CentOS
> machine and transferring them with scp to the oVirt node. Still, this
> needs a lot of manual input and is just a workaround for the bug.
>
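
On (3): I can't tell from this why the packages are not found, but it may
be worth confirming that the EPEL repo is really enabled on the node
before assuming the packages are missing, for example:

    yum repolist enabled                      # is "epel" listed at all?
    yum --enablerepo=epel install nano htop   # force-enable it for one install

If the second command works, the repo is probably just disabled in
/etc/yum.repos.d/epel.repo; if it does not, please share the yum output.
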
> 4. Engine startup
> When I try to start the engine while glusterfs is up, it says the VM
> doesn't exist and is starting up, but it won't actually start
> automatically. I have to run hosted-engine --vm-start several times,
> waiting about 5 minutes between attempts. This usually takes about 30
> minutes: completely at random, after one of the attempts, the engine
> shoots up and is running within a minute. This has happened every time I
> boot up, and the number of attempts needed varies; at best it has been
> the 3rd try, at worst the 7th. That works out to somewhere between 15
> and 35 minutes to get the engine up. Nevertheless, it eventually comes
> up every time. If there is a way to get it up on the first try, or
> better yet automatically, that would be great.
>
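
On (4): in a hosted-engine deployment the engine VM is normally started
automatically by the ovirt-ha-agent service once the hosted-engine
storage is available, so this also looks related to the gluster issue
above. A few commands that may help show what the HA agent is waiting for
(assuming a standard hosted-engine setup):

    systemctl status ovirt-ha-agent ovirt-ha-broker
    hosted-engine --vm-status                     # engine VM state and score per host
    hosted-engine --set-maintenance --mode=none   # make sure global maintenance is off

If /var/log/ovirt-hosted-engine-ha/agent.log shows the agent waiting on
the hosted-engine storage domain, fixing the glusterd startup should fix
this as well.
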
> 5. Activate storage
> Once the engine is up, there has been a problem with storage. When I go
> to the storage tab, all domains show red. Even if I wait 15~20 minutes,
> the storage won't turn green by itself. I have to press the Activate
> button on the main data domain; then the main storage comes up in 2~3
> minutes. Sometimes it fails once, but it definitely brings the main data
> domain up on the second try, and then at the same time all the other
> storage domains instantly turn green. The main storage is glusterfs, and
> I have 3 NFS domains as well. This is only a problem at startup; once
> the domains are green they stay green. Still, it is annoying that it
> cannot do this by itself.
>

This may be related to the glusterfs issue (2).

When the nodes get rebooted, the gluster bricks go down. If more than one
brick in a volume goes down, quorum is lost and the associated storage
domain goes to the inactive state.

Once two or more bricks are up and the brick status is synced to the UI,
the storage domains should automatically turn active and no issues should
be seen there.

Does it still happen after you fix the glusterfs issue? If you can
reproduce this, please file a bug and attach logs covering the time from
when you tried to activate the storage until it was actually activated.
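
For reference, a quick way to check from one of the nodes whether quorum
is the problem after a reboot (replace VOLNAME with your data volume
name):

    gluster peer status              # all peers should show "Connected"
    gluster volume status VOLNAME    # shows which bricks are actually online
    gluster volume heal VOLNAME info # entries still pending self-heal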


> 6. Template locked
> I tried to create a template from an existing VM, and it resulted in the
> original VM going into a locked state and the template being locked as
> well. I have read that other people have had a similar problem and were
> advised to restart the engine to see if that solves it. For me it has
> now been a week and several restarts of the engine and hosts, but there
> is still one VM locked and one template locked. This is not a big
> problem, but still a problem: everything is greyed out and I cannot
> delete the stuck VM or template.
>

Can you check whether you have active tasks on your SPM host?

You can list the tasks using:

    vdsm-client Host getAllTasks

Please share the output.

If there are no tasks on the SPM, these may be stale tasks in the engine
database. Adding Daniel to advise on detecting and fixing this.
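
In case it does turn out to be stale locks or stale tasks in the engine
database, the ovirt-engine package ships helper scripts for exactly this
under /usr/share/ovirt-engine/setup/dbutils (unlock_entity.sh and
taskcleaner.sh). The options below are from memory, so please check -h
first and take a database backup before unlocking anything:

    cd /usr/share/ovirt-engine/setup/dbutils
    ./unlock_entity.sh -h          # supported entity types and options
    ./unlock_entity.sh -t all -q   # query: list locked vms/templates/disks
    # taskcleaner.sh in the same directory handles stale async tasks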

To understand why it happened, we need the engine log from the time you
started to create the template until now.

Nir


> 7. Unable to use GPU
> I have been trying to do GPU passthrough to my VM. First there was a
> problem with the qemu command line, but once I figured out a way to pass
> the commands, it may be working(?). The log looks fine, but it still
> doesn't give the functionality I'm looking for. As I mentioned in
> another email, I found this:
> https://www.mail-archive.com/users at ovirt.org/msg40422.html
> It gives the right syntax in the log, but still won't fix error 43 with
> the nvidia drivers. If anybody has this working, or has ideas how to do
> it properly, I would really like to know. I have also tested with AMD
> graphics cards such as Vega, but as soon as the drivers are installed I
> get a black screen, even if I restart the VM or the hosts or both; I
> only see a black screen and cannot use the VM at all. I might be able to
> live with the other six things listed above, but this one is a real
> problem for me. My use of VMs will eventually need graphical
> performance, so I will have to get this working or find an alternative
> to oVirt. I have found several things that I really like in oVirt and
> would prefer to keep using it.
>
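
On (7), the nvidia error 43: this usually means the guest driver detects
that it is running under a hypervisor. People typically work around it by
hiding the KVM signature and setting a hyperv vendor_id in the domain
XML. In oVirt 4.1 that XML is generated by vdsm, so the usual place to
inject it is a before_vm_start hook. A rough sketch, not tested here,
assuming vdsm's hook mechanism (the _hook_domxml environment variable)
and GNU sed; note that as written it would apply to every VM started on
that host:

    #!/bin/bash
    # save as /usr/libexec/vdsm/hooks/before_vm_start/99_hide_kvm, chmod +x
    # hide the KVM signature and fake a hyperv vendor id so the guest
    # nvidia driver does not refuse to start (the usual error 43 trigger)
    sed -i \
        -e "s|</features>|  <kvm><hidden state='on'/></kvm>\n</features>|" \
        -e "s|</hyperv>|  <vendor_id state='on' value='0123456789ab'/>\n</hyperv>|" \
        "$_hook_domxml"

For the AMD/Vega black screen I have no concrete suggestion; it may be
the known Vega reset problem with passthrough in general, which is worth
checking separately.
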

> Best regards
> Mikko
>
>
>

_______________________________________________
Users mailing list
Users at ovirt.org
http://lists.ovirt.org/mailman/listinfo/users