NFS storage and importation

Hello,

Our cluster uses NFS storage (on each node as data, and on a NAS for export). We currently have a huge problem on our cluster, and exporting a VM as OVA takes a lot of time. I checked the NFS storage, and it looks like there are files which might be OVAs in the images directory, and OVF files in master/vms.

Can someone tell me if, when I install a new engine, there is a way to get these VMs back inside the new engine (with the import tools, for example)?

Thank you a lot for your answers,
Regards,
Alexis

PS: it should be said in the documentation to NEVER use a backup of an engine when it is on an NFS storage domain on a node. It looks like it works, but all the data is out of sync with reality.
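For context, the backup and restore mentioned above follow the standard engine-backup tool. A minimal sketch of that flow, assuming a default setup (file names are placeholders):

```shell
# Sketch of the standard engine-backup flow (file names are placeholders).
# Taken periodically on the engine:
engine-backup --mode=backup --file=engine-backup.tar.gz --log=backup.log

# Restoring onto a freshly installed engine; note the restored database
# still references the old hosts and storage, which is where this
# approach can go wrong when the underlying node has been rebuilt:
engine-backup --mode=restore --file=engine-backup.tar.gz --log=restore.log \
    --provision-db --restore-permissions
```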

On Fri, Nov 23, 2018 at 1:12 PM AG <agmeshs@gmail.com> wrote:
Hello,
Our cluster uses NFS storage (on each node as data, and on a NAS for export)
Looks like you invented your own hyperconverged solution. The NFS server on the node will always be accessible on that node, but if it is not accessible from the other hosts, the entire DC may go down. Also, you probably don't have any redundancy, so the failure of a single node causes downtime and data loss. Did you consider a hyperconverged Gluster-based setup? https://ovirt.org/documentation/gluster-hyperconverged/Gluster_Hyperconverge...
We currently have a huge problem on our cluster, and exporting a VM as OVA takes a lot of time.
Is the huge problem the slow export to OVA, or is there another problem?
I checked the NFS storage, and it looks like there are files which might be OVAs in the images directory,
I don't know the OVA export code, but I'm sure it does not save into the images directory. It probably creates a temporary volume for preparing and storing the exported OVA file. Arik, how do you suggest debugging slow export to OVA?
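One quick way to spot an in-progress or leftover export artifact is to look for large, recently modified files under the domain mount. A sketch, where the mount path below is an assumption to be replaced with your own:

```shell
# Hypothetical storage domain mount point -- substitute your own.
DOMAIN=/rhev/data-center/mnt/nas.example.com:_export/c3c17a66-52e8-42dc-9c09-3e667e4c7290

# Files over 1 GiB written in the last 24 hours; an OVA being prepared
# shows up as a big, recently modified volume even without a .ova extension.
find "$DOMAIN/images" -type f -size +1G -mtime -1 -exec ls -lh {} +
```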
and OVF files in master/vms
OVF should be stored in OVF_STORE disks. Maybe you see files created by an old version?
Can someone tell me if, when I install a new engine, there is a way to get these VMs back inside the new engine (with the import tools, for example)
What do you mean by "get these VMs back"? If you mean importing all the VMs and disks on the storage into a new engine, yes, it should work. This is the basis for oVirt DR support.
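For reference, importing an existing data domain is done from the Admin Portal (Storage > Storage Domains > Import Domain) or over the REST API. A sketch of the request body, assuming an NFS data domain; every name, address and path below is a placeholder:

```xml
<!-- POST /ovirt-engine/api/storagedomains
     (name, address, path and host below are placeholders) -->
<storage_domain>
  <name>imported-data</name>
  <type>data</type>
  <import>true</import>
  <storage>
    <type>nfs</type>
    <address>nas.example.com</address>
    <path>/export/data</path>
  </storage>
  <host>
    <name>node1</name>
  </host>
</storage_domain>
```

After the domain is attached, the VMs and disks it contains appear under the domain's "VM Import" and "Disk Import" tabs for registration.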
PS: it should be said in the documentation to NEVER use a backup of an engine when it is on an NFS storage domain on a node. It looks like it works, but all the data is out of sync with reality.
Do you mean a hosted engine stored on an NFS storage domain served by one of the nodes? Can you give more details on this problem? Please also specify the oVirt version you use.

Nir

(sorry for the double post earlier) On 25/11/2018 at 18:19, Nir Soffer wrote:
Hello,
Our cluster uses NFS storage (on each node as data, and on a NAS for export)
Looks like you invented your own hyperconverged solution.
Well, when we looked at the network specs needed for Gluster to work, we initially thought this was the more reasonable solution. That may be true for the VMs, but it is deeply wrong for the engine in any case. Regarding the engine, we thought (once again) that the backup would allow us to rebuild in case of a problem. Wrong again: the backup won't work with a new installation, and even if it looks like it works for the things that haven't changed, it doesn't.
The NFS server on the node will always be accessible on the node, but if it is not accessible from other hosts, the entire DC may go down.
Also, you probably don't have any redundancy, so the failure of a single node causes downtime and data loss.
We have RAID and regular backups, but that's not good enough, of course.
Did you consider hyperconverge Gluster based setup? https://ovirt.org/documentation/gluster-hyperconverged/Gluster_Hyperconverge...
Yes, we will try to rebuild our cluster with that, at least for the engine.
We currently have a huge problem on our cluster, and exporting a VM as OVA takes a lot of time.
Is the huge problem the slow export to OVA, or is there another problem?
At this time, we have almost all data covered by backups or OVAs, but nearly all VMs are down and can't be restarted (I explain why at the end of this mail). Some OVA exports failed, probably (but not only) because of a lack of free space on the storage. So we have to rebuild a clean cluster without destroying all the data we still have on our disks. (The "import domain" function should do the trick, I hope.)
I checked the NFS storage, and it looks like there are files which might be OVAs in the images directory,
I don't know the OVA export code, but I'm sure it does not save into the images directory. It probably creates a temporary volume for preparing and storing the exported OVA file.
Arik, how do you suggest to debug slow export to OVA?
That might not be a bug; some of the VMs are hundreds of GB, so it can be "normal". Anyway, the files don't have the .ova extension (but their sizes match the VMs).
and OVF files in master/vms
OVF should be stored in OVF_STORE disks. Maybe you see files created by an old version?
Well, these were files with the .ovf extension on the NFS share, but I might be wrong; the path looks like this: c3c17a66-52e8-42dc-9c09-3e667e4c7290/master/vms/0265bf6b-7bbb-44b1-87f6-7cb5552d12c2/0265bf6b-7bbb-44b1-87f6-7cb5552d12c2.ovf but that may only be on export domains.
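That layout (one plain .ovf per VM under master/vms/<vm-id>/) is indeed what export domains use; current data domains keep the OVFs inside OVF_STORE disk images instead. A quick check, assuming the domain is mounted (the mount point below is a placeholder):

```shell
# Hypothetical mount point of the domain in question -- substitute your own.
DOMAIN=/rhev/data-center/mnt/nas.example.com:_export/c3c17a66-52e8-42dc-9c09-3e667e4c7290

# Export domains (and very old data domains) keep plain OVF files here;
# a non-empty result suggests this is where the VM definitions live.
find "$DOMAIN/master/vms" -name '*.ovf'
```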
A little after my mail, and as you mention below, I heard about the "import domain" function for storage domains, which makes me hope my mail was meaningless. I'll try it in a few hours with a real VM inside.
Can someone tell me if, when i install a new engine, there's a way to get this VMs back inside the new engine (like the import tools by example)
What do you mean by "get these VMs back"?
If you mean importing all the VMs and disks on the storage into a new engine, yes, it should work. This is the basis for oVirt DR support.
Yes, thank you.
PS: it should be said in the documentation to NEVER use a backup of an engine when it is on an NFS storage domain on a node. It looks like it works, but all the data is out of sync with reality.
Do you mean a hosted engine stored on an NFS storage domain served by one of the nodes?
Yes
Can you give more details on this problem?
oVirt version 4.2.7.

I'll try to make it short, but it's weeks' worth of stress and wrong decisions. We built our cluster with a few nodes, but our whole storage is on the nodes (the reason we chose NFS), and we put our engine on one of these nodes in an NFS share. We had regular backups.

One day I saw that the status of this node was deteriorated (in nodectl check), and it recommended running lvs to investigate. [A small precision, if needed: the node has three disks merged in a physical RAID 5. The node installation used the standard oVirt partitioning except for one thing: we reduced the / partition (without size problems, it's more than 100 GB) to make a separate XFS partition to store the VM data; this partition holds the shares for the engine, data (VMs) and ISO (export is on a NAS).]

When I checked with lvs, the data partition was 99.97% used (!) while df said 55% (spoiler alert: df was right, but who cares). A few days later it wasn't at 99.97% but 99.99% (after a log collector run, love the irony), and the whole node crashed, with the engine on it, of course.

I restarted the cluster on another node without too much trouble. Then I looked at how to repair the node where the engine had been stored. It seemed there was no real solution to clean up the LVs (if that is what we should have done), so I decided to rebuild the whole node and to reinject the backup into it (it seems it's impossible to move the engine once it's stored). Well, I did that, and it has been hell since. It looks like the old engine, but there's no control over the VMs, and a few things point to non-existent parts (like the engine storage and the engine VM). Mostly I get tons of "command Get Host Capabilities failed: General SSLEngine problem" errors on the rebuilt node, but from what I can see, the management of the VMs is completely gone on all nodes. The worst thing is that I only realised this after some of the VMs crashed, without being able to restart them.
(I tried virsh, but it might take more time than rebuilding things cleanly.) If I had rebuilt a cluster without the backup, I think I might have been able to recover quickly, but I might be wrong again.

One last thing: it seems the "lvs vs df" discrepancy is now happening on other nodes too. So, since we can't fully glusterify all the VMs (some of them are more than 0.5 TB, and the network won't be able to follow), we are thinking of keeping them on NFS, but on a partition outside of LVM, created after the oVirt node installation (since we don't know exactly what the problem is or how to solve it).

If this problem is interesting for some of you, I still have live examples (but maybe not for long; we have to rebuild this thing quickly). And I will of course welcome any advice or tests you might have.

Finally, I'm really sorry, of course, both for the question and for not doing things exactly by the book (we have to work with the hardware we have), and more than anything for missing some pretty obvious functions (like the import domain, of course). If I have time later, I'll try to write an article about noob mistakes not to make on oVirt :) Anyway, thank you for your answer.
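On the "lvs vs df" discrepancy: if the data LV is thin-provisioned (the oVirt Node default), lvs's data_percent counts every block ever written to the pool, including blocks the filesystem has since freed, while df only counts live data; discards reconcile the two. A sketch of how to compare them, where the mount point is an assumption to be replaced with your own:

```shell
MNT=/var/ovirt-data   # hypothetical XFS data partition -- substitute your own

# What the filesystem believes is in use:
df -h "$MNT"

# What the thin pool believes is allocated; data_percent can sit near
# 100% while df reports 55%, because freed blocks were never discarded:
lvs -o lv_name,attr,data_percent,metadata_percent

# Return freed blocks to the pool (or mount with -o discard to do it
# continuously):
fstrim -v "$MNT"
```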
Please also specify the oVirt version you use.
Nir
Regards,
Alexis Grillon
Pôle Humanités Numériques, Outils, Méthodes et Analyse de Données
Maison européenne des sciences de l'homme et de la société
MESHS - Lille Nord de France / CNRS
tel. +33 (0)3 20 12 58 57 | alexis.grillon@meshs.fr
www.meshs.fr | 2, rue des Canonniers 59000 Lille
GPG fingerprint AC37 4C4B 6308 975B 77D4 772F 214F 1E97 6C08 CD11
participants (3)
- AG
- Alexis Grillon
- Nir Soffer