[Users] How to rescue storage domain structure

List,

I have lost the ability to manage the hosts or VMs using the oVirt engine web interface. The data center is offline, and I can't actually perform any operations on the hosts or VMs. I don't think there are any actions I can perform in the web interface at all.

What's odd is that I can tell the host to go into maintenance mode using the ovirt-engine web interface and it seems to go into maintenance mode. It even shows the wrench icon next to the host. I can also try to activate it after it supposedly goes into maintenance mode, and it states that the host was activated, but the host never actually comes up or contends for SPM status, and the data center never comes online.
From the logs it seems that at least PKI is broken between the engine and the hosts as I see numerous certificate errors on both the ovirt-engine and clients.
vdsm.log shows:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 582, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
    request.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:504: error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown

and engine.log shows:

2013-04-18 18:42:43,632 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-68) Failed to decryptData must start with zero
2013-04-18 18:42:43,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-68) XML RPC error in command

Alon Bar-Lev was able to offer several good pointers in another thread, titled "Certificates and PKI seem to be broken after yum update", and eventually concluded that the installation seems to be corrupted beyond just the certificates, truststore, and keystore. He suggested that I start a new thread to ask how to rescue the storage domain structure.

The storage used for the data center is iSCSI, which is intact and working. In fact, 2 of the VMs are still online and running on one of the original FC17 host systems.

I'm not able to reinstall any of the existing hosts from the ovirt-engine web interface. I attempted to reinstall one of the hosts (not the SPM), which failed.

I also tried to bring up a new, third host and add it to the cluster. I set up another Fedora 17 box and tried to add it to the cluster, but it states that there are no available servers in the cluster to probe the new host.

This is a test environment that I would like to fix, but I'm also willing to just run engine-cleanup and start over.

That said, there are 3 VMs that I would like to keep. Two are online and running, and I'm able to see them with virsh on that host. I was wondering about using virsh to back up these VMs.

The third VM exists in the database and was set to run on the host that I attempted to reinstall, but that VM isn't running. When I use virsh on its host, virsh can't find it when I perform the list commands, and I can't start it with virsh start <vm-name>.

What is the best way to proceed? It seems like it would be easier to export the VMs using virsh from the hosts they run on if possible, then update oVirt to the latest version, recreate everything, and then import the VMs back into the new environment.

Will this work? Is there a procedure I can follow to do this?
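For anyone hitting the same handshake failure, a quick way to see which certificate vdsm is presenting and whether it still matches the engine CA is a couple of openssl checks. This is only a diagnostic sketch: it assumes vdsm's default port 54321 and the stock certificate paths on the host, and host1.example.com is a placeholder.

# Show the certificate chain vdsm presents on its XML-RPC port (default 54321).
openssl s_client -connect host1.example.com:54321 -showcerts </dev/null

# Compare the host certificate and the CA copy on the host (default paths; adjust if yours differ).
openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -noout -issuer -subject -dates
openssl x509 -in /etc/pki/vdsm/certs/cacert.pem -noout -subject -dates

If the issuer of vdsmcert.pem no longer matches the CA the engine is using, the "certificate unknown" alert above is expected.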
Here's some additional information about the installed ovirt packages on the ovirt-engine:

[root@reliant yum.repos.d]# yum list installed | grep ovirt
ovirt-engine.noarch                       3.1.0-4.fc17            @ovirt-stable
ovirt-engine-backend.noarch               3.1.0-4.fc17            @ovirt-stable
ovirt-engine-cli.noarch                   3.2.0.5-1.fc17          @updates
ovirt-engine-config.noarch                3.1.0-4.fc17            @ovirt-stable
ovirt-engine-dbscripts.noarch             3.1.0-4.fc17            @ovirt-stable
ovirt-engine-genericapi.noarch            3.1.0-4.fc17            @ovirt-stable
ovirt-engine-notification-service.noarch  3.1.0-4.fc17            @ovirt-stable
ovirt-engine-restapi.noarch               3.1.0-4.fc17            @ovirt-stable
ovirt-engine-sdk.noarch                   3.2.0.2-1.fc17          @updates
ovirt-engine-setup.noarch                 3.1.0-4.fc17            @ovirt-stable
ovirt-engine-tools-common.noarch          3.1.0-4.fc17            @ovirt-stable
ovirt-engine-userportal.noarch            3.1.0-4.fc17            @ovirt-stable
ovirt-engine-webadmin-portal.noarch       3.1.0-4.fc17            @ovirt-stable
ovirt-image-uploader.noarch               3.1.0-0.git9c42c8.fc17  @ovirt-stable
ovirt-iso-uploader.noarch                 3.1.0-0.git1841d9.fc17  @ovirt-stable
ovirt-log-collector.noarch                3.1.0-0.git10d719.fc17  @ovirt-stable
ovirt-release-fedora.noarch               4-2                     @/ovirt-release-fedora.noarch
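On the idea of exporting the running VMs with virsh: a minimal sketch of what that could look like, assuming the guests can be shut down briefly; myvm and the disk/target paths below are placeholders, not names from this environment. Note that oVirt defines its guests transiently in libvirt, so virsh dumpxml only returns a definition while the VM is actually running.

# Save the libvirt definition while the VM is still running.
virsh dumpxml myvm > /backup/myvm.xml

# List the disks the VM is using, then shut it down cleanly.
virsh domblklist myvm
virsh shutdown myvm

# Copy each disk to a standalone image (the source path comes from domblklist).
qemu-img convert -O qcow2 /dev/mapper/EXAMPLE-lv /backup/myvm-disk1.qcow2

The saved XML and qcow2 images can later be attached to newly created VMs in a rebuilt setup, or fed to an import tool such as virt-v2v.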

Chris Smith wrote:

List,
I have lost the ability to manage the hosts or VM's using ovirt engine web interface. The data center is offline, and I can't actually perform any operations with the hosts or VM's. I don't think that there are any actions I can perform in the web interface at all.
What's odd is that I can tell the host to go into maintenance mode using the ovirt-engine web interface and it seems to go into maintenance mode. It even shows the wrench icon next to the host. I can also try to activate it after it supposedly goes into maintenance mode, and it states that the host was activated, but the host never actually comes up or contends for SPM status, and the data center never comes online.
From the logs it seems that at least PKI is broken between the engine and the hosts as I see numerous certificate errors on both the ovirt-engine and clients.
vdsm.log shows:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 582, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
    request.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:504: error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown
and engine.log shows:
2013-04-18 18:42:43,632 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-68) Failed to decryptData must start with zero
2013-04-18 18:42:43,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-68) XML RPC error in command
Alon Bar-Lev was able to offer several good pointers in another thread titled "Certificates and PKI seem to be broken after yum update" and eventually concluded that the installation seems to be corrupted more than just the certificates, truststore, and keystore, and suggested that I start a new thread to ask about how to rescue the storage domain structure.
The storage used for the data center is ISCSI, which is intact and working. In fact 2 of the VM's are still online and running on one of the original FC17 hosts systems.
I'm not able to reinstall any of the existing hosts from the ovirt-engine web interface. I attempted to reinstall one of the hosts (not the SPM) which failed.
I also tried to bring up a new, third host and add it to the cluster. I set up another Fedora 17 box and tried to add it to the cluster, but it states that there are no available servers in the cluster to probe the new host.
This is a test environment that I would like to fix, but I'm also willing to just run engine cleanup and start over.
That said, there are 3 VM's that I would like to keep. Two are online and running, and I'm able to see them with virsh on that host. I was wondering about using virsh to backup these vm's.
The third VM exists in the database and was set to run on the host that I attempted to reinstall, but that VM isn't running. When I use virsh on its host, virsh can't find it when I perform the list commands, and I can't start it with virsh start <vm-name>.
What is the best way to proceed? It seems like it would be easier to export the VM's using virsh from the host that they run on if possible, then update ovirt to the latest version, recreate everything and then import the VM's back in to the new environment.
Will this work? Is there a procedure I can follow to do this?
Here's some additional information about the installed ovirt packages on the ovirt-engine
If you want a backup of the currently running hosts you can use fsarchiver. There is a statically linked version, consisting of a single executable, on the fsarchiver website, and you can use options to override the fact that you're backing up a live system. You can't shut down the VMs and then do an export to an export domain, I think, since you don't have a master storage domain; that's why the fsarchiver workaround above. You can of course use your favourite backup programme.

Joop

-- irc: jvandewege
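To make the fsarchiver suggestion concrete, here is a rough sketch of archiving a mounted filesystem from inside a running guest. It assumes the statically linked fsarchiver binary has been copied onto the VM; the device and destination paths are only examples.

# -A allows saving a filesystem that is mounted read/write; -j2 uses two compression threads.
./fsarchiver savefs -A -j2 /backup/root.fsa /dev/vg0/lv_root

# Verify the archive afterwards by listing what it contains.
./fsarchiver archinfo /backup/root.fsa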

Does this work with volume groups? I have several virtual disks presented to the VM which are part of a volume group.

[root@voyager media]# fsarchiver probe
[======DISK======] [=============NAME==============] [====SIZE====] [MAJ] [MIN]
[vda             ] [                               ] [    40.00 GB] [252] [  0]
[vdb             ] [                               ] [   100.00 GB] [252] [ 16]
[vdc             ] [                               ] [    20.00 GB] [252] [ 32]
[vdd             ] [                               ] [    40.00 GB] [252] [ 48]
[=====DEVICE=====] [==FILESYS==] [======LABEL======] [====SIZE====] [MAJ] [MIN]
[vda1            ] [ext4       ] [<unknown>        ] [   500.00 MB] [252] [  1]
[vda2            ] [LVM2_member] [<unknown>        ] [    39.51 GB] [252] [  2]
[vdb1            ] [LVM2_member] [<unknown>        ] [   100.00 GB] [252] [ 17]
[vdc1            ] [LVM2_member] [<unknown>        ] [    20.00 GB] [252] [ 33]
[dm-0            ] [ext4       ] [<unknown>        ] [     4.00 GB] [253] [  0]
[dm-1            ] [swap       ] [<unknown>        ] [     3.94 GB] [253] [  1]
[dm-2            ] [ext4       ] [<unknown>        ] [   119.99 GB] [253] [  2]
[dm-3            ] [ext4       ] [<unknown>        ] [    15.00 GB] [253] [  3]
[dm-4            ] [ext4       ] [<unknown>        ] [     8.00 GB] [253] [  4]
[dm-5            ] [ext4       ] [<unknown>        ] [     8.00 GB] [253] [  5]

I'm thinking that during restore, I can just re-create the volume groups and logical volumes and then restore each file system backup to that logical volume. Or better yet, since I know how much space I'm actually using, just create one logical volume of the right size. I kept adding virtual disks as needed to store repos in /var/satellite.

I also want to verify the syntax I'm using:

fsarchiver savefs -Aa -e "/mnt/media/*" -j 2 /mnt/media/voyager/boot.fsa /dev/vda1

Seemed to work fine for backing up /boot. Are there any other recommended options I should be using for backing up live file systems mounted read/write? I've also stopped all of the spacewalk services and other services on the VM in order to minimize open files being skipped, etc.

Volume group structure:

[root@voyager media]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/vdb1
  VG Name               satellite
  PV Size               100.00 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              25599
  Free PE               0
  Allocated PE          25599
  PV UUID               g3uGGu-p0b3-eSIJ-Bwy7-YOTD-GKnd-prWP7a

  --- Physical volume ---
  PV Name               /dev/vdc1
  VG Name               satellite
  PV Size               20.00 GiB / not usable 3.89 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              5119
  Free PE               0
  Allocated PE          5119
  PV UUID               W35GYr-T6pg-3e0o-s8I7-aqtc-fxcD-Emh62K

  --- Physical volume ---
  PV Name               /dev/vda2
  VG Name               vg_voyager
  PV Size               39.51 GiB / not usable 3.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              10114
  Free PE               146
  Allocated PE          9968
  PV UUID               hJCdct-iR6Q-NPYi-eBZN-dZdP-x4YP-U1zyvE

[root@voyager media]# vgdisplay
  --- Volume group ---
  VG Name               satellite
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               119.99 GiB
  PE Size               4.00 MiB
  Total PE              30718
  Alloc PE / Size       30718 / 119.99 GiB
  Free  PE / Size       0 / 0
  VG UUID               fXvCp3-N0uG-rBRc-FWVJ-Kpv3-AH9L-1PnYUy

  --- Volume group ---
  VG Name               vg_voyager
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  9
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                5
  Open LV               5
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               39.51 GiB
  PE Size               4.00 MiB
  Total PE              10114
  Alloc PE / Size       9968 / 38.94 GiB
  Free  PE / Size       146 / 584.00 MiB
  VG UUID               3txqia-eDtn-j5wn-iixS-gfpv-90b9-ButDqh

[root@voyager media]# lvdisplay
  --- Logical volume ---
  LV Path                /dev/satellite/lv_packages
  LV Name                lv_packages
  VG Name                satellite
  LV UUID                03VUWu-bxGf-hG2b-c3cx-m3lu-7Dlp-iaiWzu
  LV Write Access        read/write
  LV Creation host, time voyager, 2012-11-11 12:53:54 -0500
  LV Status              available
  # open                 1
  LV Size                119.99 GiB
  Current LE             30718
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2

  --- Logical volume ---
  LV Path                /dev/vg_voyager/lv_var
  LV Name                lv_var
  VG Name                vg_voyager
  LV UUID                serQHO-uSog-ci5m-Xx7B-AElf-GTqi-HYCRY6
  LV Write Access        read/write
  LV Creation host, time voyager, 2012-11-11 01:55:01 -0500
  LV Status              available
  # open                 1
  LV Size                15.00 GiB
  Current LE             3840
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

  --- Logical volume ---
  LV Path                /dev/vg_voyager/lv_root
  LV Name                lv_root
  VG Name                vg_voyager
  LV UUID                Kc43IB-5EWZ-N05E-FrN5-NgcQ-kWTv-LxfSig
  LV Write Access        read/write
  LV Creation host, time voyager, 2012-11-11 01:55:05 -0500
  LV Status              available
  # open                 1
  LV Size                4.00 GiB
  Current LE             1024
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

  --- Logical volume ---
  LV Path                /dev/vg_voyager/lv_home
  LV Name                lv_home
  VG Name                vg_voyager
  LV UUID                F7aJrw-FqwN-yML2-7bbX-kcuQ-12pX-1QG8Gp
  LV Write Access        read/write
  LV Creation host, time voyager, 2012-11-11 01:55:09 -0500
  LV Status              available
  # open                 1
  LV Size                8.00 GiB
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:4

  --- Logical volume ---
  LV Path                /dev/vg_voyager/lv_swap
  LV Name                lv_swap
  VG Name                vg_voyager
  LV UUID                S5uYT4-Q3x4-3icm-SEFW-yZVW-DhLl-vSkcLc
  LV Write Access        read/write
  LV Creation host, time voyager, 2012-11-11 01:55:14 -0500
  LV Status              available
  # open                 1
  LV Size                3.94 GiB
  Current LE             1008
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

  --- Logical volume ---
  LV Path                /dev/vg_voyager/lv_tmp
  LV Name                lv_tmp
  VG Name                vg_voyager
  LV UUID                2QeEXe-7zpq-0yLV-NT0u-9ZgY-mk8w-30n2Nz
  LV Write Access        read/write
  LV Creation host, time voyager, 2012-11-11 01:55:14 -0500
  LV Status              available
  # open                 1
  LV Size                8.00 GiB
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:5

On Mon, Apr 22, 2013 at 3:53 PM, Joop <jvdwege@xs4all.nl> wrote:
Chris Smith wrote:
List,
I have lost the ability to manage the hosts or VM's using ovirt engine web interface. The data center is offline, and I can't actually perform any operations with the hosts or VM's. I don't think that there are any actions I can perform in the web interface at all.
What's odd is that I can tell the host to go into maintenance mode using the ovirt-engine web interface and it seems to go into maintenance mode. It even shows the wrench icon next to the host. I can also try to activate it after it supposedly goes into maintenance mode, and it states that the host was activated, but the host never actually comes up or contends for SPM status, and the data center never comes online.
From the logs it seems that at least PKI is broken between the engine and the hosts as I see numerous certificate errors on both the ovirt-engine and clients.
vdsm.log shows:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 582, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
    request.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:504: error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown
and engine.log shows:
2013-04-18 18:42:43,632 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-68) Failed to decryptData must start with zero
2013-04-18 18:42:43,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-68) XML RPC error in command
Alon Bar-Lev was able to offer several good pointers in another thread titled "Certificates and PKI seem to be broken after yum update" and eventually concluded that the installation seems to be corrupted more than just the certificates, truststore, and keystore, and suggested that I start a new thread to ask about how to rescue the storage domain structure.
The storage used for the data center is ISCSI, which is intact and working. In fact 2 of the VM's are still online and running on one of the original FC17 hosts systems.
I'm not able to reinstall any of the existing hosts from the ovirt-engine web interface. I attempted to reinstall one of the hosts (not the SPM) which failed.
I also tried to bring up a new, third host and add it to the cluster. I set up another Fedora 17 box and tried to add it to the cluster, but it states that there are no available servers in the cluster to probe the new host.
This is a test environment that I would like to fix, but I'm also willing to just run engine cleanup and start over.
That said, there are 3 VM's that I would like to keep. Two are online and running, and I'm able to see them with virsh on that host. I was wondering about using virsh to backup these vm's.
The third VM exists in the database and was set to run on the host that I attempted to reinstall, but that VM isn't running. When I use virsh on its host, virsh can't find it when I perform the list commands, and I can't start it with virsh start <vm-name>.
What is the best way to proceed? It seems like it would be easier to export the VM's using virsh from the host that they run on if possible, then update ovirt to the latest version, recreate everything and then import the VM's back in to the new environment.
Will this work? Is there a procedure I can follow to do this?
Here's some additional information about the installed ovirt packages on the ovirt-engine
If you want a backup of the currently running hosts you can use fsarchiver. There is a statically linked version, consisting of a single executable, on the fsarchiver website, and you can use options to override the fact that you're backing up a live system. You can't shut down the VMs and then do an export to an export domain, I think, since you don't have a master storage domain; that's why the fsarchiver workaround above. You can of course use your favourite backup programme.
Joop
-- irc: jvandewege
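For the restore side of the plan above (recreate the volume groups and logical volumes, then put each filesystem back), the fsarchiver counterpart would look roughly like this. It is only a sketch: the device names match the layout shown above, but the archive path and LV sizing are placeholders.

# Recreate the PV/VG/LV layout on the new guest disks.
pvcreate /dev/vdb1 /dev/vdc1
vgcreate satellite /dev/vdb1 /dev/vdc1
lvcreate -n lv_packages -l 100%FREE satellite

# Restore the saved filesystem onto the freshly created LV
# (id=0 selects the first filesystem stored in the archive).
fsarchiver restfs /backup/lv_packages.fsa id=0,dest=/dev/satellite/lv_packages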

On 04/22/2013 08:23 PM, Chris Smith wrote:
List,
I have lost the ability to manage the hosts or VM's using ovirt engine web interface. The data center is offline, and I can't actually perform any operations with the hosts or VM's. I don't think that there are any actions I can perform in the web interface at all.
What's odd is that I can tell the host to go into maintenance mode using the ovirt-engine web interface and it seems to go into maintenance mode. It even shows the wrench icon next to the host. I can also try to activate it after it supposedly goes into maintenance mode, and it states that the host was activated, but the host never actually comes up or contends for SPM status, and the data center never comes online.
From the logs it seems that at least PKI is broken between the engine and the hosts as I see numerous certificate errors on both the ovirt-engine and clients.
vdsm.log shows:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 582, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
    request.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:504: error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown
and engine.log shows:
2013-04-18 18:42:43,632 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-68) Failed to decryptData must start with zero
2013-04-18 18:42:43,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-68) XML RPC error in command
Alon Bar-Lev was able to offer several good pointers in another thread titled "Certificates and PKI seem to be broken after yum update" and eventually concluded that the installation seems to be corrupted more than just the certificates, truststore, and keystore, and suggested that I start a new thread to ask about how to rescue the storage domain structure.
The storage used for the data center is ISCSI, which is intact and working. In fact 2 of the VM's are still online and running on one of the original FC17 hosts systems.
I'm not able to reinstall any of the existing hosts from the ovirt-engine web interface. I attempted to reinstall one of the hosts (not the SPM) which failed.
I also tried to bring up a new, third host and add it to the cluster. I set up another Fedora 17 box and tried to add it to the cluster, but it states that there are no available servers in the cluster to probe the new host.
This is a test environment that I would like to fix, but I'm also willing to just run engine cleanup and start over.
That said, there are 3 VM's that I would like to keep. Two are online and running, and I'm able to see them with virsh on that host. I was wondering about using virsh to backup these vm's.
The third VM exists in the database and was set to run on the host that I attempted to reinstall, but that VM isn't running. When I use virsh on its host, virsh can't find it when I perform the list commands, and I can't start it with virsh start <vm-name>.
What is the best way to proceed? It seems like it would be easier to export the VM's using virsh from the host that they run on if possible, then update ovirt to the latest version, recreate everything and then import the VM's back in to the new environment.
Will this work? Is there a procedure I can follow to do this?
Here's some additional information about the installed ovirt packages on the ovirt-engine
[root@reliant yum.repos.d]# yum list installed | grep ovirt
ovirt-engine.noarch                       3.1.0-4.fc17            @ovirt-stable
ovirt-engine-backend.noarch               3.1.0-4.fc17            @ovirt-stable
ovirt-engine-cli.noarch                   3.2.0.5-1.fc17          @updates
ovirt-engine-config.noarch                3.1.0-4.fc17            @ovirt-stable
ovirt-engine-dbscripts.noarch             3.1.0-4.fc17            @ovirt-stable
ovirt-engine-genericapi.noarch            3.1.0-4.fc17            @ovirt-stable
ovirt-engine-notification-service.noarch  3.1.0-4.fc17            @ovirt-stable
ovirt-engine-restapi.noarch               3.1.0-4.fc17            @ovirt-stable
ovirt-engine-sdk.noarch                   3.2.0.2-1.fc17          @updates
ovirt-engine-setup.noarch                 3.1.0-4.fc17            @ovirt-stable
ovirt-engine-tools-common.noarch          3.1.0-4.fc17            @ovirt-stable
ovirt-engine-userportal.noarch            3.1.0-4.fc17            @ovirt-stable
ovirt-engine-webadmin-portal.noarch       3.1.0-4.fc17            @ovirt-stable
ovirt-image-uploader.noarch               3.1.0-0.git9c42c8.fc17  @ovirt-stable
ovirt-iso-uploader.noarch                 3.1.0-0.git1841d9.fc17  @ovirt-stable
ovirt-log-collector.noarch                3.1.0-0.git10d719.fc17  @ovirt-stable
ovirt-release-fedora.noarch               4-2                     @/ovirt-release-fedora.noarch
What type of storage domain is this (NFS, iSCSI, etc.)?

You can also try backing up the DB, re-installing the engine, restoring the DB, and then trying to re-install the hosts.
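A rough sketch of that DB round trip on the engine machine, assuming the default PostgreSQL database name "engine" and local access as the postgres user (oVirt 3.1 predates the engine-backup tool, so plain pg_dump/psql is one way to do it):

# Dump the engine database; -C includes the CREATE DATABASE statement
# so the dump can be replayed onto a clean PostgreSQL server.
su - postgres -c "pg_dump -C engine" > /root/engine-db-backup.sql

# After reinstalling the engine, drop the freshly created empty database
# and load the dump back in.
su - postgres -c "dropdb engine"
su - postgres -c "psql -d postgres" < /root/engine-db-backup.sql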

Don't forget to back up the cert(s) too...

On 04/27/2013 06:23 AM, Itamar Heim wrote:
On 04/22/2013 08:23 PM, Chris Smith wrote:
List,
I have lost the ability to manage the hosts or VM's using ovirt engine web interface. The data center is offline, and I can't actually perform any operations with the hosts or VM's. I don't think that there are any actions I can perform in the web interface at all.
What's odd is that I can tell the host to go into maintenance mode using the ovirt-engine web interface and it seems to go into maintenance mode. It even shows the wrench icon next to the host. I can also try to activate it after it supposedly goes into maintenance mode, and it states that the host was activated, but the host never actually comes up or contends for SPM status, and the data center never comes online.
From the logs it seems that at least PKI is broken between the engine and the hosts as I see numerous certificate errors on both the ovirt-engine and clients.
vdsm.log shows:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 582, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
    request.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:504: error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown
and engine.log shows:
2013-04-18 18:42:43,632 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-68) Failed to decryptData must start with zero
2013-04-18 18:42:43,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-68) XML RPC error in command
Alon Bar-Lev was able to offer several good pointers in another thread titled "Certificates and PKI seem to be broken after yum update" and eventually concluded that the installation seems to be corrupted more than just the certificates, truststore, and keystore, and suggested that I start a new thread to ask about how to rescue the storage domain structure.
The storage used for the data center is ISCSI, which is intact and working. In fact 2 of the VM's are still online and running on one of the original FC17 hosts systems.
I'm not able to reinstall any of the existing hosts from the ovirt-engine web interface. I attempted to reinstall one of the hosts (not the SPM) which failed.
I also tried to bring up a new, third host and add it to the cluster. I set up another Fedora 17 box and tried to add it to the cluster, but it states that there are no available servers in the cluster to probe the new host.
This is a test environment that I would like to fix, but I'm also willing to just run engine cleanup and start over.
That said, there are 3 VM's that I would like to keep. Two are online and running, and I'm able to see them with virsh on that host. I was wondering about using virsh to backup these vm's.
The third VM exists in the database and was set to run on the host that I attempted to reinstall, but that VM isn't running. When I use virsh on its host, virsh can't find it when I perform the list commands, and I can't start it with virsh start <vm-name>.
What is the best way to proceed? It seems like it would be easier to export the VM's using virsh from the host that they run on if possible, then update ovirt to the latest version, recreate everything and then import the VM's back in to the new environment.
Will this work? Is there a procedure I can follow to do this?
Here's some additional information about the installed ovirt packages on the ovirt-engine
[root@reliant yum.repos.d]# yum list installed | grep ovirt
ovirt-engine.noarch                       3.1.0-4.fc17            @ovirt-stable
ovirt-engine-backend.noarch               3.1.0-4.fc17            @ovirt-stable
ovirt-engine-cli.noarch                   3.2.0.5-1.fc17          @updates
ovirt-engine-config.noarch                3.1.0-4.fc17            @ovirt-stable
ovirt-engine-dbscripts.noarch             3.1.0-4.fc17            @ovirt-stable
ovirt-engine-genericapi.noarch            3.1.0-4.fc17            @ovirt-stable
ovirt-engine-notification-service.noarch  3.1.0-4.fc17            @ovirt-stable
ovirt-engine-restapi.noarch               3.1.0-4.fc17            @ovirt-stable
ovirt-engine-sdk.noarch                   3.2.0.2-1.fc17          @updates
ovirt-engine-setup.noarch                 3.1.0-4.fc17            @ovirt-stable
ovirt-engine-tools-common.noarch          3.1.0-4.fc17            @ovirt-stable
ovirt-engine-userportal.noarch            3.1.0-4.fc17            @ovirt-stable
ovirt-engine-webadmin-portal.noarch       3.1.0-4.fc17            @ovirt-stable
ovirt-image-uploader.noarch               3.1.0-0.git9c42c8.fc17  @ovirt-stable
ovirt-iso-uploader.noarch                 3.1.0-0.git1841d9.fc17  @ovirt-stable
ovirt-log-collector.noarch                3.1.0-0.git10d719.fc17  @ovirt-stable
ovirt-release-fedora.noarch               4-2                     @/ovirt-release-fedora.noarch
What type of storage domain is this (NFS, iSCSI, etc.)?

You can also try backing up the DB, re-installing the engine, restoring the DB, and then trying to re-install the hosts.
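To make the certificate reminder concrete: a minimal sketch of saving the PKI material alongside the database dump. The paths below are the usual defaults for oVirt 3.1 but should be checked on the actual installation.

# On the engine: CA, keystore and issued certificates.
tar czf /root/engine-pki-backup.tar.gz /etc/pki/ovirt-engine

# On each host: the vdsm certificates.
tar czf /root/vdsm-pki-backup.tar.gz /etc/pki/vdsm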
participants (4)
- Alex Leonhardt
- Chris Smith
- Itamar Heim
- Joop