[Users] How to rescue storage domain structure

List,

I have lost the ability to manage the hosts or VMs using the oVirt engine web interface. The data center is offline, and I can't actually perform any operations on the hosts or VMs. I don't think there are any actions I can perform in the web interface at all.

What's odd is that I can tell the host to go into maintenance mode using the ovirt-engine web interface and it seems to go into maintenance mode. It even shows the wrench icon next to the host. I can also try to activate it after it supposedly goes into maintenance mode, and it states that the host was activated, but the host never actually comes up or contends for SPM status, and the data center never comes online.
From the logs it seems that at least PKI is broken between the engine and the hosts as I see numerous certificate errors on both the ovirt-engine and clients.
vdsm.log shows:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 582, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
    request.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:504: error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown

and engine.log shows:

2013-04-18 18:42:43,632 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-68) Failed to decryptData must start with zero
2013-04-18 18:42:43,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-68) XML RPC error in command

Alon Bar-Lev was able to offer several good pointers in another thread, titled "Certificates and PKI seem to be broken after yum update", and eventually concluded that the installation seems to be corrupted beyond just the certificates, truststore, and keystore. He suggested that I start a new thread to ask how to rescue the storage domain structure.

The storage used for the data center is iSCSI, which is intact and working. In fact, 2 of the VMs are still online and running on one of the original FC17 host systems.

I'm not able to reinstall any of the existing hosts from the ovirt-engine web interface. I attempted to reinstall one of the hosts (not the SPM), which failed.

I also tried to bring up a new, third host and add it to the cluster. I set up another Fedora 17 box and tried to add it to the cluster, but it states that there are no available servers in the cluster to probe the new host.

This is a test environment that I would like to fix, but I'm also willing to just run engine-cleanup and start over.

That said, there are 3 VMs that I would like to keep. Two are online and running, and I'm able to see them with virsh on that host. I was wondering about using virsh to back up these VMs.

The third VM exists in the database and was set to run on the host that I attempted to reinstall, but that VM isn't running. When I use virsh on its host, virsh can't find it when I perform the list commands, and I can't start it with virsh start <vm-name>.

What is the best way to proceed? It seems like it would be easier to export the VMs using virsh from the hosts they run on if possible, then update oVirt to the latest version, recreate everything, and then import the VMs back into the new environment.

Will this work? Is there a procedure I can follow to do this?
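For anyone hitting the same handshake failure, a quick way to see which certificate vdsm is presenting and whether it still matches the engine CA is a couple of openssl checks. This is only a diagnostic sketch: it assumes vdsm's default port 54321 and the stock certificate paths on the host, and host1.example.com is a placeholder.

# Show the certificate chain vdsm presents on its XML-RPC port (default 54321).
openssl s_client -connect host1.example.com:54321 -showcerts </dev/null

# Compare the host certificate and the CA copy on the host (default paths; adjust if yours differ).
openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -noout -issuer -subject -dates
openssl x509 -in /etc/pki/vdsm/certs/cacert.pem -noout -subject -dates

If the issuer of vdsmcert.pem no longer matches the CA the engine is using, the "certificate unknown" alert above is expected.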
Here's some additional information about the installed ovirt packages on the ovirt-engine:

[root@reliant yum.repos.d]# yum list installed | grep ovirt
ovirt-engine.noarch                       3.1.0-4.fc17            @ovirt-stable
ovirt-engine-backend.noarch               3.1.0-4.fc17            @ovirt-stable
ovirt-engine-cli.noarch                   3.2.0.5-1.fc17          @updates
ovirt-engine-config.noarch                3.1.0-4.fc17            @ovirt-stable
ovirt-engine-dbscripts.noarch             3.1.0-4.fc17            @ovirt-stable
ovirt-engine-genericapi.noarch            3.1.0-4.fc17            @ovirt-stable
ovirt-engine-notification-service.noarch  3.1.0-4.fc17            @ovirt-stable
ovirt-engine-restapi.noarch               3.1.0-4.fc17            @ovirt-stable
ovirt-engine-sdk.noarch                   3.2.0.2-1.fc17          @updates
ovirt-engine-setup.noarch                 3.1.0-4.fc17            @ovirt-stable
ovirt-engine-tools-common.noarch          3.1.0-4.fc17            @ovirt-stable
ovirt-engine-userportal.noarch            3.1.0-4.fc17            @ovirt-stable
ovirt-engine-webadmin-portal.noarch       3.1.0-4.fc17            @ovirt-stable
ovirt-image-uploader.noarch               3.1.0-0.git9c42c8.fc17  @ovirt-stable
ovirt-iso-uploader.noarch                 3.1.0-0.git1841d9.fc17  @ovirt-stable
ovirt-log-collector.noarch                3.1.0-0.git10d719.fc17  @ovirt-stable
ovirt-release-fedora.noarch               4-2                     @/ovirt-release-fedora.noarch
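On the idea of exporting the running VMs with virsh: a minimal sketch of what that could look like, assuming the guests can be shut down briefly; myvm and the disk/target paths below are placeholders, not names from this environment. Note that oVirt defines its guests transiently in libvirt, so virsh dumpxml only returns a definition while the VM is actually running.

# Save the libvirt definition while the VM is still running.
virsh dumpxml myvm > /backup/myvm.xml

# List the disks the VM is using, then shut it down cleanly.
virsh domblklist myvm
virsh shutdown myvm

# Copy each disk to a standalone image (the source path comes from domblklist).
qemu-img convert -O qcow2 /dev/mapper/EXAMPLE-lv /backup/myvm-disk1.qcow2

The saved XML and qcow2 images can later be attached to newly created VMs in a rebuilt setup, or fed to an import tool such as virt-v2v.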

Chris Smith wrote:

List,
I have lost the ability to manage the hosts or VM's using ovirt engine web interface. The data center is offline, and I can't actually perform any operations with the hosts or VM's. I don't think that there are any actions I can perform in the web interface at all.
What's odd is that I can tell the host to go into maintenance mode using the ovirt-engine web interface and it seems to go into maintenance mode. It even shows the wrench icon next to the host. I can also try to activate it after it supposedly goes into maintenance mode, and it states that the host was activated, but the host never actually comes up or contends for SPM status, and the data center never comes online.
From the logs it seems that at least PKI is broken between the engine and the hosts as I see numerous certificate errors on both the ovirt-engine and clients.
vdsm.log shows:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 582, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
    request.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:504: error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown
and engine.log shows:
2013-04-18 18:42:43,632 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-68) Failed to decryptData must start with zero
2013-04-18 18:42:43,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-68) XML RPC error in command
Alon Bar-Lev was able to offer several good pointers in another thread titled "Certificates and PKI seem to be broken after yum update" and eventually concluded that the installation seems to be corrupted more than just the certificates, truststore, and keystore, and suggested that I start a new thread to ask about how to rescue the storage domain structure.
The storage used for the data center is ISCSI, which is intact and working. In fact 2 of the VM's are still online and running on one of the original FC17 hosts systems.
I'm not able to reinstall any of the existing hosts from the ovirt-engine web interface. I attempted to reinstall one of the hosts (not the SPM) which failed.
I also tried to bring up a new, third host and add it to the cluster. I set up another Fedora 17 box and tried to add it to the cluster, but it states that there are no available servers in the cluster to probe the new host.
This is a test environment that I would like to fix, but I'm also willing to just run engine cleanup and start over.
That said, there are 3 VM's that I would like to keep. Two are online and running, and I'm able to see them with virsh on that host. I was wondering about using virsh to backup these vm's.
The third VM exists in the database and was set to run on the host that I attempted to reinstall, but that VM isn't running. When I use virsh on its host, virsh can't find it when I perform the list commands, and I can't start it with virsh start <vm-name>.
What is the best way to proceed? It seems like it would be easier to export the VM's using virsh from the host that they run on if possible, then update ovirt to the latest version, recreate everything and then import the VM's back in to the new environment.
Will this work? Is there a procedure I can follow to do this?
Here's some additional information about the installed ovirt packages on the ovirt-engine
If you want a backup of the currently running hosts you can use fsarchiver. There is a statically linked version, consisting of a single executable, on the fsarchiver website, and you can use options to override the fact that you're backing up a live system. You can't shut down the VMs and then do an export to an export domain, I think, since you don't have a master storage domain; that's why the fsarchiver workaround above. You can of course use your favourite backup programme.

Joop

-- irc: jvandewege
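To make the fsarchiver suggestion concrete, here is a rough sketch of archiving a mounted filesystem from inside a running guest. It assumes the statically linked fsarchiver binary has been copied onto the VM; the device and destination paths are only examples.

# -A allows saving a filesystem that is mounted read/write; -j2 uses two compression threads.
./fsarchiver savefs -A -j2 /backup/root.fsa /dev/vg0/lv_root

# Verify the archive afterwards by listing what it contains.
./fsarchiver archinfo /backup/root.fsa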

Does this work with volume groups? I have several virtual disks presented to the VM which are part of a volume group.

[root@voyager media]# fsarchiver probe
[======DISK======] [=============NAME==============] [====SIZE====] [MAJ] [MIN]
[vda             ] [                               ] [    40.00 GB] [252] [  0]
[vdb             ] [                               ] [   100.00 GB] [252] [ 16]
[vdc             ] [                               ] [    20.00 GB] [252] [ 32]
[vdd             ] [                               ] [    40.00 GB] [252] [ 48]
[=====DEVICE=====] [==FILESYS==] [======LABEL======] [====SIZE====] [MAJ] [MIN]
[vda1            ] [ext4       ] [<unknown>        ] [   500.00 MB] [252] [  1]
[vda2            ] [LVM2_member] [<unknown>        ] [    39.51 GB] [252] [  2]
[vdb1            ] [LVM2_member] [<unknown>        ] [   100.00 GB] [252] [ 17]
[vdc1            ] [LVM2_member] [<unknown>        ] [    20.00 GB] [252] [ 33]
[dm-0            ] [ext4       ] [<unknown>        ] [     4.00 GB] [253] [  0]
[dm-1            ] [swap       ] [<unknown>        ] [     3.94 GB] [253] [  1]
[dm-2            ] [ext4       ] [<unknown>        ] [   119.99 GB] [253] [  2]
[dm-3            ] [ext4       ] [<unknown>        ] [    15.00 GB] [253] [  3]
[dm-4            ] [ext4       ] [<unknown>        ] [     8.00 GB] [253] [  4]
[dm-5            ] [ext4       ] [<unknown>        ] [     8.00 GB] [253] [  5]

I'm thinking that during restore, I can just re-create the volume groups and logical volumes and then restore each file system backup to that logical volume. Or better yet, since I know how much space I'm actually using, just create one logical volume of the right size. I kept adding virtual disks as needed to store repos in /var/satellite.

I also want to verify the syntax I'm using:

fsarchiver savefs -Aa -e "/mnt/media/*" -j 2 /mnt/media/voyager/boot.fsa /dev/vda1

Seemed to work fine for backing up /boot. Are there any other recommended options I should be using for backing up live file systems mounted read/write? I've also stopped all of the spacewalk services and other services on the VM in order to minimize open files being skipped, etc.

Volume group structure:

[root@voyager media]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/vdb1
  VG Name               satellite
  PV Size               100.00 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              25599
  Free PE               0
  Allocated PE          25599
  PV UUID               g3uGGu-p0b3-eSIJ-Bwy7-YOTD-GKnd-prWP7a

  --- Physical volume ---
  PV Name               /dev/vdc1
  VG Name               satellite
  PV Size               20.00 GiB / not usable 3.89 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              5119
  Free PE               0
  Allocated PE          5119
  PV UUID               W35GYr-T6pg-3e0o-s8I7-aqtc-fxcD-Emh62K

  --- Physical volume ---
  PV Name               /dev/vda2
  VG Name               vg_voyager
  PV Size               39.51 GiB / not usable 3.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              10114
  Free PE               146
  Allocated PE          9968
  PV UUID               hJCdct-iR6Q-NPYi-eBZN-dZdP-x4YP-U1zyvE

[root@voyager media]# vgdisplay
  --- Volume group ---
  VG Name               satellite
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               119.99 GiB
  PE Size               4.00 MiB
  Total PE              30718
  Alloc PE / Size       30718 / 119.99 GiB
  Free  PE / Size       0 / 0
  VG UUID               fXvCp3-N0uG-rBRc-FWVJ-Kpv3-AH9L-1PnYUy

  --- Volume group ---
  VG Name               vg_voyager
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  9
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                5
  Open LV               5
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               39.51 GiB
  PE Size               4.00 MiB
  Total PE              10114
  Alloc PE / Size       9968 / 38.94 GiB
  Free  PE / Size       146 / 584.00 MiB
  VG UUID               3txqia-eDtn-j5wn-iixS-gfpv-90b9-ButDqh

[root@voyager media]# lvdisplay
  --- Logical volume ---
  LV Path                /dev/satellite/lv_packages
  LV Name                lv_packages
  VG Name                satellite
  LV UUID                03VUWu-bxGf-hG2b-c3cx-m3lu-7Dlp-iaiWzu
  LV Write Access        read/write
  LV Creation host, time voyager, 2012-11-11 12:53:54 -0500
  LV Status              available
  # open                 1
  LV Size                119.99 GiB
  Current LE             30718
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2

  --- Logical volume ---
  LV Path                /dev/vg_voyager/lv_var
  LV Name                lv_var
  VG Name                vg_voyager
  LV UUID                serQHO-uSog-ci5m-Xx7B-AElf-GTqi-HYCRY6
  LV Write Access        read/write
  LV Creation host, time voyager, 2012-11-11 01:55:01 -0500
  LV Status              available
  # open                 1
  LV Size                15.00 GiB
  Current LE             3840
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

  --- Logical volume ---
  LV Path                /dev/vg_voyager/lv_root
  LV Name                lv_root
  VG Name                vg_voyager
  LV UUID                Kc43IB-5EWZ-N05E-FrN5-NgcQ-kWTv-LxfSig
  LV Write Access        read/write
  LV Creation host, time voyager, 2012-11-11 01:55:05 -0500
  LV Status              available
  # open                 1
  LV Size                4.00 GiB
  Current LE             1024
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

  --- Logical volume ---
  LV Path                /dev/vg_voyager/lv_home
  LV Name                lv_home
  VG Name                vg_voyager
  LV UUID                F7aJrw-FqwN-yML2-7bbX-kcuQ-12pX-1QG8Gp
  LV Write Access        read/write
  LV Creation host, time voyager, 2012-11-11 01:55:09 -0500
  LV Status              available
  # open                 1
  LV Size                8.00 GiB
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:4

  --- Logical volume ---
  LV Path                /dev/vg_voyager/lv_swap
  LV Name                lv_swap
  VG Name                vg_voyager
  LV UUID                S5uYT4-Q3x4-3icm-SEFW-yZVW-DhLl-vSkcLc
  LV Write Access        read/write
  LV Creation host, time voyager, 2012-11-11 01:55:14 -0500
  LV Status              available
  # open                 1
  LV Size                3.94 GiB
  Current LE             1008
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

  --- Logical volume ---
  LV Path                /dev/vg_voyager/lv_tmp
  LV Name                lv_tmp
  VG Name                vg_voyager
  LV UUID                2QeEXe-7zpq-0yLV-NT0u-9ZgY-mk8w-30n2Nz
  LV Write Access        read/write
  LV Creation host, time voyager, 2012-11-11 01:55:14 -0500
  LV Status              available
  # open                 1
  LV Size                8.00 GiB
  Current LE             2048
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:5

On Mon, Apr 22, 2013 at 3:53 PM, Joop <jvdwege@xs4all.nl> wrote:
Chris Smith wrote:
List,
I have lost the ability to manage the hosts or VM's using ovirt engine web interface. The data center is offline, and I can't actually perform any operations with the hosts or VM's. I don't think that there are any actions I can perform in the web interface at all.
What's odd is that I can tell the host to go into maintenance mode using the ovirt-engine web interface and it seems to go into maintenance mode. It even shows the wrench icon next to the host. I can also try to activate it after it supposedly goes into maintenance mode, and it states that the host was activated, but the host never actually comes up or contends for SPM status, and the data center never comes online.
From the logs it seems that at least PKI is broken between the engine and the hosts as I see numerous certificate errors on both the ovirt-engine and clients.
vdsm.log shows:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 582, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
    request.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:504: error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown
and engine.log shows:
2013-04-18 18:42:43,632 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-68) Failed to decryptData must start with zero
2013-04-18 18:42:43,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-68) XML RPC error in command
Alon Bar-Lev was able to offer several good pointers in another thread titled "Certificates and PKI seem to be broken after yum update" and eventually concluded that the installation seems to be corrupted more than just the certificates, truststore, and keystore, and suggested that I start a new thread to ask about how to rescue the storage domain structure.
The storage used for the data center is ISCSI, which is intact and working. In fact 2 of the VM's are still online and running on one of the original FC17 hosts systems.
I'm not able to reinstall any of the existing hosts from the ovirt-engine web interface. I attempted to reinstall one of the hosts (not the SPM) which failed.
I also tried to bring up a new, third host and add it to the cluster. I set up another Fedora 17 box and tried to add it to the cluster, but it states that there are no available servers in the cluster to probe the new host.
This is a test environment that I would like to fix, but I'm also willing to just run engine cleanup and start over.
That said, there are 3 VM's that I would like to keep. Two are online and running, and I'm able to see them with virsh on that host. I was wondering about using virsh to backup these vm's.
The third VM exists in the database and was set to run on the host that I attempted to reinstall, but that VM isn't running. When I use virsh on its host, virsh can't find it when I perform the list commands, and I can't start it with virsh start <vm-name>.
What is the best way to proceed? It seems like it would be easier to export the VM's using virsh from the host that they run on if possible, then update ovirt to the latest version, recreate everything and then import the VM's back in to the new environment.
Will this work? Is there a procedure I can follow to do this?
Here's some additional information about the installed ovirt packages on the ovirt-engine
If you want a backup of the currently running hosts you can use fsarchiver. There is a statically linked version, consisting of a single executable, on the fsarchiver website, and you can use options to override the fact that you're backing up a live system. You can't shut down the VMs and then do an export to an export domain, I think, since you don't have a master storage domain; that's why the fsarchiver workaround above. You can of course use your favourite backup programme.
Joop
-- irc: jvandewege
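For the restore side of the plan above (recreate the volume groups and logical volumes, then put each filesystem back), the fsarchiver counterpart would look roughly like this. It is only a sketch: the device names match the layout shown above, but the archive path and LV sizing are placeholders.

# Recreate the PV/VG/LV layout on the new guest disks.
pvcreate /dev/vdb1 /dev/vdc1
vgcreate satellite /dev/vdb1 /dev/vdc1
lvcreate -n lv_packages -l 100%FREE satellite

# Restore the saved filesystem onto the freshly created LV
# (id=0 selects the first filesystem stored in the archive).
fsarchiver restfs /backup/lv_packages.fsa id=0,dest=/dev/satellite/lv_packages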

On 04/22/2013 08:23 PM, Chris Smith wrote:
List,
I have lost the ability to manage the hosts or VM's using ovirt engine web interface. The data center is offline, and I can't actually perform any operations with the hosts or VM's. I don't think that there are any actions I can perform in the web interface at all.
What's odd is that I can tell the host to go into maintenance mode using the ovirt-engine web interface and it seems to go into maintenance mode. It even shows the wrench icon next to the host. I can also try to activate it after it supposedly goes into maintenance mode, and it states that the host was activated, but the host never actually comes up or contends for SPM status, and the data center never comes online.
From the logs it seems that at least PKI is broken between the engine and the hosts as I see numerous certificate errors on both the ovirt-engine and clients.
vdsm.log shows:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 582, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
    request.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:504: error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown
and engine.log shows:
2013-04-18 18:42:43,632 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-68) Failed to decryptData must start with zero
2013-04-18 18:42:43,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-68) XML RPC error in command
Alon Bar-Lev was able to offer several good pointers in another thread titled "Certificates and PKI seem to be broken after yum update" and eventually concluded that the installation seems to be corrupted more than just the certificates, truststore, and keystore, and suggested that I start a new thread to ask about how to rescue the storage domain structure.
The storage used for the data center is ISCSI, which is intact and working. In fact 2 of the VM's are still online and running on one of the original FC17 hosts systems.
I'm not able to reinstall any of the existing hosts from the ovirt-engine web interface. I attempted to reinstall one of the hosts (not the SPM) which failed.
I also tried to bring up a new, third host and add it to the cluster. I set up another Fedora 17 box and tried to add it to the cluster, but it states that there are no available servers in the cluster to probe the new host.
This is a test environment that I would like to fix, but I'm also willing to just run engine cleanup and start over.
That said, there are 3 VM's that I would like to keep. Two are online and running, and I'm able to see them with virsh on that host. I was wondering about using virsh to backup these vm's.
The third VM exists in the database and was set to run on the host that I attempted to reinstall, but that VM isn't running. When I use virsh on its host, virsh can't find it when I perform the list commands, and I can't start it with virsh start <vm-name>.
What is the best way to proceed? It seems like it would be easier to export the VM's using virsh from the host that they run on if possible, then update ovirt to the latest version, recreate everything and then import the VM's back in to the new environment.
Will this work? Is there a procedure I can follow to do this?
Here's some additional information about the installed ovirt packages on the ovirt-engine
[root@reliant yum.repos.d]# yum list installed | grep ovirt
ovirt-engine.noarch                       3.1.0-4.fc17            @ovirt-stable
ovirt-engine-backend.noarch               3.1.0-4.fc17            @ovirt-stable
ovirt-engine-cli.noarch                   3.2.0.5-1.fc17          @updates
ovirt-engine-config.noarch                3.1.0-4.fc17            @ovirt-stable
ovirt-engine-dbscripts.noarch             3.1.0-4.fc17            @ovirt-stable
ovirt-engine-genericapi.noarch            3.1.0-4.fc17            @ovirt-stable
ovirt-engine-notification-service.noarch  3.1.0-4.fc17            @ovirt-stable
ovirt-engine-restapi.noarch               3.1.0-4.fc17            @ovirt-stable
ovirt-engine-sdk.noarch                   3.2.0.2-1.fc17          @updates
ovirt-engine-setup.noarch                 3.1.0-4.fc17            @ovirt-stable
ovirt-engine-tools-common.noarch          3.1.0-4.fc17            @ovirt-stable
ovirt-engine-userportal.noarch            3.1.0-4.fc17            @ovirt-stable
ovirt-engine-webadmin-portal.noarch       3.1.0-4.fc17            @ovirt-stable
ovirt-image-uploader.noarch               3.1.0-0.git9c42c8.fc17  @ovirt-stable
ovirt-iso-uploader.noarch                 3.1.0-0.git1841d9.fc17  @ovirt-stable
ovirt-log-collector.noarch                3.1.0-0.git10d719.fc17  @ovirt-stable
ovirt-release-fedora.noarch               4-2                     @/ovirt-release-fedora.noarch
What type of storage domain is this (NFS, iSCSI, etc.)?

You can also try backing up the DB, re-installing the engine, restoring the DB, and then trying to re-install the hosts.
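A rough sketch of that DB round trip on the engine machine, assuming the default PostgreSQL database name "engine" and local access as the postgres user (oVirt 3.1 predates the engine-backup tool, so plain pg_dump/psql is one way to do it):

# Dump the engine database; -C includes the CREATE DATABASE statement
# so the dump can be replayed onto a clean PostgreSQL server.
su - postgres -c "pg_dump -C engine" > /root/engine-db-backup.sql

# After reinstalling the engine, drop the freshly created empty database
# and load the dump back in.
su - postgres -c "dropdb engine"
su - postgres -c "psql -d postgres" < /root/engine-db-backup.sql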

Don't forget to back up the cert(s) too...

On 04/27/2013 06:23 AM, Itamar Heim wrote:
On 04/22/2013 08:23 PM, Chris Smith wrote:
List,
I have lost the ability to manage the hosts or VM's using ovirt engine web interface. The data center is offline, and I can't actually perform any operations with the hosts or VM's. I don't think that there are any actions I can perform in the web interface at all.
What's odd is that I can tell the host to go into maintenance mode using the ovirt-engine web interface and it seems to go into maintenance mode. It even shows the wrench icon next to the host. I can also try to activate it after it supposedly goes into maintenance mode, and it states that the host was activated, but the host never actually comes up or contends for SPM status, and the data center never comes online.
From the logs it seems that at least PKI is broken between the engine and the hosts as I see numerous certificate errors on both the ovirt-engine and clients.
vdsm.log shows:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 582, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
    request.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:504: error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown
and engine.log shows:
2013-04-18 18:42:43,632 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-68) Failed to decryptData must start with zero
2013-04-18 18:42:43,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-68) XML RPC error in command
Alon Bar-Lev was able to offer several good pointers in another thread titled "Certificates and PKI seem to be broken after yum update" and eventually concluded that the installation seems to be corrupted more than just the certificates, truststore, and keystore, and suggested that I start a new thread to ask about how to rescue the storage domain structure.
The storage used for the data center is ISCSI, which is intact and working. In fact 2 of the VM's are still online and running on one of the original FC17 hosts systems.
I'm not able to reinstall any of the existing hosts from the ovirt-engine web interface. I attempted to reinstall one of the hosts (not the SPM) which failed.
I also tried to bring up a new, third host and add it to the cluster. I set up another Fedora 17 box and tried to add it to the cluster, but it states that there are no available servers in the cluster to probe the new host.
This is a test environment that I would like to fix, but I'm also willing to just run engine cleanup and start over.
That said, there are 3 VM's that I would like to keep. Two are online and running, and I'm able to see them with virsh on that host. I was wondering about using virsh to backup these vm's.
The third VM exists in the database and was set to run on the host that I attempted to reinstall, but that VM isn't running. When I use virsh on its host, virsh can't find it when I perform the list commands, and I can't start it with virsh start <vm-name>.
What is the best way to proceed? It seems like it would be easier to export the VM's using virsh from the host that they run on if possible, then update ovirt to the latest version, recreate everything and then import the VM's back in to the new environment.
Will this work? Is there a procedure I can follow to do this?
Here's some additional information about the installed ovirt packages on the ovirt-engine
[root@reliant yum.repos.d]# yum list installed | grep ovirt
ovirt-engine.noarch                       3.1.0-4.fc17            @ovirt-stable
ovirt-engine-backend.noarch               3.1.0-4.fc17            @ovirt-stable
ovirt-engine-cli.noarch                   3.2.0.5-1.fc17          @updates
ovirt-engine-config.noarch                3.1.0-4.fc17            @ovirt-stable
ovirt-engine-dbscripts.noarch             3.1.0-4.fc17            @ovirt-stable
ovirt-engine-genericapi.noarch            3.1.0-4.fc17            @ovirt-stable
ovirt-engine-notification-service.noarch  3.1.0-4.fc17            @ovirt-stable
ovirt-engine-restapi.noarch               3.1.0-4.fc17            @ovirt-stable
ovirt-engine-sdk.noarch                   3.2.0.2-1.fc17          @updates
ovirt-engine-setup.noarch                 3.1.0-4.fc17            @ovirt-stable
ovirt-engine-tools-common.noarch          3.1.0-4.fc17            @ovirt-stable
ovirt-engine-userportal.noarch            3.1.0-4.fc17            @ovirt-stable
ovirt-engine-webadmin-portal.noarch       3.1.0-4.fc17            @ovirt-stable
ovirt-image-uploader.noarch               3.1.0-0.git9c42c8.fc17  @ovirt-stable
ovirt-iso-uploader.noarch                 3.1.0-0.git1841d9.fc17  @ovirt-stable
ovirt-log-collector.noarch                3.1.0-0.git10d719.fc17  @ovirt-stable
ovirt-release-fedora.noarch               4-2                     @/ovirt-release-fedora.noarch
What type of storage domain is this (NFS, iSCSI, etc.)?

You can also try backing up the DB, re-installing the engine, restoring the DB, and then trying to re-install the hosts.
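To make the certificate reminder concrete: a minimal sketch of saving the PKI material alongside the database dump. The paths below are the usual defaults for oVirt 3.1 but should be checked on the actual installation.

# On the engine: CA, keystore and issued certificates.
tar czf /root/engine-pki-backup.tar.gz /etc/pki/ovirt-engine

# On each host: the vdsm certificates.
tar czf /root/vdsm-pki-backup.tar.gz /etc/pki/vdsm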
participants (4)
- Alex Leonhardt
- Chris Smith
- Itamar Heim
- Joop