[Users] How to rescue storage domain structure
Itamar Heim
iheim at redhat.com
Sat Apr 27 01:23:36 EDT 2013
On 04/22/2013 08:23 PM, Chris Smith wrote:
> List,
>
> I have lost the ability to manage the hosts or VM's using ovirt engine
> web interface. The data center is offline, and I can't actually perform
> any operations with the hosts or VM's. I don't think that there are any
> actions I can perform in the web interface at all.
>
> What's odd is that I can tell the host to go into maintenance mode
> using the ovirt-engine web interface and it seems to go into
> maintenance mode. It even shows the wrench icon next to the host. I
> can also try and activate it after it supposedly goes into maintenance
> mode, and it states that the host was activated, but the host never
> actually comes up or contends for SPM status, and the data center
> never comes online.
>
> From the logs it seems that at least PKI is broken between the engine
> and the hosts as I see numerous certificate errors on both the
> ovirt-engine and clients.
>
> vdsm.log shows:
>
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/SocketServer.py", line 582, in process_request_thread
>     self.finish_request(request, client_address)
>   File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
>     request.do_handshake()
>   File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
>     self._sslobj.do_handshake()
> SSLError: [Errno 1] _ssl.c:504: error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown
>
> and engine.log shows:
>
> 2013-04-18 18:42:43,632 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-68) Failed to decryptData must start with zero
> 2013-04-18 18:42:43,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-68) XML RPC error in command
>
>
> Alon Bar-Lev was able to offer several good pointers in another thread
> titled "Certificates and PKI seem to be broken after yum update" and
> eventually concluded that the installation seems to be corrupted more
> than just the certificates, truststore, and keystore, and suggested
> that I start a new thread to ask about how to rescue the storage
> domain structure.
>
> The storage used for the data center is iSCSI, which is intact and
> working. In fact 2 of the VM's are still online and running on one of
> the original FC17 host systems.
>
> I'm not able to reinstall any of the existing hosts from the ovirt-engine web
> interface. I attempted to reinstall one of the hosts (not the SPM)
> which failed.
>
> I also tried to bring up a new, third host and add it to the cluster.
> I set up another Fedora 17 box and tried to add it to the
> cluster, but it states that there are no available servers in the
> cluster to probe the new host.
>
> This is a test environment that I would like to fix, but I'm also
> willing to just run engine cleanup and start over.
>
> That said, there are 3 VM's that I would like to keep. Two are online
> and running, and I'm able to see them with virsh on that host. I was
> wondering about using virsh to backup these vm's.
>
> The third VM exists in the database, and was set to run on the host
> that I attempted to reinstall, but that VM isn't running. When I
> use virsh on its host, virsh can't seem to find it when I perform
> the list commands, and I can't start it with virsh <vm-name>.
>
> What is the best way to proceed? It seems like it would be easier to
> export the VM's using virsh from the host that they run on if
> possible, then update ovirt to the latest version, recreate everything
> and then import the VM's back in to the new environment.
>
> Will this work? Is there a procedure I can follow to do this?
>
> Here's some additional information about the installed ovirt packages
> on the ovirt-engine
>
> [root@reliant yum.repos.d]# yum list installed | grep ovirt
> ovirt-engine.noarch                       3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-backend.noarch               3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-cli.noarch                   3.2.0.5-1.fc17          @updates
> ovirt-engine-config.noarch                3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-dbscripts.noarch             3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-genericapi.noarch            3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-notification-service.noarch  3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-restapi.noarch               3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-sdk.noarch                   3.2.0.2-1.fc17          @updates
> ovirt-engine-setup.noarch                 3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-tools-common.noarch          3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-userportal.noarch            3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-webadmin-portal.noarch       3.1.0-4.fc17            @ovirt-stable
> ovirt-image-uploader.noarch               3.1.0-0.git9c42c8.fc17  @ovirt-stable
> ovirt-iso-uploader.noarch                 3.1.0-0.git1841d9.fc17  @ovirt-stable
> ovirt-log-collector.noarch                3.1.0-0.git10d719.fc17  @ovirt-stable
> ovirt-release-fedora.noarch               4-2                     @/ovirt-release-fedora.noarch
What type of storage domain is this (NFS, iSCSI, etc.)?
You can also try backing up the DB, reinstalling the engine, restoring
the DB, and then reinstalling the hosts.
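
A rough sketch of the backup and restore steps in that sequence, in case it
helps. It assumes the engine database is PostgreSQL, is named "engine", and
is reachable as the "postgres" user; those names and the backup path are
assumptions, so check them against your own setup first. The helper names
(run, backup_db, restore_db) are made up for illustration. By default the
script only prints the commands; nothing is executed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only. DB_NAME, DB_USER and BACKUP_FILE are assumptions;
# override them in the environment to match your installation.
DB_NAME="${DB_NAME:-engine}"
DB_USER="${DB_USER:-postgres}"
BACKUP_FILE="${BACKUP_FILE:-/root/engine-backup.dump}"

# Print a command, and execute it only when RUN=1 is set.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Dump the engine database in PostgreSQL's custom format.
# Stop the engine service before doing this so the dump is consistent.
backup_db() {
    run pg_dump -U "$DB_USER" -F c -f "$BACKUP_FILE" "$DB_NAME"
}

# Load the dump into an existing, empty database of the same name
# after the engine has been reinstalled.
restore_db() {
    run pg_restore -U "$DB_USER" -d "$DB_NAME" "$BACKUP_FILE"
}
```

Source the file and call backup_db before reinstalling the engine, then
restore_db afterwards. Review the printed commands first, and rerun with
RUN=1 once they look right for your environment.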