[Users] How to rescue storage domain structure

Itamar Heim iheim at redhat.com
Sat Apr 27 05:23:36 UTC 2013


On 04/22/2013 08:23 PM, Chris Smith wrote:
> List,
>
> I have lost the ability to manage the hosts or VMs using the ovirt-engine
> web interface.  The data center is offline, and I can't actually perform
> any operations with the hosts or VMs.  I don't think that there are any
> actions I can perform in the web interface at all.
>
> What's odd is that I can tell the host to go into maintenance mode
> using the ovirt-engine web interface and it seems to go into
> maintenance mode.  It even shows the wrench icon next to the host.  I
> can also try to activate it after it supposedly goes into maintenance
> mode, and it states that the host was activated, but the host never
> actually comes up or contends for SPM status, and the data center
> never comes online.
>
> From the logs it seems that at least PKI is broken between the engine
> and the hosts as I see numerous certificate errors on both the
> ovirt-engine and clients.
>
> vdsm.log shows:
>
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/SocketServer.py", line 582, in process_request_thread
>     self.finish_request(request, client_address)
>   File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 66, in finish_request
>     request.do_handshake()
>   File "/usr/lib64/python2.7/ssl.py", line 305, in do_handshake
>     self._sslobj.do_handshake()
> SSLError: [Errno 1] _ssl.c:504: error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown
>
> and engine.log shows:
>
> 2013-04-18 18:42:43,632 ERROR [org.ovirt.engine.core.engineencryptutils.EncryptionUtils] (QuartzScheduler_Worker-68) Failed to decryptData must start with zero
> 2013-04-18 18:42:43,642 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-68) XML RPC error in command
>
>
> Alon Bar-Lev was able to offer several good pointers in another thread
> titled "Certificates and PKI seem to be broken after yum update" and
> eventually concluded that the installation seems to be corrupted more
> than just the certificates, truststore, and keystore, and suggested
> that I start a new thread to ask about how to rescue the storage
> domain structure.
>
> The storage used for the data center is iSCSI, which is intact and
> working.  In fact, 2 of the VMs are still online and running on one of
> the original FC17 host systems.
>
> I'm not able to reinstall any of the existing hosts from the ovirt-engine web
> interface.  I attempted to reinstall one of the hosts (not the SPM)
> which failed.
>
> I also tried to bring up a new, third host and add it to the cluster.
> I set up another Fedora 17 box and tried to add it to the
> cluster, but it states that there are no available servers in the
> cluster to probe the new host.
>
> This is a test environment that I would like to fix, but I'm also
> willing to just run engine cleanup and start over.
>
> That said, there are 3 VMs that I would like to keep.  Two are online
> and running, and I'm able to see them with virsh on that host.  I was
> wondering about using virsh to back up these VMs.
>
> The third VM exists in the database, and was set to run on the host
> that I attempted to reinstall, but that VM isn't running.  When I use
> virsh on its host, virsh can't seem to find it when I run the list
> commands, and I can't start it with virsh <vm-name>.
>
> What is the best way to proceed?  It seems like it would be easier to
> export the VMs using virsh from the host that they run on if
> possible, then update ovirt to the latest version, recreate everything
> and then import the VMs back into the new environment.
>
> Will this work?  Is there a procedure I can follow to do this?
>
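
For the two running VMs, one thing that could be tried is saving their
libvirt definitions with "virsh -r dumpxml" (read-only mode, which does not
modify the guests). The sketch below is only an illustration: the helper
names and the sample "virsh list" output are made up, and it assumes VM
names contain no spaces. Note that this saves only the VM definition; the
disk contents stay on the storage domain.

```python
# Hypothetical sample of `virsh -r list` output; real names will differ.
SAMPLE = """\
 Id    Name                           State
----------------------------------------------------
 1     vm-a                           running
 2     vm-b                           running
"""


def running_domains(list_output):
    """Return the domain names found in `virsh list` table output."""
    names = []
    for line in list_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric domain ID; the header and the
        # dashed rule line do not, so they are skipped.
        if len(fields) >= 3 and fields[0].isdigit():
            names.append(fields[1])
    return names


def dumpxml_cmd(name):
    """Build the read-only virsh command that prints a domain's XML."""
    return ["virsh", "-r", "dumpxml", name]


if __name__ == "__main__":
    for name in running_domains(SAMPLE):
        # Print the command rather than running it, so nothing is touched.
        print(" ".join(dumpxml_cmd(name)), ">", name + ".xml")
```

The commands are printed rather than executed so that they can be reviewed
before anything is run on the host.
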
> Here's some additional information about the installed ovirt packages
> on the ovirt-engine
>
> [root@reliant yum.repos.d]# yum list installed | grep ovirt
> ovirt-engine.noarch                       3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-backend.noarch               3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-cli.noarch                   3.2.0.5-1.fc17          @updates
> ovirt-engine-config.noarch                3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-dbscripts.noarch             3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-genericapi.noarch            3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-notification-service.noarch  3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-restapi.noarch               3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-sdk.noarch                   3.2.0.2-1.fc17          @updates
> ovirt-engine-setup.noarch                 3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-tools-common.noarch          3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-userportal.noarch            3.1.0-4.fc17            @ovirt-stable
> ovirt-engine-webadmin-portal.noarch       3.1.0-4.fc17            @ovirt-stable
> ovirt-image-uploader.noarch               3.1.0-0.git9c42c8.fc17  @ovirt-stable
> ovirt-iso-uploader.noarch                 3.1.0-0.git1841d9.fc17  @ovirt-stable
> ovirt-log-collector.noarch                3.1.0-0.git10d719.fc17  @ovirt-stable
> ovirt-release-fedora.noarch               4-2                     @/ovirt-release-fedora.noarch

What type of storage domain is this (NFS, iSCSI, etc.)?
You can also try backing up the DB, reinstalling the engine, restoring the
DB, and then reinstalling the hosts.
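
As a rough sketch of the backup and restore steps: the snippet below only
assembles the commands and does not run them. It assumes the engine
database is a local PostgreSQL database named "engine" that the "postgres"
user can access; both names, the dump path, and the helper function names
are assumptions for illustration, so check them against your setup. Only
the standard pg_dump and psql options are used.

```python
import shlex


def backup_cmd(db="engine", user="postgres", out="/root/engine-backup.sql"):
    """Build a pg_dump command that writes a plain-SQL dump of `db` to `out`."""
    return ["pg_dump", "-U", user, "--format=p", "-f", out, db]


def restore_cmd(db="engine", user="postgres", dump="/root/engine-backup.sql"):
    """Build a psql command that replays a plain-SQL dump into `db`."""
    return ["psql", "-U", user, "-d", db, "-f", dump]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in (backup_cmd(), restore_cmd()):
        print(" ".join(shlex.quote(part) for part in cmd))
```

It is probably safest to stop the engine service before taking the dump,
and to restore into a freshly created, empty database after the reinstall.
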



