[Users] ITA-2967 URGENT: ovirt Node turns status to "non operational" STORAGE_DOMAIN_UNREACHABLE

Sven Knohsalla s.knohsalla at netbiscuits.com
Mon Oct 15 13:56:00 UTC 2012


Hi,

Sometimes the status of one hypervisor turns to "Non-operational" with the error "STORAGE_DOMAIN_UNREACHABLE", and live migration (enabled for all VMs) starts.
I currently don't know why the ovirt-node enters this status, because the connected iSCSI SAN is available the whole time (checked via the iSCSI session and lsblk), and I am also able to read from and write to the SAN during that time.

We can simply activate this ovirt-node again and it comes back up. The migration process then starts from scratch and hits the same error --> a reboot of the ovirt-node is necessary!

When a hypervisor turns to "non-operational" status, live migration starts and tries to migrate ~25 VMs (~100 GB of RAM to migrate).
During that process the network load goes to 100%. Some VMs are migrated, then the destination host also turns to "non-operational" status with the error "STORAGE_DOMAIN_UNREACHABLE".
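
For a rough sense of scale, here is a minimal back-of-the-envelope sketch of how long copying ~100 GB of guest RAM once would keep a link saturated. The link speeds (1 and 10 Gbit/s) and the efficiency factor are assumptions, not measured values from our setup, and the function name is made up for illustration. Pages dirtied during migration are ignored, so the real duration can only be longer.

```python
def migration_seconds(total_ram_gb: float, link_gbit_per_s: float,
                      efficiency: float = 0.9) -> float:
    """Lower bound on the time to copy the guest RAM once over one link.

    total_ram_gb     -- RAM to transfer, in decimal gigabytes
    link_gbit_per_s  -- nominal link speed (assumed, not measured)
    efficiency       -- assumed usable fraction of the nominal speed
    """
    if link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be > 0 and efficiency in (0, 1]")
    total_bits = total_ram_gb * 8e9                  # GB -> bits
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return total_bits / usable_bits_per_s


if __name__ == "__main__":
    for speed in (1.0, 10.0):                        # hypothetical link speeds
        minutes = migration_seconds(100, speed) / 60
        print(f"{speed:>4} Gbit/s: at least {minutes:.1f} min")
```

If the migration traffic shares a link with other traffic for that long, it would at least be consistent with the 100% network load we see.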

Many VMs are still running on their origin host, some are paused, and some show the "migration from" status.
After a reboot of the origin host, the VMs of course go into an unknown state.

So the whole cluster is down :/

For this problem I have some questions:

- Does ovirt-engine use only the ovirtmgmt network for migration/HA?

- If so, is there any way to add a network for migration/HA, or to switch to a different one?

- Is the way we are using live migration not recommended?

- Which engine module checks the availability of the storage domain for the ovirt-nodes?

- Is there any timeout/cache option we can set or increase to avoid this problem?

- Is there any known problem with the versions we are using? (Migrating to ovirt-engine 3.1 is not possible at the moment.)

- Is it possible to modify the migration queue so that, for example, at most 4 VMs are migrated at the same time?


ovirt-engine:
FC 16:  3.3.6-3.fc16.x86_64
Engine:  3.0.0_0001-1.6.fc16
KVM based VM: 2 vCPU, 4 GB RAM
1 NIC for ssh/https access
1 NIC for ovirtmgmt network access
engine source: dreyou repo

ovirt-node:
Node: 2.3.0
2 bonded NICs -> Frontend Network
4 Multipath NICs -> SAN connection

I have attached some relevant log files.

Thanks in advance, I really appreciate your help!

Best,
Sven Knohsalla | System Administration

Office +49 631 68036 433 | Fax +49 631 68036 111 | E-Mail s.knohsalla at netbiscuits.com | Skype: Netbiscuits.admin
Netbiscuits GmbH | Europaallee 10 | 67657 | GERMANY

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ovirtlogs.zip
Type: application/x-zip-compressed
Size: 359523 bytes
Desc: ovirtlogs.zip
URL: <http://lists.ovirt.org/pipermail/users/attachments/20121015/8edb6856/attachment-0001.bin>