[Users] NFS Domains down because of single node failure

Itamar Heim iheim at redhat.com
Wed Sep 18 12:04:36 UTC 2013


On 09/17/2013 10:01 AM, Markus Stockhausen wrote:
> Hello,
>
> maybe a stupid one but ...
>
> When I create an (NFS) storage domain I have to provide a node host that
> makes the initial contact. All other node hosts will connect directly to
> that domain, so there are no bottlenecks.
>
> Today I stopped one of my two nodes outside ovirt-engine. For simplicity we
> assume the node crashed. The machine is up right now but VDSM is down.
>
> All domains that were set up with this host are "down" now (red arrow
> down). After searching the web interface I found "Data center" ->
> "Select your DC" -> "Storage" -> "Activate". Trying to activate only
> results in a failure message. To ensure that I can recover from such
> situations in the future, I'd like to know what this node binding is all
> about and what to do next.
>
> Logs attached & thanks in advance
>
> Markus
>
> 2013-09-17 08:54:54,985 WARN
> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
> (pool-6-thread-50) [1be325f3] spm vds is non responsive, stopping spm
> selection.
> 2013-09-17 08:54:54,986 INFO
> [org.ovirt.engine.core.vdsbroker.irsbroker.ActivateStorageDomainVDSCommand]
> (pool-6-thread-50) [1be325f3] FINISH, ActivateStorageDomainVDSCommand,
> log id: 4c38e98
> 2013-09-17 08:54:54,987 ERROR
> [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand]
> (pool-6-thread-50) [1be325f3] Command
> org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand throw Vdc
> Bll exception. With error message VdcBLLException: Cannot allocate IRS
> server (Failed with VDSM error IRS_REPOSITORY_NOT_FOUND and code 5009)
> 2013-09-17 08:54:54,989 INFO
> [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand]
> (pool-6-thread-50) [1be325f3] Command
> [id=a0dbe909-fbb1-40ff-b77a-8e43bd075ace]: Compensating
> CHANGED_STATUS_ONLY of
> org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap;
> snapshot: EntityStatusSnapshot [id=storagePoolId =
> b054727d-fe4a-41ed-8393-a81e36b8a1af, storageId =
> ecf7f507-b0fa-47ee-a8b2-d621fbd7b8bf, status=Unknown].
> 2013-09-17 08:54:55,004 ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
> (DefaultQuartzScheduler_Worker-99) Command GetCapabilitiesVDS execution
> failed. Exception: VDSNetworkException: java.net.ConnectException:
> Connection refused
> 2013-09-17 08:54:55,008 INFO
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (pool-6-thread-50) [1be325f3] Correlation ID: 1be325f3, Job ID:
> c88c00ba-0298-4f42-bc0e-a720d79c5f49, Call Stack: null, Custom Event ID:
> -1, Message: Failed to activate Storage Domain NAS5_IB (Data Center
> Collogia) by admin at internal
> 2013-09-17 08:54:56,263 INFO
> [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand]
> (ajp--127.0.0.1-8702-2) [5c6218c1] Lock Acquired to object EngineLock
> [exclusiveLocks= key: ecf7f507-b0fa-47ee-a8b2-d621fbd7b8bf value: STORAGE
> , sharedLocks= ]
> 2013-09-17 08:54:56,272 INFO
> [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand]
> (pool-6-thread-50) [5c6218c1] Running command:
> ActivateStorageDomainCommand internal: false. Entities affected :  ID:
> ecf7f507-b0fa-47ee-a8b2-d621fbd7b8bf Type: Storage
> 2013-09-17 08:54:56,291 INFO
> [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand]
> (pool-6-thread-50) [5c6218c1] Lock freed to object EngineLock
> [exclusiveLocks= key: ecf7f507-b0fa-47ee-a8b2-d621fbd7b8bf value: STORAGE
> , sharedLocks= ]
> 2013-09-17 08:54:56,292 INFO
> [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand]
> (pool-6-thread-50) [5c6218c1] ActivateStorage Domain. Before Connect all
> hosts to pool. Time:9/17/13 8:54 AM
> 2013-09-17 08:54:56,296 INFO
> [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand]
> (pool-6-thread-47) Running command: ConnectStorageToVdsCommand internal:
> true. Entities affected :  ID: aaa00000-0000-0000-0000-123456789aaa
> Type: System
> 2013-09-17 08:54:56,299 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand]
> (pool-6-thread-47) START, ConnectStorageServerVDSCommand(HostName =
> colovn3, HostId = 0fdccd63-f5d7-41e4-8350-5941bbc29270, storagePoolId =
> 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList
> = [{ id: 68c31a49-0e37-4438-a8fe-fc28be62cd3f, connection:
> 10.10.30.251:/var/nas5/ovirt, iqn: null, vfsType: null, mountOptions:
> null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id:
> 75a9c6a0
> 2013-09-17 08:54:56,317 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand]
> (pool-6-thread-47) FINISH, ConnectStorageServerVDSCommand, return:
> {68c31a49-0e37-4438-a8fe-fc28be62cd3f=0}, log id: 75a9c6a0

The node you used to set up the storage doesn't affect later runs.
Can the remaining node access the storage?
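
One quick way to check this from the remaining node is a plain TCP probe of the NFS server. The following is a minimal sketch, not an oVirt or VDSM tool: the helper names `split_export` and `can_reach` are made up for illustration, the export string is the one shown in the quoted ConnectStorageServerVDSCommand log, and port 2049 is assumed to be where the NFS server listens. A successful connect only shows the server is reachable; it does not prove that the export can actually be mounted.

```python
import socket


def split_export(export: str) -> tuple[str, str]:
    """Split an NFS export such as 'host:/path' into (host, path)."""
    host, _, path = export.partition(":")
    return host, path


def can_reach(host: str, port: int = 2049, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 2049 is an assumption (the usual NFS port); adjust if the
    server is configured differently.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


# Example, to be run on the remaining node:
#   host, path = split_export("10.10.30.251:/var/nas5/ovirt")
#   print(host, path, can_reach(host))
```

If the probe fails, the problem is between that node and the NAS rather than in the engine. If it succeeds, a manual mount of the export on that node is the more conclusive test.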
