Error when trying to change master storage domain

Hello,

I'm trying to decommission the old master storage domain in ovirt, and replace it with a new one. All of the VMs have been migrated off of the old master, and everything has been running on the new storage domain for a couple months. But when I try to put the old domain into maintenance mode I get an error.

Old Master: vm-storage-ssd
New Domain: vm-storage-ssd2

The error is:

Failed to Reconstruct Master Domain for Data Center EDC2

As well as:

Sync Error on Master Domain between Host daccs01 and oVirt Engine. Domain: vm-storage-ssd is marked as Master in oVirt Engine database but not on the Storage side. Please consult with Support on how to fix this issue.

2021-07-28 11:41:34,870-07 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-23) [] Master domain version is not in sync between DB and VDSM. Domain vm-storage-ssd marked as master, but the version in DB: 283 and in VDSM: 280

And:

Not stopping SPM on vds daccs01, pool id f72ec125-69a1-4c1b-a5e1-313fcb70b6ff as there are uncleared tasks Task '5fa9edf0-56c3-40e4-9327-47bf7764d28d', status 'finished'

After a couple minutes all the domains are marked as active again and things continue, but vm-storage-ssd is still listed as the master domain. Any thoughts?

This is on 4.3.10.4-1.el7 on CentOS 7.

engine=# SELECT storage_name, storage_pool_id, storage, status FROM storage_pool_with_storage_domain ORDER BY storage_name;
 storage_name          | storage_pool_id                      | storage                                | status
-----------------------+--------------------------------------+----------------------------------------+--------
 compute1-iscsi-ssd    | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | yvUESE-yWUv-VIWL-qX90-aAq7-gK0I-EqppRL |      1
 compute7-iscsi-ssd    | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 8ekHdv-u0RJ-B0FO-LUUK-wDWs-iaxb-sh3W3J |      1
 export-domain-storage | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | d3932528-6844-481a-bfed-542872ace9e5   |      1
 iso-storage           | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | f800b7a6-6a0c-4560-8476-2f294412d87d   |      1
 vm-storage-7200rpm    | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | a0bff472-1348-4302-a5c7-f1177efa45a9   |      1
 vm-storage-ssd        | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0   |      1
 vm-storage-ssd2       | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4   |      1
(7 rows)

Thanks,
-Matthew
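The IrsProxy warning quoted in this message carries both version numbers, so the size of the gap can be read straight from the log line. A minimal Python sketch, assuming the warning text always has the wording shown above; the function name `parse_master_mismatch` is made up for illustration and is not part of oVirt:

```python
import re

# Pattern for the IrsProxy warning wording quoted in the message above.
MISMATCH_RE = re.compile(
    r"Domain (?P<domain>\S+) marked as master, "
    r"but the version in DB: (?P<db>\d+) and in VDSM: (?P<vdsm>\d+)"
)


def parse_master_mismatch(line):
    """Return (domain, db_version, vdsm_version), or None if the line does not match."""
    match = MISMATCH_RE.search(line)
    if match is None:
        return None
    return match.group("domain"), int(match.group("db")), int(match.group("vdsm"))


# The warning line from the original message.
sample = (
    "2021-07-28 11:41:34,870-07 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] "
    "(EE-ManagedThreadFactory-engine-Thread-23) [] Master domain version is not in sync "
    "between DB and VDSM. Domain vm-storage-ssd marked as master, but the version in DB: 283 "
    "and in VDSM: 280"
)

domain, db_ver, vdsm_ver = parse_master_mismatch(sample)
print(f"{domain}: engine DB is {db_ver - vdsm_ver} version(s) ahead of VDSM")
```

This only reads the log; it changes nothing on the engine or the hosts.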

Hi Matthew,

Actually, your description is related to 2 features available in oVirt 4.4.5 <https://www.ovirt.org/release/4.4.5/>:

1. The ability to switch the master storage domain while domains are up and running [1]
2. Clearing the finished tasks from REST API [2] and UI [3].

We recommend you upgrade your engine to enjoy those features.

In the meantime, as you've described, the Master role can be moved from one storage domain to another by putting the current master domain into maintenance.
In order to clear the finished tasks from the SPM, first list them:

vdsm-client Host getAllTasksStatuses

The output should look something like this:

{
    "1dc4d885-577a-4b6a-b01f-e682602a907c": {
        "code": 0,
        "message": "1 jobs completed successfully",
        "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c",
        "taskResult": "success",
        "taskState": "finished"
    }
}

Then clear that task:

vdsm-client Task clear taskID=12345

Once it gets cleared, the reconstruction can be finished.

To verify there are no more finished async tasks, you can run this SQL query on the engine:

engine=# select * from async_tasks WHERE storage_pool_id = '123';

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1910022
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1627997
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1910302

Regards,
Shani Leviim
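When the SPM reports several tasks, picking out the finished ones by eye is error-prone. The following is a rough Python sketch, not an official tool: it takes the JSON printed by `vdsm-client Host getAllTasksStatuses` (pasted in here as a string) and prints the matching `Task clear` command for each finished task. The helper name `finished_task_ids` is an assumption for illustration, and nothing is executed against VDSM.

```python
import json


def finished_task_ids(statuses_json):
    """Return the IDs of tasks whose taskState is 'finished'."""
    statuses = json.loads(statuses_json)
    return [
        task_id
        for task_id, info in statuses.items()
        if info.get("taskState") == "finished"
    ]


# Example output copied from the reply above.
sample_output = """
{
    "1dc4d885-577a-4b6a-b01f-e682602a907c": {
        "code": 0,
        "message": "1 jobs completed successfully",
        "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c",
        "taskResult": "success",
        "taskState": "finished"
    }
}
"""

for task_id in finished_task_ids(sample_output):
    # Only prints the command from the thread; run it by hand on the SPM host.
    print(f"vdsm-client Task clear taskID={task_id}")
```

Printing the commands rather than running them keeps a human in the loop before any task is cleared.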

Thanks Shani - yes we plan to upgrade to 4.4 in the future, but we're on 4.3 right now due to only running CentOS 7 at the moment.

I was able to clear the job from the SPM:

[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
{
    "5fa9edf0-56c3-40e4-9327-47bf7764d28d": {
        "message": "1 jobs completed successfully",
        "code": 0,
        "taskID": "5fa9edf0-56c3-40e4-9327-47bf7764d28d",
        "taskResult": "success",
        "taskState": "finished"
    }
}
[root@daccs01 ~]# vdsm-client Task clear taskID=5fa9edf0-56c3-40e4-9327-47bf7764d28d
true
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
{}

And confirm there were no async_tasks:

engine=# select * from async_tasks;
 task_id | action_type | status | result | step_id | command_id | started_at | storage_pool_id | task_type | vdsm_task_id | root_command_id | user_id
---------+-------------+--------+--------+---------+------------+------------+-----------------+-----------+--------------+-----------------+---------
(0 rows)

However, when putting the vm-storage-ssd domain into maintenance mode, it failed again.

Here are some of the log entries - anything else I can look at?

2021-07-29 10:30:37,848-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM compute7.pcic.uvic.ca command ConnectStoragePoolVDS failed: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'

2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand' return value 'StatusOnlyReturn [status=Status [code=324, message=Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff']]' ...

2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] HostName = compute7.pcic.uvic.ca

2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'ConnectStoragePoolVDSCommand(HostName = compute7.pcic.uvic.ca, ConnectStoragePoolVDSCommandParameters:{hostId='51769733-0cf6-4270-8288-ec96474b7609', vdsId='51769733-0cf6-4270-8288-ec96474b7609', storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', masterVersion='288'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' ...

2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'

2021-07-29 10:30:37,855-07 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] FINISH, DeactivateStorageDomainVDSCommand, return: , log id: 1c215ca4

2021-07-29 10:30:37,855-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] DeactivateStorageDomainVDS failed 'a5a83df1-47e2-4927-9add-079199ca7ef8': org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' (Failed with error StoragePoolWrongMaster and code 324)
    at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
    at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2112) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand.dectivateStorageDomain(DeactivateStorageDomainCommand.java:340) [bll.jar:]
    ...
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_292]
    at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
Caused by: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
    at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:50) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedConnectProxyReturnValue(ConnectStoragePoolVDSCommand.java:48) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedProxyReturnValue(ConnectStoragePoolVDSCommand.java:36) [vdsbroker.jar:]
    ...
    at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final]
    at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final]
    at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source) [vdsbroker.jar:]
    ... 94 more

2021-07-29 10:30:37,861-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Failed to deactivate storage domain 'a5a83df1-47e2-4927-9add-079199ca7ef8'

2021-07-29 10:30:37,868-07 INFO [org.ovirt.engine.core.bll.CommandCompensator] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command [id=c63199f8-a720-4053-8e5c-92c8d21e0ce2]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot:{id='StoragePoolIsoMapId:{storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', storageId='a5a83df1-47e2-4927-9add-079199ca7ef8'}', status='Unknown'}.

2021-07-29 10:30:37,882-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: USER_DEACTIVATE_STORAGE_DOMAIN_FAILED(969), Failed to deactivate Storage Domain vm-storage-ssd (Data Center EDC2).

2021-07-29 10:30:37,884-07 WARN [org.ovirt.engine.core.bll.storage.pool.ReconstructMasterDomainCommand] (EE-ManagedThreadFactory-engine-Thread-25) [60d33d] Validation of action 'ReconstructMasterDomain' failed for user SYSTEM. Reasons: VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Locked

2021-07-29 10:30:37,888-07 INFO [org.ovirt.engine.core.bll.eventqueue.EventQueueMonitor] (EE-ManagedThreadFactory-engine-Thread-48) [35c5b47] Finished reconstruct for pool 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'. Clearing event queue

2021-07-29 10:30:37,899-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand' return value ' TaskStatusListReturn:{status='Status [code=654, message=Not SPM]'}

Thanks,
-Matthew
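The same failure is repeated across many entries in the log above, so it can help to reduce it to the distinct VDSM status codes and the storage domain they name. A small Python sketch under stated assumptions: it relies only on the `[code=N, message=...]` and `SD=..., pool=...` shapes visible in these entries, the sample lines are trimmed copies of two of them, and the function names are invented for illustration.

```python
import re

# Shapes taken from the log entries above; not a general engine.log parser.
STATUS_RE = re.compile(r"\[code=(?P<code>\d+), message=(?P<message>.*?)\]")
SD_POOL_RE = re.compile(r"SD=(?P<sd>[0-9a-f-]{36}), pool=(?P<pool>[0-9a-f-]{36})")


def vdsm_statuses(log_lines):
    """Return (code, message) for every line that carries a VDSM status."""
    results = []
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match:
            results.append((int(match.group("code")), match.group("message")))
    return results


def sd_and_pool(message):
    """Return (storage_domain_id, pool_id) named in a message, or None."""
    match = SD_POOL_RE.search(message)
    return (match.group("sd"), match.group("pool")) if match else None


# Trimmed copies of two entries from the log above (prefixes cut for brevity).
sample_lines = [
    "return value 'StatusOnlyReturn [status=Status [code=324, message=Wrong Master "
    "domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, "
    "pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff']]'",
    "return value ' TaskStatusListReturn:{status='Status [code=654, message=Not SPM]'}",
]

for code, message in vdsm_statuses(sample_lines):
    print(code, sd_and_pool(message) or message)
```

In this excerpt that leaves two distinct statuses, 324 (wrong master domain or version) and 654 (not SPM).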

Hi Matthew,

You might need to sync back the master version and domain between the engine and vdsm.
To verify those parameters on vdsm, run this command on the SPM host:

vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"

The result should be something like:

"info": {
    "domains": "1234:Active,5678:Active,91011:Active",
    "isoprefix": "",
    "lver": 6,
    "master_uuid": "123",
    "master_ver": 14,
    "name": "No Description",
    "pool_status": "connected",
    "spm_id": 1,
    "type": "NFS",
    "version": "5"
}

Then, compare the master version value with the engine:

engine=> select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';

And the master domain:

engine=> select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0';

(0 means master, for reference, see https://github.com/oVirt/ovirt-engine/blob/a65cf0eae8858ab2278c3f537dc427e3f... )

Then we can get the bigger picture (and update the engine data to match the vdsm)

Regards,
Shani Leviim
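The comparison described above comes down to two checks: the master domain UUID and the master version. A minimal Python sketch, assuming the `info` section of the getInfo output has been loaded into a dict and the engine-side values have been read by hand from the two queries; `compare_master` and the engine values below are hypothetical and only illustrate the check.

```python
def compare_master(vdsm_info, engine_master_uuid, engine_master_ver):
    """Return a list of human-readable differences (empty when in sync)."""
    differences = []
    if vdsm_info["master_uuid"] != engine_master_uuid:
        differences.append(
            f"master domain: vdsm={vdsm_info['master_uuid']} engine={engine_master_uuid}"
        )
    if int(vdsm_info["master_ver"]) != int(engine_master_ver):
        differences.append(
            f"master version: vdsm={vdsm_info['master_ver']} engine={engine_master_ver}"
        )
    return differences


# Values from the example getInfo output above.
vdsm_info = {"master_uuid": "123", "master_ver": 14}

# Hypothetical engine-side values, made up for illustration only.
for difference in compare_master(vdsm_info, engine_master_uuid="123", engine_master_ver=17):
    print(difference)
```

An empty result would mean both sides agree; any entry names the field that would need to be brought back in sync.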
Thanks Shani - yes we plan to upgrade to 4.4 in the future, but we're on 4.3 right now due to only running CentOS 7 at the moment.
I was able to clear the job from the SPM:
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses { "5fa9edf0-56c3-40e4-9327-47bf7764d28d": { "message": "1 jobs completed successfully", "code": 0, "taskID": "5fa9edf0-56c3-40e4-9327-47bf7764d28d", "taskResult": "success", "taskState": "finished" } } [root@daccs01 ~]# vdsm-client Task clear taskID=5fa9edf0-56c3-40e4-9327-47bf7764d28d true [root@daccs01 ~]# vdsm-client Host getAllTasksStatuses {}
And confirm there were no async_tasks:
engine=# select * from async_tasks; task_id | action_type | status | result | step_id | command_id | started_at | storage_pool_id | task_type | vdsm_task_id | root_command_id | user_id
---------+-------------+--------+--------+---------+------------+------------+-----------------+-----------+--------------+-----------------+--------- (0 rows)
However, when putting the vm-storage-ssd domain into maintenance mode, it failed again:
Here are some the logs entries - anything else I can look at?
2021-07-29 10:30:37,848-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM compute7.pcic.uvic.ca command ConnectStoragePoolVDS failed: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' 2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'org.ovirt.engine.core.vdsbroker.vd sbroker.ConnectStoragePoolVDSCommand' return value 'StatusOnlyReturn [status=Status [code=324, message=Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1- 313fcb70b6ff']]' ... 2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] HostName = compute7.pcic.uvic.ca 2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'ConnectStoragePoolVDSCommand(HostN ame = compute7.pcic.uvic.ca, ConnectStoragePoolVDSCommandParameters:{hostId='51769733-0cf6-4270-8288-ec96474b7609', vdsId='51769733-0cf6-4270-8288-ec96474b7609', storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6 ff', masterVersion='288'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1 -4c1b-a5e1-313fcb70b6ff' ... 
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGener icException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' 2021-07-29 10:30:37,855-07 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] FINISH, DeactivateStorageDomainVDSComm and, return: , log id: 1c215ca4 2021-07-29 10:30:37,855-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] DeactivateStorageDomainVDS failed 'a5a83df 1-47e2-4927-9add-079199ca7ef8': org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' (Failed with error StoragePoolWrongMaster and code 324) at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:] at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2112) [bll.jar:] at org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand.dectivateStorageDomain(DeactivateStorageDomainCommand.java:340) [bll.jar:] ... 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_292] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_292] at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:] Caused by: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:50) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedConnectProxyReturnValue(ConnectStoragePoolVDSCommand.java:48) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedProxyReturnValue(ConnectStoragePoolVDSCommand.java:36) [vdsbroker.jar:] ... at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final] at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final] at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source) [vdsbroker.jar:] ... 94 more
2021-07-29 10:30:37,861-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Failed to deactivate storage domain 'a5a83df1-47e2-4927-9add-079199ca7ef8' 2021-07-29 10:30:37,868-07 INFO [org.ovirt.engine.core.bll.CommandCompensator] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command [id=c63199f8-a720-4053-8e5c-92c8d21e0ce2]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot:{id='StoragePoolIsoMapId:{storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', storageId='a5a83df1-47e2-4927-9add-079199ca7ef8'}', status='Unknown'}. 2021-07-29 10:30:37,882-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: USER_DEACTIVATE_STORAGE_DOMAIN_FAILED(969), Failed to deactivate Storage Domain vm-storage-ssd (Data Center EDC2). 2021-07-29 10:30:37,884-07 WARN [org.ovirt.engine.core.bll.storage.pool.ReconstructMasterDomainCommand] (EE-ManagedThreadFactory-engine-Thread-25) [60d33d] Validation of action 'ReconstructMasterDomain' failed for user SYSTEM. Reasons: VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Locked 2021-07-29 10:30:37,888-07 INFO [org.ovirt.engine.core.bll.eventqueue.EventQueueMonitor] (EE-ManagedThreadFactory-engine-Thread-48) [35c5b47] Finished reconstruct for pool 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'. Clearing event queue 2021-07-29 10:30:37,899-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand' return value ' TaskStatusListReturn:{status='Status [code=654, message=Not SPM]'}
Thanks, -Matthew
On 7/29/21 2:52 AM, Shani Leviim wrote:
Notice: This message was sent from outside the University of Victoria email system. Please be cautious with links and sensitive information.
Hi Matthew, Actually, your description is related to 2 features available for ovirt 4.4.5 <https://www.ovirt.org/release/4.4.5/> 1. The ability to switch the master storage domain while domains are up and running [1] 2. Clearing the finished tasks from REST API [2] and UI [3].
We recommend you upgrade your engine to enjoy those features.
In the meanwhile, as you've described, moving the Master role from one storage to the other is available using putting the domain into maintenance. In order to clear the finished tasks from SPM: vdsm-client Host getAllTasksStatuses
It should be something like that: { "1dc4d885-577a-4b6a-b01f-e682602a907c": { "code": 0, "message": "1 jobs completed successfully", "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c", "taskResult": "success", "taskState": "finished" } }
Then clear that tasks: vdsm-client Task clear taskID=12345 Once it gets cleared, the reconstruction can be finished.
To verify there are no more finished async tasks, you can run this SQL query on the engine: engine=# select * from async_tasks WHERE storage_pool_id = '123';
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1910022 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1627997 [3] https://bugzilla.redhat.com/show_bug.cgi?id=1910302
*Regards, *
*Shani Leviim *
On Thu, Jul 29, 2021 at 8:33 AM Matthew Benstead <matthewb@uvic.ca> wrote:
Hello,
I'm trying to decommission the old master storage domain in ovirt, and replace it with a new one. All of the VMs have been migrated off of the old master, and everything has been running on the new storage domain for a couple months. But when I try to put the old domain into maintenance mode I get an error.
Old Master: vm-storage-ssd New Domain: vm-storage-ssd2
The error is:
Failed to Reconstruct Master Domain for Data Center EDC2
As well as:
Sync Error on Master Domain between Host daccs01 and oVirt Engine. Domain: vm-storage-ssd is marked as Master in oVirt Engine database but not on the Storage side. Please consult with Support on how to fix this issue.
2021-07-28 11:41:34,870-07 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-23) [] Master domain version is not in sync between DB and VDSM. Domain vm-storage-ssd marked as master, but the version in DB: 283 and in VDSM: 280
And:
Not stopping SPM on vds daccs01, pool id f72ec125-69a1-4c1b-a5e1-313fcb70b6ff as there are uncleared tasks Task '5fa9edf0-56c3-40e4-9327-47bf7764d28d', status 'finished'
After a couple minutes all the domains are marked as active again and things continue, but vm-storage-ssd is still listed as the master domain. Any thoughts?
This is on 4.3.10.4-1.el7 on CentOS 7.
engine=# SELECT storage_name, storage_pool_id, storage, status FROM storage_pool_with_storage_domain ORDER BY storage_name; storage_name | storage_pool_id | storage | status
-----------------------+--------------------------------------+----------------------------------------+-------- compute1-iscsi-ssd | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | yvUESE-yWUv-VIWL-qX90-aAq7-gK0I-EqppRL | 1 compute7-iscsi-ssd | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 8ekHdv-u0RJ-B0FO-LUUK-wDWs-iaxb-sh3W3J | 1 export-domain-storage | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | d3932528-6844-481a-bfed-542872ace9e5 | 1 iso-storage | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | f800b7a6-6a0c-4560-8476-2f294412d87d | 1 vm-storage-7200rpm | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | a0bff472-1348-4302-a5c7-f1177efa45a9 | 1 vm-storage-ssd | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0 | 1 vm-storage-ssd2 | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4 | 1 (7 rows)
Thanks,
-Matthew
--
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OXOXW6B2NWXOUG...
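The version drift reported in the IrsProxy warning above can be pulled out of engine.log mechanically. The following is a minimal Python 3 sketch, not part of oVirt: the function name and the regular expression are illustrative assumptions, and the pattern is based only on the wording of the warning quoted in this message, which may differ in other releases.

```python
import re

# Pattern modelled on the warning text quoted in this thread (an assumption:
# other oVirt versions may word the message differently).
MISMATCH_RE = re.compile(
    r"Domain (?P<domain>\S+) marked as master, "
    r"but the version in DB: (?P<db>\d+) and in VDSM: (?P<vdsm>\d+)"
)


def find_master_version_mismatches(lines):
    """Yield (domain, db_version, vdsm_version) for each mismatch warning."""
    for line in lines:
        match = MISMATCH_RE.search(line)
        if match:
            yield (
                match.group("domain"),
                int(match.group("db")),
                int(match.group("vdsm")),
            )


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python3 mismatch.py < engine.log
    for domain, db, vdsm in find_master_version_mismatches(sys.stdin):
        print(f"{domain}: engine DB version {db}, VDSM version {vdsm}")
```

Run against the engine log, this lists every point at which the engine noticed that its master version and the one reported by VDSM had diverged, which helps to see whether the gap is growing with each failed attempt.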

Thanks Shani,

Here's the output from the SPM - it looks like the master version is 288:

[root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
{
    "info": {
        "name": "No Description",
        "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
        "pool_status": "connected",
        "lver": 9356,
        "spm_id": 6,
        "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8",
        "version": "5",
        "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active",
        "type": "GLUSTERFS",
        "master_ver": 288
    },
    "dominfo": {
        "f73307bc-06c8-4996-86d1-78947cdaf6dd": {
            "status": "Attached",
            "isoprefix": "",
            "alerts": []
        },
        "d5ae843b-5815-4f3a-b1be-370e56fe0962": {
            "status": "Active",
            "diskfree": "94847578931200",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "359981635338240",
            "version": 0
        },
        "a5a83df1-47e2-4927-9add-079199ca7ef8": {
            "status": "Active",
            "diskfree": "708888707072",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "751252275200",
            "version": 5
        },
        "311c1382-12c2-43a0-96d0-e2084180b114": {
            "status": "Active",
            "diskfree": "1598335766528",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "2197949513728",
            "version": 5
        },
        "2f2aab43-6ce3-4cb0-9142-b2b57e5083b3": {
            "status": "Active",
            "diskfree": "1416265465856",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "3837687496704",
            "version": 5
        },
        "3fc76134-2143-4921-ad36-ee84abca40e8": {
            "status": "Active",
            "diskfree": "94847578931200",
            "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
            "alerts": [],
            "disktotal": "359981635338240",
            "version": 0
        },
        "fc049ebe-03f9-43fc-adca-d6bfeb99c288": {
            "status": "Active",
            "diskfree": "1337882312704",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "3837687496704",
            "version": 5
        }
    }
}

And the master domain version for the vm-storage-ssd domain is 288 in the database as well:

engine=# select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';
id | name | description | storage_pool_type | storage_pool_format_type | status | master_domain_version | spm_vds_id | compatibility_version | _create_date | _update_date | quota_enforcement_type | free_text_comment | is_local
--------------------------------------+------+-------------+-------------------+--------------------------+--------+-----------------------+--------------------------------------+-----------------------+-------------------------------+-------------------------------+------------------------+-------------------+----------
f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | EDC2 | | | 5 | 1 | 288 | 51769733-0cf6-4270-8288-ec96474b7609 | 4.3 | 2015-08-10 20:51:03.831215-07 | 2021-07-29 10:31:02.234262-07 | 0 | | f
(1 row)

Here are the master storage domain details:

engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0';
id | storage | storage_name | storage_description | storage_comment | storage_pool_id | available_disk_size | confirmed_available_disk_size | vdo_savings | used_disk_size | commited_disk_size | actual_images_size | status | storage_pool_name | storage_type | storage_domain_type | storage_domain_format_type | last_time_used_as_master | wipe_after_delete | discard_after_delete | first_metadata_device | vg_metadata_device | backup | block_size | storage_domain_shared_status | recoverable | contains_unregistered_entities | warning_low_space_indicator | critical_space_action_blocker | warning_low_confirmed_space_indicator | external_status | supports_discard | is_hosted_engine_storage
--------------------------------------+--------------------------------------+----------------+---------------------+-----------------+--------------------------------------+---------------------+-------------------------------+-------------+----------------+--------------------+--------------------+--------+-------------------+--------------+---------------------+----------------------------+--------------------------+-------------------+----------------------+-----------------------+--------------------+--------+------------+------------------------------+-------------+--------------------------------+-----------------------------+-------------------------------+---------------------------------------+-----------------+------------------+--------------------------
a5a83df1-47e2-4927-9add-079199ca7ef8 | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0 | vm-storage-ssd | | | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 660 | | | 39 | 0 | 0 | 3 | EDC2 | 7 | 0 | 5 | 1627497705160 | f | f | | | f | 512 | 1 | t | f | 10 | 5 | 0 | 0 | | f
(1 row)

This is the domain we want to make the new master so I can decommission the old one:

engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_name = 'vm-storage-ssd2';
id | storage | storage_name | storage_description | storage_comment | storage_pool_id | available_disk_size | confirmed_available_disk_size | vdo_savings | used_disk_size | commited_disk_size | actual_images_size | status | storage_pool_name | storage_type | storage_domain_type | storage_domain_format_type | last_time_used_as_master | wipe_after_delete | discard_after_delete | first_metadata_device | vg_metadata_device | backup | block_size | storage_domain_shared_status | recoverable | contains_unregistered_entities | warning_low_space_indicator | critical_space_action_blocker | warning_low_confirmed_space_indicator | external_status | supports_discard | is_hosted_engine_storage
--------------------------------------+--------------------------------------+-----------------+----------------------------+-----------------+--------------------------------------+---------------------+-------------------------------+-------------+----------------+--------------------+--------------------+--------+-------------------+--------------+---------------------+----------------------------+--------------------------+-------------------+----------------------+-----------------------+--------------------+--------+------------+------------------------------+-------------+--------------------------------+-----------------------------+-------------------------------+---------------------------------------+-----------------+------------------+--------------------------
311c1382-12c2-43a0-96d0-e2084180b114 | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4 | vm-storage-ssd2 | Storage01,02,03 vm-storage | | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 1488 | | | 559 | 1147 | 538 | 3 | EDC2 | 7 | 1 | 5 | 1627497694904 | f | f | | | f | 512 | 1 | t | f | 10 | 5 | 10 | 0 | | f
(1 row)

Thanks,
-Matthew

On 8/1/21 2:00 AM, Shani Leviim wrote:
Hi Matthew,
You might need to sync back the master version and domain between the engine and vdsm. To verify those parameters on vdsm, run this command on the SPM host:

vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"

The result should be something like:

"info": {
    "domains": "1234:Active,5678:Active,91011:Active",
    "isoprefix": "",
    "lver": 6,
    "master_uuid": "123",
    "master_ver": 14,
    "name": "No Description",
    "pool_status": "connected",
    "spm_id": 1,
    "type": "NFS",
    "version": "5"
}

Then, compare the master version value with the engine:

engine=> select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';

And the master domain:

engine=> select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0';

(0 means master; for reference, see https://github.com/oVirt/ovirt-engine/blob/a65cf0eae8858ab2278c3f537dc427e3ff20eba7/backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/businessentities/StorageDomainType.java)
Then we can get the bigger picture (and update the engine data to match the vdsm)
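The comparison described above can also be done with a small script rather than by eye. This is a rough Python 3 sketch under stated assumptions: `compare_master` is a hypothetical helper, not an oVirt or VDSM API; it expects the JSON printed by `vdsm-client StoragePool getInfo` pasted in as a string, and the engine-side values (master_domain_version from storage_pool, id from storage_domains) copied in by hand from the queries.

```python
import json


def compare_master(getinfo_json, engine_master_version, engine_master_uuid):
    """Return a list of differences between VDSM and engine (empty if in sync)."""
    info = json.loads(getinfo_json)["info"]
    problems = []

    # Master version as seen by VDSM vs. storage_pool.master_domain_version.
    if int(info["master_ver"]) != int(engine_master_version):
        problems.append(
            f"master version: VDSM={info['master_ver']} engine={engine_master_version}"
        )

    # Master domain UUID as seen by VDSM vs. the storage_domains row of type 0.
    if info["master_uuid"] != engine_master_uuid:
        problems.append(
            f"master domain: VDSM={info['master_uuid']} engine={engine_master_uuid}"
        )

    # "domains" is a comma-separated list of "<uuid>:<status>" pairs.
    statuses = dict(pair.split(":", 1) for pair in info["domains"].split(","))
    if statuses.get(info["master_uuid"]) != "Active":
        problems.append(f"master domain {info['master_uuid']} is not Active in VDSM")

    return problems
```

An empty list means the two sides agree on both the master domain and its version; anything else names the field that needs attention before retrying the maintenance operation.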
Regards,
Shani Leviim
On Thu, Jul 29, 2021 at 8:40 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani - yes we plan to upgrade to 4.4 in the future, but we're on 4.3 right now due to only running CentOS 7 at the moment.
I was able to clear the job from the SPM:
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
{
    "5fa9edf0-56c3-40e4-9327-47bf7764d28d": {
        "message": "1 jobs completed successfully",
        "code": 0,
        "taskID": "5fa9edf0-56c3-40e4-9327-47bf7764d28d",
        "taskResult": "success",
        "taskState": "finished"
    }
}
[root@daccs01 ~]# vdsm-client Task clear taskID=5fa9edf0-56c3-40e4-9327-47bf7764d28d
true
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
{}
And confirm there were no async_tasks:
engine=# select * from async_tasks;
 task_id | action_type | status | result | step_id | command_id | started_at | storage_pool_id | task_type | vdsm_task_id | root_command_id | user_id
---------+-------------+--------+--------+---------+------------+------------+-----------------+-----------+--------------+-----------------+---------
(0 rows)
However, putting the vm-storage-ssd domain into maintenance mode failed again.
Here are some of the log entries - anything else I can look at?
2021-07-29 10:30:37,848-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM compute7.pcic.uvic.ca command ConnectStoragePoolVDS failed: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand' return value 'StatusOnlyReturn [status=Status [code=324, message=Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff']]'
...
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] HostName = compute7.pcic.uvic.ca
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'ConnectStoragePoolVDSCommand(HostName = compute7.pcic.uvic.ca, ConnectStoragePoolVDSCommandParameters:{hostId='51769733-0cf6-4270-8288-ec96474b7609', vdsId='51769733-0cf6-4270-8288-ec96474b7609', storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', masterVersion='288'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
...
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,855-07 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] FINISH, DeactivateStorageDomainVDSCommand, return: , log id: 1c215ca4
2021-07-29 10:30:37,855-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] DeactivateStorageDomainVDS failed 'a5a83df1-47e2-4927-9add-079199ca7ef8': org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' (Failed with error StoragePoolWrongMaster and code 324)
    at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
    at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2112) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand.dectivateStorageDomain(DeactivateStorageDomainCommand.java:340) [bll.jar:]
    ...
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_292]
    at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
Caused by: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
    at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:50) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedConnectProxyReturnValue(ConnectStoragePoolVDSCommand.java:48) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedProxyReturnValue(ConnectStoragePoolVDSCommand.java:36) [vdsbroker.jar:]
    ...
    at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final]
    at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final]
    at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source) [vdsbroker.jar:]
    ... 94 more
2021-07-29 10:30:37,861-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Failed to deactivate storage domain 'a5a83df1-47e2-4927-9add-079199ca7ef8'
2021-07-29 10:30:37,868-07 INFO [org.ovirt.engine.core.bll.CommandCompensator] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command [id=c63199f8-a720-4053-8e5c-92c8d21e0ce2]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot:{id='StoragePoolIsoMapId:{storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', storageId='a5a83df1-47e2-4927-9add-079199ca7ef8'}', status='Unknown'}.
2021-07-29 10:30:37,882-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: USER_DEACTIVATE_STORAGE_DOMAIN_FAILED(969), Failed to deactivate Storage Domain vm-storage-ssd (Data Center EDC2).
2021-07-29 10:30:37,884-07 WARN [org.ovirt.engine.core.bll.storage.pool.ReconstructMasterDomainCommand] (EE-ManagedThreadFactory-engine-Thread-25) [60d33d] Validation of action 'ReconstructMasterDomain' failed for user SYSTEM. Reasons: VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Locked
2021-07-29 10:30:37,888-07 INFO [org.ovirt.engine.core.bll.eventqueue.EventQueueMonitor] (EE-ManagedThreadFactory-engine-Thread-48) [35c5b47] Finished reconstruct for pool 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'. Clearing event queue
2021-07-29 10:30:37,899-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand' return value ' TaskStatusListReturn:{status='Status [code=654, message=Not SPM]'}
Thanks, -Matthew
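When sifting through a long engine.log, it can help to reduce the repeated "Wrong Master domain or its version" errors to the distinct storage domain and pool they name. A minimal Python 3 sketch follows; the helper name and regular expression are assumptions for illustration only, modelled on the message text in the log entries above, and they expect log lines that have not been broken up by mail wrapping.

```python
import re

# Matches the VDSM error text as it appears in the engine log entries above.
# The optional "u" covers the Python 2 unicode prefix seen in those entries.
WRONG_MASTER_RE = re.compile(
    r"Wrong Master domain or its version: "
    r"u?'SD=(?P<sd>[0-9a-f-]+), pool=(?P<pool>[0-9a-f-]+)'"
)


def wrong_master_pairs(log_text):
    """Return the distinct (storage_domain_id, pool_id) pairs named in the errors."""
    pairs = {
        (match.group("sd"), match.group("pool"))
        for match in WRONG_MASTER_RE.finditer(log_text)
    }
    return sorted(pairs)
```

If every error names the same storage domain, as it does in the entries above, the failure is consistently about the current master rather than about the domain intended to replace it.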
On 7/29/21 2:52 AM, Shani Leviim wrote:
Hi Matthew,

Actually, your description is related to 2 features available for oVirt 4.4.5 (https://www.ovirt.org/release/4.4.5/):
1. The ability to switch the master storage domain while domains are up and running [1]
2. Clearing the finished tasks from REST API [2] and UI [3].
We recommend you upgrade your engine to enjoy those features.
In the meantime, as you've described, the Master role can be moved from one storage domain to another by putting the current master domain into maintenance. To clear the finished tasks from the SPM, first list them:

vdsm-client Host getAllTasksStatuses

The output should look something like this:

{
    "1dc4d885-577a-4b6a-b01f-e682602a907c": {
        "code": 0,
        "message": "1 jobs completed successfully",
        "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c",
        "taskResult": "success",
        "taskState": "finished"
    }
}

Then clear those tasks:

vdsm-client Task clear taskID=12345

Once they are cleared, the reconstruction can finish.

To verify there are no more finished async tasks, you can run this SQL query on the engine:

engine=# select * from async_tasks WHERE storage_pool_id = '123';
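If several finished tasks are left on the SPM, the clear commands can be generated from the task listing instead of being typed by hand. The sketch below is only an illustration in Python 3: `clear_commands` is a hypothetical helper, it assumes the JSON printed by `vdsm-client Host getAllTasksStatuses` has been saved or pasted as a string, and it only prints the `vdsm-client Task clear` commands for review rather than running them.

```python
import json


def clear_commands(tasks_json):
    """Build one 'Task clear' command per task whose state is 'finished'."""
    tasks = json.loads(tasks_json)
    return [
        f"vdsm-client Task clear taskID={task_id}"
        for task_id, status in sorted(tasks.items())
        if status.get("taskState") == "finished"
    ]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): vdsm-client Host getAllTasksStatuses | python3 clear_tasks.py
    for command in clear_commands(sys.stdin.read()):
        print(command)
```

Tasks in any other state are deliberately skipped, so the output is empty when `getAllTasksStatuses` returns `{}`.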
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1910022
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1627997
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1910302
Regards,
Shani Leviim

Thanks, Matthew So it seems that both vdsm and engine are aligned with that data. Does your env still fail with reconstruction errors? Once it's "stabilized" (so both domains are up and available), can you please try again putting the current master into maintenance and share the results? *Regards,* *Shani Leviim* On Tue, Aug 3, 2021 at 8:29 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani,
Here's the output from the SPM - it looks like the master version is 288:
[root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff" { "info": { "name": "No Description", "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
"pool_status": "connected", "lver": 9356, "spm_id": 6, "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8", "version": "5", "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active",
"type": "GLUSTERFS", "master_ver": 288 }, "dominfo": { "f73307bc-06c8-4996-86d1-78947cdaf6dd": { "status": "Attached", "isoprefix": "", "alerts": [] }, "d5ae843b-5815-4f3a-b1be-370e56fe0962": { "status": "Active", "diskfree": "94847578931200", "isoprefix": "", "alerts": [], "disktotal": "359981635338240", "version": 0 }, "a5a83df1-47e2-4927-9add-079199ca7ef8": { "status": "Active", "diskfree": "708888707072", "isoprefix": "", "alerts": [], "disktotal": "751252275200", "version": 5 }, "311c1382-12c2-43a0-96d0-e2084180b114": { "status": "Active", "diskfree": "1598335766528", "isoprefix": "", "alerts": [], "disktotal": "2197949513728", "version": 5 }, "2f2aab43-6ce3-4cb0-9142-b2b57e5083b3": { "status": "Active", "diskfree": "1416265465856", "isoprefix": "", "alerts": [], "disktotal": "3837687496704", "version": 5 }, "3fc76134-2143-4921-ad36-ee84abca40e8": { "status": "Active", "diskfree": "94847578931200", "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
"alerts": [], "disktotal": "359981635338240", "version": 0 }, "fc049ebe-03f9-43fc-adca-d6bfeb99c288": { "status": "Active", "diskfree": "1337882312704", "isoprefix": "", "alerts": [], "disktotal": "3837687496704", "version": 5 } } }
And the master domain version for the vm-storage-ssd domain is 288 in the database as well:
engine=# select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'; id | name | description | storage_pool_type | storage_pool_format_type | status | master_domain_version | spm_vds_id | compatibility_version | _create_date | _update_date | quota_enforcement_type | free_text_comment | is_local
--------------------------------------+------+-------------+-------------------+--------------------------+--------+-----------------------+--------------------------------------+-----------------------+-------------------------------+-------------------------------+------------------------+-------------------+---------- f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | EDC2 | | | 5 | 1 | 288 | 51769733-0cf6-4270-8288-ec96474b7609 | 4.3 | 2015-08-10 20:51:03.831215-07 | 2021-07-29 10:31:02.234262-07 | 0 | | f (1 row)
Here is the master storage domain details: engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0'; id | storage | storage_name | storage_description | storage_comment | storage_pool_id | available_disk_size | confirmed_available_disk_size | vdo_savings | used_disk_size | commited_disk_size | actual_images_size | status | storage_pool_name | storage_type | storage_domain_type | storage_domain_format_type | last_time_used_as_master | wipe_after_delete | discard_after_delete | first_metadata_device | vg_metadata_device | backup | block_size | storage_domain_shared_status | recoverable | contains_unregistered_entities | warning_low_space_indicator | critical_space_action_blocker | warning_low_confirmed_space_indicator | external_status | supports_discard | is_hosted_engine_storage
--------------------------------------+--------------------------------------+----------------+---------------------+-----------------+--------------------------------------+---------------------+-------------------------------+-------------+----------------+--------------------+--------------------+--------+-------------------+--------------+---------------------+----------------------------+--------------------------+-------------------+----------------------+-----------------------+--------------------+--------+------------+------------------------------+-------------+--------------------------------+-----------------------------+-------------------------------+---------------------------------------+-----------------+------------------+-------------------------- a5a83df1-47e2-4927-9add-079199ca7ef8 | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0 | vm-storage-ssd | | | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 660 | | | 39 | 0 | 0 | 3 | EDC2 | 7 | 0 | 5 | 1627497705160 | f | f | | | f | 512 | 1 | t | f | 10 | 5 | 0 | 0 | | f (1 row)
This is the domain we want to switch over to the master domain so I can decommission the old one.
engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_name = 'vm-storage-ssd2'; id | storage | storage_name | storage_description | storage_comment | storage_pool_id | available_disk_size | confirmed_available_disk_size | vdo_savings | used_disk_size | commited_disk_size | actual_images_size | status | storage_pool_name | storage_type | storage_domain_type | storage_domain_format_type | last_time_used_as_master | wipe_after_delete | discard_after_delete | first_metadata_device | vg_metadata_device | backup | block_size | storage_domain_shared_status | recoverable | contains_unregistered_entities | warning_low_space_indicator | critical_space_action_blocker | warning_low_confirmed_space_indicator | external_status | supports_discard | is_hosted_engine_storage
--------------------------------------+--------------------------------------+-----------------+----------------------------+-----------------+--------------------------------------+---------------------+-------------------------------+-------------+----------------+--------------------+--------------------+--------+-------------------+--------------+---------------------+----------------------------+------------------
--------+-------------------+----------------------+-----------------------+--------------------+--------+------------+------------------------------+-------------+--------------------------------+-----------------------------+-------------------------------+---------------------------------------+-----------------+------------------+-------------------------- 311c1382-12c2-43a0-96d0-e2084180b114 | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4 | vm-storage-ssd2 | Storage01,02,03 vm-storage | | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 1488 | | | 559 | 1147 | 538 | 3 | EDC2 | 7 | 1 | 5 | 1627497694904 | f | f | | | f | 512 | 1 | t | f | 10 | 5 | 10 | 0 | | f (1 row)
Thanks, -Matthew
On 8/1/21 2:00 AM, Shani Leviim wrote:
Notice: This message was sent from outside the University of Victoria email system. Please be cautious with links and sensitive information.
Hi Matthew,
You might need to sync back the master version and domain between the engine and vdsm. To verify those parameters on vdsm, run this command on the SPM host: vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
The result should be something like: "info": { "domains": "1234:Active,5678:Active,91011:Active", "isoprefix": "", "lver": 6,
* "master_uuid": "123", "master_ver": 14,* "name": "No Description", "pool_status": "connected", "spm_id": 1, "type": "NFS", "version": "5" }
Then, compare the master version value with the engine: engine=> select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';
And the master domain: engine=> select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0';
(0 means master, for reference, see https://github.com/oVirt/ovirt-engine/blob/a65cf0eae8858ab2278c3f537dc427e3f... )
Then we can get the bigger picture (and update the engine data to match the vdsm)
*Regards, *
*Shani Leviim *
On Thu, Jul 29, 2021 at 8:40 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani - yes we plan to upgrade to 4.4 in the future, but we're on 4.3 right now due to only running CentOS 7 at the moment.
I was able to clear the job from the SPM:
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses { "5fa9edf0-56c3-40e4-9327-47bf7764d28d": { "message": "1 jobs completed successfully", "code": 0, "taskID": "5fa9edf0-56c3-40e4-9327-47bf7764d28d", "taskResult": "success", "taskState": "finished" } } [root@daccs01 ~]# vdsm-client Task clear taskID=5fa9edf0-56c3-40e4-9327-47bf7764d28d true [root@daccs01 ~]# vdsm-client Host getAllTasksStatuses {}
And confirm there were no async_tasks:
engine=# select * from async_tasks; task_id | action_type | status | result | step_id | command_id | started_at | storage_pool_id | task_type | vdsm_task_id | root_command_id | user_id
---------+-------------+--------+--------+---------+------------+------------+-----------------+-----------+--------------+-----------------+--------- (0 rows)
However, when putting the vm-storage-ssd domain into maintenance mode, it failed again:
Here are some the logs entries - anything else I can look at?
2021-07-29 10:30:37,848-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM compute7.pcic.uvic.ca command ConnectStoragePoolVDS failed: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand' return value 'StatusOnlyReturn [status=Status [code=324, message=Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff']]'
...
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] HostName = compute7.pcic.uvic.ca
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'ConnectStoragePoolVDSCommand(HostName = compute7.pcic.uvic.ca, ConnectStoragePoolVDSCommandParameters:{hostId='51769733-0cf6-4270-8288-ec96474b7609', vdsId='51769733-0cf6-4270-8288-ec96474b7609', storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', masterVersion='288'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
...
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,855-07 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] FINISH, DeactivateStorageDomainVDSCommand, return: , log id: 1c215ca4
2021-07-29 10:30:37,855-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] DeactivateStorageDomainVDS failed 'a5a83df1-47e2-4927-9add-079199ca7ef8': org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' (Failed with error StoragePoolWrongMaster and code 324)
    at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
    at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2112) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand.dectivateStorageDomain(DeactivateStorageDomainCommand.java:340) [bll.jar:]
...
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_292] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_292] at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:] Caused by: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:50) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedConnectProxyReturnValue(ConnectStoragePoolVDSCommand.java:48) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedProxyReturnValue(ConnectStoragePoolVDSCommand.java:36) [vdsbroker.jar:] ... at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final] at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final] at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source) [vdsbroker.jar:] ... 94 more
2021-07-29 10:30:37,861-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Failed to deactivate storage domain 'a5a83df1-47e2-4927-9add-079199ca7ef8' 2021-07-29 10:30:37,868-07 INFO [org.ovirt.engine.core.bll.CommandCompensator] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command [id=c63199f8-a720-4053-8e5c-92c8d21e0ce2]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot:{id='StoragePoolIsoMapId:{storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', storageId='a5a83df1-47e2-4927-9add-079199ca7ef8'}', status='Unknown'}. 2021-07-29 10:30:37,882-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: USER_DEACTIVATE_STORAGE_DOMAIN_FAILED(969), Failed to deactivate Storage Domain vm-storage-ssd (Data Center EDC2). 2021-07-29 10:30:37,884-07 WARN [org.ovirt.engine.core.bll.storage.pool.ReconstructMasterDomainCommand] (EE-ManagedThreadFactory-engine-Thread-25) [60d33d] Validation of action 'ReconstructMasterDomain' failed for user SYSTEM. Reasons: VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Locked 2021-07-29 10:30:37,888-07 INFO [org.ovirt.engine.core.bll.eventqueue.EventQueueMonitor] (EE-ManagedThreadFactory-engine-Thread-48) [35c5b47] Finished reconstruct for pool 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'. Clearing event queue 2021-07-29 10:30:37,899-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand' return value ' TaskStatusListReturn:{status='Status [code=654, message=Not SPM]'}
Thanks, -Matthew
On 7/29/21 2:52 AM, Shani Leviim wrote:
Notice: This message was sent from outside the University of Victoria email system. Please be cautious with links and sensitive information.
Hi Matthew,
Actually, your description is related to 2 features available in oVirt 4.4.5 <https://www.ovirt.org/release/4.4.5/>:
1. The ability to switch the master storage domain while domains are up and running [1]
2. Clearing the finished tasks from REST API [2] and UI [3].
We recommend you upgrade your engine to enjoy those features.
In the meantime, as you've described, the master role can be moved from one storage domain to another by putting the current master domain into maintenance.
In order to clear the finished tasks from the SPM, first list them:
vdsm-client Host getAllTasksStatuses
The output should look something like this:
{
    "1dc4d885-577a-4b6a-b01f-e682602a907c": {
        "code": 0,
        "message": "1 jobs completed successfully",
        "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c",
        "taskResult": "success",
        "taskState": "finished"
    }
}
Then clear that task:
vdsm-client Task clear taskID=12345
Once it gets cleared, the reconstruction can be finished.
To verify there are no more finished async tasks, you can run this SQL query on the engine:
engine=# select * from async_tasks WHERE storage_pool_id = '123';
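For anyone scripting the cleanup step above, here is a minimal Python sketch. It assumes the output of `vdsm-client Host getAllTasksStatuses` is a flat JSON object keyed by task ID, as in the sample shown in this thread. The helper name `finished_task_ids` is hypothetical, and the script only prints the `vdsm-client Task clear` commands so they can be reviewed before being run on the SPM host.

```python
import json
from typing import List


def finished_task_ids(statuses_json: str) -> List[str]:
    """Return the IDs of tasks whose taskState is 'finished'.

    Assumption: the input is the JSON printed by
    `vdsm-client Host getAllTasksStatuses`, keyed by task ID.
    """
    statuses = json.loads(statuses_json)
    return [
        task_id
        for task_id, info in statuses.items()
        if info.get("taskState") == "finished"
    ]


# Sample taken from the output format quoted in this thread.
sample = """
{
    "1dc4d885-577a-4b6a-b01f-e682602a907c": {
        "code": 0,
        "message": "1 jobs completed successfully",
        "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c",
        "taskResult": "success",
        "taskState": "finished"
    }
}
"""

# Print the commands instead of executing them, so nothing is cleared by accident.
for task_id in finished_task_ids(sample):
    print(f"vdsm-client Task clear taskID={task_id}")
```

Tasks in any other state are left alone on purpose; only tasks reported as finished are candidates for clearing.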
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1910022 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1627997 [3] https://bugzilla.redhat.com/show_bug.cgi?id=1910302
*Regards, *
*Shani Leviim *
On Thu, Jul 29, 2021 at 8:33 AM Matthew Benstead <matthewb@uvic.ca> wrote:
Hello,
I'm trying to decommission the old master storage domain in ovirt, and replace it with a new one. All of the VMs have been migrated off of the old master, and everything has been running on the new storage domain for a couple months. But when I try to put the old domain into maintenance mode I get an error.
Old Master: vm-storage-ssd New Domain: vm-storage-ssd2
The error is:
Failed to Reconstruct Master Domain for Data Center EDC2
As well as:
Sync Error on Master Domain between Host daccs01 and oVirt Engine. Domain: vm-storage-ssd is marked as Master in oVirt Engine database but not on the Storage side. Please consult with Support on how to fix this issue.
2021-07-28 11:41:34,870-07 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-23) [] Master domain version is not in sync between DB and VDSM. Domain vm-storage-ssd marked as master, but the version in DB: 283 and in VDSM: 280
And:
Not stopping SPM on vds daccs01, pool id f72ec125-69a1-4c1b-a5e1-313fcb70b6ff as there are uncleared tasks Task '5fa9edf0-56c3-40e4-9327-47bf7764d28d', status 'finished'
After a couple minutes all the domains are marked as active again and things continue, but vm-storage-ssd is still listed as the master domain. Any thoughts?
This is on 4.3.10.4-1.el7 on CentOS 7.
engine=# SELECT storage_name, storage_pool_id, storage, status FROM storage_pool_with_storage_domain ORDER BY storage_name; storage_name | storage_pool_id | storage | status
-----------------------+--------------------------------------+----------------------------------------+-------- compute1-iscsi-ssd | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | yvUESE-yWUv-VIWL-qX90-aAq7-gK0I-EqppRL | 1 compute7-iscsi-ssd | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 8ekHdv-u0RJ-B0FO-LUUK-wDWs-iaxb-sh3W3J | 1 export-domain-storage | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | d3932528-6844-481a-bfed-542872ace9e5 | 1 iso-storage | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | f800b7a6-6a0c-4560-8476-2f294412d87d | 1 vm-storage-7200rpm | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | a0bff472-1348-4302-a5c7-f1177efa45a9 | 1 vm-storage-ssd | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0 | 1 vm-storage-ssd2 | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4 | 1 (7 rows)
Thanks, -Matthew -- _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OXOXW6B2NWXOUG...

Thanks Shani - Unfortunately it still fails.
I had been hoping to make the change without an outage, but if I shut down the VMs and put all the storage domains into maintenance mode, and then activate the new storage domain (the one I want to be the master) first, would that work? Or is there some kind of transfer that needs to take place?
Before:
During:
Errors:
Thanks, -Matthew
On 8/4/21 2:46 AM, Shani Leviim wrote:
Thanks, Matthew.
So it seems that both vdsm and the engine agree on that data. Does your env still fail with reconstruction errors?
Once it's "stabilized" (so both domains are up and available), can you please try again putting the current master into maintenance and share the results?
*Regards, * *Shani Leviim *
On Tue, Aug 3, 2021 at 8:29 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani,
Here's the output from the SPM - it looks like the master version is 288:
[root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff" { "info": { "name": "No Description", "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111", "pool_status": "connected", "lver": 9356, "spm_id": 6, "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8", "version": "5", "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active", "type": "GLUSTERFS", "master_ver": 288 }, "dominfo": { "f73307bc-06c8-4996-86d1-78947cdaf6dd": { "status": "Attached", "isoprefix": "", "alerts": [] }, "d5ae843b-5815-4f3a-b1be-370e56fe0962": { "status": "Active", "diskfree": "94847578931200", "isoprefix": "", "alerts": [], "disktotal": "359981635338240", "version": 0 }, "a5a83df1-47e2-4927-9add-079199ca7ef8": { "status": "Active", "diskfree": "708888707072", "isoprefix": "", "alerts": [], "disktotal": "751252275200", "version": 5 }, "311c1382-12c2-43a0-96d0-e2084180b114": { "status": "Active", "diskfree": "1598335766528", "isoprefix": "", "alerts": [], "disktotal": "2197949513728", "version": 5 }, "2f2aab43-6ce3-4cb0-9142-b2b57e5083b3": { "status": "Active", "diskfree": "1416265465856", "isoprefix": "", "alerts": [], "disktotal": "3837687496704", "version": 5 }, "3fc76134-2143-4921-ad36-ee84abca40e8": { "status": "Active", "diskfree": "94847578931200", "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111", "alerts": [], "disktotal": "359981635338240", "version": 0 }, "fc049ebe-03f9-43fc-adca-d6bfeb99c288": { "status": "Active", 
"diskfree": "1337882312704", "isoprefix": "", "alerts": [], "disktotal": "3837687496704", "version": 5 } } }
And the master domain version for the vm-storage-ssd domain is 288 in the database as well:
engine=# select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'; id | name | description | storage_pool_type | storage_pool_format_type | status | master_domain_version | spm_vds_id | compatibility_version | _create_date | _update_date | quota_enforcement_type | free_text_comment | is_local --------------------------------------+------+-------------+-------------------+--------------------------+--------+-----------------------+--------------------------------------+-----------------------+-------------------------------+-------------------------------+------------------------+-------------------+---------- f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | EDC2 | | | 5 | 1 | 288 | 51769733-0cf6-4270-8288-ec96474b7609 | 4.3 | 2015-08-10 20:51:03.831215-07 | 2021-07-29 10:31:02.234262-07 | 0 | | f (1 row)
Here is the master storage domain details: engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0'; id | storage | storage_name | storage_description | storage_comment | storage_pool_id | available_disk_size | confirmed_available_disk_size | vdo_savings | used_disk_size | commited_disk_size | actual_images_size | status | storage_pool_name | storage_type | storage_domain_type | storage_domain_format_type | last_time_used_as_master | wipe_after_delete | discard_after_delete | first_metadata_device | vg_metadata_device | backup | block_size | storage_domain_shared_status | recoverable | contains_unregistered_entities | warning_low_space_indicator | critical_space_action_blocker | warning_low_confirmed_space_indicator | external_status | supports_discard | is_hosted_engine_storage --------------------------------------+--------------------------------------+----------------+---------------------+-----------------+--------------------------------------+---------------------+-------------------------------+-------------+----------------+--------------------+--------------------+--------+-------------------+--------------+---------------------+----------------------------+--------------------------+-------------------+----------------------+-----------------------+--------------------+--------+------------+------------------------------+-------------+--------------------------------+-----------------------------+-------------------------------+---------------------------------------+-----------------+------------------+-------------------------- a5a83df1-47e2-4927-9add-079199ca7ef8 | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0 | vm-storage-ssd | | | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 660 | | | 39 | 0 | 0 | 3 | EDC2 | 7 | 0 | 5 | 1627497705160 | f | f | | | f | 512 | 1 | t | f | 10 | 5 | 0 | 0 | | f (1 row)
This is the domain we want to make the new master, so that I can decommission the old one.
engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_name = 'vm-storage-ssd2'; id | storage | storage_name | storage_description | storage_comment | storage_pool_id | available_disk_size | confirmed_available_disk_size | vdo_savings | used_disk_size | commited_disk_size | actual_images_size | status | storage_pool_name | storage_type | storage_domain_type | storage_domain_format_type | last_time_used_as_master | wipe_after_delete | discard_after_delete | first_metadata_device | vg_metadata_device | backup | block_size | storage_domain_shared_status | recoverable | contains_unregistered_entities | warning_low_space_indicator | critical_space_action_blocker | warning_low_confirmed_space_indicator | external_status | supports_discard | is_hosted_engine_storage --------------------------------------+--------------------------------------+-----------------+----------------------------+-----------------+--------------------------------------+---------------------+-------------------------------+-------------+----------------+--------------------+--------------------+--------+-------------------+--------------+---------------------+----------------------------+------------------ --------+-------------------+----------------------+-----------------------+--------------------+--------+------------+------------------------------+-------------+--------------------------------+-----------------------------+-------------------------------+---------------------------------------+-----------------+------------------+-------------------------- 311c1382-12c2-43a0-96d0-e2084180b114 | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4 | vm-storage-ssd2 | Storage01,02,03 vm-storage | | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 1488 | | | 559 | 1147 | 538 | 3 | EDC2 | 7 | 1 | 5 | 1627497694904 | f | f | | | f | 512 | 1 | t | f | 10 | 5 | 10 | 0 | | f (1 row)
Thanks, -Matthew
On 8/1/21 2:00 AM, Shani Leviim wrote:
Hi Matthew,
You might need to sync back the master version and domain between the engine and vdsm. To verify those parameters on vdsm, run this command on the SPM host: vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
The result should be something like: "info": { "domains": "1234:Active,5678:Active,91011:Active", "isoprefix": "", "lver": 6, * "master_uuid": "123", "master_ver": 14,* "name": "No Description", "pool_status": "connected", "spm_id": 1, "type": "NFS", "version": "5" }
Then, compare the master version value with the engine: engine=> select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';
And the master domain: engine=> select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0';
(0 means master; for reference, see https://github.com/oVirt/ovirt-engine/blob/a65cf0eae8858ab2278c3f537dc427e3ff20eba7/backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/businessentities/StorageDomainType.java)
Then we can get the bigger picture (and update the engine data to match the vdsm)
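The comparison described above can be sketched in Python. This is only an illustration under stated assumptions: the JSON is the output of `vdsm-client StoragePool getInfo` with a top-level "info" object, and the two database values are copied by hand from the `master_domain_version` column of `storage_pool` and the `id` column of the `storage_domains` row with `storage_domain_type='0'`. The function name `master_mismatches` is hypothetical.

```python
import json
from typing import List


def master_mismatches(getinfo_json: str,
                      db_master_version: int,
                      db_master_domain_id: str) -> List[str]:
    """Compare the master version and master domain reported by vdsm
    with the values read from the engine database.

    Returns a list of human-readable mismatch descriptions
    (empty when both sides agree).
    """
    info = json.loads(getinfo_json)["info"]
    problems = []
    if info["master_ver"] != db_master_version:
        problems.append(
            f"master version: vdsm={info['master_ver']} db={db_master_version}"
        )
    if info["master_uuid"] != db_master_domain_id:
        problems.append(
            f"master domain: vdsm={info['master_uuid']} db={db_master_domain_id}"
        )
    return problems


# Placeholder values in the style of the sample output above.
sample = '{"info": {"master_uuid": "123", "master_ver": 14}}'

for problem in master_mismatches(sample, 13, "123"):
    print("MISMATCH:", problem)
```

An empty result means vdsm and the engine agree on both values, in which case the failure has to be looked for elsewhere.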
*Regards, * *Shani Leviim *
On Thu, Jul 29, 2021 at 8:40 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani - yes we plan to upgrade to 4.4 in the future, but we're on 4.3 right now due to only running CentOS 7 at the moment.
I was able to clear the job from the SPM:
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
{
    "5fa9edf0-56c3-40e4-9327-47bf7764d28d": {
        "message": "1 jobs completed successfully",
        "code": 0,
        "taskID": "5fa9edf0-56c3-40e4-9327-47bf7764d28d",
        "taskResult": "success",
        "taskState": "finished"
    }
}
[root@daccs01 ~]# vdsm-client Task clear taskID=5fa9edf0-56c3-40e4-9327-47bf7764d28d
true
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
{}
And confirm there were no async_tasks:
engine=# select * from async_tasks;
 task_id | action_type | status | result | step_id | command_id | started_at | storage_pool_id | task_type | vdsm_task_id | root_command_id | user_id
---------+-------------+--------+--------+---------+------------+------------+-----------------+-----------+--------------+-----------------+---------
(0 rows)
However, when putting the vm-storage-ssd domain into maintenance mode, it failed again:
Here are some of the log entries - anything else I can look at?
2021-07-29 10:30:37,848-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM compute7.pcic.uvic.ca command ConnectStoragePoolVDS failed: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand' return value 'StatusOnlyReturn [status=Status [code=324, message=Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff']]'
...
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] HostName = compute7.pcic.uvic.ca
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'ConnectStoragePoolVDSCommand(HostName = compute7.pcic.uvic.ca, ConnectStoragePoolVDSCommandParameters:{hostId='51769733-0cf6-4270-8288-ec96474b7609', vdsId='51769733-0cf6-4270-8288-ec96474b7609', storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', masterVersion='288'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
...
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,855-07 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] FINISH, DeactivateStorageDomainVDSCommand, return: , log id: 1c215ca4
2021-07-29 10:30:37,855-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] DeactivateStorageDomainVDS failed 'a5a83df1-47e2-4927-9add-079199ca7ef8': org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' (Failed with error StoragePoolWrongMaster and code 324)
    at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
    at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2112) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand.dectivateStorageDomain(DeactivateStorageDomainCommand.java:340) [bll.jar:]
...
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_292] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_292] at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:] Caused by: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:50) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedConnectProxyReturnValue(ConnectStoragePoolVDSCommand.java:48) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedProxyReturnValue(ConnectStoragePoolVDSCommand.java:36) [vdsbroker.jar:] ... at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final] at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final] at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source) [vdsbroker.jar:] ... 94 more
2021-07-29 10:30:37,861-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Failed to deactivate storage domain 'a5a83df1-47e2-4927-9add-079199ca7ef8' 2021-07-29 10:30:37,868-07 INFO [org.ovirt.engine.core.bll.CommandCompensator] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command [id=c63199f8-a720-4053-8e5c-92c8d21e0ce2]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot:{id='StoragePoolIsoMapId:{storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', storageId='a5a83df1-47e2-4927-9add-079199ca7ef8'}', status='Unknown'}. 2021-07-29 10:30:37,882-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: USER_DEACTIVATE_STORAGE_DOMAIN_FAILED(969), Failed to deactivate Storage Domain vm-storage-ssd (Data Center EDC2). 2021-07-29 10:30:37,884-07 WARN [org.ovirt.engine.core.bll.storage.pool.ReconstructMasterDomainCommand] (EE-ManagedThreadFactory-engine-Thread-25) [60d33d] Validation of action 'ReconstructMasterDomain' failed for user SYSTEM. Reasons: VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Locked 2021-07-29 10:30:37,888-07 INFO [org.ovirt.engine.core.bll.eventqueue.EventQueueMonitor] (EE-ManagedThreadFactory-engine-Thread-48) [35c5b47] Finished reconstruct for pool 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'. Clearing event queue 2021-07-29 10:30:37,899-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand' return value ' TaskStatusListReturn:{status='Status [code=654, message=Not SPM]'}
Thanks, -Matthew
On 7/29/21 2:52 AM, Shani Leviim wrote:
Hi Matthew,
Actually, your description is related to 2 features available in oVirt 4.4.5 <https://www.ovirt.org/release/4.4.5/>:
1. The ability to switch the master storage domain while domains are up and running [1]
2. Clearing the finished tasks from REST API [2] and UI [3].
We recommend you upgrade your engine to enjoy those features.
In the meantime, as you've described, the master role can be moved from one storage domain to another by putting the current master domain into maintenance.
In order to clear the finished tasks from the SPM, first list them:
vdsm-client Host getAllTasksStatuses
The output should look something like this:
{
    "1dc4d885-577a-4b6a-b01f-e682602a907c": {
        "code": 0,
        "message": "1 jobs completed successfully",
        "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c",
        "taskResult": "success",
        "taskState": "finished"
    }
}
Then clear that task:
vdsm-client Task clear taskID=12345
Once it gets cleared, the reconstruction can be finished.
To verify there are no more finished async tasks, you can run this SQL query on the engine:
engine=# select * from async_tasks WHERE storage_pool_id = '123';
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1910022 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1627997 [3] https://bugzilla.redhat.com/show_bug.cgi?id=1910302
*Regards, * *Shani Leviim *
On Thu, Jul 29, 2021 at 8:33 AM Matthew Benstead <matthewb@uvic.ca> wrote:
Hello,
I'm trying to decommission the old master storage domain in ovirt, and replace it with a new one. All of the VMs have been migrated off of the old master, and everything has been running on the new storage domain for a couple months. But when I try to put the old domain into maintenance mode I get an error.
Old Master: vm-storage-ssd New Domain: vm-storage-ssd2
The error is:
Failed to Reconstruct Master Domain for Data Center EDC2
As well as:
Sync Error on Master Domain between Host daccs01 and oVirt Engine. Domain: vm-storage-ssd is marked as Master in oVirt Engine database but not on the Storage side. Please consult with Support on how to fix this issue.
2021-07-28 11:41:34,870-07 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-23) [] Master domain version is not in sync between DB and VDSM. Domain vm-storage-ssd marked as master, but the version in DB: 283 and in VDSM: 280
And:
Not stopping SPM on vds daccs01, pool id f72ec125-69a1-4c1b-a5e1-313fcb70b6ff as there are uncleared tasks Task '5fa9edf0-56c3-40e4-9327-47bf7764d28d', status 'finished'
After a couple minutes all the domains are marked as active again and things continue, but vm-storage-ssd is still listed as the master domain. Any thoughts?
This is on 4.3.10.4-1.el7 on CentOS 7.
engine=# SELECT storage_name, storage_pool_id, storage, status FROM storage_pool_with_storage_domain ORDER BY storage_name; storage_name | storage_pool_id | storage | status -----------------------+--------------------------------------+----------------------------------------+-------- compute1-iscsi-ssd | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | yvUESE-yWUv-VIWL-qX90-aAq7-gK0I-EqppRL | 1 compute7-iscsi-ssd | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 8ekHdv-u0RJ-B0FO-LUUK-wDWs-iaxb-sh3W3J | 1 export-domain-storage | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | d3932528-6844-481a-bfed-542872ace9e5 | 1 iso-storage | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | f800b7a6-6a0c-4560-8476-2f294412d87d | 1 vm-storage-7200rpm | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | a0bff472-1348-4302-a5c7-f1177efa45a9 | 1 vm-storage-ssd | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0 | 1 vm-storage-ssd2 | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4 | 1 (7 rows)
Thanks, -Matthew

The master domain should be up first.
I noticed this in the output from your previous response:
[root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
{ "info": { "name": "No Description", "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111", "pool_status": "connected", "lver": 9356, "spm_id": 6, "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8", "version": "5", "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active", *"type": "GLUSTERFS",* "master_ver": 288 },
There's a bug regarding putting GLUSTER domains into maintenance in order to move the master role: https://bugzilla.redhat.com/show_bug.cgi?id=1913764.
If you can see this error in vdsm.log, your case matches the bug:
(tasks/2) [storage.StoragePool] Migration to new master 2a0d3c24-3357-4677-b9b2-35486af464a3 failed (sp:903)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 891, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 71, in tarCopy
    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
vdsm.storage.fileUtils.TarCopyFailed: (1, 0, b'', b'')
There's a chance you can switch the master role to one of the iSCSI domains instead: put vm-storage-ssd2 and all other GLUSTER domains into maintenance, so that only the iSCSI domains remain active. Then try to put vm-storage-ssd into maintenance.
*Regards,*
*Shani Leviim*
On Wed, Aug 4, 2021 at 7:00 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani - Unfortunately it still fails. I had been hoping to make the change without an outage, but if I shut down the VMs, put all the storage domains into maintenance mode, and then activate the new storage domain (the one I want to be the master) first, would that work? Or is there some kind of transfer that needs to take place?
[Screenshots from the original message (Before, During, Errors) are not included in the archive.]
Thanks, -Matthew
On 8/4/21 2:46 AM, Shani Leviim wrote:
Thanks, Matthew.
So it seems that both vdsm and the engine are aligned on that data. Does your environment still fail with reconstruction errors?
Once it's "stabilized" (so both domains are up and available), can you please try putting the current master into maintenance again and share the results?
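One way to confirm that every domain is up before retrying is to look at the `domains` field of the `vdsm-client StoragePool getInfo` output quoted in this thread. The following is only a minimal sketch, assuming that field keeps its comma-separated `uuid:status` form; `parse_domains` and `not_active` are hypothetical helper names, not part of vdsm.

```python
def parse_domains(domains_field):
    """Turn 'uuid:Status,uuid:Status' into a {uuid: status} dict."""
    result = {}
    for item in domains_field.split(","):
        item = item.strip()
        if not item:
            continue
        uuid, _, status = item.partition(":")
        result[uuid] = status
    return result


def not_active(domains_field):
    """Return only the domains whose status is something other than Active."""
    return {u: s for u, s in parse_domains(domains_field).items() if s != "Active"}


# The "domains" value reported by the SPM host in this thread.
sample = (
    "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,"
    "d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,"
    "a5a83df1-47e2-4927-9add-079199ca7ef8:Active,"
    "311c1382-12c2-43a0-96d0-e2084180b114:Active,"
    "fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,"
    "3fc76134-2143-4921-ad36-ee84abca40e8:Active,"
    "2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active"
)
print(not_active(sample))
```

With the values from this thread, the only entry reported is the one domain listed as Attached; both vm-storage-ssd (a5a83df1-...) and vm-storage-ssd2 (311c1382-...) are Active.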
Regards,
Shani Leviim
On Tue, Aug 3, 2021 at 8:29 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani,
Here's the output from the SPM - it looks like the master version is 288:
    [root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
    {
        "info": {
            "name": "No Description",
            "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
            "pool_status": "connected",
            "lver": 9356,
            "spm_id": 6,
            "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8",
            "version": "5",
            "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active",
            "type": "GLUSTERFS",
            "master_ver": 288
        },
        "dominfo": {
            "f73307bc-06c8-4996-86d1-78947cdaf6dd": {
                "status": "Attached",
                "isoprefix": "",
                "alerts": []
            },
            "d5ae843b-5815-4f3a-b1be-370e56fe0962": {
                "status": "Active",
                "diskfree": "94847578931200",
                "isoprefix": "",
                "alerts": [],
                "disktotal": "359981635338240",
                "version": 0
            },
            "a5a83df1-47e2-4927-9add-079199ca7ef8": {
                "status": "Active",
                "diskfree": "708888707072",
                "isoprefix": "",
                "alerts": [],
                "disktotal": "751252275200",
                "version": 5
            },
            "311c1382-12c2-43a0-96d0-e2084180b114": {
                "status": "Active",
                "diskfree": "1598335766528",
                "isoprefix": "",
                "alerts": [],
                "disktotal": "2197949513728",
                "version": 5
            },
            "2f2aab43-6ce3-4cb0-9142-b2b57e5083b3": {
                "status": "Active",
                "diskfree": "1416265465856",
                "isoprefix": "",
                "alerts": [],
                "disktotal": "3837687496704",
                "version": 5
            },
            "3fc76134-2143-4921-ad36-ee84abca40e8": {
                "status": "Active",
                "diskfree": "94847578931200",
                "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
                "alerts": [],
                "disktotal": "359981635338240",
                "version": 0
            },
            "fc049ebe-03f9-43fc-adca-d6bfeb99c288": {
                "status": "Active",
                "diskfree": "1337882312704",
                "isoprefix": "",
                "alerts": [],
                "disktotal": "3837687496704",
                "version": 5
            }
        }
    }
And the master domain version for the vm-storage-ssd domain is 288 in the database as well:
    engine=# select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';

    id                       | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff
    name                     | EDC2
    description              |
    storage_pool_type        |
    storage_pool_format_type | 5
    status                   | 1
    master_domain_version    | 288
    spm_vds_id               | 51769733-0cf6-4270-8288-ec96474b7609
    compatibility_version    | 4.3
    _create_date             | 2015-08-10 20:51:03.831215-07
    _update_date             | 2021-07-29 10:31:02.234262-07
    quota_enforcement_type   | 0
    free_text_comment        |
    is_local                 | f
    (1 row)
Here are the master storage domain details:

    engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0';

    id                                    | a5a83df1-47e2-4927-9add-079199ca7ef8
    storage                               | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0
    storage_name                          | vm-storage-ssd
    storage_description                   |
    storage_comment                       |
    storage_pool_id                       | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff
    available_disk_size                   | 660
    confirmed_available_disk_size         |
    vdo_savings                           |
    used_disk_size                        | 39
    commited_disk_size                    | 0
    actual_images_size                    | 0
    status                                | 3
    storage_pool_name                     | EDC2
    storage_type                          | 7
    storage_domain_type                   | 0
    storage_domain_format_type            | 5
    last_time_used_as_master              | 1627497705160
    wipe_after_delete                     | f
    discard_after_delete                  | f
    first_metadata_device                 |
    vg_metadata_device                    |
    backup                                | f
    block_size                            | 512
    storage_domain_shared_status          | 1
    recoverable                           | t
    contains_unregistered_entities        | f
    warning_low_space_indicator           | 10
    critical_space_action_blocker         | 5
    warning_low_confirmed_space_indicator | 0
    external_status                       | 0
    supports_discard                      |
    is_hosted_engine_storage              | f
    (1 row)
This is the domain we want to make the new master so I can decommission the old one:
    engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_name = 'vm-storage-ssd2';

    id                                    | 311c1382-12c2-43a0-96d0-e2084180b114
    storage                               | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4
    storage_name                          | vm-storage-ssd2
    storage_description                   | Storage01,02,03 vm-storage
    storage_comment                       |
    storage_pool_id                       | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff
    available_disk_size                   | 1488
    confirmed_available_disk_size         |
    vdo_savings                           |
    used_disk_size                        | 559
    commited_disk_size                    | 1147
    actual_images_size                    | 538
    status                                | 3
    storage_pool_name                     | EDC2
    storage_type                          | 7
    storage_domain_type                   | 1
    storage_domain_format_type            | 5
    last_time_used_as_master              | 1627497694904
    wipe_after_delete                     | f
    discard_after_delete                  | f
    first_metadata_device                 |
    vg_metadata_device                    |
    backup                                | f
    block_size                            | 512
    storage_domain_shared_status          | 1
    recoverable                           | t
    contains_unregistered_entities        | f
    warning_low_space_indicator           | 10
    critical_space_action_blocker         | 5
    warning_low_confirmed_space_indicator | 10
    external_status                       | 0
    supports_discard                      |
    is_hosted_engine_storage              | f
    (1 row)
Thanks, -Matthew
On 8/1/21 2:00 AM, Shani Leviim wrote:
Hi Matthew,
You might need to sync back the master version and domain between the engine and vdsm. To verify those parameters on vdsm, run this command on the SPM host:

    vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"

The result should be something like this (note master_uuid and master_ver):

    "info": {
        "domains": "1234:Active,5678:Active,91011:Active",
        "isoprefix": "",
        "lver": 6,
        "master_uuid": "123",
        "master_ver": 14,
        "name": "No Description",
        "pool_status": "connected",
        "spm_id": 1,
        "type": "NFS",
        "version": "5"
    }
Then, compare the master version value with the engine:

    engine=> select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';

And the master domain:

    engine=> select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0';

(0 means master; for reference, see https://github.com/oVirt/ovirt-engine/blob/a65cf0eae8858ab2278c3f537dc427e3ff20eba7/backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/businessentities/StorageDomainType.java)

Then we can get the bigger picture (and update the engine data to match vdsm).
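The comparison described above could be scripted roughly as follows. This is only a sketch under a few assumptions: the getInfo output is valid JSON with a top-level `info` object, and the engine values (the `id` of the master row in `storage_domains` and `master_domain_version` from `storage_pool`) were read from the two queries by hand. `compare_master` is a hypothetical helper name.

```python
import json


def compare_master(vdsm_output, engine_master_id, engine_master_version):
    """Return a list of mismatch descriptions; an empty list means in sync."""
    info = json.loads(vdsm_output)["info"]
    mismatches = []
    if info["master_uuid"] != engine_master_id:
        mismatches.append(
            "master domain: vdsm=%s engine=%s" % (info["master_uuid"], engine_master_id)
        )
    if int(info["master_ver"]) != int(engine_master_version):
        mismatches.append(
            "master version: vdsm=%s engine=%s" % (info["master_ver"], engine_master_version)
        )
    return mismatches


# Trimmed-down stand-in for the vdsm-client output (only the fields used here).
sample = '{"info": {"master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8", "master_ver": 288}}'

# Engine side: master domain id and master_domain_version taken from the DB.
print(compare_master(sample, "a5a83df1-47e2-4927-9add-079199ca7ef8", 288))
```

An empty list means both sides agree on the master domain and its version; any entry names the field that differs and shows both values.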
Regards,
Shani Leviim
On Thu, Jul 29, 2021 at 8:40 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani - yes we plan to upgrade to 4.4 in the future, but we're on 4.3 right now due to only running CentOS 7 at the moment.
I was able to clear the job from the SPM:
    [root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
    {
        "5fa9edf0-56c3-40e4-9327-47bf7764d28d": {
            "message": "1 jobs completed successfully",
            "code": 0,
            "taskID": "5fa9edf0-56c3-40e4-9327-47bf7764d28d",
            "taskResult": "success",
            "taskState": "finished"
        }
    }
    [root@daccs01 ~]# vdsm-client Task clear taskID=5fa9edf0-56c3-40e4-9327-47bf7764d28d
    true
    [root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
    {}
And confirm there were no async_tasks:
    engine=# select * from async_tasks;
     task_id | action_type | status | result | step_id | command_id | started_at | storage_pool_id | task_type | vdsm_task_id | root_command_id | user_id
    ---------+-------------+--------+--------+---------+------------+------------+-----------------+-----------+--------------+-----------------+---------
    (0 rows)
However, when putting the vm-storage-ssd domain into maintenance mode, it failed again:
Here are some of the log entries - is there anything else I can look at?
2021-07-29 10:30:37,848-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM compute7.pcic.uvic.ca command ConnectStoragePoolVDS failed: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand' return value 'StatusOnlyReturn [status=Status [code=324, message=Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff']]'
...
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] HostName = compute7.pcic.uvic.ca
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'ConnectStoragePoolVDSCommand(HostName = compute7.pcic.uvic.ca, ConnectStoragePoolVDSCommandParameters:{hostId='51769733-0cf6-4270-8288-ec96474b7609', vdsId='51769733-0cf6-4270-8288-ec96474b7609', storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', masterVersion='288'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
...
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,855-07 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] FINISH, DeactivateStorageDomainVDSCommand, return: , log id: 1c215ca4
2021-07-29 10:30:37,855-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] DeactivateStorageDomainVDS failed 'a5a83df1-47e2-4927-9add-079199ca7ef8': org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' (Failed with error StoragePoolWrongMaster and code 324)
    at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
    at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2112) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand.dectivateStorageDomain(DeactivateStorageDomainCommand.java:340) [bll.jar:]
    ...
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_292]
    at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
Caused by: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
    at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:50) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedConnectProxyReturnValue(ConnectStoragePoolVDSCommand.java:48) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedProxyReturnValue(ConnectStoragePoolVDSCommand.java:36) [vdsbroker.jar:]
    ...
    at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final]
    at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final]
    at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source) [vdsbroker.jar:]
    ... 94 more
2021-07-29 10:30:37,861-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Failed to deactivate storage domain 'a5a83df1-47e2-4927-9add-079199ca7ef8'
2021-07-29 10:30:37,868-07 INFO [org.ovirt.engine.core.bll.CommandCompensator] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command [id=c63199f8-a720-4053-8e5c-92c8d21e0ce2]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot:{id='StoragePoolIsoMapId:{storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', storageId='a5a83df1-47e2-4927-9add-079199ca7ef8'}', status='Unknown'}.
2021-07-29 10:30:37,882-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: USER_DEACTIVATE_STORAGE_DOMAIN_FAILED(969), Failed to deactivate Storage Domain vm-storage-ssd (Data Center EDC2).
2021-07-29 10:30:37,884-07 WARN [org.ovirt.engine.core.bll.storage.pool.ReconstructMasterDomainCommand] (EE-ManagedThreadFactory-engine-Thread-25) [60d33d] Validation of action 'ReconstructMasterDomain' failed for user SYSTEM. Reasons: VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Locked
2021-07-29 10:30:37,888-07 INFO [org.ovirt.engine.core.bll.eventqueue.EventQueueMonitor] (EE-ManagedThreadFactory-engine-Thread-48) [35c5b47] Finished reconstruct for pool 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'. Clearing event queue
2021-07-29 10:30:37,899-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand' return value ' TaskStatusListReturn:{status='Status [code=654, message=Not SPM]'}
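When sifting through a long engine.log for this failure, it can help to pull out just the storage domain and pool IDs named in the "Wrong Master domain or its version" messages. A minimal sketch, assuming the log lines are unwrapped (one entry per line, with the UUIDs intact); the pattern and the `find_wrong_master` name are illustrative only and not part of any oVirt tooling.

```python
import re

# Matches the message text seen in the engine log entries above.
WRONG_MASTER = re.compile(
    r"Wrong Master domain or its version: u?'SD=(?P<sd>[0-9a-f-]{36}), "
    r"pool=(?P<pool>[0-9a-f-]{36})'"
)


def find_wrong_master(log_text):
    """Return the distinct (storage domain id, pool id) pairs reported."""
    pairs = {(m.group("sd"), m.group("pool")) for m in WRONG_MASTER.finditer(log_text)}
    return sorted(pairs)


sample_line = (
    "2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] "
    "IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: "
    "IRSNoMasterDomainException: Wrong Master domain or its version: "
    "u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'"
)
print(find_wrong_master(sample_line))
```

The same message repeats many times per failed attempt, so the function collapses duplicates and returns each pair once.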
Thanks, -Matthew
On 7/29/21 2:52 AM, Shani Leviim wrote:
Hi Matthew,

Actually, your description is related to two features available in oVirt 4.4.5 <https://www.ovirt.org/release/4.4.5/>:
1. The ability to switch the master storage domain while domains are up and running [1]
2. Clearing the finished tasks from the REST API [2] and UI [3].

We recommend you upgrade your engine to enjoy those features.

In the meantime, as you've described, the Master role can be moved from one storage domain to another by putting the current master domain into maintenance. To clear the finished tasks from the SPM, first list them:

    vdsm-client Host getAllTasksStatuses

The output should look something like this:

    {
        "1dc4d885-577a-4b6a-b01f-e682602a907c": {
            "code": 0,
            "message": "1 jobs completed successfully",
            "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c",
            "taskResult": "success",
            "taskState": "finished"
        }
    }

Then clear that task:

    vdsm-client Task clear taskID=12345

Once it is cleared, the reconstruction can finish.

To verify there are no more finished async tasks, you can run this SQL query on the engine:

    engine=# select * from async_tasks WHERE storage_pool_id = '123';
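If several finished tasks are left on the SPM, the clear commands can be generated from the `getAllTasksStatuses` output instead of being typed one by one. This is a rough sketch that only prints the commands and does not run them; it assumes the output is valid JSON shaped like the sample above, and `clear_commands` is a hypothetical helper name.

```python
import json


def clear_commands(statuses_json):
    """Build one 'vdsm-client Task clear' command per finished task."""
    statuses = json.loads(statuses_json)
    return [
        "vdsm-client Task clear taskID=%s" % task_id
        for task_id, status in sorted(statuses.items())
        if status.get("taskState") == "finished"
    ]


# Same shape as the getAllTasksStatuses sample above.
sample = """
{
    "1dc4d885-577a-4b6a-b01f-e682602a907c": {
        "code": 0,
        "message": "1 jobs completed successfully",
        "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c",
        "taskResult": "success",
        "taskState": "finished"
    }
}
"""
for command in clear_commands(sample):
    print(command)
```

Tasks in any other state are skipped, so the printed commands can be reviewed and then run by hand on the SPM host.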
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1910022
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1627997
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1910302
Regards,
Shani Leviim

Hi Shani,

Yes, it's a gluster domain, and yes, it looks like that matches the bug:

    2021-08-04 08:47:28,516-0700 ERROR (jsonrpc/5) [storage.StoragePool] migration to new master failed (sp:909)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 898, in masterMigrate
        exclude=('./lost+found',))
      File "/usr/lib/python2.7/site-packages/vdsm/storage/fileUtils.py", line 69, in tarCopy
        raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
    TarCopyFailed: (1, 0, '', '')

The iscsi domains aren't as resilient as the gluster domains, so I'd really like to have the vm-storage-ssd2 domain as the master.

I'll have to take an outage anyway to put the gluster domains into maintenance mode, so could I just put all domains into maintenance mode and then activate vm-storage-ssd2 first so that it's elected master? Or is there some transfer mechanism that must take place?

Thanks,
-Matthew

--
Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium <https://pacificclimate.org/>
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: +1-250-721-8432
Email: matthewb@uvic.ca

On 8/4/21 11:26 AM, Shani Leviim wrote:
Notice: This message was sent from outside the University of Victoria email system. Please be cautious with links and sensitive information.
The master domain should be up first.
I've noticed from your prev response to:
[root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff" { "info": { "name": "No Description", "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111", "pool_status": "connected", "lver": 9356, "spm_id": 6, "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8", "version": "5", "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active", *"type": "GLUSTERFS",* "master_ver": 288 },
There's a bug regarding putting GLUSTER domains into maintenance for replacing the master role: https://bugzilla.redhat.com/show_bug.cgi?id=1913764 <https://bugzilla.redhat.com/show_bug.cgi?id=1913764>.
If you can see that error on the vdsm.log, your case matches the bug: (tasks/2) [storage.StoragePool] Migration to new master 2a0d3c24-3357-4677-b9b2-35486af464a3 failed (sp:903) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 891, in masterMigrate exclude=('./lost+found',)) File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 71, in tarCopy raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err) vdsm.storage.fileUtils.TarCopyFailed: (1, 0, b'', b'')
There's a chance you can switch the master domain to one of the iscsi domains: Try putting domain vm-storage-ssd2 into maintenance and all other GLUSTER domains into maintenance, so only the iscsi domains remain active. Then, try to put vm-storage-ssd into maintenance.
*Regards, * *Shani Leviim *
On Wed, Aug 4, 2021 at 7:00 PM Matthew Benstead <matthewb@uvic.ca <mailto:matthewb@uvic.ca>> wrote:
Thanks Shani - Unfortunately it still fails. I had been been hoping to make the change without an outage, but if I shutdown the VMs and put all the storage domains into maintenance mode, and then activate the new storage domain (that I want to be the master) first, would that work? Or is there some kind of transfer that needs to take place?
Before:
During:
Errors:
Thanks, -Matthew
On 8/4/21 2:46 AM, Shani Leviim wrote:
Notice: This message was sent from outside the University of Victoria email system. Please be cautious with links and sensitive information.
Thanks, Matthew So it seems that both vdsm and engine are aligned with that data. Does your env still fail with reconstruction errors?
Once it's "stabilized" (so both domains are up and available), can you please try again putting the current master into maintenance and share the results?
*Regards, * *Shani Leviim *
On Tue, Aug 3, 2021 at 8:29 PM Matthew Benstead <matthewb@uvic.ca <mailto:matthewb@uvic.ca>> wrote:
Thanks Shani,
Here's the output from the SPM - it looks like the master version is 288:
[root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff" { "info": { "name": "No Description", "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111", "pool_status": "connected", "lver": 9356, "spm_id": 6, "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8", "version": "5", "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active", "type": "GLUSTERFS", "master_ver": 288 }, "dominfo": { "f73307bc-06c8-4996-86d1-78947cdaf6dd": { "status": "Attached", "isoprefix": "", "alerts": [] }, "d5ae843b-5815-4f3a-b1be-370e56fe0962": { "status": "Active", "diskfree": "94847578931200", "isoprefix": "", "alerts": [], "disktotal": "359981635338240", "version": 0 }, "a5a83df1-47e2-4927-9add-079199ca7ef8": { "status": "Active", "diskfree": "708888707072", "isoprefix": "", "alerts": [], "disktotal": "751252275200", "version": 5 }, "311c1382-12c2-43a0-96d0-e2084180b114": { "status": "Active", "diskfree": "1598335766528", "isoprefix": "", "alerts": [], "disktotal": "2197949513728", "version": 5 }, "2f2aab43-6ce3-4cb0-9142-b2b57e5083b3": { "status": "Active", "diskfree": "1416265465856", "isoprefix": "", "alerts": [], "disktotal": "3837687496704", "version": 5 }, "3fc76134-2143-4921-ad36-ee84abca40e8": { "status": "Active", "diskfree": "94847578931200", "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111", "alerts": [], "disktotal": "359981635338240", "version": 0 }, "fc049ebe-03f9-43fc-adca-d6bfeb99c288": { "status": "Active", 
"diskfree": "1337882312704", "isoprefix": "", "alerts": [], "disktotal": "3837687496704", "version": 5 } } }
And the master domain version for the vm-storage-ssd domain is 288 in the database as well:
engine=# select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'; id | name | description | storage_pool_type | storage_pool_format_type | status | master_domain_version | spm_vds_id | compatibility_version | _create_date | _update_date | quota_enforcement_type | free_text_comment | is_local --------------------------------------+------+-------------+-------------------+--------------------------+--------+-----------------------+--------------------------------------+-----------------------+-------------------------------+-------------------------------+------------------------+-------------------+---------- f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | EDC2 | | | 5 | 1 | 288 | 51769733-0cf6-4270-8288-ec96474b7609 | 4.3 | 2015-08-10 20:51:03.831215-07 | 2021-07-29 10:31:02.234262-07 | 0 | | f (1 row)
Here is the master storage domain details: engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0'; id | storage | storage_name | storage_description | storage_comment | storage_pool_id | available_disk_size | confirmed_available_disk_size | vdo_savings | used_disk_size | commited_disk_size | actual_images_size | status | storage_pool_name | storage_type | storage_domain_type | storage_domain_format_type | last_time_used_as_master | wipe_after_delete | discard_after_delete | first_metadata_device | vg_metadata_device | backup | block_size | storage_domain_shared_status | recoverable | contains_unregistered_entities | warning_low_space_indicator | critical_space_action_blocker | warning_low_confirmed_space_indicator | external_status | supports_discard | is_hosted_engine_storage --------------------------------------+--------------------------------------+----------------+---------------------+-----------------+--------------------------------------+---------------------+-------------------------------+-------------+----------------+--------------------+--------------------+--------+-------------------+--------------+---------------------+----------------------------+--------------------------+-------------------+----------------------+-----------------------+--------------------+--------+------------+------------------------------+-------------+--------------------------------+-----------------------------+-------------------------------+---------------------------------------+-----------------+------------------+-------------------------- a5a83df1-47e2-4927-9add-079199ca7ef8 | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0 | vm-storage-ssd | | | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 660 | | | 39 | 0 | 0 | 3 | EDC2 | 7 | 0 | 5 | 1627497705160 | f | f | | | f | 512 | 1 | t | f | 10 | 5 | 0 | 0 | | f (1 row)
This is the domain we want to switch over to the master domain so I can decommission the old one.
engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_name = 'vm-storage-ssd2'; id | storage | storage_name | storage_description | storage_comment | storage_pool_id | available_disk_size | confirmed_available_disk_size | vdo_savings | used_disk_size | commited_disk_size | actual_images_size | status | storage_pool_name | storage_type | storage_domain_type | storage_domain_format_type | last_time_used_as_master | wipe_after_delete | discard_after_delete | first_metadata_device | vg_metadata_device | backup | block_size | storage_domain_shared_status | recoverable | contains_unregistered_entities | warning_low_space_indicator | critical_space_action_blocker | warning_low_confirmed_space_indicator | external_status | supports_discard | is_hosted_engine_storage --------------------------------------+--------------------------------------+-----------------+----------------------------+-----------------+--------------------------------------+---------------------+-------------------------------+-------------+----------------+--------------------+--------------------+--------+-------------------+--------------+---------------------+----------------------------+------------------ --------+-------------------+----------------------+-----------------------+--------------------+--------+------------+------------------------------+-------------+--------------------------------+-----------------------------+-------------------------------+---------------------------------------+-----------------+------------------+-------------------------- 311c1382-12c2-43a0-96d0-e2084180b114 | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4 | vm-storage-ssd2 | Storage01,02,03 vm-storage | | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 1488 | | | 559 | 1147 | 538 | 3 | EDC2 | 7 | 1 | 5 | 1627497694904 | f | f | | | f | 512 | 1 | t | f | 10 | 5 | 10 | 0 | | f (1 row)
Thanks, -Matthew
On 8/1/21 2:00 AM, Shani Leviim wrote:
Notice: This message was sent from outside the University of Victoria email system. Please be cautious with links and sensitive information.
Hi Matthew,
You might need to sync back the master version and domain between the engine and vdsm. To verify those parameters on vdsm, run this command on the SPM host:
vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
The result should be something like:
"info": {
    "domains": "1234:Active,5678:Active,91011:Active",
    "isoprefix": "",
    "lver": 6,
    * "master_uuid": "123", "master_ver": 14,*
    "name": "No Description",
    "pool_status": "connected",
    "spm_id": 1,
    "type": "NFS",
    "version": "5"
}
Then, compare the master version value with the engine:
engine=> select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';
And the master domain:
engine=> select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0';
(0 means master; for reference, see https://github.com/oVirt/ovirt-engine/blob/a65cf0eae8858ab2278c3f537dc427e3ff20eba7/backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/businessentities/StorageDomainType.java)
Then we can get the bigger picture (and update the engine data to match the vdsm)
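The comparison described above can also be scripted. This is only a minimal sketch, not an oVirt tool: it assumes the JSON printed by `vdsm-client StoragePool getInfo` has been saved as text, and that the engine-side values were read by hand from the two SQL queries. The function name `compare_master` and the sample values are hypothetical.

```python
import json


def compare_master(getinfo_json, engine_master_id, engine_master_ver):
    """Compare VDSM's view of the master domain with the engine's.

    getinfo_json      -- text printed by `vdsm-client StoragePool getInfo`
    engine_master_id  -- `id` of the storage_domains row with storage_domain_type = 0
    engine_master_ver -- `master_domain_version` from the storage_pool row
    Returns a list of mismatch descriptions (empty when both sides agree).
    """
    info = json.loads(getinfo_json)["info"]
    mismatches = []
    if info["master_uuid"] != engine_master_id:
        mismatches.append(
            "master domain: vdsm=%s engine=%s" % (info["master_uuid"], engine_master_id)
        )
    if int(info["master_ver"]) != int(engine_master_ver):
        mismatches.append(
            "master version: vdsm=%s engine=%s" % (info["master_ver"], engine_master_ver)
        )
    return mismatches


# Hypothetical sample that only carries the two fields being compared.
SAMPLE = '{"info": {"master_uuid": "123", "master_ver": 14}}'

print(compare_master(SAMPLE, "123", 14))  # both sides agree
print(compare_master(SAMPLE, "123", 17))  # version differs
```

An empty list means the two sides agree on both the master domain and its version; anything else names the field that is out of sync.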
*Regards, * *Shani Leviim *
On Thu, Jul 29, 2021 at 8:40 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani - yes we plan to upgrade to 4.4 in the future, but we're on 4.3 right now due to only running CentOS 7 at the moment.
I was able to clear the job from the SPM:
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
{
    "5fa9edf0-56c3-40e4-9327-47bf7764d28d": {
        "message": "1 jobs completed successfully",
        "code": 0,
        "taskID": "5fa9edf0-56c3-40e4-9327-47bf7764d28d",
        "taskResult": "success",
        "taskState": "finished"
    }
}
[root@daccs01 ~]# vdsm-client Task clear taskID=5fa9edf0-56c3-40e4-9327-47bf7764d28d
true
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
{}
And confirmed there were no async_tasks:
engine=# select * from async_tasks;
 task_id | action_type | status | result | step_id | command_id | started_at | storage_pool_id | task_type | vdsm_task_id | root_command_id | user_id
---------+-------------+--------+--------+---------+------------+------------+-----------------+-----------+--------------+-----------------+---------
(0 rows)
However, when putting the vm-storage-ssd domain into maintenance mode, it failed again:
Here are some of the log entries - anything else I can look at?
2021-07-29 10:30:37,848-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM compute7.pcic.uvic.ca command ConnectStoragePoolVDS failed: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand' return value 'StatusOnlyReturn [status=Status [code=324, message=Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff']]'
...
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] HostName = compute7.pcic.uvic.ca
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'ConnectStoragePoolVDSCommand(HostName = compute7.pcic.uvic.ca, ConnectStoragePoolVDSCommandParameters:{hostId='51769733-0cf6-4270-8288-ec96474b7609', vdsId='51769733-0cf6-4270-8288-ec96474b7609', storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', masterVersion='288'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
...
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,855-07 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] FINISH, DeactivateStorageDomainVDSCommand, return: , log id: 1c215ca4
2021-07-29 10:30:37,855-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] DeactivateStorageDomainVDS failed 'a5a83df1-47e2-4927-9add-079199ca7ef8': org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' (Failed with error StoragePoolWrongMaster and code 324)
    at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
    at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2112) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand.dectivateStorageDomain(DeactivateStorageDomainCommand.java:340) [bll.jar:]
    ...
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_292]
    at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
Caused by: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
    at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:50) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedConnectProxyReturnValue(ConnectStoragePoolVDSCommand.java:48) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedProxyReturnValue(ConnectStoragePoolVDSCommand.java:36) [vdsbroker.jar:]
    ...
    at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final]
    at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final]
    at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source) [vdsbroker.jar:]
    ... 94 more
2021-07-29 10:30:37,861-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Failed to deactivate storage domain 'a5a83df1-47e2-4927-9add-079199ca7ef8'
2021-07-29 10:30:37,868-07 INFO [org.ovirt.engine.core.bll.CommandCompensator] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command [id=c63199f8-a720-4053-8e5c-92c8d21e0ce2]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot:{id='StoragePoolIsoMapId:{storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', storageId='a5a83df1-47e2-4927-9add-079199ca7ef8'}', status='Unknown'}.
2021-07-29 10:30:37,882-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: USER_DEACTIVATE_STORAGE_DOMAIN_FAILED(969), Failed to deactivate Storage Domain vm-storage-ssd (Data Center EDC2).
2021-07-29 10:30:37,884-07 WARN [org.ovirt.engine.core.bll.storage.pool.ReconstructMasterDomainCommand] (EE-ManagedThreadFactory-engine-Thread-25) [60d33d] Validation of action 'ReconstructMasterDomain' failed for user SYSTEM. Reasons: VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Locked
2021-07-29 10:30:37,888-07 INFO [org.ovirt.engine.core.bll.eventqueue.EventQueueMonitor] (EE-ManagedThreadFactory-engine-Thread-48) [35c5b47] Finished reconstruct for pool 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'. Clearing event queue
2021-07-29 10:30:37,899-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand' return value ' TaskStatusListReturn:{status='Status [code=654, message=Not SPM]'}
Thanks, -Matthew
On 7/29/21 2:52 AM, Shani Leviim wrote:
Hi Matthew,
Actually, your description relates to two features available in oVirt 4.4.5 (https://www.ovirt.org/release/4.4.5/):
1. The ability to switch the master storage domain while domains are up and running [1]
2. Clearing the finished tasks from the REST API [2] and the UI [3].
We recommend you upgrade your engine to enjoy those features.
In the meantime, as you've described, the Master role can be moved from one storage domain to another by putting the current master domain into maintenance.
To clear the finished tasks from the SPM, first list them:
vdsm-client Host getAllTasksStatuses
The output should look something like this:
{
    "1dc4d885-577a-4b6a-b01f-e682602a907c": {
        "code": 0,
        "message": "1 jobs completed successfully",
        "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c",
        "taskResult": "success",
        "taskState": "finished"
    }
}
Then clear each of those tasks:
vdsm-client Task clear taskID=12345
Once the tasks are cleared, the reconstruction can finish.
To verify there are no more finished async tasks, you can run this SQL query on the engine:
engine=# select * from async_tasks WHERE storage_pool_id = '123';
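When several tasks are pending, picking out the finished ones by eye gets tedious. The following is a rough sketch rather than part of vdsm: it assumes the output of `vdsm-client Host getAllTasksStatuses` has been captured as text, and the helper name `finished_task_ids` is hypothetical. It only prints the `Task clear` commands so they can be reviewed before anything is run.

```python
import json


def finished_task_ids(statuses_json):
    """Return the IDs of tasks whose taskState is 'finished', sorted."""
    statuses = json.loads(statuses_json)
    return sorted(
        task_id
        for task_id, status in statuses.items()
        if status.get("taskState") == "finished"
    )


# Sample copied from the example output in this message.
SAMPLE = """
{
  "1dc4d885-577a-4b6a-b01f-e682602a907c": {
    "code": 0,
    "message": "1 jobs completed successfully",
    "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c",
    "taskResult": "success",
    "taskState": "finished"
  }
}
"""

for task_id in finished_task_ids(SAMPLE):
    # Print the command instead of executing it, so it can be checked first.
    print("vdsm-client Task clear taskID=%s" % task_id)
```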
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1910022
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1627997
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1910302
*Regards, * *Shani Leviim *

There should be an error message saying that the master domain (as recorded in the db) should come up first.
I also think that detaching the current gluster master domain and starting the engine with no master, so that a new one would be elected instead, will leave you in an infinite loop of reconstruction, and editing the db would need to be involved.
Make sure to back up your env before making any db change manually, and bear in mind that it can damage your environment.
*Regards,* *Shani Leviim*
On Wed, Aug 4, 2021 at 10:27 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Hi Shani,
Yes, it's a gluster domain, and it looks like yes that matches the bug:
2021-08-04 08:47:28,516-0700 ERROR (jsonrpc/5) [storage.StoragePool] migration to new master failed (sp:909)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 898, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileUtils.py", line 69, in tarCopy
    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
TarCopyFailed: (1, 0, '', '')
The iscsi domains aren't as resilient as the gluster domains, so I'd really like to have the vm-storage-ssd2 domain as the master. I'll have to take an outage anyway to put the gluster domains into maintenance mode, so could I just put all domains into maintenance mode and then activate vm-storage-ssd2 first so it's elected master? Or is there some transfer mechanism that must take place?
Thanks, -Matthew
--
Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium <https://pacificclimate.org/>
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: +1-250-721-8432
Email: matthewb@uvic.ca
On 8/4/21 11:26 AM, Shani Leviim wrote:
The master domain should be up first.
I noticed this in your previous response:
[root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
{
    "info": {
        "name": "No Description",
        "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
        "pool_status": "connected",
        "lver": 9356,
        "spm_id": 6,
        "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8",
        "version": "5",
        "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active",
        *"type": "GLUSTERFS",*
        "master_ver": 288
    },
There's a bug regarding putting GLUSTER domains into maintenance for replacing the master role: https://bugzilla.redhat.com/show_bug.cgi?id=1913764.
If you can see this error in vdsm.log, your case matches the bug:
(tasks/2) [storage.StoragePool] Migration to new master 2a0d3c24-3357-4677-b9b2-35486af464a3 failed (sp:903)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 891, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 71, in tarCopy
    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
vdsm.storage.fileUtils.TarCopyFailed: (1, 0, b'', b'')
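Checking vdsm.log for this signature can be automated. The snippet below is a hedged sketch, not something shipped with vdsm: the function name, the regular expression, and the 10-line look-ahead window are all assumptions chosen to match the log excerpt quoted here, and the real log format may differ between versions.

```python
import re

# Matches the "migration to new master ... failed" line in either capitalisation.
MIGRATE_FAILED = re.compile(
    r"\[storage\.StoragePool\]\s+migration to new master.*failed",
    re.IGNORECASE,
)


def has_tarcopy_master_failure(log_text, window=10):
    """Return True if a failed master migration is followed by TarCopyFailed.

    `window` is how many lines after the failure line are searched for the
    TarCopyFailed traceback; 10 is an arbitrary choice for this sketch.
    """
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if MIGRATE_FAILED.search(line):
            if any("TarCopyFailed" in later for later in lines[i:i + window + 1]):
                return True
    return False


# Sample taken from the excerpt quoted in this message.
SAMPLE = """\
(tasks/2) [storage.StoragePool] Migration to new master 2a0d3c24-3357-4677-b9b2-35486af464a3 failed (sp:903)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 891, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 71, in tarCopy
    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
vdsm.storage.fileUtils.TarCopyFailed: (1, 0, b'', b'')
"""

print(has_tarcopy_master_failure(SAMPLE))
```

In practice the text would come from reading the vdsm.log file on the SPM host instead of the inline sample.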
There's a chance you can switch the master domain to one of the iscsi domains: try putting vm-storage-ssd2 and all the other GLUSTER domains into maintenance, so that only the iscsi domains remain active. Then try to put vm-storage-ssd into maintenance.
*Regards, *
*Shani Leviim *
On Wed, Aug 4, 2021 at 7:00 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani - Unfortunately it still fails. I had been hoping to make the change without an outage, but if I shut down the VMs and put all the storage domains into maintenance mode, and then activate the new storage domain (that I want to be the master) first, would that work? Or is there some kind of transfer that needs to take place?
Before:
During:
Errors:
Thanks, -Matthew
On 8/4/21 2:46 AM, Shani Leviim wrote:
Thanks, Matthew.
So it seems that both vdsm and engine are aligned on that data. Does your env still fail with reconstruction errors?
Once it's "stabilized" (so both domains are up and available), can you please try again putting the current master into maintenance and share the results?
*Regards, *
*Shani Leviim *
On Tue, Aug 3, 2021 at 8:29 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani,
Here's the output from the SPM - it looks like the master version is 288:
[root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
{
    "info": {
        "name": "No Description",
        "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
        "pool_status": "connected",
        "lver": 9356,
        "spm_id": 6,
        "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8",
        "version": "5",
        "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active",
        "type": "GLUSTERFS",
        "master_ver": 288
    },
    "dominfo": {
        "f73307bc-06c8-4996-86d1-78947cdaf6dd": { "status": "Attached", "isoprefix": "", "alerts": [] },
        "d5ae843b-5815-4f3a-b1be-370e56fe0962": { "status": "Active", "diskfree": "94847578931200", "isoprefix": "", "alerts": [], "disktotal": "359981635338240", "version": 0 },
        "a5a83df1-47e2-4927-9add-079199ca7ef8": { "status": "Active", "diskfree": "708888707072", "isoprefix": "", "alerts": [], "disktotal": "751252275200", "version": 5 },
        "311c1382-12c2-43a0-96d0-e2084180b114": { "status": "Active", "diskfree": "1598335766528", "isoprefix": "", "alerts": [], "disktotal": "2197949513728", "version": 5 },
        "2f2aab43-6ce3-4cb0-9142-b2b57e5083b3": { "status": "Active", "diskfree": "1416265465856", "isoprefix": "", "alerts": [], "disktotal": "3837687496704", "version": 5 },
        "3fc76134-2143-4921-ad36-ee84abca40e8": { "status": "Active", "diskfree": "94847578931200", "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111", "alerts": [], "disktotal": "359981635338240", "version": 0 },
        "fc049ebe-03f9-43fc-adca-d6bfeb99c288": { "status": "Active", "diskfree": "1337882312704", "isoprefix": "", "alerts": [], "disktotal": "3837687496704", "version": 5 }
    }
}
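The `domains` field in the getInfo output packs every domain's state into one comma-separated string, which is hard to scan. As a small illustration (the helper name `parse_domains` is made up for this sketch and is not a vdsm function), the string can be split into a mapping from storage domain ID to status:

```python
def parse_domains(domains_field):
    """Split 'uuid:Status,uuid:Status,...' into a {uuid: status} dict."""
    statuses = {}
    for entry in domains_field.split(","):
        if not entry:
            continue  # tolerate an empty field or a trailing comma
        sd_id, _, status = entry.partition(":")
        statuses[sd_id] = status
    return statuses


# The "domains" value from the getInfo output in this thread.
DOMAINS = (
    "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,"
    "d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,"
    "a5a83df1-47e2-4927-9add-079199ca7ef8:Active,"
    "311c1382-12c2-43a0-96d0-e2084180b114:Active,"
    "fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,"
    "3fc76134-2143-4921-ad36-ee84abca40e8:Active,"
    "2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active"
)

statuses = parse_domains(DOMAINS)
# Domains that are attached to the pool but not currently active.
not_active = [sd_id for sd_id, status in statuses.items() if status != "Active"]
print(len(statuses), not_active)
```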
And the master domain version for the vm-storage-ssd domain is 288 in the database as well:
engine=# select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';
id                       | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff
name                     | EDC2
description              |
storage_pool_type        |
storage_pool_format_type | 5
status                   | 1
master_domain_version    | 288
spm_vds_id               | 51769733-0cf6-4270-8288-ec96474b7609
compatibility_version    | 4.3
_create_date             | 2015-08-10 20:51:03.831215-07
_update_date             | 2021-07-29 10:31:02.234262-07
quota_enforcement_type   | 0
free_text_comment        |
is_local                 | f
(1 row)
Here are the master storage domain details:
engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0';
id                                    | a5a83df1-47e2-4927-9add-079199ca7ef8
storage                               | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0
storage_name                          | vm-storage-ssd
storage_description                   |
storage_comment                       |
storage_pool_id                       | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff
available_disk_size                   | 660
confirmed_available_disk_size         |
vdo_savings                           |
used_disk_size                        | 39
commited_disk_size                    | 0
actual_images_size                    | 0
status                                | 3
storage_pool_name                     | EDC2
storage_type                          | 7
storage_domain_type                   | 0
storage_domain_format_type            | 5
last_time_used_as_master              | 1627497705160
wipe_after_delete                     | f
discard_after_delete                  | f
first_metadata_device                 |
vg_metadata_device                    |
backup                                | f
block_size                            | 512
storage_domain_shared_status          | 1
recoverable                           | t
contains_unregistered_entities        | f
warning_low_space_indicator           | 10
critical_space_action_blocker         | 5
warning_low_confirmed_space_indicator | 0
external_status                       | 0
supports_discard                      |
is_hosted_engine_storage              | f
(1 row)
This is the domain we want to switch over to the master domain so I can decommission the old one.
engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_name = 'vm-storage-ssd2';
id                                    | 311c1382-12c2-43a0-96d0-e2084180b114
storage                               | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4
storage_name                          | vm-storage-ssd2
storage_description                   | Storage01,02,03 vm-storage
storage_comment                       |
storage_pool_id                       | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff
available_disk_size                   | 1488
confirmed_available_disk_size         |
vdo_savings                           |
used_disk_size                        | 559
commited_disk_size                    | 1147
actual_images_size                    | 538
status                                | 3
storage_pool_name                     | EDC2
storage_type                          | 7
storage_domain_type                   | 1
storage_domain_format_type            | 5
last_time_used_as_master              | 1627497694904
wipe_after_delete                     | f
discard_after_delete                  | f
first_metadata_device                 |
vg_metadata_device                    |
backup                                | f
block_size                            | 512
storage_domain_shared_status          | 1
recoverable                           | t
contains_unregistered_entities        | f
warning_low_space_indicator           | 10
critical_space_action_blocker         | 5
warning_low_confirmed_space_indicator | 10
external_status                       | 0
supports_discard                      |
is_hosted_engine_storage              | f
(1 row)
Thanks, -Matthew
Thanks, -Matthew
On 7/29/21 2:52 AM, Shani Leviim wrote:
Notice: This message was sent from outside the University of Victoria email system. Please be cautious with links and sensitive information.
Hi Matthew,
Actually, your description is related to 2 features available in oVirt 4.4.5 <https://www.ovirt.org/release/4.4.5/>:
1. The ability to switch the master storage domain while domains are up and running [1]
2. Clearing the finished tasks from REST API [2] and UI [3].
We recommend you upgrade your engine to enjoy those features.
In the meantime, as you've described, the Master role can be moved from one storage domain to another by putting the domain into maintenance. To clear the finished tasks from the SPM, first list them:
    vdsm-client Host getAllTasksStatuses
The output should look something like this:
    {
        "1dc4d885-577a-4b6a-b01f-e682602a907c": {
            "code": 0,
            "message": "1 jobs completed successfully",
            "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c",
            "taskResult": "success",
            "taskState": "finished"
        }
    }
Then clear that task:
    vdsm-client Task clear taskID=12345
Once it gets cleared, the reconstruction can be finished.
To verify there are no more finished async tasks, you can run this SQL query on the engine:
    engine=# select * from async_tasks WHERE storage_pool_id = '123';
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1910022 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1627997 [3] https://bugzilla.redhat.com/show_bug.cgi?id=1910302
*Regards, *
*Shani Leviim *
On Thu, Jul 29, 2021 at 8:33 AM Matthew Benstead <matthewb@uvic.ca> wrote:
Hello,
I'm trying to decommission the old master storage domain in ovirt, and replace it with a new one. All of the VMs have been migrated off of the old master, and everything has been running on the new storage domain for a couple months. But when I try to put the old domain into maintenance mode I get an error.
Old Master: vm-storage-ssd New Domain: vm-storage-ssd2
The error is:
Failed to Reconstruct Master Domain for Data Center EDC2
As well as:
Sync Error on Master Domain between Host daccs01 and oVirt Engine. Domain: vm-storage-ssd is marked as Master in oVirt Engine database but not on the Storage side. Please consult with Support on how to fix this issue.
2021-07-28 11:41:34,870-07 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-23) [] Master domain version is not in sync between DB and VDSM. Domain vm-storage-ssd marked as master, but the version in DB: 283 and in VDSM: 280
And:
Not stopping SPM on vds daccs01, pool id f72ec125-69a1-4c1b-a5e1-313fcb70b6ff as there are uncleared tasks Task '5fa9edf0-56c3-40e4-9327-47bf7764d28d', status 'finished'
After a couple minutes all the domains are marked as active again and things continue, but vm-storage-ssd is still listed as the master domain. Any thoughts?
This is on 4.3.10.4-1.el7 on CentOS 7.
engine=# SELECT storage_name, storage_pool_id, storage, status FROM storage_pool_with_storage_domain ORDER BY storage_name;
     storage_name      |           storage_pool_id            |                storage                 | status
-----------------------+--------------------------------------+----------------------------------------+--------
 compute1-iscsi-ssd    | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | yvUESE-yWUv-VIWL-qX90-aAq7-gK0I-EqppRL |      1
 compute7-iscsi-ssd    | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 8ekHdv-u0RJ-B0FO-LUUK-wDWs-iaxb-sh3W3J |      1
 export-domain-storage | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | d3932528-6844-481a-bfed-542872ace9e5   |      1
 iso-storage           | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | f800b7a6-6a0c-4560-8476-2f294412d87d   |      1
 vm-storage-7200rpm    | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | a0bff472-1348-4302-a5c7-f1177efa45a9   |      1
 vm-storage-ssd        | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0   |      1
 vm-storage-ssd2       | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4   |      1
(7 rows)
Thanks, -Matthew -- _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-leave@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OXOXW6B2NWXOUG...

Yikes.... ok. So the recommended path would be to:
* Put all domains except the current master and one of the iscsi domains into maintenance mode
* Then put the master domain into maintenance mode
* Hopefully that iscsi domain then becomes master
* Then activate vm-storage-ssd2
* Put the iscsi domain into maintenance mode so master would move to vm-storage-ssd2
Right? Or am I missing something? I assume the bug you mentioned wouldn't be back ported to 4.3 when fixed.... right?
Thanks, -Matthew
On 8/4/21 1:17 PM, Shani Leviim wrote:
There should be an error message saying that the master domain (as recorded in the DB) should come up first. I also think that detaching the current gluster master domain and starting the engine with no master, so that a new one is elected instead, will leave you in an infinite loop of reconstruction, and editing the DB would have to be involved.
Make sure to back up your environment before making any manual DB change, and keep in mind that such a change can damage your environment.
*Regards, * *Shani Leviim *
On Wed, Aug 4, 2021 at 10:27 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Hi Shani,
Yes, it's a gluster domain, and it looks like yes that matches the bug:
2021-08-04 08:47:28,516-0700 ERROR (jsonrpc/5) [storage.StoragePool] migration to new master failed (sp:909)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 898, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileUtils.py", line 69, in tarCopy
    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
TarCopyFailed: (1, 0, '', '')
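For readers decoding the `TarCopyFailed: (1, 0, '', '')` tuple: going by the `raise` line in the traceback, the four fields are the source tar return code, the destination tar return code, stdout and stderr. The following is only an illustrative Python sketch for labelling those fields; the names `TarCopyResult` and `describe_tar_copy_failure` are hypothetical and are not part of vdsm.

```python
from typing import NamedTuple


class TarCopyResult(NamedTuple):
    # Field order mirrors the raise line in the traceback:
    # TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
    src_returncode: int
    dst_returncode: int
    out: str
    err: str


def describe_tar_copy_failure(args):
    """Return a list naming which side(s) of the copy exited non-zero."""
    result = TarCopyResult(*args)
    failed = []
    if result.src_returncode != 0:
        failed.append("source tar (exit %d)" % result.src_returncode)
    if result.dst_returncode != 0:
        failed.append("destination tar (exit %d)" % result.dst_returncode)
    return failed


# Values taken from the traceback in this thread.
print(describe_tar_copy_failure((1, 0, '', '')))
```

Read this way, the tuple in the log says the source side exited with 1 while the destination side exited cleanly, and neither side left any captured output.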
The iscsi domains aren't as resilient as the gluster domains, so I'd really like to have the vm-storage-ssd2 domain as the master. I'll have to take an outage anyway to put the gluster domains in maintenance mode, so could I just put all domains into maintenance mode and then activate vm-storage-ssd2 first so it's elected master? Or is there some transfer mechanism that must take place?
Thanks, -Matthew
-- Matthew Benstead System Administrator Pacific Climate Impacts Consortium <https://pacificclimate.org/> University of Victoria, UH1 PO Box 1800, STN CSC Victoria, BC, V8W 2Y2 Phone: +1-250-721-8432 Email: matthewb@uvic.ca
On 8/4/21 11:26 AM, Shani Leviim wrote:
The master domain should be up first.
I noticed this in your previous response:
[root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
{
    "info": {
        "name": "No Description",
        "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
        "pool_status": "connected",
        "lver": 9356,
        "spm_id": 6,
        "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8",
        "version": "5",
        "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active",
        *"type": "GLUSTERFS",*
        "master_ver": 288
    },
There's a bug regarding putting GLUSTER domains into maintenance for replacing the master role: https://bugzilla.redhat.com/show_bug.cgi?id=1913764
If you see this error in vdsm.log, your case matches the bug:
(tasks/2) [storage.StoragePool] Migration to new master 2a0d3c24-3357-4677-b9b2-35486af464a3 failed (sp:903)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 891, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 71, in tarCopy
    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
vdsm.storage.fileUtils.TarCopyFailed: (1, 0, b'', b'')
There's a chance you can switch the master domain to one of the iscsi domains: Try putting domain vm-storage-ssd2 into maintenance and all other GLUSTER domains into maintenance, so only the iscsi domains remain active. Then, try to put vm-storage-ssd into maintenance.
*Regards, * *Shani Leviim *
On Wed, Aug 4, 2021 at 7:00 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani - Unfortunately it still fails. I had been hoping to make the change without an outage, but if I shut down the VMs, put all the storage domains into maintenance mode, and then activate the new storage domain (the one I want to be the master) first, would that work? Or is there some kind of transfer that needs to take place?
Before:
During:
Errors:
Thanks, -Matthew
On 8/4/21 2:46 AM, Shani Leviim wrote:
Thanks, Matthew.
So it seems that both vdsm and engine are aligned with that data. Does your env still fail with reconstruction errors?
Once it's "stabilized" (so both domains are up and available), can you please try again putting the current master into maintenance and share the results?
*Regards, * *Shani Leviim *
On Tue, Aug 3, 2021 at 8:29 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani,
Here's the output from the SPM - it looks like the master version is 288:
[root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
{
    "info": {
        "name": "No Description",
        "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
        "pool_status": "connected",
        "lver": 9356,
        "spm_id": 6,
        "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8",
        "version": "5",
        "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active",
        "type": "GLUSTERFS",
        "master_ver": 288
    },
    "dominfo": {
        "f73307bc-06c8-4996-86d1-78947cdaf6dd": { "status": "Attached", "isoprefix": "", "alerts": [] },
        "d5ae843b-5815-4f3a-b1be-370e56fe0962": { "status": "Active", "diskfree": "94847578931200", "isoprefix": "", "alerts": [], "disktotal": "359981635338240", "version": 0 },
        "a5a83df1-47e2-4927-9add-079199ca7ef8": { "status": "Active", "diskfree": "708888707072", "isoprefix": "", "alerts": [], "disktotal": "751252275200", "version": 5 },
        "311c1382-12c2-43a0-96d0-e2084180b114": { "status": "Active", "diskfree": "1598335766528", "isoprefix": "", "alerts": [], "disktotal": "2197949513728", "version": 5 },
        "2f2aab43-6ce3-4cb0-9142-b2b57e5083b3": { "status": "Active", "diskfree": "1416265465856", "isoprefix": "", "alerts": [], "disktotal": "3837687496704", "version": 5 },
        "3fc76134-2143-4921-ad36-ee84abca40e8": { "status": "Active", "diskfree": "94847578931200", "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111", "alerts": [], "disktotal": "359981635338240", "version": 0 },
        "fc049ebe-03f9-43fc-adca-d6bfeb99c288": { "status": "Active", "diskfree": "1337882312704", "isoprefix": "", "alerts": [], "disktotal": "3837687496704", "version": 5 }
    }
}
And the master domain version for the vm-storage-ssd domain is 288 in the database as well:
engine=# select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';
id                       | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff
name                     | EDC2
description              |
storage_pool_type        |
storage_pool_format_type | 5
status                   | 1
master_domain_version    | 288
spm_vds_id               | 51769733-0cf6-4270-8288-ec96474b7609
compatibility_version    | 4.3
_create_date             | 2015-08-10 20:51:03.831215-07
_update_date             | 2021-07-29 10:31:02.234262-07
quota_enforcement_type   | 0
free_text_comment        |
is_local                 | f
(1 row)
Here are the master storage domain details:
engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0';
id                                    | a5a83df1-47e2-4927-9add-079199ca7ef8
storage                               | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0
storage_name                          | vm-storage-ssd
storage_description                   |
storage_comment                       |
storage_pool_id                       | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff
available_disk_size                   | 660
confirmed_available_disk_size         |
vdo_savings                           |
used_disk_size                        | 39
commited_disk_size                    | 0
actual_images_size                    | 0
status                                | 3
storage_pool_name                     | EDC2
storage_type                          | 7
storage_domain_type                   | 0
storage_domain_format_type            | 5
last_time_used_as_master              | 1627497705160
wipe_after_delete                     | f
discard_after_delete                  | f
first_metadata_device                 |
vg_metadata_device                    |
backup                                | f
block_size                            | 512
storage_domain_shared_status          | 1
recoverable                           | t
contains_unregistered_entities        | f
warning_low_space_indicator           | 10
critical_space_action_blocker         | 5
warning_low_confirmed_space_indicator | 0
external_status                       | 0
supports_discard                      |
is_hosted_engine_storage              | f
(1 row)
This is the domain we want to switch over to the master domain so I can decommission the old one.
engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_name = 'vm-storage-ssd2';
id                                    | 311c1382-12c2-43a0-96d0-e2084180b114
storage                               | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4
storage_name                          | vm-storage-ssd2
storage_description                   | Storage01,02,03 vm-storage
storage_comment                       |
storage_pool_id                       | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff
available_disk_size                   | 1488
confirmed_available_disk_size         |
vdo_savings                           |
used_disk_size                        | 559
commited_disk_size                    | 1147
actual_images_size                    | 538
status                                | 3
storage_pool_name                     | EDC2
storage_type                          | 7
storage_domain_type                   | 1
storage_domain_format_type            | 5
last_time_used_as_master              | 1627497694904
wipe_after_delete                     | f
discard_after_delete                  | f
first_metadata_device                 |
vg_metadata_device                    |
backup                                | f
block_size                            | 512
storage_domain_shared_status          | 1
recoverable                           | t
contains_unregistered_entities        | f
warning_low_space_indicator           | 10
critical_space_action_blocker         | 5
warning_low_confirmed_space_indicator | 10
external_status                       | 0
supports_discard                      |
is_hosted_engine_storage              | f
(1 row)
Thanks, -Matthew
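The `getInfo` output in the message above packs the per-domain status into one comma-separated `domains` string, which is awkward to read by eye. A minimal Python sketch, assuming the JSON has been saved from `vdsm-client StoragePool getInfo`, that pulls out the master domain, master version and a status per domain; `summarize_pool_info` is a hypothetical helper name, and the sample is trimmed to three of the seven domains.

```python
import json


def summarize_pool_info(raw_json):
    """Extract master domain, master version and per-domain status from getInfo JSON."""
    info = json.loads(raw_json)["info"]
    # "domains" is a comma-separated list of "<uuid>:<status>" pairs.
    domains = dict(pair.split(":", 1) for pair in info["domains"].split(","))
    return {
        "master_uuid": info["master_uuid"],
        "master_ver": info["master_ver"],
        "domains": domains,
    }


# Sample trimmed from the output in this thread (three of the seven domains kept).
SAMPLE = """
{"info": {"master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8",
          "master_ver": 288,
          "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active"}}
"""

summary = summarize_pool_info(SAMPLE)
print("master:", summary["master_uuid"], "version:", summary["master_ver"])
for uuid, status in summary["domains"].items():
    print(uuid, status)
```

The `master_ver` value printed here is the one to set against `master_domain_version` from the `storage_pool` query.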
On 8/1/21 2:00 AM, Shani Leviim wrote:
Hi Matthew,
You might need to sync back the master version and domain between the engine and vdsm. To verify those parameters on vdsm, run this command on the SPM host:
    vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
The result should be something like:
    "info": {
        "domains": "1234:Active,5678:Active,91011:Active",
        "isoprefix": "",
        "lver": 6,
        *"master_uuid": "123",
        "master_ver": 14,*
        "name": "No Description",
        "pool_status": "connected",
        "spm_id": 1,
        "type": "NFS",
        "version": "5"
    }
Then, compare the master version value with the engine:
    engine=> select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';
And the master domain:
    engine=> select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0';
(0 means master; for reference, see https://github.com/oVirt/ovirt-engine/blob/a65cf0eae8858ab2278c3f537dc427e3ff20eba7/backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/businessentities/StorageDomainType.java)
Then we can get the bigger picture (and update the engine data to match the vdsm)
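The comparison described above boils down to checking two values on each side. A minimal Python sketch of that check, assuming the values have already been copied out of the SQL results and the `getInfo` output by hand; `master_mismatches` and the `master_domain_id` key are hypothetical names, while `master_domain_version`, `master_uuid` and `master_ver` are the field names that appear in this thread.

```python
def master_mismatches(engine, vdsm):
    """Return a list of differences between engine DB and VDSM master data."""
    problems = []
    if engine["master_domain_id"] != vdsm["master_uuid"]:
        problems.append(
            "master domain differs: engine=%s vdsm=%s"
            % (engine["master_domain_id"], vdsm["master_uuid"])
        )
    if int(engine["master_domain_version"]) != int(vdsm["master_ver"]):
        problems.append(
            "master version differs: engine=%s vdsm=%s"
            % (engine["master_domain_version"], vdsm["master_ver"])
        )
    return problems


# Versions from the warning in the original report (DB: 283, VDSM: 280);
# the domain ID is the vm-storage-ssd ID that appears in the engine log.
engine_side = {
    "master_domain_id": "a5a83df1-47e2-4927-9add-079199ca7ef8",
    "master_domain_version": 283,
}
vdsm_side = {
    "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8",
    "master_ver": 280,
}

for problem in master_mismatches(engine_side, vdsm_side):
    print(problem)
```

An empty list means both sides agree on the master domain and its version.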
*Regards, * *Shani Leviim *
On Thu, Jul 29, 2021 at 8:40 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani - yes we plan to upgrade to 4.4 in the future, but we're on 4.3 right now due to only running CentOS 7 at the moment.
I was able to clear the job from the SPM:
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
{
    "5fa9edf0-56c3-40e4-9327-47bf7764d28d": {
        "message": "1 jobs completed successfully",
        "code": 0,
        "taskID": "5fa9edf0-56c3-40e4-9327-47bf7764d28d",
        "taskResult": "success",
        "taskState": "finished"
    }
}
[root@daccs01 ~]# vdsm-client Task clear taskID=5fa9edf0-56c3-40e4-9327-47bf7764d28d
true
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
{}
And confirm there were no async_tasks:
engine=# select * from async_tasks;
 task_id | action_type | status | result | step_id | command_id | started_at | storage_pool_id | task_type | vdsm_task_id | root_command_id | user_id
---------+-------------+--------+--------+---------+------------+------------+-----------------+-----------+--------------+-----------------+---------
(0 rows)
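When more than one finished task is left on the SPM, the list-then-clear steps above can be prepared in one pass. A minimal Python sketch, assuming the JSON from `vdsm-client Host getAllTasksStatuses` has been captured as text; it only builds the `vdsm-client Task clear` command lines shown in this thread and does not run them. `clear_commands` is a hypothetical helper name.

```python
import json
import shlex


def clear_commands(tasks_json):
    """Build one 'Task clear' command per task whose taskState is 'finished'."""
    tasks = json.loads(tasks_json)
    return [
        "vdsm-client Task clear taskID=" + shlex.quote(task_id)
        for task_id, status in sorted(tasks.items())
        if status.get("taskState") == "finished"
    ]


# Output of getAllTasksStatuses as posted in this thread.
SAMPLE = """
{
    "5fa9edf0-56c3-40e4-9327-47bf7764d28d": {
        "message": "1 jobs completed successfully",
        "code": 0,
        "taskID": "5fa9edf0-56c3-40e4-9327-47bf7764d28d",
        "taskResult": "success",
        "taskState": "finished"
    }
}
"""

for command in clear_commands(SAMPLE):
    print(command)
```

Printing the commands instead of executing them leaves room to review each task ID before anything is cleared on the host.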
However, when putting the vm-storage-ssd domain into maintenance mode, it failed again:
Here are some of the log entries - anything else I can look at?
2021-07-29 10:30:37,848-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM compute7.pcic.uvic.ca command ConnectStoragePoolVDS failed: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand' return value 'StatusOnlyReturn [status=Status [code=324, message=Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff']]'
...
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] HostName = compute7.pcic.uvic.ca
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'ConnectStoragePoolVDSCommand(HostName = compute7.pcic.uvic.ca, ConnectStoragePoolVDSCommandParameters:{hostId='51769733-0cf6-4270-8288-ec96474b7609', vdsId='51769733-0cf6-4270-8288-ec96474b7609', storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', masterVersion='288'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
...
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,855-07 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] FINISH, DeactivateStorageDomainVDSCommand, return: , log id: 1c215ca4
2021-07-29 10:30:37,855-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] DeactivateStorageDomainVDS failed 'a5a83df1-47e2-4927-9add-079199ca7ef8': org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' (Failed with error StoragePoolWrongMaster and code 324)
    at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
    at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2112) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand.dectivateStorageDomain(DeactivateStorageDomainCommand.java:340) [bll.jar:]
    ...
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_292]
    at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
Caused by: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
    at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:50) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedConnectProxyReturnValue(ConnectStoragePoolVDSCommand.java:48) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedProxyReturnValue(ConnectStoragePoolVDSCommand.java:36) [vdsbroker.jar:]
    ...
    at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final]
    at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final]
    at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source) [vdsbroker.jar:]
    ... 94 more
2021-07-29 10:30:37,861-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Failed to deactivate storage domain 'a5a83df1-47e2-4927-9add-079199ca7ef8'
2021-07-29 10:30:37,868-07 INFO [org.ovirt.engine.core.bll.CommandCompensator] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command [id=c63199f8-a720-4053-8e5c-92c8d21e0ce2]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot:{id='StoragePoolIsoMapId:{storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', storageId='a5a83df1-47e2-4927-9add-079199ca7ef8'}', status='Unknown'}.
2021-07-29 10:30:37,882-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: USER_DEACTIVATE_STORAGE_DOMAIN_FAILED(969), Failed to deactivate Storage Domain vm-storage-ssd (Data Center EDC2).
2021-07-29 10:30:37,884-07 WARN [org.ovirt.engine.core.bll.storage.pool.ReconstructMasterDomainCommand] (EE-ManagedThreadFactory-engine-Thread-25) [60d33d] Validation of action 'ReconstructMasterDomain' failed for user SYSTEM. Reasons: VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Locked
2021-07-29 10:30:37,888-07 INFO [org.ovirt.engine.core.bll.eventqueue.EventQueueMonitor] (EE-ManagedThreadFactory-engine-Thread-48) [35c5b47] Finished reconstruct for pool 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'. Clearing event queue
2021-07-29 10:30:37,899-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand' return value ' TaskStatusListReturn:{status='Status [code=654, message=Not SPM]'}
Thanks, -Matthew
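The engine log excerpt in the message above buries the useful part, the VDSM status code and message, inside long lines. A minimal Python sketch, assuming the log text is available as a string, that extracts every `Status [code=..., message=...]` pair; the regular expression is written against the two status strings visible in this thread (324 and 654) and may need adjusting for other message shapes. `extract_statuses` is a hypothetical helper name.

```python
import re

# Matches fragments such as: Status [code=654, message=Not SPM]
STATUS_RE = re.compile(r"Status \[code=(\d+), message=([^\]]*)\]")


def extract_statuses(log_text):
    """Return (code, message) tuples for each VDSM status found in the log text."""
    return [(int(code), message) for code, message in STATUS_RE.findall(log_text)]


# Fragments copied from the engine log in this thread.
SAMPLE = (
    "return value 'StatusOnlyReturn [status=Status [code=324, message=Wrong Master "
    "domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, "
    "pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff']]'\n"
    "return value ' TaskStatusListReturn:{status='Status [code=654, message=Not SPM]'}\n"
)

for code, message in extract_statuses(SAMPLE):
    print(code, message)
```

On this excerpt it surfaces the two distinct failures: 324 (wrong master domain or version) from the host being asked to connect to the pool, and 654 (Not SPM) from the task status query.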
On 7/29/21 2:52 AM, Shani Leviim wrote:
Notice: This message was sent from outside the University of Victoria email system. Please be cautious with links and sensitive information.
Hi Matthew, Actually, your description is related to 2 features available for ovirt 4.4.5 <https://www.ovirt.org/release/4.4.5/> 1. The ability to switch the master storage domain while domains are up and running [1] 2. Clearing the finished tasks from REST API [2] and UI [3].
We recommend you upgrade your engine to enjoy those features.
In the meanwhile, as you've described, moving the Master role from one storage to the other is available using putting the domain into maintenance. In order to clear the finished tasks from SPM: vdsm-client Host getAllTasksStatuses
It should be something like that: { "1dc4d885-577a-4b6a-b01f-e682602a907c": { "code": 0, "message": "1 jobs completed successfully", "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c", "taskResult": "success", "taskState": "finished" } }
Then clear that tasks: vdsm-client Task clear taskID=12345 Once it gets cleared, the reconstruction can be finished.
To verify there are no more finished async tasks, you can run this SQL query on the engine: engine=# select * from async_tasks WHERE storage_pool_id = '123';
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1910022 <https://bugzilla.redhat.com/show_bug.cgi?id=1910022> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1627997 <https://bugzilla.redhat.com/show_bug.cgi?id=1627997> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1910302 <https://bugzilla.redhat.com/show_bug.cgi?id=1910302>
*Regards, * *Shani Leviim *
On Thu, Jul 29, 2021 at 8:33 AM Matthew Benstead <matthewb@uvic.ca <mailto:matthewb@uvic.ca>> wrote:
Hello,
I'm trying to decommission the old master storage domain in ovirt, and replace it with a new one. All of the VMs have been migrated off of the old master, and everything has been running on the new storage domain for a couple months. But when I try to put the old domain into maintenance mode I get an error.
Old Master: vm-storage-ssd
New Domain: vm-storage-ssd2
The error is:
Failed to Reconstruct Master Domain for Data Center EDC2
As well as:
Sync Error on Master Domain between Host daccs01 and oVirt Engine. Domain: vm-storage-ssd is marked as Master in oVirt Engine database but not on the Storage side. Please consult with Support on how to fix this issue.
2021-07-28 11:41:34,870-07 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-23) [] Master domain version is not in sync between DB and VDSM. Domain vm-storage-ssd marked as master, but the version in DB: 283 and in VDSM: 280
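The sync warning above carries both version numbers, so it can be pulled out of engine.log mechanically. A minimal Python sketch, assuming the message wording stays exactly as quoted here; the function name `master_version_gap` is made up for this example.

```python
import re

# Pattern taken from the wording of the warning quoted in this thread.
VERSION_RE = re.compile(
    r"Domain (?P<domain>\S+) marked as master, "
    r"but the version in DB: (?P<db>\d+) and in VDSM: (?P<vdsm>\d+)"
)


def master_version_gap(log_line):
    """Return (domain, db_version, vdsm_version), or None if no match."""
    match = VERSION_RE.search(log_line)
    if match is None:
        return None
    return match["domain"], int(match["db"]), int(match["vdsm"])


line = (
    "2021-07-28 11:41:34,870-07 WARN "
    "[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] "
    "(EE-ManagedThreadFactory-engine-Thread-23) [] Master domain version is "
    "not in sync between DB and VDSM. Domain vm-storage-ssd marked as master, "
    "but the version in DB: 283 and in VDSM: 280"
)
print(master_version_gap(line))
```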
And:
Not stopping SPM on vds daccs01, pool id f72ec125-69a1-4c1b-a5e1-313fcb70b6ff as there are uncleared tasks Task '5fa9edf0-56c3-40e4-9327-47bf7764d28d', status 'finished'
After a couple minutes all the domains are marked as active again and things continue, but vm-storage-ssd is still listed as the master domain. Any thoughts?
This is on 4.3.10.4-1.el7 on CentOS 7.
engine=# SELECT storage_name, storage_pool_id, storage, status FROM storage_pool_with_storage_domain ORDER BY storage_name;
     storage_name      |           storage_pool_id            |                storage                 | status
-----------------------+--------------------------------------+----------------------------------------+--------
 compute1-iscsi-ssd    | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | yvUESE-yWUv-VIWL-qX90-aAq7-gK0I-EqppRL |      1
 compute7-iscsi-ssd    | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 8ekHdv-u0RJ-B0FO-LUUK-wDWs-iaxb-sh3W3J |      1
 export-domain-storage | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | d3932528-6844-481a-bfed-542872ace9e5   |      1
 iso-storage           | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | f800b7a6-6a0c-4560-8476-2f294412d87d   |      1
 vm-storage-7200rpm    | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | a0bff472-1348-4302-a5c7-f1177efa45a9   |      1
 vm-storage-ssd        | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0   |      1
 vm-storage-ssd2       | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4   |      1
(7 rows)
Thanks, -Matthew
--
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OXOXW6B2NWXOUGZV3OKO4OMDXVDJSQLZ/

This is the scenario I can think of, hope it will work as well :)
Regarding the bug <https://bugzilla.redhat.com/show_bug.cgi?id=1913764> backport, adding +Nir Soffer <nsoffer@redhat.com> +Vojtech Juranek <vjuranek@redhat.com>
*Regards,*
*Shani Leviim*
On Wed, Aug 4, 2021 at 11:54 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Yikes.... ok. So the recommended path would be to:
- Put all domains except the current master and one of the iscsi domains into maintenance mode
- Then put the master domain into maintenance mode
- Hopefully that iscsi domain then becomes master
- Then activate vm-storage-ssd2
- Put the iscsi domain into maintenance mode so master would move to vm-storage-ssd2
Right? Or am I missing something?
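The order of operations in the list above can be written down as a small checklist generator. This is only a sketch in Python: it does not talk to oVirt, the function name `master_switch_plan` is hypothetical, and picking compute1-iscsi-ssd as the temporary master is an arbitrary choice for the example. The actual election of the master is done by the engine, so the "verify" steps are manual checks.

```python
def master_switch_plan(domains, old_master, temp_master, new_master):
    """Return the ordered checklist from the message above as plain strings."""
    chosen = [old_master, temp_master, new_master]
    if len(set(chosen)) != 3 or any(name not in domains for name in chosen):
        raise ValueError("old, temporary and new master must be three distinct known domains")
    # Step 1: everything except the current master and the temporary master.
    steps = [
        "maintenance: " + name
        for name in domains
        if name not in (old_master, temp_master)
    ]
    steps.append("maintenance: " + old_master)
    steps.append("verify master is now: " + temp_master)
    steps.append("activate: " + new_master)
    steps.append("maintenance: " + temp_master)
    steps.append("verify master is now: " + new_master)
    return steps


domains = [
    "compute1-iscsi-ssd",
    "compute7-iscsi-ssd",
    "export-domain-storage",
    "iso-storage",
    "vm-storage-7200rpm",
    "vm-storage-ssd",
    "vm-storage-ssd2",
]
for step in master_switch_plan(
    domains, "vm-storage-ssd", "compute1-iscsi-ssd", "vm-storage-ssd2"
):
    print(step)
```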
I assume the bug you mentioned wouldn't be back ported to 4.3 when fixed.... right? Thanks, -Matthew
On 8/4/21 1:17 PM, Shani Leviim wrote:
There should be an error message saying that the master domain (as recorded in the DB) must come up first. I also think that detaching the current gluster master domain and starting the engine with no master, so that a new one would be elected instead, will leave you in an infinite loop of reconstruction, and getting out of it would involve editing the DB.
Make sure to back up your environment before making any manual DB change, and keep in mind that such a change can damage your environment.
*Regards, *
*Shani Leviim *
On Wed, Aug 4, 2021 at 10:27 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Hi Shani,
Yes, it's a gluster domain, and it looks like yes that matches the bug:
2021-08-04 08:47:28,516-0700 ERROR (jsonrpc/5) [storage.StoragePool] migration to new master failed (sp:909)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 898, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileUtils.py", line 69, in tarCopy
    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
TarCopyFailed: (1, 0, '', '')
The iscsi domains aren't as resilient as the gluster domains, so I'd really like to have the vm-storage-ssd2 domain as the master. I'll have to take an outage anyway to put the gluster domains in maintenance mode, so could I just put all domains into maintenance mode and then activate vm-storage-ssd2 first so it's elected master? Or is there some transfer mechanism that must take place?
Thanks, -Matthew
--
Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium <https://pacificclimate.org/>
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: +1-250-721-8432
Email: matthewb@uvic.ca
On 8/4/21 11:26 AM, Shani Leviim wrote:
The master domain should be up first.
I noticed this in your previous response:
[root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
{
    "info": {
        "name": "No Description",
        "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
        "pool_status": "connected",
        "lver": 9356,
        "spm_id": 6,
        "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8",
        "version": "5",
        "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active",
        *"type": "GLUSTERFS",*
        "master_ver": 288
    },
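The `domains` field in the output above packs every domain and its state into one comma-separated string, which is hard to scan by eye. A minimal Python sketch for splitting it, assuming the `uuid:Status` format seen in this thread; `parse_domains` is a name invented for the example.

```python
def parse_domains(domains_field):
    """Split 'uuid:Status,uuid:Status,...' into a {uuid: status} dict."""
    result = {}
    for entry in domains_field.split(","):
        uuid, _, status = entry.strip().partition(":")
        result[uuid] = status
    return result


domains_field = (
    "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,"
    "d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,"
    "a5a83df1-47e2-4927-9add-079199ca7ef8:Active,"
    "311c1382-12c2-43a0-96d0-e2084180b114:Active,"
    "fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,"
    "3fc76134-2143-4921-ad36-ee84abca40e8:Active,"
    "2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active"
)

for uuid, status in parse_domains(domains_field).items():
    print(uuid, status)
```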
There's a bug regarding putting GLUSTER domains into maintenance for replacing the master role: https://bugzilla.redhat.com/show_bug.cgi?id=1913764.
If you see the following error in vdsm.log, your case matches the bug:
(tasks/2) [storage.StoragePool] Migration to new master 2a0d3c24-3357-4677-b9b2-35486af464a3 failed (sp:903)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 891, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 71, in tarCopy
    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
vdsm.storage.fileUtils.TarCopyFailed: (1, 0, b'', b'')
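Checking vdsm.log for this signature can be automated. The Python sketch below is an assumption-laden helper, not an official diagnostic: it treats a log as matching when it contains both a "migration to new master ... failed" line (either capitalisation, as seen in the 4.3 and 4.4 excerpts in this thread) and the `TarCopyFailed` exception name.

```python
import re

# Covers both wordings quoted in the thread:
#   "migration to new master failed"          (python2.7 / 4.3)
#   "Migration to new master <uuid> failed"   (python3.6 / 4.4)
MIGRATE_FAILED_RE = re.compile(r"migration to new master .*failed", re.IGNORECASE)


def matches_master_migrate_bug(log_text):
    """True if the text shows a failed master migration caused by TarCopyFailed."""
    return bool(MIGRATE_FAILED_RE.search(log_text)) and "TarCopyFailed" in log_text


excerpt_43 = (
    "[storage.StoragePool] migration to new master failed (sp:909)\n"
    "TarCopyFailed: (1, 0, '', '')\n"
)
excerpt_44 = (
    "[storage.StoragePool] Migration to new master "
    "2a0d3c24-3357-4677-b9b2-35486af464a3 failed (sp:903)\n"
    "vdsm.storage.fileUtils.TarCopyFailed: (1, 0, b'', b'')\n"
)
print(matches_master_migrate_bug(excerpt_43), matches_master_migrate_bug(excerpt_44))
```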
There's a chance you can switch the master domain to one of the iscsi domains: Try putting domain vm-storage-ssd2 into maintenance and all other GLUSTER domains into maintenance, so only the iscsi domains remain active. Then, try to put vm-storage-ssd into maintenance.
*Regards, *
*Shani Leviim *
On Wed, Aug 4, 2021 at 7:00 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani - Unfortunately it still fails. I had been hoping to make the change without an outage, but if I shut down the VMs and put all the storage domains into maintenance mode, and then activate the new storage domain (that I want to be the master) first, would that work? Or is there some kind of transfer that needs to take place?
Before:
During:
Errors:
Thanks, -Matthew
On 8/4/21 2:46 AM, Shani Leviim wrote:
Thanks, Matthew.
So it seems that vdsm and the engine agree on that data. Does your env still fail with reconstruction errors?
Once it's "stabilized" (so both domains are up and available), can you please try again putting the current master into maintenance and share the results?
*Regards, *
*Shani Leviim *
On Tue, Aug 3, 2021 at 8:29 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani,
Here's the output from the SPM - it looks like the master version is 288:
[root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
{
    "info": {
        "name": "No Description",
        "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
        "pool_status": "connected",
        "lver": 9356,
        "spm_id": 6,
        "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8",
        "version": "5",
        "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active",
        "type": "GLUSTERFS",
        "master_ver": 288
    },
    "dominfo": {
        "f73307bc-06c8-4996-86d1-78947cdaf6dd": {
            "status": "Attached",
            "isoprefix": "",
            "alerts": []
        },
        "d5ae843b-5815-4f3a-b1be-370e56fe0962": {
            "status": "Active",
            "diskfree": "94847578931200",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "359981635338240",
            "version": 0
        },
        "a5a83df1-47e2-4927-9add-079199ca7ef8": {
            "status": "Active",
            "diskfree": "708888707072",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "751252275200",
            "version": 5
        },
        "311c1382-12c2-43a0-96d0-e2084180b114": {
            "status": "Active",
            "diskfree": "1598335766528",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "2197949513728",
            "version": 5
        },
        "2f2aab43-6ce3-4cb0-9142-b2b57e5083b3": {
            "status": "Active",
            "diskfree": "1416265465856",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "3837687496704",
            "version": 5
        },
        "3fc76134-2143-4921-ad36-ee84abca40e8": {
            "status": "Active",
            "diskfree": "94847578931200",
            "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
            "alerts": [],
            "disktotal": "359981635338240",
            "version": 0
        },
        "fc049ebe-03f9-43fc-adca-d6bfeb99c288": {
            "status": "Active",
            "diskfree": "1337882312704",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "3837687496704",
            "version": 5
        }
    }
}
And the master domain version for the vm-storage-ssd domain is 288 in the database as well:
engine=# select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';
 id | name | description | storage_pool_type | storage_pool_format_type | status | master_domain_version | spm_vds_id | compatibility_version | _create_date | _update_date | quota_enforcement_type | free_text_comment | is_local
--------------------------------------+------+-------------+-------------------+--------------------------+--------+-----------------------+--------------------------------------+-----------------------+-------------------------------+-------------------------------+------------------------+-------------------+----------
 f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | EDC2 | | | 5 | 1 | 288 | 51769733-0cf6-4270-8288-ec96474b7609 | 4.3 | 2015-08-10 20:51:03.831215-07 | 2021-07-29 10:31:02.234262-07 | 0 | | f
(1 row)
Here is the master storage domain details: engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0'; id | storage | storage_name | storage_description | storage_comment | storage_pool_id | available_disk_size | confirmed_available_disk_size | vdo_savings | used_disk_size | commited_disk_size | actual_images_size | status | storage_pool_name | storage_type | storage_domain_type | storage_domain_format_type | last_time_used_as_master | wipe_after_delete | discard_after_delete | first_metadata_device | vg_metadata_device | backup | block_size | storage_domain_shared_status | recoverable | contains_unregistered_entities | warning_low_space_indicator | critical_space_action_blocker | warning_low_confirmed_space_indicator | external_status | supports_discard | is_hosted_engine_storage
--------------------------------------+--------------------------------------+----------------+---------------------+-----------------+--------------------------------------+---------------------+-------------------------------+-------------+----------------+--------------------+--------------------+--------+-------------------+--------------+---------------------+----------------------------+--------------------------+-------------------+----------------------+-----------------------+--------------------+--------+------------+------------------------------+-------------+--------------------------------+-----------------------------+-------------------------------+---------------------------------------+-----------------+------------------+-------------------------- a5a83df1-47e2-4927-9add-079199ca7ef8 | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0 | vm-storage-ssd | | | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 660 | | | 39 | 0 | 0 | 3 | EDC2 | 7 | 0 | 5 | 1627497705160 | f | f | | | f | 512 | 1 | t | f | 10 | 5 | 0 | 0 | | f (1 row)
This is the domain we want to switch over to the master domain so I can decommission the old one.
engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_name = 'vm-storage-ssd2'; id | storage | storage_name | storage_description | storage_comment | storage_pool_id | available_disk_size | confirmed_available_disk_size | vdo_savings | used_disk_size | commited_disk_size | actual_images_size | status | storage_pool_name | storage_type | storage_domain_type | storage_domain_format_type | last_time_used_as_master | wipe_after_delete | discard_after_delete | first_metadata_device | vg_metadata_device | backup | block_size | storage_domain_shared_status | recoverable | contains_unregistered_entities | warning_low_space_indicator | critical_space_action_blocker | warning_low_confirmed_space_indicator | external_status | supports_discard | is_hosted_engine_storage
--------------------------------------+--------------------------------------+-----------------+----------------------------+-----------------+--------------------------------------+---------------------+-------------------------------+-------------+----------------+--------------------+--------------------+--------+-------------------+--------------+---------------------+----------------------------+------------------
--------+-------------------+----------------------+-----------------------+--------------------+--------+------------+------------------------------+-------------+--------------------------------+-----------------------------+-------------------------------+---------------------------------------+-----------------+------------------+-------------------------- 311c1382-12c2-43a0-96d0-e2084180b114 | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4 | vm-storage-ssd2 | Storage01,02,03 vm-storage | | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 1488 | | | 559 | 1147 | 538 | 3 | EDC2 | 7 | 1 | 5 | 1627497694904 | f | f | | | f | 512 | 1 | t | f | 10 | 5 | 10 | 0 | | f (1 row)
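The `last_time_used_as_master` values in the two rows above look like timestamps. Assuming (this is a guess from their magnitude, not something stated in the thread) that the column holds milliseconds since the Unix epoch, a short Python sketch converts them:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(value):
    """Convert integer milliseconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=value)


# Values copied from the storage_domains rows in this thread.
stamps = {
    "vm-storage-ssd": 1627497705160,
    "vm-storage-ssd2": 1627497694904,
}
for name, stamp in stamps.items():
    print(name, from_epoch_ms(stamp).isoformat())
```

Under that assumption both values fall on 2021-07-28 at about 18:41 UTC, which is 11:41 at the -07 offset used in the logs, the same minute as the sync warning in the original post.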
Thanks, -Matthew
On 8/1/21 2:00 AM, Shani Leviim wrote:
Hi Matthew,
You might need to sync back the master version and domain between the engine and vdsm. To verify those parameters on vdsm, run this command on the SPM host: vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
The result should be something like:
"info": {
    "domains": "1234:Active,5678:Active,91011:Active",
    "isoprefix": "",
    "lver": 6,
    *"master_uuid": "123",
    "master_ver": 14,*
    "name": "No Description",
    "pool_status": "connected",
    "spm_id": 1,
    "type": "NFS",
    "version": "5"
}
Then, compare the master version value with the engine: engine=> select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';
And the master domain: engine=> select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0';
(0 means master, for reference, see https://github.com/oVirt/ovirt-engine/blob/a65cf0eae8858ab2278c3f537dc427e3f... )
Then we can get the bigger picture (and update the engine data to match the vdsm)
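The comparison described above (vdsm's `master_uuid` and `master_ver` against the engine's master domain id and `master_domain_version`) can be sketched in Python. The function name `check_master_sync` is hypothetical, and the engine-side values are passed in by hand from the two SQL queries rather than read from the database.

```python
import json


def check_master_sync(getinfo_json, engine_master_id, engine_master_version):
    """Compare vdsm's view of the master domain with the engine's.

    Returns a list of human-readable mismatches (empty when both agree).
    """
    info = json.loads(getinfo_json)["info"]
    problems = []
    if info["master_uuid"] != engine_master_id:
        problems.append(
            "master domain differs: vdsm=%s engine=%s"
            % (info["master_uuid"], engine_master_id)
        )
    if int(info["master_ver"]) != int(engine_master_version):
        problems.append(
            "master version differs: vdsm=%s engine=%s"
            % (info["master_ver"], engine_master_version)
        )
    return problems


# Trimmed getInfo output using the values reported in this thread.
getinfo = """
{"info": {"master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8",
          "master_ver": 288, "type": "GLUSTERFS"}}
"""
print(check_master_sync(getinfo, "a5a83df1-47e2-4927-9add-079199ca7ef8", 288))
```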
*Regards, *
*Shani Leviim *
On Thu, Jul 29, 2021 at 8:40 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Thanks Shani - yes we plan to upgrade to 4.4 in the future, but we're on 4.3 right now due to only running CentOS 7 at the moment.
I was able to clear the job from the SPM:
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
{
    "5fa9edf0-56c3-40e4-9327-47bf7764d28d": {
        "message": "1 jobs completed successfully",
        "code": 0,
        "taskID": "5fa9edf0-56c3-40e4-9327-47bf7764d28d",
        "taskResult": "success",
        "taskState": "finished"
    }
}
[root@daccs01 ~]# vdsm-client Task clear taskID=5fa9edf0-56c3-40e4-9327-47bf7764d28d
true
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
{}
And confirmed there were no async_tasks:
engine=# select * from async_tasks;
 task_id | action_type | status | result | step_id | command_id | started_at | storage_pool_id | task_type | vdsm_task_id | root_command_id | user_id
---------+-------------+--------+--------+---------+------------+------------+-----------------+-----------+--------------+-----------------+---------
(0 rows)
However, when putting the vm-storage-ssd domain into maintenance mode, it failed again:
Here are some of the log entries - anything else I can look at?
2021-07-29 10:30:37,848-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM compute7.pcic.uvic.ca command ConnectStoragePoolVDS failed: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand' return value 'StatusOnlyReturn [status=Status [code=324, message=Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff']]'
...
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] HostName = compute7.pcic.uvic.ca
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'ConnectStoragePoolVDSCommand(HostName = compute7.pcic.uvic.ca, ConnectStoragePoolVDSCommandParameters:{hostId='51769733-0cf6-4270-8288-ec96474b7609', vdsId='51769733-0cf6-4270-8288-ec96474b7609', storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', masterVersion='288'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
...
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,855-07 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] FINISH, DeactivateStorageDomainVDSCommand, return: , log id: 1c215ca4
2021-07-29 10:30:37,855-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] DeactivateStorageDomainVDS failed 'a5a83df1-47e2-4927-9add-079199ca7ef8': org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' (Failed with error StoragePoolWrongMaster and code 324)
    at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
    at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2112) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand.dectivateStorageDomain(DeactivateStorageDomainCommand.java:340) [bll.jar:]
    ...
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_292]
    at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
Caused by: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
    at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:50) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedConnectProxyReturnValue(ConnectStoragePoolVDSCommand.java:48) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedProxyReturnValue(ConnectStoragePoolVDSCommand.java:36) [vdsbroker.jar:]
    ...
    at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final]
    at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final]
    at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source) [vdsbroker.jar:]
    ... 94 more
2021-07-29 10:30:37,861-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Failed to deactivate storage domain 'a5a83df1-47e2-4927-9add-079199ca7ef8'
2021-07-29 10:30:37,868-07 INFO [org.ovirt.engine.core.bll.CommandCompensator] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command [id=c63199f8-a720-4053-8e5c-92c8d21e0ce2]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot:{id='StoragePoolIsoMapId:{storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', storageId='a5a83df1-47e2-4927-9add-079199ca7ef8'}', status='Unknown'}.
2021-07-29 10:30:37,882-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: USER_DEACTIVATE_STORAGE_DOMAIN_FAILED(969), Failed to deactivate Storage Domain vm-storage-ssd (Data Center EDC2).
2021-07-29 10:30:37,884-07 WARN [org.ovirt.engine.core.bll.storage.pool.ReconstructMasterDomainCommand] (EE-ManagedThreadFactory-engine-Thread-25) [60d33d] Validation of action 'ReconstructMasterDomain' failed for user SYSTEM. Reasons: VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Locked
2021-07-29 10:30:37,888-07 INFO [org.ovirt.engine.core.bll.eventqueue.EventQueueMonitor] (EE-ManagedThreadFactory-engine-Thread-48) [35c5b47] Finished reconstruct for pool 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'. Clearing event queue
2021-07-29 10:30:37,899-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand' return value ' TaskStatusListReturn:{status='Status [code=654, message=Not SPM]'}
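The log entries above repeat one underlying failure, code 324 "Wrong Master domain or its version", with the storage domain and pool UUIDs embedded in the message. A small Python sketch for extracting them from engine.log lines follows; the pattern is derived only from the wording quoted here, `wrong_master_ids` is a made-up name, and stray spaces inside a UUID (as mail clients sometimes introduce when wrapping) are tolerated and removed.

```python
import re

WRONG_MASTER_RE = re.compile(
    r"Wrong Master domain or its version: u?'"
    r"SD=(?P<sd>[0-9a-f\- ]+?), pool=(?P<pool>[0-9a-f\- ]+?)'"
)


def wrong_master_ids(log_line):
    """Return (storage_domain_uuid, pool_uuid) or None if the line has no match."""
    match = WRONG_MASTER_RE.search(log_line)
    if match is None:
        return None
    # Drop whitespace that line wrapping may have inserted into the UUIDs.
    sd = re.sub(r"\s+", "", match["sd"])
    pool = re.sub(r"\s+", "", match["pool"])
    return sd, pool


line = (
    "IRSNoMasterDomainException: Wrong Master domain or its version: "
    "u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, "
    "pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'"
)
print(wrong_master_ids(line))
```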
Thanks, -Matthew

Thank you everyone for your help on this (now old) issue. We were able to take an outage today to try the workaround below and it was successful! We are now able to retire the old domain, and move forward with the new master domain. Thank you everyone for the help - especially Shani.
Thanks, -Matthew
--
On 8/5/21 1:35 AM, Shani Leviim wrote:
This is the scenario I can think of, hope it will work as well :)
Regarding the bug <https://bugzilla.redhat.com/show_bug.cgi?id=1913764> backport, adding +Nir Soffer <mailto:nsoffer@redhat.com> +Vojtech Juranek <mailto:vjuranek@redhat.com>
*Regards, * *Shani Leviim *
On Wed, Aug 4, 2021 at 11:54 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Yikes.... ok. So the recommended path would be to:
* Put all domains except the current master and one of the iscsi domains into maintenance mode
* Then put the master domain into maintenance mode
* Hopefully that iscsi domain then becomes master
* Then activate vm-storage-ssd2
* Put the iscsi domain into maintenance mode so master would move to vm-storage-ssd2
Right? Or am I missing something?
I assume the bug you mentioned wouldn't be back ported to 4.3 when fixed.... right?
Thanks, -Matthew
On 8/4/21 1:17 PM, Shani Leviim wrote:
There should be an error message saying that the master domain (as recorded in the DB) must come up first. I also think that detaching the current gluster master domain and starting the engine with no master, so that a new one would be elected instead, will leave you in an infinite loop of reconstruction, and getting out of it would involve editing the DB.
Make sure to back up your environment before making any manual DB change, and keep in mind that such a change can damage your environment.
*Regards, * *Shani Leviim *
On Wed, Aug 4, 2021 at 10:27 PM Matthew Benstead <matthewb@uvic.ca> wrote:
Hi Shani,
Yes, it's a gluster domain, and it looks like yes that matches the bug:
2021-08-04 08:47:28,516-0700 ERROR (jsonrpc/5) [storage.StoragePool] migration to new master failed (sp:909)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 898, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileUtils.py", line 69, in tarCopy
    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
TarCopyFailed: (1, 0, '', '')
The iscsi domains aren't as resilient as the gluster domains, so I'd really like to have the vm-storage-ssd2 domain as the master. I'll have to take an outage anyway to put the gluster domains in maintenance mode, so could I just put all domains into maintenance mode and then activate vm-storage-ssd2 first so it's elected master? Or is there some transfer mechanism that must take place?
Thanks, -Matthew
-- Matthew Benstead System Administrator Pacific Climate Impacts Consortium <https://pacificclimate.org/> University of Victoria, UH1 PO Box 1800, STN CSC Victoria, BC, V8W 2Y2 Phone: +1-250-721-8432 Email: matthewb@uvic.ca <mailto:matthewb@uvic.ca>
On 8/4/21 11:26 AM, Shani Leviim wrote:
The master domain should be up first.
I've noticed this in your previous response:
[root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
{
    "info": {
        "name": "No Description",
        "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
        "pool_status": "connected",
        "lver": 9356,
        "spm_id": 6,
        "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8",
        "version": "5",
        "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active",
        *"type": "GLUSTERFS",*
        "master_ver": 288
    },
There's a bug regarding putting GLUSTER domains into maintenance for replacing the master role: https://bugzilla.redhat.com/show_bug.cgi?id=1913764
If you can see this error in vdsm.log, your case matches the bug:
(tasks/2) [storage.StoragePool] Migration to new master 2a0d3c24-3357-4677-b9b2-35486af464a3 failed (sp:903)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 891, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 71, in tarCopy
    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
vdsm.storage.fileUtils.TarCopyFailed: (1, 0, b'', b'')
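As a rough way to check a saved copy of vdsm.log for that signature, here is a minimal Python sketch. It assumes the log has already been read into a string; the function name has_master_migrate_tarcopy_failure is made up for illustration and is not part of vdsm. It only looks for a traceback that mentions both masterMigrate and TarCopyFailed, so a match is a hint rather than proof that the bug applies.

```python
def has_master_migrate_tarcopy_failure(log_text):
    """Return True if any traceback mentions both masterMigrate and TarCopyFailed."""
    marker = "Traceback (most recent call last):"
    # Text before the first marker is not part of any traceback, so skip it.
    chunks = log_text.split(marker)[1:]
    return any("masterMigrate" in c and "TarCopyFailed" in c for c in chunks)


# Sample text copied from the traceback quoted in this thread.
sample = (
    "(tasks/2) [storage.StoragePool] Migration to new master "
    "2a0d3c24-3357-4677-b9b2-35486af464a3 failed (sp:903)\n"
    "Traceback (most recent call last):\n"
    '  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 891, in masterMigrate\n'
    "    exclude=('./lost+found',))\n"
    '  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 71, in tarCopy\n'
    "    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)\n"
    "vdsm.storage.fileUtils.TarCopyFailed: (1, 0, b'', b'')\n"
)

print(has_master_migrate_tarcopy_failure(sample))
```

Each chunk runs until the next traceback starts, so unrelated log lines in between are scanned as well; that is acceptable for a quick check but not for anything stricter.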
There's a chance you can switch the master domain to one of the iscsi domains: try putting vm-storage-ssd2 and all the other GLUSTER domains into maintenance, so that only the iscsi domains remain active. Then try to put vm-storage-ssd into maintenance.
*Regards, * *Shani Leviim *
On Wed, Aug 4, 2021 at 7:00 PM Matthew Benstead <matthewb@uvic.ca <mailto:matthewb@uvic.ca>> wrote:
Thanks Shani - Unfortunately it still fails. I had been hoping to make the change without an outage, but if I shut down the VMs and put all the storage domains into maintenance mode, and then activate the new storage domain (that I want to be the master) first, would that work? Or is there some kind of transfer that needs to take place?
Before:
During:
Errors:
Thanks, -Matthew
On 8/4/21 2:46 AM, Shani Leviim wrote:
Thanks, Matthew. So it seems that both vdsm and the engine are aligned on that data. Does your env still fail with reconstruction errors?
Once it's "stabilized" (so both domains are up and available), can you please try again putting the current master into maintenance and share the results?
*Regards, * *Shani Leviim *
On Tue, Aug 3, 2021 at 8:29 PM Matthew Benstead <matthewb@uvic.ca <mailto:matthewb@uvic.ca>> wrote:
Thanks Shani,
Here's the output from the SPM - it looks like the master version is 288:
[root@compute7 ~]# vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
{
    "info": {
        "name": "No Description",
        "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
        "pool_status": "connected",
        "lver": 9356,
        "spm_id": 6,
        "master_uuid": "a5a83df1-47e2-4927-9add-079199ca7ef8",
        "version": "5",
        "domains": "f73307bc-06c8-4996-86d1-78947cdaf6dd:Attached,d5ae843b-5815-4f3a-b1be-370e56fe0962:Active,a5a83df1-47e2-4927-9add-079199ca7ef8:Active,311c1382-12c2-43a0-96d0-e2084180b114:Active,fc049ebe-03f9-43fc-adca-d6bfeb99c288:Active,3fc76134-2143-4921-ad36-ee84abca40e8:Active,2f2aab43-6ce3-4cb0-9142-b2b57e5083b3:Active",
        "type": "GLUSTERFS",
        "master_ver": 288
    },
    "dominfo": {
        "f73307bc-06c8-4996-86d1-78947cdaf6dd": {
            "status": "Attached",
            "isoprefix": "",
            "alerts": []
        },
        "d5ae843b-5815-4f3a-b1be-370e56fe0962": {
            "status": "Active",
            "diskfree": "94847578931200",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "359981635338240",
            "version": 0
        },
        "a5a83df1-47e2-4927-9add-079199ca7ef8": {
            "status": "Active",
            "diskfree": "708888707072",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "751252275200",
            "version": 5
        },
        "311c1382-12c2-43a0-96d0-e2084180b114": {
            "status": "Active",
            "diskfree": "1598335766528",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "2197949513728",
            "version": 5
        },
        "2f2aab43-6ce3-4cb0-9142-b2b57e5083b3": {
            "status": "Active",
            "diskfree": "1416265465856",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "3837687496704",
            "version": 5
        },
        "3fc76134-2143-4921-ad36-ee84abca40e8": {
            "status": "Active",
            "diskfree": "94847578931200",
            "isoprefix": "/rhev/data-center/mnt/10.0.231.91:_storage_data_projects_ovirt_nobackup_iso-storage/3fc76134-2143-4921-ad36-ee84abca40e8/images/11111111-1111-1111-1111-111111111111",
            "alerts": [],
            "disktotal": "359981635338240",
            "version": 0
        },
        "fc049ebe-03f9-43fc-adca-d6bfeb99c288": {
            "status": "Active",
            "diskfree": "1337882312704",
            "isoprefix": "",
            "alerts": [],
            "disktotal": "3837687496704",
            "version": 5
        }
    }
}
And the master domain version for the vm-storage-ssd domain is 288 in the database as well:
engine=# select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';
id | name | description | storage_pool_type | storage_pool_format_type | status | master_domain_version | spm_vds_id | compatibility_version | _create_date | _update_date | quota_enforcement_type | free_text_comment | is_local
--------------------------------------+------+-------------+-------------------+--------------------------+--------+-----------------------+--------------------------------------+-----------------------+-------------------------------+-------------------------------+------------------------+-------------------+----------
f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | EDC2 | | | 5 | 1 | 288 | 51769733-0cf6-4270-8288-ec96474b7609 | 4.3 | 2015-08-10 20:51:03.831215-07 | 2021-07-29 10:31:02.234262-07 | 0 | | f
(1 row)
Here are the master storage domain details:
engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0';
id | storage | storage_name | storage_description | storage_comment | storage_pool_id | available_disk_size | confirmed_available_disk_size | vdo_savings | used_disk_size | commited_disk_size | actual_images_size | status | storage_pool_name | storage_type | storage_domain_type | storage_domain_format_type | last_time_used_as_master | wipe_after_delete | discard_after_delete | first_metadata_device | vg_metadata_device | backup | block_size | storage_domain_shared_status | recoverable | contains_unregistered_entities | warning_low_space_indicator | critical_space_action_blocker | warning_low_confirmed_space_indicator | external_status | supports_discard | is_hosted_engine_storage
--------------------------------------+--------------------------------------+----------------+---------------------+-----------------+--------------------------------------+---------------------+-------------------------------+-------------+----------------+--------------------+--------------------+--------+-------------------+--------------+---------------------+----------------------------+--------------------------+-------------------+----------------------+-----------------------+--------------------+--------+------------+------------------------------+-------------+--------------------------------+-----------------------------+-------------------------------+---------------------------------------+-----------------+------------------+--------------------------
a5a83df1-47e2-4927-9add-079199ca7ef8 | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0 | vm-storage-ssd | | | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 660 | | | 39 | 0 | 0 | 3 | EDC2 | 7 | 0 | 5 | 1627497705160 | f | f | | | f | 512 | 1 | t | f | 10 | 5 | 0 | 0 | | f
(1 row)
This is the domain we want to make the new master so I can decommission the old one:
engine=# select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_name = 'vm-storage-ssd2';
id | storage | storage_name | storage_description | storage_comment | storage_pool_id | available_disk_size | confirmed_available_disk_size | vdo_savings | used_disk_size | commited_disk_size | actual_images_size | status | storage_pool_name | storage_type | storage_domain_type | storage_domain_format_type | last_time_used_as_master | wipe_after_delete | discard_after_delete | first_metadata_device | vg_metadata_device | backup | block_size | storage_domain_shared_status | recoverable | contains_unregistered_entities | warning_low_space_indicator | critical_space_action_blocker | warning_low_confirmed_space_indicator | external_status | supports_discard | is_hosted_engine_storage
--------------------------------------+--------------------------------------+-----------------+----------------------------+-----------------+--------------------------------------+---------------------+-------------------------------+-------------+----------------+--------------------+--------------------+--------+-------------------+--------------+---------------------+----------------------------+--------------------------+-------------------+----------------------+-----------------------+--------------------+--------+------------+------------------------------+-------------+--------------------------------+-----------------------------+-------------------------------+---------------------------------------+-----------------+------------------+--------------------------
311c1382-12c2-43a0-96d0-e2084180b114 | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4 | vm-storage-ssd2 | Storage01,02,03 vm-storage | | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 1488 | | | 559 | 1147 | 538 | 3 | EDC2 | 7 | 1 | 5 | 1627497694904 | f | f | | | f | 512 | 1 | t | f | 10 | 5 | 10 | 0 | | f
(1 row)
Thanks, -Matthew
On 8/1/21 2:00 AM, Shani Leviim wrote:
Hi Matthew,
You might need to sync the master version and master domain between the engine and vdsm. To check those parameters on the vdsm side, run this command on the SPM host:
vdsm-client StoragePool getInfo storagepoolID="f72ec125-69a1-4c1b-a5e1-313fcb70b6ff"
The result should be something like:
"info": {
    "domains": "1234:Active,5678:Active,91011:Active",
    "isoprefix": "",
    "lver": 6,
    *"master_uuid": "123",*
    *"master_ver": 14,*
    "name": "No Description",
    "pool_status": "connected",
    "spm_id": 1,
    "type": "NFS",
    "version": "5"
}
Then, compare the master version value with the engine:
engine=> select * from storage_pool where id = 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff';
And the master domain:
engine=> select * from storage_domains where storage_pool_id='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' and storage_domain_type='0';
(0 means master; for reference, see https://github.com/oVirt/ovirt-engine/blob/a65cf0eae8858ab2278c3f537dc427e3ff20eba7/backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/businessentities/StorageDomainType.java)
Then we can get the bigger picture (and update the engine data to match the vdsm)
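The comparison described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: that the vdsm-client output is a complete JSON object with a top-level "info" key, and that the two database values (the id of the storage_domains row with storage_domain_type 0, and master_domain_version from storage_pool) have been read by hand from the queries. The function name master_mismatches is hypothetical.

```python
import json


def master_mismatches(getinfo_json, db_master_uuid, db_master_version):
    """Compare the vdsm view of the master domain with values read from the engine DB.

    Returns a list of mismatch descriptions; an empty list means both sides agree.
    """
    info = json.loads(getinfo_json)["info"]
    problems = []
    if info["master_uuid"] != db_master_uuid:
        problems.append("master_uuid: vdsm=%s db=%s" % (info["master_uuid"], db_master_uuid))
    if int(info["master_ver"]) != int(db_master_version):
        problems.append("master_ver: vdsm=%s db=%s" % (info["master_ver"], db_master_version))
    return problems


# Placeholder values in the style of the sample output above, not real data.
vdsm_output = '{"info": {"master_uuid": "123", "master_ver": 14}}'

print(master_mismatches(vdsm_output, "123", 14))  # both sides agree
print(master_mismatches(vdsm_output, "123", 15))  # version differs
```

It only reports differences; deciding which side is right, and changing anything in the database, is a separate and riskier step.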
*Regards, * *Shani Leviim *
On Thu, Jul 29, 2021 at 8:40 PM Matthew Benstead <matthewb@uvic.ca <mailto:matthewb@uvic.ca>> wrote:
Thanks Shani - yes we plan to upgrade to 4.4 in the future, but we're on 4.3 right now due to only running CentOS 7 at the moment.
I was able to clear the job from the SPM:
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
{
    "5fa9edf0-56c3-40e4-9327-47bf7764d28d": {
        "message": "1 jobs completed successfully",
        "code": 0,
        "taskID": "5fa9edf0-56c3-40e4-9327-47bf7764d28d",
        "taskResult": "success",
        "taskState": "finished"
    }
}
[root@daccs01 ~]# vdsm-client Task clear taskID=5fa9edf0-56c3-40e4-9327-47bf7764d28d
true
[root@daccs01 ~]# vdsm-client Host getAllTasksStatuses
{}
And confirm there were no async_tasks:
engine=# select * from async_tasks;
task_id | action_type | status | result | step_id | command_id | started_at | storage_pool_id | task_type | vdsm_task_id | root_command_id | user_id
---------+-------------+--------+--------+---------+------------+------------+-----------------+-----------+--------------+-----------------+---------
(0 rows)
However, when putting the vm-storage-ssd domain into maintenance mode, it failed again:
Here are some of the log entries - anything else I can look at?
2021-07-29 10:30:37,848-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM compute7.pcic.uvic.ca command ConnectStoragePoolVDS failed: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand' return value 'StatusOnlyReturn [status=Status [code=324, message=Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff']]'
...
2021-07-29 10:30:37,848-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] HostName = compute7.pcic.uvic.ca
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command 'ConnectStoragePoolVDSCommand(HostName = compute7.pcic.uvic.ca, ConnectStoragePoolVDSCommandParameters:{hostId='51769733-0cf6-4270-8288-ec96474b7609', vdsId='51769733-0cf6-4270-8288-ec96474b7609', storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', masterVersion='288'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
...
2021-07-29 10:30:37,849-07 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
2021-07-29 10:30:37,855-07 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] FINISH, DeactivateStorageDomainVDSCommand, return: , log id: 1c215ca4
2021-07-29 10:30:37,855-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] DeactivateStorageDomainVDS failed 'a5a83df1-47e2-4927-9add-079199ca7ef8': org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff' (Failed with error StoragePoolWrongMaster and code 324)
    at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]
    at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2112) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand.dectivateStorageDomain(DeactivateStorageDomainCommand.java:340) [bll.jar:]
    ...
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_292]
    at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
Caused by: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=a5a83df1-47e2-4927-9add-079199ca7ef8, pool=f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'
    at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:50) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedConnectProxyReturnValue(ConnectStoragePoolVDSCommand.java:48) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand.proceedProxyReturnValue(ConnectStoragePoolVDSCommand.java:36) [vdsbroker.jar:]
    ...
    at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:79) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final]
    at org.jboss.weld.bean.proxy.CombinedInterceptorAndDecoratorStackMethodHandler.invoke(CombinedInterceptorAndDecoratorStackMethodHandler.java:68) [weld-core-impl-3.1.1.Final.jar:3.1.1.Final]
    at org.ovirt.engine.core.vdsbroker.ResourceManager$Proxy$_$$_WeldSubclass.runVdsCommand(Unknown Source) [vdsbroker.jar:]
    ... 94 more
2021-07-29 10:30:37,861-07 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Failed to deactivate storage domain 'a5a83df1-47e2-4927-9add-079199ca7ef8'
2021-07-29 10:30:37,868-07 INFO [org.ovirt.engine.core.bll.CommandCompensator] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] Command [id=c63199f8-a720-4053-8e5c-92c8d21e0ce2]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot:{id='StoragePoolIsoMapId:{storagePoolId='f72ec125-69a1-4c1b-a5e1-313fcb70b6ff', storageId='a5a83df1-47e2-4927-9add-079199ca7ef8'}', status='Unknown'}.
2021-07-29 10:30:37,882-07 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [35c5b47] EVENT_ID: USER_DEACTIVATE_STORAGE_DOMAIN_FAILED(969), Failed to deactivate Storage Domain vm-storage-ssd (Data Center EDC2).
2021-07-29 10:30:37,884-07 WARN [org.ovirt.engine.core.bll.storage.pool.ReconstructMasterDomainCommand] (EE-ManagedThreadFactory-engine-Thread-25) [60d33d] Validation of action 'ReconstructMasterDomain' failed for user SYSTEM. Reasons: VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status Locked
2021-07-29 10:30:37,888-07 INFO [org.ovirt.engine.core.bll.eventqueue.EventQueueMonitor] (EE-ManagedThreadFactory-engine-Thread-48) [35c5b47] Finished reconstruct for pool 'f72ec125-69a1-4c1b-a5e1-313fcb70b6ff'. Clearing event queue
2021-07-29 10:30:37,899-07 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand' return value ' TaskStatusListReturn:{status='Status [code=654, message=Not SPM]'}
Thanks, -Matthew
On 7/29/21 2:52 AM, Shani Leviim wrote:
Hi Matthew,
Actually, your description is related to 2 features available in oVirt 4.4.5 <https://www.ovirt.org/release/4.4.5/>:
1. The ability to switch the master storage domain while domains are up and running [1]
2. Clearing the finished tasks from REST API [2] and UI [3].
We recommend you upgrade your engine to enjoy those features.
In the meantime, as you've described, the Master role can be moved from one storage domain to another by putting the current master domain into maintenance. To clear the finished tasks from the SPM, first list them:
vdsm-client Host getAllTasksStatuses
The output should look something like this:
{
    "1dc4d885-577a-4b6a-b01f-e682602a907c": {
        "code": 0,
        "message": "1 jobs completed successfully",
        "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c",
        "taskResult": "success",
        "taskState": "finished"
    }
}
Then clear those tasks:
vdsm-client Task clear taskID=12345
Once they are cleared, the reconstruction can finish.
To verify there are no more finished async tasks, you can run this SQL query on the engine:
engine=# select * from async_tasks WHERE storage_pool_id = '123';
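If there are several finished tasks, the matching clear commands can be generated from the getAllTasksStatuses output. This is a minimal Python sketch, assuming the JSON has the shape shown above (task IDs as keys, each with a "taskState" field); the function name clear_commands is made up for illustration. It only prints the commands so they can be reviewed before being run on the SPM host.

```python
import json


def clear_commands(tasks_json):
    """Build one 'vdsm-client Task clear' command per task whose state is 'finished'."""
    tasks = json.loads(tasks_json)
    return [
        "vdsm-client Task clear taskID=%s" % task_id
        for task_id, status in sorted(tasks.items())
        if status.get("taskState") == "finished"
    ]


# Sample copied from the example output in this message.
sample = """
{
    "1dc4d885-577a-4b6a-b01f-e682602a907c": {
        "code": 0,
        "message": "1 jobs completed successfully",
        "taskID": "1dc4d885-577a-4b6a-b01f-e682602a907c",
        "taskResult": "success",
        "taskState": "finished"
    }
}
"""

for command in clear_commands(sample):
    print(command)
```

Tasks in any other state are skipped on purpose, since only finished tasks are discussed here.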
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1910022
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1627997
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1910302
*Regards, * *Shani Leviim *
On Thu, Jul 29, 2021 at 8:33 AM Matthew Benstead <matthewb@uvic.ca <mailto:matthewb@uvic.ca>> wrote:
Hello,
I'm trying to decommission the old master storage domain in ovirt, and replace it with a new one. All of the VMs have been migrated off of the old master, and everything has been running on the new storage domain for a couple months. But when I try to put the old domain into maintenance mode I get an error.
Old Master: vm-storage-ssd
New Domain: vm-storage-ssd2
The error is:
Failed to Reconstruct Master Domain for Data Center EDC2
As well as:
Sync Error on Master Domain between Host daccs01 and oVirt Engine. Domain: vm-storage-ssd is marked as Master in oVirt Engine database but not on the Storage side. Please consult with Support on how to fix this issue.
2021-07-28 11:41:34,870-07 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-23) [] Master domain version is not in sync between DB and VDSM. Domain vm-storage-ssd marked as master, but the version in DB: 283 and in VDSM: 280
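To pull the two version numbers out of warnings like the one above when scanning engine.log, a small regular expression is enough. This is a sketch under the assumption that the message wording is exactly as quoted here; the names MISMATCH and parse_version_mismatch are hypothetical.

```python
import re

# Matches the wording of the warning quoted above; other releases may phrase it differently.
MISMATCH = re.compile(
    r"Domain (\S+) marked as master, but the version in DB: (\d+) and in VDSM: (\d+)"
)


def parse_version_mismatch(line):
    """Return (domain_name, db_version, vdsm_version), or None if the line does not match."""
    match = MISMATCH.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))


line = (
    "2021-07-28 11:41:34,870-07 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] "
    "(EE-ManagedThreadFactory-engine-Thread-23) [] Master domain version is not in sync "
    "between DB and VDSM. Domain vm-storage-ssd marked as master, but the version in DB: 283 "
    "and in VDSM: 280"
)

print(parse_version_mismatch(line))
```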
And:
Not stopping SPM on vds daccs01, pool id f72ec125-69a1-4c1b-a5e1-313fcb70b6ff as there are uncleared tasks Task '5fa9edf0-56c3-40e4-9327-47bf7764d28d', status 'finished'
After a couple minutes all the domains are marked as active again and things continue, but vm-storage-ssd is still listed as the master domain. Any thoughts?
This is on 4.3.10.4-1.el7 on CentOS 7.
engine=# SELECT storage_name, storage_pool_id, storage, status FROM storage_pool_with_storage_domain ORDER BY storage_name;
     storage_name      |           storage_pool_id            |                storage                 | status
-----------------------+--------------------------------------+----------------------------------------+--------
 compute1-iscsi-ssd    | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | yvUESE-yWUv-VIWL-qX90-aAq7-gK0I-EqppRL |      1
 compute7-iscsi-ssd    | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 8ekHdv-u0RJ-B0FO-LUUK-wDWs-iaxb-sh3W3J |      1
 export-domain-storage | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | d3932528-6844-481a-bfed-542872ace9e5   |      1
 iso-storage           | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | f800b7a6-6a0c-4560-8476-2f294412d87d   |      1
 vm-storage-7200rpm    | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | a0bff472-1348-4302-a5c7-f1177efa45a9   |      1
 vm-storage-ssd        | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 95acd9a4-a6fb-4208-80dd-1c53d6aacad0   |      1
 vm-storage-ssd2       | f72ec125-69a1-4c1b-a5e1-313fcb70b6ff | 829d0600-c3f7-4dae-a749-d7f05c6a6ca4   |      1
(7 rows)
Thanks, -Matthew
--
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OXOXW6B2NWXOUGZV3OKO4OMDXVDJSQLZ/
participants (2): Matthew Benstead, Shani Leviim