Hi Benny,
I used the tool to track one of the illegal volumes:
image: e05874d2-fb8a-4fd2-94ff-2f4bc6438d47
[...]
- 887f486b-15cf-4083-9b35-8b7821a7841a
status: ILLEGAL, voltype: LEAF, format: COW, legality: ILLEGAL, type: SPARSE
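
For anyone who wants to automate this step, here is a minimal sketch of how the illegal volumes could be pulled out of saved "vdsm-tool dump-volume-chains" output. It assumes the output looks like the excerpt above ("image:" lines, "- <volume UUID>" lines, and a "legality:" field); the function name and the SAMPLE text are illustrative only, not part of vdsm.

```python
import re

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# One pattern with three alternatives: an image header, a volume entry,
# or a legality field. \s* also spans line breaks, so a "legality:" value
# wrapped onto the next line (as in a mail client) is still picked up.
TOKEN_RE = re.compile(
    r"image:\s*(?P<image>" + UUID + r")"
    r"|-\s+(?P<volume>" + UUID + r")"
    r"|legality:\s*(?P<legality>\w+)"
)


def find_illegal_volumes(dump_text):
    """Return (image_id, volume_id) pairs whose legality is ILLEGAL."""
    illegal = []
    image_id = None
    volume_id = None
    for m in TOKEN_RE.finditer(dump_text):
        if m.group("image"):
            # New image chain: forget the previous volume.
            image_id, volume_id = m.group("image"), None
        elif m.group("volume"):
            volume_id = m.group("volume")
        elif m.group("legality") == "ILLEGAL" and volume_id:
            illegal.append((image_id, volume_id))
    return illegal


# Assumed sample, copied from the excerpt in this message.
SAMPLE = """image:    e05874d2-fb8a-4fd2-94ff-2f4bc6438d47
             - 887f486b-15cf-4083-9b35-8b7821a7841a
               status: ILLEGAL, voltype: LEAF, format: COW, legality:
               ILLEGAL, type: SPARSE
"""

if __name__ == "__main__":
    for image, volume in find_illegal_volumes(SAMPLE):
        print("image", image, "has illegal volume", volume)
```

The script only reads text, so it can be run against a saved copy of the tool output without touching the hosts.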
So I tracked 887f486b-15cf-4083-9b35-8b7821a7841a in the logs and I saw:
2018-06-16 04:46:20,818+01 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand]
(pool-5-thread-3) [cfc392ec-dc9f-418d-8156-d05c8e7ab9f8] START,
GetVolumeInfoVDSCommand(HostName = host.domain.es,
GetVolumeInfoVDSCommandParameters:{expectedEngineErrors='[VolumeDoesNotExist]',
runAsync='true', hostId='b2dfb945-d767-44aa-a547-2d1a4381f8e3',
storagePoolId='75bf8f48-970f-42bc-8596-f8ab6efb2b63',
storageDomainId='110ea376-d789-40a1-b9f6-6b40c31afe01',
imageGroupId='e05874d2-fb8a-4fd2-94ff-2f4bc6438d47',
imageId='887f486b-15cf-4083-9b35-8b7821a7841a'}), log id: 2a795424
2018-06-16 04:46:22,256+01 ERROR
[org.ovirt.engine.core.bll.DestroyImageCheckCommand] (pool-5-thread-3)
[cfc392ec-dc9f-418d-8156-d05c8e7ab9f8] The following images were not
removed: [887f486b-15cf-4083-9b35-8b7821a7841a]
2018-06-16 04:47:44,900+01 ERROR
[org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
(DefaultQuartzScheduler10) [cfc392ec-dc9f-418d-8156-d05c8e7ab9f8]
Snapshot '7b6f43ac-d3ad-47b2-8882-f5dccd74cf07' images
'887f486b-15cf-4083-9b35-8b7821a7841a'..'538600a5-31ab-40af-b326-d56bfc92bb0b'
merged, but volume removal failed. Some or all of the following volumes
may be orphaned: [887f486b-15cf-4083-9b35-8b7821a7841a]. Please retry
Live Merge on the snapshot to complete the operation.
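
The log search above can be sketched as a small script. This is only an illustration under stated assumptions: it assumes every engine log entry starts with a timestamp like "2018-06-16 04:46:20,818+01" followed by the level, and that any other line is a continuation of the previous entry. The function names are hypothetical.

```python
import re

# An entry starts with "YYYY-MM-DD HH:MM:SS,mmm<tz>" followed by the level.
ENTRY_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\S*)\s+(\w+)\s*"
)


def iter_entries(lines):
    """Group physical lines into (timestamp, level, text) log entries."""
    current = None
    for line in lines:
        line = line.rstrip("\n")
        m = ENTRY_START.match(line)
        if m:
            if current is not None:
                yield (current[0], current[1], " ".join(current[2]).strip())
            current = (m.group(1), m.group(2), [line[m.end():]])
        elif current is not None:
            # Continuation line (wrapped message or stack trace frame).
            current[2].append(line.strip())
    if current is not None:
        yield (current[0], current[1], " ".join(current[2]).strip())


def entries_mentioning(lines, needle, levels=("ERROR", "WARN")):
    """Return entries of the given levels whose text contains `needle`."""
    return [
        entry
        for entry in iter_entries(lines)
        if entry[1] in levels and needle in entry[2]
    ]
```

A possible use, assuming the engine log is at its usual location /var/log/ovirt-engine/engine.log: open that file and pass the file object together with the volume ID, e.g. entries_mentioning(f, "887f486b-15cf-4083-9b35-8b7821a7841a"), then read the returned ERROR entries in order.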
Can you provide some additional steps?
Thank you!
On 2018-06-18 18:27, Benny Zlotnik wrote:
> We prevent starting VMs with illegal images[1]
>
> You can use "$ vdsm-tool dump-volume-chains"
> to look for illegal images and then look in the engine log for the
> reason they became illegal,
>
> if it's something like this, it usually means you can remove them:
>
> 63696:2018-06-15 09:41:58,134+01 ERROR
> [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand]
> (DefaultQuartzScheduler2) [6fa97ea4-8f61-4a48-8e08-a8bb1b9de826]
> Merging of snapshot 'e609d6cc-2025-4cf0-ad34-03519131cdd1' images
> '1d01c6c8-b61e-42bc-a054-f04c3f792b10'..'ef6f732e-2a7a-4a14-a10f-bcc88bdd805f'
> failed. Images have been marked illegal and can no longer be previewed
> or reverted to. Please retry Live Merge on the snapshot to complete
> the operation.
>
> On Mon, Jun 18, 2018 at 5:46 PM, <nicolas(a)devels.es> wrote:
>
>> Indeed, when the problem started I think the SPM was the host whose
>> VDSM log I attached in the first e-mail. Currently it is the one I
>> sent in the second mail.
>>
>> FWIW, if it helps to debug more fluently, we can provide VPN access
>> to our infrastructure so you can access and see whatever you need
>> (all hosts, DB, etc...).
>>
>> Right now the machines that keep running work, but once shut down
>> they start showing the problem below...
>>
>> Thank you
>>
>> On 2018-06-18 15:20, Benny Zlotnik wrote:
>>
>> I'm having trouble following the errors; I think the SPM changed or
>> the vdsm log from the right host might be missing.
>>
>> However, I believe what started the problems is this transaction
>> timeout:
>>
>> 2018-06-15 14:20:51,378+01 ERROR
>> [org.ovirt.engine.core.bll.tasks.CommandAsyncTask]
>> (org.ovirt.thread.pool-6-thread-29)
>> [1db468cb-85fd-4189-b356-d31781461504] [within thread]: endAction for
>> action type RemoveSnapshotSingleDisk threw an exception.:
>> org.springframework.jdbc.CannotGetJdbcConnectionException: Could not
>> get JDBC Connection; nested exception is java.sql.SQLException:
>> javax.resource.ResourceException: IJ000460: Error checking for a
>> transaction
>>     at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:80) [spring-jdbc.jar:4.2.4.RELEASE]
>>     at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:615) [spring-jdbc.jar:4.2.4.RELEASE]
>>     at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:680) [spring-jdbc.jar:4.2.4.RELEASE]
>>     at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:712) [spring-jdbc.jar:4.2.4.RELEASE]
>>     at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:762) [spring-jdbc.jar:4.2.4.RELEASE]
>>     at org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall.executeCallInternal(PostgresDbEngineDialect.java:152) [dal.jar:]
>>
>> This looks like a bug
>>
>> Regardless, I am not sure restoring a backup would help since you
>> probably have orphaned images on the storage which need to be
>> removed
>>
>> Adding Ala
>>
>> On Mon, Jun 18, 2018 at 4:19 PM, <nicolas(a)devels.es> wrote:
>>
>> Hi Benny,
>>
>> Please find the SPM logs at [1].
>>
>> Thank you
>>
>> [1]: https://wetransfer.com/downloads/62bf649462aabbc2ef21824682b0a08320180618...
>>
>> On 2018-06-18 13:19, Benny Zlotnik wrote:
>> Can you send the SPM logs as well?
>>
>> On Mon, Jun 18, 2018 at 1:13 PM, <nicolas(a)devels.es> wrote:
>>
>> Hi Benny,
>>
>> Please find the logs at [1].
>>
>> Thank you.
>>
>> [1]: https://wetransfer.com/downloads/12208fb4a6a5df3114bbbc10af194c8820180618...
>>
>> On 2018-06-18 09:28, Benny Zlotnik wrote:
>>
>> Can you provide full engine and vdsm logs?
>>
>> On Mon, Jun 18, 2018 at 11:20 AM, <nicolas(a)devels.es> wrote:
>>
>> Hi,
>>
>> We're running oVirt 4.1.9 (we cannot upgrade at this time) and
>> we're having a major problem in our infrastructure. On Friday,
>> snapshots were automatically created on more than 200 VMs and, as
>> this was just a test task, all of them were deleted at the same
>> time, which seems to have corrupted several VMs.
>>
>> When trying to delete a snapshot on some of the VMs, a "General
>> error" is thrown with a NullPointerException in the engine log
>> (attached).
>>
>> But the worst part is that when any of these machines is powered
>> off and then powered on again, the VM is corrupt...
>>
>> VM myvm is down with error. Exit message: Bad volume specification
>> {u'index': 0, u'domainID': u'110ea376-d789-40a1-b9f6-6b40c31afe01',
>> 'reqsize': '0', u'format': u'cow', u'bootOrder': u'1', u'address':
>> {u'function': u'0x0', u'bus': u'0x00', u'domain': u'0x0000',
>> u'type': u'pci', u'slot': u'0x06'}, u'volumeID':
>> u'1fd0f9aa-6505-45d2-a17e-859bd5dd4290', 'apparentsize':
>> '23622320128', u'imageID': u'65519220-68e1-462a-99b3-f0763c78eae2',
>> u'discard': False, u'specParams': {}, u'readonly': u'false',
>> u'iface': u'virtio', u'optional': u'false', u'deviceId':
>> u'65519220-68e1-462a-99b3-f0763c78eae2', 'truesize': '23622320128',
>> u'poolID': u'75bf8f48-970f-42bc-8596-f8ab6efb2b63', u'device':
>> u'disk', u'shared': u'false', u'propagateErrors': u'off', u'type':
>> u'disk'}.
>>
>> We're really frustrated by now and don't know how to proceed... We
>> have a DB backup (with engine-backup) from Thursday which would have
>> a "sane" DB definition without all the snapshots, as they were all
>> created on Friday. Would it be safe to restore this backup?
>>
>> Any help is really appreciated...
>>
>> Thanks.
>> _______________________________________________
>> Users mailing list -- users(a)ovirt.org
>> To unsubscribe send an email to users-leave(a)ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/P5OOGBL3BRZ...
>
> Links:
> ------
> [1] https://wetransfer.com/downloads/62bf649462aabbc2ef21824682b0a08320180618...
> [2] https://wetransfer.com/downloads/12208fb4a6a5df3114bbbc10af194c8820180618...
> [3] https://www.ovirt.org/site/privacy-policy/
> [4] https://www.ovirt.org/community/about/community-guidelines/
> [5] https://lists.ovirt.org/archives/list/users@ovirt.org/message/P5OOGBL3BRZ...