On Mon, Aug 31, 2020 at 4:48 AM Germano Veit Michel <germano(a)redhat.com> wrote:
On Sun, Aug 30, 2020 at 8:46 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>
>
>
> On Fri, Aug 28, 2020, 08:36 Germano Veit Michel <germano(a)redhat.com> wrote:
>>
>>
>>
>> On Fri, Aug 28, 2020 at 9:29 AM Nir Soffer <nsoffer(a)redhat.com> wrote:
>>>
>>>
>>>
>>> On Thu, Aug 27, 2020, 16:38 Tal Nisan <tnisan(a)redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On Fri, Aug 21, 2020 at 4:34 AM Germano Veit Michel
<germano(a)redhat.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Is there a reliable way to figure out if a snapshot is in preview
only using information obtained from the storage domain metadata?
>>>>> I'm trying to find a way to distinguish a problematic snapshot
chain (double parent) from a snapshot in preview in order to improve dump-volume chains.
>>>>>
>>>>> Currently dump-volume-chains throws an error (DuplicateParentError)
if a snapshot is in preview for the image, as there is a 'Y' shape split in the
chain
>>>>> with 2 volumes (previous chain + preview) pointing to a common
parent:
>>>>>
>>>>> image: dff0a0c0-b731-4e5b-9f32-d97310ca40de
>>>>>
>>>>> Error: more than one volume pointing to the same parent
volume e.g: (_BLANK_UUID<-a), (a<-b), (a<-c)
>>>>>
>>>>> Unordered volumes and children:
>>>>>
>>>>> - e6c7bec0-53c6-4729-a4a0-a9b3ef2b8c38 <-
5eb2b29d-82d6-4337-8511-3c86705d566e
>>>>> status: OK, voltype: LEAF, format: COW, legality:
LEGAL, type: SPARSE, capacity: 1073741824, truesize: 1073741824
>>>>>
>>>>> - e0475853-4514-4464-99e7-b185cce9b67d <-
deceff83-9d88-4f87-8304-d5bf74d119b1
>>>>> status: OK, voltype: LEAF, format: COW, legality:
LEGAL, type: SPARSE, capacity: 1073741824, truesize: 1073741824
>>>>>
>>>>> - e6c7bec0-53c6-4729-a4a0-a9b3ef2b8c38 <-
e0475853-4514-4464-99e7-b185cce9b67d
>>>>> status: OK, voltype: INTERNAL, format: COW, legality:
LEGAL, type: SPARSE, capacity: 1073741824, truesize: 1073741824
>>>>>
>>>>> - 00000000-0000-0000-0000-000000000000 <-
e6c7bec0-53c6-4729-a4a0-a9b3ef2b8c38
>>>>> status: OK, voltype: INTERNAL, format: RAW, legality:
LEGAL, type: PREALLOCATED, capacity: 1073741824, truesize: 1073741824
>>>>>
>>>>> From the engine side it's easy, but I'd need to solve this
problem using only metadata from the storage.
>>>>>
>>>>> The only thing I could think of is that one of the volumes pointing
to the common parent has voltype LEAF. Any better ideas?
>>>>
>>>> don't think that there is any, Engine is the orchestrator and due to
that the info is only in the database
>>>
>>>
>>> There is no good way, but you can look at the length of the chain, and the
"ctime" value.
>>>
>>> For example if this was the original chain:
>>>
>>> a <- b <- c
>>>
>>> if we preview a:
>>>
>>> a <- b <- c
>>> a <- d
>>>
>>> You know that d is a preview volume.
>>>
>>> If we preview b, we will have two chains of same length:
>>>
>>> a <- b <- c
>>> a <- b <- d
>>>
>>> But the ctime value of d will be larger, since preview is created after
>>> the leaf was created.
>>>
>>> ctime is using time.time() so it is not affected by time zone changes
>>> but it may be wrong due to host time changes, so it is not reliable.
>>>
>>> Can you open a bug for this?
>>
>>
https://bugzilla.redhat.com/show_bug.cgi?id=1873382
>>
>> I have a prototype working with some code I pasted in the bugzilla, but I
don't think it's reliable and an overcomplication of what should be simple.
>
>
> I don't think the code in the bug is the way to handle this.
>
> It will be simpler and more useful to:
> 1. Find leaves
> 2. Follow the chain from each leaf, until the base (volume with no parent).
> 3. Display a tree instead of list, like lsblk.
>
> For example:
>
> dbf1e90c-41d5-4c2d-a8d2-15f2f04d3561
> ├─ea6af566-922c-4ca2-af17-67f7cd08826c
> └─aa5643ef-8c74-4b28-91e0-8d45d6ee426b
> └─30c4f6d1-7f1d-470b-96ae-7594cf367dfa
I like the idea of this visual representation, but it does not fix the problem.
The problem is dump-volume-chains throwing incorrect errors in case there is a snapshot
in preview.
Error: more than one volume pointing to the same parent volume e.g: (_BLANK_UUID<-a),
(a<-b), (a<-c)
This error is wrong, you should remove it, and instead show the tree.
There is still a double parent on the representation above. So if the
analysis is done (text output), there will
be an error detected no matter how we print it. If there is no way to distinguish a
preview from a double parent
problem without leaving any doubt based only on storage metadata only then we can improve
the
representation but ultimately the problem remains there.
Ideally I'd like to keep DoubleParentError logic and detect Previews to eliminate the
false errors.
This is not possible now.
The analysis should be done in the image discrepancy tool on the
engine, which has dump-volume-chains
output (json - no analysis) and the engine db. And we are already doing some basic checks
there. Maybe
we should even move the entire analysis logic there and make dump-volume-chains just
print and dump
data without doing analysis if the analysis cannot be done based on partial data.
The main idea here was to simply stop false errors for those who look for them in
dump-volume-chains
text output.
>
>
> Users of the tool will have to check engine db to understand how to fix the disk.
>
> Even if it was easy to detect a volume in preview, how do you know which chain
> should be kept? Did it fail just after the user asked to commit the preview?
This tool is not used to diagnose and correct issues on its own. It is used for 2 things,
but mainly the first:
a) Nice readable way to see volumes and their metadata, plus chain
b) Any obvious errors
The duplicate parent is printing false problems during preview, breaking the tool for B.
The main use is still A, use of dump-volume-chains is to stop collecting /dev/VG/metadata
LV or *.meta files
and have this info for the volumes in the sosreport.
I'm not aware of anyone using just the output of the tool to perform chain changes,
every failure
also requires checking the DB too and most importantly the logs (unless rotated).
>
> Storage format does not have a way to store info about the state of the disk, or make
atomic
> changes like remove one chain when committing after a preview. This is also the
reason we
> have trouble with removing snapshots.
Which means we cannot know for sure what is happening in the chain, right?
With this in mind, any suggestion to stop the false errors?
Change the code to handle a tree instead of a list of volumes, error is gone.
Since we cannot be sure of this based just on SD metadata, maybe the simplest is to
remove the
duplicate parent error string and/or add some warning that it could be a snapshot in
preview and just
print the unordered volumes.
The improved visual representation could be handled separate from this. I've thought
of something
similar in the past but found hard to print the volume metadata in a nice way (and we
need to handle
big chains of several dozen snapshots).
Thanks,
Germano
>
>
> Nir
>
>>
>>