[ovirt-devel] Re: unicode sandwich in otopi/engine-setup

24 Sep 2019


      On Wed, Sep 4, 2019 at 5:21 PM Yedidyah Bar David <didi@redhat.com> wrote:
...
On Wed, Sep 4, 2019 at 11:50 AM Amit Bawer <abawer@redhat.com> wrote:
...
On Wed, Sep 4, 2019 at 10:25 AM Yedidyah Bar David <didi@redhat.com> wrote:
...
On Thu, Aug 29, 2019 at 10:00 PM Amit Bawer <abawer@redhat.com> wrote:
...
On Thu, Aug 29, 2019 at 11:41 AM Yedidyah Bar David <didi@redhat.com> wrote:
...
Hi all,
This is in a sense a continuation of the thread "Why filetransaction
needs to encode the content to utf-8?", but I decided that a new
thread is better.
I started to systematically convert the code to use a unicode
sandwich. I admit it was harder than I expected, and made me think
somewhat differently about the move to python3, and about how
reasonable (or not) it is to develop in the common subset of python2
and python3 vs ditching python2 and moving fully to python3. It seems
like at least parts of our (integration team) code will still have to
run in python2 also in oVirt 4.4, so I guess we'll not have much
choice :-)
Current patches are only for otopi and engine-setup, and are by no
means thorough - I didn't check each and every open() call and similar
ones. But it's enough for getting engine-setup finish successfully on
both python2 and python3 (EL7 and Fedora 29), with some utf-8 inserted
in relevant places of the input (for the plugins already handled).
I didn't bother trying non-utf-8 encodings. Perhaps I should, but it's
not completely clear to me what's the best approach [2].
A universal solution when dealing with sys.argv which could contain file paths/names in various languages,
would be selecting sys.getfilesystemencoding() for the encoding scheme instead of a hard coded 'utf-8' [3].
We've done something similar in sanlock python-c API for converting file-system paths into bytes, although it's in C,
the principle of using the file-system default encoding applies there as well [4].
Thanks for the hint. Looked at this and thought a bit, and I tend to
ignore/postpone until a need arises. We already have "utf-8" hard-coded
in otopi 27 times, not sure it makes sense now to go after each and every
one of them and analyze the more-general function (or expression, or even
more complex) to replace it with. I guess this is only relevant for Windows,
and I do not think anyone is going to try to port otopi to Windows soon.
Searching for relevant keywords in google finds mostly results from around
2009-2012, which I guess was the time around which most systems converted
their non-utf-8 file collections to utf-8. A somewhat newer example (2016):
http://beets.io/blog/paths.html
So I am going to ignore this. If you think that's a bad choice, please
open a bug, and I'll handle it later. Thanks!
Default Linux locale encoding is UTF-8, so I don't think its a bad choice.
Thanks, Amit.
Now rebased and triggered an OST run:
https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests...
If it passes, I'll merge both stacks soon. I'll then send a different email,
to notify developers to update otopi on their machines.
Was a bit hesitant to merge without verifying on fedora, but fc30 is
still not fully building/built, so gave up and verified only on fc30
in dev env.

Rebased patches, going to merge them soon if they pass CI.
...
Best regards,
...
...
For now, my top priority is to get otopi+engine-setup+host-deploy work
well enough for:
1. Developers that use fedora for everything, or mix fedora and RHEL7/8
(e.g. engine on one, host on another).
2. RHV 4.4, with hosts being RHEL8.
Best regards,
...
[3] https://stackoverflow.com/a/5113874
[4] https://pagure.io/sanlock/blob/master/f/python/sanlock.c#_76
...
Currently, you must have both otopi and engine updated to get things
working. If there is demand, I might spend some time
splitting/rebasing/etc to make it possible to update just one of them
and only later the other, but not sure it's worth it.
I don't mind splitting/squashing if it makes reviews simpler, but I
think the patches are ok as-is. These are the bottom patches of each
stack:
otopi: https://gerrit.ovirt.org/102085
engine-setup: https://gerrit.ovirt.org/102934
[1] http://python-future.org/unicode_literals.html
[2] https://stackoverflow.com/questions/4012571/python-which-encoding-is-used-fo...
Thanks and best regards,
--
Didi
--
Didi
--
Didi
-- 
Didi