On Thu, Aug 29, 2019 at 11:41 AM Yedidyah Bar David <didi@redhat.com> wrote:

Hi all,

This is in a sense a continuation of the thread "Why filetransaction
needs to encode the content to utf-8?", but I decided that a new
thread is better.

I started to systematically convert the code to use a unicode
sandwich. I admit it was harder than I expected, and it made me think
somewhat differently about the move to python3, and about how
reasonable (or not) it is to develop in the common subset of python2
and python3 vs ditching python2 and moving fully to python3. It seems
that at least parts of our (integration team) code will still have to
run on python2 in oVirt 4.4 as well, so I guess we won't have much
choice :-)
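
(For reference: a "unicode sandwich" means decoding bytes to text as soon as data enters the program, working only with text internally, and encoding back to bytes only when data leaves.) A minimal sketch of the idea in the python2/python3 common subset is below. The helper names to_text, to_bytes and process, the upper() call standing in for real logic, and the fixed 'utf-8' encoding are all illustrative assumptions, not the actual otopi or engine-setup code.

```python
ENCODING = 'utf-8'  # assumption: all external I/O in this sketch is UTF-8


def to_text(data, encoding=ENCODING):
    """Decode bytes at the input boundary; pass text through unchanged."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


def to_bytes(text, encoding=ENCODING):
    """Encode text at the output boundary; pass bytes through unchanged."""
    if isinstance(text, bytes):
        return text
    return text.encode(encoding)


def process(raw):
    # Top slice: bytes -> text as soon as the data enters.
    text = to_text(raw)
    # Filling: internal logic works on text only (upper() is a placeholder).
    result = text.upper()
    # Bottom slice: text -> bytes only when the data leaves.
    return to_bytes(result)
```

On python2 `bytes` is an alias of `str`, so the isinstance() checks behave the same in both versions.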

The current patches cover only otopi and engine-setup, and are by no
means thorough - I didn't check every open() call and similar ones.
But they are enough to get engine-setup to finish successfully on
both python2 and python3 (EL7 and Fedora 29), with some non-ASCII utf-8
text inserted in relevant places of the input (for the plugins already handled).
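
Regarding the open() calls: one common approach in the common subset is io.open() with an explicit encoding, which on python2 behaves like python3's built-in open() (on python3, io.open is the built-in open). A minimal sketch; write_text and read_text are hypothetical helper names, and 'utf-8' as the default is an assumption:

```python
import io
import os
import tempfile


def write_text(path, text, encoding='utf-8'):
    # Text mode with an explicit encoding: accepts unicode on both python2
    # and python3, and does not depend on the locale's default encoding.
    with io.open(path, 'w', encoding=encoding) as f:
        f.write(text)


def read_text(path, encoding='utf-8'):
    # Returns unicode on both python2 and python3.
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'example.conf')
    write_text(path, u'name=Ren\u00e9e\n')
    print(read_text(path) == u'name=Ren\u00e9e\n')
```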

I didn't try non-utf-8 encodings. Perhaps I should, but it's
not completely clear to me what the best approach is [2].

A more general solution when dealing with sys.argv, which could contain file paths/names in various languages,
would be to use sys.getfilesystemencoding() as the encoding instead of a hard-coded 'utf-8' [3].
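
A minimal sketch of that suggestion, written to run on both python2 and python3. text_argv is a hypothetical helper name. The fallback to 'utf-8' when sys.getfilesystemencoding() returns None (possible on python2 if nl_langinfo(CODESET) fails) is an assumption, and on python2 an argument that is not valid in that encoding will still raise UnicodeDecodeError:

```python
import sys


def text_argv(argv=None):
    """Return command-line arguments as text on both python2 and python3.

    python3 already decodes sys.argv (on POSIX, with the file-system
    encoding and the 'surrogateescape' error handler), so text items are
    returned as-is. python2 hands out raw bytes, which are decoded here
    with sys.getfilesystemencoding() instead of a hard-coded 'utf-8'.
    """
    if argv is None:
        argv = sys.argv
    # Assumption: fall back to utf-8 if no file-system encoding is reported.
    encoding = sys.getfilesystemencoding() or 'utf-8'
    return [
        arg.decode(encoding) if isinstance(arg, bytes) else arg
        for arg in argv
    ]
```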

We've done something similar in the sanlock Python C API when converting file-system paths into bytes; although that code is in C,
the principle of using the file-system default encoding applies there as well [4].
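
The same principle in python, for the opposite direction (a text path to bytes), might look like the sketch below. It is not the sanlock code; path_to_bytes is a hypothetical name, and the 'utf-8' fallback on python2 is an assumption. On python3, os.fsencode() uses the file-system encoding together with the file-system error handler (surrogateescape on POSIX), so it round-trips with how python3 decodes sys.argv.

```python
import os
import sys


def path_to_bytes(path):
    """Convert a file-system path to bytes using the file-system encoding."""
    if isinstance(path, bytes):
        return path
    if hasattr(os, 'fsencode'):
        # python3: file-system encoding plus the file-system error handler.
        return os.fsencode(path)
    # python2: os.fsencode does not exist, so encode explicitly.
    # Assumption: fall back to utf-8 if no file-system encoding is reported.
    return path.encode(sys.getfilesystemencoding() or 'utf-8')
```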

[3] https://stackoverflow.com/a/5113874

[4] https://pagure.io/sanlock/blob/master/f/python/sanlock.c#_76

Currently, you must have both otopi and engine updated to get things
working. If there is demand, I might spend some time
splitting/rebasing/etc to make it possible to update just one of them
and only later the other, but not sure it's worth it.

I don't mind splitting/squashing if it makes reviews simpler, but I
think the patches are ok as-is. These are the bottom patches of each
stack:

otopi: https://gerrit.ovirt.org/102085

engine-setup: https://gerrit.ovirt.org/102934

[1] http://python-future.org/unicode_literals.html

[2] https://stackoverflow.com/questions/4012571/python-which-encoding-is-used-for-processing-sys-argv

Thanks and best regards,
--
Didi