unicode sandwich in otopi/engine-setup

Yedidyah Bar David

29 Aug 2019 29 Aug '19

10:41 a.m.

Hi all, This is in a sense a continuation of the thread "Why filetransaction needs to encode the content to utf-8?", but I decided that a new thread is better. I started to systematically convert the code to use a unicode sandwich. I admit it was harder than I expected, and made me think somewhat differently about the move to python3, and about how reasonable (or not) it is to develop in the common subset of python2 and python3 vs ditching python2 and moving fully to python3. It seems like at least parts of our (integration team) code will still have to run in python2 also in oVirt 4.4, so I guess we'll not have much choice :-) Current patches are only for otopi and engine-setup, and are by no means thorough - I didn't check each and every open() call and similar ones. But it's enough for getting engine-setup finish successfully on both python2 and python3 (EL7 and Fedora 29), with some utf-8 inserted in relevant places of the input (for the plugins already handled). I didn't bother trying non-utf-8 encodings. Perhaps I should, but it's not completely clear to me what's the best approach [2]. Currently, you must have both otopi and engine updated to get things working. If there is demand, I might spend some time splitting/rebasing/etc to make it possible to update just one of them and only later the other, but not sure it's worth it. I don't mind splitting/squashing if it makes reviews simpler, but I think the patches are ok as-is. These are the bottom patches of each stack: otopi: https://gerrit.ovirt.org/102085 engine-setup: https://gerrit.ovirt.org/102934 [1] http://python-future.org/unicode_literals.html [2] https://stackoverflow.com/questions/4012571/python-which-encoding-is-used-fo... Thanks and best regards, -- Didi

Show replies by date

Amit Bawer

29 Aug 29 Aug

9 p.m.

On Thu, Aug 29, 2019 at 11:41 AM Yedidyah Bar David <didi@redhat.com> wrote:

...

Hi all,

This is in a sense a continuation of the thread "Why filetransaction needs to encode the content to utf-8?", but I decided that a new thread is better.

I started to systematically convert the code to use a unicode sandwich. I admit it was harder than I expected, and made me think somewhat differently about the move to python3, and about how reasonable (or not) it is to develop in the common subset of python2 and python3 vs ditching python2 and moving fully to python3. It seems like at least parts of our (integration team) code will still have to run in python2 also in oVirt 4.4, so I guess we'll not have much choice :-)

Current patches are only for otopi and engine-setup, and are by no means thorough - I didn't check each and every open() call and similar ones. But it's enough for getting engine-setup finish successfully on both python2 and python3 (EL7 and Fedora 29), with some utf-8 inserted in relevant places of the input (for the plugins already handled).

I didn't bother trying non-utf-8 encodings. Perhaps I should, but it's not completely clear to me what's the best approach [2].

A universal solution when dealing with sys.argv which could contain file paths/names in various languages, would be selecting sys.getfilesystemencoding() for the encoding scheme instead of a hard coded 'utf-8' [3]. We've done something similar in sanlock python-c API for converting file-system paths into bytes, although it's in C, the principle of using the file-system default encoding applies there as well [4]. [3] https://stackoverflow.com/a/5113874 [4] https://pagure.io/sanlock/blob/master/f/python/sanlock.c#_76

...

Currently, you must have both otopi and engine updated to get things working. If there is demand, I might spend some time splitting/rebasing/etc to make it possible to update just one of them and only later the other, but not sure it's worth it.

I don't mind splitting/squashing if it makes reviews simpler, but I think the patches are ok as-is. These are the bottom patches of each stack:

otopi: https://gerrit.ovirt.org/102085

engine-setup: https://gerrit.ovirt.org/102934

[1] http://python-future.org/unicode_literals.html

[2] https://stackoverflow.com/questions/4012571/python-which-encoding-is-used-fo...

Thanks and best regards, -- Didi

Yedidyah Bar David

4 Sep 4 Sep

9:24 a.m.

On Thu, Aug 29, 2019 at 10:00 PM Amit Bawer <abawer@redhat.com> wrote:

...

On Thu, Aug 29, 2019 at 11:41 AM Yedidyah Bar David <didi@redhat.com> wrote:

...
Hi all,

This is in a sense a continuation of the thread "Why filetransaction needs to encode the content to utf-8?", but I decided that a new thread is better.

I started to systematically convert the code to use a unicode sandwich. I admit it was harder than I expected, and made me think somewhat differently about the move to python3, and about how reasonable (or not) it is to develop in the common subset of python2 and python3 vs ditching python2 and moving fully to python3. It seems like at least parts of our (integration team) code will still have to run in python2 also in oVirt 4.4, so I guess we'll not have much choice :-)

Current patches are only for otopi and engine-setup, and are by no means thorough - I didn't check each and every open() call and similar ones. But it's enough for getting engine-setup finish successfully on both python2 and python3 (EL7 and Fedora 29), with some utf-8 inserted in relevant places of the input (for the plugins already handled).

I didn't bother trying non-utf-8 encodings. Perhaps I should, but it's not completely clear to me what's the best approach [2].

A universal solution when dealing with sys.argv which could contain file paths/names in various languages, would be selecting sys.getfilesystemencoding() for the encoding scheme instead of a hard coded 'utf-8' [3]. We've done something similar in sanlock python-c API for converting file-system paths into bytes, although it's in C, the principle of using the file-system default encoding applies there as well [4].

Thanks for the hint. Looked at this and thought a bit, and I tend to ignore/postpone until a need arises. We already have "utf-8" hard-coded in otopi 27 times, not sure it makes sense now to go after each and every one of them and analyze the more-general function (or expression, or even more complex) to replace it with. I guess this is only relevant for Windows, and I do not think anyone is going to try to port otopi to Windows soon. Searching for relevant keywords in google finds mostly results from around 2009-2012, which I guess was the time around which most systems converted their non-utf-8 file collections to utf-8. A somewhat newer example (2016): http://beets.io/blog/paths.html So I am going to ignore this. If you think that's a bad choice, please open a bug, and I'll handle it later. Thanks! For now, my top priority is to get otopi+engine-setup+host-deploy work well enough for: 1. Developers that use fedora for everything, or mix fedora and RHEL7/8 (e.g. engine on one, host on another). 2. RHV 4.4, with hosts being RHEL8. Best regards,

...

[3] https://stackoverflow.com/a/5113874 [4] https://pagure.io/sanlock/blob/master/f/python/sanlock.c#_76

...
Currently, you must have both otopi and engine updated to get things working. If there is demand, I might spend some time splitting/rebasing/etc to make it possible to update just one of them and only later the other, but not sure it's worth it.

I don't mind splitting/squashing if it makes reviews simpler, but I think the patches are ok as-is. These are the bottom patches of each stack:

otopi: https://gerrit.ovirt.org/102085

engine-setup: https://gerrit.ovirt.org/102934

[1] http://python-future.org/unicode_literals.html

[2] https://stackoverflow.com/questions/4012571/python-which-encoding-is-used-fo...

Thanks and best regards, -- Didi

-- Didi

Amit Bawer

10:49 a.m.

On Wed, Sep 4, 2019 at 10:25 AM Yedidyah Bar David <didi@redhat.com> wrote:

...

On Thu, Aug 29, 2019 at 10:00 PM Amit Bawer <abawer@redhat.com> wrote:

...
On Thu, Aug 29, 2019 at 11:41 AM Yedidyah Bar David <didi@redhat.com>

...
...
Hi all,

This is in a sense a continuation of the thread "Why filetransaction needs to encode the content to utf-8?", but I decided that a new thread is better.

I started to systematically convert the code to use a unicode sandwich. I admit it was harder than I expected, and made me think somewhat differently about the move to python3, and about how reasonable (or not) it is to develop in the common subset of python2 and python3 vs ditching python2 and moving fully to python3. It seems like at least parts of our (integration team) code will still have to run in python2 also in oVirt 4.4, so I guess we'll not have much choice :-)

Current patches are only for otopi and engine-setup, and are by no means thorough - I didn't check each and every open() call and similar ones. But it's enough for getting engine-setup finish successfully on both python2 and python3 (EL7 and Fedora 29), with some utf-8 inserted in relevant places of the input (for the plugins already handled).

I didn't bother trying non-utf-8 encodings. Perhaps I should, but it's not completely clear to me what's the best approach [2].

A universal solution when dealing with sys.argv which could contain file

wrote: paths/names in various languages,

...
would be selecting sys.getfilesystemencoding() for the encoding scheme instead of a hard coded 'utf-8' [3]. We've done something similar in sanlock python-c API for converting file-system paths into bytes, although it's in C, the principle of using the file-system default encoding applies there as well [4].

Thanks for the hint. Looked at this and thought a bit, and I tend to ignore/postpone until a need arises. We already have "utf-8" hard-coded in otopi 27 times, not sure it makes sense now to go after each and every one of them and analyze the more-general function (or expression, or even more complex) to replace it with. I guess this is only relevant for Windows, and I do not think anyone is going to try to port otopi to Windows soon.

Searching for relevant keywords in google finds mostly results from around 2009-2012, which I guess was the time around which most systems converted their non-utf-8 file collections to utf-8. A somewhat newer example (2016):

http://beets.io/blog/paths.html

So I am going to ignore this. If you think that's a bad choice, please open a bug, and I'll handle it later. Thanks!

Default Linux locale encoding is UTF-8, so I don't think its a bad choice.

...

For now, my top priority is to get otopi+engine-setup+host-deploy work well enough for:

1. Developers that use fedora for everything, or mix fedora and RHEL7/8 (e.g. engine on one, host on another).

2. RHV 4.4, with hosts being RHEL8.

Best regards,

...
[3] https://stackoverflow.com/a/5113874 [4] https://pagure.io/sanlock/blob/master/f/python/sanlock.c#_76

...
Currently, you must have both otopi and engine updated to get things working. If there is demand, I might spend some time splitting/rebasing/etc to make it possible to update just one of them and only later the other, but not sure it's worth it.

I don't mind splitting/squashing if it makes reviews simpler, but I think the patches are ok as-is. These are the bottom patches of each stack:

otopi: https://gerrit.ovirt.org/102085

engine-setup: https://gerrit.ovirt.org/102934

[1] http://python-future.org/unicode_literals.html

[2]

https://stackoverflow.com/questions/4012571/python-which-encoding-is-used-fo...

...
...
Thanks and best regards, -- Didi

-- Didi

Yedidyah Bar David

4:21 p.m.

On Wed, Sep 4, 2019 at 11:50 AM Amit Bawer <abawer@redhat.com> wrote:

...

On Wed, Sep 4, 2019 at 10:25 AM Yedidyah Bar David <didi@redhat.com> wrote:

...
On Thu, Aug 29, 2019 at 10:00 PM Amit Bawer <abawer@redhat.com> wrote:

...
On Thu, Aug 29, 2019 at 11:41 AM Yedidyah Bar David <didi@redhat.com> wrote:

...
Hi all,

This is in a sense a continuation of the thread "Why filetransaction needs to encode the content to utf-8?", but I decided that a new thread is better.

I started to systematically convert the code to use a unicode sandwich. I admit it was harder than I expected, and made me think somewhat differently about the move to python3, and about how reasonable (or not) it is to develop in the common subset of python2 and python3 vs ditching python2 and moving fully to python3. It seems like at least parts of our (integration team) code will still have to run in python2 also in oVirt 4.4, so I guess we'll not have much choice :-)

Current patches are only for otopi and engine-setup, and are by no means thorough - I didn't check each and every open() call and similar ones. But it's enough for getting engine-setup finish successfully on both python2 and python3 (EL7 and Fedora 29), with some utf-8 inserted in relevant places of the input (for the plugins already handled).

I didn't bother trying non-utf-8 encodings. Perhaps I should, but it's not completely clear to me what's the best approach [2].

A universal solution when dealing with sys.argv which could contain file paths/names in various languages, would be selecting sys.getfilesystemencoding() for the encoding scheme instead of a hard coded 'utf-8' [3]. We've done something similar in sanlock python-c API for converting file-system paths into bytes, although it's in C, the principle of using the file-system default encoding applies there as well [4].

Thanks for the hint. Looked at this and thought a bit, and I tend to ignore/postpone until a need arises. We already have "utf-8" hard-coded in otopi 27 times, not sure it makes sense now to go after each and every one of them and analyze the more-general function (or expression, or even more complex) to replace it with. I guess this is only relevant for Windows, and I do not think anyone is going to try to port otopi to Windows soon.

Searching for relevant keywords in google finds mostly results from around 2009-2012, which I guess was the time around which most systems converted their non-utf-8 file collections to utf-8. A somewhat newer example (2016):

http://beets.io/blog/paths.html

So I am going to ignore this. If you think that's a bad choice, please open a bug, and I'll handle it later. Thanks!

Default Linux locale encoding is UTF-8, so I don't think its a bad choice.

Thanks, Amit. Now rebased and triggered an OST run: https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests... If it passes, I'll merge both stacks soon. I'll then send a different email, to notify developers to update otopi on their machines. Best regards,

...

...
For now, my top priority is to get otopi+engine-setup+host-deploy work well enough for:

1. Developers that use fedora for everything, or mix fedora and RHEL7/8 (e.g. engine on one, host on another).

2. RHV 4.4, with hosts being RHEL8.

Best regards,

...
[3] https://stackoverflow.com/a/5113874 [4] https://pagure.io/sanlock/blob/master/f/python/sanlock.c#_76

...
Currently, you must have both otopi and engine updated to get things working. If there is demand, I might spend some time splitting/rebasing/etc to make it possible to update just one of them and only later the other, but not sure it's worth it.

I don't mind splitting/squashing if it makes reviews simpler, but I think the patches are ok as-is. These are the bottom patches of each stack:

otopi: https://gerrit.ovirt.org/102085

engine-setup: https://gerrit.ovirt.org/102934

[1] http://python-future.org/unicode_literals.html

[2] https://stackoverflow.com/questions/4012571/python-which-encoding-is-used-fo...

Thanks and best regards, -- Didi

-- Didi

-- Didi

Yedidyah Bar David

24 Sep 24 Sep

11:51 a.m.

On Wed, Sep 4, 2019 at 5:21 PM Yedidyah Bar David <didi@redhat.com> wrote:

...

On Wed, Sep 4, 2019 at 11:50 AM Amit Bawer <abawer@redhat.com> wrote:

...
On Wed, Sep 4, 2019 at 10:25 AM Yedidyah Bar David <didi@redhat.com> wrote:

...
On Thu, Aug 29, 2019 at 10:00 PM Amit Bawer <abawer@redhat.com> wrote:

...
On Thu, Aug 29, 2019 at 11:41 AM Yedidyah Bar David <didi@redhat.com> wrote:

...
Hi all,

This is in a sense a continuation of the thread "Why filetransaction needs to encode the content to utf-8?", but I decided that a new thread is better.

I started to systematically convert the code to use a unicode sandwich. I admit it was harder than I expected, and made me think somewhat differently about the move to python3, and about how reasonable (or not) it is to develop in the common subset of python2 and python3 vs ditching python2 and moving fully to python3. It seems like at least parts of our (integration team) code will still have to run in python2 also in oVirt 4.4, so I guess we'll not have much choice :-)

Current patches are only for otopi and engine-setup, and are by no means thorough - I didn't check each and every open() call and similar ones. But it's enough for getting engine-setup finish successfully on both python2 and python3 (EL7 and Fedora 29), with some utf-8 inserted in relevant places of the input (for the plugins already handled).

I didn't bother trying non-utf-8 encodings. Perhaps I should, but it's not completely clear to me what's the best approach [2].

A universal solution when dealing with sys.argv which could contain file paths/names in various languages, would be selecting sys.getfilesystemencoding() for the encoding scheme instead of a hard coded 'utf-8' [3]. We've done something similar in sanlock python-c API for converting file-system paths into bytes, although it's in C, the principle of using the file-system default encoding applies there as well [4].

Thanks for the hint. Looked at this and thought a bit, and I tend to ignore/postpone until a need arises. We already have "utf-8" hard-coded in otopi 27 times, not sure it makes sense now to go after each and every one of them and analyze the more-general function (or expression, or even more complex) to replace it with. I guess this is only relevant for Windows, and I do not think anyone is going to try to port otopi to Windows soon.

Searching for relevant keywords in google finds mostly results from around 2009-2012, which I guess was the time around which most systems converted their non-utf-8 file collections to utf-8. A somewhat newer example (2016):

http://beets.io/blog/paths.html

So I am going to ignore this. If you think that's a bad choice, please open a bug, and I'll handle it later. Thanks!

Default Linux locale encoding is UTF-8, so I don't think its a bad choice.

Thanks, Amit.

Now rebased and triggered an OST run:

https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests...

If it passes, I'll merge both stacks soon. I'll then send a different email, to notify developers to update otopi on their machines.

Was a bit hesitant to merge without verifying on fedora, but fc30 is still not fully building/built, so gave up and verified only on fc30 in dev env. Rebased patches, going to merge them soon if they pass CI.

...

Best regards,

...
...
For now, my top priority is to get otopi+engine-setup+host-deploy work well enough for:

1. Developers that use fedora for everything, or mix fedora and RHEL7/8 (e.g. engine on one, host on another).

2. RHV 4.4, with hosts being RHEL8.

Best regards,

...
[3] https://stackoverflow.com/a/5113874 [4] https://pagure.io/sanlock/blob/master/f/python/sanlock.c#_76

...
Currently, you must have both otopi and engine updated to get things working. If there is demand, I might spend some time splitting/rebasing/etc to make it possible to update just one of them and only later the other, but not sure it's worth it.

I don't mind splitting/squashing if it makes reviews simpler, but I think the patches are ok as-is. These are the bottom patches of each stack:

otopi: https://gerrit.ovirt.org/102085

engine-setup: https://gerrit.ovirt.org/102934

[1] http://python-future.org/unicode_literals.html

[2] https://stackoverflow.com/questions/4012571/python-which-encoding-is-used-fo...

Thanks and best regards, -- Didi

-- Didi

-- Didi

-- Didi

2388

Age (days ago)

2414

Last active (days ago)

List overview

Download

5 comments

2 participants

participants (2)

Amit Bawer
Yedidyah Bar David