Skip to content

Input fixes #6

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: rewrite
Choose a base branch
from
Open

Input fixes #6

wants to merge 3 commits into from

Conversation

brliron
Copy link

@brliron brliron commented Aug 29, 2018

  • Fix unicode input (before this fix, the é character moved the cursor 2 characters to the right and required 2 backspaces to remove).
  • Switch input to non-blocking mode.
  • Fix end key.
  • Implement delete key.

Copy link
Owner

@NatTupper NatTupper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I like the ESCDELAY feature. I didn't know about that.
  • The while call in gc.ui_thread.funcs blocks are there so that functions actually finish execution before moving on. Leaving them out would cause problems.
  • Wide characters take up more memory. Normal characters still work with unicode, specifically UTF-8. Using wide characters may also cause locale issues since that would be UTF-16. Mixing UTF-8 and UTF-16 is messy.
  • the if ch == -1 is necessary because I've said input to nodelay. If it's omitted, it continues execution, causing issues. You did remove the nodelay, and it may not be necessary now since Discline-curses uses threads and doesn't rely on asyncio for everything anymore. I'll do some tests to see if it's still required.
  • The input buffer needs to be a list since characters need to be inserted in various places according to where the cursor is. Strings are immutable and therefore a new string would need to be created each and every time there is a change.
  • The various time.sleep lines were originally necessary because of Discline-curses' dependence on asyncio. They are probably no longer required which you noticed.

As it stands now, I cannot approve the pull request.

@brliron
Copy link
Author

brliron commented Aug 29, 2018

The while call in gc.ui_thread.funcs blocks are there so that functions actually finish execution before moving on. Leaving them out would cause problems.

I thought they weren't needed, but after thinking about it, I can see a lot of reasons why they may be. I'll put them back.

Wide characters take up more memory. Normal characters still work with unicode, specifically UTF-8. Using wide characters may also cause locale issues since that would be UTF-16. Mixing UTF-8 and UTF-16 is messy.

It messes up the cursor on my end. Let's take for example the é character: https://www.fileformat.info/info/unicode/char/e9/index.htm. Its UTF-8 code is 0xC3 0xA9. With getch, you will read 2 characters: first the character 0xC3, then the character 0xA9. They will take 2 slots in the inputBuffer list. With the string "étude" for example, the inputBuffer list has a size of 6. And the cursor calculations use this size of 6, with the "é" string having a size of 2. If the cursor is right after the letter "d", pressing backspace will remove the "u" letter instead of the "d" letter. And every time you type a non-ascii character, it becomes worse. I really don't want to try with Japanese or Cyrillic...
About "more memory" and "Mixing UTF-8 and UTF-16", here is how I understand things.
The encoding of the Python 3 str type is an implementation detail. It may be UTF-8. It may be UTF-16. Anyway, it will always work, and Python code should just expect it to work for any unicode character, without thinking about implementation details. And actually, the implementation changed. Python 3.2 always used UTF-16 on Windows and UTF-32 on Linux. Python 3.3 uses ASCII, UCS-2 or UCS-4. See https://stackoverflow.com/a/9079985 and https://www.python.org/dev/peps/pep-0393/.
Every time you specify an encoding, you specify an input encoding. For example, in the old version of the code with bytearray(self.inputBuffer).decode("utf-8"), the "utf-8" is the encoding of the bytearray. The string created will always use whichever encoding Python decides to use.
Now, let's see about get_wch. The name of the Python get_wch function is misleading for a C programmer, because its name directly comes from the C get_wch function (https://linux.die.net/man/3/get_wch). The C function does return a "wide" character (probably in UTF-32). But the Python 3 function returns a str object, using the same encoding as every other Python str object. And because I wanted to check that, I went in the Python source code. The implementation of get_wch is here: https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Modules/_cursesmodule.c#L1320. The character returned by the C function get_wch is converted to a Python string with the PyUnicode_FromOrdinal function, which is documented here: https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Include/unicodeobject.h#L1075.

I didn't think this wall of text would end up being that big.

the if ch == -1 is necessary because I've said input to nodelay. If it's omitted, it continues execution, causing issues. You did remove the nodelay, and it may not be necessary now since Discline-curses uses threads and doesn't rely on asyncio for everything anymore. I'll do some tests to see if it's still required.

While the Python getch is documented to return -1 when no input is available in no-delay mode, the Python get_wch is documented to throw an exception instead (https://docs.python.org/3/library/curses.html#curses.window.getch). If I understand correctly, getch never returns -1 in blocking mode, and get_wch never returns -1 in any mode.

The input buffer needs to be a list since characters need to be inserted in various places according to where the cursor is. Strings are immutable and therefore a new string would need to be created each and every time there is a change.

Yeah, I worked around that by creating a new string every time, but using a list of strings (I mean a list of character, but a character is a string of size 1) would probably be more efficient. I will change that.

The various time.sleep lines were originally necessary because of Discline-curses' dependence on asyncio. They are probably no longer required which you noticed.

I guessed they were needed to avoid eating 100% of the CPU by asking repetitively "do you have a key?" "And now?" "Still no new key?" "Gimme a new key plz", which is fixed by using blocking mode.

@NatTupper
Copy link
Owner

Wide chars aren't required because multibyte characters will be caught each segment at a time. So if a character is 3 bytes long, getch() will take the first byte and the next time it is run, it grabs the second, then the third. The reason I used a list for strings was getch() outputs an integer equivalent to the value of a character or a section of the character. I had tried converting them to characters with chr() and then storing them, but they never would combine correctly. So in order to store them correctly, you have to have some sort of iterable that stores each byte and then combine them when they're needed. Before I implemented this, non-ASCII characters could not be inputted. However I didn't take into account Cyrillic, extended-Latin characters, Greek, etc.. I did take into account CJK (Chinese, Japanese, Korean). Those should work. There's a little function that gets the correct length of each character and allows you to enter and erase them correctly, but it only works for CJK. I'll implement extended-Latin and Cyrillic next.

There's another problem with wide chars and that is that they may not work with characters more than 2 bytes in length. Wide chars are UTF-16, so they store 2 bytes. I'm not sure they'd be stored correctly if captured using get_wch(), which is why it's better to use getch().

You're right that nodelay mode is no longer required because Discline-curses now uses threads for most things. I can remove that and clean up some code.

@brliron
Copy link
Author

brliron commented Aug 31, 2018

There's another problem with wide chars and that is that they may not work with characters more than 2 bytes in length. Wide chars are UTF-16, so they store 2 bytes. I'm not sure they'd be stored correctly if captured using get_wch(), which is why it's better to use getch().

Everything I found leads me to think this is wrong. I didn't find where the C lib curses converts its characters, but the API returns a wint_t, which is 4 bytes wide. Then, the Python wrapper uses PyUnicode_FromOrdinal. The parameter of this function is an int (4 bytes), and "must be in range(0x110000)" - which is the full Unicode range. That function returns a Python string, which uses UTF-32 (or ASCII when the string contains only ASCII characters, because that uses less memory).

And I'm not talking about #2. Actually, my pull request has the #2 bug and I should fix it. But #2 is related to fonts, and asian characters being displayed 2 times bigger than other characters. The bug I tried to fix causes 2 issues:

  • Misplaced cursor (looks like Formatting inconsistency? #2, but isn't)
  • Press backspace 3 times to delete a Japanese character (totally not related to Formatting inconsistency? #2)
    Maybe you fixed it somewhere else? But I tested with the rewrite branch just now, you still have to press backspace 9 times to remove 日日日, and the cursor is still misplaced (but it's also misplaced with my pull request, but that's because the cursor needs to have the fix in Formatting inconsistency? #2).

If you really don't want to go for get_wch to fix this, you can get the width of an unicode character yourself and do the maths yourself. The array at https://en.wikipedia.org/wiki/UTF-8#Description describes how to do it.

@NatTupper
Copy link
Owner

  • All right. I didn't know that python stored wide chars differently. If I were using C, it would need to be a common type, but that's not true here with python. I'll play around with get_wch()and see how it performs.
  • The misplaced cursor thing is a definite bug which needs fixing.
  • The backspace thing also is a major bug. I thought I had fixed that, but apparently not.

I'd happily merge your pull request once I decide if I want get_wch() to be in use. Once I comment here about that, if you could modify your pull request accordingly, I will test your modifications and merge them once I'm satisfied with them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants