PuTTY wish unicode-combining

This is a mirror. Follow this link to find the primary PuTTY web site.

Home | FAQ | Feedback | Licence | Updates | Mirrors | Keys | Links | Team
Download: Stable · Pre-release · Snapshot | Docs | Changes | Wishlist

summary: Support Unicode combining characters
class: wish: This is a request for an enhancement.
difficulty: tricky: Needs many tuits.
depends: compressed-scrollback
blocks: unicode-normalisation
priority: medium: This should be fixed one day.
fixed-in: 2004-10-16 70230fc0ca9af7be32c44173facdde7b7660af7a (0.58)

Unicode contains a number of combining characters (typically diacritics) which can in principle be combined with _any_ character appearing before them. For example, you can display CE 9B (U+039B GREEK CAPITAL LETTER LAMBDA) followed by CC 8A (U+030A COMBINING RING ABOVE) and get a lambda with a ring above it, as used in a well-known sci-fi TV series logo.

PuTTY in UTF-8 mode currently does not support these combining characters at all. It should.

At a minimum, it should support the use of combining characters to create composite characters that actually exist as their own code points in Unicode. Hence, displaying U+030A when the cursor is just to the right of a capital A should yield U+00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE).

Ideally - and this is where it gets hard - PuTTY should support an arbitrary sequence of diacritics in any character cell, so that the above logo can be displayed in full and copied and pasted sensibly. So rather than having exactly one Unicode code point per character cell, we would need to make the terminal data structures flexible so that more than one code point could be stored in a single char cell. That sequence of code points would be transmitted to the Windows text display function as a single string, causing the display of the correct composition of glyphs; and anyone copying and pasting that character cell would get the same sequence of code points too. Overwriting that cell would, of course, take out the whole lot in one go.

Experimentation with xterm in UTF-8 mode suggests that if the resulting character does have a Unicode representation then it should be used in cut-and-paste: hence sending A (U+0041) followed by U+030A should yield a character cell which pastes back as U+00C5. More formally, I think what's required is for the sequence of code points occupying a character cell to be maintained in Normalisation Form C (see UAX #15).

Glenn Maynard suggests that all this has a practical application:

Somewhat more usefully, full combining support is needed for Korean. As I understand it, while there are a lot of precomposed Korean characters available, they don't *all* exist, so real Korean support needs real combining support.

Update, 2004-10-15: PuTTY now has the ability to store an arbitrary number of code points per character cell. Cut and paste is implemented correctly, and display is by simple overlaying. Normalisation is not yet implemented; see unicode-normalisation.


If you want to comment on this web site, see the Feedback page.
Audit trail for this wish.
(last revision of this bug record was at 2017-04-28 16:52:45 +0100)