Subject: Re: RemapGlyph()
From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Wed Jun 20 2001 - 23:53:16 CDT
> ad> Can somebody please explain the role of GR_Graphics::remapGlyph()?
> ad> It converts zero-width characters into "degree" symbols.  This is
> ad> the cause of Bug 1518.  Why do we do this?
> 
> ms> I'm fighting with this too. Symbol fonts have major problems being
> ms> printed on Gtk. I suspect that something even more serious is
> ms> happening with them too.
> 
> ms> Twice I've saved a test document with symbol fonts and both have
> ms> turned up as "bogus documents".  I suspect we have problems in our
> ms> import/exporters.
> 
> Whoa, Nellie.  I think there are three different things in this one
> Q&A.
> 
> 1.  What does remapGlyphs actually do?  I'll come back to that, below.
> 
> 2.  Printing under Gtk.  My guess is that this is a specific print
> driver problem.  Since remapGlyphs only comes into play when a
> character has a zero-width glyph, it can't be the source of this
> problem unless Gtk printing can somehow do something more appropriate
> with zero-width glyphs.
> 
> 3.  Bogus documents for documents with symbol fonts.  I think you're
> right about the exporter being the problem, though I thought this had
> been fixed at least a couple months ago (I still think that).  It was
> first noticed that the *.abw exporter was exporting smart quote
> characters as some other strange thing.  My assumption is that other
> non-Latin1 characters would get similar treatment.
My guess is that any or all of these could be due to character
set encodings and font encoding.  I understand that the old symbol
fonts had an encoding (or code page) all their own.  So just as
we have converters for ISO-8859-1 etc., we need one for the symbol
font.  I believe this is also an issue in the RTF import/export
for symbol fonts.
> OK, so what is remapGlyphs all about?  Here is a description I sent to
> someone about a year ago (so some things in the code base may have
> changed since then).
Ah thanks for this description (:
> ================================================================
> On some platforms, and for some fonts, there are glyphs missing at
> positions of interest.  In particular, the fonts supplied with Abi on
> Unix only have glyphs among the first 256 positions (ie, 8 bits).
> That means that any Unicode characters >=256 will be measured and
> rendered as zero-width characters.  This can be somewhat confusing to
> the average user.
Is it due to X, the fonts, or AbiWord that missing characters are
classed as zero-width characters?
> The most common case of this is in the use of "smart quotes" in
> documents imported from MSWord.  Abi's MSWord and RTF importers
> correctly translate the characters to the appropriate Unicode
> character positions, but they are all in the U+20xx range.
The MS western encoding contains characters that ISO-8859-1 does
not.  "Smart quotes" are the most obvious.  Importing must always
pass through iconv/mbtowc since Abi uses Unicode natively.
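To make that concrete, here is a minimal sketch (not AbiWord code) of
why bytes from the MS western code page cannot simply be treated as
ISO-8859-1.  The names cp1252ToUnicode and importCp1252 are
hypothetical, and only the four curly quotes are special-cased; a real
importer would go through iconv with a full table.

```cpp
#include <string>

// Hypothetical helper: map one Windows-1252 byte to a Unicode code point.
// Only the four curly quotes are handled specially. The other bytes in
// 0x80-0x9F also differ from ISO-8859-1 but are left out of this sketch.
char32_t cp1252ToUnicode(unsigned char byte)
{
    switch (byte) {
    case 0x91: return 0x2018; // left single quotation mark
    case 0x92: return 0x2019; // right single quotation mark
    case 0x93: return 0x201C; // left double quotation mark
    case 0x94: return 0x201D; // right double quotation mark
    default:   return byte;   // 0x00-0x7F and 0xA0-0xFF match ISO-8859-1
    }
}

// Hypothetical importer step: convert a whole byte string to code points.
std::u32string importCp1252(const std::string& bytes)
{
    std::u32string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes)
        out.push_back(cp1252ToUnicode(b));
    return out;
}
```

The quotes land in the U+20xx range, which is exactly where the 8-bit
fonts have no glyphs.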
> The remapGlyphs feature provides preference values for which
> characters to show instead of invisible characters.  The remapping is
> done only for display/printing purposes; the document itself is not
> changed.  The default preferences will only do the remapping if the
> character is actually zero-width in the font being used.  Remappings
> are provided for the four Unicode curly quote characters, and there is
> a default remapping for any other characters that happen to come up
> zero-width.
Now we have a problem.  Missing characters and zero-width characters
are not the same thing.  Unicode contains many zero-width combining
characters which are fully visible, typically accent marks which
render over the previous letter.  Vietnamese also uses these, even in
8-bit encodings.
This type of remapping for unsupported characters is known as
"transliteration", where we attempt to find the next best thing:
a regular "a" in place of an "á", for example.  AbiWord has limited
code for this in XAP_EncodingManager::approximate(), which also
handles smart quotes.  libiconv has beautiful support for
transliteration.  I recommend we distinguish between missing and
zero-width characters, and centralize transliteration.
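A rough sketch of what I mean, under stated assumptions: every name
here (GlyphStatus, FontMetrics, classify, transliterate, glyphToDraw)
is hypothetical and none of it is existing AbiWord API.  The font is
modelled as a plain map from code point to advance width, so a code
point that is absent from the map is "missing", while one that is
present with width 0 is a genuine zero-width glyph such as a combining
accent.

```cpp
#include <map>

// Hypothetical three-way classification instead of "zero-width or not".
enum class GlyphStatus { Present, ZeroWidth, Missing };

// Stand-in for a font: code point -> advance width.
// A code point that is not in the map has no glyph at all.
using FontMetrics = std::map<char32_t, int>;

GlyphStatus classify(const FontMetrics& font, char32_t c)
{
    auto it = font.find(c);
    if (it == font.end())
        return GlyphStatus::Missing;
    return it->second == 0 ? GlyphStatus::ZeroWidth : GlyphStatus::Present;
}

// One central place for "next best thing" substitutions.
// The table is illustrative only; unknown characters become '?'.
char32_t transliterate(char32_t c)
{
    switch (c) {
    case 0x2018: case 0x2019: return U'\''; // curly single quotes
    case 0x201C: case 0x201D: return U'"';  // curly double quotes
    case 0x00E1:              return U'a';  // a with acute accent
    default:                  return U'?';
    }
}

// Remap only glyphs that are really missing; leave zero-width ones alone.
// For simplicity this does not check that the fallback itself is present.
char32_t glyphToDraw(const FontMetrics& font, char32_t c)
{
    if (classify(font, c) != GlyphStatus::Missing)
        return c;
    return transliterate(c);
}
```

With this split a combining acute accent (U+0301) that the font really
has at width 0 is drawn as itself, and only characters with no glyph
at all get a substitute.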
> AbiWord 0.7.10 had some other distracting character spacing problems,
> so the best way to see the difference is to use a recent Unix build
> and view a document with smart quotes (easiest is to import a simple
> MSWord document, but I have attached a somewhat messy document I've
> been testing with) with the preference value turned on and off.
> Besides being invisible when the preference is turned off, moving the
> cursor with arrow keys does a double step at the position of the
> zero-width character.  This makes great sense to programmers but is
> confusing to regular folks.
This makes sense to regular Vietnamese folks too.  I think we can
have the best of both worlds.
Andrew Dunbar.
-- http://linguaphile.sourceforge.net