- Date: Fri, 23 Jun 2000 09:15:52 -0700 (PDT)
- From: Bob Rasmussen <ras@xxxxxxxxxx>
- Subject: Re: Diacritics and special characters
In response to questions, here's some background on the subject.
Innopac stores data in MARC format, which can include a wide variety of
"scripts". The curly-bracket display is used for cataloging, in order to
ensure that the exact desired character is in the record.
When Innopac goes to display (or print) the data, via browser, Java, or
telnet, it makes use of a "diac" file, which tells the server what diacritics
(and other scripts) the client can display. This technique originated with
dumb terminals. A VT220, for instance, can display the characters in the
Latin-1 set, suitable for Western Europe. III also produced a T160E terminal,
which could display about 400 character/diacritic combinations. For the Far
East, III supports single-country solutions, such as Chinese Big-5, or CCCII
for combined Chinese, Japanese, and Korean (CJK). They've done some work with
Thai, Vietnamese, and Indic scripts, I think, but I don't know the details.
In a character-based environment, such as dumb terminals or telnet, the choice
of diac table is associated with the type of terminal. Users of Anzio (our
telnet client) who want diacritics will typically choose T160E emulation,
and those who want Far East will choose CCCII. It really works.
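
As a rough illustration of the idea (not the actual diac file format, which
I am not describing here), the Python sketch below picks an output encoding
from the client type and substitutes "?" for anything that client cannot
show. The CLIENT_CODECS table and the render_for_client function are
hypothetical names made up for this example, and standard Python codecs
stand in for the real terminal character sets.

```python
# Hypothetical mapping from client type to a Python codec name.
# This only mimics the idea of a per-terminal diac table; it is not
# the real file format used by the server.
CLIENT_CODECS = {
    "vt220": "latin-1",   # Western European repertoire
    "big5": "big5",       # Traditional Chinese
    "utf8": "utf-8",      # everything in Unicode
}


def render_for_client(text, client_type):
    """Encode text for a client, using '?' for characters it cannot show."""
    codec = CLIENT_CODECS[client_type]
    return text.encode(codec, errors="replace")


title = "Caf\u00e9 \u4e2d\u6587"   # "Café" followed by two Chinese characters
for client in CLIENT_CODECS:
    print(client, render_for_client(title, client))
```

The point of the sketch is that the same stored record yields different
bytes, and different losses, depending on which table the client selects.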
For Java or web-based servers (I don't know all the product names), there was
ONE diac file associated with the server.
There were two shortcomings with this approach, which for me at least came to
the surface at IUG 15 months ago: 1) there was no way to handle multiple
language sets, such as Japanese and French, let alone Arabic and anything
else; and 2) users of web browsers accessing CCCII data needed an add-on
product, such as UnionWay or WinMass, to translate CCCII to and from
characters that the PC could handle.
What was clearly needed was an all-encompassing "diac" file that would
translate ALL characters and diacritic combos to a lingua franca. And the
lingua franca of the web and Java is Unicode, coded as UTF-8. Also, Anzio can
be configured to process UTF-8 data coming from and going to the host.
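
To make the "lingua franca" point concrete, here is a minimal Python sketch
showing that French, Japanese, and Arabic text can travel in one UTF-8 byte
stream and decode back unchanged. The sample strings and the
utf8_round_trip helper are my own inventions for illustration; nothing here
comes from III's software or from Anzio.

```python
# Illustrative only: the sample strings are arbitrary, not catalog data.
samples = {
    "French": "\u00e9l\u00e8ve",
    "Japanese": "\u65e5\u672c\u8a9e",
    "Arabic": "\u0643\u062a\u0627\u0628",
}


def utf8_round_trip(text):
    """Encode text as UTF-8 and confirm it decodes back unchanged."""
    encoded = text.encode("utf-8")
    assert encoded.decode("utf-8") == text
    return encoded


# One byte stream can carry all three scripts at once.
combined = " / ".join(samples.values())
combined_bytes = utf8_round_trip(combined)

for language, text in samples.items():
    data = utf8_round_trip(text)
    print(f"{language}: {len(text)} characters -> {len(data)} bytes")
```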
At IUG in Philadelphia recently, III announced that a UTF-8 diac file was
being released, initially for use with the Java and web products, and
"later" for the telnet interface. This seems, at least theoretically,
to be precisely the right solution.
So now the question: has anyone installed this UTF-8 diac support for web or
Java? Can you provide your URL and a few sample books to look up?
And question 2 (of much more interest to me): have they released the UTF-8
support for the telnet product? If so, has anyone installed that? Again, can
you provide URL and samples?
Note that you may still have issues on the client side with:
1. What characters are covered by fonts that are installed?
2. How does the client handle font switching, if necessary, between languages?
3. How well does the client handle combining (non-spacing) diacritics? Do they
really combine?
4. What methods are available for input of non-Roman characters?
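
On point 3, a character-cell client has to know which code points occupy a
cell of their own and which ones attach to the character before them. The
Python sketch below shows one simplified way to group them, using the
standard unicodedata module. The display_cells function is a hypothetical
name, and this is not the full Unicode grapheme-cluster algorithm, only the
basic base-plus-marks case.

```python
import unicodedata


def display_cells(text):
    """Group each base character with the combining marks that follow it."""
    cells = []
    for ch in text:
        if unicodedata.combining(ch) and cells:
            cells[-1] += ch      # attach the mark to the preceding base
        else:
            cells.append(ch)     # start a new cell
    return cells


# 'r' + COMBINING CARON and 'a' + COMBINING ACUTE ACCENT
word = "Dvor\u030ca\u0301k"
print(len(word), "code points,", len(display_cells(word)), "display cells")
```

A client that counts code points rather than cells will misalign columns
on any line that contains non-spacing marks.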
Now, I hope you don't mind, I'll summarize Anzio's handling of these things.
Anzio can process data to/from the host in UTF-8, T160E, CCCII, USMARC, and
various ISO sets and Windows codepages. It goes through some rather elaborate
logic to ensure that combining diacritics are handled well, even in cases
where the font does not contain them (we recommend using Courier New, with all
the extensions downloadable from Microsoft).
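
One plausible strategy for fonts that lack combining marks (a sketch of the
general idea, not a description of Anzio's actual logic) is to compose each
base-plus-mark pair into a single precomposed code point wherever Unicode
defines one, and then deal separately with whatever marks are left over.
The function names below are hypothetical; the normalization call is from
Python's standard unicodedata module.

```python
import unicodedata


def prefer_precomposed(text):
    """Compose base + combining mark pairs where Unicode defines a single code point."""
    return unicodedata.normalize("NFC", text)


def leftover_marks(text):
    """List the combining marks that remain after composition."""
    composed = prefer_precomposed(text)
    return [unicodedata.name(ch) for ch in composed if unicodedata.combining(ch)]


print(leftover_marks("e\u0301"))   # e + acute composes to U+00E9
print(leftover_marks("q\u0301"))   # Unicode has no precomposed q with acute
```

Only the leftover marks then need special drawing, which keeps the hard
case small for Western European data.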
Anzio does not currently do font-switching. Some users have set up macro keys
to allow the user to switch fonts as needed. Also, we are testing a font from
Monotype that has all characters defined in Unicode 3.
Combining diacritics can be entered several ways. Far East characters can be
entered with an add-on (such as WinMass), with the Input Method Editor of the
Windows setup (such as Japanese, on a Japanese Windows installation), or on
Windows 2000 with all of the IMEs available.
Printing of all these characters is available in AnzioWin but not Anzio Lite.
I welcome any corrections, comments, etc. Because I know there are lurkers
interested, please respond on-list if at all appropriate.
--
Regards,
....Bob Rasmussen, President, Rasmussen Software, Inc.
personal e-mail: ras@xxxxxxxxxx
company e-mail: rsi@xxxxxxxxxx
voice: (US) 503-624-0360 (9:00-6:00 Pacific Time)
fax: (US) 503-624-0760
web: http://www.anzio.com