Internationalization
CustomMailer can be used to send email in many of the world's languages,
but it may take a little work by you and/or your recipients,
depending upon the language. The languages of Western Europe
and the Americas generally work without extra effort, since the 8-bit
character encodings and fonts they need are standard in Windows and are
supported directly by CustomMailer. However, Eastern European, Asian, and
African languages such as Greek, Russian, Chinese, Japanese, Korean,
Arabic, and Hebrew need extra consideration to support the
special fonts and encoding schemes they require. CustomMailer is written
in Java, so all its strings are double byte (16-bit Unicode),
but more is needed for an end-to-end double-byte solution.
The following sections describe each aspect of using CustomMailer
for sending mail internationally.
User interface
At present the user interface for CustomMailer (menus, dialogs,
prompts, etc.) as well as the documentation for CustomMailer are all in
English. Our focus in CustomMailer has been on making sure it
supports sending internationalized mail content, and we regret we have
not had
the resources to translate the documentation or application GUI itself
into other languages.
Message template and mailing list input
As delivered, CustomMailer allows you to read message template and
mailing list files encoded with the following character sets: US-ASCII,
ISO-8859-1, Windows-1252 (also known as Cp1252), UTF-8, and Unicode.
The first four of these are byte-oriented (8-bit) encodings,
whereas Unicode here is a 16-bit (double byte) encoding. The following
table shows the single-byte character sets supported by CustomMailer.
Table - Eight-bit character sets supported by CustomMailer
US-ASCII = yellow (codes 0-127)
ISO-8859-1 = yellow + blue (blue: codes 160-255)
Windows-1252 = yellow + blue + green (green: codes 128-159)
0 NUL |
1 SOH |
2 STX |
3 ETX |
4 EOT |
5 ENQ |
6 ACK |
7 BEL |
8 BS |
9 TAB |
10 LF |
11 VT |
12 FF |
13 CR |
14 SO |
15 SI |
16 DLE |
17 DC1 |
18 DC2 |
19 DC3 |
20 DC4 |
21 NAK |
22 SYN |
23 ETB |
24 CAN |
25 EM |
26 SUB |
27 ESC |
28 FS |
29 GS |
30 RS |
31 US |
32 space |
33 ! |
34 " |
35 # |
36 $ |
37 % |
38 & |
39 ' |
40 ( |
41 ) |
42 * |
43 + |
44 , |
45 - |
46 . |
47 / |
48 0 |
49 1 |
50 2 |
51 3 |
52 4 |
53 5 |
54 6 |
55 7 |
56 8 |
57 9 |
58 : |
59 ; |
60 < |
61 = |
62 > |
63 ? |
64 @ |
65 A |
66 B |
67 C |
68 D |
69 E |
70 F |
71 G |
72 H |
73 I |
74 J |
75 K |
76 L |
77 M |
78 N |
79 O |
80 P |
81 Q |
82 R |
83 S |
84 T |
85 U |
86 V |
87 W |
88 X |
89 Y |
90 Z |
91 [ |
92 \ |
93 ] |
94 ^ |
95 _ |
96 ` |
97 a |
98 b |
99 c |
100 d |
101 e |
102 f |
103 g |
104 h |
105 i |
106 j |
107 k |
108 l |
109 m |
110 n |
111 o |
112 p |
113 q |
114 r |
115 s |
116 t |
117 u |
118 v |
119 w |
120 x |
121 y |
122 z |
123 { |
124 | |
125 } |
126 ~ |
127 del |
128 € |
129 |
130 ‚ |
131 ƒ |
132 „ |
133 … |
134 † |
135 ‡ |
136 ˆ |
137 ‰ |
138 Š |
139 ‹ |
140 Œ |
141 |
142 Ž |
143 |
144 |
145 ‘ |
146 ’ |
147 “ |
148 ” |
149 • |
150 – |
151 — |
152 ˜ |
153 ™ |
154 š |
155 › |
156 œ |
157 |
158 ž |
159 Ÿ |
160 |
161 ¡ |
162 ¢ |
163 £ |
164 ¤ |
165 ¥ |
166 ¦ |
167 § |
168 ¨ |
169 © |
170 ª |
171 « |
172 ¬ |
173 |
174 ® |
175 ¯ |
176 ° |
177 ± |
178 ² |
179 ³ |
180 ´ |
181 µ |
182 ¶ |
183 · |
184 ¸ |
185 ¹ |
186 º |
187 » |
188 ¼ |
189 ½ |
190 ¾ |
191 ¿ |
192 À |
193 Á |
194 Â |
195 Ã |
196 Ä |
197 Å |
198 Æ |
199 Ç |
200 È |
201 É |
202 Ê |
203 Ë |
204 Ì |
205 Í |
206 Î |
207 Ï |
208 Ð |
209 Ñ |
210 Ò |
211 Ó |
212 Ô |
213 Õ |
214 Ö |
215 × |
216 Ø |
217 Ù |
218 Ú |
219 Û |
220 Ü |
221 Ý |
222 Þ |
223 ß |
224 à |
225 á |
226 â |
227 ã |
228 ä |
229 å |
230 æ |
231 ç |
232 è |
233 é |
234 ê |
235 ë |
236 ì |
237 í |
238 î |
239 ï |
240 ð |
241 ñ |
242 ò |
243 ó |
244 ô |
245 õ |
246 ö |
247 ÷ |
248 ø |
249 ù |
250 ú |
251 û |
252 ü |
253 ý |
254 þ |
255 ÿ |
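As a rough illustration of why the declared character set matters when a template or mailing list file is read, the following Java sketch decodes raw bytes using a named character set. The class and method names are hypothetical and are not part of CustomMailer; only the standard java.nio.charset API is used.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not CustomMailer code: decodes file bytes with a named charset.
final class TemplateDecoder {
    static String decode(byte[] fileBytes, String charsetName) {
        // Charset.forName throws UnsupportedCharsetException if this JRE lacks the charset.
        return new String(fileBytes, Charset.forName(charsetName));
    }
}
```

With this sketch, the single byte 128 decodes to the euro sign when the name is windows-1252, while the two bytes 0xC3 0xA9 decode to the single character 233 (é) only when the name is UTF-8. Reading a file with the wrong name produces wrong characters rather than an error.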
Display and fonts
As delivered, CustomMailer is able to display all alphabet-based
languages in both the message template and mailing list. This
includes languages that use alphabets such as Latin, Greek, Cyrillic,
Hebrew, Arabic, Devanagari, etc. However, the default fonts of
CustomMailer do not support the display of ideograph-based languages
such as Chinese, Japanese, Korean, Thai, etc. If you read in a
Unicode message template file that uses an
ideograph-based character set, you will see the ideographs replaced by
question marks or little box characters in CustomMailer. However,
the underlying content will still be correct and you can send messages
written in these languages successfully to your recipients. If
you would like CustomMailer to display message templates and
mailing lists using ideographic characters, you can modify CustomMailer
to support this as described in "Additional character sets" below.
Sending mail
As delivered, CustomMailer supports the sending of plain text mail
messages using the following character encodings: US-ASCII, ISO-8859-1,
Windows-1252, and UTF-8. By default, CustomMailer will examine
each email message and select the appropriate character encoding for
sending it. If your message is entirely within the yellow area in
the table above, CustomMailer will send it as "US-ASCII". If your
message falls within the yellow and blue areas, CustomMailer will send
it as "ISO-8859-1". If your message uses characters in the
yellow, blue, and green areas, CustomMailer will send it as
"Windows-1252" (and not "Cp1252", which we found
some systems do not recognize). If your message contains Unicode
characters not in the table above, then CustomMailer will automatically
convert it to the "UTF-8" encoding scheme. UTF-8 is the Internet
standard for encoding 16-bit Unicode characters as 8-bit characters,
which is necessary for sending Unicode data over the Internet since the
SMTP standard only supports 8-bit codes.
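The selection rule just described can be sketched in Java as follows. This is only an illustration under our own assumptions: the class name and method are hypothetical, and the sketch simply tries each encoding from least to most capable using the standard CharsetEncoder.canEncode call. It is not CustomMailer's actual code.

```java
import java.nio.charset.Charset;

// Hypothetical sketch of the selection rule described above; not CustomMailer's actual code.
final class CharsetChooser {
    // Candidates ordered from least to most capable.
    private static final String[] CANDIDATES = {
        "US-ASCII", "ISO-8859-1", "windows-1252", "UTF-8"
    };

    static String choose(String message) {
        for (String name : CANDIDATES) {
            // A fresh encoder per check; canEncode reports whether every character fits.
            if (Charset.forName(name).newEncoder().canEncode(message)) {
                return name;
            }
        }
        return "UTF-8"; // reached only for text containing unpaired surrogates
    }
}
```

For example, choose("Hello") yields US-ASCII, a message containing é yields ISO-8859-1, a message containing the euro sign yields windows-1252, and a message in Cyrillic yields UTF-8. One difference from the table: Java's ISO-8859-1 encoder also accepts the control codes 128-159, which the table assigns to Windows-1252.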
CustomMailer will encode not only your message body, but also the
SUBJECT
and ORGANIZATION fields, as well as the proper name portions of the TO,
FROM,
CC, BCC, REPLY TO, and RETURN TO fields.
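The text above does not say how header fields are encoded. A common Internet convention for non-ASCII header text is the RFC 2047 "encoded-word", and the following Java sketch assumes that convention with Base64 ("B") encoding. The class and method names are hypothetical, and the sketch should not be read as a description of what CustomMailer does internally.

```java
import java.nio.charset.Charset;
import java.util.Base64;

// Hypothetical sketch; assumes RFC 2047 "B" encoded-words, which the text above does not confirm.
final class HeaderWordEncoder {
    static String encode(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        // Pure ASCII header text needs no encoding.
        if (Charset.forName("US-ASCII").newEncoder().canEncode(text)) {
            return text;
        }
        String b64 = Base64.getEncoder().encodeToString(text.getBytes(cs));
        // RFC 2047 caps each encoded-word at 75 characters; long text would need
        // to be split into several words, which this sketch omits.
        return "=?" + charsetName + "?B?" + b64 + "?=";
    }
}
```

Under this assumption, a subject consisting of the single character é would travel as =?UTF-8?B?w6k=?= when UTF-8 is used, while a plain ASCII subject is left untouched.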
Most Internet mail systems support full 8-bit transmission, but a
few older
systems only support 7-bit transmission. If CustomMailer
determines that a given recipient's SMTP server can only receive 7-bit
data, CustomMailer converts any non-ASCII 8-bit characters using a
7-bit transfer encoding method known as "quoted-printable".
However, since there are a few 8-bit systems that have
difficulties with quoted-printable encoding, CustomMailer always sends
8-bit characters to 8-bit systems. Since US-ASCII encoded
messages fit in 7 bits, they will work on both 7-bit and 8-bit systems
and
are sent as is.
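To give a feel for what quoted-printable does to a message, here is a minimal Java sketch of the encoding for a single line of text. It follows the general rules of the MIME standard (printable ASCII is kept, other bytes become "=" plus two hex digits, and long lines get soft breaks), but the class name and method are hypothetical and this is not CustomMailer's implementation.

```java
import java.nio.charset.Charset;

// Hypothetical sketch of quoted-printable encoding for one line (no CR/LF inside).
final class QuotedPrintableSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encodeLine(String line, String charsetName) {
        byte[] bytes = line.getBytes(Charset.forName(charsetName));
        StringBuilder out = new StringBuilder();
        int col = 0; // characters already written on the current output line
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            boolean last = (i == bytes.length - 1);
            // Printable ASCII except '=' is literal; space and tab are literal
            // unless they would end the line.
            boolean literal = (b >= 33 && b <= 126 && b != '=')
                    || ((b == ' ' || b == '\t') && !last);
            String token = literal
                    ? String.valueOf((char) b)
                    : "=" + HEX[b >> 4] + HEX[b & 0x0F];
            if (col + token.length() > 75) {
                out.append("=\r\n"); // soft line break keeps lines within 76 characters
                col = 0;
            }
            out.append(token);
            col += token.length();
        }
        return out.toString();
    }
}
```

For instance, the word "café" encoded as ISO-8859-1 becomes caf=E9, and the same word encoded as UTF-8 becomes caf=C3=A9. Every character in the result is 7-bit ASCII, which is why this form is safe for 7-bit mail servers.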
CustomMailer also allows you to force the character encoding it uses
for sending mail. To send your message with a specific
encoding, locate the line mailSendingCharset= in the file
CustomMailerPreferences.txt in the CustomMailer 4.0\CustomMailerApp
folder and change it to, for example, mailSendingCharset=ISO-8859-1.
This will force the message to be sent in the specified character
encoding. Any characters that fall outside the character set of a
less-capable encoding are replaced by question marks ("?").
Forcing the character encoding to a more-capable encoding will
also work. If the more-capable encoding is UTF-8, the message
will be reencoded and sent as UTF-8, regardless of whether UTF-8 was
required. If Windows-1252 is specified for an
otherwise US-ASCII or ISO-8859-1 message or if ISO-8859-1 is specified
for
an otherwise US-ASCII message, the characters won't have to be
reencoded
but the message will be designated as using the specified character set.
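The effect of forcing a less-capable encoding can be previewed with a few lines of Java. The sketch below relies on the documented behavior of String.getBytes, which substitutes the character set's replacement byte (a question mark for the encodings discussed here) for any character it cannot represent. The class and method names are hypothetical, and whether CustomMailer uses this exact call internally is our assumption.

```java
import java.nio.charset.Charset;

// Hypothetical sketch: previews how text survives a forced character encoding.
final class ForcedCharsetDemo {
    // Returns the text as a recipient would see it after forcing the given charset.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        // getBytes substitutes the charset's replacement byte ('?' here)
        // for characters the charset cannot represent.
        return new String(text.getBytes(cs), cs);
    }
}
```

Forcing ISO-8859-1 on a message containing the euro sign turns that character into "?", whereas forcing windows-1252 or UTF-8 leaves the message unchanged.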
CustomMailer handles HTML messages differently. HTML supports
its own internal character encoding scheme using what's known as
"numeric character references", for example &#8230; for the ellipsis character (…).
CustomMailer automatically converts all non-ASCII characters in
your HTML message to numeric character
references. The result is a US-ASCII encoded message which can be
sent unmodified to either 7-bit or 8-bit systems. In this way
HTML
messages can contain any non-ASCII characters (including Unicode) and
they
will be transmitted correctly. This special handling only applies
to the HTML portion of the message, and the alternate text portion of
an
HTML message as well as all the header fields are handled like a plain
text
message, encoded as described above.
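The conversion of non-ASCII characters to numeric character references is simple enough to sketch in Java. The following is an illustration only, with a hypothetical class and method name, and it is not CustomMailer's own code. It walks the text by Unicode code point so that characters outside the 16-bit range are also written as a single reference.

```java
// Hypothetical sketch: rewrites non-ASCII characters as HTML numeric character references.
final class HtmlNcrSketch {
    static String toNumericReferences(String html) {
        StringBuilder out = new StringBuilder(html.length());
        for (int i = 0; i < html.length(); ) {
            int cp = html.codePointAt(i);
            if (cp < 128) {
                out.append((char) cp);                     // ASCII passes through unchanged
            } else {
                out.append("&#").append(cp).append(';');   // decimal reference, e.g. &#233;
            }
            i += Character.charCount(cp);                  // step over surrogate pairs as one unit
        }
        return out.toString();
    }
}
```

With this sketch the text "café" becomes caf&#233; and the result contains only US-ASCII characters, so it can pass through 7-bit and 8-bit systems alike.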
As delivered, CustomMailer does not support sending messages in other
international character sets such as
ISO-8859-n (n>1), Big5, GB2312, Shift-JIS, etc. However, CustomMailer
can be configured to provide this support if desired; see
"Additional character sets" below.
Receiving mail
Finally, having received a message from you, your recipients must have
their mail readers set up to recognize the proper encodings and display
with the appropriate fonts. Most modern mail readers
automatically shift to the proper character set based on the character
set specified by CustomMailer in the MIME header of the email. No
Windows, Macintosh, or
Unix systems that we know of have a problem receiving US-ASCII or
ISO-8859-1. All current Windows, Macintosh, and Unix systems also
can handle Windows-1252, though sometimes older systems have trouble
recognizing the unique Windows-1252 characters or the supposedly
equivalent "Cp1252" name for the Windows-1252 character set. For
this reason, staying within the ISO-8859-1 character set (or even
US-ASCII) is a good idea if absolutely reliable mail is a must.
All current operating systems and mail readers are also capable of
reading UTF-8. However, some mail readers may need to be
specially configured to recognize UTF-8 and display the resulting
characters using appropriate fonts. Almost all modern mail
readers
do this automatically. Those that don't generally have a
preferences
or menu option to select the character coding explicitly. Note
that many mail programs (for example, the Netscape mail
reader) let you specify a default character coding,
but this setting only applies when the
mail message itself does not say what character set it is using. That
never happens with CustomMailer, which always specifies the character set
in its messages explicitly.
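The explicit character set declaration travels in the standard MIME Content-Type header, alongside a Content-Transfer-Encoding header that says whether the body is 7-bit, 8-bit, or quoted-printable. The Java sketch below assembles these headers for a plain text message according to the rules described under "Sending mail". The class name, method name, and parameters are hypothetical, and the exact header text CustomMailer emits may differ.

```java
// Hypothetical sketch: builds the MIME headers that declare charset and transfer encoding.
final class MimeHeaderSketch {
    static String contentHeaders(String charsetName, boolean serverAccepts8Bit) {
        String transfer;
        if (charsetName.equalsIgnoreCase("US-ASCII")) {
            transfer = "7bit";               // fits in 7 bits, sent as is
        } else if (serverAccepts8Bit) {
            transfer = "8bit";               // 8-bit characters go to 8-bit systems unchanged
        } else {
            transfer = "quoted-printable";   // 7-bit systems get the 7-bit transfer encoding
        }
        return "MIME-Version: 1.0\r\n"
             + "Content-Type: text/plain; charset=\"" + charsetName + "\"\r\n"
             + "Content-Transfer-Encoding: " + transfer + "\r\n";
    }
}
```

A mail reader that honors these headers does not need its default character coding setting at all, because the charset parameter tells it how to interpret the body.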
It is also necessary to make sure the font used by the mail reader
to display
the message supports the intended characters. Almost all fonts
support
US-ASCII and ISO-8859-1. Many fonts also support Windows-1252,
but
there are some that substitute little boxes, question marks, or spaces
for
the unique Windows-1252 characters. For Unicode messages, the
recipient needs to set up an appropriate font for their language.
For example, under the Netscape mail reader, in Preferences:
Appearance: Fonts you can set the fonts for Unicode to any mainstream
font like Courier, Times New Roman, or Helvetica and all the alphabetic
languages will display correctly (including Greek, Russian, Arabic,
Hebrew, Devanagari, etc.). But the ideographic languages require
special fonts. For example, if you
set the Netscape Unicode font to MS Song (the Microsoft font used for
Chinese), then you will see a Chinese message in Chinese ideographs, but
the other non-8-bit languages such as Greek, Russian, Arabic, Hebrew, and
Devanagari will no longer display correctly.
Fortunately, you can generally rely on the fact that
almost any non-Western language recipient already receives lots of
email from all over the world and therefore has already configured
their email reader to display UTF-8 encoded mail in a font appropriate
for their native language.
If you need more help, here are some useful links about how to set
up a browser/mail reader for Chinese and other Asian languages.
http://www2.meu.unimelb.edu.au/Webmentor/courses/nalsas/info/Demo/DemoMain.htm
http://chinese.yahoo.com/docs/info/download.html
http://www.hknet.com/HKNet/chinfaq.html
http://www.people.virginia.edu/~mk3u/mk_lab/Chinese_opening.htm
Additional character sets
C:\Program Files\CustomMailer 4.0\JRE\1.1\lib
C:\Program Files\CustomMailer 4.0\CustomMailerApp\CustomMailerPreferences.txt
(or wherever you installed CustomMailer 4.0 if not in this default
location). Toward the end of this file, locate the line:
mailSendingCharset=
To send mail encoded in, for example, Big5, change this to:
mailSendingCharset=Big5
C:\Program Files\CustomMailer 4.0\CustomMailerApp\CustomMailerPreferences.txt
Toward the end of this file, locate the line:
messageTemplateCharset=
To read message template files encoded in, for example, Big5, change
this to:
messageTemplateCharset=Big5
C:\Program Files\CustomMailer 4.0\CustomMailerApp\CustomMailerPreferences.txt
Toward the end of this file, locate the line:
mailingListCharset=
To read mailing list files encoded in, for example, Big5, change
this to:
mailingListCharset=Big5
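Before typing a character set name into one of these preference lines, it can help to confirm that a Java runtime recognizes the name. The following Java sketch does this with the standard Charset.isSupported call. The class and method names are hypothetical, and the result reflects the Java runtime that executes the sketch, which is not necessarily the one bundled with CustomMailer.

```java
import java.nio.charset.Charset;

// Hypothetical sketch: checks whether a charset name is usable in the running JRE.
final class CharsetSettingCheck {
    static boolean isUsable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for null or syntactically illegal names (for example, names with spaces).
            return false;
        }
    }
}
```

Calling isUsable("UTF-8") returns true on every Java runtime, while a name such as Big5 returns true only if the runtime includes the extended character sets.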
C:\Program Files\CustomMailer 4.0\CustomMailerApp\CustomMailerPreferences.txt
Toward the end of this file, locate the lines:
messageHeadersFont=
messageBodyFont=
mailingListFont=
By default, these use the fonts helvetica, courier, and helvetica,
respectively.
These default settings support the entire Windows-1252 character
set. Each of the 8 languages listed above has its own
locale-specific font definitions, for which you should change the three
lines to:
messageHeadersFont=sansserif
messageBodyFont=dialoginput
mailingListFont=sansserif
Other font names you can try include serif, dialog, monospaced,
timesroman, and zapfdingbats. These
alternatives will mostly vary in how they treat Latin characters
interspersed with the characters of your locale.