The Problem
A pound sign (£) in an e-mail arrives as a hash sign (#), or an accented or other special character is received as a different letter or character.
The Solution
The SENDER must set up their e-mail system correctly as described below under Settings for some e-mail systems.
However, this might apply to you as well. The problem is that seeing the correct characters on your screen display doesn't necessarily mean that they will be sent and received correctly and you won't know anything is wrong unless your correspondent takes the trouble to tell you. Problem characters are the pound sign, the euro sign and most accented characters. The euro sign can be a particular problem for which there are alternative solutions (See Western European character sets and Other character sets below).
To send accented or some special characters (including the pound and euro signs) in text-based e-mails you need to set the following options in your e-mail system:
- Set MIME encoding with Quoted Printable;
- Select a character set such as ISO 8859-1 (or ISO 8859-15) for Western European languages. See below under Other character sets for more information.
See below for how to set these for some of the more popular e-mail systems. MIME encoding with Quoted Printable can and should be left permanently set, as it will not need to be changed for any other type of e-mail.
Once set up, you should send a test message to a friend at another site (or to me if you like) to ensure that the pound/euro sign and/or accented characters are being received correctly.
Contents
0. The problem
1. Settings for some e-mail systems
2. Western European character sets
3. MIME encoding
4. Other character sets
5. Character entry
6. Receiving e-mails
7. Source code of e-mails
8. Glossary
1. Settings for some e-mail systems
Different versions of the same e-mail system may use slightly different settings to achieve the same result. The following settings (for PC systems) should be all right for recent versions (as of 2002); if not, please let me know. If someone will send me settings for Macs I will add them to the list.
Don't forget, once set there is no need to change the MIME encoding with Quoted Printable setting again, even if you sometimes use HTML.
Eudora
For settings, click ..
- Tools menu
- Options
- Composing Mail
- then tick May Use Quoted-Printable.
You can also find Attachments in Options and then select Encoding Method to MIME.
Character set cannot be chosen. It is fixed for country of origin only. (Note that this does not in itself prevent sending characters from another character set, only that the recipient will have to assume the appropriate character set, which would probably be their own anyway.)
Netscape and Mozilla
For settings, click ..
- Edit menu
- Preferences
- then double click Mail & Newsgroups
- then click Messages or Composition
- then select Using the 'Quoted Printable' MIME Encoding
For Character Set selection, click ..
- View menu, and
- Character Set/Coding,
- then tick required Character Set.
When sending e-mails check that the correct character set is being used in the same way but using the View menu in the new message panel.
Opera
It does not appear possible to set Quoted Printable
(so I could not even send a pound sign, £).
For Character Set selection, click ..
- File menu
- Preferences
- E-mail
- and Properties.
- Then in E-mail account properties
select Outgoing tab
and select required character set in Font and encoding section.
Does not appear possible to select a character set for incoming messages.
Outlook Express
For settings, click ..
- Tools menu
- Options
- and click the Send tab
- and then under Mail Sending Format select Plain Text
- then click Plain Text Settings
- and select MIME and Quoted Printable
- Click OK and then Apply.
(If you also need to send HTML, set the corresponding HTML settings.)
Under News Sending Format you may want to make the corresponding settings as for Plain Text (and HTML?).
For selection of your default Character Set for sending messages, click ..
- International Settings in the Send tab and
- then for Default Encoding
select required character set.
- For selection of your default Character Set for receiving messages,
click Fonts in the Read tab and
then select the required font settings and click Set as default.
- To select a different character set for reading a particular message, click ..
- View menu
- Encoding
- then select the appropriate character set
or for sending a new message click the Format menu on the New Message window
- then click Encoding and select the required character set.
Pegasus
For settings, click ..
- Tools menu, and Options,
- click Advanced settings and note default MIME character set, probably ISO-8859-1,
- click Messages and Replies and select Use MIME features on,
- click Sending mail and deselect Allow 8-bit MIME message encoding.
Note that although it appears possible to change the name of the default MIME character set above, I have only succeeded in sending accented characters with ISO-8859-1, which was the setting when the system was downloaded. (Note: Jaro.Lajovic@mf.uni-lj.si has been able to use ISO 8859-2 and is willing to give advice on how to do this.) It does not appear possible to change the character set for reading messages.
2. Western European character sets
All characters (including the characters for the digits) are represented in computers by numbers from 0 to 255 (256 in all). This is because each one is stored in 8 binary bits, called a byte, which can hold 2 raised to the power 8 (256) different values.
The current (2002) international e-mail transmission protocol only allows the use of 7-bit numbers, allowing just 128 different characters (0 to 127), and therein lies the main problem we are addressing.
The first 32 of these (0 to 31) are control characters, of which only one, namely the TAB character, is normally used by us in e-mails (apart from the invisible line-break characters). This leaves only 96 characters which can be used for e-mails (10 digits, 26 lower case and 26 upper case letters, and 34 punctuation and special characters, counting SPACE and DELETE). These are the only characters, or rather, values representing characters, which can be used in our e-mails (so obviously we need some way of representing all the other characters we might need, which we will discuss shortly).
These 96 characters, with the values 32 to 127, are shown below.
| ASCII codes: 32-127 |
| 32 SPACE | 48 0 | 64 @ | 80 P | 96 ` | 112 p |
| 33 ! | 49 1 | 65 A | 81 Q | 97 a | 113 q |
| 34 " | 50 2 | 66 B | 82 R | 98 b | 114 r |
| 35 # | 51 3 | 67 C | 83 S | 99 c | 115 s |
| 36 $ | 52 4 | 68 D | 84 T | 100 d | 116 t |
| 37 % | 53 5 | 69 E | 85 U | 101 e | 117 u |
| 38 & | 54 6 | 70 F | 86 V | 102 f | 118 v |
| 39 ' | 55 7 | 71 G | 87 W | 103 g | 119 w |
| 40 ( | 56 8 | 72 H | 88 X | 104 h | 120 x |
| 41 ) | 57 9 | 73 I | 89 Y | 105 i | 121 y |
| 42 * | 58 : | 74 J | 90 Z | 106 j | 122 z |
| 43 + | 59 ; | 75 K | 91 [ | 107 k | 123 { |
| 44 , | 60 < | 76 L | 92 \ | 108 l | 124 \| |
| 45 - | 61 = | 77 M | 93 ] | 109 m | 125 } |
| 46 . | 62 > | 78 N | 94 ^ | 110 n | 126 ~ |
| 47 / | 63 ? | 79 O | 95 _ | 111 o | 127 DELETE |
This is the original basic set of characters in the US ASCII standard, which now forms the lower half of every character set in the ISO 8859 standard.
Note that this set of characters does not include the pound sign, £, or the euro sign, €, see below, and obviously does not include any accented characters used in other countries or any special characters like the symbols for division, ÷, and multiplication, ×, and other useful characters which we would like to be able to use in our writing generally and, by extension, to be able to send in our e-mails.
This has been achieved for Western European countries (and others) by allocating characters to the values from 160 to 255 in a byte. (Values from 128 to 159 were not allocated initially, as it was thought they might clash with the corresponding control characters from 0 to 31 in the basic ASCII character set.) These characters make up what is sometimes referred to as the upper half of the ISO 8859-1 (Latin 1) standard character set, as shown below.
There are now two alternatives to ISO 8859-1 (Latin1), both of which include the euro sign, but in different places: ISO 8859-15 (Latin9) gives it the value 164, and Windows 1252 gives it the value 128. If your keyboard has a euro symbol on it, the code you get may depend on the character set you are using. Use ISO 8859-15 if it is available, otherwise use Windows 1252, but whichever you use you need to check that it actually works with a correspondent.
Note that if you use the euro key with ISO 8859-15 (and possibly with ISO 8859-1) but do not use MIME encoding with Quoted Printable set, the recipient will see a dollar sign, which could be confusing, if not expensive!
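To see where the dollar sign comes from, here is a minimal Python 3 sketch using only the built-in codecs (`iso-8859-1`, `iso-8859-15`, `cp1252`). The helper `strip_high_bit` is hypothetical: it simply imitates an old 7-bit mail link that throws away the top bit (value 128) of every byte.

```python
def strip_high_bit(data: bytes) -> str:
    """Imitate a 7-bit mail link by clearing bit 8 (value 128) of every byte."""
    return bytes(b & 0x7F for b in data).decode("ascii")

euro_latin9 = "€".encode("iso-8859-15")  # one byte, value 164
euro_cp1252 = "€".encode("cp1252")       # one byte, value 128
pound = "£".encode("iso-8859-1")         # one byte, value 163

print(euro_latin9[0], euro_cp1252[0], pound[0])  # 164 128 163

# What the recipient sees if the top bit is lost (no Quoted Printable)
print(strip_high_bit(euro_latin9))        # $   (164 - 128 = 36)
print(strip_high_bit(pound))              # #   (163 - 128 = 35)
print(repr(strip_high_bit(euro_cp1252)))  # '\x00'  (128 - 128 = 0, an invisible control character)

# ISO 8859-1 itself has no euro sign at all
try:
    "€".encode("iso-8859-1")
except UnicodeEncodeError:
    print("ISO 8859-1 cannot encode the euro sign")
```

The same subtraction of 128 explains the pound sign arriving as a hash sign, described at the start of this page.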
| Extended ASCII (ISO 8859-1) codes: 160-255 |
| 160 NO-BREAK SPACE | 176 ° | 192 À | 208 Ð | 224 à | 240 ð |
| 161 ¡ | 177 ± | 193 Á | 209 Ñ | 225 á | 241 ñ |
| 162 ¢ | 178 ² | 194 Â | 210 Ò | 226 â | 242 ò |
| 163 £ | 179 ³ | 195 Ã | 211 Ó | 227 ã | 243 ó |
| 164 ¤ | 180 ´ | 196 Ä | 212 Ô | 228 ä | 244 ô |
| 165 ¥ | 181 µ | 197 Å | 213 Õ | 229 å | 245 õ |
| 166 ¦ | 182 ¶ | 198 Æ | 214 Ö | 230 æ | 246 ö |
| 167 § | 183 · | 199 Ç | 215 × | 231 ç | 247 ÷ |
| 168 ¨ | 184 ¸ | 200 È | 216 Ø | 232 è | 248 ø |
| 169 © | 185 ¹ | 201 É | 217 Ù | 233 é | 249 ù |
| 170 ª | 186 º | 202 Ê | 218 Ú | 234 ê | 250 ú |
| 171 « | 187 » | 203 Ë | 219 Û | 235 ë | 251 û |
| 172 ¬ | 188 ¼ | 204 Ì | 220 Ü | 236 ì | 252 ü |
| 173 SOFT HYPHEN | 189 ½ | 205 Í | 221 Ý | 237 í | 253 ý |
| 174 ® | 190 ¾ | 206 Î | 222 Þ | 238 î | 254 þ |
| 175 ¯ | 191 ¿ | 207 Ï | 223 ß | 239 ï | 255 ÿ |
However, remembering that only values in the range 0 to 127 can be sent reliably through the internet, the problem is: how can we send these characters in our e-mails?
3. MIME encoding
The solution is to use MIME encoding, which was devised by the Internet Engineering Task Force (IETF) to overcome the US ASCII bottleneck. There are two schemes within MIME for transmitting e-mails containing characters with values from 128 to 255. One is called Base64 (not recommended here), which converts every three 8-bit characters into four characters drawn from a 64-character subset of basic ASCII (values 0 - 127). The other is called Quoted Printable, which converts only those characters with a value over 127 (and the equals sign itself, which is sent as =3D) into a triple of characters: an equals sign followed by the two-digit hexadecimal (see glossary below) value of that character. All of these characters are in the basic ASCII character set with values 0 - 127. For example, £50.00 would be coded as =A350.00, with the =A3 representing the pound sign, whose value is hexadecimal A3 = 163 (see above). This would be converted automatically back to a pound sign by the recipient's e-mail system PROVIDED that the SENDER had specified MIME encoding with Quoted Printable set as described above.
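As an illustration of the Quoted Printable rule, here is a simplified encoder written in Python 3 alongside the standard library's `quopri` module. The function `qp_encode_simple` is a hypothetical teaching helper, not a full implementation: it escapes only bytes above 127 and the equals sign, and ignores the real standard's 76-character line limit and its rules for control characters and trailing spaces.

```python
import quopri

def qp_encode_simple(text: str, charset: str = "iso-8859-1") -> str:
    """Simplified Quoted Printable: escape bytes above 127, and '=' itself."""
    out = []
    for byte in text.encode(charset):
        if byte > 127 or byte == ord("="):
            out.append(f"={byte:02X}")  # '=' plus two upper-case hex digits
        else:
            out.append(chr(byte))       # basic ASCII passes through unchanged
    return "".join(out)

print(qp_encode_simple("£50.00"))                          # =A350.00
print(quopri.encodestring("£50.00".encode("iso-8859-1")))  # b'=A350.00'

# The recipient's side: undo the escapes, then read the bytes as ISO 8859-1
print(quopri.decodestring(b"=A350.00").decode("iso-8859-1"))  # £50.00
```

Note that the decoding step only gives a pound sign because the bytes are read as ISO 8859-1; the character set has to be agreed as well as the encoding.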
A disadvantage of Base64 is that the length of the message is increased considerably (by about a third) whether or not there are any accented or special characters in the message, whereas the advantage of Quoted Printable is that only those accented or special characters with values over 127 are converted, each from one character to three characters.
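The size difference can be checked with a short Python 3 sketch using the standard `base64` and `quopri` modules. The function name `encoded_sizes` and the sample sentences are purely illustrative.

```python
import base64
import quopri

def encoded_sizes(text: str, charset: str = "iso-8859-1") -> tuple[int, int, int]:
    """Return (original, Quoted Printable, Base64) lengths in bytes."""
    raw = text.encode(charset)
    return len(raw), len(quopri.encodestring(raw)), len(base64.b64encode(raw))

print(encoded_sizes("The price is £50.00"))  # (19, 21, 28)
print(encoded_sizes("Plain ASCII text"))     # (16, 16, 24)
```

Quoted Printable adds two bytes for the single pound sign and nothing at all for plain ASCII, while Base64 grows both messages by about a third.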
The majority of current e-mail systems have the facility to set their system to send e-mails using MIME encoding with Quoted Printable set. The point to remember about this is that once set there is no need to change the setting from one message to the next. If there are no accented or special characters in a message, MIME encoding with Quoted Printable will not be invoked. Indeed, there is an argument that this setting should be the default setting for all text based e-mail systems.
How to set some e-mail systems for MIME encoding with Quoted Printable set is described under Settings for some e-mail systems above.
4. Other character sets
ISO 8859-15 (Latin9) is an amended version of ISO 8859-1 (Latin1) for Western Europe, in which the euro sign has been given the value 164, together with some hitherto missing special characters (for French, among others), at the expense of 8 previously less used characters.
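If you have Python 3 available, you can list exactly which values differ between Latin1 and Latin9 by decoding every upper-half byte with both of Python's built-in codecs. This is only a sketch; `latin1_vs_latin9` is an illustrative name.

```python
def latin1_vs_latin9() -> list[tuple[int, str, str]]:
    """List upper-half values where ISO 8859-1 and ISO 8859-15 disagree."""
    diffs = []
    for value in range(160, 256):
        byte = bytes([value])
        old, new = byte.decode("iso-8859-1"), byte.decode("iso-8859-15")
        if old != new:
            diffs.append((value, old, new))
    return diffs

for value, old, new in latin1_vs_latin9():
    print(value, old, "->", new)
```

With the standard codecs this reports eight positions, starting with 164 (¤ replaced by €); all the other upper-half values are the same in both sets.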
Windows 1252 is a Microsoft version which is identical to ISO 8859-1 except that it also uses 27 of the values 128-159 (which ISO 8859-1 leaves unallocated) for additional special characters, of which 128 is the euro sign. If using a euro key on your keyboard with Windows 1252 you should get the correct code. If you don't have a euro key then see Character entry below.
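The extra Windows 1252 characters can be listed with a small Python 3 sketch, assuming Python's built-in `cp1252` codec (which refuses to decode the few values Microsoft left undefined). The function name `cp1252_extras` is illustrative.

```python
def cp1252_extras() -> dict[int, str]:
    """Map each value 128-159 that Windows 1252 defines to its character."""
    extras = {}
    for value in range(128, 160):
        try:
            extras[value] = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            pass  # a few values in this range are left undefined
    return extras

extras = cp1252_extras()
print(len(extras), extras[128])                     # 27 €
print(sorted(set(range(128, 160)) - set(extras)))  # [129, 141, 143, 144, 157]
```

Because ISO 8859-1 has nothing at these values, a message sent as Windows 1252 but read as ISO 8859-1 may show these characters as blanks or boxes.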
However, there are other character sets within the ISO 8859 range, which your e-mail system may be able to use for other languages, such as Cyrillic (Russian etc.) or Greek, with ISO standard numbers, such as ISO 8859-5, ISO 8859-7 etc. (and also UNICODE (UTF-7 or UTF-8) not described here).
In practice, in the USA and most of Europe the ISO 8859 range is normally used (2002) for e-mails but UNICODE may be used for many Eastern languages whose characters cannot be accommodated within a limit of 256 characters.
The character sets in the standard ISO 8859 range all use the same basic US ASCII characters for the values 0 - 127, but each has a further set of characters peculiar to its languages in the upper half of the set, with the values 160 - 255. You can see a list of these at http://czyborra.com/charsets/iso8859.html
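To see that the upper half really does change meaning from one set to another, here is a minimal Python 3 sketch that decodes the single value 231 with three of Python's built-in ISO 8859 codecs. The helper name `meanings` is illustrative.

```python
def meanings(value: int, charsets: list[str]) -> dict[str, str]:
    """Decode one byte value under several single-byte character sets."""
    return {cs: bytes([value]).decode(cs) for cs in charsets}

for charset, ch in meanings(231, ["iso-8859-1", "iso-8859-5", "iso-8859-7"]).items():
    print(charset, ch)
# iso-8859-1 ç   (Western European: c cedilla)
# iso-8859-5 ч   (Cyrillic: che)
# iso-8859-7 η   (Greek: eta)
```

The byte sent is identical in each case; only the character set named by the sender tells the recipient which letter was meant.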
To send an e-mail containing these characters you must, of course, continue to use MIME encoding with Quoted Printable set, but in addition you should specify the intended meaning of the values being sent by specifying the character set intended as ISO 8859-2, or ISO 8859-3 etc. as required.
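For readers who generate e-mail from a program rather than an e-mail system, Python's standard `email` package shows both settings together. This is a sketch assuming Python 3's `MIMEText`, which (for ISO 8859-1) names the character set in the Content-Type header and chooses Quoted Printable for the body; the message text is made up.

```python
from email.mime.text import MIMEText

# Hypothetical message body containing a pound sign
msg = MIMEText("Total: £50.00", "plain", "iso-8859-1")

print(msg["MIME-Version"])               # 1.0
print(msg.get_content_charset())         # iso-8859-1
print(msg["Content-Transfer-Encoding"])  # quoted-printable
print(msg.get_payload())                 # Total: =A350.00

# The receiving side reverses both steps
print(msg.get_payload(decode=True).decode("iso-8859-1"))  # Total: £50.00
```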
How to set some e-mail systems for different character sets is described under Settings for some e-mail systems above.
5. Character entry
Different countries have different keyboards designed for their languages, and these enable users to enter their language's accented and special characters by using appropriately marked keys. The keyboard, combined with appropriate language tables already set up in the computer system, generates the correct computer values for those characters above the value of 127. For instance, the UK keyboard has keys for the pound sign (163) and the euro (164 in ISO 8859-15, 128 in Windows 1252), and the French keyboard has keys for e acute (201 and 233) and c cedilla (199 and 231), etc.
However, the question arises about what to do if you want to send an accented or special character which is not represented on your keyboard. For instance, a UK user may wish to send an e-mail to a French colleague in French requiring the use of some accented characters not shown on a UK keyboard.
Some text editing systems do have facilities for displaying the extra characters available for selection (and this would be a useful addition to an e-mail system) but we are considering composing messages in a standard way in a variety of e-mail systems. The simplest standard way of entering characters not appearing on your keyboard is to use a standard keyboard facility as follows.
Hold down the Alt key and, using the keypad, enter the digit zero followed by the three digit value of the character required. Nothing happens at first but when you take your finger off the Alt key the required character appears on the screen. For example, ô is Alt/0244. Beware that for this to work with some e-mail systems (e.g. Outlook Express and Eudora) the NumLock light needs to be on.
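If you are unsure of the number to type, a short Python 3 sketch can work it out. It assumes (not stated above) that Windows takes Alt/0nnn codes from its "ANSI" code page, which is Windows 1252 on Western European systems; for values 160-255 that is the same as the ISO 8859-1 table above. The function `alt_code` is hypothetical.

```python
def alt_code(ch: str, codepage: str = "cp1252") -> str:
    """Return the Alt/0nnn keypad sequence for a single character.

    Assumes the Alt/0nnn method uses the Windows 1252 code page.
    """
    if len(ch) != 1:
        raise ValueError("give exactly one character")
    value = ch.encode(codepage)[0]  # single-byte code page: one byte per character
    return f"Alt/0{value:03d}"

for ch in "ôçé£":
    print(ch, alt_code(ch))
# ô Alt/0244
# ç Alt/0231
# é Alt/0233
# £ Alt/0163
```

For the euro sign this gives Alt/0128, but as noted below some e-mail systems may not accept a euro entered this way.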
If your keyboard does not have a keypad, for example, you are using a laptop, there is usually a way of using some of the lettered keys instead to replace the keypad.
The euro sign can still be a problem. If you have a euro key you should use it, as the keyboard will usually enter the correct value for the character set being used. If you use the keyboard method described above to enter the euro sign, the system may rather bizarrely ignore the code you enter and substitute a question mark, ?. In all cases, you need to check that it actually works with a correspondent.
Of course, you need to know the decimal value of the character required. There are a number of websites which list the contents of most, if not all, of the ISO 8859-n range of character sets (such as Czyborra's site mentioned above). The highest value of n is currently 16 (ISO 8859-16 was published in 2001), but new sets may be added from time to time. The best thing to do is to write out the set you use most frequently in much the same format as mine above and save it in a file, which you can then have open while composing your e-mails so that you can refer to it easily. If you send a lot of these e-mails, I'm sure you will begin to remember the values of the most frequently used characters anyway.
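Such a reference card can be produced automatically. The following is a sketch in Python 3, assuming its built-in ISO 8859 codecs; the function name `upper_half_rows` is illustrative. It lays out values 160-255 in the same column order as the table above, and marks values that a particular part of ISO 8859 leaves unused.

```python
def upper_half_rows(charset: str) -> list[str]:
    """Lay out values 160-255 of a single-byte character set in 16 rows of 6."""
    rows = []
    for start in range(160, 176):
        cells = []
        for value in range(start, 256, 16):
            try:
                ch = bytes([value]).decode(charset)
            except UnicodeDecodeError:
                ch = "(unused)"  # some ISO 8859 parts leave gaps
            cells.append(f"{value} {ch}")
        rows.append("| " + " | ".join(cells) + " |")
    return rows

# Print a card for ISO 8859-2 (Central European); redirect it to a file to keep it
print("\n".join(upper_half_rows("iso-8859-2")))
```

Invisible characters such as the no-break space (160) and soft hyphen (173) will appear blank in the output.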
6. Receiving e-mails
Sometimes you will receive e-mails from abroad with strange characters where accented or special characters should be. With the above knowledge you will be able to diagnose what has gone wrong which would enable you to correct the text if necessary and/or advise the sender of what has happened and what to do about it.
For instance, if you receive the word frangais from a French correspondent, it looks as though the letter g (103) should have been a c cedilla (231), the difference between their values in the ISO 8859-1 table being 128. This would indicate that the e-mail had been sent as it was, without MIME encoding and Quoted Printable set (or at least without Quoted Printable set), so that the high-order bit of each 8-bit byte had been lost during transmission through the internet.
If you examine the first, or lower half of an ISO 8859 character set (always the same as US ASCII) and compare it with the second, or upper half of the character set, you can see the correspondence between the characters which differ in value by 128 (for the technophiles, the value of the high order bit which could be lost in transmission through the internet).
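The "add 128" repair can be sketched in a few lines of Python 3. The helpers `add_high_bit` and `repair` are hypothetical, and you still have to decide by eye which letters look wrong; the code only restores the ones you point at.

```python
def add_high_bit(ch: str, charset: str = "iso-8859-1") -> str:
    """Return the character 128 places higher, i.e. what `ch` may have been
    before the high-order bit was stripped in transit."""
    value = ord(ch)
    if not 32 <= value < 128:
        raise ValueError("expected a printable basic ASCII character")
    return bytes([value + 128]).decode(charset)

def repair(word: str, positions: list[int]) -> str:
    """Restore the high bit at the given (0-based) positions of a word."""
    chars = list(word)
    for i in positions:
        chars[i] = add_high_bit(chars[i])
    return "".join(chars)

print(add_high_bit("g"))                 # ç
print(repair("frangais", [4]))           # français
print(add_high_bit("#"))                 # £
print(add_high_bit("$", "iso-8859-15"))  # €
```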
The solution is to persuade your correspondent to set their e-mail system for MIME encoding with Quoted Printable set, and to specify the required ISO 8859-n character set. Strictly, the MIME default when no character set is specified is US ASCII, although many e-mail systems assume ISO 8859-1 (or their own local default), so it is safer to be specific.
On the other hand, it may be that the sender has sent you an e-mail using MIME encoding with Quoted Printable set but with some other character set from the ISO 8859 range, for example, ISO 8859-7 for Greek. On the assumption that your e-mail system was set to expect ISO 8859-1 by default, you might see European accented characters dotted about what hopefully you would recognise as Greek words. In that case you would realise that you were at least seeing the upper half of an ISO 8859 character set but needed to set your own e-mail system to use one of the other character sets, in this case ISO 8859-7 (Greek).
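This kind of mix-up can be reproduced in Python 3 with the built-in codecs: encode Greek text as ISO 8859-7, then read the bytes back as ISO 8859-1. The sample word is just an illustration.

```python
greek = "ΚΑΛΗΜΕΡΑ"  # "good morning", in capitals

# Sent as ISO 8859-7, read as ISO 8859-1: accented Latin capitals appear
garbled = greek.encode("iso-8859-7").decode("iso-8859-1")
print(garbled)   # ÊÁËÇÌÅÑÁ

# Choosing the right character set for the same bytes restores the text
repaired = garbled.encode("iso-8859-1").decode("iso-8859-7")
print(repaired)  # ΚΑΛΗΜΕΡΑ
```

No information is lost in this case: the bytes arrived intact, and only the recipient's choice of character set was wrong.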
To change your character set to match the sender's, you would need to click something like View and Character Set to see a list of available character sets, and then select the one required; you should then be able to see the text properly. However, the number of character sets available in any particular e-mail system is usually limited, so you may be unlucky. In that case you might want to negotiate with the sender to use UTF-7 or UTF-8 (UNICODE, not described here) instead, or change to another e-mail system which can process their character set!
If the sender had used UNICODE, then you might see the text spread out so that every other character made sense but the intervening characters were squares or some other meaningless symbol, or (with UTF-8) each accented character replaced by a pair of odd characters, such as Ã© for é. In that case changing your own character set to UTF-8 or UTF-7 (UNICODE) might cure the problem.
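Both Unicode symptoms can be reproduced with a small Python 3 sketch using the built-in codecs. It shows UTF-8 text read as ISO 8859-1 (two odd characters per accented letter) and 16-bit Unicode (UTF-16, little-endian) read as ISO 8859-1 (a NUL control character after every letter, which many programs draw as a square or leave blank).

```python
text = "café"

# UTF-8 read as ISO 8859-1: each accented letter becomes two characters
utf8_seen = text.encode("utf-8").decode("iso-8859-1")
print(utf8_seen)  # cafÃ©

# UTF-16 read as ISO 8859-1: a NUL after every letter
utf16_seen = text.encode("utf-16-le").decode("iso-8859-1")
print(repr(utf16_seen))  # 'c\x00a\x00f\x00é\x00'

# Re-reading the same bytes with the right character set repairs the text
print(utf8_seen.encode("iso-8859-1").decode("utf-8"))  # café
```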
If you are unable to work out what character set the sender used it is usually possible to examine the source code of e-mails as described below to see what settings were actually used.
Whenever you change the setting of your own character set, don't forget to restore your default afterwards, or you might not be able to read your normal e-mails properly! You might even inadvertently send your own e-mails later with the wrong coding.
The good news is that some e-mail systems will automatically select the correct character set from an incoming message, and provided it has been sent with MIME encoding with Quoted Printable set, it should display correctly without any intervention on your part.
7. Source code of e-mails
The original source code or, more importantly, the transmission information in the header of an e-mail message (not normally displayed with the message) can be examined in most e-mail systems. See below for how to do this for some of the more popular e-mail systems.
The information of interest here is whether MIME encoding was specified, whether Quoted Printable was set, and which character set was specified. All of these usually appear near the beginning of the e-mail, or possibly in front of separate parts of the e-mail.
MIME encoding is usually shown as MIME-Version: 1.0, or similar.
Quoted Printable is shown by Content-Transfer-Encoding: quoted-printable; some mail servers also add a comment such as X-MX-Comment: QUOTED-PRINTABLE message automatically decoded.
The character set is shown by Content-Type: text/plain; charset=iso-8859-1, or similar.
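If you save an e-mail's source to a file, the same three pieces of information can be read with Python 3's standard `email` package. This is a sketch; the raw message below is made up, and it uses the standard Content-Transfer-Encoding field for Quoted Printable.

```python
from email import message_from_string

# A hypothetical raw message, as shown by "view source" in a mail program
raw = """MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: Test

Price: =A350.00
"""

msg = message_from_string(raw)
print(msg["MIME-Version"])               # 1.0
print(msg.get_content_charset())         # iso-8859-1
print(msg["Content-Transfer-Encoding"])  # quoted-printable

# Undo Quoted Printable, then read the bytes in the declared character set
body = msg.get_payload(decode=True).decode(msg.get_content_charset())
print(body)                              # Price: £50.00
```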
Procedures for some e-mail systems for displaying e-mail source code:
- Eudora:
This does not appear to be able to display the source code of an e-mail.
- Netscape and Mozilla:
- Click View menu
- Page Source
- Opera:
- Click View menu
- Source
- Outlook Express:
- Click File menu
- Properties
- Details
- Pegasus:
Click the Raw View tab in the message window.
8. Glossary
| ASCII | American Standard Code for Information Interchange |
| Bit | The smallest part of a byte which can be either 0 or 1. |
| Byte | A computer character consisting of 8 bits which can be used to represent any value between 0 and 255 (256 values in all). |
| Hexadecimal | A representation of a number using base 16 instead of the usual base 10. In this representation the digits are 0-9 and A-F (16 in all), so, for example, Hex.10 = 16, 20 = 32, 0A = 10, A0 = 160, A3 = 163 etc. with a maximum in this case with only two characters, of Hex.FF = 255. |
| ISO | International Organization for Standardization (http://www.iso.ch) |
| MIME | Multipurpose Internet Mail Extensions (http://www.ufaq.org/navcom/mime_tutorial.html) |
If you have a problem with any of the above I would be pleased to hear from you.
David Wigg
wiggjd@bcs.org.uk
MT on the NET Project
Natural Language Translation Specialist Group
The British Computer Society
http://www.bcs-mt.org.uk/