After covering the most basic concepts of character encoding in the previous three parts, I'd now like to discuss how a word you type with an input method ends up correctly shown on the screen.
First, let's look at a picture taken from my "Web I18N" presentation.
As shown above, a character you type goes through five main steps before it is shown correctly on your screen.
- Step 1: Type it via your input method.
The input method links two kinds of mappings. First, it maps the English letters you type to characters in your own language, because a keyboard usually supports only the 26 English letters. Second, it maps each of those characters to its encoded value.
- Step 2: The text editor accepts what you typed in your own language and stores it using the chosen file encoding. A text file may begin with an encoding mark called a byte order mark (BOM). Usually, the first 2 bytes are "0xFF 0xFE" if you saved the text as "Unicode (Little Endian)", "0xFE 0xFF" for "Unicode (Big Endian)", and the first 3 bytes are "0xEF 0xBB 0xBF" for UTF-8.
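- Step 3: The text file is transferred from the server to the client browser, and the server and client exchange language and encoding information. Here, the HTTP protocol plays an important role: the browser can state its language preferences in request headers such as Accept-Language, and the server declares the encoding in the charset parameter of the Content-Type response header. Taking a closer look at the HTTP protocol will help you a lot.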
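- Step 4: After the browser receives the HTML file, it decodes the character codes inside it and maps the resulting characters to glyphs in a font.
Which charset to decode with is indicated by the HTTP response or by an HTML tag that contains the charset information, such as <meta charset="...">; the browser then picks a font that can display the decoded characters.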
- Step 5: The browser invokes the system API to render the HTML with the correct characters in your own language. At this point, the end user sees characters he can read. (Perhaps he actually cannot read them, but the browser believes it is showing the right characters.)
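To make Steps 1 and 2 a bit more concrete, here is a small Python sketch (my own illustration, not part of the original presentation) that takes one Chinese character, shows its Unicode code point, and lists the bytes a text editor would write for it under a few encodings. The character "中", the function name describe and the chosen encodings are arbitrary examples.

```python
# Illustrative sketch: one character, its Unicode code point, and the
# bytes it becomes under several encodings (all chosen as examples).


def describe(ch: str) -> dict:
    """Return the code point and the hex-encoded bytes of a single character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf-8": ch.encode("utf-8").hex(" "),
        "utf-16-le": ch.encode("utf-16-le").hex(" "),  # no BOM added
        "utf-16-be": ch.encode("utf-16-be").hex(" "),  # no BOM added
        "gbk": ch.encode("gbk").hex(" "),
    }


if __name__ == "__main__":
    for key, value in describe("中").items():
        print(f"{key:>10}: {value}")
```

For "中" this gives U+4E2D, then e4 b8 ad in UTF-8, 2d 4e in UTF-16 little endian, 4e 2d in UTF-16 big endian and d6 d0 in GBK. The same character becomes completely different bytes depending on the encoding, which is why the encoding information has to travel with the file.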
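The byte order marks from Step 2 can be checked with a few lines of Python. This is a minimal sketch under my own assumptions: the function names sniff_bom and decode_with_bom are hypothetical, only the three marks listed in Step 2 are recognized (UTF-32 marks are ignored), and files without a BOM are simply decoded with an assumed default of UTF-8.

```python
import codecs
from typing import Optional, Tuple

# The three byte order marks mentioned in Step 2.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
)


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding, BOM length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode data using its BOM if present; the BOM itself is not returned."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)


if __name__ == "__main__":
    # "utf-8-sig" is Python's codec name for UTF-8 with a leading BOM.
    raw = "中文".encode("utf-8-sig")
    print(raw[:3].hex(" "), sniff_bom(raw), decode_with_bom(raw))
```

For example, "中文".encode("utf-8-sig") produces bytes starting with EF BB BF, and decode_with_bom turns them back into "中文" without a stray BOM character at the front.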
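Steps 3 and 4 depend on the browser working out which charset the received bytes are in. The Python sketch below approximates that decision with a simplified priority order: a BOM first, then the charset parameter of the HTTP Content-Type header, then a <meta> declaration in the first 1024 bytes of the page, then a fallback. Real browsers follow the much more detailed encoding sniffing algorithm in the WHATWG HTML standard; the function names, the regular expression and the UTF-8 fallback here are my own assumptions.

```python
import codecs
import re
from email.message import Message
from typing import Optional

# Simplified pattern covering both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_.:\-]+)""",
    re.IGNORECASE,
)

_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def _known(name: Optional[str]) -> Optional[str]:
    """Return name if Python has a codec for it, else None."""
    if not name:
        return None
    try:
        codecs.lookup(name)
    except LookupError:
        return None
    return name


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Read the charset parameter of a Content-Type header value."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent


def detect_charset(body: bytes, content_type: Optional[str],
                   default: str = "utf-8") -> str:
    """Pick a charset: BOM, then HTTP header, then <meta>, then default."""
    # 1. A byte order mark takes precedence.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # 2. The charset parameter sent by the server.
    header = _known(charset_from_header(content_type))
    if header:
        return header
    # 3. A <meta> declaration near the top of the document.
    match = _META_CHARSET.search(body[:1024])
    if match:
        meta = _known(match.group(1).decode("ascii").lower())
        if meta:
            return meta
    # 4. Assumed fallback.
    return default


def decode_page(body: bytes, content_type: Optional[str]) -> str:
    """Decode an HTML body with the detected charset, dropping a leading BOM."""
    text = body.decode(detect_charset(body, content_type), errors="replace")
    return text[1:] if text.startswith("\ufeff") else text
```

In this order, a charset in the HTTP header overrides the one in the <meta> tag, so a server that sends the wrong header can garble a page whose HTML is labelled correctly.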
By now, you should understand the full workflow of a character, from being typed to being shown to the end user.
Next, we will see how to avoid some of the traps that may lie in this flow, and I will give you a complete solution.