What is Unicode, ASCII, and ANSI?

Post by Tech Support » Thu Mar 01, 2007 10:14 am

The Basics
Letters are represented in a computer by numeric codes. Pretty much everybody agrees that, when the computer sees a code of 100 (decimal), it represents a lowercase "d". We don't all agree on what 250 represents, and therein lies the rub.
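As a rough illustration of that disagreement, the sketch below decodes the byte values 100 and 250 under two Windows code pages. The choice of cp1252 (Western European) and cp1251 (Cyrillic) is only an assumption made for the example, and `decode_byte` is a hypothetical helper name.

```python
import unicodedata

# Two Windows code pages picked for illustration only.
CODE_PAGES = ["cp1252", "cp1251"]


def decode_byte(value: int, encoding: str) -> str:
    """Interpret a single byte value under the given code page."""
    return bytes([value]).decode(encoding)


for encoding in CODE_PAGES:
    low = decode_byte(100, encoding)   # below 128: same in both code pages
    high = decode_byte(250, encoding)  # above 127: depends on the code page
    print(f"{encoding}: 100 -> {unicodedata.name(low)}")
    print(f"{encoding}: 250 -> {unicodedata.name(high)}")
```

Both code pages report LATIN SMALL LETTER D for 100, but 250 comes out as a Latin letter under cp1252 and as a Cyrillic letter under cp1251.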

ASCII vs ANSI
We commonly refer to character encoding as a letter's "ASCII value," when we really mean "ANSI value." A lot of the time that's sufficient, but in fact the ASCII standard is pretty much obsolete.

ASCII (American Standard Code for Information Interchange) is a 7-bit standard that has been around since the early 1960s (it was first published in 1963, and its current incarnation dates from 1968). It defines 128 different characters, which is more than enough for English: upper- and lowercase letters, punctuation, numerals, control codes (remember control-c?), and nonprinting codes such as tab, return, and backspace.

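ASCII and ANSI are pretty good as long as you write in a Western European language. "ANSI" here means an 8-bit Windows code page (such as Windows-1252) that extends ASCII with another 128 characters. Both mappings are extremely limited: ASCII can only code (i.e. assign a number to) 128 characters and an ANSI code page only 256, so there is no space to include glyphs from other languages. Different ANSI code pages also assign different characters to the values 128 to 255, which is why a code such as 250 is ambiguous.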
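A small sketch of that limitation: the hypothetical helper `can_encode` below checks whether a character has a slot in a given single-byte mapping. The code pages cp1252 and cp1251 are again only example choices.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text has a code in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Characters are written as escapes so the script itself stays plain ASCII.
SAMPLES = {
    "d": "d",
    "u with acute": "\u00fa",
    "Cyrillic Ya": "\u042f",
    "won sign": "\u20a9",
}

for label, char in SAMPLES.items():
    support = {enc: can_encode(char, enc) for enc in ("ascii", "cp1252", "cp1251")}
    print(label, support)
```

Only "d" fits everywhere. The accented letter needs cp1252, the Cyrillic letter needs cp1251, and the won sign fits in none of the three.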

Unicode
Unicode fixes the limitations of ASCII and ANSI by providing enough space for over a million different symbols. As in the above two systems, each character is given a number, so that the Russian letter Я is 042F and the Korean won symbol ₩ is 20A9. (Note that Unicode numbers are written in hexadecimal, meaning that one counts by 16s rather than 10s. This is not a problem, as users rarely need to know the mapping numbers anyway.) So, although not yet totally comprehensive, Unicode covers most of the world's writing systems. Most importantly, the mapping is consistent, so that any user anywhere on any computer has the same encoding as everyone else, no matter what font is being used.
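The two code points mentioned above can be checked with Python's standard library. This is only a quick sketch, and `describe` is a hypothetical helper name.

```python
import unicodedata

# Unicode code points run from U+0000 to U+10FFFF.
TOTAL_CODE_POINTS = 0x10FFFF + 1


def describe(char: str) -> str:
    """Format a character as 'U+XXXX OFFICIAL NAME'."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


print(describe(chr(0x042F)))  # the Russian letter from the text
print(describe(chr(0x20A9)))  # the Korean won symbol from the text
print(f"{TOTAL_CODE_POINTS:,} possible code points")
```

The numbers are the same on every machine and for every font, which is the consistency the paragraph above describes.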

So Unicode is a map, a chart of (what will one day be) all of the characters, letters, symbols, punctuation marks, etc. necessary for writing all of the world’s languages past and present.

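What is the difference between UTF-8 and UTF-16?
UTF-8 stores each Unicode character in a variable number of bytes. Depending on the code range, a character takes from 1 to 4 bytes (the original design allowed up to 6 bytes, but the current standard limits it to 4). Because its basic unit is 8 bits (1 byte), it is called "UTF-8". Characters in the ASCII range take exactly one byte, so plain ASCII text is already valid UTF-8. UTF-8 is well suited to the Internet, networks, and applications that have to work over slow connections.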

UTF-16 is the Unicode (or UCS) Transformation Format, 16-bit encoding form. It serializes a Unicode scalar value (code point) as one or two 16-bit units (2 or 4 bytes), in either big-endian or little-endian format. Code points up to U+FFFF take a single unit; higher code points take two units, known as a surrogate pair. Because it is grouped in 16-bit (2-byte) units, it is called "UTF-16". It is widely used for strings in memory, for example by Windows.
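To make the size difference concrete, the following sketch counts the bytes each format needs for a few sample characters. The samples and the helper name `byte_lengths` are assumptions made for this example.

```python
def byte_lengths(char: str) -> tuple[int, int]:
    """Return (UTF-8 length, UTF-16 length) in bytes, without a byte order mark."""
    return len(char.encode("utf-8")), len(char.encode("utf-16-le"))


# "d", Cyrillic Ya, the won sign, and an emoji outside the 16-bit range.
SAMPLES = ["d", "\u042f", "\u20a9", "\U0001f600"]

for char in SAMPLES:
    utf8_len, utf16_len = byte_lengths(char)
    print(f"U+{ord(char):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")

# The same 16-bit unit serialized in both byte orders.
print("big-endian:   ", "\u042f".encode("utf-16-be").hex())
print("little-endian:", "\u042f".encode("utf-16-le").hex())
```

ASCII text is half the size in UTF-8, while characters such as the won sign are smaller in UTF-16, so neither format is always the more compact one.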


iMacros 7, iMacros for IE/Firefox/Chrome and the Scripting Interface support all formats (ASCII, ANSI, UTF-8, UTF-16).

Re: What is Unicode, ASCII, and ANSI?

Post by Tech Support » Wed Jan 19, 2011 10:38 am

Now that all iMacros versions use Unicode, there can be a problem - in very rare cases - if you use normal ASCII files with "strange" characters such as:
¬

This is not a problem with iMacros itself, but a known issue called "Character Set Confusion".

You see, as long as a file uses only the "lower" ASCII set (codes up to 127) there is no problem. But if the file contains a character from the "higher" set (codes 128 to 255), iMacros cannot be sure which code page was used to encode that character.

iMacros 7 assumes that the file is encoded in UTF-8, which overlaps with ASCII only for the codes up to 127.
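The confusion can be reproduced outside iMacros. The sketch below assumes the file was saved in the Windows-1252 code page and shows what happens when the "¬" byte is read as UTF-8, and the reverse mistake. It illustrates the general problem only and says nothing about how iMacros reads files internally.

```python
NOT_SIGN = "\u00ac"  # the "strange" character from the example above


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Saved in an ANSI code page, the character is a single byte above 127.
ansi_bytes = NOT_SIGN.encode("cp1252")
print(ansi_bytes.hex(), "valid UTF-8?", decodes_as_utf8(ansi_bytes))

# The reverse mistake: UTF-8 bytes read as cp1252 turn into two characters.
utf8_bytes = NOT_SIGN.encode("utf-8")
misread = utf8_bytes.decode("cp1252")
print(utf8_bytes.hex(), "read as cp1252 gives", len(misread), "characters")
```

A lone byte above 127 is not valid UTF-8, so a reader that expects UTF-8 has to either reject it or guess.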

Solution: Convert your ASCII text macro file to UTF-8 (most text editors, such as Notepad++, have a function for this, and even Windows Notepad lets you select the encoding under Save As).
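For many files, the conversion can also be scripted. This is a minimal sketch that assumes the original file was saved in Windows-1252; the function name `convert_to_utf8` and the file names are hypothetical, and the source encoding should be changed to match however the file was really saved.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(source: Path, target: Path, source_encoding: str = "cp1252") -> None:
    """Re-encode a text file as UTF-8, leaving line endings untouched."""
    # Work on raw bytes so that \r\n line endings are not translated.
    text = source.read_bytes().decode(source_encoding)
    target.write_bytes(text.encode("utf-8"))


# Demonstration on a throwaway file (hypothetical names and content).
with tempfile.TemporaryDirectory() as folder:
    ansi_file = Path(folder) / "macro_ansi.iim"
    utf8_file = Path(folder) / "macro_utf8.iim"
    ansi_file.write_bytes(b"sample text \xac\r\n")  # 0xAC is the not sign in cp1252
    convert_to_utf8(ansi_file, utf8_file)
    converted = utf8_file.read_bytes()

print(converted.hex())
```

The script writes to a separate target file so that the original stays available in case the assumed source encoding turns out to be wrong.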