Character Sets
One Byte Character Sets
The GSM character set, the most widespread in messaging technology, is defined in [GSM 03.38]. It is sometimes loosely referred to as IA5, although its code assignments differ from the ITU-T IA5 alphabet. It is always supported by both the infrastructure (SMSCs) and mobile handsets.
The default set consists of 128 code positions holding Latin, accented, and Greek characters and symbols (one position is the escape code), plus a few additional characters in the extended set. Each character is a 7-bit value, and the values are packed one after another, so 160 characters fit into 140 bytes of data (160 × 7 bits = 1120 bits = 140 × 8 bits).
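As a rough illustration of the packing arithmetic, the following Python sketch packs 7-bit values into bytes, filling each byte from its least significant bit upward. The function name pack_septets is hypothetical and is not part of the library described here; the input is assumed to be a list of GSM code values already looked up in the table below.

```python
def pack_septets(septets):
    """Pack 7-bit values into bytes, least significant bits first."""
    out = bytearray()
    acc = 0    # bit accumulator holding bits not yet written out
    nbits = 0  # number of valid bits currently in acc
    for s in septets:
        if not 0 <= s <= 0x7F:
            raise ValueError("value does not fit in 7 bits")
        acc |= s << nbits
        nbits += 7
        # Emit every complete byte that has accumulated so far.
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        # Flush the remaining bits; the unused high bits stay zero.
        out.append(acc & 0xFF)
    return bytes(out)


# GSM codes for "hello" (68 65 6c 6c 6f) pack into five bytes.
print(pack_septets([0x68, 0x65, 0x6C, 0x6C, 0x6F]).hex())  # e8329bfd06

# 160 characters occupy exactly 140 bytes.
print(len(pack_septets([0x41] * 160)))  # 140
```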
GSM Character Set
Character | Name | GSM (hex) | UCS-2 (hex) | CIMD2 (ASCII) |
---|---|---|---|---|
@ | Commercial At | 00 | 0040 | _Oa |
£ | Pound Sign | 01 | 00a3 | _L- |
$ | Dollar Sign | 02 | 0024 | |
¥ | Yen Sign | 03 | 00a5 | _Y- |
è | Latin Small Letter E With Grave | 04 | 00e8 | _e` |
é | Latin Small Letter E With Acute | 05 | 00e9 | _e' |
ù | Latin Small Letter U With Grave | 06 | 00f9 | _u` |
ì | Latin Small Letter I With Grave | 07 | 00ec | _i` |
ò | Latin Small Letter O With Grave | 08 | 00f2 | _o` |
Ç | Latin Capital Letter C With Cedilla | 09 | 00c7 | _C, |
(lf) | Line Feed | 0a | 000a | |
Ø | Latin Capital Letter O With Stroke | 0b | 00d8 | _O/ |
ø | Latin Small Letter O With Stroke | 0c | 00f8 | _o/ |
(cr) | Carriage Return | 0d | 000d | |
Å | Latin Capital Letter A With Ring Above | 0e | 00c5 | _A* |
å | Latin Small Letter A With Ring Above | 0f | 00e5 | _a* |
Δ | Greek Capital Letter Delta | 10 | 0394 | _gd |
_ | Low Line (Underscore) | 11 | 005f | _-- |
Φ | Greek Capital Letter Phi | 12 | 03a6 | _gf |
Γ | Greek Capital Letter Gamma | 13 | 0393 | _gg |
Λ | Greek Capital Letter Lambda | 14 | 039b | _gl |
Ω | Greek Capital Letter Omega | 15 | 03a9 | _go |
Π | Greek Capital Letter Pi | 16 | 03a0 | _gp |
Ψ | Greek Capital Letter Psi | 17 | 03a8 | _gi |
Σ | Greek Capital Letter Sigma | 18 | 03a3 | _gs |
Θ | Greek Capital Letter Theta | 19 | 0398 | _gt |
Ξ | Greek Capital Letter Xi | 1a | 039e | _gx |
(esc) | Escape | 1b | 001b | _XX |
Æ | Latin Capital Letter Ae | 1c | 00c6 | _AE |
æ | Latin Small Letter Ae | 1d | 00e6 | _ae |
ß | Latin Small Letter Sharp S | 1e | 00df | _ss |
É | Latin Capital Letter E With Acute | 1f | 00c9 | _E' |
(sp) | Space | 20 | 0020 | |
! | Exclamation Mark | 21 | 0021 | |
" | Quotation Mark | 22 | 0022 | _qq |
# | Number Sign | 23 | 0023 | |
¤ | Currency Sign | 24 | 00a4 | _ox |
% | Percent Sign | 25 | 0025 | |
& | Ampersand | 26 | 0026 | |
' | Apostrophe | 27 | 0027 | |
( | Left Parenthesis | 28 | 0028 | |
) | Right Parenthesis | 29 | 0029 | |
* | Asterisk | 2a | 002a | |
+ | Plus Sign | 2b | 002b | |
, | Comma | 2c | 002c | |
- | Hyphen-Minus | 2d | 002d | |
. | Full Stop | 2e | 002e | |
/ | Solidus | 2f | 002f | |
0 | Digit 0 | 30 | 0030 | |
1 | Digit 1 | 31 | 0031 | |
2 | Digit 2 | 32 | 0032 | |
3 | Digit 3 | 33 | 0033 | |
4 | Digit 4 | 34 | 0034 | |
5 | Digit 5 | 35 | 0035 | |
6 | Digit 6 | 36 | 0036 | |
7 | Digit 7 | 37 | 0037 | |
8 | Digit 8 | 38 | 0038 | |
9 | Digit 9 | 39 | 0039 | |
: | Colon | 3a | 003a | |
; | Semicolon | 3b | 003b | |
< | Less-Than Sign | 3c | 003c | |
= | Equals Sign | 3d | 003d | |
> | Greater-Than Sign | 3e | 003e | |
? | Question Mark | 3f | 003f | |
¡ | Inverted Exclamation Mark | 40 | 00a1 | _!! |
A | Latin Capital Letter A | 41 | 0041 | |
B | Latin Capital Letter B | 42 | 0042 | |
C | Latin Capital Letter C | 43 | 0043 | |
D | Latin Capital Letter D | 44 | 0044 | |
E | Latin Capital Letter E | 45 | 0045 | |
F | Latin Capital Letter F | 46 | 0046 | |
G | Latin Capital Letter G | 47 | 0047 | |
H | Latin Capital Letter H | 48 | 0048 | |
I | Latin Capital Letter I | 49 | 0049 | |
J | Latin Capital Letter J | 4a | 004a | |
K | Latin Capital Letter K | 4b | 004b | |
L | Latin Capital Letter L | 4c | 004c | |
M | Latin Capital Letter M | 4d | 004d | |
N | Latin Capital Letter N | 4e | 004e | |
O | Latin Capital Letter O | 4f | 004f | |
P | Latin Capital Letter P | 50 | 0050 | |
Q | Latin Capital Letter Q | 51 | 0051 | |
R | Latin Capital Letter R | 52 | 0052 | |
S | Latin Capital Letter S | 53 | 0053 | |
T | Latin Capital Letter T | 54 | 0054 | |
U | Latin Capital Letter U | 55 | 0055 | |
V | Latin Capital Letter V | 56 | 0056 | |
W | Latin Capital Letter W | 57 | 0057 | |
X | Latin Capital Letter X | 58 | 0058 | |
Y | Latin Capital Letter Y | 59 | 0059 | |
Z | Latin Capital Letter Z | 5a | 005a | |
Ä | Latin Capital Letter A With Diaeresis | 5b | 00c4 | _A" |
Ö | Latin Capital Letter O With Diaeresis | 5c | 00d6 | _O" |
Ñ | Latin Capital Letter N With Tilde | 5d | 00d1 | _N~ |
Ü | Latin Capital Letter U With Diaeresis | 5e | 00dc | _U" |
§ | Section Sign | 5f | 00a7 | _so |
¿ | Inverted Question Mark | 60 | 00bf | _?? |
a | Latin Small Letter A | 61 | 0061 | |
b | Latin Small Letter B | 62 | 0062 | |
c | Latin Small Letter C | 63 | 0063 | |
d | Latin Small Letter D | 64 | 0064 | |
e | Latin Small Letter E | 65 | 0065 | |
f | Latin Small Letter F | 66 | 0066 | |
g | Latin Small Letter G | 67 | 0067 | |
h | Latin Small Letter H | 68 | 0068 | |
i | Latin Small Letter I | 69 | 0069 | |
j | Latin Small Letter J | 6a | 006a | |
k | Latin Small Letter K | 6b | 006b | |
l | Latin Small Letter L | 6c | 006c | |
m | Latin Small Letter M | 6d | 006d | |
n | Latin Small Letter N | 6e | 006e | |
o | Latin Small Letter O | 6f | 006f | |
p | Latin Small Letter P | 70 | 0070 | |
q | Latin Small Letter Q | 71 | 0071 | |
r | Latin Small Letter R | 72 | 0072 | |
s | Latin Small Letter S | 73 | 0073 | |
t | Latin Small Letter T | 74 | 0074 | |
u | Latin Small Letter U | 75 | 0075 | |
v | Latin Small Letter V | 76 | 0076 | |
w | Latin Small Letter W | 77 | 0077 | |
x | Latin Small Letter X | 78 | 0078 | |
y | Latin Small Letter Y | 79 | 0079 | |
z | Latin Small Letter Z | 7a | 007a | |
ä | Latin Small Letter A With Diaeresis | 7b | 00e4 | _a" |
ö | Latin Small Letter O With Diaeresis | 7c | 00f6 | _o" |
ñ | Latin Small Letter N With Tilde | 7d | 00f1 | _n~ |
ü | Latin Small Letter U With Diaeresis | 7e | 00fc | _u" |
à | Latin Small Letter A With Grave | 7f | 00e0 | _a` |
Extended GSM Character Set
Character | Name | GSM (hex) | UCS-2 (hex) | CIMD2 (ASCII) |
---|---|---|---|---|
^ | Circumflex Accent | 1b14 | 005e | _XX_gl |
{ | Left Curly Bracket | 1b28 | 007b | _XX( |
\| | Vertical Bar | 1b40 | 007c | _XX_!! |
} | Right Curly Bracket | 1b29 | 007d | _XX) |
\ | Reverse Solidus | 1b2f | 005c | _XX/ |
[ | Left Square Bracket | 1b3c | 005b | _XX< |
~ | Tilde | 1b3d | 007e | _XX= |
] | Right Square Bracket | 1b3e | 005d | _XX> |
€ | Euro Sign | 1b65 | 20ac | _XXe |
Encoding of Extended Characters
Extended characters (also referred to as "escape characters") are encoded by prefixing them with the escape character 1b (hex). For example, Circumflex Accent ("^") is encoded as 1b14 (hex), Vertical Bar ("|") as 1b40 (hex), and so on.
Each extended character in a string therefore consumes two characters of the message space, which reduces the number of characters that fit into a single message.
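The effect on message length can be sketched in Python as follows. The names GSM_EXTENDED, encode_extended, septet_count, and fits_in_single_message are hypothetical and do not belong to the library described here. The sketch also assumes that every character not listed in the extended table is present in the default set and therefore costs one position.

```python
GSM_ESCAPE = 0x1B
SINGLE_MESSAGE_LIMIT = 160  # 7-bit character positions in one message

# Second code value of each extended character, taken from the table above.
GSM_EXTENDED = {
    "^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F, "[": 0x3C,
    "~": 0x3D, "]": 0x3E, "|": 0x40, "€": 0x65,
}


def encode_extended(ch):
    """Return the two code values (escape + code) for an extended character."""
    return bytes([GSM_ESCAPE, GSM_EXTENDED[ch]])


def septet_count(text):
    """Count message positions: 2 per extended character, 1 otherwise.

    Assumption: all other characters belong to the default GSM set.
    """
    return sum(2 if ch in GSM_EXTENDED else 1 for ch in text)


def fits_in_single_message(text):
    return septet_count(text) <= SINGLE_MESSAGE_LIMIT


print(encode_extended("^").hex())        # 1b14
print(septet_count("a{b}"))              # 6
print(fits_in_single_message("{" * 81))  # False
```

A message made only of extended characters therefore holds at most 80 of them.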
Unicode Encoding
All SMSCs and all modern handsets should support two-byte Unicode (big-endian UCS-2) encoding. A Unicode message holds at most 70 characters (140 bytes at two bytes per character), so longer messages have to be concatenated. Although Unicode passes through the mobile network unchanged, support for particular characters may be limited in some markets (e.g. handsets sold in Europe may not support Far East characters).
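A minimal Python sketch of the byte layout and the 70-character limit follows. The function names are hypothetical and are not the library's API. The sketch relies on the fact that, for characters in the Basic Multilingual Plane, Python's "utf-16-be" codec produces the same bytes as big-endian UCS-2; characters outside that plane are rejected because they do not fit into a single two-byte unit.

```python
MAX_UCS2_BYTES = 140  # payload of one message, i.e. 70 two-byte characters


def encode_ucs2(text):
    """Encode text as big-endian UCS-2 (two bytes per character)."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"{ch!r} cannot be represented in UCS-2")
    return text.encode("utf-16-be")


def fits_in_single_unicode_message(text):
    return len(encode_ucs2(text)) <= MAX_UCS2_BYTES


print(encode_ucs2("€").hex())                      # 20ac
print(len(encode_ucs2("a" * 70)))                  # 140
print(fits_in_single_unicode_message("a" * 71))    # False
```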
Library Encoding
Our library accepts Wide Char (Unicode) strings on input and applies all the necessary conversions to produce the proper encoding. Similarly, text received from the mobile network in any of these character sets is converted into Wide Char (Unicode) strings on output.