Characters
Q: A string is "a series of characters"... but what is a character?
A: a character is a number (or character code) that stands for a symbol.
symbol | code | name |
---|---|---|
A | 65 | capital A |
B | 66 | capital B |
Z | 90 | capital Z |
_ | 95 | underscore |
a | 97 | lowercase A |
??? | 10 | newline |
(Some characters stand for unprintable symbols like newline, tab, or bell.)
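To see the symbol-to-code mapping from the table in action, here is a minimal sketch using JavaScript's built-in `charCodeAt` and `String.fromCharCode`. The variable names are made up for illustration.

```javascript
// Look up the character code for a symbol...
const codeOfA = "A".charCodeAt(0);            // 65
const codeOfUnderscore = "_".charCodeAt(0);   // 95

// ...and the symbol for a character code.
const symbolFor97 = String.fromCharCode(97);  // "a"

// Code 10 is newline: it has no visible symbol, but it is still a character.
const isNewline = String.fromCharCode(10) === "\n"; // true

console.log(codeOfA, codeOfUnderscore, symbolFor97, isNewline);
// prints: 65 95 a true
```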
ASCII and ye shall receive-ski
- ASCII: American Standard Code for Information Interchange
- Invented in 1963
(image from Wikimedia Commons)
Unicode
- ASCII only goes from 0 to 127
- Unicode is the same as ASCII for values from 0 to 127
- but Unicode goes a lot higher
- As of Unicode 10.0 (2017), more than 130,000 characters, including symbols for
  - 139 modern and historic scripts
  - accents and other diacritics
  - various mathematical ∞, currency £, and cultural ☮ symbols
  - emoji 😂
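As a rough illustration of how far above 127 Unicode goes, this sketch prints the code points of a few of the symbols mentioned above. It uses the built-in `codePointAt` and `String.fromCodePoint`; the variable names are only for illustration.

```javascript
// codePointAt returns the full Unicode code point, even for values above 65535.
const symbols = ["A", "£", "∞", "☮", "😂"];
const codePoints = symbols.map((symbol) => symbol.codePointAt(0));

console.log(codePoints);
// prints: [ 65, 163, 8734, 9774, 128514 ]

// Going the other way: build a symbol from its code point.
const laughing = String.fromCodePoint(128514);
console.log(laughing === "😂"); // true
```

Only the first value (65) fits in the ASCII range; the rest are Unicode-only.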
Unicode Strings
JavaScript strings are Unicode
- technically, JS uses the UTF-16 encoding in memory
- and the UTF-8 encoding for text files
That means you can use emoji in your JavaScript programs!
Like this:
"😂".repeat(20)
(Sadly, this doesn't work in Windows PowerShell, but it does work in Atom + Node, like this:)
response.setHeader('content-type', 'text/html');
response.write('<meta charset="UTF-8">');
response.write("😂".repeat(20));
response.end();
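One visible consequence of JS using UTF-16 in memory: `length` counts 16-bit code units, not characters, so an emoji counts as 2. A minimal sketch (variable names are illustrative):

```javascript
const laugh = "😂";

// .length counts UTF-16 code units; this emoji needs two (a surrogate pair).
const codeUnits = laugh.length;        // 2

// Spreading a string iterates by code point, so it counts the emoji once.
const characters = [...laugh].length;  // 1

const repeated = laugh.repeat(20);
console.log(codeUnits, characters, repeated.length, [...repeated].length);
// prints: 2 1 40 20
```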
Unicode Encodings
- UTF-32 is a fixed-width encoding for Unicode
  - every character is 32 bits long
- UTF-8 is a variable-width encoding for Unicode
  - all ASCII characters are one byte long (8 bits)
  - other characters are two to four bytes long (16 to 32 bits)
  - used for text files
- UTF-16 is a variable-width encoding for Unicode
  - every character is either 16 or 32 bits long
  - used by JavaScript at runtime
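The size differences between the three encodings can be checked with a short sketch. `encodedSizes` is a hypothetical helper written for this example, not a built-in; it assumes a runtime where `TextEncoder` is available as a global (modern browsers and recent versions of Node).

```javascript
// Hypothetical helper: how many bytes would this text take in each encoding?
function encodedSizes(text) {
  const codePointCount = [...text].length;       // spread iterates by code point
  return {
    utf8: new TextEncoder().encode(text).length, // TextEncoder always emits UTF-8
    utf16: text.length * 2,                      // .length counts 16-bit code units
    utf32: codePointCount * 4,                   // fixed 4 bytes per code point
  };
}

for (const symbol of ["A", "£", "∞", "😂"]) {
  console.log(symbol, encodedSizes(symbol));
}
// A  -> utf8: 1, utf16: 2, utf32: 4
// £  -> utf8: 2, utf16: 2, utf32: 4
// ∞  -> utf8: 3, utf16: 2, utf32: 4
// 😂 -> utf8: 4, utf16: 4, utf32: 4
```

Note that UTF-8 is the smallest for ASCII text, while UTF-32 is the same size for every character.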
[TODO: diagrams]