Characters
Q: A string is "a series of characters"... but what is a character?
A: a character is a number (or character code) that stands for a symbol.
| symbol | code | name |
|---|---|---|
| A | 65 | capital A |
| B | 66 | capital B |
| Z | 90 | capital Z |
| _ | 95 | underscore |
| a | 97 | lowercase a |
| ??? | 10 | newline |
(Some characters stand for unprintable symbols like newline or tab or bell.)
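As a quick check of the symbol-to-code mapping, here is a minimal sketch that uses the built-in `charCodeAt` and `String.fromCharCode`. The variable names (`symbols`, `codes`, `letter`) are only illustrative.

```javascript
// Look up the character code for a few symbols, including newline.
const symbols = ["A", "B", "Z", "_", "a", "\n"];
const codes = symbols.map((symbol) => symbol.charCodeAt(0));
console.log(codes); // [ 65, 66, 90, 95, 97, 10 ]

// Go the other way: turn a code back into its symbol.
const letter = String.fromCharCode(66);
console.log(letter); // B
```

Note that the newline character has a code (10) even though it has no visible symbol of its own.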
ASCII and ye shall receive-ski
- ASCII: American Standard Code for Information Interchange
 - Invented in 1963
 
(image from Wikimedia Commons)
Unicode
- ASCII only goes from 0 to 127
  - Unicode is the same as ASCII for values from 0 to 127
- but Unicode goes a lot higher
  - Currently more than 130,000 characters, including symbols for
    - 139 modern and historic scripts
    - accents and other diacritics
    - various mathematical ∞, currency £, and cultural ☮ symbols
    - emoji 😂
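To see how far past 127 Unicode goes, a small sketch using the built-in `codePointAt` and `String.fromCodePoint` can print the code of a few of the symbols mentioned above. The constant names are hypothetical, chosen only for this example.

```javascript
// codePointAt(0) returns the Unicode code of the first character in a string.
const asciiCode = "A".codePointAt(0);     // 65, the same value as in ASCII
const poundCode = "£".codePointAt(0);     // 163
const infinityCode = "∞".codePointAt(0);  // 8734
const peaceCode = "☮".codePointAt(0);     // 9774
const emojiCode = "😂".codePointAt(0);    // 128514

console.log(asciiCode, poundCode, infinityCode, peaceCode, emojiCode);

// And back again: build the emoji from its code.
const emoji = String.fromCodePoint(128514);
console.log(emoji); // 😂
```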
 
 
Unicode Strings
JavaScript strings are Unicode
- technically, JS uses the UTF-16 encoding in memory
- and the UTF-8 encoding for text files
 
That means you can use emoji in your JavaScript programs!
Like this:
"😂".repeat(20)
(Sadly, this doesn't work in Windows PowerShell, but it does work in Atom + Node, like this:)
  // inside an HTTP request handler, where `response` is the server's response object
  response.setHeader('content-type', 'text/html');
  response.write('<meta charset="UTF-8">');
  response.write("😂".repeat(20));
  response.end();
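One visible consequence of UTF-16 in memory is that `length` counts 16-bit code units, not symbols. The sketch below illustrates this with the same emoji; the variable names are made up for the example.

```javascript
const laugh = "😂";

const codeUnits = laugh.length;          // 2: the emoji takes two 16-bit units
const characters = [...laugh].length;    // 1: spreading a string splits it by Unicode character

// The two 16-bit halves (a "surrogate pair") that together encode the emoji.
const firstUnit = laugh.charCodeAt(0);   // 55357 (0xD83D)
const secondUnit = laugh.charCodeAt(1);  // 56834 (0xDE02)

const row = laugh.repeat(20);
console.log(row.length);       // 40
console.log([...row].length);  // 20
```

So if you need to count what a reader would see, spreading the string (or using `for...of`) is usually a safer starting point than `length`, though even that counts code points rather than everything a reader perceives as one symbol.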
Unicode Encodings
- UTF-32 is a fixed-width encoding for Unicode
  - every character is 32 bits long
- UTF-8 is a variable-width encoding for Unicode
  - all ASCII characters are one byte long (8 bits)
  - other characters are up to four bytes long (32 bits)
  - used for text files
- UTF-16 is a variable-width encoding for Unicode
  - every character is either 16 or 32 bits long
  - used by JavaScript at runtime
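The size differences between the three encodings can be sketched in a few lines. `encodedSizes` is a hypothetical helper written for this example, not a built-in; it assumes the standard `TextEncoder` global is available (modern browsers and recent Node versions), and it treats "character" as a single Unicode code point.

```javascript
// Size in bytes of a string under each of the three encodings.
function encodedSizes(text) {
  const codePoints = [...text].length; // spread splits the string by code point
  return {
    utf8: new TextEncoder().encode(text).length, // TextEncoder always produces UTF-8
    utf16: text.length * 2,                      // each UTF-16 code unit is 2 bytes
    utf32: codePoints * 4,                       // fixed 4 bytes per code point
  };
}

console.log(encodedSizes("A"));  // { utf8: 1, utf16: 2, utf32: 4 }
console.log(encodedSizes("£"));  // { utf8: 2, utf16: 2, utf32: 4 }
console.log(encodedSizes("∞"));  // { utf8: 3, utf16: 2, utf32: 4 }
console.log(encodedSizes("😂")); // { utf8: 4, utf16: 4, utf32: 4 }
```

Only UTF-32 stays the same size for every character; UTF-8 is smallest for ASCII text, which is one reason it is a common choice for text files.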
 
 
[TODO: diagrams]