# Unicode and You!
## An Introduction to **Unicode** in **Python**
Eddie Antonio Santos
Special thanks to Jessica Malik!
# What even *is* Unicode?
Isn't Unicode all those çɦåяäcṱǝrŝ that aren't in ASCII?
No! It's all those characters, including ASCII.
Isn't Unicode a character encoding?
No! It's a character set with several encodings of its own (UTF-8, UTF-16, UTF-32)!
Isn't Unicode that emoji 💩 company?
Sort of! They are a consortium that standardizes text, characters, and emoji!
Isn't Unicode frustrating?
Sometimes! But knowing is half the battle!
What is Unicode?
- Standard for representing text in computers
- Assigns a code point (a number) to every character. Ever.
- Database of properties for each character
# How do I use Unicode in Python?
How to use Unicode in Python 3
```python
"Hello, World!"
```
```python
"Dzień dobry!"
```
```python
"ᑖᓂᓯ"
```
```python
"Hello, 🌎!"
```
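In Python 3, every literal above is a plain `str`, which is a sequence of code points. A small sketch showing that `len()` and indexing count characters, not bytes:

```python
# A Python 3 str is a sequence of Unicode code points.
greeting = "Hello, 🌎!"

# len() counts code points: the globe is a single character.
print(len(greeting))           # 9
print(greeting[7] == "🌎")     # True

# The UTF-8 bytes are longer, because 🌎 needs 4 bytes.
print(len(greeting.encode("UTF-8")))  # 12
```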
How to use Unicode in Python 2*
```python
u"Hello, World!"
```
```python
u"Dzień dobry!"
```
```python
u"ᑖᓂᓯ"
```
```python
u"Hello, 🌎!"
```
*You must declare the source file's encoding at the top of the file (e.g. `# -*- coding: UTF-8 -*-`)
## In Python 2:
`"Dzień dobry!" != u"Dzień dobry!"`
The first is a byte string (`str`); the second is a Unicode string (`unicode`).
Recommendation
Use Python 3
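Python 3 keeps the same two kinds of string but names them plainly: `str` for text and `bytes` for encoded data. A minimal sketch of the Python 2 comparison, redone in Python 3:

```python
text = "Dzień dobry!"          # str: a sequence of code points
data = text.encode("UTF-8")    # bytes: that text encoded as UTF-8

print(type(text).__name__, type(data).__name__)  # str bytes
print(text == data)                              # False: str never equals bytes
print(data.decode("UTF-8") == text)              # True once decoded
print(len(text), len(data))                      # 12 13: ń takes two bytes
```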
What is a character?
letter, digit, punctuation, symbol
space, formatting, control character
How are characters represented in Unicode?
A number called a code point
- A = U+0041
- Ω = U+03A9
- 語 = U+8A9E
- 𐎄 = U+10384
1,114,112 total; 137,374 (12.33%) used
(As of version 11.0)
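The total comes from the code-point range U+0000 to U+10FFFF. A quick check of the arithmetic, taking the 137,374 figure from the slide as given:

```python
import sys

# Code points run from U+0000 to U+10FFFF inclusive.
total = 0x10FFFF + 1
used = 137_374  # characters encoded as of Unicode 11.0 (the slide's figure)

print(f"{total:,}")           # 1,114,112
print(f"{used / total:.2%}")  # 12.33%

# A Python 3 str can hold any code point up to sys.maxunicode.
print(sys.maxunicode == total - 1)  # True
```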
How do I get code points in Python?
ord()
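`ord()` takes a one-character string and returns its code point as an `int`; the built-in `chr()` goes the other way. A short sketch using characters from the slide above, formatted in the usual U+XXXX notation:

```python
# ord(): one-character string -> code point (an int)
print(ord("A"))              # 65
print(f"U+{ord('Ω'):04X}")   # U+03A9
print(f"U+{ord('語'):04X}")  # U+8A9E

# chr(): code point -> one-character string
print(chr(0x41))               # A
print(chr(ord("語")) == "語")  # True
```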
What other properties do characters have?
name: unicodedata.name()
general category: unicodedata.category()
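Both functions live in the standard `unicodedata` module. A sketch over one character from several of the categories listed earlier (letter, digit, punctuation, space, control):

```python
import unicodedata

for char in ["A", "5", "!", " ", "\n"]:
    # name() raises ValueError for characters without a name (such as most
    # control characters) unless a default is passed as the second argument.
    name = unicodedata.name(char, "<no name>")
    # category() returns a two-letter General Category code:
    # Lu = uppercase letter, Nd = decimal digit, Po = other punctuation,
    # Zs = space separator, Cc = control.
    print(f"{name:24} {unicodedata.category(char)}")
```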
Typing Unicode characters in Python
Directly!
By hex code: "\uXXXX" (exactly 4 hex digits)
or "\UXXXXXXXX" (exactly 8 hex digits)
By name: "\N{NAME}"
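All three spellings produce the same `str`. A sketch with Ω (U+03A9) and the 🌎 from earlier, which is U+1F30E and so needs the 8-digit `\U` form:

```python
omega_direct = "Ω"
omega_hex = "\u03A9"                           # exactly 4 hex digits
omega_name = "\N{GREEK CAPITAL LETTER OMEGA}"  # official Unicode name
print(omega_direct == omega_hex == omega_name)  # True

# Code points above U+FFFF need \U with exactly 8 hex digits.
globe = "\U0001F30E"
print(globe == "\N{EARTH GLOBE AMERICAS}")  # True
print(len(globe))                           # 1
```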
Unicode outside of Python
Character Encoding
code points ⬌ bytes
character != byte
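In Python, `str.encode()` goes from code points to bytes and `bytes.decode()` comes back. A sketch showing one character turning into more than one byte:

```python
text = "mañana"
data = text.encode("UTF-8")  # code points -> bytes

print(data)                  # b'ma\xc3\xb1ana'
print(len(text), len(data))  # 6 7: ñ is one character but two bytes

print(data.decode("UTF-8") == text)  # True: bytes -> code points
```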
Explosion of character encodings!
US-ASCII
Latin-1 == ISO 8859-1 ⊆ Windows Code Page 1252
ISO 8859-2
Windows Code Page 1251
Macintosh Western == MacRoman
Shift-JIS (several)
EBCDIC (several)
...
GB 2312
Big5
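The trouble with so many encodings is that the same character becomes different bytes in each, and bytes decoded with the wrong one silently turn into garbage ("mojibake"). A sketch using a few of the encodings above via Python's codec names (`latin-1`, `cp1252`, `mac_roman`):

```python
# é is U+00E9; written as an escape so the source file's encoding can't interfere.
char = "\u00e9"

encoded = {enc: char.encode(enc) for enc in ["latin-1", "cp1252", "mac_roman", "UTF-8"]}
for enc, data in encoded.items():
    print(f"{enc:10} {data!r}")
# latin-1    b'\xe9'
# cp1252     b'\xe9'
# mac_roman  b'\x8e'
# UTF-8      b'\xc3\xa9'

# UTF-8 bytes decoded as Latin-1 raise no error; they just become "Ã©".
garbled = encoded["UTF-8"].decode("latin-1")
print(ascii(garbled))  # '\xc3\xa9'
```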
A letter written by Madame Marie Curie
Recommendation
Use UTF-8 character encoding
UTF-8 supports ALL Unicode characters
Backwards compatible with ASCII
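Both claims are easy to check from Python. A sketch: ASCII text encodes to identical bytes either way, while UTF-8 spends 1 to 4 bytes per code point on everything else:

```python
ascii_text = "Hello, World!"
# Backwards compatible: ASCII text has identical bytes in ASCII and UTF-8.
print(ascii_text.encode("ascii") == ascii_text.encode("UTF-8"))  # True

# Supports ALL characters: A, ñ, 語, 🌎 take 1, 2, 3, and 4 bytes.
sizes = [len(c.encode("UTF-8")) for c in ["A", "\u00f1", "\u8a9e", "\U0001F30E"]]
print(sizes)  # [1, 2, 3, 4]

# ASCII, by contrast, only covers code points below 128.
try:
    "\U0001F30E".encode("ascii")
except UnicodeEncodeError as error:
    print(error.reason)  # ordinal not in range(128)
```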
Recommendation
Always explicitly specify the character encoding
```python
# -*- coding: UTF-8 -*-
```
```python
open("filename", "w", encoding="UTF-8")
```
```python
# sock is a connected socket.socket; sockets have sendall(), not write()
sock.sendall("¿Qué haremos mañana?".encode("UTF-8"))
```
```html
<meta charset="UTF-8">
```
```http
Content-Type: application/json; charset=utf-8
```
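Why be explicit? A mismatched encoding often doesn't raise an error; it quietly corrupts the text. A sketch with a throwaway file (the name `example.txt` is hypothetical) written as UTF-8, then read back correctly and incorrectly:

```python
import os
import tempfile

text = "¿Qué haremos mañana?"

# A temporary directory, so the example leaves no files behind.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "example.txt")  # hypothetical file name

    # Write and read back with the SAME, explicitly named encoding.
    with open(path, "w", encoding="UTF-8") as f:
        f.write(text)
    with open(path, encoding="UTF-8") as f:
        round_trip = f.read()

    # Reading the same bytes as Latin-1 raises no error; it garbles the text.
    with open(path, encoding="latin-1") as f:
        garbled_text = f.read()

print(round_trip == text)    # True
print(garbled_text == text)  # False
```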
Recap
- Unicode characters are code points (numbers)
- Character encodings convert between code points and bytes
- Recommendation: Use Python 3
- Recommendation: Use UTF-8
- Recommendation: Explicitly specify character encoding
ASK ME QUESTIONS
About Unicode, Python, and the intersection thereof!
# Extra links!
[Unicode! And how ES6 can help!](http://www.eddieantonio.ca/unicode-es6/)