Loading data in Unicode/UTF-8
A document's encoding tells an application how the characters in the document are represented as bytes in the file. Essentially, the encoding specifies which sequence of bits stands for each character, and therefore how many bits each character occupies. In a standard ASCII document, every character occupies a single 8-bit byte (ASCII itself defines only 128 characters, which need just 7 bits). HTML files are often encoded with 8 bits per character, but with the globalization of the internet, this is not always the case. Many HTML documents use 16-bit code units (as in UTF-16), or variable-length encodings that mix single-byte and multi-byte characters.
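To make the byte counts concrete, here is a minimal sketch in Python (Python is an assumption; the text does not name a language). It encodes a few characters and reports how many bytes each one needs in UTF-8 and in UTF-16. The helper `byte_lengths` is hypothetical and exists only for illustration.

```python
def byte_lengths(text: str) -> tuple[int, int]:
    """Return (UTF-8 length, UTF-16 length) in bytes for `text`."""
    # "utf-16-le" is used so that no 2-byte byte-order mark is prepended.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# "A", e-acute, euro sign, a CJK character, and an emoji (U+1F600).
for ch in ["A", "\u00e9", "\u20ac", "\u4e2d", "\U0001F600"]:
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")

# Plain ASCII covers only code points 0-127, so other characters cannot be encoded.
try:
    "\u00e9".encode("ascii")
except UnicodeEncodeError as exc:
    print("ASCII cannot encode U+00E9:", exc.reason)
```

The first four characters each fit in a single 16-bit UTF-16 unit, while the emoji needs a surrogate pair (4 bytes). UTF-8 grows instead from 1 byte for "A" to 4 bytes for the emoji.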
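A particularly common form of HTML document encoding is UTF-8, in which each character takes from one to four bytes. The 128 ASCII characters keep their familiar single-byte values, so a plain ASCII file is also valid UTF-8. This is the encoding that we will examine.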
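As a starting point for loading such data, the sketch below (again assuming Python and its standard library) writes a small file as UTF-8 and reads it back with the encoding stated explicitly. The function `load_utf8_text`, the file name `cities_utf8.csv`, and the sample rows are all hypothetical.

```python
import tempfile
from pathlib import Path


def load_utf8_text(path: Path) -> str:
    """Read a text file, decoding its bytes explicitly as UTF-8."""
    # Passing encoding= avoids relying on the platform's default encoding,
    # which is not UTF-8 on every system. With the default errors="strict",
    # invalid byte sequences raise UnicodeDecodeError instead of being hidden.
    return path.read_text(encoding="utf-8")


# Hypothetical sample data containing non-ASCII characters.
sample = "name,city\nJos\u00e9,S\u00e3o Paulo\nWang,\u5317\u4eac\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cities_utf8.csv"
    path.write_text(sample, encoding="utf-8")  # store the text as UTF-8 bytes

    raw = path.read_bytes()      # the raw bytes on disk
    text = load_utf8_text(path)  # the decoded characters

print(len(sample), "characters stored in", len(raw), "bytes")
print(text == sample)
```

The file holds more bytes than the text has characters, because the accented letters take two bytes each and the two CJK characters take three. Decoding with the same encoding that was used for writing recovers the original text exactly.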