Encodings
The HTTP Content-Type header specifies the character encoding of the document being sent. Knowing the encoding tells us how a payload has to be encoded to trick the encoder. If no charset is specified, the HTTP/1.1 default has historically been ISO-8859-1 (Latin-1).
Example encoding: Content-Type: text/html; charset=utf-8
Setting the content type and charset that the server sends to the browser:
-PHP: header('Content-type: text/html; charset=utf-8');
-ASP.NET: <%Response.Charset="utf-8"%>
-JSP: <%@ page contentType="text/html; charset=UTF-8" %>
-HTML: <meta http-equiv="Content-Type" Content="text/html; charset=utf-8">
-HTML5: <meta charset="utf-8">
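To see which charset a response declares, the header value can be parsed directly. The following is a minimal sketch, not a full media-type parser: the function name `charsetFromContentType` is hypothetical, and the ISO-8859-1 fallback simply mirrors the default described above.

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type value.
// Assumption: falls back to ISO-8859-1 when no charset parameter is present.
function charsetFromContentType(headerValue) {
  // Match "; charset=value" with optional whitespace and optional double quotes.
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(headerValue || "");
  return match ? match[1].toLowerCase() : "iso-8859-1";
}

console.log(charsetFromContentType("text/html; charset=utf-8"));
console.log(charsetFromContentType("text/html"));
```

Charset names are case-insensitive, so the sketch lowercases the result to make comparisons simpler.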
PHP:
<?=base_convert("OHPE",36,10);?>
//Decode: base 36 to decimal (base 10). Swap the last two arguments to encode
<?=base64_encode('encode this string')?>
//Encode
<?=base64_decode('ZW5jb2RlIHRoaXMgc3RyaW5n')?>
//Decode
JS:
(1142690).toString(36)
//encode, dec to 36
1142690..toString(36)
//alternative
parseInt("ohpe",36)
//decode
JS base64 (browser window object):
window.btoa('encode this string');
//Encode
window.atob('ZW5jb2RlIHRoaXMgc3RyaW5n');
//Decode
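The JS calls above can be combined into a round trip. One caveat: `btoa` only accepts characters up to U+00FF, so text outside Latin-1 has to be converted to UTF-8 bytes first. The sketch below assumes an environment where `btoa`, `atob`, `TextEncoder`, and `TextDecoder` are available as globals (browsers and recent Node.js versions); the helper names `toBase64Utf8` and `fromBase64Utf8` are hypothetical.

```javascript
// Base 36 round trip, using the same values as above.
const encoded36 = (1142690).toString(36);
const decoded36 = parseInt(encoded36, 36);

// Hypothetical helper: base64-encode any string by going through UTF-8 bytes.
function toBase64Utf8(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b); // one char per byte
  return btoa(binary);
}

// Hypothetical helper: reverse of toBase64Utf8.
function fromBase64Utf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

console.log(encoded36, decoded36);
console.log(toBase64Utf8("encode this string"));
console.log(fromBase64Utf8(toBase64Utf8("é<")));
```

For plain ASCII input the helpers give the same result as calling `btoa` and `atob` directly.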
To include characters that are outside the document's character set, or to make a character such as < appear as literal text instead of markup, we can use character references with the following syntax:
HTML5:
&#D;
//replace D with the character's Unicode code point in decimal
&#xH;
//replace H with the character's Unicode code point in hexadecimal
HTML:
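U+0026 U+0023 D U+003B
//the decimal form written as code points: & # D ;
U+0026 U+0023 U+0058 H U+003B
//the hexadecimal form written as code points: & # X H ; (a lowercase x, U+0078, also works)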
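The numeric syntax above can be generated for any string. This is a minimal sketch with hypothetical function names (`toDecimalRefs`, `toHexRefs`); it encodes every character, which is more than a real page needs but makes the mapping easy to see.

```javascript
// Hypothetical helper: encode each character as a decimal reference (&#D;).
function toDecimalRefs(text) {
  // Array.from iterates by code point, so characters above U+FFFF stay whole.
  return Array.from(text, (ch) => `&#${ch.codePointAt(0)};`).join("");
}

// Hypothetical helper: encode each character as a hexadecimal reference (&#xH;).
function toHexRefs(text) {
  return Array.from(
    text,
    (ch) => `&#x${ch.codePointAt(0).toString(16).toUpperCase()};`
  ).join("");
}

console.log(toDecimalRefs("<")); // &#60;
console.log(toHexRefs("<"));     // &#x3C;
```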
There are also some common ones that don't need hex/dec numbers:
&lt;
represents the <
sign.
&gt;
represents the >
sign.
&amp;
represents the &
sign.
&quot;
represents the "
mark.
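The four named references listed above are the ones an output encoder typically applies. As a rough sketch of that behaviour (the `escapeHtml` name and the lookup table are illustrative, not taken from any particular library):

```javascript
// Map each special character to its named character reference.
const NAMED_REFS = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };

// Hypothetical helper: replace the four special characters in one pass.
function escapeHtml(text) {
  return text.replace(/[&<>"]/g, (ch) => NAMED_REFS[ch]);
}

console.log(escapeHtml('<a href="x">Tom & Jerry</a>'));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```

Replacing all four characters in a single pass avoids double-encoding the & that the other references introduce.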
Reference list for the U+ encodings: