utf8.ksy
meta:
  id: utf8
  title: UTF-8
  license: WTFPL
doc: |
  Data type for a single UTF-8 character, essentially a kind of variable-length
  integer encoding.

  This seems like a reasonably common thing to want but for some reason I
  couldn't seem to find it so I feel like an idiot for even implementing this.
  I'm sure I'll be deleting it in a few days when someone tells me how to do
  it more simply.

  There are many descriptions and references at Wikipedia:
  https://en.wikipedia.org/wiki/UTF-8
seq:
  - id: head
    type: u1
  - id: tail
    type: u1
    repeat: expr
    repeat-expr: length - 1
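instances:
  # Lead-byte patterns: 11110xxx = 4 bytes, 1110xxxx = 3, 110xxxxx = 2,
  # anything else is treated as a single byte (no validation is done).
  length:
    value: |
      (head & 0xF8) == 0xF0 ? 4 :
      (head & 0xF0) == 0xE0 ? 3 :
      (head & 0xE0) == 0xC0 ? 2 :
      1
    doc: Total number of bytes in the sequence, derived from the lead byte
  # The lead byte carries 3, 4, 5 or 7 payload bits depending on length;
  # each tail byte carries 6. Shifts are parenthesised explicitly so the
  # result does not depend on operator precedence.
  value:
    value: |
      (head & 0xF8) == 0xF0 ? ((head & 0x07) << 18) | ((tail[0] & 0x3F) << 12) | ((tail[1] & 0x3F) << 6) | (tail[2] & 0x3F) :
      (head & 0xF0) == 0xE0 ? ((head & 0x0F) << 12) | ((tail[0] & 0x3F) << 6) | (tail[1] & 0x3F) :
      (head & 0xE0) == 0xC0 ? ((head & 0x1F) << 6) | (tail[0] & 0x3F) :
      (head & 0x7F)
    doc: Resulting value as Unicode code point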
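
For cross-checking the `length` and `value` instances outside of Kaitai Struct, here is a minimal Python sketch of the standard UTF-8 decoding rules. The function names `utf8_length` and `utf8_value` are made up for this illustration and are not part of the spec or of any library. Like the spec, it assumes well-formed input and does no validation of continuation bytes, overlong forms or surrogates.

```python
def utf8_length(head: int) -> int:
    """Number of bytes in the sequence, judged from the lead byte alone."""
    if (head & 0xF8) == 0xF0:  # 11110xxx
        return 4
    if (head & 0xF0) == 0xE0:  # 1110xxxx
        return 3
    if (head & 0xE0) == 0xC0:  # 110xxxxx
        return 2
    return 1  # 0xxxxxxx (ASCII); invalid lead bytes also land here


def utf8_value(data: bytes) -> int:
    """Decode the first UTF-8 sequence in `data` to a Unicode code point."""
    head = data[0]
    n = utf8_length(head)
    # Payload bits held by the lead byte for each sequence length.
    lead_mask = {1: 0x7F, 2: 0x1F, 3: 0x0F, 4: 0x07}[n]
    value = head & lead_mask
    # Each tail byte contributes its low 6 bits.
    for b in data[1:n]:
        value = (value << 6) | (b & 0x3F)
    return value


if __name__ == "__main__":
    # Compare against Python's own encoder for one sample per length.
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        encoded = chr(cp).encode("utf-8")
        print(f"U+{cp:04X}", encoded.hex(" "), utf8_value(encoded) == cp)
```

Feeding the same byte sequences to a parser generated from the spec should give matching `length` and `value` results, which makes this a quick way to spot mask or shift mistakes.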