****************************
* ISO 14496-1 Media Format *
****************************
- values use big endian (network) byte order
- general terms: integer = signed value
- general values: byte/char/octet = 8-bit value; short/word = 16-bit value;
long = 32-bit value
- fixed point values: value made up of an integer for the whole number part
and an unsigned value for the fractional part
- binary values: base-2 long unsigned values (digits 0 and 1)
- octal values: base-8 long unsigned values (digits 0 through to 7)
- decimal values: base-10 long unsigned values (digits 0 through to 9)
- hexadecimal (hex) values: base-16 long unsigned values
(digits 0 to 9 and A to F)
- box offsets: values relative to boxes only
and are used to skip to the next box
- sample chunk/block offsets: values relative to the start of the file
- UUID: a hexadecimal Universally Unique Identifier
that is 128 bits in length
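Example (a sketch only, not part of any specification): the fixed point convention above can be decoded as shown below. The function names are made up for this illustration, and the 16.16 split for long values and 8.8 split for short values is an assumption based on the "normal = 1.0" rate and volume fields listed later in this layout.

```python
import struct

def fixed_16_16(data: bytes) -> float:
    """Decode a 4-byte big endian fixed point value (assumed 16.16 split)."""
    (raw,) = struct.unpack(">i", data)   # signed 32-bit, network byte order
    return raw / 65536.0                 # low 16 bits hold the fraction

def fixed_8_8(data: bytes) -> float:
    """Decode a 2-byte big endian fixed point value (assumed 8.8 split)."""
    (raw,) = struct.unpack(">h", data)   # signed 16-bit, network byte order
    return raw / 256.0                   # low 8 bits hold the fraction
```

With these helpers, the bytes 00 01 00 00 decode to 1.0 and 00 00 80 00 decode to 0.5.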
FILE INFO
Suffixes = ".mp4", ".m4a"; Mac OS Type = "mpg4"; Mac OS Creator = "TVOD";
MIME="video/mp4" and "audio/mp4"
Standard single fork binary file. A resource fork is only used on HFS/HFS+ volumes,
to store mac specific file info and quicktime movie previews. It can also store a
quicktime version of the file's header, but this is only valid if transcoded to the
quicktime format, as other storage media may not use or support multiple file forks.
Unknown boxes can be safely skipped over and most boxes can be in any order. Most
lowercase long ASCII text strings used for box names/types were pre-defined by Apple
and any others are reserved for future use by Apple and the ISO. Custom boxes are
discouraged; use only ISO defined ones.
Box type strings can be either standard length atom type strings or a 16 byte UUID.
UUIDs are appended following the standard type of 'uuid'. If the box offset is
equal to one then a 64-bit box offset is appended after the box type string
(and before any UUID).
Wide boxes used in the 'mdat' box can be used with other box types as needed.
The term QUICKTIME denotes an unused atom/box or item from the format that this one
was based upon. The terms 3GPP and APPLE denote custom additions to the format.
Even though the original ISO specification is static, Apple members have added
bits from the 3GPP and iTunes versions as extensions, such as those in parts 10 and 12.
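Example (a sketch only): reading one box header as described above. The function name and return shape are hypothetical. The ordering used here, with the 64-bit offset read before the 16-byte UUID, is an assumption taken from the ISO box definition.

```python
import struct

def read_box_header(buf: bytes, pos: int = 0):
    """Return (box_type, box_size, header_size, uuid) for the box at pos."""
    # Standard header: long unsigned offset + long ASCII type string.
    size, box_type = struct.unpack_from(">I4s", buf, pos)
    header = 8
    uuid = None
    if size == 1:
        # Offset of one: the real size is a 64-bit value after the type.
        (size,) = struct.unpack_from(">Q", buf, pos + header)
        header += 8
    if box_type == b"uuid":
        # Extended type: 128-bit UUID (assumed to follow any 64-bit size).
        uuid = buf[pos + header:pos + header + 16]
        header += 16
    return box_type, size, header, uuid
```

A reader that does not recognise box_type can skip the box by moving pos forward by box_size.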
FILE IDENTIFICATION
* 8+ bytes file type box = long unsigned offset + long ASCII text string 'ftyp'
-> 4 bytes major brand = long ASCII text main type string
-> 4 bytes major brand version = long unsigned main type revision value
-> 4+ bytes compatible brands = list of long ASCII text used technology strings
- types are ISO 14496-1 Base Media = isom ; ISO 14496-12 Base Media = iso2
- types are ISO 14496-1 vers. 1 = mp41 ; ISO 14496-1 vers. 2 = mp42
- types are quicktime movie = 'qt ' ; JVT AVC = avc1
- types are 3G MP4 profile = '3gp' + ASCII value ; 3G Mobile MP4 = mmp4
- types are Apple AAC audio w/ iTunes info = 'M4A ' ; AES encrypted audio = 'M4P '
- types are Apple audio w/ iTunes position = 'M4B ' ; ISO 14496-12 MPEG-7 meta data = 'mp71'
- NOTE: All compatible with 'isom', vers. 1 uses no Scene Description Tracks,
vers. 2 uses the full part one spec, M4A uses custom ISO 14496-12 info,
qt means the format complies with the original Apple spec, 3gp uses sample
descriptions in the same style as the original Apple spec.
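Example (a sketch only): splitting the file type box into its major brand, version and compatible brands. The function name is hypothetical, and the payload is assumed to be the bytes that follow the 8 byte box header.

```python
import struct

def parse_ftyp(payload: bytes):
    """Return (major_brand, version, compatible_brands) from an 'ftyp' payload."""
    # 4 bytes major brand + 4 bytes major brand version.
    major, version = struct.unpack_from(">4sI", payload, 0)
    # The rest is a list of 4-byte brand strings; a short tail is ignored.
    brands = [payload[i:i + 4].decode("ascii")
              for i in range(8, len(payload) - 3, 4)]
    return major.decode("ascii"), version, brands
```

For instance, a payload of 'isom', version 512 and the brands 'isom' and 'mp42' yields ("isom", 512, ["isom", "mp42"]).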
FILE MEDIA DATA
Note: if any box grows in excess of 2^32 bytes (about 4.3 GB), the box size can be
extended to 64 bits (up to about 18.4 EB) by setting the box size to 1 and appending
a new 64 bit box size.
This is why empty 'wide' boxes may be found on either side of this box header for
future expansion of the sample data.
By setting the box size to 0, the media data box is open ended and extends to the end
of the file.
* 8+ bytes media (sample) data box = long unsigned offset + long ASCII text string 'mdat'
-> 8 bytes larger file offset place holder box
= long unsigned offset set to 8 + long ASCII text string 'wide'
OR
-> 8 bytes wider mdat box offset = 64-bit unsigned offset
- only if mdat standard offset set to 1
-> Sample data = hex dump
- Media with multiple tracks have sample data interleaved unless preloaded.
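Example (a sketch only): walking the top level boxes of a file held in memory, including the two special sizes described above (1 = 64-bit size follows, 0 = box runs to the end of the file). The generator name and the returned tuple layout are hypothetical.

```python
import struct

def iter_boxes(buf: bytes):
    """Yield (box_type, payload_start, box_end) for each top level box."""
    pos = 0
    while pos + 8 <= len(buf):
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:
            # Wider offset: 64-bit size stored right after the type string.
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header = 16
        elif size == 0:
            # Open ended box: extends to the end of the file.
            size = len(buf) - pos
        if size < header or pos + size > len(buf):
            raise ValueError("bad box size at offset %d" % pos)
        yield box_type, pos + header, pos + size
        pos += size
```

An 8 byte 'wide' box directly in front of 'mdat' shows up as an empty box (payload_start equal to box_end), which is the place holder described above.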
UNUSED SPACE OR DATA TO BE DELETED/REUSED WITHIN FILE
* 8+ bytes free space (current) box
= long unsigned offset + long ASCII text string 'free'
* 8+ bytes skip over (older) box
= long unsigned offset + long ASCII text string 'skip'
* 8+ bytes widen (lengthen) file box
= long unsigned offset + long ASCII text string 'wide'
EXTERNAL MPEG-7 META DATA ONLY
* 8+ bytes optional ISO/IEC 14496-12 presentation meta data box
= long unsigned offset + long ASCII text string 'meta'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
* 8+ bytes ISO/IEC 14496-12 handler reference box
= long unsigned offset + long ASCII text string 'hdlr'
- this box must be toward the start of the meta box
-> 4 bytes version/flags = byte hex version + 24-bit hex flags (current = 0)
-> 4 bytes QUICKTIME type = long ASCII text string
(eg. Media Handler = 'mhlr')
-> 4 bytes subtype/meta data type = long ASCII text string
- types are MPEG-7 XML = 'mp7t' ; MPEG-7 binary XML = 'mp7b'
- type is APPLE meta data for iTunes reader = 'mdir'
-> 4 bytes QUICKTIME manufacturer reserved = long ASCII text string
(eg. Apple = 'appl' or 0)
-> 4 bytes QUICKTIME component reserved flags = long hex flags (none = 0)
-> 4 bytes QUICKTIME component reserved flags mask = long hex mask (none = 0)
-> component type name ASCII string
(eg. "Meta Data Handler" - no name = zero length string)
-> 1 byte component name string end = byte padding set to zero
- note: the quicktime spec uses a Pascal string
instead of the above C string
* 8+ bytes optional ISO/IEC 14496-12 MPEG-7 XML box
= long unsigned offset + long ASCII text string 'xml '
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> MPEG-7 XML meta data = text dump
* 8+ bytes optional ISO/IEC 14496-12 MPEG-7 binary XML box
= long unsigned offset + long ASCII text string 'bxml'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> MPEG-7 encoded XML meta data = hex dump
* 8+ bytes optional ISO/IEC 14496-12 item location box
= long unsigned offset + long ASCII text string 'iloc'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> 1 nibble size of access offsets = 4 bits one byte multiples
- 8-bit offset = 0 ; 32-bit offset = 4 ; 64-bit offset = 8
-> 1 nibble size of data lengths = 4 bits one byte multiples
- 8-bit offset = 0 ; 32-bit offset = 4 ; 64-bit offset = 8
-> 1 nibble size of starting offset = 4 bits one byte multiples
- 8-bit offset = 0 ; 32-bit offset = 4 ; 64-bit offset = 8
-> 1 nibble reserved = 4 bits set to zero
-> 2 bytes number of locations = short unsigned index total
-> 2+ bytes item reference = short unsigned id
-> 2+ bytes stream data reference = short unsigned index from 'dref' box
- if meta data item in same file set to zero
-> 1-8+ bytes starting offset = byte - dlong unsigned offset
-> 2+ bytes number of access points = short unsigned index total
-> 1-8+ bytes access offset = byte - dlong unsigned relative offset
(relative to starting offset)
-> 1-8+ bytes data length = byte - dlong unsigned length
* 8+ bytes optional ISO/IEC 14496-12 primary item box
= long unsigned offset + long ASCII text string 'pitm'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> 2 bytes main item reference = short unsigned id
* 8+ bytes optional ISO/IEC 14496-12 item encryption box
= long unsigned offset + long ASCII text string 'ipro'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> 2 bytes number of encryption boxes = short unsigned index total
* 8+ bytes ISO/IEC 14496-12 encryption scheme info box
= long unsigned offset + long ASCII text string 'sinf'
- if meta data encrypted to ISO/IEC 14496-12 standards
* 8+ bytes ISO/IEC 14496-12 original format box
= long unsigned offset + long ASCII text string 'frma'
-> 4 bytes description format = long ASCII text string
* 8+ bytes optional ISO/IEC 14496-12 IPMP info box
= long unsigned offset + long ASCII text string 'imif'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> IPMP descriptors = hex dump from IPMP part of ES Descriptor box
* 8+ bytes optional ISO/IEC 14496-12 scheme type box
= long unsigned offset + long ASCII text string 'schm'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0 ; contains URI if flags = 0x000001)
-> 4 bytes encryption type = long ASCII text string
- types are 128-bit AES counter = 'ACM1' ; 128-bit AES FS = 'AFS1'
- types are NULL algorithm = 'ENUL' ; 160-bit HMAC-SHA-1 = 'SHM2'
- types are RTCP = 'ANUL' ; private scheme = ' '
-> 2 bytes encryption version = short unsigned version
-> optional scheme URI string = UTF-8 text string
(eg. web site)
-> 1 byte optional scheme URI string end = byte padding set to zero
* 8+ bytes ISO/IEC 14496-12 scheme data box
= long unsigned offset + long ASCII text string 'schi'
-> encryption related key = hex dump
* 8+ bytes optional ISO/IEC 14496-12 item information box
= long unsigned offset + long ASCII text string 'iinf'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> 2 bytes main item reference = short unsigned id
-> 2 bytes encryption box array value = short unsigned index
-> item name or URL string = UTF-8 text string
-> 1 byte name or URL c string end = byte value set to zero
-> item mime type string = UTF-8 text string
-> 1 byte mime type c string end = byte value set to zero
-> optional item transfer encoding string = UTF-8 text string
-> 1 byte optional transfer encoding c string end = byte value set to zero
FILE MEDIA HEADER
Note: the header is safer when stored at the beginning of the file or in another
file fork as HFS resource type 'moov'; ID any.
The advantage of using another file fork is that the header can be lengthened
without recalculating the sample offsets; otherwise a new header must be written at
the end of the file.
* 8+ bytes movie (presentation) box = long unsigned offset + long ASCII text string 'moov'
* 8+ bytes QUICKTIME movie data reference atom
= long unsigned offset + long ASCII text string 'mdra'
- if this is used no other atoms or boxes should be present at this level
* 8+ bytes data reference atom
= long unsigned offset + long ASCII text string 'dref'
-> 4 bytes reference type name = long ASCII text string
- types are file alias = 'alis' ; resource alias = 'rsrc' ;
- types are url c string = 'url '
-> 4 bytes reference version/flags
= byte hex version (current = 0) + 24-bit hex flags
- some flags are external data = 0x000000 ; internal data = 0x000001
-> mac os file alias record structure
OR
-> mac os file alias record structure plus resource info
OR
-> url c string = ASCII text string
-> 1 byte url c string end = byte value set to zero
* 8+ bytes QUICKTIME compressed moov atom
= long unsigned offset + long ASCII text string 'cmov'
- if this is used no other atoms should be present
as this is for an entire compressed movie resource
* 8+ bytes data compression atom
= long unsigned offset + long ASCII text string 'dcom'
-> 4 bytes compression code = long ASCII text string
- compression codes are Deflate = 'zlib' ; Apple Compression = 'adec'
* 8+ bytes compressed moov data atom
= long unsigned offset + long ASCII text string 'cmvd'
-> 4 bytes uncompressed size = long unsigned value
-> entire compressed movie 'moov' resource = hex dump
* 8+ bytes QUICKTIME reference movie record atom
= long unsigned offset + long ASCII text string 'rmra'
- if this atom is used it must come first within the movie resource box
* 8+ bytes reference movie descriptor atom
= long unsigned offset + long ASCII text string 'rmda'
* 8+ bytes reference movie data reference atom
= long unsigned offset + long ASCII text string 'rdrf'
-> 4 bytes reference version/flags
= byte hex version (current = 0) + 24-bit hex flags
- some flags are external data = 0x000000 ; internal data = 0x000001
-> 4 bytes reference type name = long ASCII text string (if internal = 0)
- types are file alias = 'alis' ; resource alias = 'rsrc' ;
- types are url c string = 'url '
-> 4+ bytes reference data = long unsigned length
-> mac os file alias record structure
OR
-> mac os file alias record structure plus resource info
OR
-> url c string = ASCII text string
-> 1 byte url c string end = byte value set to zero
* 8+ bytes optional reference movie quality atom
= long unsigned offset + long ASCII text string 'rmqu'
-> 4 bytes queue position = long unsigned value from 100 to 0
* 8+ bytes optional reference movie cpu rating atom
= long unsigned offset + long ASCII text string 'rmcs'
-> 4 bytes reserved flag = byte hex version + 24-bit hex flags (current = 0)
-> 2 bytes speed rating = short unsigned value from 500 to 100
* 8+ bytes optional reference movie version check atom
= long unsigned offset + long ASCII text string 'rmvc'
-> 4 bytes flags = byte hex version + 24-bit hex flags (current = 0)
-> 4 bytes gestalt selector = long ASCII text string
(eg. quicktime = 'qtim')
-> 4 bytes gestalt min value = long hex value
(eg. QT 3.02 mac file version = 0x03028000)
-> 4 bytes gestalt no value = long value set to zero
OR
-> 4 bytes gestalt value mask = long hex mask
-> 4 bytes gestalt value = long hex value
-> 2 bytes gestalt check type = short unsigned value
(min value = 0 or mask = 1)
* 8+ bytes optional reference movie component check atom
= long unsigned offset + long ASCII text string 'rmcd'
-> 4 bytes flags = byte hex version + 24-bit hex flags (current = 0)
-> 8 bytes component type/subtype
= long ASCII text string + long ASCII text string
(eg. Timecode Media Handler = 'mhlrtmcd')
-> 4 bytes component manufacturer = long ASCII text string
(eg. Apple = 'appl' or 0)
-> 4 bytes component flags = long hex flags (none = 0)
-> 4 bytes component flags mask = long hex mask (none = 0)
-> 4 bytes component min version = long hex value (none = 0)
* 8+ bytes optional reference movie data rate atom
= long unsigned offset + long ASCII text string 'rmdr'
-> 4 bytes flags = byte hex version + 24-bit hex flags (current = 0)
-> 4 bytes data rate = long integer bit rate value
- common analog modem rates are 1400; 2800; 3300; 5600
- common broadband rates are 5600; 11200; 25600; 38400; 51200; 76800; 100000
- common high end broadband rates are T1 = 150000; no limit/LAN = 0x7FFFFFFF
* 8+ bytes optional reference movie language atom
= long unsigned offset + long ASCII text string 'rmla'
-> 4 bytes flags = byte hex version + 24-bit hex flags (current = 0)
-> 2 bytes mac language = short unsigned language value (english = 0)
* 8+ bytes optional reference movie alternate group atom
= long unsigned offset + long ASCII text string 'rmag'
(structure was not provided in MoviesFormat.h of the 4.1.2 win32 sdk)
-> 4 bytes flags = long value set to zero
-> 2 bytes alternate/other = short integer track id value (none = 0)
* 8+ bytes optional initial object descriptor box
= long unsigned offset + long ASCII text string 'iods'
- NOTE: this was added in vers. 2 of spec
-> 4 bytes version/flags = 8-bit hex version + 24-bit hex flags
-> 1 byte file IOD type tag = 8-bit hex value 0x10
-> 3 bytes extended descriptor type tag string = 3 * 8-bit hex value
- types are Start = 0x80 ; End = 0xFE
- NOTE: the extended start tags may be left out
-> 1 byte descriptor type length = 8-bit unsigned length
-> 2 bytes OD ID = 16-bit unsigned value
-> 1 byte OD profile level = 8-bit unsigned value
-> 1 byte scene profile level = 8-bit unsigned value
-> 1 byte audio profile level = 8-bit unsigned value
-> 1 byte video profile level = 8-bit unsigned value
-> 1 byte graphics profile level = 8-bit unsigned value
- NOTE: if level unused then set to 0xFF
-> 1 byte ES ID included descriptor type tag = 8-bit hex value 0x0E
-> 3 bytes extended descriptor type tag string = 3 * 8-bit hex value
- types are Start = 0x80 ; End = 0xFE
- NOTE: the extended start tags may be left out
-> 1 byte descriptor type length = 8-bit unsigned length
-> 4 bytes Track ID = 32-bit unsigned value
- NOTE: refers to non-data system tracks
* 8+ bytes movie (presentation) header box
= long unsigned offset + long ASCII text string 'mvhd'
-> 1 byte version = 8-bit unsigned value
- if version is 1 then date and duration values are 8 bytes in length
-> 3 bytes flags = 24-bit hex flags (current = 0)
-> 4 bytes created mac UTC date
= long unsigned value in seconds since beginning 1904 to 2040
-> 4 bytes modified mac UTC date
= long unsigned value in seconds since beginning 1904 to 2040
OR
-> 8 bytes created mac UTC date
= 64-bit unsigned value in seconds since beginning 1904
-> 8 bytes modified mac UTC date
= 64-bit unsigned value in seconds since beginning 1904
-> 4 bytes time scale = long unsigned time unit per second (default = 600)
-> 4 bytes duration = long unsigned time length (in time units)
OR
-> 8 bytes duration = 64-bit unsigned time length (in time units)
-> 4 bytes decimal user playback speed = long fixed point rate (normal = 1.0)
-> 2 bytes decimal user volume = short fixed point level
(mute = 0.0 ; normal = 1.0 ; QUICKTIME MAX = 3.0)
-> 10 bytes reserved = 5 * short values set to zero
-> 4 bytes decimal window geometry matrix value A
= long fixed point width scale (normal = 1.0)
-> 4 bytes decimal window geometry matrix value B
= long fixed point width rotate (normal = 0.0)
-> 4 bytes decimal window geometry matrix value U
= long fixed point width angle (restricted to 0.0)
-> 4 bytes decimal window geometry matrix value C
= long fixed point height rotate (normal = 0.0)
-> 4 bytes decimal window geometry matrix value D
= long fixed point height scale (normal = 1.0)
-> 4 bytes decimal window geometry matrix value V
= long fixed point height angle (restricted to 0.0)
-> 4 bytes decimal window geometry matrix value X
= long fixed point position (left = 0.0)
-> 4 bytes decimal window geometry matrix value Y
= long fixed point position (top = 0.0)
-> 4 bytes decimal window geometry matrix value W
= long fixed point divider scale (restricted to 1.0)
-> 8 bytes QUICKTIME preview
= long unsigned start time + long unsigned time length (in time units)
-> 4 bytes QUICKTIME still poster
= long unsigned frame time (in time units)
-> 8 bytes QUICKTIME selection time
= long unsigned start time + long unsigned time length (in time units)
-> 4 bytes QUICKTIME current time = long unsigned frame time (in time units)
-> 4 bytes next/new track id = long integer value (single track = 2)
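Example (a sketch only): reading the leading fields of the movie header for both version 0 and version 1, and converting the mac dates (seconds since the beginning of 1904) to calendar dates. The function name and dictionary keys are hypothetical, and the 16.16 rate and 8.8 volume scaling is an assumption based on the "normal = 1.0" values above.

```python
import struct
from datetime import datetime, timedelta, timezone

# Mac dates count seconds from the beginning of 1904 (UTC).
MAC_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)

def parse_mvhd(payload: bytes):
    """Decode the start of an 'mvhd' payload (bytes after the box header)."""
    version = payload[0]
    if version == 1:
        # Version 1: 8-byte dates and duration, 4-byte time scale.
        created, modified, scale, duration = struct.unpack_from(">QQIQ", payload, 4)
        pos = 4 + 28
    else:
        # Version 0: all four values are 4 bytes.
        created, modified, scale, duration = struct.unpack_from(">IIII", payload, 4)
        pos = 4 + 16
    rate, volume = struct.unpack_from(">ih", payload, pos)
    return {
        "version": version,
        "created": MAC_EPOCH + timedelta(seconds=created),
        "modified": MAC_EPOCH + timedelta(seconds=modified),
        "time_scale": scale,
        "seconds": duration / scale if scale else 0.0,
        "rate": rate / 65536.0,     # assumed 16.16 fixed point
        "volume": volume / 256.0,   # assumed 8.8 fixed point
    }
```

A duration of 1200 time units at the default time scale of 600 comes out as 2.0 seconds.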
* 8+ bytes QUICKTIME clipping (mask) atom
= long unsigned offset + long ASCII text string 'clip'
* 8+ bytes clipping region atom
= long unsigned offset + long ASCII text string 'crgn'
-> 2 bytes region size = short unsigned box size
-> 8 bytes region boundary
= long fixed point x value + long fixed point y value
-> QuickDraw Region Data = hex dump
* 8+ bytes track (element) box = long unsigned offset + long ASCII text string 'trak'
* 8+ bytes track (element) header box
= long unsigned offset + long ASCII text string 'tkhd'
-> 1 byte version = byte unsigned value
- if version is 1 then date and duration values are 8 bytes in length
-> 3 bytes flags = 24-bit unsigned flags
- sum of TrackEnabled = 1 ; TrackInMovie = 2 ;
TrackInPreview = 4; TrackInPoster = 8
- MPEG-4 only defines TrackEnabled as being valid
-> 4 bytes created mac UTC date
= long unsigned value in seconds since beginning 1904 to 2040
-> 4 bytes modified mac UTC date
= long unsigned value in seconds since beginning 1904 to 2040
OR
-> 8 bytes created mac UTC date
= 64-bit unsigned value in seconds since beginning 1904
-> 8 bytes modified mac UTC date
= 64-bit unsigned value in seconds since beginning 1904
-> 4 bytes track id = long integer value (first track = 1)
-> 8 bytes reserved = 2 * long value set to zero
-> 4 bytes duration = long unsigned time length (in time units)
OR
-> 8 bytes duration = 64-bit unsigned time length (in time units)
- if duration is undefined set above bits to all ones
-> 4 bytes reserved = long value set to zero
-> 2 bytes video layer = short integer position
(middle = 0 ; negatives are in front)
-> 2 bytes QUICKTIME alternate/other = short integer track id (none = 0)
-> 2 bytes track audio volume = short fixed point level
(mute = 0x0001 ; 100% = 1.0 ; QUICKTIME 200% max = 2.0)
-> 2 bytes reserved = short value set to zero
-> 4 bytes decimal video geometry matrix value A
= long fixed point width scale (normal = 1.0)
-> 4 bytes decimal video geometry matrix value B
= long fixed point width rotate (normal = 0.0)
-> 4 bytes decimal video geometry matrix value U
= long fixed point width angle (restricted to 0.0)
-> 4 bytes decimal video geometry matrix value C
= long fixed point height rotate (normal = 0.0)
-> 4 bytes decimal video geometry matrix value D
= long fixed point height scale (normal = 1.0)
-> 4 bytes decimal video geometry matrix value V
= long fixed point height angle (restricted to 0.0)
-> 4 bytes decimal video geometry matrix value X
= long fixed point position (left = 0.0)
-> 4 bytes decimal video geometry matrix value Y
= long fixed point position (top = 0.0)
-> 4 bytes decimal video geometry matrix value W
= long fixed point divider scale (restricted to 1.0)
-> 8 bytes decimal video frame size
= long fixed point width + long fixed point height
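Example (a sketch only): the track header flags are a sum of the values listed above, so they can be split back into names with a bit test. The table and function names are hypothetical.

```python
# Flag values as listed for the track header box.
TRACK_FLAGS = {
    0x1: "TrackEnabled",
    0x2: "TrackInMovie",
    0x4: "TrackInPreview",
    0x8: "TrackInPoster",
}

def decode_track_flags(version_flags: bytes):
    """Split the first 4 bytes of 'tkhd' into (version, [flag names])."""
    version = version_flags[0]                          # 1 byte version
    flags = int.from_bytes(version_flags[1:4], "big")   # 3 bytes of flags
    names = [name for bit, name in TRACK_FLAGS.items() if flags & bit]
    return version, names
```

A value of 00 00 00 0F therefore means version 0 with all four flags set.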
* 8+ bytes QUICKTIME clipping (mask) atom
= long unsigned offset + long ASCII text string 'clip'
- see moov clipping atom above
* 8+ bytes QUICKTIME matte (video overlay) atom
= long unsigned offset + long ASCII text string 'matt'
* 8+ bytes compressed matte atom
= long unsigned offset + long ASCII text string 'kmat'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags (current = 0)
-> Matte Image Description Structure
(similar to Media Sample Description Table)
-> Matte Data = hex dump
* 8+ bytes optional edits (# of external tracks) box
= long unsigned offset + long ASCII text string 'edts'
- if tracks are of different start times this atom is needed to maintain media sync.
* 8+ bytes optional edit list box
= long unsigned offset + long ASCII text string 'elst'
-> 1 byte version = byte unsigned value
- if version is 1 then duration values are 8 bytes in length
-> 3 bytes flags = 24-bit hex flags (current = 0)
-> 4 bytes number of edits = long unsigned total (default = 1)
-> 8 bytes edit time
= long unsigned time length + long unsigned start time (in time units)
OR
-> 16 bytes edit time
= 64-bit unsigned time length + 64-bit unsigned start time (in time units)
- if start time is -1, then that time length is edited out
-> 4 bytes decimal playback speed = long fixed point rate (normal = 1.0)
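Example (a sketch only): reading the edit list entries for both versions. The function name is hypothetical. The start time is read as a signed value so that the "-1" marker above can be detected, and the playback speed is assumed to be 16.16 fixed point.

```python
import struct

def parse_elst(payload: bytes):
    """Return [(time_length, start_time, rate), ...] from an 'elst' payload."""
    version = payload[0]
    (count,) = struct.unpack_from(">I", payload, 4)   # number of edits
    # Version 1 uses 64-bit times; the rate stays a 4-byte fixed point value.
    fmt = ">Qqi" if version == 1 else ">Iii"
    step = struct.calcsize(fmt)
    edits = []
    pos = 8
    for _ in range(count):
        length, start, rate = struct.unpack_from(fmt, payload, pos)
        edits.append((length, start, rate / 65536.0))
        pos += step
    return edits
```

An entry whose start_time is -1 marks a stretch of time_length units that is edited out, as noted above.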
* 8+ bytes QUICKTIME preload atom
= long unsigned offset + long ASCII text string 'load'
-> 8 bytes preload time
= long unsigned start time + long unsigned time length (in time units)
-> 4 bytes flags = long integer value
- flags are PreloadAlways = 1 or TrackEnabledPreload = 2
-> 4 bytes default hints flags = long hex data play options
- flags are KeepInBuffer = 0x00000004 ; HighQuality = 0x00000100 ;
- flags are SingleFieldPlayback = 0x00100000
- flags are DeinterlaceFields = 0x04000000
* 8+ bytes optional track references box
= long unsigned offset + long ASCII text string 'tref'
* 8+ bytes type of reference box
= long unsigned offset + long ASCII text string
-> vers. 1 box type is stream hint = 'hint'
-> vers. 2 box types are other dependency = 'dpnd' ; IPI declarations = 'ipir'
-> vers. 2 box types are elementary stream = 'mpod' ;
-> vers. 2 box types are synchronization (video/audio) = 'sync'
-> QUICKTIME atom types are timecode = 'tmcd'; chapterlist = 'chap'
-> QUICKTIME atom types are transcript (text) = 'scpt'
-> QUICKTIME atom types are non-primary source (used in other track) = 'ssrc'
-> 4+ bytes Track IDs = long integer track numbers (Disabled Track ID = 0)
* 8+ bytes QUICKTIME non-primary source input map atom
= long unsigned offset + long ASCII text string 'imap'
* 8+ bytes input atom
= long unsigned offset + long ASCII text string 0x0000 + 'in'
-> 4 bytes atom ID = long integer atom reference (first ID = 1)
-> 2 bytes reserved = short value set to zero
-> 2 bytes number of internal atoms = short unsigned count
-> 4 bytes reserved = long value set to zero
* 8+ bytes input type atom
= 32-bit integer unsigned + long ASCII text string 0x0000 + 'ty'
-> 4 bytes type modifier name = long integer value
-> name values are matrix = 1 ; clip = 2 ;
-> name values are volume = 3; audio balance = 4
-> name values are graphics mode = 5; matrix object = 6
-> name values are graphics mode object = 7; image type = 'vide'
* 8+ bytes object ID atom
= long unsigned offset + long ASCII text string 'obid'
-> 4 bytes object ID = long integer value
* 8+ bytes media (stream) box = long unsigned offset + long ASCII text string 'mdia'
* 8+ bytes media (stream) header box
= long unsigned offset + long ASCII text string 'mdhd'
-> 1 byte version = byte unsigned value
- if version is 1 then date and duration values are 8 bytes in length
-> 3 bytes flags = 24-bit unsigned flags (current = 0)
-> 4 bytes created mac UTC date
= long unsigned value in seconds since beginning 1904 to 2040
-> 4 bytes modified mac UTC date
= long unsigned value in seconds since beginning 1904 to 2040
OR
-> 8 bytes created mac UTC date
= 64-bit unsigned value in seconds since beginning 1904
-> 8 bytes modified mac UTC date
= 64-bit unsigned value in seconds since beginning 1904
-> 4 bytes time scale = long unsigned media time unit
(video = fps rate ; audio = sample per sec. rate)
-> 4 bytes duration = long unsigned media time length (in media time units)
OR
-> 8 bytes duration = 64-bit unsigned time length (in time units)
-> 1/8 byte ISO language padding = 1-bit value set to 0
-> 1 7/8 bytes content language = 3 * 5-bits ISO 639-2 language code less 0x60
- example code for english = 0x15C7
-> 2 bytes QUICKTIME quality = short integer playback quality value (normal = 0)
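Example (a sketch only): packing and unpacking the content language field, where each of the three ISO 639-2 letters is stored as its ASCII value less 0x60 in 5 bits. The function names are hypothetical.

```python
def unpack_language(code: int) -> str:
    """Turn the 16-bit language field into a three letter ISO 639-2 code."""
    # Top bit is padding; then three 5-bit values, each an ASCII letter - 0x60.
    return "".join(chr(((code >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))

def pack_language(lang: str) -> int:
    """Turn a three letter lowercase code into the 16-bit language field."""
    a, b, c = (ord(ch) - 0x60 for ch in lang)
    return (a << 10) | (b << 5) | c
```

This reproduces the example above: "eng" packs to 0x15C7, and 0x15C7 unpacks to "eng".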
* 8+ bytes handler reference box
= long unsigned offset + long ASCII text string 'hdlr'
- this box must be toward the start of the media box
-> 4 bytes version/flags = byte hex version + 24-bit hex flags (current = 0)
-> 4 bytes QUICKTIME type = long ASCII text string
(eg. Media Handler = 'mhlr')
-> 4 bytes subtype/media type = long ASCII text string
- types are Visual Media = 'vide' ; Audio Media = 'soun' ; Hint = 'hint'
- types are Object Descriptor = 'odsm' ; Clock Reference = 'crsm'
- types are Scene Description = 'sdsm' ; MPEG-7 Stream = 'm7sm'
- types are Object Content Info = 'ocsm' ; IPMP = 'ipsm' ; MPEG-J = 'mjsm'
-> 4 bytes QUICKTIME manufacturer reserved = long ASCII text string
(eg. Apple = 'appl' or 0)
-> 4 bytes QUICKTIME component reserved flags = long hex flags (none = 0)
-> 4 bytes QUICKTIME component reserved flags mask = long hex mask (none = 0)
-> component type name ASCII string
(eg. "Media Handler" - no name = zero length string)
-> 1 byte component name string end = byte padding set to zero
- note: the quicktime spec uses a Pascal string
instead of the above C string
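Example (a sketch only): reading the handler reference box in the C string form described above. The function name is hypothetical, and the QuickTime Pascal string variant is not handled.

```python
import struct

def parse_hdlr(payload: bytes):
    """Return (type, subtype, name) from an 'hdlr' payload (C string form)."""
    # 4 bytes version/flags, 4 bytes type, 4 bytes subtype.
    _version_flags, comp_type, subtype = struct.unpack_from(">I4s4s", payload, 0)
    # Manufacturer, flags and flags mask take 12 more bytes; the name follows.
    name_bytes = payload[24:]
    # The name ends at the first zero byte (or at the end of the box).
    name = name_bytes.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return comp_type, subtype, name
```

The subtype is the value a reader usually branches on, for example 'vide' for visual media or 'soun' for audio media.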
* 8+ bytes media (stream) information box
= long unsigned offset + long ASCII text string 'minf'
* 8+ bytes visual media (stream) info header box
= long unsigned offset + long ASCII text string 'vmhd'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
- version = 0 ; flags = 0x000001 for QUICKTIME or zero MPEG-4
-> 2 bytes QuickDraw graphic mode = short hex type
- mode types are copy = 0x0000 ; dither copy = 0x0040 ; straight alpha = 0x0100
- mode types are composition dither copy = 0x0103 ; blend = 0x0020
- mode premultiplied types are white alpha = 0x0101 ; black alpha = 0x0102
- mode color types are transparent = 0x0024; straight alpha blend = 0x0104
- NOTE: MPEG-4 only uses copy mode and quicktime uses dither copy by default
-> 6 bytes graphic mode color = 3 * short unsigned QuickDraw RGB color values
OR
* 8+ bytes sound media (stream) info header box
= long unsigned offset + long ASCII text string 'smhd'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags (current = 0)
-> 2 bytes audio balance = short fixed point value
- balance scale is left = negatives ; normal = 0.0 ; right = positives
-> 2 bytes reserved = short value set to zero
OR
* 8+ bytes hint media (stream) info header box
= long unsigned offset + long ASCII text string 'hmhd'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags (current = 0)
-> 2 bytes maximum packet delivery unit = short unsigned value
-> 2 bytes average packet delivery unit = short unsigned value
-> 4 bytes maximum bit rate = long unsigned value
-> 4 bytes average bit rate = long unsigned value
-> 4 bytes reserved = long value set to zero
OR
* 8+ bytes mpeg-4 media (stream) header box
= long unsigned offset + long ASCII text string 'nmhd'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags (current = 0)
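- note: a small Python sketch for the visual and sound media info headers listed
above. It assumes the box bodies are passed without their 8-byte size/type
header, and that the audio balance is an 8.8 signed fixed point value (an
assumption; the layout above only says "short fixed point"). Function names
are hypothetical.

```python
import struct

def parse_vmhd(payload: bytes) -> dict:
    """Visual media header: version/flags, graphic mode, RGB op color."""
    version_flags, mode, red, green, blue = struct.unpack(">IH3H", payload[:12])
    return {
        "version": version_flags >> 24,
        "flags": version_flags & 0xFFFFFF,
        "graphics_mode": mode,
        "op_color": (red, green, blue),
    }

def parse_smhd(payload: bytes) -> dict:
    """Sound media header: version/flags, balance, reserved."""
    version_flags, balance, _reserved = struct.unpack(">IhH", payload[:8])
    return {
        "version": version_flags >> 24,
        "flags": version_flags & 0xFFFFFF,
        # Assumed 8.8 signed fixed point: -1.0 = full left, +1.0 = full right.
        "balance": balance / 256.0,
    }
```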
* 8+ bytes QUICKTIME handler reference atom
= long unsigned offset + long ASCII text string 'hdlr'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags (current = 0)
-> 8 bytes type/subtype = long ASCII text string + long ASCII text string
(eg. Alias Data Handler = 'dhlralis' ; URL Data Handler = 'dhlrurl ')
-> 4 bytes manufacturer reserved = long ASCII text string
(eg. Apple = 'appl' or 0)
-> 4 bytes component reserved flags = long hex flags (none = 0)
-> 4 bytes component reserved flags mask = long hex mask (none = 0)
-> 1 byte component name string length = byte unsigned length
(no name = zero length string)
-> component type name ASCII string (eg. "Data Handler")
* 8+ bytes data (locator) information box
= long unsigned offset + long ASCII text string 'dinf'
* 8+ bytes data reference box
= long unsigned offset + long ASCII text string 'dref'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> 4 bytes number of references = long unsigned total
(minimum = 1)
* 8+ bytes reference type box
= long unsigned offset + long ASCII text string
- box types are url c string = 'url ' ; urn c strings = 'urn '
- QUICKTIME atom types are file alias = 'alis' ; resource alias = 'rsrc'
-> 4 bytes version/flags
= byte hex version (current = 0) + 24-bit hex flags
- some flags are external data = 0x000000 ; internal data = 0x000001
-> url c string = ASCII text string points to external data
-> 1 byte url c string end = byte value set to zero
OR
-> urn c string = ASCII text string points to external data
-> 1 byte urn c string end = byte value set to zero
-> url c string = ASCII text string points to external data
-> 1 byte url c string end = byte value set to zero
OR
-> QUICKTIME mac os file alias record structure
points to external data
OR
-> QUICKTIME mac os file alias record structure
plus resource info points to external data
OR
* 8+ bytes Data URL box
= long unsigned offset + long ASCII text string 'url '
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> url c string = ASCII text string points to external data
-> 1 byte url c string end = byte value set to zero
OR
* 8+ bytes Data URN box
= long unsigned offset + long ASCII text string 'urn '
-> 4 bytes version/flags = byte hex version + 24-bit hex flags (current = 0)
-> urn c string = ASCII text string points to external data
-> 1 byte urn c string end = byte value set to zero
-> url c string = ASCII text string points to external data
-> 1 byte url c string end = byte value set to zero
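- note: the data reference layout above can be walked with a short Python
sketch like the one below. It assumes the 'dref' body is passed without its
8-byte size/type header and that every entry uses a plain 32-bit size. The
name parse_dref and the dict keys are hypothetical. The entry data (url/urn
strings or alias record) is returned as raw bytes and not interpreted.

```python
import struct

def parse_dref(payload: bytes) -> list:
    """Return one dict per reference entry found in a 'dref' body."""
    _version_flags, count = struct.unpack(">II", payload[:8])
    pos = 8
    entries = []
    for _ in range(count):
        size, box_type, entry_vf = struct.unpack(">I4sI", payload[pos:pos + 12])
        if size < 12:
            raise ValueError("reference entry too small")
        flags = entry_vf & 0xFFFFFF
        entries.append({
            "type": box_type.decode("latin-1"),
            # Flag 0x000001 = media data lives in this same file.
            "self_contained": bool(flags & 0x000001),
            "data": payload[pos + 12:pos + size],
        })
        pos += size
    return entries
```

An entry with flags = 0x000001 normally carries no string at all, so "data"
is empty in that case.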
* 8+ bytes sample (framing info) table box
= long unsigned offset + long ASCII text string 'stbl'
* 8+ bytes sample (frame encoding) description box
= long unsigned offset + long ASCII text string 'stsd'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> 4 bytes number of descriptions = long unsigned total
(default = 1)
-> 4 bytes description length = long unsigned length
-> 4 bytes description visual format = long ASCII text string 'mp4v'
- if encoded to ISO/IEC 14496-10 or 3GPP AVC standards then use:
-> 4 bytes description visual format = long ASCII text string 'avc1'
- if encrypted to ISO/IEC 14496-12 or 3GPP standards then use:
-> 4 bytes description visual format = long ASCII text string 'encv'
- if encoded to 3GPP H.263v1 standards then use:
-> 4 bytes description visual format = long ASCII text string 's263'
-> 6 bytes reserved = 48-bit value set to zero
-> 2 bytes data reference index
= short unsigned index from 'dref' box
- there are other sample descriptions
available in the Apple QT format dev docs
-> 2 bytes QUICKTIME video encoding version = short hex version
- default = 0 ; audio data size before decompression = 1
-> 2 bytes QUICKTIME video encoding revision level = short hex version
- default = 0 ; video can revise this value
-> 4 bytes QUICKTIME video encoding vendor = long ASCII text string
- default = 0
-> 4 bytes QUICKTIME video temporal quality = long unsigned value (0 to 1024)
-> 4 bytes QUICKTIME video spatial quality = long unsigned value (0 to 1024)
- some quality values are lossless = 1024 ; maximum = 1023 ; high = 768
- some quality values are normal = 512 ; low = 256 ; minimum = 0
-> 4 bytes video frame pixel size
= short unsigned width + short unsigned height
-> 8 bytes video resolution
= long fixed point horizontal + long fixed point vertical
- defaults to 72.0 dpi
-> 4 bytes QUICKTIME video data size = long value set to zero
-> 2 bytes video frame count = short unsigned total (set to 1)
-> 1 byte video encoding name string length = byte unsigned length
-> 31 bytes video encoder name string
-> NOTE: if video encoder name string < 31 chars then pad with zeros
-> 2 bytes video pixel depth = short unsigned bit depth
- colors are 1 (Monochrome), 2 (4), 4 (16), 8 (256)
- colors are 16 (1000s), 24 (Ms), 32 (Ms+A)
- grays are 33 (B/W), 34 (4), 36 (16), 40 (256)
-> 2 bytes QUICKTIME video color table id = short integer value
(no table = -1)
-> optional QUICKTIME color table data if above set to 0
(see color table atom below for layout)
OR
-> 4 bytes description length = long unsigned length
-> 4 bytes description audio format = long ASCII text string 'mp4a'
- if encrypted to ISO/IEC 14496-12 or 3GPP standards then use:
-> 4 bytes description audio format = long ASCII text string 'enca'
- if encoded to 3GPP GSM 6.10 AMR narrowband standards then use:
-> 4 bytes description audio format = long ASCII text string 'samr'
- if encoded to 3GPP GSM 6.10 AMR wideband standards then use:
-> 4 bytes description audio format = long ASCII text string 'sawb'
-> 6 bytes reserved = 48-bit value set to zero
-> 2 bytes data reference index
= short unsigned index from 'dref' box
-> 2 bytes QUICKTIME audio encoding version = short hex version
- default = 0 ; audio data size before decompression = 1
-> 2 bytes QUICKTIME audio encoding revision level
= short hex version
- default = 0 ; video can revise this value
-> 4 bytes QUICKTIME audio encoding vendor
= long ASCII text string
- default = 0
-> 2 bytes audio channels = short unsigned count
(mono = 1 ; stereo = 2)
-> 2 bytes audio sample size = short unsigned value
(8 or 16)
-> 2 bytes QUICKTIME audio compression id = short integer value
- default = 0
-> 2 bytes QUICKTIME audio packet size = short value set to zero
-> 4 bytes audio sample rate = long unsigned fixed point rate
OR
-> 4 bytes description length = long unsigned length
-> 4 bytes description system format = long ASCII text string 'mp4s'
- if encrypted to ISO/IEC 14496-12 standards then use:
-> 4 bytes description system format = long ASCII text string 'encs'
-> 6 bytes reserved = 48-bit value set to zero
-> 2 bytes data reference index
= short unsigned index from 'dref' box
* 8+ bytes ISO/IEC 14496-12/3GPP encryption scheme info box
= long unsigned offset + long ASCII text string 'sinf'
- if stream encrypted to ISO/IEC 14496-12 standards
* 8+ bytes ISO/IEC 14496-12/3GPP/QUICKTIME original format box
= long unsigned offset + long ASCII text string 'frma'
-> 4 bytes description format = long ASCII text string
- formats are MPEG-4 visual = 'mp4v' ; MPEG-4 AVC = 'avc1'
- formats are MPEG-4 audio = 'mp4a' ; MPEG-4 system = 'mp4s'
- 3GPP formats are H.263 = 's263' ; AMR narrow = 'samr'
- 3GPP format is AMR wide = 'sawb'
* 8+ bytes optional ISO/IEC 14496-12 IPMP info box
= long unsigned offset + long ASCII text string 'imif'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> IPMP descriptors = hex dump from IPMP part of ES Descriptor box
* 8+ bytes optional ISO/IEC 14496-12/3GPP scheme type box
= long unsigned offset + long ASCII text string 'schm'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0 ; contains URI if flags = 0x000001)
-> 4 bytes encryption type = long ASCII text string
- types are 128-bit AES counter = 'ACM1' ; 128-bit AES FS = 'AFS1'
- types are NULL algorithm = 'ENUL' ; 160-bit HMAC-SHA-1 = 'SHM2'
- types are RTCP = 'ANUL' ; private scheme = ' '
-> 2 bytes encryption version = short unsigned version
-> optional scheme URI string = UTF-8 text string
(eg. web site)
-> 1 byte optional scheme URI string end = byte padding set to zero
* 8+ bytes ISO/IEC 14496-12/3GPP scheme data box
= long unsigned offset + long ASCII text string 'schi'
-> encryption related key = hex dump
* 8+ bytes 3GPP H.263v1 decode config box
= long unsigned offset + long ASCII text string 'd263'
-> 4 bytes encoder vendor = long ASCII text string
-> 1 byte encoder version = 8-bit unsigned revision
-> 1 byte H.263 level = 8-bit unsigned stream level
-> 1 byte H.263 profile = 8-bit unsigned stream profile
* 8+ bytes optional 3GPP H.263v1 bit rate box
= long unsigned offset + long ASCII text string 'bitr'
-> 4 bytes average bit rate = 32-bit unsigned value
-> 4 bytes maximum bit rate = 32-bit unsigned value
* 8+ bytes 3GPP GSM 6.10 AMR decode config box
= long unsigned offset + long ASCII text string 'damr'
-> 4 bytes encoder vendor = long ASCII text string
-> 1 byte encoder version = 8-bit unsigned revision
-> 2 bytes packet modes = 16-bit unsigned bit mode index
-> 1 byte number of packet mode changes = 8-bit unsigned value
-> 1 byte samples per packet = 8-bit unsigned value
* 8+ bytes ISO/IEC 14496-10 or 3GPP AVC decode config box
= long unsigned offset + long ASCII text string 'avcC'
-> 1 byte version = 8-bit hex version (current = 1)
-> 1 byte H.264 profile = 8-bit unsigned stream profile
-> 1 byte H.264 compatible profiles = 8-bit hex flags
-> 1 byte H.264 level = 8-bit unsigned stream level
-> 1 1/2 nibble reserved = 6-bit unsigned value set to 63
-> 1/2 nibble NAL length = 2-bit length byte size type
- 1 byte = 0 ; 2 bytes = 1 ; 4 bytes = 3
-> 1 byte number of SPS = 3-bit reserved value set to 7 + 5-bit unsigned total
-> 2+ bytes SPS length = short unsigned length
-> + SPS NAL unit = hexdump
-> 1 byte number of PPS = 8-bit unsigned total
-> 2+ bytes PPS length = short unsigned length
-> + PPS NAL unit = hexdump
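- note: a Python sketch of reading the AVC decode config body described above.
It assumes the 8-byte size/type header is already removed. Following ISO/IEC
14496-15, it masks the SPS count to its low 5 bits (the upper 3 bits are
reserved); that masking is an assumption beyond the field list above. The name
parse_avcc is hypothetical.

```python
import struct

def parse_avcc(payload: bytes) -> dict:
    """Parse an 'avcC' body into profile/level info plus SPS and PPS lists."""
    version, profile, compat, level, len_byte, sps_byte = struct.unpack(
        ">6B", payload[:6]
    )
    pos = 6

    def read_units(count: int, pos: int):
        units = []
        for _ in range(count):
            (n,) = struct.unpack(">H", payload[pos:pos + 2])
            units.append(payload[pos + 2:pos + 2 + n])
            pos += 2 + n
        return units, pos

    sps, pos = read_units(sps_byte & 0x1F, pos)   # low 5 bits = SPS count
    num_pps = payload[pos]
    pps, pos = read_units(num_pps, pos + 1)
    return {
        "version": version,
        "profile": profile,
        "compatible_profiles": compat,
        "level": level,
        # Stored value 0, 1 or 3 means 1, 2 or 4 bytes per NAL length prefix.
        "nal_length_size": (len_byte & 0x03) + 1,
        "sps": sps,
        "pps": pps,
    }
```

The nal_length_size result tells a demuxer how many bytes precede each NAL
unit inside the sample data.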
* 8+ bytes vers. 2 ES Descriptor box
= long unsigned offset + long ASCII text string 'esds'
- if encoded to ISO/IEC 14496-10 AVC standards then optionally use:
= long unsigned offset + long ASCII text string 'm4ds'
-> 4 bytes version/flags = 8-bit hex version + 24-bit hex flags
(current = 0)
-> 1 byte ES descriptor type tag = 8-bit hex value 0x03
-> 3 bytes extended descriptor type tag string = 3 * 8-bit hex value
- types are Start = 0x80 ; End = 0xFE
- NOTE: the extended start tags may be left out
-> 1 byte descriptor type length = 8-bit unsigned length
-> 2 bytes ES ID = 16-bit unsigned value
-> 1 byte stream priority = 8-bit unsigned value
- Defaults to 16 and ranges from 0 through to 31
-> 1 byte decoder config descriptor type tag = 8-bit hex value 0x04
-> 3 bytes extended descriptor type tag string = 3 * 8-bit hex value
- types are Start = 0x80 ; End = 0xFE
- NOTE: the extended start tags may be left out
-> 1 byte descriptor type length = 8-bit unsigned length
-> 1 byte object type ID = 8-bit unsigned value
- type IDs are system v1 = 1 ; system v2 = 2
- type IDs are MPEG-4 video = 32 ; MPEG-4 AVC SPS = 33
- type IDs are MPEG-4 AVC PPS = 34 ; MPEG-4 audio = 64
- type IDs are MPEG-2 simple video = 96
- type IDs are MPEG-2 main video = 97
- type IDs are MPEG-2 SNR video = 98
- type IDs are MPEG-2 spatial video = 99
- type IDs are MPEG-2 high video = 100
- type IDs are MPEG-2 4:2:2 video = 101
- type IDs are MPEG-4 ADTS main = 102
- type IDs are MPEG-4 ADTS Low Complexity = 103
- type IDs are MPEG-4 ADTS Scalable Sampling Rate = 104
- type IDs are MPEG-2 ADTS = 105 ; MPEG-1 video = 106
- type IDs are MPEG-1 ADTS = 107 ; JPEG video = 108
- type IDs are private audio = 192 ; private video = 208
- type IDs are 16-bit PCM LE audio = 224 ; vorbis audio = 225
- type IDs are dolby v3 (AC3) audio = 226 ; alaw audio = 227
- type IDs are mulaw audio = 228 ; G723 ADPCM audio = 229
- type IDs are 16-bit PCM Big Endian audio = 230
- type IDs are Y'CbCr 4:2:0 (YV12) video = 240 ; H264 video = 241
- type IDs are H263 video = 242 ; H261 video = 243
-> 6 bits stream type = 3/4 byte hex value
- type IDs are object descript. = 1 ; clock ref. = 2
- type IDs are scene descript. = 3 ; visual = 4
- type IDs are audio = 5 ; MPEG-7 = 6 ; IPMP = 7
- type IDs are OCI = 8 ; MPEG Java = 9
- type IDs are user private = 32
-> 1 bit upstream flag = 1/8 byte hex value
-> 1 bit reserved flag = 1/8 byte hex value set to 1
-> 3 bytes buffer size = 24-bit unsigned value
-> 4 bytes maximum bit rate = 32-bit unsigned value
-> 4 bytes average bit rate = 32-bit unsigned value
-> 1 byte decoder specific descriptor type tag
= 8-bit hex value 0x05
-> 3 bytes extended descriptor type tag string
= 3 * 8-bit hex value
- types are Start = 0x80 ; End = 0xFE
- NOTE: the extended start tags may be left out
-> 1 byte descriptor type length
= 8-bit unsigned length
-> ES header start codes = hex dump
-> 1 byte SL config descriptor type tag = 8-bit hex value 0x06
-> 3 bytes extended descriptor type tag string = 3 * 8-bit hex value
- types are Start = 0x80 ; End = 0xFE
- NOTE: the extended start tags may be left out
-> 1 byte descriptor type length = 8-bit unsigned length
-> 1 byte SL value = 8-bit hex value set to 0x02
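- note: the optional extended tag bytes make descriptor headers variable in
size. The Python sketch below reads one descriptor tag plus its length under
the assumption that the length uses the usual MPEG-4 expandable encoding
(7 value bits per byte, high bit set = another byte follows), which covers the
common 0x80 0x80 0x80 prefix. It does not give special treatment to 0xFE. A
second helper splits the combined stream type byte. Both function names are
hypothetical.

```python
def read_descriptor_header(buf: bytes, pos: int = 0):
    """Return (tag, length, position of the first byte after the header)."""
    tag = buf[pos]
    pos += 1
    length = 0
    for _ in range(4):                 # at most 4 length bytes
        byte = buf[pos]
        pos += 1
        length = (length << 7) | (byte & 0x7F)
        if not byte & 0x80:            # high bit clear = last length byte
            break
    return tag, length, pos

def split_stream_type(byte: int):
    """Split the byte holding 6-bit stream type, upstream and reserved bits."""
    return byte >> 2, (byte >> 1) & 1, byte & 1
```

Calling read_descriptor_header repeatedly, starting after the version/flags
field, steps through the 0x03, 0x04, 0x05 and 0x06 descriptors in turn once
the fixed fields between them are skipped.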
* 8+ bytes QUICKTIME video gamma atom
= long unsigned offset + long ASCII text string 'gama'
-> 4 bytes decimal level = long fixed point level
* 8+ bytes QUICKTIME video field order atom
= long unsigned offset + long ASCII text string 'fiel'
-> 2 bytes field count/order = byte integer total + byte integer order
* 8+ bytes QUICKTIME video m-jpeg quantize table atom
= long unsigned offset + long ASCII text string 'mjqt'
-> quantization table = hex dump
* 8+ bytes QUICKTIME video m-jpeg huffman table atom
= long unsigned offset + long ASCII text string 'mjht'
-> huffman table = hex dump
* 8+ bytes time to sample (frame timing) box
= long unsigned offset + long ASCII text string 'stts'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> 4 bytes number of times = long unsigned total
-> 8+ bytes time per frame
= long unsigned frame count + long unsigned duration
- multiple durations means variable framing rate
- single duration means fixed framing rate
- calculate framing (fps): media time scale / (average) duration
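- note: the frame rate calculation above can be sketched in Python as follows.
It assumes the 'stts' body is passed without its 8-byte size/type header and
that the media time scale has been read elsewhere (it is stored in the media
header box, not in 'stts'). Function names are hypothetical.

```python
import struct

def parse_stts(payload: bytes) -> list:
    """Return a list of (frame count, duration) pairs from an 'stts' body."""
    _version_flags, count = struct.unpack(">II", payload[:8])
    return [
        struct.unpack(">II", payload[8 + 8 * i:16 + 8 * i])
        for i in range(count)
    ]

def average_fps(entries: list, timescale: int) -> float:
    """Average frames per second = time scale / average frame duration."""
    total_frames = sum(count for count, _ in entries)
    total_duration = sum(count * duration for count, duration in entries)
    if total_duration == 0:
        return 0.0
    return timescale * total_frames / total_duration
```

With a single entry the result is the fixed frame rate; with several entries
it is the average over the whole track.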
* 8+ bytes optional sync sample (key/intra frame) box
= long unsigned offset + long ASCII text string 'stss'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> 4 bytes number of key frames = long unsigned total
-> 4+ bytes key/intra frame location = long unsigned sample number
- key/intra frame location is given as a sample (frame) number counted from 1
* 8+ bytes sample/framing to chunk/block box
= long unsigned offset + long ASCII text string 'stsc'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> 4 bytes number of blocks = long unsigned total
-> 8+ bytes frames per block
= long unsigned first/next block + long unsigned # of frames
-> 4+ bytes samples description id
= long unsigned description number
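- note: each 'stsc' entry applies from its first block up to the block before
the next entry's first block, so the table has to be expanded to get a count
per block. A Python sketch, assuming the entries are already unpacked as
(first block, frames per block, description id) tuples and that the total
number of blocks is known from the chunk offset box. The function name is
hypothetical.

```python
def samples_per_chunk(stsc_entries: list, chunk_count: int) -> list:
    """Expand run-length 'stsc' entries into one frame count per block."""
    counts = []
    for i, (first_chunk, n_samples, _desc_id) in enumerate(stsc_entries):
        if i + 1 < len(stsc_entries):
            next_first = stsc_entries[i + 1][0]
        else:
            next_first = chunk_count + 1   # last run extends to the final block
        counts.extend([n_samples] * (next_first - first_chunk))
    return counts
```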
* 8+ bytes sample (block byte) size box
= long unsigned offset + long ASCII text string 'stsz'
-> 4 bytes version/flags = byte hex version + 24-bit hex flags
(current = 0)
-> 4 bytes block byte size for all = 32-bit integer byte value
(different sizes = 0)
-> 4 bytes number of block sizes = long unsigned total
-> 4+ bytes block byte sizes = long unsigned byte values
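- note: a Python sketch of reading the sample size box above, assuming the body
is passed without its 8-byte size/type header. When the "size for all" field
is non-zero no per-sample table follows, so the sketch repeats that value.
The name parse_stsz is hypothetical.

```python
import struct

def parse_stsz(payload: bytes) -> list:
    """Return the byte size of every sample described by an 'stsz' body."""
    _version_flags, uniform_size, count = struct.unpack(">III", payload[:12])
    if uniform_size != 0:
        return [uniform_size] * count      # all samples share one size
    return list(struct.unpack(">%dI" % count, payload[12:12 + 4 * count]))
```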
* 8+ bytes chunk/block offset box