Skip to content

Commit

Permalink
A whole bunch of CFF fixes
Browse files Browse the repository at this point in the history
## Corrupted CFF index data

There was a subtle bug in CFF Index implementation that resulted in
a data corruption. In certain circumstances some items didn't get
properly encoded. This happened when items were not previously accessed.

This resulted, for instance, in missing glyphs. But only sometimes
because indexes might've still contain data that shouldn't've been
there. In combination with incorrect encoding (see further) this
resulted in some glyphs still being rendered, sometimes even correctly.

Along with the fix a rather large API change landed. This resulted in
quite a big diff.

## Incorrect CFF encoding in subsets

TTFunk used to reuse encoding from the original font. This mapping was
incorrect for subset fonts which used not just a subset of glyphs but
also a different encoding.

A separate issue was that some fonts have empty CFF encoding. This
incorrect mapping resulted in encoding that mapped all codes to glyph 0.

This had impact on Prawn in particular. PDF spec explicitly says that
CFF encoding is not to be used in OpenType fonts. `cmap` table should
directly index charstrings in the CFF table. Despite this PDF renderers
still use CFF encoding to retrieve glyphs. So TTFunk has to discard the
original CFF encoding and supply its own.
  • Loading branch information
pointlessone committed Dec 6, 2023
1 parent acddf19 commit 08ba10e
Show file tree
Hide file tree
Showing 26 changed files with 401 additions and 232 deletions.
35 changes: 35 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,41 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/).

## [Unreleased]

### Fixed

* Corrupted CFF index data

there was a subtle bug in cff index implementation that resulted in
a data corruption. in certain circumstances some items didn't get
properly encoded. this happened when items were not previously accessed.

this resulted, for instance, in missing glyphs. but only sometimes
because indexes might've still contain data that shouldn't've been
there. in combination with incorrect encoding (see further) this
resulted in some glyphs still being rendered, sometimes even correctly.

along with the fix a rather large api change landed. this resulted in
quite a big diff.

Alexander Mankuta

* Incorrect CFF encoding in subsets

TTFunk used to reuse encoding from the original font. This mapping was
incorrect for subset fonts which used not just a subset of glyphs but
also a different encoding.

A separate issue was that some fonts have empty CFF encoding. This
incorrect mapping resulted in encoding that mapped all codes to glyph 0.

This had impact on Prawn in particular. PDF spec explicitly says that
CFF encoding is not to be used in OpenType fonts. `cmap` table should
directly index charstrings in the CFF table. Despite this PDF renderers
still use CFF encoding to retrieve glyphs. So TTFunk has to discard the
original CFF encoding and supply its own.

Alexander Mankuta

## 1.7.0

### Changes
Expand Down
11 changes: 1 addition & 10 deletions lib/ttfunk/otf_encoder.rb
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ def base_table
end

def cff_table
@cff_table ||= original.cff.encode(new_to_old_glyph, old_to_new_glyph)
@cff_table ||= original.cff.encode(subset)
end

def vorg_table
Expand All @@ -48,14 +48,5 @@ def optimal_table_order
(tables.keys - ['DSIG'] - OPTIMAL_TABLE_ORDER) +
['DSIG']
end

def collect_glyphs(glyph_ids)
# CFF top indexes are supposed to contain only one font, although they're
# capable of supporting many (no idea why this is true, maybe for CFF
# v2??). Anyway it's cool to do top_index[0], don't worry about it.
glyph_ids.each_with_object({}) do |id, h|
h[id] = original.cff.top_index[0].charstrings_index[id]
end
end
end
end
1 change: 1 addition & 0 deletions lib/ttfunk/subset/code_page.rb
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ def initialize(original, code_page, encoding)

def to_unicode_map
self.class.unicode_mapping_for(encoding)
.select { |codepoint, _unicode| @subset[codepoint] }
end

def use(character)
Expand Down
12 changes: 6 additions & 6 deletions lib/ttfunk/table/cff.rb
Original file line number Diff line number Diff line change
Expand Up @@ -31,18 +31,18 @@ def tag
TAG
end

def encode(new_to_old, old_to_new)
def encode(subset)
EncodedString.new do |result|
sub_tables = [
result.concat(
header.encode,
name_index.encode,
top_index.encode(&:encode),
top_index.encode,
string_index.encode,
global_subr_index.encode
]
)

sub_tables.each { |tb| result << tb }
top_index[0].finalize(result, new_to_old, old_to_new)
charmap = subset.new_cmap_table[:charmap]
top_index[0].finalize(result, charmap)
end
end

Expand Down
31 changes: 18 additions & 13 deletions lib/ttfunk/table/cff/charset.rb
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ def strings_for_charset_id(charset_id)
end

attr_reader :entries, :length
attr_reader :top_dict, :format, :count, :offset_or_id
attr_reader :top_dict, :format, :items_count, :offset_or_id

def initialize(top_dict, file, offset_or_id = nil, length = nil)
@top_dict = top_dict
Expand All @@ -44,15 +44,15 @@ def initialize(top_dict, file, offset_or_id = nil, length = nil)
if offset
super(file, offset, length)
else
@count = self.class.strings_for_charset_id(offset_or_id).size
@items_count = self.class.strings_for_charset_id(offset_or_id).size
end
end

def each
return to_enum(__method__) unless block_given?

# +1 adjusts for the implicit .notdef glyph
(count + 1).times { |i| yield self[i] }
(items_count + 1).times { |i| yield self[i] }
end

def [](glyph_id)
Expand All @@ -73,13 +73,18 @@ def offset
end
end

# mapping is new -> old glyph ids
def encode(mapping)
def encode(charmap)
# no offset means no charset was specified (i.e. we're supposed to
# use a predefined charset) so there's nothing to encode
return '' unless offset

sids = mapping.keys.sort.map { |new_gid| sid_for(mapping[new_gid]) }
sids =
charmap
.values
.reject { |mapping| mapping[:new].zero? }
.sort_by { |mapping| mapping[:new] }
.map { |mapping| sid_for(mapping[:old]) }

ranges = TTFunk::BinUtils.rangify(sids)
range_max = ranges.map(&:last).max

Expand Down Expand Up @@ -138,7 +143,7 @@ def find_string(sid)

idx = sid - 390

if idx < file.cff.string_index.count
if idx < file.cff.string_index.items_count
file.cff.string_index[idx]
end
else
Expand All @@ -153,23 +158,23 @@ def parse!

case format_sym
when :array_format
@count = top_dict.charstrings_index.count - 1
@length = count * element_width
@items_count = top_dict.charstrings_index.items_count - 1
@length = @items_count * element_width
@entries = OneBasedArray.new(read(length, 'n*'))

when :range_format8, :range_format16
# The number of ranges is not explicitly specified in the font.
# Instead, software utilizing this data simply processes ranges
# until all glyphs in the font are covered.
@count = 0
@items_count = 0
@entries = []
@length = 0

until count >= top_dict.charstrings_index.count - 1
until @items_count >= top_dict.charstrings_index.items_count - 1
@length += 1 + element_width
sid, num_left = read(element_width, element_format)
entries << (sid..(sid + num_left))
@count += num_left + 1
@entries << (sid..(sid + num_left))
@items_count += num_left + 1
end
end
end
Expand Down
4 changes: 0 additions & 4 deletions lib/ttfunk/table/cff/charstring.rb
Original file line number Diff line number Diff line change
Expand Up @@ -91,10 +91,6 @@ def render(x: 0, y: 0, font_size: 72)
)
end

def encode
raw
end

private

def parse!
Expand Down
18 changes: 9 additions & 9 deletions lib/ttfunk/table/cff/charstrings_index.rb
Original file line number Diff line number Diff line change
Expand Up @@ -11,21 +11,21 @@ def initialize(top_dict, *remaining_args)
@top_dict = top_dict
end

def [](index)
entry_cache[index] ||= TTFunk::Table::Cff::Charstring.new(
private

def decode_item(index, _offset, _length)
TTFunk::Table::Cff::Charstring.new(
index, top_dict, font_dict_for(index), super
)
end

# gets passed a mapping of new => old glyph ids
def encode(mapping)
super() do |_entry, index|
self[mapping[index]].encode if mapping.include?(index)
end
def encode_items(charmap)
charmap
.reject { |code, mapping| mapping[:new].zero? && !code.zero? }
.sort_by { |_code, mapping| mapping[:new] }
.map { |(_code, mapping)| items[mapping[:old]] }
end

private

def font_dict_for(index)
# only CID-keyed fonts contain an FD selector and font dicts
if top_dict.is_cid_font?
Expand Down
46 changes: 24 additions & 22 deletions lib/ttfunk/table/cff/encoding.rb
Original file line number Diff line number Diff line change
Expand Up @@ -22,24 +22,26 @@ def codes_for_encoding_id(encoding_id)
end
end

attr_reader :top_dict, :format, :count, :offset_or_id
attr_reader :top_dict, :format, :items_count, :offset_or_id

def initialize(top_dict, file, offset_or_id = nil, length = nil)
@top_dict = top_dict
@offset_or_id = offset_or_id || DEFAULT_ENCODING_ID

if offset
super(file, offset, length)
@supplemental = format >> 7 == 1
else
@count = self.class.codes_for_encoding_id(offset_or_id).size
@items_count = self.class.codes_for_encoding_id(offset_or_id).size
@supplemental = false
end
end

def each
return to_enum(__method__) unless block_given?

# +1 adjusts for the implicit .notdef glyph
(count + 1).times { |i| yield self[i] }
(items_count + 1).times { |i| yield self[i] }
end

def [](glyph_id)
Expand All @@ -62,16 +64,18 @@ def offset
end
end

def encode(new_to_old, old_to_new)
# no offset means no encoding was specified (i.e. we're supposed to
# use a predefined encoding) so there's nothing to encode
return '' unless offset
return encode_supplemental(new_to_old, old_to_new) if supplemental?
def encode(charmap)
# Any subset encoding is all but guaranteed to be different from the
# standard encoding so we don't even attempt to see if it matches. We
# assume it's different and just encode it anew.

return encode_supplemental(charmap) if supplemental?

codes =
new_to_old.keys.sort.map do |new_gid|
code_for(new_to_old[new_gid])
end
charmap
.reject { |_code, mapping| mapping[:new].zero? }
.sort_by { |_code, mapping| mapping[:new] }
.map { |(code, _m)| code }

ranges = TTFunk::BinUtils.rangify(codes)

Expand All @@ -95,18 +99,16 @@ def encode(new_to_old, old_to_new)

def supplemental?
# high-order bit set to 1 indicates supplemental encoding
@format >> 7 == 1
@supplemental
end

private

def encode_supplemental(_new_to_old, old_to_new)
def encode_supplemental(charmap)
new_entries =
@entries.each_with_object({}) do |(code, old_gid), ret|
if (new_gid = old_to_new[old_gid])
ret[code] = new_gid
end
end
charmap
.reject { |_code, mapping| mapping[:new].zero? }
.transform_values { |mapping| mapping[:new] }

result = [format_int(:supplemental), new_entries.size].pack('CC')
fmt = element_format(:supplemental)
Expand Down Expand Up @@ -150,22 +152,22 @@ def parse!

case format_sym
when :array_format
@count = entry_count
@items_count = entry_count
@entries = OneBasedArray.new(read(length, 'C*'))

when :range_format
@entries = []
@count = 0
@items_count = 0

entry_count.times do
code, num_left = read(element_width, element_format)
@entries << (code..(code + num_left))
@count += num_left + 1
@items_count += num_left + 1
end

when :supplemental
@entries = {}
@count = entry_count
@items_count = entry_count

entry_count.times do
code, glyph = read(element_width, element_format)
Expand Down
20 changes: 10 additions & 10 deletions lib/ttfunk/table/cff/fd_selector.rb
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ class FdSelector < TTFunk::SubTable
RANGE_ENTRY_SIZE = 3
ARRAY_ENTRY_SIZE = 1

attr_reader :top_dict, :count, :entries, :n_glyphs
attr_reader :top_dict, :items_count, :entries, :n_glyphs

def initialize(top_dict, file, offset, length = nil)
@top_dict = top_dict
Expand Down Expand Up @@ -48,16 +48,16 @@ def [](glyph_id)
def each
return to_enum(__method__) unless block_given?

count.times { |i| yield self[i] }
items_count.times { |i| yield self[i] }
end

# mapping is new -> old glyph ids
def encode(mapping)
def encode(charmap)
# get list of [new_gid, fd_index] pairs
new_indices =
mapping.keys.sort.map do |new_gid|
[new_gid, self[mapping[new_gid]]]
end
charmap
.reject { |code, mapping| mapping[:new].zero? && !code.zero? }
.sort_by { |_code, mapping| mapping[:new] }
.map { |(_code, mapping)| [mapping[:new], self[mapping[:old]]] }

ranges = rangify_gids(new_indices)
total_range_size = ranges.size * RANGE_ENTRY_SIZE
Expand Down Expand Up @@ -108,10 +108,10 @@ def parse!

case format_sym
when :array_format
@n_glyphs = top_dict.charstrings_index.count
@n_glyphs = top_dict.charstrings_index.items_count
data = io.read(n_glyphs)
@length += data.bytesize
@count = data.bytesize
@items_count = data.bytesize
@entries = data.bytes

when :range_format
Expand All @@ -135,7 +135,7 @@ def parse!
last_start_gid, last_fd_index = ranges.last
@entries << [(last_start_gid...(n_glyphs + 1)), last_fd_index]

@count = entries.reduce(0) { |sum, entry| sum + entry.first.size }
@items_count = entries.reduce(0) { |sum, entry| sum + entry.first.size }
end
end

Expand Down
Loading

0 comments on commit 08ba10e

Please sign in to comment.