A whole bunch of CFF fixes

## Corrupted CFF index data There was a subtle bug in CFF Index implementation that resulted in a data corruption. In certain circumstances some items didn't get properly encoded. This happened when items were not previously accessed. This resulted, for instance, in missing glyphs. But only sometimes because indexes might've still contain data that shouldn't've been there. In combination with incorrect encoding (see further) this resulted in some glyphs still being rendered, sometimes even correctly. Along with the fix a rather large API change landed. This resulted in quite a big diff. ## Incorrect CFF encoding in subsets TTFunk used to reuse encoding from the original font. This mapping was incorrect for subset fonts which used not just a subset of glyphs but also a different encoding. A separate issue was that some fonts have empty CFF encoding. This incorrect mapping resulted in encoding that mapped all codes to glyph 0. This had impact on Prawn in particular. PDF spec explicitly says that CFF encoding is not to be used in OpenType fonts. `cmap` table should directly index charstrings in the CFF table. Despite this PDF renderers still use CFF encoding to retrieve glyphs. So TTFunk has to discard the original CFF encoding and supply its own.
prawnpdf · Dec 6, 2023 · 08ba10e · 08ba10e
1 parent acddf19
commit 08ba10e
Show file tree

Hide file tree

Showing 26 changed files with 401 additions and 232 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,41 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/).
 
 ## [Unreleased]
 
+### Fixed
+
+* Corrupted CFF index data
+
+  there was a subtle bug in cff index implementation that resulted in
+  a data corruption. in certain circumstances some items didn't get
+  properly encoded. this happened when items were not previously accessed.
+
+  this resulted, for instance, in missing glyphs. but only sometimes
+  because indexes might've still contain data that shouldn't've been
+  there. in combination with incorrect encoding (see further) this
+  resulted in some glyphs still being rendered, sometimes even correctly.
+
+  along with the fix a rather large api change landed. this resulted in
+  quite a big diff.
+
+  Alexander Mankuta
+
+* Incorrect CFF encoding in subsets
+
+  TTFunk used to reuse encoding from the original font. This mapping was
+  incorrect for subset fonts which used not just a subset of glyphs but
+  also a different encoding.
+
+  A separate issue was that some fonts have empty CFF encoding. This
+  incorrect mapping resulted in encoding that mapped all codes to glyph 0.
+
+  This had impact on Prawn in particular. PDF spec explicitly says that
+  CFF encoding is not to be used in OpenType fonts. `cmap` table should
+  directly index charstrings in the CFF table. Despite this PDF renderers
+  still use CFF encoding to retrieve glyphs. So TTFunk has to discard the
+  original CFF encoding and supply its own.
+
+  Alexander Mankuta
+
 ## 1.7.0
 
 ### Changes

diff --git a/lib/ttfunk/otf_encoder.rb b/lib/ttfunk/otf_encoder.rb
@@ -27,7 +27,7 @@ def base_table
     end
 
     def cff_table
-      @cff_table ||= original.cff.encode(new_to_old_glyph, old_to_new_glyph)
+      @cff_table ||= original.cff.encode(subset)
     end
 
     def vorg_table
@@ -48,14 +48,5 @@ def optimal_table_order
         (tables.keys - ['DSIG'] - OPTIMAL_TABLE_ORDER) +
         ['DSIG']
     end
-
-    def collect_glyphs(glyph_ids)
-      # CFF top indexes are supposed to contain only one font, although they're
-      # capable of supporting many (no idea why this is true, maybe for CFF
-      # v2??). Anyway it's cool to do top_index[0], don't worry about it.
-      glyph_ids.each_with_object({}) do |id, h|
-        h[id] = original.cff.top_index[0].charstrings_index[id]
-      end
-    end
   end
 end
diff --git a/lib/ttfunk/subset/code_page.rb b/lib/ttfunk/subset/code_page.rb
@@ -40,6 +40,7 @@ def initialize(original, code_page, encoding)
 
       def to_unicode_map
         self.class.unicode_mapping_for(encoding)
+          .select { |codepoint, _unicode| @subset[codepoint] }
       end
 
       def use(character)

diff --git a/lib/ttfunk/table/cff.rb b/lib/ttfunk/table/cff.rb
@@ -31,18 +31,18 @@ def tag
         TAG
       end
 
-      def encode(new_to_old, old_to_new)
+      def encode(subset)
         EncodedString.new do |result|
-          sub_tables = [
+          result.concat(
             header.encode,
             name_index.encode,
-            top_index.encode(&:encode),
+            top_index.encode,
             string_index.encode,
             global_subr_index.encode
-          ]
+          )
 
-          sub_tables.each { |tb| result << tb }
-          top_index[0].finalize(result, new_to_old, old_to_new)
+          charmap = subset.new_cmap_table[:charmap]
+          top_index[0].finalize(result, charmap)
         end
       end
 

diff --git a/lib/ttfunk/table/cff/charset.rb b/lib/ttfunk/table/cff/charset.rb
@@ -35,7 +35,7 @@ def strings_for_charset_id(charset_id)
         end
 
         attr_reader :entries, :length
-        attr_reader :top_dict, :format, :count, :offset_or_id
+        attr_reader :top_dict, :format, :items_count, :offset_or_id
 
         def initialize(top_dict, file, offset_or_id = nil, length = nil)
           @top_dict = top_dict
@@ -44,15 +44,15 @@ def initialize(top_dict, file, offset_or_id = nil, length = nil)
           if offset
             super(file, offset, length)
           else
-            @count = self.class.strings_for_charset_id(offset_or_id).size
+            @items_count = self.class.strings_for_charset_id(offset_or_id).size
           end
         end
 
         def each
           return to_enum(__method__) unless block_given?
 
           # +1 adjusts for the implicit .notdef glyph
-          (count + 1).times { |i| yield self[i] }
+          (items_count + 1).times { |i| yield self[i] }
         end
 
         def [](glyph_id)
@@ -73,13 +73,18 @@ def offset
           end
         end
 
-        # mapping is new -> old glyph ids
-        def encode(mapping)
+        def encode(charmap)
           # no offset means no charset was specified (i.e. we're supposed to
           # use a predefined charset) so there's nothing to encode
           return '' unless offset
 
-          sids = mapping.keys.sort.map { |new_gid| sid_for(mapping[new_gid]) }
+          sids =
+            charmap
+              .values
+              .reject { |mapping| mapping[:new].zero? }
+              .sort_by { |mapping| mapping[:new] }
+              .map { |mapping| sid_for(mapping[:old]) }
+
           ranges = TTFunk::BinUtils.rangify(sids)
           range_max = ranges.map(&:last).max
 
@@ -138,7 +143,7 @@ def find_string(sid)
 
             idx = sid - 390
 
-            if idx < file.cff.string_index.count
+            if idx < file.cff.string_index.items_count
               file.cff.string_index[idx]
             end
           else
@@ -153,23 +158,23 @@ def parse!
 
           case format_sym
           when :array_format
-            @count = top_dict.charstrings_index.count - 1
-            @length = count * element_width
+            @items_count = top_dict.charstrings_index.items_count - 1
+            @length = @items_count * element_width
             @entries = OneBasedArray.new(read(length, 'n*'))
 
           when :range_format8, :range_format16
             # The number of ranges is not explicitly specified in the font.
             # Instead, software utilizing this data simply processes ranges
             # until all glyphs in the font are covered.
-            @count = 0
+            @items_count = 0
             @entries = []
             @length = 0
 
-            until count >= top_dict.charstrings_index.count - 1
+            until @items_count >= top_dict.charstrings_index.items_count - 1
               @length += 1 + element_width
               sid, num_left = read(element_width, element_format)
-              entries << (sid..(sid + num_left))
-              @count += num_left + 1
+              @entries << (sid..(sid + num_left))
+              @items_count += num_left + 1
             end
           end
         end

diff --git a/lib/ttfunk/table/cff/charstring.rb b/lib/ttfunk/table/cff/charstring.rb
@@ -91,10 +91,6 @@ def render(x: 0, y: 0, font_size: 72)
           )
         end
 
-        def encode
-          raw
-        end
-
         private
 
         def parse!

diff --git a/lib/ttfunk/table/cff/charstrings_index.rb b/lib/ttfunk/table/cff/charstrings_index.rb
@@ -11,21 +11,21 @@ def initialize(top_dict, *remaining_args)
           @top_dict = top_dict
         end
 
-        def [](index)
-          entry_cache[index] ||= TTFunk::Table::Cff::Charstring.new(
+        private
+
+        def decode_item(index, _offset, _length)
+          TTFunk::Table::Cff::Charstring.new(
             index, top_dict, font_dict_for(index), super
           )
         end
 
-        # gets passed a mapping of new => old glyph ids
-        def encode(mapping)
-          super() do |_entry, index|
-            self[mapping[index]].encode if mapping.include?(index)
-          end
+        def encode_items(charmap)
+          charmap
+            .reject { |code, mapping| mapping[:new].zero? && !code.zero? }
+            .sort_by { |_code, mapping| mapping[:new] }
+            .map { |(_code, mapping)| items[mapping[:old]] }
         end
 
-        private
-
         def font_dict_for(index)
           # only CID-keyed fonts contain an FD selector and font dicts
           if top_dict.is_cid_font?

diff --git a/lib/ttfunk/table/cff/encoding.rb b/lib/ttfunk/table/cff/encoding.rb
@@ -22,24 +22,26 @@ def codes_for_encoding_id(encoding_id)
           end
         end
 
-        attr_reader :top_dict, :format, :count, :offset_or_id
+        attr_reader :top_dict, :format, :items_count, :offset_or_id
 
         def initialize(top_dict, file, offset_or_id = nil, length = nil)
           @top_dict = top_dict
           @offset_or_id = offset_or_id || DEFAULT_ENCODING_ID
 
           if offset
             super(file, offset, length)
+            @supplemental = format >> 7 == 1
           else
-            @count = self.class.codes_for_encoding_id(offset_or_id).size
+            @items_count = self.class.codes_for_encoding_id(offset_or_id).size
+            @supplemental = false
           end
         end
 
         def each
           return to_enum(__method__) unless block_given?
 
           # +1 adjusts for the implicit .notdef glyph
-          (count + 1).times { |i| yield self[i] }
+          (items_count + 1).times { |i| yield self[i] }
         end
 
         def [](glyph_id)
@@ -62,16 +64,18 @@ def offset
           end
         end
 
-        def encode(new_to_old, old_to_new)
-          # no offset means no encoding was specified (i.e. we're supposed to
-          # use a predefined encoding) so there's nothing to encode
-          return '' unless offset
-          return encode_supplemental(new_to_old, old_to_new) if supplemental?
+        def encode(charmap)
+          # Any subset encoding is all but guaranteed to be different from the
+          # standard encoding so we don't even attempt to see if it matches. We
+          # assume it's different and just encode it anew.
+
+          return encode_supplemental(charmap) if supplemental?
 
           codes =
-            new_to_old.keys.sort.map do |new_gid|
-              code_for(new_to_old[new_gid])
-            end
+            charmap
+              .reject { |_code, mapping| mapping[:new].zero? }
+              .sort_by { |_code, mapping| mapping[:new] }
+              .map { |(code, _m)| code }
 
           ranges = TTFunk::BinUtils.rangify(codes)
 
@@ -95,18 +99,16 @@ def encode(new_to_old, old_to_new)
 
         def supplemental?
           # high-order bit set to 1 indicates supplemental encoding
-          @format >> 7 == 1
+          @supplemental
         end
 
         private
 
-        def encode_supplemental(_new_to_old, old_to_new)
+        def encode_supplemental(charmap)
           new_entries =
-            @entries.each_with_object({}) do |(code, old_gid), ret|
-              if (new_gid = old_to_new[old_gid])
-                ret[code] = new_gid
-              end
-            end
+            charmap
+              .reject { |_code, mapping| mapping[:new].zero? }
+              .transform_values { |mapping| mapping[:new] }
 
           result = [format_int(:supplemental), new_entries.size].pack('CC')
           fmt = element_format(:supplemental)
@@ -150,22 +152,22 @@ def parse!
 
           case format_sym
           when :array_format
-            @count = entry_count
+            @items_count = entry_count
             @entries = OneBasedArray.new(read(length, 'C*'))
 
           when :range_format
             @entries = []
-            @count = 0
+            @items_count = 0
 
             entry_count.times do
               code, num_left = read(element_width, element_format)
               @entries << (code..(code + num_left))
-              @count += num_left + 1
+              @items_count += num_left + 1
             end
 
           when :supplemental
             @entries = {}
-            @count = entry_count
+            @items_count = entry_count
 
             entry_count.times do
               code, glyph = read(element_width, element_format)

diff --git a/lib/ttfunk/table/cff/fd_selector.rb b/lib/ttfunk/table/cff/fd_selector.rb
@@ -12,7 +12,7 @@ class FdSelector < TTFunk::SubTable
         RANGE_ENTRY_SIZE = 3
         ARRAY_ENTRY_SIZE = 1
 
-        attr_reader :top_dict, :count, :entries, :n_glyphs
+        attr_reader :top_dict, :items_count, :entries, :n_glyphs
 
         def initialize(top_dict, file, offset, length = nil)
           @top_dict = top_dict
@@ -48,16 +48,16 @@ def [](glyph_id)
         def each
           return to_enum(__method__) unless block_given?
 
-          count.times { |i| yield self[i] }
+          items_count.times { |i| yield self[i] }
         end
 
-        # mapping is new -> old glyph ids
-        def encode(mapping)
+        def encode(charmap)
           # get list of [new_gid, fd_index] pairs
           new_indices =
-            mapping.keys.sort.map do |new_gid|
-              [new_gid, self[mapping[new_gid]]]
-            end
+            charmap
+              .reject { |code, mapping| mapping[:new].zero? && !code.zero? }
+              .sort_by { |_code, mapping| mapping[:new] }
+              .map { |(_code, mapping)| [mapping[:new], self[mapping[:old]]] }
 
           ranges = rangify_gids(new_indices)
           total_range_size = ranges.size * RANGE_ENTRY_SIZE
@@ -108,10 +108,10 @@ def parse!
 
           case format_sym
           when :array_format
-            @n_glyphs = top_dict.charstrings_index.count
+            @n_glyphs = top_dict.charstrings_index.items_count
             data = io.read(n_glyphs)
             @length += data.bytesize
-            @count = data.bytesize
+            @items_count = data.bytesize
             @entries = data.bytes
 
           when :range_format
@@ -135,7 +135,7 @@ def parse!
             last_start_gid, last_fd_index = ranges.last
             @entries << [(last_start_gid...(n_glyphs + 1)), last_fd_index]
 
-            @count = entries.reduce(0) { |sum, entry| sum + entry.first.size }
+            @items_count = entries.reduce(0) { |sum, entry| sum + entry.first.size }
           end
         end