PyMuPDF issue 2608: CMAP mrange with surrogate chars.
The file given in the bug contains a CMAP with a bfrange entry:

<63> <73> <D835DF08>

i.e. a range where the destination 'base' (the Unicode value for the first code, <63>) is given as a UTF-16 surrogate pair.
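As a worked check of the surrogate arithmetic (the same formula used inside pdf_map_one_to_many), the pair D835 DF08 decodes to the single code point U+1D708. A minimal C sketch; `decode_surrogate_pair` is a hypothetical helper, not a MuPDF function:

```c
#include <assert.h>

/* Hypothetical helper: combine a UTF-16 high (lead) and low (trail)
 * surrogate into one Unicode code point, using the same formula as
 * the decoding loop in pdf_map_one_to_many. */
static int decode_surrogate_pair(int hi, int lo)
{
    assert(hi >= 0xD800 && hi < 0xDC00); /* high surrogate range */
    assert(lo >= 0xDC00 && lo < 0xE000); /* low surrogate range */
    /* 10 bits from each half, offset past the Basic Multilingual Plane. */
    return ((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000;
}
```

For example, decode_surrogate_pair(0xD835, 0xDF08) returns 0x1D708.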

We parse this, and because the base definition is longer than a
single 16-bit value, we break the range down into a series
of single-char ranges.

We add these single-char ranges (using pdf_map_one_to_many),
incrementing the last value of the destination string by 1 each time.
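A sketch of that expansion step, assuming the caller keeps one destination buffer and bumps its last element after each code. `one_mapping` and `expand_bfrange` are hypothetical names, and unlike the real parser this version stores a copy of each mapping rather than calling pdf_map_one_to_many:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DST 256

/* Hypothetical record for one single-code mapping produced from a bfrange. */
typedef struct {
    unsigned int code;
    int dst[MAX_DST];
    size_t len;
} one_mapping;

/* Break <lo> <hi> <base...> into single-code mappings, adding 1 to the
 * last element of the destination string for each successive code.
 * Returns the number of mappings written to out (capacity cap). */
static size_t expand_bfrange(unsigned int lo, unsigned int hi,
                             const int *base, size_t len,
                             one_mapping *out, size_t cap)
{
    int dst[MAX_DST];
    size_t n = 0, k;

    assert(len >= 1 && len <= MAX_DST);
    for (k = 0; k < len; k++)
        dst[k] = base[k];

    for (; lo <= hi && n < cap; lo++, n++)
    {
        out[n].code = lo;
        out[n].len = len;
        for (k = 0; k < len; k++)
            out[n].dst[k] = dst[k]; /* store a copy; dst itself is reused */
        dst[len - 1]++;             /* next code: bump the last element */
    }
    return n;
}
```

For <63> <73> <D835DF08> this yields 17 mappings, ending with code 0x73 mapped to D835 DF18.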

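Unfortunately, pdf_map_one_to_many spots the surrogate pair and
decodes it in place, overwriting the data it is passed. That same
data is then reused for the next call into pdf_map_one_to_many,
so every mapping after the first is built from already-rewritten values.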
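The effect of the in-place rewrite can be reproduced outside MuPDF. In this sketch `decode_in_place` is a hypothetical stand-in for the pre-fix decoding loop (it returns the decoded length instead of adding a mapping), and `run_range` mimics the caller reusing a single buffer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the pre-fix loop: decodes surrogate pairs
 * by writing the result back into the caller's array. */
static size_t decode_in_place(int *values, size_t len)
{
    size_t i, j;
    for (i = 0, j = 0; i < len; i++, j++)
    {
        int hi = values[i];
        if (hi >= 0xD800 && hi < 0xDC00 && i + 1 < len)
        {
            int lo = values[i + 1];
            if (lo >= 0xDC00 && lo < 0xE000)
            {
                hi = ((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000;
                i++;
            }
        }
        values[j] = hi; /* the bug: clobbers the caller's buffer */
    }
    return j;
}

/* Mimic the range loop: decode, then bump the last element of the
 * reused buffer. first[c] and dlen[c] record each code's result. */
static void run_range(int *buf, size_t len, int n, int *first, size_t *dlen)
{
    int c;
    assert(len >= 1);
    for (c = 0; c < n; c++)
    {
        dlen[c] = decode_in_place(buf, len);
        first[c] = buf[0];
        buf[len - 1]++;
    }
}
```

With the bfrange above, code 0x63 correctly decodes to U+1D708, but the first call leaves 0x1D708 in buf[0], so code 0x64 decodes to two values (0x1D708 followed by a lone low surrogate 0xDF09) instead of U+1D709.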

The fix, implemented here, is to make pdf_map_one_to_many take
a local copy of the values before it modifies them.
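A simplified sketch of the fixed behaviour: decoding writes into a separate array, so the caller's buffer keeps its original surrogate pair and each increment yields the next code point. Here the output array is always separate; the actual patch instead switches lazily from the caller's array to a 256-entry `local` array on the first write. `decode_to_local` and `run_range_fixed` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode surrogate pairs from values[0..len) into
 * out[]. values is read-only, so the caller's data is never modified.
 * out must have room for len ints; returns the decoded length. */
static size_t decode_to_local(const int *values, size_t len, int *out)
{
    size_t i, j;
    for (i = 0, j = 0; i < len; i++, j++)
    {
        int hi = values[i];
        if (hi >= 0xD800 && hi < 0xDC00 && i + 1 < len)
        {
            int lo = values[i + 1];
            if (lo >= 0xDC00 && lo < 0xE000)
            {
                hi = ((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000;
                i++;
            }
        }
        out[j] = hi; /* write to the separate array only */
    }
    return j;
}

/* Mimic the range loop: decode into a local array, then bump the last
 * element of the caller's buffer for the next code. */
static void run_range_fixed(int *buf, size_t len, int n, int *first, size_t *dlen)
{
    int local[256];
    int c;

    assert(len >= 1 && len <= 256);
    for (c = 0; c < n; c++)
    {
        dlen[c] = decode_to_local(buf, len, local);
        first[c] = local[0];
        buf[len - 1]++;
    }
}
```

Run over the full range <63> <73>, code 0x64 now maps to U+1D709 and code 0x73 to U+1D718.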

This solves the problem.
robinwatts committed Oct 19, 2023
1 parent 4749e94 commit 8c2f678
Showing 2 changed files with 17 additions and 2 deletions.
2 changes: 2 additions & 0 deletions include/mupdf/pdf/cmap.h
@@ -103,6 +103,8 @@ void pdf_map_range_to_range(fz_context *ctx, pdf_cmap *cmap, unsigned int srclo,

/*
Add a single one-to-many mapping.
len <= 256.
*/
void pdf_map_one_to_many(fz_context *ctx, pdf_cmap *cmap, unsigned int one, int *many, size_t len);
void pdf_sort_cmap(fz_context *ctx, pdf_cmap *cmap);
17 changes: 15 additions & 2 deletions source/pdf/pdf-cmap.c
@@ -718,6 +718,12 @@ pdf_map_range_to_range(fz_context *ctx, pdf_cmap *cmap, unsigned int low, unsign
 void
 pdf_map_one_to_many(fz_context *ctx, pdf_cmap *cmap, unsigned int low, int *values, size_t len)
 {
+    int *ovalues = values;
+    /* len is always restricted to <= 256 by the callers. */
+    int local[256];
+
+    assert(len <= 256);
+
     /* Decode unicode surrogate pairs. */
     /* Only the *-UCS2 CMaps use one-to-many mappings, so assuming unicode should be safe. */
     if (len >= 2)
@@ -727,16 +733,23 @@ pdf_map_one_to_many(fz_context *ctx, pdf_cmap *cmap, unsigned int low, int *valu
          * with other chars. See bug 706131. */
         for (i = 0, j = 0; i < len; i++, j++)
         {
-            int hi = values[i];
+            int hi = ovalues[i];
             if (hi >= 0xd800 && hi < 0xdc00 && i < len-1)
             {
-                int lo = values[i+1];
+                int lo = ovalues[i+1];
                 if (lo >= 0xdc00 && lo < 0xe000)
                 {
                     hi = ((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000;
                     i++;
                 }
             }
+            if (values != local)
+            {
+                /* We can't change the callers data, so copy stuff in. */
+                if (j)
+                    memcpy(local, values, sizeof(local[0]) * (j-1));
+                values = local;
+            }
             values[j] = hi;
         }
         len = j;
