PDF Embedding
There are various applications that require using another PDF in the process of creating a new one. For instance, applications that merge multiple PDF files into one need to recreate the pages of those PDFs in the new file. Imposition applications might want to use pages as placed objects in a newly created page.
For this purpose, you can use the PDFWriter methods for embedding PDF. Two methods are supported:
- Use PDF Pages as pages, simply appending them to the pages of the generated PDF.
- Use PDF pages as components in the creation of one or more pages in the generated PDF. This method is, in turn, divided into two sub-methods:
  - Embed pages as form XObjects. The library creates a list of form XObjects based on the source PDF pages. You can then use them as regular form XObjects, placing them on pages in the generated PDF, in one or more locations. With form XObjects the content becomes potentially reusable.
  - Embed pages directly as content of existing pages. The library merges the content of source PDF pages into the content of a target page, which includes the graphics once. This fits scenarios of placing content that will not be reused on other pages.
In addition to embedding pages, the library can also copy other objects of interest, based on the user's choice. This is important for extensibility. Note that the library recreates pages according to the features it is familiar with. For example, it can recreate page content. However, annotations, which are not supported for the time being, will not be copied; for those you will have to use the extensibility options. These options are explained here as well.
All types of PDFs are supported except encrypted ones. The supported types include regular non-updated PDFs, PDFs with incremental changes, linearized PDFs, and PDFs from version 1.5 onward that use object streams. Support for encrypted PDFs is left to be implemented upon request.
Appending pages from another PDF is rather simple. Here's an example:
PDFWriter pdfWriter;
pdfWriter.StartPDF(L"C:\\MyPDF.PDF",ePDFVersion13);
pdfWriter.AppendPDFPagesFromPDF(L"C:\\OtherPDF.pdf",PDFPageRange());
pdfWriter.EndPDF();
In this example, all pages of OtherPDF.pdf are appended to the result PDF (which is MyPDF.PDF). Notice line 3. It contains a call to AppendPDFPagesFromPDF. The first parameter is the name of the PDF to take the pages from. The second parameter is the choice of pages.
To select which pages to append, use the PDFPageRange structure. The structure has two members: mType and mSpecificRanges. mType can be either eRangeTypeAll, for all pages, or eRangeTypeSpecific, which denotes that only selected pages should be used. To select the pages to append, use the second member, mSpecificRanges. This member is a list of pairs of unsigned longs (essentially list< pair<unsigned long,unsigned long> >), where each pair is an inclusive range. For example, providing (1,3) and (5,9) in the list will append pages 1,2,3,5,6,7,8,9 (0 based!).
The default constructor of this structure, as used in this example, simply means that all pages should be embedded.
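To make the inclusive, 0 based range semantics concrete, here is a minimal sketch that expands a range into the page indexes it selects. PageRangeSketch and ExpandPageRange are hypothetical names used only for illustration; the struct merely mirrors the two members described above and is not the library's PDFPageRange. Skipping indexes beyond the page count is an assumption of this sketch, not documented library behavior.

```cpp
#include <cassert>
#include <list>
#include <utility>
#include <vector>

// Hypothetical stand-in that mirrors the two members described above.
// It is NOT the library's PDFPageRange, just a model of it.
struct PageRangeSketch {
    enum ERangeType { eRangeTypeAll, eRangeTypeSpecific };
    ERangeType mType = eRangeTypeAll;  // default: all pages
    std::list<std::pair<unsigned long, unsigned long> > mSpecificRanges;
};

// Expand a range into the 0-based page indexes it selects.
// Indexes at or beyond pageCount are skipped (an assumption of this sketch).
std::vector<unsigned long> ExpandPageRange(const PageRangeSketch& range,
                                           unsigned long pageCount) {
    std::vector<unsigned long> pages;
    if (range.mType == PageRangeSketch::eRangeTypeAll) {
        for (unsigned long i = 0; i < pageCount; ++i) pages.push_back(i);
        return pages;
    }
    for (const auto& r : range.mSpecificRanges) {
        assert(r.first <= r.second);  // each pair is (first, last), inclusive
        for (unsigned long i = r.first; i <= r.second && i < pageCount; ++i)
            pages.push_back(i);
    }
    return pages;
}
```

With the pairs (1,3) and (5,9) and a 12 page source, ExpandPageRange yields 1,2,3,5,6,7,8,9, matching the description above. A default-constructed PageRangeSketch selects every page.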
The complete signature of AppendPDFPagesFromPDF is as follows:
EStatusCodeAndObjectIDTypeList AppendPDFPagesFromPDF(
const wstring& inPDFFilePath,
const PDFPageRange& inPageRange,
const ObjectIDTypeList& inCopyAdditionalObjects = ObjectIDTypeList())
We discussed the first and second parameters. The 3rd parameter is intended for extensibility, for copying non-page related objects from the source PDF. We'll discuss it later. It has a default value, so you can ignore it for now.
The return value is a pair of a status code and a list of object IDs (in essence pair<EStatusCode, list<ObjectIDType> >). The status code indicates whether appending succeeded or not. The list holds the object IDs of the created pages. This is useful when you wish to reference the pages from other objects. Yeah...extensibility.
For a complete code example (more complete than this one, that is) you can check Append Pages Test.
Sometimes you'll want to use the pages of an original PDF as graphic components of a new page. A good example is an imposition application that implements step and repeat: you can use the library to create an "imposed" PDF by creating form XObjects from the original PDF, and then placing them as content in the new PDF page (or pages, because they are reusable).
The following example shows how to do this:
PDFWriter pdfWriter;
pdfWriter.StartPDF(L"C:\\MyPDF.PDF",ePDFVersion13);
EStatusCodeAndObjectIDTypeList result = pdfWriter.CreateFormXObjectsFromPDF(
L"C:\\Other2PagePDF.PDF",
PDFPageRange(),
ePDFPageBoxMediaBox);
PDFPage* page = new PDFPage();
page->SetMediaBox(PDFRectangle(0,0,595,842));
PageContentContext* contentContext = pdfWriter.StartPageContentContext(page);
// place the first page in the top left corner of the document
contentContext->q();
contentContext->cm(0.5,0,0,0.5,0,421);
contentContext->Do(page->GetResourcesDictionary().AddFormXObjectMapping(result.second.front()));
contentContext->Q();
// place the second page in the bottom right corner of the document
contentContext->q();
contentContext->cm(0.5,0,0,0.5,297.5,0);
contentContext->Do(page->GetResourcesDictionary().AddFormXObjectMapping(result.second.back()));
contentContext->Q();
pdfWriter.EndPageContentContext(contentContext);
pdfWriter.WritePageAndRelease(page);
pdfWriter.EndPDF();
The important line is the 3rd one:
EStatusCodeAndObjectIDTypeList result = pdfWriter.CreateFormXObjectsFromPDF(
L"C:\\Other2PagePDF.PDF",
PDFPageRange(),
ePDFPageBoxMediaBox);
The call to CreateFormXObjectsFromPDF returns a pair of a status code and an object ID list, similar to the page appending function. This time, the IDs are of forms. You can use these IDs later when you wish to place the form XObject, such as in this line:
contentContext->Do(page->GetResourcesDictionary().AddFormXObjectMapping(result.second.front()));
which places the first "page".
The function receives 3 parameters here: the file name, the page range, and an enumerator of type EPDFPageBox. The last parameter determines which of the page's boxes to use as the form bounding box. In this example, the media box is used (ePDFPageBoxMediaBox).
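The two cm matrices in the example above (a scale of 0.5 with offsets of 421 and 297.5) follow from dividing a 595 by 842 page into a 2 by 2 grid. As a rough sketch of that arithmetic, the helper below computes the six matrix values for any grid cell. Matrix and GridCellMatrix are hypothetical names for illustration, not part of the library, and the sketch assumes the source page has the same size as the target page.

```cpp
#include <array>
#include <cassert>

// The six numbers passed to cm(a, b, c, d, e, f), in that order.
typedef std::array<double, 6> Matrix;

// Matrix that scales a full-size page into cell (row, col) of a rows x cols
// grid. Row 0 is the TOP row; PDF's origin is at the bottom-left, so the top
// row gets the largest vertical offset.
// Assumes the source page is the same size as the target page.
Matrix GridCellMatrix(unsigned row, unsigned col, unsigned rows, unsigned cols,
                      double pageWidth, double pageHeight) {
    assert(rows > 0 && cols > 0 && row < rows && col < cols);
    const double cellWidth = pageWidth / cols;
    const double cellHeight = pageHeight / rows;
    Matrix m = {{1.0 / cols, 0.0, 0.0, 1.0 / rows,  // scale, no rotation or skew
                 col * cellWidth,                    // horizontal offset
                 (rows - 1 - row) * cellHeight}};    // vertical offset
    return m;
}
```

For the 2 by 2 case, cell (0,0) gives 0.5,0,0,0.5,0,421 and cell (1,1) gives 0.5,0,0,0.5,297.5,0, the same values used in the example. The six values of a result m can then be passed in order to the cm call, as in contentContext->cm(m[0],m[1],m[2],m[3],m[4],m[5]).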
The complete signature of CreateFormXObjectsFromPDF is as follows:
EStatusCodeAndObjectIDTypeList CreateFormXObjectsFromPDF(
const wstring& inPDFFilePath,
const PDFPageRange& inPageRange,
EPDFPageBox inPageBoxToUseAsFormBox,
const double* inTransformationMatrix = NULL,
const ObjectIDTypeList& inCopyAdditionalObjects = ObjectIDTypeList());
The 4th parameter here is an optional transformation matrix to apply to the form. Its functionality is similar to the transformation matrix provided when creating form XObjects with the library. The last parameter is, again, a list of object IDs to copy from the source PDF in addition to the pages themselves, meant for extensibility.
For those of you who wish not to rely on one of the bounding boxes of the page, but rather to provide your own crop box, there is another overload of the CreateFormXObjectsFromPDF method:
EStatusCodeAndObjectIDTypeList CreateFormXObjectsFromPDF(
const wstring& inPDFFilePath,
const PDFPageRange& inPageRange,
const PDFRectangle& inCropBox,
const double* inTransformationMatrix = NULL,
const ObjectIDTypeList& inCopyAdditionalObjects = ObjectIDTypeList());
This overload is very similar, differing in one parameter, inCropBox, which is the rectangle describing the crop box for the page. It will be used as the form XObject box. This overload fits cases where the page is known to contain graphics in only a particular area, but the PDF does not record that area in any of the bounding boxes common to a PDF page.
For a complete code example check PDF Embedding Test
Using a form XObject as a container for a source PDF page, in order to place it later in one or more pages, is especially good when the content is to be reused. This is due to the natural ability of forms to encapsulate code and be referenced by their object ID. Sometimes, however, you don't need to reuse the content, and then creating a form might be unnecessary overhead. In addition, some information in the source page, such as reusability information, may be lost when a mediator such as a form is used instead of placing the source page content directly in a target page.
For scenarios where it is more fitting to use the graphics just once, use the MergePDFPagesToPage function. This method accepts a target page as input and injects the content of a source page into it. It does so at the point of calling the method, so any graphics already placed in the target page are maintained, as if the merged graphics had been placed there by the user through other methods.
This method fits unique placement of pages, as it does not allow reuse of the content. The following is a simple usage example:
PDFPage* page = new PDFPage();
page->SetMediaBox(PDFRectangle(0,0,595,842));
PDFPageRange singlePageRange;
singlePageRange.mType = PDFPageRange::eRangeTypeSpecific;
singlePageRange.mSpecificRanges.push_back(ULongAndULong(0,0));
pdfWriter.MergePDFPagesToPage(page,L"C:\\Other2PagePDF.PDF",singlePageRange);
pdfWriter.WritePageAndRelease(page);
Two interesting parts here. First, note the usage of PDFPageRange between the 3rd and 5th rows. It is set to point to the first (0 indexed) page of the source document, and is later passed to MergePDFPagesToPage, which will therefore merge just the first page of the source document.
The second thing of note is the call to MergePDFPagesToPage itself. The first parameter is the target page. The method will use its content stream and add the source page content to it. The second parameter is the source PDF file, and the last parameter is the PDFPageRange object defined earlier, which instructs the method to import just the first page.
The complete signature of MergePDFPagesToPage is as follows:
EStatusCode MergePDFPagesToPage(
PDFPage* inPage,
const wstring& inPDFFilePath,
const PDFPageRange& inPageRange,
const ObjectIDTypeList& inCopyAdditionalObjects = ObjectIDTypeList());
Something to notice about this function is that it is very good for embedding a single page. For more than one page, unless something is done, the pages will be positioned one on top of the other. You should either use it for a single page import, or try one of several possible strategies to import pages directly:
- Use DocumentContext events, through IDocumentContextExtender, to introduce positioning code between the pages using OnBeforeMergePageFromPage and OnAfterMergePageFromPage. This will solve most cases, but is a bit cumbersome.
- Call MergePDFPagesToPage multiple times, once for each page. This is the easiest method, but it is very inefficient, as multiple calls allow less sharing of elements of the imported PDFs. You see, each separate call for embedding PDF content (unless the copying context is used) requires parsing the PDF header and directory content. Also, multiple calls for embedding don't share objects, while multiple additions of content in the same call do.
- The best method is to create a copying context (as explained in the next section) and use its merging functionality. This method allows multiple calls with element sharing. In addition, it allows you to merge some pages as immediate, unique content, some as reusable content through form XObjects, and some as complete pages, all with a single parsing pass. Amazing.
For a complete code example go to - PDF Merging Test
In addition to using any of the three methods, you can copy pages and objects from a PDF in an alternative way. You can create a "copying context" and then use it for copying one page at a time, as a form XObject or as a page. It also allows you to inject page content directly and copy miscellaneous PDF objects. Although it is a more complex method, the copying context allows more sophisticated copying, including copying from multiple PDFs in an interleaved fashion, by creating multiple contexts and using them together.
To create a copying context, call the CreatePDFCopyingContext method of PDFWriter:
PDFDocumentCopyingContext* copyingContext = pdfWriter.CreatePDFCopyingContext(L"C:\\PDFLibTests\\TestMaterials\\BasicTIFFImagesTest.PDF");
This will create a context for copying content from the PDF to the result PDF. You can now use the functions of the returned PDFDocumentCopyingContext:
EStatusCodeAndObjectIDType CreateFormXObjectFromPDFPage(
unsigned long inPageIndex,
EPDFPageBox inPageBoxToUseAsFormBox,
const double* inTransformationMatrix = NULL);
CreateFormXObjectFromPDFPage creates a form XObject from a page in the PDF. The page is indicated by inPageIndex.
Note that using multiple calls here is similar to using the matching command from PDFWriter - however here you get to make different decisions on the other parameters for each page.
There is another overload that lets you determine a custom box for the form XObject. If this is desirable, use this method instead:
EStatusCodeAndObjectIDType CreateFormXObjectFromPDFPage(
unsigned long inPageIndex,
const PDFRectangle& inCropBox,
const double* inTransformationMatrix = NULL);
This overload allows you to provide a custom crop box, instead of using one of the page boxes.
EStatusCodeAndObjectIDType AppendPDFPageFromPDF(unsigned long inPageIndex);
AppendPDFPageFromPDF appends a page (designated by the input index).
EStatusCode MergePDFPageToPage(PDFPage* inTargetPage,unsigned long inSourcePageIndex);
MergePDFPageToPage merges the content of a source page into a target page in the written PDF.
EStatusCodeAndObjectIDType CopyObject(ObjectIDType inSourceObjectID);
This method allows you to copy any object from the source PDF, by providing its object ID. This is good for extensibility options, for implementing currently unsupported features such as annotations.
Note that you can use these methods in any fashion you want - you can embed some pages of a PDF as pages, and some as XObjects, and even merge some. If that is what you are looking for then this method is preferred over multiple calls to the PDFWriter functions, because the copied objects will be shared...and so you'll get a more efficient PDF. Note that you can create multiple contexts for different PDFs at the same time, and embed pages from them in an interleaved manner.
In addition to just the embedding options you also get some nice "getters" for extensibility activities:
PDFParser* GetSourceDocumentParser();
This method returns the parser object for the input PDF. The parser object contains the interpreted xref and a list of page IDs (only!). You can use it to retrieve the PDF file objects. The parser is discussed in detail in PDF Parsing.
EStatusCodeAndObjectIDType GetCopiedObjectID(ObjectIDType inSourceObjectID);
If you want to know which object in the result PDF matches an object of the original PDF, use the GetCopiedObjectID method. Provide the source PDF object ID, and it will return a pair of a status code and an object ID. The status code is eSuccess if the object was copied, in which case the second member of the pair is relevant and holds the object ID.
MapIterator<ObjectIDTypeToObjectIDTypeMap> GetCopiedObjectsMappingIterator();
For iterating over all objects that were copied, you can use GetCopiedObjectsMappingIterator. This method returns a MapIterator object that loops through the copied object IDs. The following example shows how to use it:
MapIterator<ObjectIDTypeToObjectIDTypeMap> it = context->GetCopiedObjectsMappingIterator();
while(it.MoveNext())
{
ObjectIDType sourceObjectID = it.GetKey();
ObjectIDType targetObjectID = it.GetValue();
}
The sourceObjectID in this example will have the original ID from the source PDF, and the targetObjectID will have the ID of the resulting copied object.
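As a toy model of the bookkeeping behind these getters, the sketch below keeps a source-to-target ID map and answers lookups with a success flag and an ID. ObjectIDSketch, SourceToTargetMap, RecordCopy and FindCopiedID are hypothetical names, and the bool stands in for the status code. This is not the library's implementation, only an illustration of why an object copied once can be shared by later copies within the same context.

```cpp
#include <cassert>
#include <map>
#include <utility>

typedef unsigned long ObjectIDSketch;  // stand-in for the library's ObjectIDType
typedef std::map<ObjectIDSketch, ObjectIDSketch> SourceToTargetMap;

// Record that sourceID was copied to targetID. Each source object is recorded
// once, so later references to it can reuse the same target object.
void RecordCopy(SourceToTargetMap& copied, ObjectIDSketch sourceID,
                ObjectIDSketch targetID) {
    assert(copied.find(sourceID) == copied.end());
    copied[sourceID] = targetID;
}

// Returns (true, targetID) when the source object was already copied,
// and (false, 0) otherwise. The bool stands in for the status code.
std::pair<bool, ObjectIDSketch> FindCopiedID(const SourceToTargetMap& copied,
                                             ObjectIDSketch sourceID) {
    SourceToTargetMap::const_iterator it = copied.find(sourceID);
    if (it == copied.end()) return std::make_pair(false, ObjectIDSketch(0));
    return std::make_pair(true, it->second);
}
```

In this model, the second member of the returned pair is meaningful only when the first is true, which mirrors how the status code gates the object ID in GetCopiedObjectID.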
When done with the context, just delete it. (You can also call its End method before that, but the destructor does that as well, so there is no need.)
For a code example see here - PDF Copying Context Test
The copying context gives quite a lot of control over the copying process, enough to satisfy most extensibility requirements when used together with the existing extensibility options of DocumentContext extenders (say, to add content to pages).
Still, there are some additional events that you can use, added to IDocumentContextExtender. To read more about them check out The DocumentContext Object.
Also, note that each of the PDFWriter methods for embedding pages (either as pages or XObjects) lets you copy individual objects. In most applications it's a bit difficult to know in advance which objects you need to copy, which is why, if you think you need such a capability, you had better use the copying context, which provides you with the parser and with individual object copying.