-
"What is a curve?" is a great question, and one that should probably be better documented in the README, because the name is a bit of a misnomer. This library's layout-object names (…

In sum: Any multi-segment, continuous path that does not form a rectangle is labeled a curve.

I haven't worked much with proper control-pointed Bezier curves, so I unfortunately cannot answer that part of your question authoritatively. But it seems the answer is that typically the points available to pdfplumber (via …

However, it seems possible that when a PDF defines control points for a path (which does not seem to be the default when drawing simple paths, per the spec), …

If you dig more into this, I'd be very interested to hear what you learn.
-
Thanks for the update.
If it were me... (I know it is up to the pdfminer.six team in the end)
I would create two separate classes. One for a curve made of points along
a curve (piecewise linear) -- an arbitrary combination of mllllllll...
ending in l or h. The second for a piecewise cubic Bezier -- an arbitrary
combination of mcccccc.
The v and y commands can be converted to c commands by explicitly repeating
the implied control point (the current point for v, the end point for y).
So, the LTCubic could be treated as if it were a mcccccc series.
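
A minimal sketch of that v/y conversion, following the operator definitions in the PDF spec (v takes its first control point from the current point, y takes its second control point from the segment's end point). The list-of-tuples path representation and the name `normalize_to_c` are assumptions made for illustration, not pdfminer.six code:

```python
def normalize_to_c(ops):
    """Rewrite PDF 'v' and 'y' path operators as equivalent 'c' operators.

    `ops` is a list of (operator, *operands) tuples, e.g. ("m", 0, 0).
    This representation is assumed for the example.
    """
    out = []
    current = None  # current point of the path
    start = None    # start of the current subpath, restored by 'h'
    for op, *args in ops:
        if op == "v":
            # v x2 y2 x3 y3: first control point is the current point
            x2, y2, x3, y3 = args
            out.append(("c", current[0], current[1], x2, y2, x3, y3))
            current = (x3, y3)
        elif op == "y":
            # y x1 y1 x3 y3: second control point is the end point
            x1, y1, x3, y3 = args
            out.append(("c", x1, y1, x3, y3, x3, y3))
            current = (x3, y3)
        else:
            out.append((op, *args))
            if op == "m":
                current = start = (args[0], args[1])
            elif op == "l":
                current = (args[0], args[1])
            elif op == "c":
                current = (args[4], args[5])
            elif op == "h":
                current = start
    return out
```

After this pass, every curved segment carries all three of its explicit points, so downstream code only has to handle m, l, c and h.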
The hard part here is detecting a situation where a path is made of a
mixture of l and c commands. You should be able to handle that by
inserting fake m points and splitting whenever a changeover happens. I suppose
some renderers might make different stroking decisions at the join between
one path and the next, but it seems the simplest way.
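
Roughly, the fake-m splitting could look like the sketch below. It assumes the same hypothetical list-of-tuples path representation, assumes v and y have already been rewritten as c, and (as a simplification chosen here) turns h into an explicit l back to the subpath start:

```python
def split_by_kind(ops):
    """Split a path into subpaths holding only 'l' or only 'c' segments.

    `ops` is a list of (operator, *operands) tuples (an assumed
    representation) that starts with an 'm' and contains no 'v' or 'y'.
    """
    paths = []
    current = None  # current point
    start = None    # start of the current subpath
    kind = None     # segment type seen so far in the open subpath
    for op, *args in ops:
        if op == "h":
            # Simplification: draw the closing line explicitly
            op, args = "l", list(start)
        if op == "m":
            current = start = (args[0], args[1])
            paths.append([("m", *current)])
            kind = None
        elif op in ("l", "c"):
            if kind is not None and op != kind:
                # Changeover: open a new subpath with a fake 'm'
                # placed at the current point
                paths.append([("m", *current)])
            kind = op
            paths[-1].append((op, *args))
            current = (args[-2], args[-1])
    return paths
```

Each returned subpath could then go to whichever class handles its segment type. The stroking caveat still applies: a renderer would draw a line join inside one path but line caps at a fake m.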
The alternative seems to be to pass the l or c commands through to the
LTCurve class. It still may be worthwhile simplifying out the v,y
commands, but it goes to the next step.
I now see what is happening for cases with c,v,y (nothing, they fall
through and are treated the same as l). I was so fixated on looking for
explicit parsing of c that I didn't consider this case...
Thanks,
Rob
On Sun, Feb 7, 2021 at 11:42 AM Jeremy Singer-Vine wrote:
Update: This discussion prompted me to take a closer look at the PDF spec
re. Bezier curves and pdfminer.six's path-parsing logic. The latter was,
indeed, mixing both control points and points-on-curve, so I've updated my
.paint_path PR there <pdfminer/pdfminer.six#530>
to address that. The proposed changes would remove control points from the
.pts attribute.
I also suggested (in general terms, not code) making the more detailed
path data available through an LTCurve attribute, so that people can fully
recreate Bezier curves if they want to.
-
In the past, I have accomplished pdfplumber-like workflows by converting my PDF pages to EPS and then hand-editing the PostScript text file.
From this, I am familiar with the curve entity being a cubic Bezier curve specified by sets of control points. A brief look at the PDF standard leads me to believe that curve path entities are cubic Beziers.
When I use pdfplumber to access a curve, are those the control points of a cubic Bezier? Or does pdfplumber render the curve and then spit out a selected number of points along each Bezier?
If the curves are cubic Beziers (4 control points per segment), are the first/last control points of each segment repeated? That is, would a series of segments have 4,3,3,3,... control points? Or is each segment stand-alone, with 4,4,4,4,... control points?
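
To make the two layouts concrete, here is a small sketch that expands a shared-endpoint list (the 4,3,3,3,... layout) into stand-alone 4-point segments and evaluates a point on one segment with the standard cubic Bernstein form. The function names are hypothetical, and nothing here reflects what pdfplumber actually returns:

```python
def segments_from_shared(points):
    """Expand 3n+1 shared-endpoint control points into n 4-point segments."""
    if len(points) < 4 or (len(points) - 1) % 3 != 0:
        raise ValueError("expected 3n+1 control points")
    # Each segment starts on the last point of the previous one
    return [tuple(points[i:i + 4]) for i in range(0, len(points) - 1, 3)]


def bezier_point(segment, t):
    """Evaluate a cubic Bezier segment at parameter t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = segment
    u = 1 - t
    # Cubic Bernstein basis weights
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    return (
        b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
        b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3,
    )
```

Seven shared-endpoint points give two segments, with the fourth point serving as both the end of the first segment and the start of the second.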