Remove background text that overlaps other text #2823


If the same watermarking approach is always used in all 100 PDFs, you can skip the more complicated analysis above and simply hunt down and destroy any Form XObject that writes "Confidential":

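import fitz  # PyMuPDF

doc = fitz.open("input.pdf")  # hypothetical input file name

for xref in range(1, doc.xref_length()):  # loop over all objects in the PDF
    if doc.xref_get_key(xref, "Subtype")[1] != "/Form":  # only look at Form XObjects
        continue
    stream = doc.xref_stream(xref)  # read the (decompressed) stream of the object
    # check that it writes text (BT / ET present) containing the watermark string
    if stream and b"Confidential" in stream and b"BT" in stream and b"ET" in stream:
        doc.update_stream(xref, b" ")  # replace the Form XObject's content with a blank stream

doc.ez_save("cleand2.pdf")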

This also does the job.
I am trying to be cautious not t…
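Because the loop above overwrites streams as soon as it finds a match, a dry run that only lists the candidate objects may help when checking a new batch of files. A minimal sketch follows; the helper names `looks_like_text_watermark` and `find_watermark_xrefs` are hypothetical, and `doc` is assumed to be an open PyMuPDF `Document` (only `xref_length`, `xref_get_key` and `xref_stream` are used). It also tightens the plain `b"BT" in stream` test, which would match bytes such as `OBTAIN` or `SET`, by requiring `BT` / `ET` to stand alone as operators. It remains a heuristic: if the watermark text is written as a hex string or through a custom font encoding, `Confidential` will not appear literally in the stream.

```python
import re
from typing import List, Optional

# PDF whitespace and delimiter characters that may surround an operator.
_DELIMS = rb"\s()<>\[\]{}/%"


def _operator(op: bytes) -> "re.Pattern[bytes]":
    # Match `op` only when it is not glued to other regular characters.
    return re.compile(
        rb"(?<![^" + _DELIMS + rb"])" + re.escape(op) + rb"(?![^" + _DELIMS + rb"])"
    )


_BT = _operator(b"BT")  # begin text object
_ET = _operator(b"ET")  # end text object


def looks_like_text_watermark(stream: Optional[bytes], marker: bytes = b"Confidential") -> bool:
    """Heuristic: stream contains `marker` plus standalone BT and ET operators."""
    if not stream or marker not in stream:
        return False
    return bool(_BT.search(stream)) and bool(_ET.search(stream))


def find_watermark_xrefs(doc, marker: bytes = b"Confidential") -> List[int]:
    """Dry run: return the xrefs of Form XObjects that would be blanked."""
    hits = []
    for xref in range(1, doc.xref_length()):
        if doc.xref_get_key(xref, "Subtype")[1] != "/Form":
            continue  # not a Form XObject
        if looks_like_text_watermark(doc.xref_stream(xref), marker):
            hits.append(xref)
    return hits
```

After reviewing the returned list (for example, printing it for each PDF), each xref can be blanked with `doc.update_stream(xref, b" ")` and the file saved with `doc.ez_save(...)`, exactly as in the loop above.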

Replies: 4 comments, 3 replies (reply authors: @JorjMcKie, @Soumadip-Saha)

Answer selected by Soumadip-Saha

3 participants

This discussion was converted from issue #2821 on November 20, 2023 10:57.