Digital watermarks in scientific paper PDFs
Some online publishers have started to watermark their PDF articles. For example, RSC puts a watermark into downloaded PDFs that hardcodes at least the date and the location of the download. The text is visible in the left corner of the page. So publishers evidently want to identify who downloaded and read a given scientific paper. Additionally, a white box is placed in the middle of the document. It doesn't interfere with reading, but it disturbs copy-paste of text from the article: you can't select just the text you want, and when you hit Ctrl-C, the watermark text is copied as well.
I don't think publishers have a right to do (or know) that, and I consider it rather infantile to try to prevent scientists and other people from copy-pasting text from articles. So here is a recipe for removing the watermarks.
Method 1: OpenOffice
- open a PDF, e.g. from RSC.org, in OpenOffice 3
- the central white box is not there (not imported?)
- the left watermark can be selected and removed by hitting DEL
- export as PDF
- disadvantage: fonts that are unavailable to OpenOffice look different after export. But the text is still readable, and copy-paste now works without problems.
Method 2: Scripting
- use a script to selectively cut the watermark objects out of the PDF document stream
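To give an idea of what such a script could look like, here is a minimal sketch in Python. It rests on several assumptions that I have not verified against a real RSC file: the page content stream is already decompressed, the watermark is an ordinary text object (delimited by the PDF operators BT and ET), and its text contains a recognizable phrase in plain form. The marker "Downloaded" and the function name strip_watermark are hypothetical and only serve as illustration.

```python
import re

# Hypothetical marker phrase identifying the watermark text.
# The real phrase depends on the publisher; this one is an assumption.
MARKER = b"Downloaded"

# In a PDF content stream, a text object starts with the operator BT
# and ends with the operator ET.
TEXT_OBJECT = re.compile(rb"\bBT\b.*?\bET\b", re.DOTALL)


def strip_watermark(stream: bytes, marker: bytes = MARKER) -> bytes:
    """Remove every BT...ET text object whose body contains the marker."""

    def drop_if_marked(match):
        block = match.group(0)
        # Delete marked blocks, keep all other text objects untouched.
        return b"" if marker in block else block

    return TEXT_OBJECT.sub(drop_if_marked, stream)


# Toy content stream: one watermark block followed by one article block.
example = (
    b"BT /F1 8 Tf 10 10 Td (Downloaded by Example University) Tj ET\n"
    b"BT /F1 10 Tf 72 700 Td (Results and discussion) Tj ET\n"
)
cleaned = strip_watermark(example)
```

In a real PDF the content streams are usually compressed, so they would have to be decoded first and written back afterwards with a correct length. The sketch also does not find markers that are stored hex-encoded or split into kerned fragments, and it does nothing about the white box, which is probably a drawing object rather than text.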