How can I upload books (to libgen, et al) without a trace back to me?

matcha_addict@lemy.lol · 4 months ago

How can I upload books (to libgen, et al) without a trace back to me?

Olivia@lemmy.today · 4 months ago

The bad news is that uploading e-books will involve programming on your part (for your sanity at least).

The good news is that it should be far easier than other mediums.

If you are approaching from a complete safety perspective (because you live in a fiefdom that owes tribute to the publishers guild), then you’re going to want to OCR the pages of the book and use the text to make a brand new book free from metadata. I’m pretty sure a Python crash course could get you up and running in a month or 6.

If you want what’s closest to the original product, then you’ll need a Python script that strips everything from the book into just a text document, then converts it back into your own book. You’ll have to review the text document to see if any stray code from the book was carried over, like invisible text.

Both options are so simple from a programming perspective that I’ve never seen scripts to strip e-book protections. A real “the solution is left un-worked as a challenge for the reader” situation. And from what I know, the publishers have switched to focusing on selling hard copies as their bread and butter, and striking deals with libraries for other revenue. Big money is still in mandatory university textbooks.

Source: Never actually done what you’re asking for

matcha_addict@lemy.lol · 4 months ago

Thanks for your advice. I am a programmer by craft so I can definitely do that. I think the only issue may be books with any important content that is not text, i.e. graphics and images (and unfortunately, many of the books I am interested in have that). If I understood what you said correctly.

Lemongrab@lemmy.one · 4 months ago

PyMuPDF has options to handle images. Good package.

FierySpectre@lemmy.world · 4 months ago

But then those images could contain the very fingerprints he’s trying to avoid

Lemongrab@lemmy.one · 4 months ago

Theoretically, yes. Handling of images programmatically could allow for some simple lossy compression which would help.

iopq@lemmy.world · 4 months ago

You can mess with the levels to see any hidden watermarks

FierySpectre@lemmy.world · 4 months ago

There are so many ways to encode information into an image without changing its look that I doubt you’ll find most of them by “changing levels”

conciselyverbose@sh.itjust.works · 4 months ago

I’d personally be a lot more likely to blur and add random noise, then use lossy compression if I wanted to mitigate steganography, but even then, they don’t need to encode a lot of information and they have a base image and secrets to compare to. It’s entirely possible for them to have chosen something reasonably robust through random edits like that.

iopq@lemmy.world · 4 months ago

But what transformations are they stable to?

Kindness@lemmy.ml · 4 months ago

gImageReader or ocrmypdf will get you the pdf text, but afterwards the text will need fiddling with and cleaning. Use LibreOffice, languagetool, write-good, etc. to make finding the oddballs easy.

pdftk is what you want for editing pdf metadata.

GIMP is what you’ll need for editing images: looking for watermarks, smoothing edges, lowering quality, introducing random noise, etc.

exiftool is what you’ll need for image metadata. Or take a screenshot, add a bit of noise or de-noise, and add back to the new pdf.

Scrivener or LibreOffice if you want to polish/republish, though that’s a ton of work.

reddithalation@sopuli.xyz · 4 months ago

I converted a pdf book scan to epub with Tesseract OCR and calibre. It didn’t need any programming, but the end result did have a typo every few paragraphs. Most were very similar to each other though, so a few hours cleaning it up would’ve made it pretty readable.
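Since the OCR typos tend to repeat, a rough sketch of how that cleanup could be sped up: count the tokens that aren't in a word list, look at the most frequent ones, then fix each with one whole-word replacement. The function names, the tiny `known` set and the sample text are all made up for illustration; in practice you'd load a real dictionary file and review each suggested fix by hand.

```python
import re
from collections import Counter


def suspicious_tokens(text, known_words, min_count=2):
    """Return (token, count) pairs for lowercase tokens that are not in
    known_words and that occur at least min_count times."""
    tokens = re.findall(r"[A-Za-z']+", text)
    counts = Counter(t.lower() for t in tokens if t.lower() not in known_words)
    return [(tok, n) for tok, n in counts.most_common() if n >= min_count]


def apply_corrections(text, corrections):
    """Replace whole-word, case-sensitive occurrences of each misread token."""
    for wrong, right in corrections.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return text


# Hypothetical sample: the OCR engine keeps misreading "h" as "b".
sample = "Tbe cat sat on tbe mat. Tbe dog saw tbe cat."
known = {"the", "cat", "sat", "on", "mat", "dog", "saw"}

recurring = suspicious_tokens(sample, known)
fixed = apply_corrections(sample, {"Tbe": "The", "tbe": "the"})
print(recurring)
print(fixed)
```

Sorting by frequency means the handful of systematic misreads surface first, which is where most of the cleanup time goes.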

Redjard@lemmy.dbzer0.com · 4 months ago

Even with OCR, couldn’t your copy at least in theory be laced with strategically placed minor word changes? Say throughout the book you pick 30 spots to change a word without changing the meaning of the text, or you introduce a typo. If every copy gets a different set of those that would be a unique identifier.
I think I have heard of that being done with imperceptible changes in films sent for showings in theaters.

Olivia@lemmy.today · 4 months ago

@matcha_addict@lemy.lol In this situation, I’d advise acquiring a copy from an alternative source, then just compare the texts of the two.

In practice, though, if you’re already going the OCR route then just cut the pages from a real book with a utility knife and feed them into a feeder scanner. All they get to know is that some asshole cyberpunk script kiddie jacked your book while you were waiting at a bus stop.