How Do I Copy Text from a Copy Protected Website?

The Best of Ask Leo!

How do I copy/paste from sites that don’t permit it? There is info on a site I’d like to save, but they don’t permit copying/pasting. Is there any way around that?

As you might expect, the website in question is trying to protect its content from theft. They have valuable information, and I’m sure people try to steal and republish their content frequently. That is quite illegal, and a violation of international copyright law.

I’ll assume that’s NOT what you have in mind. (Though technically, even what you have in mind, while morally acceptable in my opinion, may still be in violation of that law.)

While I’ll answer your question, my real goal here is to point out to site owners just how futile website copy-protection schemes can be.

If it can be seen, it can be copied.

There are several techniques to copy text from websites trying to prevent it, including print to PDF, copying from that PDF, viewing the source of the webpage, disabling JavaScript, disabling CSS, or even taking photographs or screenshots and running those through OCR. Website and other digital-content owners need to realize that if it can be seen, it can be copied.

Above board techniques

By “above board”, I mean using normal website and browser behavior to gain access to text in ways the website owner perhaps hadn’t thought to prevent.

The most common: printing. Specifically, Print to PDF.

The result is a nice PDF of the page. Perhaps that’s enough for you to save. It certainly has the highest “fidelity”, in that it’ll include all the formatting and images exactly as they appear on the original webpage.

If simply saving a PDF doesn’t meet your needs, the PDF itself may be copy-enabled. In my test of the website in question, for example, I was able to print to PDF and then select the desired text from the PDF to copy elsewhere.
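If you’d rather script the Print to PDF step, Chromium-based browsers can do it without opening a window at all. Here’s a minimal Python sketch, assuming Chrome or Chromium is installed and on your PATH under the name you pass in (the default `google-chrome` is an assumption; on your system it may be `chromium`, `chrome`, or a full path). It relies on the browser’s `--headless` and `--print-to-pdf` command-line switches; the helper names are my own, hypothetical ones.

```python
import shutil
import subprocess


def print_to_pdf_command(browser: str, url: str, out_pdf: str) -> list[str]:
    """Build the command line that asks a headless Chromium-based browser to print a page to PDF."""
    # --headless: run the browser without opening a window
    # --print-to-pdf=<file>: render the page and write it to <file> as a PDF
    return [browser, "--headless", f"--print-to-pdf={out_pdf}", url]


def print_to_pdf(url: str, out_pdf: str, browser: str = "google-chrome") -> None:
    """Print `url` to `out_pdf`. The default browser name is an assumption; adjust it for your system."""
    exe = shutil.which(browser)
    if exe is None:
        raise RuntimeError(f"{browser!r} was not found on PATH")
    subprocess.run(print_to_pdf_command(exe, url, out_pdf), check=True)
```

Something like `print_to_pdf("https://example.com/article", "article.pdf")` should leave you with the same kind of PDF you’d get from the Print dialog, which you can then open and (if it’s copy-enabled) select text from as usual.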

Another approach is to use File -> Save As…¹ in the browser when viewing the page, and save it “as” plain text. The results will vary from browser to browser, but you’re likely to get a good starting point from which you can copy the desired text.

Yet another approach is to right-click on the webpage and use the “View Source” option available in most browsers. This shows you the underlying HTML for the page, from which you can copy the relevant content as needed. You’ll have to clean up the results, though, removing the HTML markup to make them readable.
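That cleanup is easy to automate. Here’s a minimal Python sketch using only the standard library’s `html.parser`, assuming you’ve already saved the page source to a string or file. It’s a rough approach, not a faithful renderer: it drops everything inside `<script>` and `<style>`, starts a new line at common block-level tags, and collapses leftover whitespace. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML document, ignoring the markup."""

    # Elements whose contents are code or styling, not readable text.
    SKIP = {"script", "style", "noscript"}
    # Tags that usually start a new line of text when a page is rendered.
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turn &amp; and friends into real characters
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(source: str) -> str:
    """Strip HTML markup from page source and return tidy plain text."""
    parser = TextExtractor()
    parser.feed(source)
    parser.close()
    raw = "".join(parser.parts)
    # Collapse runs of spaces/tabs within each line, then drop blank lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

For a page you’ve saved as `page.html` (a hypothetical file name, assumed to be UTF-8), `print(html_to_text(open("page.html", encoding="utf-8").read()))` prints just the text, ready to copy.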

Other techniques

Here I mean taking steps to actively disable whatever copy protection has been placed on the webpage or image.

Two techniques come to mind.

  • Disable JavaScript. Many sites use JavaScript to implement copy protection, and when that’s the case, disabling JavaScript disables the copy protection completely. (That happened to be the case with my example site. As a bonus, it also disabled a number of popup ads.) The easiest way is to use Firefox and the “NoScript” add-on, which allows you to enable or disable JavaScript on a site-by-site basis.
  • Disable or circumvent CSS. CSS, short for Cascading Style Sheets, is a powerful approach to defining how webpages look, feel, and behave. It’s also easy to turn off: in Firefox, click on View (you may need to press and release the ALT key to expose the menu bar first), Page Style, and then click on No Style. The page will be re-rendered without CSS, and the result, while visually unappealing, may well be copyable.
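To get a feel for which of those two techniques is likely to help on a given page, you can look through its source for the usual suspects. Here’s a minimal Python sketch that flags inline handlers such as `oncopy` or `oncontextmenu`, script code that hooks the copy, cut, context-menu, or select-start events, and CSS that sets `user-select: none`. The patterns are my assumptions about common approaches, not an exhaustive list, and a site can hide its protection in external script or style files that this won’t see. All names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Inline attributes often used to block copying, right-clicking, or selecting text.
BLOCKING_ATTRS = {"oncopy", "oncut", "oncontextmenu", "onselectstart", "ondragstart"}

# Script that hooks those events, e.g. addEventListener("copy", ...) or document.oncopy = ...
SCRIPT_PATTERN = re.compile(
    r"""addEventListener\(\s*['"](copy|cut|contextmenu|selectstart)['"]"""
    r"""|\.on(copy|cut|contextmenu|selectstart)\s*="""
)

# CSS that makes text unselectable (also matches prefixed forms like -webkit-user-select).
CSS_PATTERN = re.compile(r"user-select\s*:\s*none", re.IGNORECASE)


class ProtectionScanner(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.findings: set[tuple[str, str]] = set()
        self._in_script = False
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in BLOCKING_ATTRS:
                self.findings.add(("javascript", f"{tag} {name}"))
            elif name == "style" and value and CSS_PATTERN.search(value):
                self.findings.add(("css", f"{tag} style attribute"))
        if tag == "script":
            self._in_script = True
        elif tag == "style":
            self._in_style = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False
        elif tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_script:
            for match in SCRIPT_PATTERN.finditer(data):
                event = match.group(1) or match.group(2)
                self.findings.add(("javascript", f"script blocks '{event}'"))
        elif self._in_style and CSS_PATTERN.search(data):
            self.findings.add(("css", "user-select: none in <style>"))


def scan_for_copy_protection(source: str) -> list[tuple[str, str]]:
    """Return sorted (kind, detail) pairs for copy-blocking tricks found in page source."""
    scanner = ProtectionScanner()
    scanner.feed(source)
    scanner.close()
    return sorted(scanner.findings)
```

Findings tagged `javascript` suggest that disabling JavaScript is worth a try; findings tagged `css` suggest that switching to No Style may be enough.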

Depending on the specific techniques used to disable copying, there may be other approaches.

Off-the-wall techniques

By “off the wall”, I mean things that sound really stupid, or that you’d never think of, but that work as last-resort measures.

They’re also proof of my original statement: if it can be seen, it can be copied.

  • Take a picture. Get your digital camera and take a picture of the screen: instant copy.
  • Take a screenshot. A tool like SnagIt will not only automatically “page down” to capture an image of the entire page (in perfect resolution, unlike your camera), but also includes a “copy text” option that may well copy text for which the traditional clipboard copy has been disabled.
  • OCR. Short for “Optical Character Recognition”, OCR tools take an image of a webpage (ideally the screenshot, since it has the best quality, but possibly also the photo) and extract all the visible text as editable text.
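If you have the free Tesseract OCR engine installed, a few lines of Python can run it on a screenshot. This minimal sketch assumes the `tesseract` command is on your PATH and its English language data is installed; it uses Tesseract’s convention of naming the output `stdout` so the recognized text is printed rather than written to a file. The helper names are hypothetical.

```python
import shutil
import subprocess


def tesseract_command(image_path: str, lang: str = "eng") -> list[str]:
    """Build a Tesseract command line that prints the recognized text to standard output."""
    # Using "stdout" as the output base makes tesseract print the text instead of writing a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on a screenshot or photo and return the extracted text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

As with any OCR, expect a clean, full-resolution screenshot to give far better results than a photo of the screen.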

There are probably more odd and unique ways I’m not thinking of.

If it can be seen, it can be copied

Like I said, this isn’t intended as a “how to” for people wanting to make illegal copies of webpages, or even for people who want to do more acceptable things, like share otherwise inaccessible content with others.

That it turns out to be one, however, underscores my real point: copy-protection schemes are pretty futile. If you present your information in a way that humans can read, listen to, or watch, then there are ways for that content to be copied.

Placing roadblocks only punishes the innocent. It puts barriers in the way of those who would view or use your content in ways that are only beneficial to you, without really stopping those who are determined to steal it anyway.

If someone can see it, they can copy it, forward it, publish it, whatever. Not that they should, but they can.

That’s simply the nature of today’s technology.

Footnotes & References

1: If present. Edge doesn’t seem to have it. Also note that in recent browsers it may have moved to a sub-menu of the ellipsis (…) menu, and may be called something similar, like “Save page as…”. Gotta love consistency.

For related links, videos, and comments visit How Do I Copy Text from a Copy Protected Website? on Ask Leo!