Tuesday, July 10, 2012

How to attack content scrapers

I have to start this post with a truism: if someone wants to steal your content on the Web there's nothing you can do to stop them. That's the unfortunate truth of the online world and it's not really a surprise to anyone. The real questions are: what can you do to make it clear they've stolen from you, and what can you do when you find out they've stolen from you?

Making clear they've done it:
What you're looking for is to prove that your text came first. You'd think that the timestamp on your post should help with that and it does, but unfortunately sometimes that's not enough. Try creating a favorite phrase that becomes a kind of mark of your writing. That way if someone steals your phrasing it's clear that they've stolen from you. You'll easily be able to point to your other posts or content and show that the stolen content comes from you.
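If you do settle on a few signature phrases, checking a suspect page for them is easy to automate. Here's a minimal sketch in Python, assuming you've already copied the text of the suspect page into a string. The function names and the sample phrases are made up for illustration. Finding a phrase isn't proof of anything by itself; it just tells you which pages deserve a closer look.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse runs of whitespace."""
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def find_signature_phrases(page_text: str, phrases: list[str]) -> list[str]:
    """Return the signature phrases that appear in page_text.

    Matching ignores case, line breaks, and extra spaces, so a scraper
    that reflows your paragraphs still gets caught.
    """
    haystack = normalize(page_text)
    return [phrase for phrase in phrases if normalize(phrase) in haystack]


if __name__ == "__main__":
    # Hypothetical signature phrases and scraped text, for illustration only.
    my_phrases = ["take this to the bank", "a subject for a future post"]
    scraped = "For now  Take this\n to the BANK: they're infringing."
    print(find_signature_phrases(scraped, my_phrases))
```

A plain substring check like this only catches phrases copied word for word, which is exactly the point of a signature phrase.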

(By the way: what I've just described is actually the underpinning of why trademarks exist: if there's a particular phrase that means "you wrote this" so much that reading it should be a signifier of that, then you should get to protect that association.)

What can you do about it:
Now that you've found out that someone is stealing your content, you'll probably want to make that stop. There's an obvious and a non-obvious way to address this.

The obvious way is a DMCA takedown notice. This works pretty well for commercial sites based in the USA, but not in a lot of other cases. In a DMCA notice you identify things like:

  • The location of your original post
  • The URL where the stolen content is posted
  • Your contact information
And similar items. (See below for a link to the YouTube DMCA rules, which contain the mandatory elements for a notice.)

But that may not work for sites hosted outside the USA. "DMCA" stands for "Digital Millennium Copyright Act" which is a US law, and although many countries have similar rules you can't always rely on sending a note to the site operator or ISP to help you get your content taken down.

So you have to be more crafty. And for that there are two major lines of attack:

1. The search engines. Although it's pretty deeply buried, each major search engine has a way for you to report infringing content. I'll be talking about what is and isn't infringement in a future set of posts but for now take this to the bank: if someone takes 100% of your post without attribution, they're infringing.

2. The advertisers. But don't get your hopes too high: although advertisers don't want to be associated with infringers, they also don't usually know the sites on which they're being advertised. Better to figure out which advertising network the scraper is using and address things there. You can often see that while the scraper's page is loading: if you watch the status bar in the bottom left corner of your browser you'll see references to various advertising networks. That can help tell you where to start your complaints.
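If the status bar flashes by too quickly to read, you can get a similar list from the page source. Here's a rough sketch in Python, assuming you've saved the scraper's HTML into a string. It lists the outside hostnames that the page pulls scripts, frames, and images from. The function name and the example hostnames are hypothetical, and the code doesn't know which hosts are advertising networks; it only gives you a short list of names to look up.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _SrcCollector(HTMLParser):
    """Collects the src attribute of every script, iframe, and img tag."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "iframe", "img"):
            for name, value in attrs:
                if name == "src" and value:
                    self.urls.append(value)


def third_party_hosts(html: str, page_host: str) -> list[str]:
    """Return the sorted hostnames the page loads from, other than its own.

    Relative URLs (no hostname) and subdomains of page_host are skipped.
    """
    collector = _SrcCollector()
    collector.feed(html)
    hosts = set()
    for url in collector.urls:
        host = urlparse(url).hostname
        if host and host != page_host and not host.endswith("." + page_host):
            hosts.add(host)
    return sorted(hosts)


if __name__ == "__main__":
    # Hypothetical page source and hostnames, for illustration only.
    sample = (
        '<script src="https://ads.example.net/show.js"></script>'
        '<img src="/logo.png">'
        '<iframe src="//track.example.org/frame"></iframe>'
    )
    print(third_party_hosts(sample, "scraper.example"))
```

Ads that are injected by scripts after the page loads won't show up in the saved source, so treat this as a starting point alongside watching the browser.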

By the way: if you've been paying attention to recent developments on the Internet you'll notice that I've just described the two major aspects of a couple of proposed laws that got a lot of attention in the USA recently: SOPA and PIPA. That's because the things these laws required were already being done by search engines and advertisers without a law. So why did Congress think a law was required anyway? That's a subject for a future post. Or 20.

But unfortunately, you might run into a site that's based in a country that doesn't respect takedown laws with advertisers that don't care about their reputation. And that's an unfortunate fact of publishing on the Internet: If you're going to create content, sometimes people will take it and claim it's theirs. And if that happens, sometimes your only real remedy is to create more content.

One last thing: if you do catch someone stealing from you, it's probably a good idea not to link to the page. Why give them extra traffic or a better Google PageRank?

Two links that helped me to understand the breadth of this issue.
The page from YouTube that shows what you need to affirm for a DMCA notice:
