Can AP's Copyright Claims Hold (Melt)water?

By now you might have heard about the lawsuit that the Associated Press filed against on-demand software company Meltwater News. AP has been critical of aggregators in the past, and decided to file suit against this specific party, alleging copyright infringement, removal of copyright management information, and (drumroll, please…) hot news misappropriation. Yes indeed, the tort made famous by the 1918 Supreme Court case INS v. AP – and about which we here just love to write – refuses to die.

Meltwater News, according to the company's website, allows users to subscribe to a service whereby Meltwater will continually search its corpus of 160,000 cached websites for references to particular keywords (presumably, keywords central to the client's business) and provide excerpts from articles using those keywords, along with links to the original articles. The service also allows users to create their own newsletters by combining these links with text they write themselves. I've been thinking of it as a hybrid of Google Alerts, WordPress, and Constant Contact in a single interface.

I would love to address all of the issues raised by the complaint – the claims for removal of copyright management information under 17 U.S.C. § 1202(b) and "hot news" misappropriation under New York common law are fascinating and deserve blog posts of their own – but for the sake of keeping this post under 10,000 words I will focus on the claims of copyright infringement.

As a quick refresher: Copyright gives authors of creative works the exclusive right to, among other things, reproduce, distribute, and create derivative works of their writings, subject to a variety of carve-outs and limitations. The most important limitation for this case is the doctrine of fair use. Fair use is an equitable defense which allows a judge to decide, based on the facts before the court, that the interests of the public would be better served if a copyright owner is not allowed to assert exclusive control of a work. Since the fair use doctrine was codified in 17 U.S.C. § 107, courts have examined four statutory factors, but the two most important for our purposes are the first and last, which are:

  • the purpose of the use, including whether the use is commercial or noncommercial and how "transformative" the use is compared to the original; and
  • the effect of the use on the market for the original work, including any reasonable secondary markets.

Fair use defies easy answers. It is always a highly fact-sensitive inquiry and judges are given wide discretion to give weight to the factors that they think are most important in a given instance. That said, a recent study by Neil Netanel suggests that courts currently are particularly fond of the first factor and its "transformativeness" inquiry.

AP, in its complaint, alleges that its copyrights have been violated in three principal ways: direct copying and excerpting by Meltwater as part of its media monitoring service; Meltwater's "providing the means" for users to copy and to distribute whole articles via its website; and Meltwater's translation service. Below, I discuss each of these alleged forms of infringement, with a particular eye to fair use.

1. Direct Copying

I don't think Meltwater would dispute that they make a copy of all of these articles, although perhaps not a copy which can be easily read by people as opposed to machines. Based on my understanding, they would have to in order to create a searchable database of content, and in any event "copies" are defined in the Copyright Act to include those viewable "with the aid of a machine or device." I suspect that the fight, therefore, will be whether Meltwater's actions should be considered a fair use or not. Our friends over at the Nieman Lab have already highlighted a noteworthy point about the fair use analysis here: there are two blazed trails concerning this sort of activity in fair use caselaw, which point in opposite directions.

A handful of cases – including Pacific & Southern Co. v. Duncan TV and Los Angeles News Service v. Tullo – have examined the fair use defense as applied to commercial news clipping services, and found that such services generally are not engaged in fair use when taking excerpts from television, radio, and print journalism.

On the other side are the search engine cases – cases like Field v. Google and Perfect 10 v. Amazon – which hold that even though search engines by their nature make whole copies of originals and are usually operated for commercial gain, the overwhelming public interest in having search engines exist and the lack of a demonstrable market harm for the original website points toward a finding of fair use.

Based upon the AP's complaint and the parties' public statements, the parties appear to be arguing along these pre-established lines. But a fair use examination shouldn't rely solely on labels like "clipping service" and "search engine" to shortcut a detailed analysis. The challenge in doing a full fair use analysis in this post, however, is that it's hard for an outsider to understand exactly what Meltwater does without paying $5,000 for an account. Most of what we know about how Meltwater operates comes from the complaint and dueling press releases related to the lawsuit. For purposes of this analysis, I will assume that the factual allegations made by AP are true: that Meltwater uses web crawlers to scrape content from a targeted group of websites, builds a cache from those scrapes, uses that cache to perform search functions based on user requests, and serves up excerpts of the cached articles with links to the originals. (Again, this assumption is for discussion purposes only; the real facts will need to await development in the course of the lawsuit.)

The scraping and caching for purposes of creating a searchable database would, under the logic of Field and Perfect 10, be transformative and thus likely to be found fair. The tricky question here is the excerpts. Are they transformative or not? To put it another way, are the excerpts there to provide context for an information location tool (i.e., to make search results and links meaningful to the user), or are they superseding the need to read the original news article because the excerpts are long enough to serve a news information need directly? The way a judge decides on this point could very well dictate the outcome of AP's direct infringement claim.

That said, transformativeness is only part of the inquiry. A court must also examine the other factors, including the effect of the use on the market for the original. Is it reasonable or likely that the holder of a copyright in a news article can develop a licensing market for search-and-excerpt uses? If not, this factor may weigh in favor of Meltwater. Perhaps more importantly, should the copyright holder's rights extend to aggregation of content, or should free market values instead require competition in the marketplace for aggregate uses? There are no easy answers here, and if the case proceeds to its merits we can expect to see a lot of opinions on this particular point.

2. Copying by Meltwater's Users

The allegations discussed above concern reproduction and distribution by Meltwater itself, but AP goes beyond that in the complaint, alleging contributory and vicarious infringement as well. Contributory infringement generally applies when a defendant knows of the infringement occurring by a third party and induces, causes, or materially contributes to the third party's infringement. Vicarious liability applies when the company has the right and ability to control a third party's infringement, and receives a financial benefit directly attributable to the third party's infringement of the plaintiff's copyrights.

Here, most of the alleged contributory and vicarious infringement involves the "article editor" feature Meltwater includes in its service. AP describes the service as follows:

In order to use the "Article Editor" function, a user clicks on a link to "add new article" in the archive folder. When the user clicks that link, Meltwater generates an activity box with several empty cells. A user can choose to copy and paste any portion of the article text into any of the various cells. If the user pastes the full text of any article into the cell labeled "opening text," the user will be able view the full text of the article on its Meltwater archive page by logging into the site any time, without clicking through to the website that legitimately published the article.

So, Meltwater allegedly provides an article creator/editor function with which a user "can choose" to copy AP articles into a field and share them on the Meltwater system. If users did so, that might in some cases be infringement, and AP thus alleges that Meltwater would be contributorily and/or vicariously liable for such infringement occurring through the article editor.

There are a few problems with assigning liability based on those facts. First, there is a very important carve-out from contributory infringement discussed by the U.S. Supreme Court in Sony Corp. v. Universal City Studios, which protects manufacturers whose devices are "capable of substantial non-infringing uses." Perhaps a user can choose to use the article editor page to copy AP articles from the linked websites and save the whole text, but they also can use the site for a wide array of other uses. I could easily see a Meltwater user posting its own company press releases in this space, or using the article editor to provide an informal update to the user's subscribers. If all Meltwater is providing is a blank field and an invitation to write, then, under Sony, it shouldn't be held responsible if the person chooses to use that space to commit infringement.

Experts are split as to whether the Sony rule about "substantial non-infringing uses" applies to vicarious liability as well, though Sony refers to both forms of liability in its decision. Even if the Sony protection does not apply, however, it would be hard to make a vicarious liability claim under the facts as they are currently pleaded. Vicarious liability is generally understood to require a direct relationship between the infringing activity and the added financial gain of the defendant. Even the Ninth Circuit's broader "commercial draw" approach to vicarious liability has been limited by later rulings to require a causal link between membership in a service and copyright infringement. Nothing I've seen in the facts as disclosed in the complaint shows how Meltwater's financial gain is tied directly to infringing uses, or that users join the service because they can go there to commit infringement.

There could also be a role for the DMCA safe harbor for user-generated content, but it is not clear whether Meltwater has that defense available. 17 U.S.C. § 512(c) protects online service providers from liability for infringing material stored on their systems at the direction of users, provided that the service provider, among other conditions, (1) designates a copyright agent with the U.S. Copyright Office, (2) adopts and discloses a policy for handling infringement complaints, and (3) responds to requests by the copyright owner to remove content. All three are required to take advantage of the safe harbor, but I have not been able to find a DMCA agent in the Copyright Office's records registered by Meltwater News, Meltwater U.S. Holdings, or any other Meltwater-based name. It is possible that there is another explanation for why their agent is not in the public list of agents on the Copyright Office website, of course, and perhaps Meltwater will be able to advance this defense. (But either way, there is an important lesson here: register your DMCA agent!)

Interestingly, the complaint doesn't seem to directly mention the recently developed secondary liability claim of "inducing" copyright infringement, which came out of the Supreme Court's decision in MGM v. Grokster. I wouldn't be surprised if we were to see this theory come in later, depending on what discovery reveals. As Viacom v. YouTube teaches, the facts unearthed during discovery can often be quite enlightening.

3. Translation

The last bit of copyright infringement mentioned in the complaint feels a bit like an afterthought. After the AP spends 25 paragraphs describing various alleged activities of Meltwater that give rise to AP's claims of infringement detailed above, it includes this brief allegation:

Meltwater also provides its customers with the ability to create derivative works by translating any article retrieved through the agent or search functions into another language. On information and belief, Meltwater offers a built-in translation function that covers 22 languages. On information and belief, a user may click on a link to "translate" any search result. When the user clicks the "translate" link, a second window appears with the URL of the site to be translated. When the user chooses the desired language and clicks the "translate" button, the full translated text of the article appears on Meltwater's internal website, under the subdomain URL ""

This is an interesting twist. Translations are expressly defined as derivative works under the Copyright Act, and, as the Second Circuit has already noted in Nihon Keizai Shimbun v. Comline Business Data, translating a company's news articles for informational purposes is not a fair use. If Meltwater is doing this translating directly, and offering this as a service without permission of the underlying rightsholder, the AP's claim of infringement might be quite strong and straightforward.

On the other hand, if Meltwater is merely linking to an automated translation engine and leaving it to the user to make the derivative work, it could be protected under Sony. The question again would be whether the service as provided is "capable of substantial non-infringing uses." If all Meltwater is doing is providing a vacant field and an option to translate, like Yahoo's Babel Fish or Google Translate, this would probably be an easy defense. If the feature instead translates specific articles in their entirety, however, Meltwater would need to argue that translation of whole articles by its users would be a fair use in a substantial number of cases. It's certainly a colorable argument – a private translation done so that a user can read an article in a foreign language is likely noncommercial and improves the market for the original if the user is visiting the copyright holder's website first – but Meltwater would have to color it.

Andy Sellars is a staff attorney at the Citizen Media Law Project and a fellow at the Berkman Center for Internet & Society. As a law student, he worked on an amicus brief for the CMLP in the last big news aggregator case in the Second Circuit: Barclays v.

(Image courtesy of Flickr user davesag under a CC BY-NC-SA 2.0 license.)

Last updated on February 28th, 2012

Dan Purvis from Meltwater responds to our article

On Monday, the CMLP received the following email from Dan Purvis, the Director of PR and Communications for the Meltwater Group. It reads as follows:

Hi Andrew

I just wanted to reach out to you about your recent AP / Meltwater blog
post as I really appreciate the effort you made to provide an objective and
balanced piece. However, I was wondering if you could possibly update it
there are some inaccuracies within it borne out of the AP lawsuit's media

I've outlined the main points below, but would really appreciate the
opportunity to either discuss with you or to have the post updated to
provide more context and our point of view, if that's ok?

Namely, AP has shown that it does not understand our Meltwater News
service. We do not provide copies of AP articles; instead, we provide
links to these articles. We only link to publicly available content and
there is no content stored by us – instead, we provide an archive of
search results with links to where the article was originally published

What is also disappointing is that we found out about the lawsuit via the
media. We would like to open a dialogue with AP, but they never made
contact with us or asked us to take a license from them.

Furthermore, AP used DHS as an example of a client that has left AP for
Meltwater. DHS is not, and has never been, a client of Meltwater News.

It's important to understand that we are not in the business of sharing
content, but rather in the business of sharing the knowledge of the
existence of online content. Therefore helping to drive traffic to these
publishers' sites. Our media statement in response to the AP suit also
covers this here:

I hope this helps clarify – happy to discuss any part of it with you.


Jeff Hermes, CMLP's esteemed director, and I asked Dan to clarify whether Meltwater made a cached copy in the context of its search function. Here's what he sent in reply:

The archive and the searches are two different things. In general terms our system works like this:

Our crawlers read the websites in the same way as ordinary browsing of the web – the browser requests a URL from the publisher, and the publisher then chooses to send the article to the browser. When our crawler/browser receives the text from the publisher it is not shown on a screen like in normal web browsing; our system simply analyses the text and indexes it.

The index is a list of all words in the English language, with records of where (in which article) these words appeared. It is comparable to a giant table. In the column you have all possible words, and in the rows you have information of all the articles in which this word has appeared (metadata like the URL, headline, name of the publication, name of author, date of publication etc.) This is the index that our clients use for search, and this is the same way most common search engines work. We are NOT dependent on a copy of the article to provide our clients with the search result.

The equivalent of an indexing process in "real life" would be this: On the table you have the Webster's Encyclopedic Unabridged Dictionary of the English Language and today's edition of the New York Times. You find an article in the newspaper, find the first word in the article and look it up in the dictionary. Then you write NYT, Page 4, "Harvard Voted League Favorite" etc. Besides the word in the dictionary. Then you repeat this process for every word in the newspaper article. Would the dictionary, including the notes besides the words, now constitute an infringing copy of the newspaper article? Of course not.

Most common search engines, like Google, do keep a copy of the article even if it is not needed to produce the search result. They do this because they want to be able to provide their users with a cached copy of the article if the URL is "dead" (I.e. the link no longer leads to the original article). You see the option "cached" under most search results in Google. Meltwater does NOT provide our clients with such cached copy. If the publisher deletes the content, or changes the URL, our clients will simply not get access to the article.

The archive is where our clients can store their search results. They cannot store the articles, because we have not provided them with this content. They store the URL and the metadata as described above – very similar to the search result provided in a normal Google search. What AP says in their lawsuit is that we have a comment field, a text field, in our archive where our clients can put in their own notes about the link they save. AP makes a point out of that that a client can use a browser to open an article, copy all the text in that article, then go to our system and paste in the full article text in the comment text field. This is technically possible, but it would be the act of the client in the same way as anyone can copy and paste an article in a hotmail message. That email would then be stored on the hotmail servers, but one can hardly claim that this makes Microsoft copyright offenders? Our T&C makes it clear that any use of copyright material is not covered by the Meltwater license. Such use of the comment field is of course not what our system is designed for, and any abuse of this feature cannot be the responsibility of Meltwater.

Meltwater is not in the business of providing content to our clients, we are in the business of providing accurate information about the existence of such content, and where they need to go to read it. We support the business model chosen by the publishers. If they want to monetise their content using advertising, we drive traffic to their site thus increasing their revenue. If they have chosen to put their content behind a paywall, we drive traffic to the paywall where our clients will have to sign up or pay like any other user. (We only crawl and index content behind paywalls if we have an agreement in place with the publisher, like we do with Financial Times, for example.)

AP makes a point that other players, like Google News, have taken a license with AP. But the fact is that Google News operates with AP content in a very different way than we do. Where we provide a link to AP content (only where it has been published for free for anyone to read), Google News provides the full text of the article to its users. If they provide the full article to their users they would of course need a license. We do not do that.

Let me start by thanking Dan very much for his detailed contributions. I'd like to openly invite both Meltwater and AP to respond further if they feel so inclined. 
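For readers who want to picture Dan's "giant table," here is a minimal Python sketch of an inverted index of the kind he describes. It is an illustration under stated assumptions, not Meltwater's code: the function names, the crude tokenizer, and the two example articles are all hypothetical, and each index entry keeps only a URL and headline. Once the table is built, a keyword search is answered from the table alone, without consulting any stored article text.

```python
import re
from collections import defaultdict

# Hypothetical sketch of a word-to-metadata index; not Meltwater's actual system.

def tokenize(text):
    """Lowercase the text and split it into words (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(articles):
    """Map each word to the set of (url, headline) records of articles containing it.

    Only metadata is kept; the article text is discarded after indexing.
    """
    index = defaultdict(set)
    for article in articles:
        record = (article["url"], article["headline"])
        for word in tokenize(article["text"]):
            index[word].add(record)
    return index

def search(index, query):
    """Return the records of articles that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy so the index is not mutated
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Made-up example articles for illustration only.
articles = [
    {"url": "https://example.com/a1", "headline": "Harvard Voted League Favorite",
     "text": "Harvard was voted the league favorite this season."},
    {"url": "https://example.com/a2", "headline": "Yale Rebuilds",
     "text": "Yale is rebuilding after a difficult season."},
]

index = build_index(articles)
print(sorted(search(index, "league season")))
# [('https://example.com/a1', 'Harvard Voted League Favorite')]
```

Note that an index like this records which articles contain a word but not where in the article it appears, which is exactly the detail that matters in the analysis that follows.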

Turning to the substance of this analysis, I still tend to believe that the process of creating the super-index that Dan describes creates a "copy" under the Copyright Act (though one that should be found to be a fair use). The question may come down to how detailed the index is about "where . . . these words appeared." If the index is so detailed about word placement that a machine could reconstruct the entire article word by word, it would be a copy by the statutory definition, as it would be capable of being "perceived, reproduced, or otherwise communicated . . . with the aid of a machine or device." If it is as disassociated from the text as the page index in the back of a book, on the other hand, that index alone would probably not be a copy.
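The difference between these two kinds of index can be illustrated with a short, hypothetical Python sketch (the function names and sample sentence are made up). A positional index records each word's offset in the article, and sorting those offsets rebuilds the article word by word (lowercased and without punctuation, given the crude tokenizer assumed here). A presence-only index, which records just which article contains a word, produces the same table for "The dog bit the man" and "The man bit the dog," so the text cannot be recovered from it.

```python
import re
from collections import defaultdict

# Hypothetical comparison of two index designs; names and data are illustrative.

def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def build_positional_index(doc_id, text):
    """Map each word to a list of (doc_id, position) pairs, one per occurrence."""
    index = defaultdict(list)
    for position, word in enumerate(tokenize(text)):
        index[word].append((doc_id, position))
    return index

def build_presence_index(doc_id, text):
    """Map each word only to the set of documents containing it (no positions)."""
    index = defaultdict(set)
    for word in tokenize(text):
        index[word].add(doc_id)
    return index

def reconstruct(positional_index, doc_id):
    """Rebuild a document's word sequence by sorting every recorded position."""
    slots = {}
    for word, postings in positional_index.items():
        for posting_doc, position in postings:
            if posting_doc == doc_id:
                slots[position] = word
    return " ".join(slots[p] for p in sorted(slots))

text = "The dog bit the man."
positional = build_positional_index("a1", text)
presence = build_presence_index("a1", text)

print(reconstruct(positional, "a1"))  # the dog bit the man
print(sorted(presence))               # ['bit', 'dog', 'man', 'the']
# The presence-only index cannot tell the two sentences apart:
print(build_presence_index("a1", "The man bit the dog.") == presence)  # True
```

Real search indexes can fall anywhere between these extremes (many store word positions so they can support phrase searches), which is why the level of detail in Meltwater's index may matter.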

The one clue we have as to the precision of this index is the fact that Meltwater provides contextual excerpts accompanying the links to the articles. These could be scraped anew from the original website when a user performs the search, or they could be constructed from the index. My sense from the technology is that it would be too slow to build the results as a fresh scrape each time, but Dan, correct me if I'm wrong.
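To illustrate the second possibility, here is a hypothetical Python sketch of an excerpt built entirely from a positional index, without re-fetching or storing the article text. It assumes the index records word positions, which nothing in the record confirms for Meltwater's system, and the function names and sample sentence are made up. If an index can produce excerpts this way, it holds enough detail to reproduce the surrounding passage.

```python
import re
from collections import defaultdict

# Hypothetical sketch: keyword-in-context excerpts from a positional index alone.

def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def build_positional_index(text):
    """Map each word to the positions (word offsets) where it occurs in one article."""
    index = defaultdict(list)
    for position, word in enumerate(tokenize(text)):
        index[word].append(position)
    return index

def excerpt_from_index(index, keyword, window=3):
    """Return up to `window` words on either side of the keyword's first occurrence.

    Only the index is consulted; the original text is never read here.
    """
    hits = index.get(keyword.lower())
    if not hits:
        return ""
    center = hits[0]
    wanted = range(center - window, center + window + 1)
    # Invert the index for just the positions the excerpt needs.
    by_position = {pos: word for word, positions in index.items()
                   for pos in positions if pos in wanted}
    return " ".join(by_position[p] for p in sorted(by_position))

index = build_positional_index(
    "Harvard was voted the league favorite in a poll of coaches this week.")
print(excerpt_from_index(index, "league", window=2))  # voted the league favorite in
```

Widening `window` far enough would return the whole article, which is the point of the "copy" question above.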

I should also note, as William Patry does in his treatise on copyright, that there was a time when courts allowed some breathing room between the statutory definition of "copy" and a "copy" on which an infringement action can be based, but that time seems to have ended with MAI Systems Corporation v. Peak Computer and its progeny. I don't particularly like that turn in the law, but that appears to be the way courts are going with this analysis.

