Technology & Marketing Law Blog

Web Scraping for Me, But Not for Thee (Guest Blog Post)

by guest blogger Kieran McCarthy

There are few, if any, legal domains where hypocrisy is as baked into the ecosystem as it is with web scraping.

Some of the biggest companies on earth—including Meta and Microsoft—take aggressive, litigious approaches to prohibiting web scraping on their own properties, while taking liberal approaches to scraping data on other companies’ properties.

When we talk about web scraping, what we’re really talking about is data access. All the world’s knowledge is available for the taking on the Internet, and web scraping is how companies acquire it at scale. But the question of who can access and use that data, and for what purposes, is a tricky legal question, which gets trickier the deeper you dig.

Some forms of data are protected by copyright, trademark, or other cognizable forms of intellectual property. But most of the data on the Internet isn’t easily protectible as intellectual property by those who might have an incentive to protect it.

For example, the most aggressive companies in pursuing web-scraping litigation are the social media companies. LinkedIn and Facebook, most notably, have done as much as anyone to shape the law of web scraping. But the content that they’re trying to protect isn’t theirs—it belongs to their users. It’s user-generated content. And while their terms of use provide the social media companies a license to use that user-generated content, it is their users who typically have a copyright interest in their content. The social media companies have no cognizable property right to assert in this content/data.

But make no mistake, these companies view this data, generated by their users on their platforms, as their property. This is true even though the law does not recognize that they have a property interest in it, and even though they expressly disclaim any property rights in that data in their terms of use.

Since the law does not give them a cognizable property interest in this data, they must resort to other legal theories to prevent others from taking it and using it.

In the early days of the Internet, the primary legal theory that companies used to stop scrapers was something called trespass to chattels. This is why Eric—who has been covering this issue for a good while now—tags all scraping posts as “Trespass to Chattels.”

The idea behind this legal theory is that web scraping—often high-volume, unwanted data requests—is a form of trespass on private tangible property, namely computer servers. But trespass to chattels requires both an intrusion on private tangible property and an element of damages. In the early days of the Internet, when connections still ran over screeching dial-up modems, it didn’t take much extra traffic to damage someone’s server or impair the ability to provide a functioning website. Many web scrapers were clumsy and didn’t realize the impact of their additional requests on servers. In the late 1990s and early 2000s, web scraping often did burden or shut down websites.

But as technology improved, this legal theory stopped making as much sense. Server capacity improved by many orders of magnitude, and most scrapers became savvy enough to limit their requests in a way that was imperceptible, or at least inconsequential, to the host servers. Now, one of the elements of a trespass to chattels claim—damage to the servers or other tangible property of the host—very rarely occurs.
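
For readers who want to see what that kind of restraint looks like mechanically, here is a minimal sketch of a rate-limited fetcher in Python; the URLs, one-second delay, and user-agent string are illustrative assumptions, not details drawn from any case discussed here:

    import time
    import urllib.request

    # Illustrative targets; a real scraper would pull URLs from a crawl queue.
    URLS = ["https://example.com/", "https://example.org/"]

    def polite_fetch(urls, delay_seconds=1.0):
        """Fetch pages one at a time, pausing between requests so the extra
        load on the host servers stays negligible."""
        pages = []
        for url in urls:
            req = urllib.request.Request(
                url, headers={"User-Agent": "example-research-bot/0.1"}
            )
            with urllib.request.urlopen(req) as resp:
                pages.append(resp.read().decode("utf-8", errors="replace"))
            time.sleep(delay_seconds)  # the throttling that early, clumsy scrapers often skipped
        return pages

    if __name__ == "__main__":
        for page in polite_fetch(URLS):
            print(len(page), "characters fetched")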

Next, from the early 2000s until 2017, the primary legal theory that was used to deter web scraping was the Computer Fraud and Abuse Act or the CFAA. The CFAA prohibits accessing a “protected computer” without authorization. In the context of web scraping, the question is whether, once a web scraper gets its authorization revoked (usually via cease-and-desist letter, but often in the form of various anti-bot protections), any further scraping and use of a website’s data is “without authorization” within the meaning of the CFAA.

From 2001 to 2017, the simplistic answer was yes: any form of revocation of authorization was typically sufficient to trigger CFAA liability if the scraper continued to access the site without permission. And then, in 2017, the famous hiQ Labs, Inc. v. LinkedIn Corp. case came out, in which the district court granted a preliminary injunction protecting a web scraper’s access to public LinkedIn data notwithstanding LinkedIn’s CFAA arguments. The Ninth Circuit affirmed, holding:

We agree with the district court that giving companies like LinkedIn free rein to decide, on any basis, who can collect and use data—data that the companies do not own, that they otherwise make publicly available to viewers, and that the companies themselves collect and use—risks the possible creation of information monopolies that would disserve the public interest.

Many interpreted this as recognizing an affirmative right to scrape public data, even if that was not the correct reading of the law and the reality was always more nuanced.

In the end, it was a Pyrrhic victory. hiQ Labs lost the case: at summary judgment, the district court held that “LinkedIn’s User Agreement unambiguously prohibits scraping and the unauthorized use of scraped data,” and LinkedIn obtained a permanent injunction and damages against hiQ Labs on that basis.

Now, the primary vehicle to stop web scraping is the breach of contract claim.

For example, in just the last few weeks, Twitter/X Corp. has filed multiple lawsuits against web scrapers, including against Bright Data, which is perhaps the biggest web-scraping company in the world.

Ten years ago, you’d typically see plaintiffs in web-scraping cases file 10-15 legal claims, with law firms exploring any legal theory that might stick. Now, in its case against Bright Data, Twitter’s lawyers filed just three claims: breach of contract, tortious interference with a contract, and unjust enrichment. Lawyers are increasingly confident that courts will enforce breach of contract claims against scrapers and grant the relief they want. They don’t need or seek alternative legal theories.

And it is this legal reality—web-scraping enforcement through breach of contract—that allows companies to use contract law to assert what are effectively property rights over how people access and use data.

Mark Lemley observed this happening nearly 20 years ago, in his prescient, seminal article, “Terms of Use.”

The problem is that the shift from property law to contract law takes the job of defining the Web site owner’s rights out of the hands of the law and into the hands of the site owner. Property law may or may not prohibit a particular “intrusion” on a Web site, but it is the law that determines the answer to that question. The reason my “no-trespassing” sign is effective in the real world is not because there is any sort of agreement to abide by it, but because the law already protects my land against intrusion by another. If the sign read “no walking on the road outside my property,” no one would think of it as an enforceable agreement. If we make the conceptual leap to assuming that refusing to act in the way the site owner wants is also a breach of contract, it becomes the site owner rather than the law that determines what actions are forbidden. The law then enforces that private decision. [citations omitted]

Mark Lemley, Terms of Use, Minnesota Law Review at 471 (2006).

With the breach-of-contract-as-property legal regime, host websites are free to define their rights in online data however they want, in the form of online terms of use agreements.

Rather than creating a new intellectual property regime with general rules for data use—or, even simpler, deciding cases using existing intellectual property rules—courts have allowed host websites to create their own intellectual property rights in website data, through the mere act of declaring such data to be property in an online contract. Companies have almost complete liberty to declare data that is not entitled to intellectual property protection to be “proprietary,” and courts allow them to enforce this ad hoc intellectual property regime through breach of contract claims (as long as they aren’t so foolish as to do it in a way that is coterminous with copyright protections).

And this is where the hypocrisy comes in: the breach-of-contract-as-property legal regime has no legal requirement for intellectual honesty or consistency. Unlike with trademarks or patents, it imposes no requirement to respect others’ IP in the same way that you protect your own. Companies are free to press their advantage on what is deemed “proprietary” on their own sites while simultaneously asserting what is free for the taking on others’. It is easy to criticize this, but this is what smart lawyers and legal teams do.

Let’s look at what Microsoft is doing right now, as an example.

In the last couple of weeks, Microsoft updated its general terms of use to prohibit scraping, harvesting, or similar extraction methods directed at its AI services.

Also in the last couple of weeks, Microsoft affiliate OpenAI released a product called GPTBot, a web crawler designed to scrape the entire Internet.
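
OpenAI says GPTBot identifies itself and respects the Robots Exclusion Protocol, so a site that wants to keep it out can, at least in principle, add two lines to its robots.txt. The snippet below uses OpenAI’s published user-agent token; note that robots.txt compliance is voluntary on the crawler’s part, not a legal requirement:

    User-agent: GPTBot
    Disallow: /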

And while they don’t admit this publicly, OpenAI has almost certainly already scraped the entire non-authwalled Internet and used it as training data for GPT-3, ChatGPT, and GPT-4.

Nonetheless, without any obvious hints of irony, OpenAI’s own terms of use prohibit scraping.

Last year, Microsoft subsidiary LinkedIn loudly and proudly declared victory in the most high-profile web-scraping litigation in US history, imposing a permanent injunction on a former business rival to prevent it from scraping and accessing its private and public data forever. VP of Legal Sarah Wright declared, “The Court’s ruling helps us better protect everyone in our professional community from unauthorized use of profile data, and it establishes important precedent to stop this kind of abuse in the future.”

I’m picking on Microsoft, as it is the most flagrant offender here. But I could pick on hundreds of others who are also hypocritical on this issue. Notably, Meta is also famously suing a company right now for scraping and selling its public content, even though Meta once paid the same scraper to scrape public data for them.

As I said at the start of this post, hypocrisy is endemic to this legal regime.

I, for one, don’t blame Microsoft or Meta or any of the other companies that take hypocritical stances on scraping. That’s what smart legal teams do when courts allow them to do it.

I blame the courts.

I blame the court in Register.com v. Verio, Inc. that paved the way for contracts of adhesion in the absence of assent. I blame the Northern District of Texas for enabling Southwest Airlines to sue anyone who publishes public information about its flights. I blame the court in the hiQ Labs case that made no attempt to explain the disconnect, or the inconsistency, in why hiQ Labs was entitled to a preliminary injunction on its CFAA claim but LinkedIn was entitled to a permanent injunction on its breach of contract claim on the exact same facts a few years later.

Courts need to realize that if you allow private companies to invent intellectual property rights through online contracts of adhesion, courts will be at the mercy of private decision-makers on questions that should be questions of public interest.

But given that contracts, even online contracts, are a state-law issue, it’s hard to imagine a simple resolution to this problem. One possible solution might be a more all-encompassing interpretation of the copyright preemption doctrine, but the current law of copyright preemption is a muddled mess of a circuit split, and the Supreme Court just declined an opportunity to resolve it.

But regardless of what you and I think about this legal regime, that is the current state of the law.

The next testing ground for it will be with these generative AI cases.

I’ve long said we have not yet reached a stable equilibrium on these issues, because this kind of inconsistency in the law cannot be sustained. That means we are likely to see plenty of fireworks on these issues in the next few years.

The post Web Scraping for Me, But Not for Thee (Guest Blog Post) appeared first on Technology & Marketing Law Blog.

My Comments to the USPTO About the SAD Scheme and Anticounterfeiting/Antipiracy Efforts

[I submitted the following comments to the USPTO]

__

To: United States Patent and Trademark Office, Department of Commerce
From:  Prof. Eric Goldman, Associate Dean for Research, Santa Clara University School of Law
Date:  August 22, 2023
Re: Comments regarding Future Strategies in Anticounterfeiting and Antipiracy, Docket No. PTO-C-2023-0006

I appreciate the opportunity to submit these comments regarding the USPTO’s inquiry into anticounterfeiting and antipiracy efforts.

As the USPTO knows, every IP policy creates Type I and Type II errors, i.e., some infringement will be underenforced, and some enforcements will target non-infringing behavior or even be abusive. Knowing this inevitability, every IP policy must anticipate the possibility of rightsowner over/mis-enforcement and incorporate appropriate substantive and procedural protections for victims of such overenforcements. This is especially important with respect to “anticounterfeiting” and “antipiracy” initiatives given how often IP owners mischaracterize legally permissible activity as “counterfeiting” or “piracy.”

It is impossible to discuss the current state or future strategies of anticounterfeiting and antipiracy enforcement without addressing the phenomenon of rightsowners enumerating IP defendants on sealed Schedule As, a phenomenon I call the “SAD Scheme.” I have attached a draft of an in-press article explaining the SAD Scheme, how it is being widely used and abused, and how it achieves illegitimate and unjust outcomes. Although hundreds of thousands of defendants have been sued pursuant to the SAD Scheme, the scheme frequently bypasses standard transparency rules applicable to judicial proceedings, and as a result it has received insufficient public scrutiny. Even many IP experts aren’t aware of it.

The SAD Scheme is a prime example of how rightsowners are currently misusing existing anticounterfeiting and antipiracy rules. Until the SAD Scheme is appropriately restricted, we will increasingly see rightsowners prefer it over more traditional IP enforcement techniques, such as sending takedown notices to Internet services, using rightsowner-friendly tools provided by the Internet services (such as VeRO or Content ID), and litigating against individual infringers. This means the SAD Scheme threatens to replace most current and future anticounterfeiting/antipiracy tactics. Accordingly, any discussion about anticounterfeiting and antipiracy efforts must account for the SAD Scheme and its capacity for abuse.

As part of the USPTO’s inquiry, I encourage the USPTO to take stock of the SAD Scheme’s prevalence and impact. In particular, the USPTO can help generate more public data about it and improve public visibility into the scheme. The Public Roundtable should also explore what steps the USPTO, other government agencies, and other players in the ecosystem should take to curb the scheme’s abuses.

The post My Comments to the USPTO About the SAD Scheme and Anticounterfeiting/Antipiracy Efforts appeared first on Technology & Marketing Law Blog.

Court Says No Human Author, No Copyright (but Human Authorship of GenAI Outputs Remains Uncertain) (Guest Blog Post)

by guest blogger Heather Whitney

To the surprise of no one, a D.C. district court granted summary judgment for the Copyright Office in Thaler v. Perlmutter, No. 1:22-cv-01564 (D.D.C. Aug. 18, 2023), affirming the Copyright Office’s position that “a work generated entirely by an artificial system absent human involvement [is not] eligible for copyright.” U.S. copyright law protects only works of human authorship, and the plaintiff, Stephen Thaler, expressly told the Copyright Office that the work at issue, titled “A Recent Entrance to Paradise,” “lack[ed] traditional human authorship.” Eric previously blogged about the Copyright Review Board’s affirmance of the Office’s repeated refusal to register the work back in March 2022.

The Thaler decision is unlikely to have any great impact. There aren’t many people trying to register works “autonomously created by a computer algorithm running on a machine” and disclaiming any human authorship at the outset. The much harder question of “how much human input is necessary to qualify the user of an AI system as the ‘author’ of a generated work” was not before the court. That said, while not presented with the question of how much human input is enough, the court’s dicta arguably suggests that it thinks there is some amount of human input to a generative AI tool that would render the relevant human an author of the resulting output.

This post focuses on the district court’s reasoning. However, before getting to Thaler, it’s worth pausing to underscore the impact that the answer to this harder human-authorship-of-works-created-using-generative-AI question will have.

Take the software industry. Coding assistants like GitHub Copilot, which can auto-complete code, are used by a lot of developers to generate a lot of code. Microsoft’s CEO, Satya Nadella, said last month that 27,000 companies are paying for a GitHub Copilot enterprise license. Just think about how many engineers are using coding assistants without their employers paying for it (or knowing about it). As of February 2023, GitHub announced that, for developers using Copilot, Copilot is behind 46% of the “developers’” code across all programming languages and 61% of all code written in Java. Those percentages are only going to go up as these tools get better, and companies are currently competing to provide the go-to coding assistant tool that developers will use.

But if developers aren’t the “authors” of code they create using coding assistants (and they aren’t adding copyrightable expression to the assistant-generated code after the fact) and the bulk of a company’s proprietary code is generated by a coding assistant, that means the bulk of a company’s proprietary code is not protected by copyright. Regardless of one’s views on the extent to which copyright should protect code, that it might not protect the majority of code created moving forward is a big (and underappreciated) deal.

Thaler Decision: “Works of (Human) Authorship”

The Progress Clause of the Constitution gives Congress the power to “promote the Progress of Science . . . by securing for limited Times to Authors . . . the exclusive Right to their . . . Writings[.]” U.S. Const. art. I § 8, cl. 8. Pursuant to this authorization, the Copyright Act extends copyrights to “original works of authorship fixed in any tangible medium of expression.” 17 U.S.C. § 102(a). The Copyright Act defines neither “authorship” nor “works of authorship.” That said, something cannot be a work of authorship without being the work of at least one author, if for no other reason than the work must be “fixed” in a tangible medium of expression “by or under the authority of the author.” 17 U.S.C. § 101.

Reviewing the Copyright Office’s refusal to register the work under the APA’s arbitrary and capricious standard, the district court gave four main reasons why only humans can be authors, and thus why summary judgment for the Copyright Office was appropriate. (For the record, I’m confident the Copyright Office’s decision could have been reviewed de novo and the district court would have reached the same conclusion.)

1. Precedent.

Courts have never recognized copyright protection in works or elements of works that were not authored by humans. And, while Thaler could not point to a single case of a court recognizing copyright in a work “authored” by a non-human, there are a handful of cases where courts affirmatively refused to do so on the grounds that copyright only protects works of human authorship. As one example of this, the district court pointed to Urantia Found. v. Kristen Maaherra, 114 F.3d 955 (9th Cir. 1997). In Urantia, the Ninth Circuit found a collection of “revelations” purportedly authored by divine beings copyrightable, but only as a compilation. While the selection and arrangement of the “revelations” by humans met the “‘extremely low’ threshold level of creativity required for copyright protection,” the individual “revelations” themselves were not “original” to any human author and thus were not copyrightable.

Some commentators have incorrectly taken Urantia to stand for the proposition that works by non-humans can be copyrightable. Again, the only copyrightable portion was the human’s selection and arrangement of non-protected elements. The Copyright Office made a similar move with respect to its treatment of Kristina Kashtanova’s (they/their) comic Zarya of the Dawn when it registered the work as a compilation but refused to find the images that Kashtanova made using Midjourney to be copyrightable. (My co-author and I discuss the Copyright Office’s decision here.) (Disclosure: I represent Kashtanova in connection with their application to register “Rose Enigma.” You can read our cover letter here.)

2. The District Court’s Reading of the Supreme Court.

While the Supreme Court has never squarely addressed the question of whether non-humans can be authors (many claims to the contrary notwithstanding), the district court found that “[h]uman involvement in, and ultimate creative control over, the work at issue was key to the [Supreme Court’s] conclusion that [photography] fell within the bounds of copyright” in Burrow-Giles Lithographic Co. v. Sarony, 111 U.S. 53 (1884). (Christa Laser discusses Sarony in her recent guest post, as have I and my co‑author here.) The district court also thought that the Supreme Court’s decisions in Mazer v. Stein, 347 U.S. 201 (1954) and Goldstein v. California, 412 U.S. 546 (1973) centered authorship on “acts of human creativity.”

While the district court’s account of why the Sarony court concluded that photographs are copyrightable was accurate, it highlights again how much lower the bar for photography is today. And, consequently, it hints at the brewing tension between the Copyright Office’s treatment of works that artists create using generative AI tools and the treatment of works that artists create using cameras (including camera phones).

As the district court put it:

In Sarony . . . the Supreme Court reasoned that photographs amounted to copyrightable creations of ‘authors’ . . . because the photographic result nonetheless ‘represent[ed]’ the ‘original intellectual conception of the author . . . A camera may generate only a ‘mechanical reproduction’ of a scene, but does so only after the photographer develops a ‘mental conception’ of the photograph, which is given its final form by the photographer’s decisions like ‘posing the [subject] in front of the camera, selecting and arranging the costume, draperies, and other various accessories in said photograph, arranging the subject . . . arranging and disposing the light and shade, suggesting and evoking the desired expression, and from such disposition, arrangement, or representation’ crafting the overall image. Human involvement in, and ultimate creative control over, the work at issue was key to the conclusion that the new type of work fell within the bounds of copyright.

I was not surprised to see the National Press Photographers Association and Professional Photographers of America as fellow speakers at the Copyright Office’s recent “listening session” on generative AI and copyright for visual works—precisely because the arguments that the Copyright Office made against Kashtanova being an author could quite easily be used to cast doubt on the copyrightability of the vast majority of photographs.

3. The Copyright Act’s “Plain” Language.

While acknowledging that the Copyright Act does not define “author,” the district court appears to cite modern definitions of “author” to support its conclusion that “[b]y its plain text, the [Copyright Act] requires a copyrightable work to have an originator with the capacity for intellectual, creative, or artistic labor.” The district court then simultaneously asserts that that “originator” must be human while also dropping a footnote that suggests the opposite. Specifically, citing Justin Hughes’ work, the court left open the possibility that non-human sentient beings may be covered, but dismissed this issue because “[t]he day sentient refugees from some intergalactic war arrive on Earth and are granted asylum in Iceland, copyright law will be the least of our problems.”

Some day in the future, as the technology continues to advance, there will be AIs that generate content that some people will believe to be sentient, and those people will latch on to this comment. Indeed, in the Hughes article that the court cites, Hughes sees something along these lines, noting that “once some AI is sentient enough to demand its own civil rights and protection under the Thirteenth Amendment, my guess is that ‘person’ in copyright law will not be limited to homo sapiens.” (For what it’s worth, an AI tool can say that it demands its own civil rights and protection under the Thirteenth Amendment today. It can also say that the year is actually 2000 and we are all living in a simulation.)

4. Purpose of Copyright Protection.

Referring to the Progress Clause, the district court notes that the purpose of copyright, which it characterizes as the promotion of the public good through incentivizing (human) individuals to create, is not furthered by extending copyright to works created without any human involvement. “Non‑human actors need no incentivization with the promise of exclusive rights under United States law, and copyright was therefore not designed to reach them.”

Concluding Thoughts

Again, not a surprising result. Nonetheless, Thaler’s lawyer has stated that they plan to appeal. Thaler’s appeal to the Federal Circuit and petition for certiorari in his analogous patent case were both unsuccessful.

No court has yet addressed whether (and if so, when) humans are the authors of content that they generate through the use of generative AI tools. I suspect that will change relatively soon. While most companies and creators have been focused on the infringement risks raised by the use of generative AI tools, few have thought through their own ability to protect IP they create using those tools moving forward. Defendants in future copyright infringement suits are bound to argue that the works at issue were made using generative AI tools and thus are not protected by copyright at all. It is hard to say at this stage when those defendants will be successful.

* * *

Eric’s Comments

The word “human” appears in the Thaler opinion 50 times. That’s because the court says, repeatedly and in different ways: no human creation = no copyright. That answers the question when human involvement in content creation is a binary switch–either on or off. Nowadays, humans routinely use machines to create content, and sometimes the machines contribute a lot to the final outcome. Trying to cleave the human part from the machine contributions sounds impossible, but that’s where the Copyright Office’s position is taking us.

I recently gave a short talk on copyright and generative AI. The video. The slides.

The post Court Says No Human Author, No Copyright (but Human Authorship of GenAI Outputs Remains Uncertain) (Guest Blog Post) appeared first on Technology & Marketing Law Blog.

Court Doesn’t Expect YouTube to Moderate Content Perfectly–Newman v. Google

This is one of several ideologically motivated lawsuits against YouTube for allegedly engaging in “discriminatory” content moderation. The initial cohort of plaintiffs were conservatives (Prager); but then as a purported “gotcha,” the law firm added LGBTQ (Divino) and people of color (Newman) plaintiff cohorts. By experimenting with more sympathetic plaintiff demographics, I assume the law firm hoped to create better precedent that it could then weaponize to help conservatives object to content moderation and more deeply entrench their existing privilege. However, as I observed before, it’s not actually possible to “discriminate” against every subpopulation of user-authors, because discrimination-against-everyone is really discrimination-against-no one. Thus, I always felt the litigation ploy acted as an adverse admission by the plaintiffs. But courts don’t always use facts like that for petard-hoisting, instead grounding their rulings in legal doctrines and admissible evidence. And the precedent is indeed stacked against any account termination or content removal plaintiffs.

After 5 tries, the Divino LGBTQ lawsuit finally failed last month. And after a remarkable 6 tries, the Newman race-based lawsuit has now failed too (prior blog post). In both cases, the high-concept and splashy constitutional issues fizzled out long ago. As the Newman court summarized, “this case has shed its intentional discrimination and constitutional claims, becoming—first and foremost—a breach of contract dispute.”

The court discusses this language that YouTube added to its community guidelines in 2021:

We enforce these Community Guidelines using a combination of human reviewers and machine learning, and apply them to everyone equally—regardless of the subject or the creator’s background, political viewpoint, position, or affiliation

I do not understand YouTube’s decision to add this language while it had all of these discrimination lawsuits pending. What was YouTube thinking???

The court says this language could be an enforceable promise:

the statement reads like a guarantee that users can expect identity-neutral treatment from YouTube when they use its service. Moreover, the statement is definite enough for the Court to ascertain YouTube’s obligation under the contract (it must avoid identity-based differential treatment in its content moderation) and to determine whether it has performed or breached that obligation.

This sets up YouTube for a major own-goal. Yet, the court bails YouTube out. [Tip to YouTube: PLEASE PLEASE PLEASE DELETE THIS LANGUAGE FROM YOUR COMMUNITY GUIDELINES IF YOU HAVEN’T ALREADY DONE SO.]

The court says: “the plaintiffs must do more than gesture at plausible ideas in the abstract. They must allege sufficient factual content to give rise to a reasonable inference that their content has been treated in a racially discriminatory manner by YouTube’s algorithm.”

The centerpiece of the plaintiffs’ allegations is a chart comparing 32 of the plaintiffs’ restricted works to 58 unrestricted works by “white” submitters. The court shreds this chart.

As a proxy for determining that submitters were “white,” the plaintiffs identified videos from “large corporations.” This is a non-sequitur, and the court easily disregards these videos. The court calls other video comparisons “downright baffling.” Yet other comparisons actually undercut the plaintiffs’ arguments because the restricted videos apparently deserved more moderation than their comparators, “which dramatically undermines the inference that the differential treatment was based on the plaintiff’s race.”

This leaves only “a scarce few” video comparisons as “even arguably viable,” but that’s not enough to support the contract breach claim (emphasis added):

the complaint provides no context as to how the rest of these users’ videos are treated, and it would be a stretch to draw an inference of racial discrimination without such context. It may be that other similarly graphic makeup videos by Ley have not been restricted, while other such videos by the white comparator have been restricted. If so, this would suggest only that the algorithm does not always get it right. But YouTube’s promise is not that its algorithm is infallible. The promise is that it abstains from identity-based differential treatment.

The issue of error rates is critical to any allegations of identity-based discriminatory content moderation. A large service like YouTube with an exceptionally high content moderation accuracy rate will make many millions of moderation errors–not because of discrimination but because of the inevitable limitations and possible arbitrariness of content moderation. As the court indicates, it’s not realistic to demand perfect content moderation, so that’s not the appropriate baseline for assessing whether content moderation has been done on a discriminatory basis.
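
A back-of-the-envelope calculation makes the point; the volume and accuracy figures below are illustrative assumptions, not YouTube’s actual numbers:

    # Hypothetical figures, purely for illustration.
    decisions_per_year = 10_000_000_000   # assumed number of moderation decisions
    accuracy = 0.999                      # assumed, exceptionally high accuracy rate

    expected_errors = decisions_per_year * (1 - accuracy)
    print(f"Expected moderation errors per year: {expected_errors:,.0f}")  # 10,000,000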

So, exactly what proof would the court accept to show identity-based discriminatory content moderation? The court says the 32/58 video comparison was too small a sample to generate reliable results. Yet, the court also said that general statistical evidence of site-wide discrimination wouldn’t matter. So I guess the judge would credit a large number of plaintiffs with a large enough corpus of compared works to achieve statistically reliable results? Or perhaps there is no way for plaintiffs to plead discrimination without smoking-gun evidence of individual discriminatory decisions.

On that front, the plaintiffs’ other key piece of evidence came from a 2017 meeting between YouTube “queer” creators and Google’s Vice President of Product Management, Johanna Wright. Allegedly Wright said that Google was filtering content “belonging to individuals or groups based on gender, race, religion, or sexual orientation.” At another meeting, YouTube allegedly admitted that it differentially removed content from non-white submitters at a higher rate than white submitters. The court says these allegations “do not come close to making up for the glaring deficiencies in the plaintiffs’ chart”:

First, the allegations are vague as to what exactly was said. For example, the complaint purports to quote Wright, but it is not clear where Wright’s words end and the plaintiffs’ recitation of legal buzzwords begins. Similarly, the plaintiffs attribute a great many (buzzword-laden) statements to YouTube’s representatives but barely quote them.

Second, and more importantly, these alleged admissions were made in 2017, four years before YouTube added its promise to the Community Guidelines. In machine-learning years, four years is an eternity. There is no basis for assuming that the algorithm in question today is materially similar to the algorithm in question in 2017. That’s not to say it has necessarily improved—for all we know, perhaps it has worsened. The point is that these allegations are so dated that their relevance is, at best, attenuated. Finally, these allegations do not directly concern any of the plaintiffs or their videos. They are background allegations that could help bolster an inference of race-based differential treatment if it were otherwise raised by the complaint. But, in the absence of specific factual content giving rise to the inference that the plaintiffs themselves have been discriminated against, there is no inference for these background allegations to reinforce.

I wonder again: what evidence could plaintiffs have marshaled to show actionable discriminatory content moderation?

To be clear, I favor high evidentiary barriers to claims of discriminatory content moderation. It should not be possible to establish identity-based discriminatory content moderation based solely on inferences. Otherwise, every group can easily find a statistician who will crunch a dataset to show, with low p-values, that moderation for one content category isn’t perfectly equal to some other content category. This isn’t discrimination; this is an inevitable consequence of editorial decisions at scale (especially if the service doesn’t always know its authors’ demographics).
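
Here is a rough sketch of that statistical point, using invented numbers (and SciPy) purely for illustration: with samples in the millions, a difference in removal rates of roughly a hundredth of a percentage point between two content categories still produces a “statistically significant” p-value.

    from scipy.stats import chi2_contingency

    # Hypothetical moderation outcomes for two content categories.
    # Category A: 10,000 removals out of 5,000,000 items (0.200%)
    # Category B: 10,600 removals out of 5,000,000 items (0.212%)
    table = [
        [10_000, 4_990_000],
        [10_600, 4_989_400],
    ]

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"p-value: {p:.2e}")  # far below 0.05 despite a negligible real-world difference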

(I also question if the law can restrict publishers’ ability to discriminate in their editorial decisions. That topic is expressly at issue in the FLA and TX social media censorship cases).

So, plaintiffs need smoking gun evidence, or it’s an easy case to dismiss. As I wrote about this case in 2021:

the plaintiffs claim that YouTube engaged in race-based discriminatory content moderation, but there’s no way for plaintiffs to prove this because there’s no baseline of what “unbiased” content moderation looks like. Instead, an unavoidable truth of content moderation: everyone believes that the Internet services are “biased” against them…but it’s impossible for Internet services to be biased against everyone. Without a simple-to-apply legal defense that content moderation is always and inevitably biased and the law offers no remedy for that, we will experience a repetitive cycle of plaintiff attempts to weaponize the law against that truth.

In future cases alleging discriminatory content moderation, maybe courts should order the plaintiffs to pay the attorneys’ fees, especially when a plaintiff gets 6 bites at the apple.

With this dismissal, the case is now ready for its appeal to the Ninth Circuit, which has always been its inevitable destination. There, this case will join the Divino case. I would like to think that the Ninth Circuit will find these cases easy to affirm, but as indicated by the troubling Vargas ruling, anything could happen when the Ninth Circuit considers the intersection of discrimination claims and editorial decision-making.

One oddity: the court instructs both parties “not to remove or otherwise make unavailable the videos cited in the complaint until the appellate process has run its course.” I understand that the court is trying to preserve the evidence for the appellate court, but this is actually an unconstitutional order to keep publishing content that either party may determine does not meet its editorial standards.

Case citation: Newman v. Google LLC, 2023 WL 5282407 (N.D. Cal. Aug. 17, 2023). The CourtListener page.

The post Court Doesn’t Expect YouTube to Moderate Content Perfectly–Newman v. Google appeared first on Technology & Marketing Law Blog.

Ninth Circuit Easily Dismisses Account Termination Case–King v. Facebook

This is a standard account termination case. The specific facts don’t matter to the outcome, but I enumerate a little more detail in my prior blog post. The 9th Circuit panel’s very short narrative includes:

  • “there is no private right of action under the CDA”
  • “The specific promise to take down explicit content at issue in Barnes does not compare to the general promise made by Facebook, and incorporated into its TOS, to use “good faith” or make an “honest” determination before deciding to exercise publishing or editorial discretion.”

A unanimous per curiam memo disposition means the plaintiffs weren’t even close. You might expect a better showing from a lawyer-plaintiff. Instead, this becomes another banal entry in the ever-growing pile of failed account termination and content removal lawsuits.

Case citation: Adrienne Sepaniak King and Christopher Edward Sepaniak King v. Facebook, Inc., 2023 WL 5318464 (9th Cir. August 18, 2023)

The post Ninth Circuit Easily Dismisses Account Termination Case–King v. Facebook appeared first on Technology & Marketing Law Blog.

512(f) Once Again Ensnared in an Employment Ownership Dispute–Shande v. Zoox

DMCA Section 512(c), the notice-and-takedown provision, codifies a simple paradigm. Copyright owners are in the best position to spot and redress infringement, so they should identify alleged infringements to services and seek the services’ intervention. This paradigm, however, breaks down when copyright ownership is contested. In that circumstance, the takedown notice becomes a proxy battle for a larger and likely fact-dependent war over ownership, which the service in the middle isn’t in a good position to resolve.

Today’s post is about another copyright ownership dispute that spilled over into 512(f), the civil cause of action for abusive takedown notices. The litigants are an employer and former employee. The employee created works while employed, but allegedly independently, and posted the works online. The employer sent takedown notices to the hosting service for those works, claiming the works were prepared within the scope of employment and thus works-for-hire. The hosting service honored the takedown notice. Now the employee is suing over the employer’s allegedly wrongful assertion of ownership. This includes a 512(f) claim.

Unsurprisingly, 512(f) does not help the employee. The court accepts that the employer believed the works were created within the scope of employment, so the associated takedown notice wasn’t sent in bad faith. The 512(f) claim fails. So does the rest of the employee’s lawsuit.

One could argue that Section 512 worked as it should in this case. By design, it seeks to push questions over ownership to court, rather than have the intermediary service try to resolve those questions. Shande will get his day in court. But even if he prevails in the ownership dispute, 512(f) won’t help compensate him for his troubles.

Case Citation: Shande v. Zoox, Inc., 2023 WL 5211628 (N.D. Cal. Aug. 14, 2023)

BONUS: More 512(f) quick links from this year:

* Cinq Music Group, LLC v. Create Music Group, Inc., 2023 WL 4157446 (C.D. Cal. Jan. 31, 2023). “Courts in the Ninth Circuit have regularly held that the DMCA preempts state law claims arising out of submission of takedown notices.” 512(f) once again wipes out state law claims, even if 512(f) doesn’t apply.

* Powerwand Inc. v. Hefai Neniang Trading Co., 2023 WL 4201748 (W.D. Wash. June 27, 2023). A plaintiff wins a 512(f) case on a default judgment.

* Moonbug Entertainment Ltd v. Babybus Network Technology Co., No. 21-cv-06536-EMC (N.D. Cal. July 27, 2023). Jury answered yes to the question: “In the March 20, 2023 Counternotification regarding the Yes Yes Playground video on Super JoJo’s Portuguese language channel, did Babybus knowingly and materially misrepresent that material was removed or disabled from YouTube by mistake or misidentification?” The jury awarded Moonbug $10k for its 512(f) win. A rare 512(f) jury victory.

Prior Posts on Section 512(f)

Surprise! Another 512(f) Claim Fails–Bored Ape Yacht Club v. Ripps
You’re a Fool if You Think You Can Win a 512(f) Case–Security Police and Fire Professionals v. Maritas
512(f) Plaintiff Must Pay $91k to the Defense–Digital Marketing v. McCandless
Anti-Circumvention Takedowns Aren’t Covered by 512(f)–Yout v. RIAA
11th Circuit UPHOLDS a 512(f) Plaintiff Win on Appeal–Alper Automotive v. Day to Day Imports
Court Mistakenly Thinks Copyright Owners Have a Duty to Police Infringement–Sunny Factory v. Chen
Another 512(f) Claim Fails–Moonbug v. Babybus
A 512(f) Plaintiff Wins at Trial! –Alper Automotive v. Day to Day Imports
Satirical Depiction in YouTube Video Gets Rough Treatment in Court
512(f) Preempts Tortious Interference Claim–Copy Me That v. This Old Gal
512(f) Claim Against Robo-Notice Sender Can Proceed–Enttech v. Okularity
Copyright Plaintiffs Can’t Figure Out What Copyrights They Own, Court Says ¯\_(ツ)_/¯
A 512(f) Case Leads to a Rare Damages Award (on a Default Judgment)–California Beach v. Du
512(f) Claim Survives Motion to Dismiss–Brandyn Love v. Nuclear Blast America
512(f) Claim Fails in the 11th Circuit–Johnson v. New Destiny Christian Center
Court Orders Rightsowner to Withdraw DMCA Takedown Notices Sent to Amazon–Beyond Blond v. Heldman
Another 512(f) Claim Fails–Ningbo Mizhihe v Doe
Video Excerpts Qualify as Fair Use (and Another 512(f) Claim Fails)–Hughes v. Benjamin
How Have Section 512(f) Cases Fared Since 2017? (Spoiler: Not Well)
Another Section 512(f) Case Fails–ISE v. Longarzo
Another 512(f) Case Fails–Handshoe v. Perret
A DMCA Section 512(f) Case Survives Dismissal–ISE v. Longarzo
DMCA’s Unhelpful 512(f) Preempts Helpful State Law Claims–Stevens v. Vodka and Milk
Section 512(f) Complaint Survives Motion to Dismiss–Johnson v. New Destiny Church
‘Reaction’ Video Protected By Fair Use–Hosseinzadeh v. Klein
9th Circuit Sides With Fair Use in Dancing Baby Takedown Case–Lenz v. Universal
Two 512(f) Rulings Where The Litigants Dispute Copyright Ownership
It Takes a Default Judgment to Win a 17 USC 512(f) Case–Automattic v. Steiner
Vague Takedown Notice Targeting Facebook Page Results in Possible Liability–CrossFit v. Alvies
Another 512(f) Claim Fails–Tuteur v. Crosley-Corcoran
17 USC 512(f) Is Dead–Lenz v. Universal Music
512(f) Plaintiff Can’t Get Discovery to Back Up His Allegations of Bogus Takedowns–Ouellette v. Viacom
Updates on Transborder Copyright Enforcement Over “Grandma Got Run Over by a Reindeer”–Shropshire v. Canning
17 USC 512(f) Preempts State Law Claims Over Bogus Copyright Takedown Notices–Amaretto v. Ozimals
17 USC 512(f) Claim Against “Twilight” Studio Survives Motion to Dismiss–Smith v. Summit Entertainment
Cease & Desist Letter to iTunes Isn’t Covered by 17 USC 512(f)–Red Rock v. UMG
Copyright Takedown Notice Isn’t Actionable Unless There’s an Actual Takedown–Amaretto v. Ozimals
Second Life Ordered to Stop Honoring a Copyright Owner’s Takedown Notices–Amaretto Ranch Breedables v. Ozimals
Another Copyright Owner Sent a Defective Takedown Notice and Faced 512(f) Liability–Rosen v. HSI
Furniture Retailer Enjoined from Sending eBay VeRO Notices–Design Furnishings v. Zen Path
Disclosure of the Substance of Privileged Communications via Email, Blog, and Chat Results in Waiver — Lenz v. Universal
YouTube Uploader Can’t Sue Sender of Mistaken Takedown Notice–Cabell v. Zimmerman
Rare Ruling on Damages for Sending Bogus Copyright Takedown Notice–Lenz v. Universal
512(f) Claim Dismissed on Jurisdictional Grounds–Project DoD v. Federici
Biosafe-One v. Hawks Dismissed
Michael Savage Takedown Letter Might Violate 512(f)–Brave New Media v. Weiner
Fair Use – It’s the Law (for what it’s worth)–Lenz v. Universal
Copyright Owner Enjoined from Sending DMCA Takedown Notices–Biosafe-One v. Hawks
New(ish) Report on 512 Takedown Notices
Can 512(f) Support an Injunction? Novotny v. Chapman
Allegedly Wrong VeRO Notice of Claimed Infringement Not Actionable–Dudnikov v. MGA Entertainment

The post 512(f) Once Again Ensnared in an Employment Ownership Dispute–Shande v. Zoox appeared first on Technology & Marketing Law Blog.

LawTuber Loses Defamation Case–Broughty v. Bouzy

Broughty, using an alias, runs the “Nate the Lawyer” channel, part of the LawTube community, with over a quarter-million followers and 27M views. Like many other LawTubers, he sided against Heard in his coverage of the Johnny Depp/Amber Heard trial.

Bouzy is CEO of Bot Sentinel, which claims it is a “non-partisan platform developed to classify and track inauthentic accounts and toxic trolls.” Heard retained Bot Sentinel. Bot Sentinel rated Broughty’s Twitter account “disruptive”; and on his personal Twitter account, Bouzy questioned Broughty’s legal credentials and accused Broughty of various misdeeds. Bouzy later recanted/withdrew some of the allegations (wait, whose Twitter account was more “disruptive”?). Broughty sued Bouzy for defamation and more.

The court says Broughty qualified as a limited-purpose public figure. “He purposefully sought public attention and followers by creating YouTube and Twitter content on matters of public interest, including analysis of the Depp–Heard trial and criticisms of Bot Sentinel.” As a result, Broughty must show that Bouzy had actual malice towards the accuracy of his assertions.

The court essentially characterizes Twitter as an anything-goes hellscape, at least when it comes to online feuds:

the “over-all context” in which the alleged defamation occurred here is Twitter. As both parties are well aware—given that both have engaged in calling out mistruths on the Internet—Twitter is a public forum where a reasonable reader will expect to find many more opinions than facts…Twitter is a forum where a user, “in the same setting and with the same audience, has the immediate opportunity to air his competing view” and thus may generally remedy any defamation with “self-help” rather than rely on litigation

The court is a little more concerned with Bouzy’s questioning whether Broughty was licensed as a lawyer. However, “Defendant’s tweets questioning whether Plaintiff was a lawyer are protected First Amendment speech because a reasonable reader would only view them as opinions, and the facts upon which they were based were disclosed, so ‘readers can easily judge the facts for themselves.'” Plus, Broughty used an online alias, so it’s understandable why Bouzy couldn’t confirm his license status.

As for Bouzy’s coarse language, such as calling Broughty a “grifter,” “liar,” “troll,” and participant in a “smear campaign,” the court says these “are clearly opinions in the context in which they were stated, and none of Defendant’s tweets making these remarks imply that they are based on undisclosed facts.”

Bouzy also alleged that Broughty engaged in criminal conduct, but the court says that’s OK because he showed his sources:

Defendant began the string of tweets about planting evidence with the words “New: Unearthed video of [Plaintiff] admitting he planted evidence” and a link to the video referenced, and his subsequent tweets continued to debate what Plaintiff’s words in the video meant, making it clear to the reasonable reader that his statements accusing Plaintiff of criminal conduct were “merely a personal surmise built upon” what Plaintiff said in this video. Similarly, Defendant’s tweet stating that Plaintiff “bragged to his @YouTube followers about illegally obtaining [Defendant’s] social security number” contained a link to a YouTube video. Readers were free to, and apparently many on Twitter did in fact, reach different conclusions as to whether Plaintiff admitted to any criminal conduct in these videos.

I’ve blogged several cases where courts have given online speakers wide latitude so long as they linked to their sources (e.g., this post), so this ruling is supported by precedent. Still, I’m not seeing any winners here. The way the court characterizes it, Bouzy seemingly sealioned and then name-called and falsely attacked Broughty, perhaps indirectly trying to rehabilitate Heard’s reputation? If so, I’m not exactly sure what constitutes a “troll” that anti-troll operations should track.

Case Citation: Broughty v. Bouzy, 2023 WL 5013654 (D.N.J. Aug. 7, 2023)

BONUS 1: Lima Jevremovic v. Brittany Jeream Courville,  2023 WL 5127332 (D.N.J. Aug. 10, 2023), dismissing a defamation claim when:

Defendant frequently disclaims her commentary as a conspiracy theory, refers to herself as a conspiracy theorist, uses signals such as “I believe” & “I think”, & makes commentary on her social media page which is titled “Legal Edutainer”, all of which point to unactionable opinions rather than defamatory statements

Anything goes online nowadays if you publicly embrace conspiracy theories. ¯\_(ツ)_/¯

BONUS 2: Rosenblum v. Budd, 2023 WL 4938545 (Colo. Ct. App. Aug. 3, 2023). An impersonation Twitter account could be a “misappropriation” violation. Also, it could be defamatory if that account links to leaked private messages from the impersonation target. This latter ruling is especially dubious when the court says that 230 may not apply to the linked content.

BONUS 3: Reeves v. Woolford-Smith, 2023 WL 3563052 (Del. Ct. Common Pleas May 19, 2023). Defamation lawsuit over Facebook reviews fails at trial.

The post LawTuber Loses Defamation Case–Broughty v. Bouzy appeared first on Technology & Marketing Law Blog.