
In the online world of fanfiction writers, who pen stories inspired by their favorite movies, books, and games, and share them for free, there are unspoken codes of conduct. Among the most important: never charge money for your fanfic, and never steal other people’s work.
It makes sense then that fanfic writers were among the first creators to raise the alarm about their work being fed into the large language models powering generative AI without their knowledge or permission. But their efforts to stop the encroachment of AI into fan spaces are an uphill battle.
The latest salvo came in early April, when user nyuuzyou scraped 12.6 million fanfics from the online repository Archive of Our Own (AO3) and uploaded the dataset to Hugging Face, a company that hosts open-source AI models and software.
Nyuuzyou’s upload was quickly discovered by the Reddit community r/AO3, where hundreds of users posted furious reactions. A Tumblr account, ao3scrapesearch, built a search engine that allowed authors to search their usernames and see if their work had been scraped by nyuuzyou.
Fanfic writers flooded the comment section of the dataset on Hugging Face, getting into arguments with AI defenders. Dckchili defended nyuuzyou’s scrape, claiming that it didn’t matter because Big Tech crawler bots have already scraped the archive numerous times. RaraeAves argued that “the creeps” are depending on fanfic writers to not fight back when their labor and creativity are being exploited.
When Nikki, a Star Wars fanfic writer who goes by infinitegalaxies online, typed her name in the search engine, she saw that more than 70 of her fics had been scraped. But one jumped out. It was a collective essay she’d co-authored with 11 other writers to raise awareness about the threat of AI to fandom and uploaded to AO3. The irony did not escape her.
Nikki mostly writes fanfiction about Reylo, the romantic pairing (or “ship”) of the characters Rey and Kylo Ren from the Star Wars sequel trilogy. The Reylo fandom is close-knit and prolific, with more than 30,000 Reylo stories posted to AO3. About half are set in the canon Star Wars universe of lightsabers and space adventures, but the other half take place in alternative universes and explore everything from coffee-shop romances and workplace dramas to medieval knights and fairy kingdoms. One particularly beloved fic in the fandom is set in 1994 and recasts Kylo Ren as Kyril, a mafia boss in newly post-Soviet Russia. The fandom has produced writers like Ali Hazelwood and Thea Guanzon, who have made the leap from fanfic to become highly successful, published romance authors.
For Nikki, the Reylo fandom offered a new sense of belonging. She found a home in the supportive community of writers and readers and relished the freedom to write whatever she wanted.
“Fandom is largely a gift economy. We’re just here to have fun and do things out of the goodness of our heart. And to give things to each other and make work in community,” Nikki says.
This sentiment is echoed by many others in the Reylo community, including Em, who writes under the pen name okapijones. Em fell in love with the characters of Rey and Kylo Ren because they represented the enemies-to-lovers light / dark archetypes that reminded her of Beauty and the Beast and Pride and Prejudice. But she hated the way their story ended in the Star Wars sequel trilogy and went looking for other fans who wanted a different ending.
“Fic changed my life. I have met some of the best friends that I have ever had through fic and through the fanfiction community,” Em says. “There’s no rules, there’s no editors. It’s a pure creative playground, and that is going to breed innovation. Some of the most creative stories I’ve ever read, some of the wildest storytelling, is fanfic. And that excites me as a creator, because you can just do whatever you want.”
“This is something that takes time and effort and your heart and your soul, and you do this in a community,” Nikki says. “And then you’re telling me you’re just going to poop it out two seconds on a screen. And I was just like, who asked for this? This is gross.”
In 2023 came Sudowrite’s Story Engine, powered in part by OpenAI’s ChatGPT. Nikki remembers watching a video about the new “writing assistant” AI software that allows users to enter details about characters and plot points and generate an entire novel. She was so appalled that it made her cry. Nikki, who works for a software company, had already seen her workplace shift toward integrating AI. But she hadn’t imagined her hobby would be impacted by it too.
Later that year, highly specific sexual terms from Omegaverse, a wolf-biology fanfiction trope, began appearing in Sudowrite, revealing that ChatGPT had likely been trained on fanfic without the authors’ knowledge.
Since then, Nikki and many others have been advocating against AI in all its forms in fandom, including using AI to generate fanfic or fanart.
“It’s theft at its core. There’s no ethical use of something that’s built on stolen labor,” Nikki says. Although she’s against genAI in principle because of its reliance on data taken without consent, she also says it breaks with fandom norms of free exchange.
“I did it because I love those characters, because I wanted to play in that sandbox, because I wanted people who also love them to read it. It is a gift,” Em says. “They stole it without my permission.”
But over the last few years, fanfic writers say, there have been numerous examples of genAI entrepreneurs trying to cash in on their work. One is Cliff Weitzman, the CEO of text-to-voice app Speechify, who was found to have scraped thousands of fics from AO3 and uploaded them to WordStream, a website linked to his app, without the authors’ permission. (He swiftly removed them after fans pushed back on social media.) Then there was Lore.fm, a text-to-speech app from Wishroll Inc, which marketed itself on TikTok as “Audible for AO3.” The app was announced in May 2024 but was withdrawn later that month after fan pushback.
“It’s like a whack-a-mole thing. Every time you turn around, there’s, like, another grifter trying to steal your shit,” Nikki says.
It may seem odd to hear such a strong sentiment from a writer who, like most fanfic creators, uses copyrighted intellectual property as a “sandbox” to make up their own stories. But advocates for fanworks say they are “transformative,” meaning a “fanwork creator holds the rights to their own content, just the same as any professional author, artist, or other creator,” according to AO3. This is very different from what an LLM does when, for example, it generates a novel based on prompts. AI can’t replicate the creative human process of “transformation,” which involves inventing and integrating new ideas. LLMs can only reshuffle and regurgitate content that already exists.
And, unlike the AI-generated books flooding Amazon, one of the principles of fanfiction is that writers do not make any profit from their work.
That hasn’t stopped AI from infiltrating fandom in other controversial ways. Some readers, eager to get new updates of their favorite fics, have taken to uploading them into ChatGPT to generate new chapters, much to the consternation of some authors. Some authors have responded by locking their stories, which requires readers to have an AO3 account to access them, or by deleting them from the internet altogether.
In the case of nyuuzyou’s scrape, fans coordinated online to file takedown notices under the Digital Millennium Copyright Act (DMCA), and the Organization for Transformative Works (OTW), the nonprofit that administers AO3, also filed a takedown. On April 9, Hugging Face disabled the dataset. OTW responded to user concerns about fanfics being scraped in a board meeting on April 26, saying, “We have added a CloudFlare tool to prevent AI scraping and other bots. This helps a lot but is not perfect. However, more robust solutions would have a significant negative impact on some of our users, especially those using older devices.”
Nyuuzyou remained unrepentant, filing a counternotice and reuploading the dataset to sites hosted in Russia and China, which are far less responsive to DMCA complaints. Contacted by The Verge via a Telegram account linked on his Hugging Face profile, nyuuzyou said he was an 18-year-old student and IT worker in Russia who is “not interested in fanfiction” and uploaded the dataset for “legitimate research purposes.”
“My goal was to support community research in areas like content moderation, anti-plagiarism tools, recommendation systems, and archival preservation,” nyuuzyou wrote via Telegram. “I think a lot of the disagreement comes from misunderstandings about why these datasets exist. This was never about creating chatbots or large language models for commercial use.”
Founded in 2016 by French entrepreneurs, Hugging Face started out building chatbots for teenagers. Since then, the company has expanded to hosting open-source models with the stated aim of “democratizing AI” by making machine-learning development accessible to the public.
“Our goal is to enable every company in the world to build their own AI,” Jeff Boudier, Hugging Face’s head of product, told Amazon Web Services (AWS) in February. But Hugging Face is deeply connected to large companies. In addition to its ongoing collaboration with AWS, IBM invested $235 million in Hugging Face in 2023 and announced it was collaborating with the company on watsonx, IBM’s generative AI platform.
Nyuuzyou said he was surprised by OTW’s aggressive reaction to the dataset, writing, “I had hoped for dialogue about how research datasets might align with preservation goals.”
“That’s really disingenuous,” says Alex Hanna, director of research at the Distributed AI Research Institute and author of The AI Con: How to Fight Big Tech’s Hype and Create the Future We Want. She’s skeptical of the idea that any dataset uploaded to Hugging Face wouldn’t ultimately be used to train LLMs. “Why would you have a large tranche of unstructured data available on the web if not to train a language model?”
Although individual scrapers like nyuuzyou are small fry in the wider economy of genAI, which is dominated by billion-dollar companies like OpenAI, Hanna says it’s still up to sites like AO3 to aggressively protect their users’ work. As for fanfic writers themselves, she thinks Nikki’s strategy of whack-a-mole is the way to go. “Trying to knock this stuff down, that’s probably the best thing that one can be doing now,” Hanna says.
Nikki and Em, the fanfic writers, had a more heated response to nyuuzyou’s explanation for the scrape.
“Fuck you, dude,” Em says. “We do free labor for the love of the game and are not profiting off of it — other than creating a community, gaining practice for our craft and creating content for characters and stories that we love. And that is being stolen to fuel things that have such larger implications.”
Nikki says she’s determined to keep pushing back against AI’s encroachment into fandom spaces.
“I don’t go looking for a fight,” she says. “But when people come to us with a fight, I will fight.”