Gergely Orosz’s Post

This is either brilliant or scary: Anthropic accidentally leaked the TypeScript source code of Claude Code (which is closed source). Repos sharing the source are being taken down with DMCA requests, as they share copyrighted code they are not allowed to share. This is all very standard up to this point.

BUT here is where things take a twist: this one repo rewrote the code in Python, and so it violates no copyright & cannot be taken down with DMCA requests!

The brilliance: copyright does not protect certain kinds of derived works. Rewriting TypeScript code in Python means copyright probably doesn't apply...?

The scary thing: it can be done in a trivial amount of time, with AI agents. This one was done with Codex. This can be done not just for this specific codebase, but for any codebase. So what happens with copyright? Will it evolve with AI, or be stuck pre-AI?

If any company is a good one to test what happens, it is Anthropic. AI labs greatly benefit from derived works not being considered copyrightable: this is one of the reasons they can freely train on copyrighted work, after all. This repo is doing the same: taking copyrighted code by Anthropic (that Anthropic publicly shared!) and deriving from it, transforming it into another common programming language.

You can imagine Anthropic being in a pickle right now:

1. Do they just leave this and look the other way? That means ignoring that it's not exactly fair to transform their code and leave it up there, so as not to poke the bear.
2. Or do they risk it and poke the bear: claim that copyright applies because the work is very clearly derived and mimics the original... but this could be bad for their own business in much bigger ways! E.g. imagine regulation coming into play that bans this (transforming code from one language to another). Claude Code and other tools would have to refuse this kind of generation and become a lot less useful. Lawsuits against AI labs like Anthropic could spike.

The losses from #2 could eclipse the downside of having this repo stay up. And you can bet it would be a high-profile case: an AI lab arguing that copyright needs to be updated and expanded thanks to AI agents! So my bet is that #1 happens. It is not in the interest of an AI lab to expand copyright protections to derived work created by an LLM...

The repo: https://lnkd.in/eGzp_DSZ

And here are details about how Anthropic builds Claude Code, from a deep dive in The Pragmatic Engineer: https://lnkd.in/eBvzDmRK (they move very fast, with 60-100 internal releases per day, so the accidentally leaked code is unlikely to stay relevant for long, as Rodrigo Pimentel also pointed out in the comments [thank you!])

Are you sure about this? As far as I know, copyright absolutely does protect derived works. That's literally what the word "derivative" means in copyright law.

It's questionable whether using an LLM in this way can be considered a clean-room rewrite, especially since in most cases the LLMs have been exposed to the original codebase (though, I assume, not in this case, since it's proprietary). Regardless, it's probably not a line Anthropic would be interested in pushing in court, considering their business model.

They could also do #3 - open source the tool and say it was something they already considered :)

The code will tell people little:
1. The code of these systems is not large. What is large is the data space: the matrices (tensors).
2. The code implements very sophisticated mathematics that generates probability distributions. If you don't understand the reasoning behind it, you won't know why the code is there.

Yan Khonski

Problem Solver. Backend, API, reliability, scalability, databases.

18h

That's wonderful. The demand for AI-powered engineers who know how the tools work internally and have solid fundamentals and a strong engineering background will increase. Companies like Amazon and Oracle will continue with layoffs... Their downtime and outages will become more frequent. Microsoft will continue with the enshittification of acquired products. This creates opportunities for new companies and startups. I'm now reading news that some well-funded startups pay at FAANG level. So Amazon and Oracle will become history, as IBM did, and new companies will emerge.

And I copied it to a Russian-based GitHub alternative, so it's virtually impossible to remove the code via any conventional methods like a DMCA takedown. Russian federal law says you must file a court case personally in the city where the law was allegedly broken, and I don't see how any representative of the Pentagon can personally charge me here. Anthropic, I think, never will — and they're also welcome to fly to Saint Petersburg to meet me in person, great props to them. But it's not Anthropic who are concerned about supply chain attacks, if you know what I mean xD https://gitverse.ru/anarchic/claude-code

In my opinion, it is legal. A US court ruled that Anthropic did not commit copyright infringement when it trained its models on pirated ebooks, as the models do not reproduce the books themselves. Returning knowledge from a book is not copyright infringement if you do not return the whole book as it was fed into training. In this case, the TypeScript source code was the input, but the output is not that TypeScript code as it was fed in; it is Python code. As the US has a precedent-based legal system, it must be OK.

I think the copyright question here is that the "work", if generated by an AI agent, cannot be assigned copyright (this is the current position of the US Copyright Office). This means that if the work cannot be copyrighted, a license cannot be applied to it. Does that mean it's a free-for-all? We have no case law on this to my knowledge (IANAL, so make sure you check this with your friendly IP specialist).

Martin Damovsky

Staff Cloud Platform Engineer | AWS Geek, Terraform, Kubernetes | AI 🤖 | AWS Hero

19h

Python? Why python? Anyone willing to rewrite it to Rust?

If you live by the sword, you die by the sword!
