Today the Generative AI Copyright Disclosure Act was introduced by @RepAdamSchiff, and it’s a great step towards fairer data practices in gen AI.
- AI companies will have to disclose to the copyright office a full list of copyrighted works used to train their models
- Disclosure required 30 days before model release
- Disclosure required every time the training data changes significantly
- Also applies to previously released models
Companies hiding training data sources is the main reason you don’t see even more copyright lawsuits against gen AI companies. Requiring data transparency from gen AI companies will level the playing field for creators and rights holders who want to use copyright law to defend themselves against exploitation.
billboard.com/business/legal…
More info from the bill's full text:
- What's required to be disclosed is "a sufficiently detailed summary of any copyrighted works used"
- There will be a public database of these disclosures
- There are fines for failure to comply
The public database is particularly important: it means anyone should be able to see if their copyrighted work has been used by a generative AI model.
schiff.house.gov/imo/media/doc/…
@ednewtonrex @RepAdamSchiff What stops model-training from moving offshore, outside US jurisdiction?
@ednewtonrex @RepAdamSchiff AI companies should then just shift to Japan where they can train on copyrighted works as much as they want. Copyright laws need to change in this day and age.
@ednewtonrex @RepAdamSchiff This looks like it still serves the AI companies, and passes the onus onto individuals to sue over their own works being used instead of putting a full stop to the activity at large.
@ednewtonrex @RepAdamSchiff Tracing the provenance of arbitrary text on the Internet is next to impossible, so this would be a de facto ban on training frontier models
@ednewtonrex @RepAdamSchiff I think many companies would go for this "or pay some money" option...
While I appreciate the actions being taken, this concerns me greatly because it further props up centralized institutions by creating very laborious barriers for small players. imo, it's not a scalable solution, and it will likely result in those who are connected being able to skip the line to the copyright office, while those who are not connected are stuck waiting for paperwork to process. I am very curious about your thoughts on this post I made in reference to a discussion you were involved in regarding gen AI copyright, if you have the time to take a look. x.com/hettzzz/status…
@ednewtonrex @RepAdamSchiff The list will be like... [image: GLAIGlFbgAA7YKx.jpg]