🚨The clip below could put OpenAI in trouble. Here's why:

If OpenAI's Sora is classified as a high-risk AI system under the EU AI Act, OpenAI will have to comply with transparency obligations such as informing users about the "training, validation and testing data sets used, taking into account the intended purpose of the AI system" (Art. 13.3).

In the clip below, Mira Murati, OpenAI's CTO, cannot specify or give examples of the sources of data used to train Sora. If she were from the marketing department, this would be okay, but as CTO she should be able to answer this in a straightforward way: training data is a core aspect of the technology and can lead to legal liability.

In the US, FTC Chair Lina Khan recently said that sensitive personal data (linked to health, geolocation, and web browsing history) should be excluded from training datasets. If we do not know the sources of data, it becomes murky whether sensitive data was included and what measures were taken to exclude it.

The FTC is currently investigating OpenAI and has made multiple inquiries regarding the training dataset. From my perspective, the FTC will soon require detailed and public information about the training dataset, and Murati's vague statement may be seen as unacceptable.

To learn more and receive my analyses, subscribe to my newsletter (link in bio). Clip posted by @tsarnick and extracted from @WSJ's interview (link below).
Link to the interview: youtube.com/watch?v=mAUpxN…
@LuizaJarovsky The source of data used for "learning" should not matter as long as it was publicly accessible and its use didn't violate the provider's policy (i.e., by copying it to OpenAI's storage). Just as humans are free to see things and learn to create new ones, so should AI training be. No?
@LuizaJarovsky Yes, but one comment. You write: “In the clip below, Mira Murati, OpenAI's CTO, cannot specify or exemplify the sources of data used to train Sora.” I don’t think that she “can’t,” but that she won’t. She was coached to stay away from discussion of training data. She complied.
@LuizaJarovsky They found it easier to steal and chose that path. It's a tell-tale stench.