OpenAI's Sora Training Data Sparks Debate

In a recent interview with the Wall Street Journal’s Joanna Stern, OpenAI CTO Mira Murati discussed the Sora text-to-video model, whose potential has stirred considerable excitement.

OpenAI CTO Mira Murati says Sora was trained on publicly available and licensed data pic.twitter.com/rf7pZ0ZX00
— Tsarathustra (@tsarnick) March 13, 2024

However, questions about the data used to train Sora, such as whether it included YouTube, Facebook, or Instagram content, remained unanswered.

Murati cited “publicly available and licensed data” and highlighted a partnership with Shutterstock, but she remained vague about social media sources.

Mira Murati Interview in 4 Stages. pic.twitter.com/avB4EXQCPk
— AshutoshShrivastava (@ai_for_success) March 14, 2024

Her answers sparked a broader discussion about transparency in AI training data, especially given ongoing copyright disputes over AI.

The focus is not only on legality but also on public awareness and trust. With tech giants like Google and Meta using publicly shared content to train AI, the debate extends to the ethics of data use and to how much the public knows about it.

The evolving conversation around AI training data, privacy, and the future of creative work highlights a critical balance between innovation and individual rights.

As AI technologies advance, transparency and public understanding become increasingly important, and the dialogue about where technology, law, and ethics intersect is set to continue.
