Tech Product

Sora

Also known as: Sora

Overview

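A video generation AI model announced by OpenAI. From text instructions (prompts), it can generate high-quality videos up to one minute long that reflect physical laws to some degree. Features such as reflecting a specific user's appearance in generated videos are also being researched, and the model has had a major impact on the video production industry.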

Research Papers

5 papers

Open-Sora: Democratizing Efficient Video Production for All
Zangwei Zheng, Xiangyu Peng, Tianji Yang, Chenhui Shen, Shenggui Li, Hongxin Liu, Yukun Zhou, Tianyi Li, Yang You

2024 · 664 citations · Semantic Scholar

Vision and language are the two foundational senses for humans, and they build up our cognitive ability and intelligence. While significant breakthroughs have been made in AI language ability, artificial visual intelligence, especially the ability to generate and simulate the world we see, is far lagging behind. To facilitate the development and accessibility of artificial visual intelligence, we created Open-Sora, an open-source video generation model designed to produce high-fidelity video content. Open-Sora supports a wide spectrum of visual generation tasks, including text-to-image generation, text-to-video generation, and image-to-video generation. The model leverages advanced deep learning architectures and training/inference techniques to enable flexible video synthesis, which could generate video content of up to 15 seconds, up to 720p resolution, and arbitrary aspect ratios. Specifically, we introduce Spatial-Temporal Diffusion Transformer (STDiT), an efficient diffusion framework for videos that decouples spatial and temporal attention. We also introduce a highly compressive 3D autoencoder to make representations compact and further accelerate training with an ad hoc training strategy. Through this initiative, we aim to foster innovation, creativity, and inclusivity within the community of AI content creation. By embracing the open-source principle, Open-Sora democratizes full access to all the training/inference/data preparation codes as well as model weights. All resources are publicly available at: https://github.com/hpcaitech/Open-Sora.

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, Lifang He, Lichao Sun

2024 · 625 citations · Semantic Scholar

Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and show potential in simulating the physical world. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model's background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models. We first trace Sora's development and investigate the underlying technologies used to build this "world simulator". Then, we describe in detail the applications and potential impact of Sora in multiple industries ranging from film-making and education to marketing. We discuss the main challenges and limitations that need to be addressed to widely deploy Sora, such as ensuring safe and unbiased video generation. Lastly, we discuss the future development of Sora and video generation models in general, and how advancements in the field could enable new ways of human-AI interaction, boosting productivity and creativity of video generation.

Open-Sora Plan: Open-Source Large Video Generation Model
Bin Lin, Yunyang Ge, Xinhua Cheng, Zongjian Li, Bin Zhu, Shaodong Wang, Xianyi He, Yang Ye, Shenghai Yuan, Liuhan Chen, Tanghui Jia, Junwu Zhang, Zhenyu Tang, Yatian Pang, Bin She, Cen Yan, Zhiheng Hu, Xiao-wen Dong, Lin Chen, Zhang Pan, Xing Zhou, Shaoling Dong, Yonghong Tian, Li Yuan

2024 · 271 citations · Semantic Scholar

We introduce Open-Sora Plan, an open-source project that aims to contribute a large generation model for generating desired high-resolution videos with long durations based on various user inputs. Our project comprises multiple components for the entire video generation process, including a Wavelet-Flow Variational Autoencoder, a Joint Image-Video Skiparse Denoiser, and various condition controllers. Moreover, many assistant strategies for efficient training and inference are designed, and a multi-dimensional data curation pipeline is proposed for obtaining desired high-quality data. Benefiting from efficient thoughts, our Open-Sora Plan achieves impressive video generation results in both qualitative and quantitative evaluations. We hope our careful design and practical experience can inspire the video generation research community. All our codes and model weights are publicly available at https://github.com/PKU-YuanGroup/Open-Sora-Plan.

Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
Xiangyu Peng, Zangwei Zheng, Chenhui Shen, Tom Young, Xinying Guo, Binluo Wang, Hang Xu, Hongxin Liu, M. Jiang, Wenjun Li, Yuhui Wang, Anbang Ye, G. Ren, Qianran Ma, Wanying Liang, Xiangru Lian, Xiwen Wu, Yu Zhong, Zhuangyan Li, Chaoyu Gong, Guojun Lei, Lei Cheng, Liming Zhang, Minghao Li, Ruijie Zhang, Silan Hu, Shijie Huang, Xiaokang Wang, Yuanheng Zhao, Yuqi Wang, Ziang Wei, Yang You

2025 · 129 citations · Semantic Scholar

Video generation models have achieved remarkable progress in the past year. The quality of AI video continues to improve, but at the cost of larger model size, increased data quantity, and greater demand for training compute. In this report, we present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. We detail all techniques that contribute to this efficiency breakthrough, including data curation, model architecture, training strategy, and system optimization. According to human evaluation results and VBench scores, Open-Sora 2.0 is comparable to global leading video generation models including the open-source HunyuanVideo and the closed-source Runway Gen-3 Alpha. By making Open-Sora 2.0 fully open-source, we aim to democratize access to advanced video generation technology, fostering broader innovation and creativity in content creation. All resources are publicly available at: https://github.com/hpcaitech/Open-Sora.

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu, Xiaofeng Wang, Wangbo Zhao, Chen Min, Nianchen Deng, Min Dou, Yuqi Wang, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang

2024 · 107 citations · Semantic Scholar

General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems. Recently, the emergence of the Sora model has attained significant attention due to its remarkable simulation capabilities, which exhibits an incipient comprehension of physical laws. In this survey, we embark on a comprehensive exploration of the latest advancements in world models. Our analysis navigates through the forefront of generative methodologies in video generation, where world models stand as pivotal constructs facilitating the synthesis of highly realistic visual content. Additionally, we scrutinize the burgeoning field of autonomous-driving world models, meticulously delineating their indispensable role in reshaping transportation and urban mobility. Furthermore, we delve into the intricacies inherent in world models deployed within autonomous agents, shedding light on their profound significance in enabling intelligent interactions within dynamic environmental contexts. At last, we examine challenges and limitations of world models, and discuss their potential future directions. We hope this survey can serve as a foundational reference for the research community and inspire continued innovation. This survey will be regularly updated at: https://github.com/GigaAI-research/General-World-Models-Survey.

Mentioned Articles

20 articles

External Mentions

10 mentions

Hacker News: Disney Exits OpenAI Deal After AI Giant Shutters Sora
▲ 206 · timpera · March 24, 2026
Hacker News: Goodbye to Sora
▲ 1142 · mikeocool · March 24, 2026
Hacker News: Disney making $1B investment in OpenAI, will allow characters on Sora AI
▲ 327 · tiahura · December 11, 2025
Hacker News: The Walt Disney Company and OpenAI Partner on Sora
▲ 269 · inesranzo · December 11, 2025
Hacker News: Is Sora the beginning of the end for OpenAI?
▲ 182 · warrenm · October 21, 2025
Hacker News: Sora 2
▲ 271 · meetpateltech · September 30, 2025
Hacker News: Sora 2
▲ 905 · skilled · September 30, 2025
Hacker News: Sora is here
▲ 1152 · toomuchtodo · December 9, 2024
Hacker News: OpenAI hits pause on video model Sora after artists leak access in protest
▲ 138 · thm · November 27, 2024
Hacker News: Sora: Creating video from text
▲ 3647 · davidbarker · February 15, 2024

Mentioned Articles

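The Price of 15 Billion Tokens per Minute: The Reality of Compute Rationing Across the AI Industry

OpenAI Acquires TBPN, the Tech Industry's "Information Hub": What the Shift in Media Strategy Raises

World Labs, Led by Fei-Fei Li, Raises $1 Billion to Realize Spatial Intelligence: Will AI Move Beyond "Words" to Understand the "World"?

DeepMind Announces "D4RT," an AI Model Aimed at Giving Robots Human-Like Spatial Awareness: A "Query-Based" Vision Revolution That Speeds Up 4D Perception 300-Fold

YouTube's 2026 "AI Avatar" Strategy: The Arrival of an Era in Which Creators' "Likenesses" Become Software, and the Looming Showdown with "AI Slop"

"The Ultimate Risk" or Human Flourishing? Anthropic's Chief Scientist Warns of the "Biggest Decision" of 2030 and the Future of AI Self-Improvement

OpenAI's "Codex" Enters "Recursive Development," Self-Improving with GPT-5: The Impact and True Value of the AI Agent That Built the Sora App in 18 Days

OpenAI GPT-5.2 Launches: The True Value of the "Autonomous Agent" That Answers Google Gemini 3, and the Paradigm Shift It Brings to Industry

Disney Makes a Massive $1 Billion Investment in OpenAI: The Full Picture of the "Historic Partnership" That Lets Sora Generate Mickey and Star Wars

ElevenLabs Launches a Marketplace for Using the "Voices" of Hollywood Actors and Other Celebrities as AI Voices

OpenAI's Video Generation AI "Sora" Finally Arrives on Android

How Does AI Perceive the "World"? Elon Musk's xAI Takes On Its "World Model" Ambitions and the Future of the Game Industry

OpenAI Announces the "Sora 2" Model and the New Social App "Sora": Understands Physical Laws, Generates Audio, and Enables "Cameo Appearances" in Videos

The Shock of "Seedance 1.0," the Video Generation AI from TikTok's ByteDance, Surpassing Sora and Veo

Google Fully Rolls Out High-Quality AI Video Generation with Veo 2 in Gemini Advanced: Easily Create 8-Second Videos from Text

OpenAI Overhauls ChatGPT's Image Generation: GPT-4o Delivers High-Precision Text Rendering

Adobe Firefly Releases Text-to-Video AI Feature in Beta, but Challenges Abound

OpenAI Co-Founder Schulman Leaves Anthropic to Join the AI Startup of OpenAI's Former CTO

OpenAI Announces "12 Days of OpenAI," a 12-Day Product Launch Event: Sora and Other Notable New Features to Be Released Daily

Google Begins Offering Its Video Generation AI "Veo" on Vertex AI: Opening the Door to AI Video Production for More Companies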
