After Sora, Apple's video generation model STIV: 8.7 billion parameters unify the T2V and TI2V tasks

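The Apple MM1 team has released new work: this time a large video generation model, accompanied by a comprehensive report on model architecture, training, and data. The model has 8.7 billion parameters, supports multimodal conditioning, and outscores PIKA, KLING, and GEN-3 on VBench.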

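One day after OpenAI released Sora, Apple formally presented its own multimodal large-model research in a multi-author paper, "STIV: Scalable Text and Image Conditioned Video Generation": a video generation model with up to 8.7B parameters that supports both text and image conditions.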

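Video generation has made remarkable progress in recent years, especially with the release of Sora, a video generation model based on the Diffusion Transformer (DiT) architecture. Researchers have explored extensively how to integrate text and other conditions into DiT: PixArt-Alpha uses cross-attention, for example, while SD3 concatenates text with the noised patches and applies self-attention through MMDiT blocks. Even so, purely text-driven video generation (T2V) still struggles to produce coherent, realistic videos. The text-image-to-video (TI2V) task was proposed in response: adding an initial image frame as a reference gives generation a more constrained foundation.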

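The main challenge today is how to incorporate the image condition into the DiT architecture efficiently, while model stability and large-scale training efficiency also still call for innovation. To address these problems, we present a comprehensive and transparent white paper that covers model architecture, training strategy, data, and downstream applications, and that unifies the T2V and TI2V tasks.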

In response to these problems, the contributions and highlights of the work center on the following:

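The research not only improves video generation quality but also lays a solid foundation for applying video generation models to a wide range of future scenarios.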

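The recipe for building STIV

Base model architecture

STIV is based on the PixArt-Alpha architecture. A frozen variational autoencoder (VAE) converts the input frames into spatiotemporal latents, which are then processed by learnable DiT blocks. Text input is processed by the T5 tokenizer and an internally trained CLIP text encoder. The study also makes the following optimizations to the architecture: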

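Model scaling and training optimization

Methods for incorporating the image condition

Simple frame replacement

During training, we replace the noised latent of the first frame with the noise-free latent of the image condition, pass the latents to the STIV model, and mask out the loss on the replaced frame. During inference, we use the noise-free latent of the original image condition as the first-frame latent at every diffusion step.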
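The frame replacement step and the masked loss can be sketched in a few lines. This is a minimal illustration in plain Python, not the paper's implementation: latents are modeled as lists of floats, and the names `replace_frames`, `masked_mse`, and `cond_idx` are hypothetical.

```python
from typing import List, Sequence

Frame = List[float]  # one frame's latent, flattened to a list of floats (assumption)


def replace_frames(noisy: Sequence[Frame], clean: Sequence[Frame],
                   cond_idx: Sequence[int]) -> List[Frame]:
    """Return the noisy latents with the conditioned frames swapped for clean ones."""
    cond = set(cond_idx)
    return [list(clean[i]) if i in cond else list(noisy[i])
            for i in range(len(noisy))]


def masked_mse(pred: Sequence[Frame], target: Sequence[Frame],
               cond_idx: Sequence[int]) -> float:
    """Mean squared error over every frame except the replaced ones."""
    cond = set(cond_idx)
    total, count = 0.0, 0
    for i, (p, t) in enumerate(zip(pred, target)):
        if i in cond:
            continue  # replaced frames contribute no loss
        total += sum((a - b) ** 2 for a, b in zip(p, t))
        count += len(p)
    return total / count if count else 0.0


# Toy example: 3 frames with 2 latent values each; frame 0 is the image condition.
clean = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
noisy = [[0.5, 1.5], [2.5, 1.5], [3.5, 2.5]]
model_input = replace_frames(noisy, clean, cond_idx=[0])

# Stand-in values: in real training, pred would be the model output and
# target the regression target, not the raw latents used here.
loss = masked_mse(pred=noisy, target=clean, cond_idx=[0])
```

At inference, the same `replace_frames` call would be applied before every diffusion step so that the first frame always carries the clean image latent.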

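The frame replacement strategy makes STIV flexible enough to extend to many applications. For example, when c_I (the image condition) = ∅, the model defaults to text-to-video (T2V) generation. When c_I is the initial frame, the model performs typical text-image-to-video (TI2V) generation. If several frames are provided as c_I, the model can be used for video prediction even without c_T (the text condition). If the first and last frames are provided as c_I, the model can learn frame interpolation and generate the frames in between. Combining T2V with frame interpolation further enables long video generation: T2V generates the keyframes, and frame interpolation fills in the frames between each pair of consecutive keyframes. Finally, by randomly selecting the appropriate conditioning strategy during training, one can train a single unified model that performs all of these tasks.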
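One way to picture this flexibility is as a mapping from task to the set of frame indices supplied as c_I. The sketch below is illustrative only: the task names, the uniform task sampling, and the default of four context frames for prediction are assumptions rather than details stated here.

```python
import random
from typing import List

TASKS = ("t2v", "ti2v", "prediction", "interpolation")  # hypothetical task names


def condition_indices(task: str, num_frames: int, num_context: int = 4) -> List[int]:
    """Frame indices that are provided as the image condition c_I for a task."""
    if task == "t2v":            # c_I is empty: text-only generation
        return []
    if task == "ti2v":           # the first frame is given
        return [0]
    if task == "prediction":     # several leading frames are given
        return list(range(min(num_context, num_frames)))
    if task == "interpolation":  # the first and last frames are given
        return sorted({0, num_frames - 1})
    raise ValueError(f"unknown task: {task}")


def sample_task(rng: random.Random) -> str:
    """Pick a conditioning strategy at random (uniform sampling is an assumption)."""
    return rng.choice(TASKS)


rng = random.Random(0)
task = sample_task(rng)
indices = condition_indices(task, num_frames=8)
```

The returned indices would then select which frames are replaced with clean latents and excluded from the loss.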

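Random dropout of the image condition

As noted above, the frame replacement strategy offers great flexibility for training different types of models. Here we show one specific use: training a model to perform both text-to-video (T2V) and text-image-to-video (TI2V) tasks. During training, we randomly drop the image condition c_I and the text condition c_T, similar to the way T2V models randomly drop only the text condition.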
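A minimal sketch of condition dropout follows. The drop probabilities and the choice to drop the two conditions independently are assumptions made for illustration; the text above does not specify either, and `drop_conditions` is a hypothetical helper.

```python
import random
from typing import List, Optional, Tuple


def drop_conditions(text: Optional[str], image_idx: List[int], rng: random.Random,
                    p_text: float = 0.1, p_image: float = 0.1
                    ) -> Tuple[Optional[str], List[int]]:
    """Randomly replace each condition with its null value for one training sample."""
    if rng.random() < p_text:
        text = None        # null text condition
    if rng.random() < p_image:
        image_idx = []     # empty c_I: the sample becomes a T2V sample
    return text, image_idx


rng = random.Random(0)
always_dropped = drop_conditions("a cat on a sofa", [0], rng, p_text=1.0, p_image=1.0)
never_dropped = drop_conditions("a cat on a sofa", [0], rng, p_text=0.0, p_image=0.0)
```

When the image condition is dropped, no frame is replaced, so the same model sees T2V and TI2V samples within one training run.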

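Joint image-text classifier-free guidance (JIT-CFG)

Classifier-free guidance (CFG) performs well in text-to-image generation and can significantly improve generation quality by steering probability mass toward high-likelihood regions. Building on this, we propose joint image-text classifier-free guidance (JIT-CFG), which uses both the text and image conditions for guidance. Its velocity estimate is given by: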

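where s is the guidance scale. When c_I = ∅, the method reduces to standard classifier-free guidance for T2V. Although one could introduce two separate guidance scales, as described in InstructPix2Pix, to balance the strength of the image and text conditions, we find that the two-pass inference approach already achieves excellent results. Using two guidance scales would also add one more forward pass and so raise inference cost.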
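The guidance step can be sketched as follows. The formula itself does not appear in this text, so the sketch assumes the usual single-scale form consistent with the description: guided = uncond + s * (cond - uncond), where cond uses both c_T and c_I and uncond uses neither. The model signature and `toy_model` are hypothetical stand-ins.

```python
from typing import Callable, List, Optional, Sequence

Velocity = List[float]
# Hypothetical model signature: (x_t, text, image_idx, t) -> predicted velocity
Model = Callable[[Sequence[float], Optional[str], Sequence[int], float], Velocity]


def jit_cfg(model: Model, x_t: Sequence[float], text: Optional[str],
            image_idx: Sequence[int], t: float, s: float) -> Velocity:
    """Joint image-text guidance using two forward passes and one scale s."""
    cond = model(x_t, text, image_idx, t)   # pass 1: both conditions present
    uncond = model(x_t, None, [], t)        # pass 2: both conditions dropped
    return [u + s * (c - u) for c, u in zip(cond, uncond)]


def toy_model(x_t: Sequence[float], text: Optional[str],
              image_idx: Sequence[int], t: float) -> Velocity:
    """Stand-in for the network: shifts the input by a condition-dependent bias."""
    bias = (1.0 if text is not None else 0.0) + (0.5 if image_idx else 0.0)
    return [v + bias for v in x_t]


guided = jit_cfg(toy_model, [0.0, 1.0], "a cat", [0], t=0.5, s=2.0)
plain = jit_cfg(toy_model, [0.0, 1.0], "a cat", [0], t=0.5, s=1.0)
```

With s = 1 the result equals the conditional prediction, and with an empty `image_idx` the same function performs ordinary text-only guidance. A two-scale variant would need a third pass with only one condition present, which is the extra cost mentioned above.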

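Experiments show that image-condition dropout combined with JIT-CFG not only enables multi-task training naturally but also effectively resolves the "static" problem in training high-resolution video generation models. We hypothesize that randomly dropping the image condition prevents the model from relying too heavily on it, so that the model better captures the motion information in the video training data.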

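Progressive training strategy

Data

Video preprocessing and feature extraction details

Video captioning and categorization details

We mainly explored two directions: (1) sample a small number of frames, caption them with an image captioner, and then use a large language model (LLM) to summarize the resulting captions; (2) generate captions directly with a video-specific LLM.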

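After initial trials with the first approach, we found two main limitations. First, an image captioner captures only the visual details of a single frame, so the descriptions lack video motion. Second, the LLM may hallucinate when producing a dense description from multi-frame captions.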

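Recent work uses GPT-family models to create fine-tuning datasets and train video LLMs. To balance quality and cost in large-scale captioning, we chose an efficient video captioner. We then used an LLM to categorize the generated captions and compiled the category distribution of the videos.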

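DSG-Video: hallucination detection evaluation

To compare captioning techniques, we developed an evaluation module that assesses the richness and accuracy of captions.

We quantify caption richness by measuring the diversity of unique objects mentioned in the captions, and we assess accuracy by detecting hallucinated objects.

Inspired by text-to-image evaluation methods, we propose DSG-Video to verify whether the objects mentioned in a caption actually appear in the video.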

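1. First, we use an LLM to automatically generate questions that probe the key details of a caption, such as object identity, actions, and context.

For example, given a caption that mentions "a cat sitting on a sofa", the LLM generates questions such as "Is there a cat in the video?" and "Is the cat on the sofa?"

2. Next, we use a multimodal LLM to answer these object verification questions by checking whether each referenced object is present in multiple uniformly sampled frames of the video.

For each generated question (for example, "Is there a cat in this frame?"), the multimodal LLM examines each sampled frame and gives a response. If the responses for every frame indicate that the object is absent, we classify it as a hallucinated object.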
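The decision rule in step 2 can be sketched as below. The per-frame answers are assumed to have been collected already from a multimodal LLM; the function name `hallucinated_questions` and the example data are hypothetical.

```python
from typing import Dict, List


def hallucinated_questions(answers: Dict[str, List[bool]]) -> List[str]:
    """Return the questions whose object was found in none of the sampled frames.

    answers maps each verification question to one boolean per sampled frame
    (True means the multimodal LLM reported the object as present).
    A question with no sampled frames is also flagged, since nothing confirms it.
    """
    return [question for question, per_frame in answers.items()
            if not any(per_frame)]


# Made-up answers for three uniformly sampled frames.
example_answers = {
    "Is there a cat in this frame?": [True, False, True],
    "Is there a dog in this frame?": [False, False, False],
}
flagged = hallucinated_questions(example_answers)
```

Here only the dog question is flagged: the cat is missing from one frame but present in the others, so it does not count as hallucinated.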

This approach verifies each object in the video frame by frame. On this basis, we define two evaluation metrics:

Results

Building on the studies above, we scale the T2V and STIV models from 600M to 8.7B parameters.

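T2V performance

The table compares different T2V models on VBench, reporting VBench-Quality, VBench-Semantic, and VBench-Total scores. The analysis shows that scaling up T2V model parameters improves semantic understanding. Specifically, as the model grows from XL to XXL and then to M (the three model scales), the VBench-Semantic score rises from 72.5 to 72.7 and finally to 74.8, which indicates that larger models capture semantic information better. The effect on video quality is more limited: VBench-Quality rises only from 80.7 to 82.1. This finding suggests that parameter scaling benefits semantic ability more than video quality. In addition, raising the spatial resolution from 256 to 512 improves the VBench-Semantic score markedly, from 74.8 to 77.0.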

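Impact of SFT

TI2V performance

Applications

Video prediction

Starting from the STIV-XXL model, we train a text-video-to-video model (STIV-V2V) conditioned on the first four frames. Experiments show that on the MSRVTT test set and MovieGenBench, the video-to-video model achieves significantly lower FVD scores than the text-to-video model. This indicates that the video-to-video model excels at generating high-fidelity, consistent video frames, which makes it especially suitable for fields that require high-quality generation, such as autonomous driving and embodied AI.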

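Frame interpolation

Multi-view generation

Multi-view generation aims to create novel views from a given input image. The task demands strong view consistency and depends on a well-pretrained video generation model. By adapting a video generation model to multi-view generation, we can verify whether pretraining has effectively captured 3D information, which in turn improves generation quality.

We use the definitions of certain novel-view cameras and, taking the given image as the initial frame, predict the subsequent novel-view frames. By training a TI2V model and adjusting the resolution and the number of training steps, we achieve performance comparable to existing methods, which also validates the effectiveness of our spatiotemporal attention mechanism in maintaining 3D consistency.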

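Long video generation

For the ablation studies on model architecture, image-condition fusion methods, and training strategies, as well as other research details, please refer to the original paper.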
