Startup World

Video world models, which predict future frames conditioned on actions, hold immense promise for artificial intelligence, enabling agents to plan and reason in dynamic environments.
Recent advancements, particularly with video diffusion models, have shown impressive capabilities in generating realistic future sequences.
However, a significant bottleneck remains: maintaining long-term memory.
Current models struggle to remember events and states from far in the past due to the high computational cost associated with processing extended sequences using traditional attention layers.
This limits their ability to perform complex tasks requiring sustained understanding of a scene.

A new paper, "Long-Context State-Space Video World Models," by researchers from Stanford University, Princeton University, and Adobe Research, proposes an innovative solution to this challenge.
They introduce a novel architecture that leverages State-Space Models (SSMs) to extend temporal memory without sacrificing computational efficiency.

The core problem lies in the quadratic computational complexity of attention mechanisms with respect to sequence length.
As the video context grows, the resources required for attention layers explode, making long-term memory impractical for real-world applications.
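
To make the scaling concrete, here is a rough count of the query-key scores that full causal attention computes compared with the state updates performed by a single linear-time scan. The tokens-per-frame figure is a hypothetical assumption. The counts also ignore constant factors such as head dimension and SSM state size.

```python
def attention_scores(num_frames: int, tokens_per_frame: int) -> int:
    """Query-key scores for full causal attention over every token in the clip."""
    n = num_frames * tokens_per_frame
    # Token i attends to itself and all earlier tokens: 1 + 2 + ... + n.
    return n * (n + 1) // 2


def scan_updates(num_frames: int, tokens_per_frame: int) -> int:
    """Recurrent state updates for one causal SSM scan: one per token."""
    return num_frames * tokens_per_frame


TOKENS_PER_FRAME = 256  # hypothetical, e.g. a 16x16 latent grid per frame

for frames in (16, 64, 256):
    attn = attention_scores(frames, TOKENS_PER_FRAME)
    scan = scan_updates(frames, TOKENS_PER_FRAME)
    print(f"{frames:4d} frames: attention {attn:>14,d}  scan {scan:>9,d}  ratio {attn / scan:>8,.0f}x")
```

Each fourfold increase in frames multiplies the attention count by roughly 16, while the scan count grows only by 4.
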
In practice, this cost limits how many past frames a model can attend to, so beyond a certain point it effectively forgets earlier events, hindering its performance on tasks that demand long-range coherence or reasoning over extended periods.

The authors' key insight is to leverage the inherent strengths of State-Space Models (SSMs) for causal sequence modeling.
Unlike previous attempts that retrofitted SSMs for non-causal vision tasks, this work fully exploits their advantages in processing sequences efficiently.

The proposed Long-Context State-Space Video World Model (LSSVWM) incorporates several crucial design choices:

Block-wise SSM Scanning Scheme: This is central to their design.
Instead of processing the entire video sequence with a single SSM scan, they employ a block-wise scheme.
This strategically trades off some spatial consistency (within a block) for significantly extended temporal memory.
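
The trade-off can be illustrated through scan orders. This sketch assumes one plausible reading of the block-wise scheme; it is not taken from the paper's code. Under this reading, each frame is split into spatial blocks, and a separate causal scan runs through time for each block. The code counts the recurrence steps that separate the same pixel in two consecutive frames. With a fixed per-step decay, a token's influence shrinks roughly like DECAY ** steps. The block size and decay value are hypothetical.

```python
def raster_order(T: int, H: int, W: int) -> list[tuple[int, int, int]]:
    """One scan over the whole video: frame by frame, row by row."""
    return [(t, y, x) for t in range(T) for y in range(H) for x in range(W)]


def blockwise_orders(T: int, H: int, W: int, bh: int, bw: int) -> list[list[tuple[int, int, int]]]:
    """Hypothetical block-wise scheme: one scan per (bh x bw) spatial block,
    visiting that block's tokens frame after frame."""
    scans = []
    for by in range(0, H, bh):
        for bx in range(0, W, bw):
            scans.append([(t, y, x)
                          for t in range(T)
                          for y in range(by, min(by + bh, H))
                          for x in range(bx, min(bx + bw, W))])
    return scans


def steps_to_next_frame(order, t: int, y: int, x: int) -> int:
    """Recurrence steps between pixel (y, x) in frame t and in frame t + 1."""
    position = {token: i for i, token in enumerate(order)}
    return position[(t + 1, y, x)] - position[(t, y, x)]


T, H, W, BH, BW = 4, 16, 16, 4, 4
DECAY = 0.99  # hypothetical fixed per-step decay of the SSM state

full = raster_order(T, H, W)
block = next(s for s in blockwise_orders(T, H, W, BH, BW) if (0, 5, 7) in s)

for name, order in (("single scan", full), ("block-wise", block)):
    steps = steps_to_next_frame(order, 0, 5, 7)
    print(f"{name}: {steps} steps, retained weight ~ {DECAY ** steps:.3f}")
```

In this toy setting, the single-scan gap between frames is H * W = 256 steps, while the block-wise gap is BH * BW = 16 steps. The cost is spatial. Pixels (3, 7) and (4, 7) are vertical neighbors in the same frame, but they fall in different blocks, so this ordering never links them through the SSM.
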
By breaking down the long sequence into manageable blocks, they can maintain a compressed state that carries information across blocks, effectively extending the model's memory horizon.

Dense Local Attention: To compensate for the potential loss of spatial coherence introduced by the block-wise SSM scanning, the model incorporates dense local attention.
This ensures that consecutive frames within and across blocks maintain strong relationships, preserving the fine-grained details and consistency necessary for realistic video generation.
This dual approach of global (SSM) and local (attention) processing allows them to achieve both long-term memory and local fidelity.

The paper also introduces two key training strategies to further improve long-context performance:

Diffusion Forcing: This technique encourages the model to generate frames conditioned on a prefix of the input, effectively forcing it to learn to maintain consistency over longer durations.
When no prefix is sampled and all tokens are kept noised, the training reduces to diffusion forcing, which the authors highlight as the special case of long-context training where the prefix length is zero.
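
The prefix idea can be sketched as a per-frame noise schedule for one training clip. This is a minimal illustration under assumed settings, not the authors' training code. The integer noise levels, the uniform prefix length, and the probability of dropping the prefix are all hypothetical. Clean prefix frames get level 0, and the remaining frames get independent levels, as in diffusion forcing.

```python
import random


def sample_noise_levels(num_frames: int, max_level: int = 1000,
                        p_no_prefix: float = 0.1, rng=random) -> tuple[int, list[int]]:
    """Return (prefix_len, per-frame noise levels) for one training clip.

    Hypothetical schedule: frames [0, prefix_len) stay clean (level 0) as context,
    and every other frame gets its own independent level in [1, max_level].
    With probability p_no_prefix the prefix is dropped and all frames are noised,
    which is the zero-length-prefix (plain diffusion forcing) case.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to split into prefix and target")
    prefix_len = 0 if rng.random() < p_no_prefix else rng.randint(1, num_frames - 1)
    noised = [rng.randint(1, max_level) for _ in range(num_frames - prefix_len)]
    return prefix_len, [0] * prefix_len + noised


rng = random.Random(0)
for _ in range(3):
    print(sample_noise_levels(8, rng=rng))
```

During training, the prefix length is random and sometimes zero, so the model sees both long and empty contexts.
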
Training on short or empty prefixes pushes the model to generate coherent sequences even from minimal initial context.

Frame Local Attention: For faster training and sampling, the authors implemented a frame local attention mechanism.
This utilizes FlexAttention to achieve significant speedups compared to a fully causal mask.
By grouping frames into chunks (e.g., chunks of 5 with a frame window size of 10), frames within a chunk maintain bidirectionality while also attending to frames in the previous chunk.
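
The chunked pattern can be written as a frame-level mask. This sketch assumes one reading of the example in the text: with chunks of 5 frames and a window of 10 frames, each frame sees its own chunk in both directions plus the previous chunk. It is plain Python, not the authors' FlexAttention code.

```python
def frame_local_mask(num_frames: int, chunk_size: int = 5, window_size: int = 10) -> list[list[bool]]:
    """mask[q][k] is True when frame q may attend to frame k.

    Assumed reading of "chunks of 5, window of 10": the window covers the
    frame's own chunk plus (window_size // chunk_size - 1) earlier chunks.
    """
    chunks_visible = window_size // chunk_size
    mask = []
    for q in range(num_frames):
        row = []
        for k in range(num_frames):
            chunk_gap = q // chunk_size - k // chunk_size
            # Gap 0: same chunk, attention in both directions.
            # Positive gap: earlier chunk, allowed only inside the window.
            # Negative gap: later chunk, always masked (causal across chunks).
            row.append(0 <= chunk_gap < chunks_visible)
        mask.append(row)
    return mask


for row in frame_local_mask(15):
    print("".join("#" if allowed else "." for allowed in row))
```

In PyTorch's FlexAttention, a predicate like this one would be expressed as a `mask_mod(b, h, q_idx, kv_idx)` function after mapping token indices to frame indices. That mapping is not shown here, and the resulting speedup depends on details the article does not give.
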
Frame local attention thus preserves an effective receptive field while reducing the computational load.

The researchers evaluated their LSSVWM on challenging datasets, including Memory Maze and Minecraft, which are specifically designed to test long-term memory capabilities through spatial retrieval and reasoning tasks.

The experiments demonstrate that their approach substantially surpasses baselines in preserving long-range memory.
Qualitative results, as shown in supplementary figures (e.g., S1, S2, S3), illustrate that LSSVWM can generate more coherent and accurate sequences over extended periods compared to models relying solely on causal attention or even Mamba2 without frame local attention.
For instance, on reasoning tasks for the maze dataset, their model maintains better consistency and accuracy over long horizons.
Similarly, for retrieval tasks, LSSVWM shows improved ability to recall and utilize information from distant past frames.
Crucially, these improvements are achieved while maintaining practical inference speeds, making the models suitable for interactive applications.

The paper, "Long-Context State-Space Video World Models," is on arXiv.




