A promising approach for generating 3D scenes from text prompts, using:
- Gaussian Splatting
- initialized by text-to-image generators
- projected into 3D
- optimization of this representation across multiple views as a 3D inpainting task with image-conditional diffusion models
An open-source release is planned for June.
Abstract
We introduce RealmDreamer, a technique for generation of general forward-facing 3D scenes from text descriptions. Our technique optimizes a 3D Gaussian Splatting representation to match complex text prompts. We initialize these splats by utilizing the state-of-the-art text-to-image generators, lifting their samples into 3D, and computing the occlusion volume. We then optimize this representation across multiple views as a 3D inpainting task with image-conditional diffusion models. To learn correct geometric structure, we incorporate a depth diffusion model by conditioning on the samples from the inpainting model, giving rich geometric structure. Finally, we finetune the model using sharpened samples from image generators. Notably, our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles, consisting of multiple objects. Its generality additionally allows 3D synthesis from a single image.
Jaidev Shriram, Alex Trevithick, Lingjie Liu, Ravi Ramamoorthi: RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion
📖 arXiv: https://lnkd.in/eJ2bKTCz
License: CC BY 4.0
The movie below is from the project page, with kind permission of the authors. https://lnkd.in/eHsd4R_3
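The "lifting their samples into 3D" step in the abstract can be pictured as back-projecting the pixels of a generated image with a depth estimate. The sketch below is not the authors' implementation: it assumes a simple pinhole camera, and the function name `unproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for illustration. Each returned point could serve as an initial center for a Gaussian splat.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def unproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project a depth map into camera-space points (pinhole model).

    depth[v][u] is the distance along the camera z-axis for pixel (u, v).
    fx, fy are focal lengths in pixels; cx, cy is the principal point.
    These intrinsics are illustrative assumptions, not values from the paper.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue  # skip pixels without a valid depth estimate
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map; one pixel has no valid depth and is dropped.
toy_depth = [[2.0, 2.0], [0.0, 4.0]]
points = unproject_depth(toy_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

In a real pipeline the depth map would come from a monocular depth estimator, and regions hidden from the reference view would still need to be filled in, which is where the inpainting step described in the abstract comes in.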
3D Scene Creation from Text Descriptions
Summary
3D scene creation from text descriptions is an innovative technology that lets you generate interactive three-dimensional environments simply by describing them in words. This approach allows anyone, from engineers to artists, to build and explore 3D worlds without traditional modeling skills, making spatial computing and digital twins more accessible.
- Explore new workflows: Experiment with tools that turn text prompts, single images, or videos into usable 3D scenes, streamlining the process for robotics, simulation, and content creation.
- Iterate quickly: Use text-driven platforms to test, refine, and visualize ideas faster, allowing you to move from concept to walkthrough-ready spaces in real time.
- Unlock creativity: Try stylization features that let you independently adjust the shape and appearance of 3D worlds, opening up endless possibilities for immersive experiences.
-
This week's defining shift for me is that creating 3D data is getting much simpler. New tools are turning everyday inputs like smartphone video, single photos, and text prompts into usable 3D environments and assets. This lowers the barrier to building the scenes, objects, and spaces that robotics, simulation, and immersive content rely on. It also shifts 3D creation from a specialized skill to something any team can do quickly and at the scale modern spatial systems require.
This week's news surfaced signals like these:
🤖 Parallax Worlds raised $4.9 million to turn standard video into digital twins for robotics testing. The platform turns basic walkthrough videos into interactive 3D spaces that teams can use to run their robot software and see how it performs before sending anything into the field.
🪑 Meta introduced SAM 3D to reconstruct objects and people from single images, producing fully textured meshes even when subjects are partly hidden or shot from difficult angles. The models were trained using real-world data and a staged process to improve accuracy.
🌏 Meta unveiled WorldGen, a research tool that generates full 3D worlds from text prompts. It produces complete, navigable spaces that can be used in Unity or Unreal and shows how AI can create environments without manual modeling.
Why this matters: Faster 3D pipelines expand who can build, test, and refine spatial ideas. They turn 3D creation from a bottleneck into a regular part of development, which opens the door to more experimentation and better decisions earlier in the process.
#robotics #digitaltwins #simulation #VR #AR #virtualreality #spatialcomputing #physicalAI #AI #3D
-
Engineering is becoming compilable.
For months, Vlad and I have been talking about this idea: Engineering intent → compiled into engineering artifacts.
Recently, my colleague Pascalis, a robotics and NVIDIA Omniverse expert, approached me with a simple but powerful idea: Connect Anthropic's Claude Code to NVIDIA Omniverse. The question was straightforward: Can agentic coding tools interact with Omniverse and place 3D objects directly into a factory layout? The short answer: Yes, of course.
In our setup, a textual description of a production environment and the objects to be placed is enough to generate USDA files. USDA is the text encoding of OpenUSD, the native scene description format used by NVIDIA Omniverse to describe 3D scenes, assets, transformations, materials, hierarchies and layouts.
The workflow looks like this:
- An engineer describes what they want to visualize.
- Claude (Opus 4.6) compiles this description into machine-readable USDA files.
- Omniverse loads the scene and the factory layout appears.
What sounds simple is actually a big shift. In the future, engineers will no longer interact directly with engineering tools. The clicks inside CAD, simulation or layout tools will disappear. AI becomes the interface between humans and engineering software, translating engineering intent into machine-readable engineering artifacts.
In the example, the system retrieves assets from an Amazon Web Services (AWS) bucket. The LLM does not have access to asset metadata, only the asset names. Yet it can still place and reference the correct objects in the scene.
→ Engineering becomes compilable.
Next steps we are working on:
- Providing metadata for assets (dimensions, orientation, connection points)
- Using custom Omniverse extensions so the LLM can interact via the Python API
- Refactoring large USDA files into multiple smaller files so LLMs can modify them more easily
What we are seeing here is not just a cool Omniverse demo. It is a preview of how factory planning, robotics, CAD, simulation, and system engineering will work in the future: Describe what you want. The system builds it. The engineer reviews and iterates. The role of the engineer shifts from tool operator to system architect. And that will change engineering more than any new CAD feature ever could.
Curious to hear your thoughts! When engineering becomes compilable, what happens to the tools we use today?
Additional observation: the performance of the system depends heavily on a skills.md file that teaches the LLM how USDA and Omniverse work, essentially giving the model domain-specific engineering knowledge and file structure rules.
Nitin Ugale | Timmo Sturm | Sebastian Angerer | Sebastian Linzmair | Florian Böhme
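To make the "description compiled into USDA" idea concrete, here is a minimal sketch of what the generated artifact might look like. It is not the setup from the post: the function `make_usda`, the layout dictionary keys, the asset paths, and the prim names are all hypothetical, and the USDA text is written by plain string formatting rather than through the OpenUSD Python API. It only shows the general shape of a layer that references named assets and places them with a translate op.

```python
from typing import Dict, List


def make_usda(layout: List[Dict]) -> str:
    """Render a small factory layout as USDA text.

    Each layout item is a dict with hypothetical keys:
      name     - prim name (must be a valid identifier, no spaces)
      asset    - path to a referenced asset file (illustrative paths only)
      position - (x, y, z) translation in meters
    """
    lines = [
        "#usda 1.0",
        "(",
        '    defaultPrim = "Factory"',
        "    metersPerUnit = 1",
        '    upAxis = "Z"',
        ")",
        "",
        'def Xform "Factory"',
        "{",
    ]
    for item in layout:
        name = item["name"]
        if not name.isidentifier():
            # Rough check only; USD has its own prim-naming rules.
            raise ValueError(f"invalid prim name: {name!r}")
        asset = item["asset"]
        x, y, z = item["position"]
        lines += [
            f'    def Xform "{name}" (',
            f"        prepend references = @{asset}@",
            "    )",
            "    {",
            f"        double3 xformOp:translate = ({x}, {y}, {z})",
            '        uniform token[] xformOpOrder = ["xformOp:translate"]',
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical layout, as an LLM might produce it from a text description.
usda_text = make_usda([
    {"name": "Conveyor_01", "asset": "./assets/conveyor.usd", "position": (2.0, 0.0, 0.0)},
    {"name": "RobotArm_01", "asset": "./assets/robot_arm.usd", "position": (4.5, 1.0, 0.0)},
])
print(usda_text)
```

Because the output is plain text, it is easy for a model to emit and for an engineer to review in a diff, which fits the "review and iterate" loop described above. Asset metadata such as dimensions and connection points, listed in the next steps, would be needed before positions could be computed rather than guessed.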
-
🎨✨ Niantic, Inc. Research just dropped a game-changer in 3D Splat stylization: Morpheus, a new method for text-driven stylization of 3D Gaussian Splats!
Creating immersive, stylized 3D worlds from real-world scenes has always been exciting, but convincingly changing geometry and appearance simultaneously? That's been the tough part. Until now.
Morpheus Highlights:
✅ Independent Shape & Color Control: Adjust geometry and appearance separately, unlocking limitless creativity!
✅ Depth-Guided Cross-Attention & Warp ControlNet: Ensures your stylizations stay consistent across views.
✅ Autoregressive RGBD Diffusion Model: Stylizes each frame based on previously edited views for seamless immersion.
✅ Outperforms state-of-the-art methods in both aesthetics and prompt adherence, validated by extensive user studies.
Imagine turning your neighborhood into a neon cyberpunk cityscape 🌃, a cozy winter lodge ❄️, or even a Minecraft village 🧱, all from just a simple text prompt! This isn't just about stunning visuals; it's about reshaping geometry and appearance independently, opening endless possibilities for immersive experiences.
📝Paper: https://lnkd.in/gGtbWQr3
👉Project: https://lnkd.in/gWWSPNAe
🎥Video: https://lnkd.in/g_KMEMe2
#AI #MachineLearning #ComputerVision #3D #Innovation #Metaverse #GaussianSplats #GenerativeAI
-
You’re still prompting like it’s 2019. Let me show you how to do it right to get the image you really want.
1. Be Specific with Your Subject:
↪ Clearly define who or what is in the image.
↪ Avoid vague terms like “city” or “animal.” Instead, try “a futuristic cityscape at night with neon lights and flying cars” or “a fluffy calico cat wearing a tiny wizard hat.”
↪ The more precise the description, the better the AI understands your vision.
2. Describe Composition & Action:
↪ Detail how the image should be framed and what’s happening.
↪ For example, “extreme close-up of a flower with morning dew.”
3. Specify Location & Context:
↪ Set the scene’s environment, such as “a futuristic café on Mars” or “a cluttered alchemist’s library.”
↪ Adding context helps the AI create richer, story-driven images.
4. Choose the Style & Aesthetic:
↪ Mention the artistic style you want, whether it’s “photorealistic,” “watercolor painting,” “3D animation,” or “1990s product photography.”
↪ This steers the AI’s tone and visual approach.
5. Add Lighting & Camera Details:
↪ Incorporate lighting conditions like “soft lighting,” “dramatic shadows,” or “golden hour sunlight.”
6. Use Descriptive, Rich Language:
↪ Write natural, vivid sentences rather than keywords alone.
↪ For instance, “A majestic Bengal tiger with vibrant orange fur stalking through a lush rainforest dappled with sunlight.”
7. Include Editing or Transformation Instructions:
↪ If refining an existing image, be direct: “Change the sofa color to navy blue” or “Remove the car in the background.”
8. Use Starter Phrases Wisely:
↪ Begin prompts with action words like “Generate an image of...” or “Create a visual of...” to clarify you want an image, not text or other output.
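The checklist above can be turned into a small template so that no element is forgotten. This is only an illustrative sketch: the function `build_image_prompt` and its parameter names are made up here and do not belong to any image generator's API. It assembles the subject, location, composition, style, and lighting into full sentences (tip 6) that open with a starter phrase (tip 8).

```python
def build_image_prompt(
    subject: str,
    location: str = "",
    composition: str = "",
    style: str = "",
    lighting: str = "",
) -> str:
    """Compose a sentence-style image prompt from the checklist elements.

    Only `subject` is required; empty elements are simply left out.
    """
    opening = f"Generate an image of {subject}"
    if location:
        opening += f" in {location}"
    sentences = [opening + "."]

    if composition:
        sentences.append(f"Frame it as {composition}.")

    if style and lighting:
        sentences.append(f"Render it in a {style} style with {lighting}.")
    elif style:
        sentences.append(f"Render it in a {style} style.")
    elif lighting:
        sentences.append(f"Light the scene with {lighting}.")

    return " ".join(sentences)


prompt = build_image_prompt(
    subject="a fluffy calico cat wearing a tiny wizard hat",
    location="a cluttered alchemist's library",
    composition="an extreme close-up",
    style="watercolor painting",
    lighting="golden hour sunlight",
)
print(prompt)
```

A template like this is a starting point; the wording it produces still benefits from the vivid, specific language the tips recommend.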
-
Google DeepMind unveils "Genie 3", text-to-3D playable worlds in real time
- Genie 3 generates interactive 3D environments from text prompts that you can navigate in real time at 24fps and 720p.
- It keeps scenes consistent for several minutes, simulates physics like water and lighting, and spans styles from natural landscapes to animated, fantastical worlds.
- It introduces “promptable world events,” so users can change weather, add objects, and reshape scenes via text, which is useful for rapidly varying scenarios.
Why it matters
- Content creation: near-instant prototyping for games and video with controllable, coherent environments.
- AI training: unlimited, diverse simulated worlds for agent and robotics training without costly data collection or bespoke level design.
Limitations (for now)
- Restricted action spaces and interaction windows limited to a few minutes.
- Research preview stage; robustness, safety, and tooling need wider validation.
Availability
Limited research preview with select academics and creators; broader access to be announced by Google DeepMind.
Have a use case in mind? Drop a note and we can think it through together.
#GenerativeAI #WorldModels #GameDev #Robotics #Simulation #AIAgents #DeepMind
-
I had one of those "wow" moments earlier: I typed a prompt and then walked around the result (a coherent 3D room) in my browser.
Marble (from World Labs) turns text or a single image into persistent, navigable 3D worlds. You can export scenes as Gaussian splats and drop them into Three.js via the open‑source Spark renderer, which runs on desktop, mobile and even VR. It’s early and environment‑first (not people/animals), but the geometry looks cleaner than the usual depth‑map tricks.
For agentic systems, this looks like the missing testbed: configurable, browser‑native spaces for planning, UI flows and safety drills, all spun up in minutes, not weeks.
Also notable: World Labs is explicitly chasing “Large World Models,” backed by $230m, which is a signal of where AI is heading.
#AgenticAI #AIEngineering #GenAI #EnterpriseAI
-
Good folks at NVIDIA and Tsinghua University have released LLaMA-Mesh: A Revolutionary Approach to 3D Content Generation! This framework enables the direct generation of 3D meshes from natural language prompts while maintaining strong language capabilities. Here is the architecture and implementation!
>> Core Components
Model Foundation
- If you haven't guessed it yet, it's built on the LLaMA-3.1-8B-Instruct base model
- Maintains original language capabilities while adding 3D generation
- Context length is set to 8,000 tokens
3D Representation Strategy
- Uses the OBJ file format for mesh representation
- Quantizes vertex coordinates into 64 discrete bins per axis
- Sorts vertices by z-y-x coordinates, from lowest to highest
- Sorts faces by the lowest vertex indices for consistency
Data Processing Pipeline
- Filters meshes to a maximum of 500 faces for computational efficiency
- Applies random rotations (0°, 90°, 180°, 270°) for data augmentation
- Generates ~125k mesh variations from 31k base meshes
- Uses Cap3D-generated captions for text descriptions
>> Training Framework
Dataset Composition
- 40% Mesh Generation tasks
- 20% Mesh Understanding tasks
- 40% General Conversation (UltraChat dataset)
- 8x training turns for generation, 4x for understanding
Training Configuration
- Trained on 32 A100 GPUs (for NVIDIA, this is literally in-house)
- 21,000 training iterations
- Global batch size: 128
- AdamW optimizer with a 1e-5 learning rate
- 30-step warmup with cosine scheduling
- Total training time: approximately 3 days (based on the paper)
This research opens exciting possibilities for intuitive 3D content creation through natural language interaction. The future of digital design is conversational!
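The 3D representation strategy listed above (64 bins per axis, z-y-x vertex ordering, faces ordered by lowest vertex index, OBJ text output) can be sketched in a few lines. This is not the released LLaMA-Mesh preprocessing code. The helper names `quantize` and `mesh_to_obj_text` are hypothetical, and two details are assumptions made here: coordinates are normalized with one global min/max across all axes, and ties between faces are broken by their remaining sorted indices.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]  # 0-based indices into the vertex list


def quantize(value: float, lo: float, hi: float, bins: int = 64) -> int:
    """Map a coordinate in [lo, hi] to an integer bin in [0, bins - 1]."""
    if hi == lo:
        return 0
    t = (value - lo) / (hi - lo)
    return min(bins - 1, int(t * bins))


def mesh_to_obj_text(vertices: List[Vertex], faces: List[Face], bins: int = 64) -> str:
    """Serialize a triangle mesh as compact OBJ text with integer coordinates."""
    # Assumption: a single global range keeps the mesh's aspect ratio.
    coords = [c for v in vertices for c in v]
    lo, hi = min(coords), max(coords)
    quantized = [tuple(quantize(c, lo, hi, bins) for c in v) for v in vertices]

    # Sort vertices by z, then y, then x, from lowest to highest.
    order = sorted(
        range(len(quantized)),
        key=lambda i: (quantized[i][2], quantized[i][1], quantized[i][0]),
    )
    remap = {old: new for new, old in enumerate(order)}

    # Re-index faces (winding order is kept), then sort by lowest vertex index.
    new_faces = sorted((tuple(remap[i] for i in f) for f in faces), key=sorted)

    lines = ["v {} {} {}".format(*quantized[i]) for i in order]
    # OBJ face indices are 1-based.
    lines += ["f {} {} {}".format(a + 1, b + 1, c + 1) for a, b, c in new_faces]
    return "\n".join(lines)


# A small tetrahedron with deliberately unsorted vertices.
tetra_vertices = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]
tetra_faces = [(0, 1, 2), (0, 2, 3), (0, 3, 1), (1, 3, 2)]
obj_text = mesh_to_obj_text(tetra_vertices, tetra_faces)
print(obj_text)
```

Integer coordinates in a small range keep the text short, which matters when the whole mesh has to fit in an 8,000-token context. The fixed ordering means the same mesh always serializes to the same string. This sketch does not merge vertices that fall into the same bin after quantization.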
-
I've been spending a lot of time exploring the exciting world of Generative AI in 3D, and I'm absolutely thrilled about the potential it unlocks. If you are like me, you have probably seen the amazing results these tools are producing these days!
Generative AI has been evolving very fast, and it is now starting to change how we approach everything from 3D modeling and texturing to animation, simulation, and content creation. It allows you to create all of that with simple text prompts, or even from a single image. As someone with a passion for 3D data, I've been closely watching and testing some of these technologies, and it's truly a new era for content creation in 3D!
If you want to explore these new possibilities and are curious to see what’s available, I curated a list of tools that I found very interesting:
1. Microsoft Trellis: (https://lnkd.in/evQHvi89) - I use this to generate 3D scenes and models from text or images. It is an amazing tool to explore different concepts and to quickly prototype ideas.
2. DreamFusion: (https://lnkd.in/exd-nBtE) - Very useful for generating unique 3D models from simple text. It is also a great system for exploration when you want to prototype 3D designs.
3. Luma AI: (https://lumalabs.ai/) - This platform is great for creating realistic 3D assets and for building high-quality 3D environments.
4. Meshy AI: (https://www.meshy.ai/) - This tool allows you to create 3D models from 2D images or sketches.
5. Scenario AI: (https://www.scenario.com/) - This is a very creative tool to generate diverse game assets and to manage and fine-tune the various outputs.
6. Kaedim: (https://www.kaedim3d.com/) - I often use Kaedim if I need to build and convert different data modalities into 3D models.
7. Poly AI: (https://withpoly.com/) - This is a platform to generate and manipulate various 3D textures and materials.
8. Avaturn AI: (https://avaturn.me/) - A solution for generating custom 3D avatars.
9. Midjourney: (https://lnkd.in/eE48Q84z) - AI tool for generating images from text and exploring visual style before creating 3D assets.
🌱 Growing: Research is constantly pushing the boundaries of generative AI, bringing new innovations and new possibilities to reduce the time needed for 3D creation and design. New approaches are also being explored to build increasingly interactive 3D tools.