Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting

While text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which is the main focus of this work. To address this task, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in a conditioning module that controls the base diffusion model using both local and global embeddings, computed from the input condition images and camera poses. Once trained, MVControl can provide 3D diffusion guidance for optimization-based 3D generation. 2) We propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and score distillation algorithms. Building upon our MVControl architecture, we employ a unique hybrid diffusion guidance method to direct the optimization process. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This approach alleviates the poor geometry of 3D Gaussians and enables the direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content.
Innovative Techniques for 3D Modeling
Explore top LinkedIn content from expert professionals.
Summary
Innovative techniques for 3D modeling are making it easier and faster to create detailed digital objects and environments, often using AI tools that interpret text, images, or video to generate 3D assets. These new methods not only improve how realistic virtual objects behave, but also let users design complex shapes without the usual limitations of traditional modeling and printing.
- Explore AI-powered workflows: Use tools that convert text prompts, images, or video into 3D models to quickly build scenes or objects without needing advanced modeling skills.
- Try gravity-free printing: Take advantage of gel-based 3D printing to create intricate shapes in any direction, speeding up production and reducing waste.
- Simulate real-world physics: Apply advanced contact algorithms to make virtual cloth and soft bodies behave like they do in reality, avoiding unwanted overlaps and improving stability.
🪄 3D printing just broke free from gravity — and it happened at Disneyland Paris. Coperni, in collaboration with Disney Research, showcased a revolutionary technique called Rapid Liquid Printing (RLP) — a gel-based 3D printing process that allows objects to form freely in liquid space. The innovation: Instead of building layer by layer, RLP prints directly inside a gel bath. The gel supports the structure as it forms, meaning objects can be “drawn” in mid-air with smooth, continuous motion. What’s new: • No gravity constraints — objects print in all directions. • No supports or post-processing needed — a simple rinse finishes the product. • Compatible with soft materials like silicone and rubber, enabling flexibility and realism. Why it matters: This breakthrough eliminates one of 3D printing’s biggest limitations — the need for support structures. It drastically speeds up production, reduces waste, and enables designs that were previously impossible. → Fashion and luxury design — complex, fluid shapes in textiles and accessories → Architecture and furniture — organic, continuous forms without assembly → Healthcare and robotics — flexible components mimicking natural motion To me, this represents the next era of creation — where 3D printing stops stacking layers and starts shaping ideas in real time. Could this be the moment 3D printing becomes as intuitive as sketching in air? #3DPrinting #Design #Manufacturing #Creativity #FutureOfWork #Engineering #ArtAndTech
-
I got a chance to have a (virtual) sit-down Q&A session with Lukas Höllein about his paper ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models, one of the papers accepted to CVPR 2024. His paper introduces ViewDiff, a method that leverages pretrained text-to-image models to generate high-quality, multi-view consistent images of 3D objects in realistic surroundings by integrating 3D volume-rendering and cross-frame-attention layers into a U-Net architecture. Lukas discusses the challenges of training 3D models, the innovative integration of 3D components into a U-Net architecture, and the potential for democratizing 3D content creation. Hope you enjoy it! Harpreet: Could you briefly overview your paper's central hypothesis and the problem it addresses? How does this problem impact the broader field of deep learning? Lukas: Pretrained text-to-image models are powerful because they are trained on billions of text-image pairs. In contrast, 3D deep learning is largely bottlenecked by much smaller datasets. Models trained on 3D datasets will not reach the quality and diversity we have nowadays in 2D. This paper shows how to bridge this gap: we take a model trained on 2D data and only finetune it on 3D data. This allows us to keep around the expressiveness of the existing model but translate it into 3D. Harpreet: Your paper introduces a method that leverages pretrained text-to-image models as a prior, integrating 3D volume-rendering and cross-frame-attention layers into each block of the existing U-Net network. What are the key innovations of this technique, and how does it improve upon existing methods? Lukas: The key innovation shows how we can utilize the text-to-image model and still produce multi-view consistent images. Earlier 3D generative methods create some 3D representation and render images from it. Integrating a text-to-image model into this pipeline is problematic because it operates on different modalities (images vs. 3D).
In contrast, we keep around the 2D U-Net architecture and only add 3D components. By design, this allows the creation of consistent 3D images. Our output is *not* a 3D representation but multi-view consistent images (that can be turned into such a representation later). Continued in comments 👇🏼 #cvpr #computervision #artificialintelligence #deeplearning
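As a rough sketch of the cross-frame-attention idea Lukas describes, the toy function below lets the tokens of every view attend jointly over the tokens of all views, which is what allows appearance to be shared across frames. This is a hypothetical single-head version with no learned projections, not ViewDiff's actual layer:

```python
import numpy as np

def cross_frame_attention(x):
    """Toy cross-frame attention: flatten all views into one token
    sequence so every token can attend over every other view's tokens.

    x: array of shape (views, tokens, dim).
    Returns an array of the same shape.
    """
    v, t, d = x.shape
    q = x.reshape(v * t, d)                 # merge views into one sequence
    scores = q @ q.T / np.sqrt(d)           # dot-product attention scores
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)      # row-wise softmax
    return (w @ q).reshape(v, t, d)         # split back into per-view tokens
```

In the real model this sits inside each U-Net block alongside learned query/key/value projections; the sketch only shows why the attention span covers all frames rather than a single image.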
-
This week's defining shift for me is that creating 3D data is getting much simpler. New tools are turning everyday inputs like smartphone video, single photos, and text prompts into usable 3D environments and assets. This lowers the barrier to building the scenes, objects, and spaces that robotics, simulation, and immersive content rely on. It also shifts 3D creation from a specialized skill to something all teams can generate quickly and at the scale modern spatial systems require. This week’s news surfaced signals like these: 🤖 Parallax Worlds raised $4.9 million to turn standard video into digital twins for robotics testing. The platform turns basic walkthrough videos into interactive 3D spaces that teams can use to run their robot software and see how it performs before sending anything into the field. 🪑 Meta introduced SAM 3D to reconstruct objects and people from single images, producing full-textured meshes even when subjects are partly hidden or shot from difficult angles. The models were trained using real-world data and a staged process to improve accuracy. 🌏 Meta unveiled WorldGen, a research tool that generates full 3D worlds from text prompts. It produces complete, navigable spaces that can be used in Unity or Unreal and shows how AI can create environments without manual modeling. Why this matters: Faster 3D pipelines expand who can build, test, and refine spatial ideas. They turn 3D creation from a bottleneck into a regular part of development, which opens the door to more experimentation and better decisions earlier in the process. #robotics #digitaltwins #simulation #VR #AR #virtualreality #spatialcomputing #physicalAI #AI #3D
-
If you have ever tried simulating cloth or soft bodies, you know they tend to clip through each other. A team from the University of Utah has just shared a new solution called Offset Geometric Contact (OGC), which essentially makes virtual objects behave like real ones – without weird clipping or constant collision checks. Instead of using heavy global calculations, OGC offsets each face of the object along its normal, builds a tiny 'buffer zone,' and figures out how far each vertex can move without intersecting. The best part? It runs in real time on the GPU, so you can simulate massive cloth scenes – like 50 stacked layers with half a million vertices — and still get stable, penetration-free results. You can learn more about the approach and grab the code here: https://lnkd.in/gNScWFAV
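The "buffer zone" idea above can be sketched against a single offset plane: shift the face along its normal, then measure how far a vertex may travel before it touches the shifted plane. This is a hypothetical simplification (the actual OGC method handles finite faces, friction, and GPU batching), meant only to convey the offset-and-budget intuition:

```python
import numpy as np

def travel_limit(vertex, direction, face_point, face_normal, offset):
    """Toy per-vertex travel budget against one face, treated as an
    infinite plane shifted by `offset` along its unit normal.

    Returns how far `vertex` may move along `direction` before touching
    the offset plane, or np.inf if it moves parallel or away.
    """
    n = face_normal / np.linalg.norm(face_normal)
    # Signed distance from the vertex to the offset plane.
    dist = np.dot(n, vertex - face_point) - offset
    # Rate at which the vertex approaches the plane along `direction`.
    approach = -np.dot(n, direction)
    if approach <= 1e-12:          # moving parallel or away: no limit
        return np.inf
    return max(dist, 0.0) / approach
```

For example, a vertex hovering 1 unit above a floor with a 0.1 buffer can drop at most 0.9 units straight down before its step must be clamped.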
-
Check out this video of the tool I created for my 𝗠𝗮𝘀𝘁𝗲𝗿’𝘀 𝗧𝗵𝗲𝘀𝗶𝘀—I tackled one of the oldest challenges in 3D modeling: 𝗿𝗲𝗹𝘆𝗶𝗻𝗴 𝗼𝗻 𝗮 𝟮𝗗 𝘀𝗰𝗿𝗲𝗲𝗻 𝘁𝗼 𝗰𝗿𝗲𝗮𝘁𝗲 𝗮 𝟯𝗗 𝗺𝗼𝗱𝗲𝗹. So, I built a tool that 𝘀𝘁𝗿𝗲𝗮𝗺𝘀 𝟯𝗗 𝗺𝗼𝗱𝗲𝗹𝘀 𝗳𝗿𝗼𝗺 𝗕𝗹𝗲𝗻𝗱𝗲𝗿 (𝗮𝗻𝗱 𝗼𝘁𝗵𝗲𝗿 𝘀𝗼𝗳𝘁𝘄𝗮𝗿𝗲) 𝗶𝗻𝘁𝗼 𝗠𝗶𝘅𝗲𝗱 𝗥𝗲𝗮𝗹𝗶𝘁𝘆 𝘂𝘀𝗶𝗻𝗴 𝗮 𝗤𝘂𝗲𝘀𝘁 𝟯, allowing artists to interact with their work as if it were physically in front of them. 𝗞𝗲𝘆 𝗙𝗲𝗮𝘁𝘂𝗿𝗲𝘀: ✔ 𝗥𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 𝘀𝘁𝗿𝗲𝗮𝗺𝗶𝗻𝗴 of 3D models into Mixed Reality ✔ 𝗜𝗻𝘀𝘁𝗮𝗻𝘁 𝘂𝗽𝗱𝗮𝘁𝗲𝘀—changes made in Blender are reflected immediately ✔ 𝗛𝗮𝗻𝗱𝘀-𝗼𝗻 𝗶𝗻𝘁𝗲𝗿𝗮𝗰𝘁𝗶𝗼𝗻—scale, rotate, and manipulate individual pieces ✔ 𝗘𝘃𝗲𝗻 𝘀𝘂𝗽𝗽𝗼𝗿𝘁𝘀 𝗮𝗻𝗶𝗺𝗮𝘁𝗶𝗼𝗻𝘀 I used 𝗨𝗻𝗶𝘁𝘆’𝘀 𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲 𝗠𝗲𝘀𝗵𝗦𝘆𝗻𝗰 for the heavy lifting and adapted it into a Mixed Reality workflow. After doing a user study with 3D modeling experts, the feedback was 𝗼𝘃𝗲𝗿𝘄𝗵𝗲𝗹𝗺𝗶𝗻𝗴𝗹𝘆 𝗽𝗼𝘀𝗶𝘁𝗶𝘃𝗲. 𝗪𝗵𝘆 𝗧𝗵𝗶𝘀 𝗠𝗮𝘁𝘁𝗲𝗿𝘀: This approach is especially useful for: ▪️𝗦𝗰𝘂𝗹𝗽𝘁𝗶𝗻𝗴 ▪️ 𝗤𝘂𝗶𝗰𝗸𝗹𝘆 𝗿𝗲𝘃𝗶𝗲𝘄𝗶𝗻𝗴 𝗰𝗼𝗺𝗽𝗹𝗲𝘅 𝗺𝗼𝗱𝗲𝗹𝘀 ▪️ 𝗥𝗮𝗽𝗶𝗱𝗹𝘆 𝗰𝗵𝗮𝗻𝗴𝗶𝗻𝗴 𝗽𝗲𝗿𝘀𝗽𝗲𝗰𝘁𝗶𝘃𝗲𝘀 ▪️ 𝗦𝗵𝗼𝘄𝗶𝗻𝗴 𝗺𝗼𝗱𝗲𝗹𝘀 𝘁𝗼 𝗼𝘁𝗵𝗲𝗿𝘀 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝗲𝘅𝘁𝗿𝗮 𝘀𝗲𝘁𝘂𝗽 The idea isn’t new, but 𝘁𝗵𝗲𝗿𝗲’𝘀 𝘀𝘁𝗶𝗹𝗹 𝗮 𝗹𝗮𝗰𝗸 𝗼𝗳 𝗽𝗹𝘂𝗴-𝗮𝗻𝗱-𝗽𝗹𝗮𝘆 𝘀𝗼𝗹𝘂𝘁𝗶𝗼𝗻𝘀 that seamlessly integrate into existing 3D modeling workflows. That’s what I aimed to change. 𝗡𝗲𝘅𝘁 𝗦𝘁𝗲𝗽𝘀: I’m planning to release this tool in the future—stay tuned!
-
A New Frontier for 3D Modeling: From Painful CAD to Limitless Possibilities For years, the most painful aspect of 3D modeling in manufacturing and design has been the creation and maintenance of the models themselves. Generating a CAD layout from scratch, updating it with every change, and ensuring accuracy across teams - these steps have always been bottlenecks, slowing down innovation and optimization. But what if that pain point is about to disappear? With the advent of generative models like SAM 3D by Meta, we’re entering a new era where 3D models can be created and continuously updated directly from images. No more manual CAD redraws for every tweak. Imagine a world where your digital twin is always in sync with the real world - every optimization, every flow change in manufacturing, instantly reflected in your 3D environment. This unlocks a whole new dimension for manufacturing optimization. Visualizing changes in real time, running simulations, and collaborating in platforms like NVIDIA Omniverse becomes seamless. The applications are truly limitless—from rapid prototyping to predictive maintenance, from immersive training to next-gen AR/VR experiences. The inspiration here is clear: just as SAM 2D segmentation revolutionized how we extract meaning from images, these new 3D models are set to transform how we interact with the physical world. I’ve seen firsthand the power of segmentation in projects close to my heart—like brain segmentation for medical imaging (by Sovesh Mohapatra - disclosure - he is my younger brother :D ). The leap from 2D to 3D, powered by models like SAM 3D, is nothing short of extraordinary. I can only imagine where this is headed next. The future of 3D modeling isn’t just about making things easier—it’s about making the impossible, possible. #3DModeling #DigitalTwin #Manufacturing #AI #NVIDIAOmniverse #SAM3D #Innovation #ContinuousImprovement
-
👀 Most CAD people think the last big geometry revolution was NURBS + parametrics. Wrong. The next one is already here — and it’s coming from an unexpected place: Pixar math. In my new piece, I break down Subdivision Surface (SubD) modeling—why it started in animation, why it’s now creeping into serious engineering workflows, and why it changes the design vocabulary engineers can use. What’s inside:
• The core math (Catmull–Clark, Doo–Sabin, Loop) and why “recursive refinement” is the whole trick
• SubD vs NURBS/solids: precision vs topological freedom, and where each wins
• Why hybrid workflows (SubD → NURBS/solid) are becoming practical in tools like Fusion 360, Rhino, and Siemens NX
• The real engineering payoff: ergonomics, scan cleanup, generative/topology outputs, and cleaner analysis surfaces
If you’ve ever thought: “I want organic form without NURBS gymnastics” — this is your rabbit hole. Full article: Link in first comment. Let me know down there your favorite modeling technique! Are you doing Rhino3D? Plasticity? classic Parasolid like SolidWorks, Solid Edge, Onshape and NX? CATIA V5 or CATIA 3DEXPERIENCE? nTop? Cognitive Design by CDS? Something else? Amazing how many choices there are now, isn't it? #KernelWars #CAD #PLM #SubdivisionModeling #ConvergentModeling #EngineeringSoftware #BetterCallFino
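The "recursive refinement" trick behind Catmull-Clark is easiest to see in its 1D analogue, Chaikin's corner-cutting scheme: each pass cuts every corner of a polygon, and repeating converges to a smooth curve (a quadratic B-spline), just as surface subdivision converges to a smooth limit surface. This sketch is illustrative only, not any of the surface algorithms named above:

```python
import numpy as np

def chaikin(points, iterations=1):
    """Chaikin corner cutting on a closed polygon: replace each edge
    (p, q) with the two points at 1/4 and 3/4 along it. Each pass
    doubles the vertex count and smooths the shape."""
    pts = np.asarray(points, dtype=float)
    for _ in range(iterations):
        nxt = np.roll(pts, -1, axis=0)       # successor of each vertex
        a = 0.75 * pts + 0.25 * nxt          # point 1/4 along each edge
        b = 0.25 * pts + 0.75 * nxt          # point 3/4 along each edge
        out = np.empty((2 * len(pts), pts.shape[1]))
        out[0::2], out[1::2] = a, b          # interleave the new points
        pts = out
    return pts
```

Run it a few times on a square and the corners round off; Catmull-Clark does the analogous averaging with face points, edge points, and updated vertex positions on a quad mesh.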
-
😍 Love me some SuGaR, or rather 'Surface-Aligned Gaussian Splatting'. This method stands out for its ability to align 3D Gaussians with the scene's surface, making it easier to sample points on the real surface and extract highly detailed meshes. The result? You can easily edit, animate, and relight these meshes using popular software like Blender, Unity, or #UnrealEngine, opening up new possibilities for artists and developers alike. Abstract: "We propose a method to allow precise and extremely fast mesh extraction from 3D Gaussian Splatting. Gaussian Splatting has recently become very popular as it yields realistic rendering while being significantly faster to train than NeRFs. It is however challenging to extract a mesh from the millions of tiny 3D gaussians as these gaussians tend to be unorganized after optimization and no method has been proposed so far. Our first key contribution is a regularization term that encourages the gaussians to align well with the surface of the scene. We then introduce a method that exploits this alignment to extract a mesh from the Gaussians using Poisson reconstruction, which is fast, scalable, and preserves details, in contrast to the Marching Cubes algorithm usually applied to extract meshes from Neural SDFs. Finally, we introduce an optional refinement strategy that binds gaussians to the surface of the mesh, and jointly optimizes these Gaussians and the mesh through Gaussian splatting rendering. This enables easy editing, sculpting, rigging, animating, compositing and relighting of the Gaussians using traditional softwares by manipulating the mesh instead of the gaussians themselves. Retrieving such an editable mesh for realistic rendering is done within minutes with our method, compared to hours with the state-of-the-art methods on neural SDFs, while providing a better rendering quality." 
- Antoine Guédon, Vincent Lepetit Project Page: https://lnkd.in/e3K3rm53 arXiv: https://lnkd.in/eevzwsnm GitHub: https://lnkd.in/eN4qiqrE (coming soon) For more like this ⤵ 👉 Follow Orbis Tabula #gaussiansplatting #mesh #animation
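To get a feel for the abstract's "regularization term that encourages the gaussians to align well with the surface", here is a toy penalty on Gaussian scales that drives each anisotropic 3D Gaussian toward a flat disc that can hug a surface. This is a hypothetical stand-in for intuition only; SuGaR's actual term is derived from an ideal signed distance function, not this:

```python
import numpy as np

def flatness_penalty(log_scales):
    """Toy surface-alignment regularizer: penalize the smallest scale
    axis of each 3D Gaussian so it collapses toward zero, turning the
    ellipsoid into a thin disc that can lie flat on a surface.

    log_scales: (N, 3) array of per-Gaussian log scale parameters.
    Returns the mean thinnest-axis scale (lower = flatter Gaussians).
    """
    scales = np.exp(log_scales)              # positive per-axis scales
    return float(np.mean(scales.min(axis=1)))  # drive thinnest axis to 0
```

Minimizing such a term alongside the usual rendering loss rewards pancake-like Gaussians, which is the property the mesh-extraction step (Poisson reconstruction over points sampled on the aligned Gaussians) relies on.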