From the course: Stable Diffusion: Tips, Tricks, and Techniques

Sampling and steps

- [Instructor] We've got a couple of important things to talk about in this movie, but before we get there, I want to take a look at this image I've come up with. I'm liking this a lot. I like the closeup angle. I like Stonehenge in the background. It's still there, but it's mostly about her. I want to show you how I got here. I've changed my prompt to "closeup of a woman in business casual clothing," but instead of saying "standing in front of Stonehenge," I've made a new token that says "Stonehenge in the background." Then I've got "cloudy sky, dramatic lighting, 8k," and I've added "photorealistic." I've added a lot of weight to photorealistic: I've got 1.5 here. And I've added weight to closeup: I've got 1.2 over here. My negative prompt is still "green lawn, green grass, purple sky, disfigured face, deformed face, ugly." I don't think it was necessarily that prompt that's working; I think I just got lucky. We got a nice, clean face here. What I think is really significant here is the vagaries of language: there are lots of ways of saying the same thing. Instead of saying "a woman in business casual clothing standing in front of Stonehenge," pulling this out into a separate token that says "Stonehenge in the background" is giving me different results. For one thing, it takes out the word "standing," and if I have "standing" in there, Stable Diffusion wants to include feet and legs, so it was getting harder to get a closeup. That's not always true. I can still get images that don't have her in them at all, or that have her standing way in the background, things like that. But still, I'm getting more hits with this prompt than I was before. As we discussed earlier, Stable Diffusion renders an image by starting with a field of noise and then making multiple passes to remove that noise from the image in very specific ways that it learned during its training.
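The weights I mentioned are written in AUTOMATIC1111's prompt box as `(token:weight)` — for example `(photorealistic:1.5)` and `(closeup:1.2)`. Here's a minimal sketch of how that syntax can be parsed into token/weight pairs. This is an illustration only, not AUTOMATIC1111's actual parser, which also handles nested parentheses, bare `(emphasis)`, and `[de-emphasis]` brackets:

```python
import re

def parse_weighted_prompt(prompt):
    """Split a prompt into (text, weight) pairs.

    Tokens wrapped as (text:1.5) get that explicit weight; everything
    else defaults to 1.0. Simplified sketch of the (token:weight)
    attention syntax -- not the full AUTOMATIC1111 grammar.
    """
    pattern = re.compile(r"\(([^():]+):([\d.]+)\)")
    parts, pos = [], 0
    for m in pattern.finditer(prompt):
        plain = prompt[pos:m.start()].strip(" ,")
        if plain:
            parts.append((plain, 1.0))            # unweighted text
        parts.append((m.group(1), float(m.group(2))))  # weighted token
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        parts.append((tail, 1.0))
    return parts

prompt = "(closeup:1.2) of a woman, Stonehenge in the background, (photorealistic:1.5)"
print(parse_weighted_prompt(prompt))
# → [('closeup', 1.2), ('of a woman, Stonehenge in the background', 1.0), ('photorealistic', 1.5)]
```

Weights above 1.0 tell the model to pay more attention to that token; values below 1.0 de-emphasize it.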
The sampling steps field lets you specify how many of these noise-removal passes Stable Diffusion will make when it renders. Most Stable Diffusion instances give you this parameter, though not all do. I've never seen it called anything besides sampling steps or steps. Here's what happens if I take that last prompt and render it in one step. It doesn't get any farther than this. Two steps gets me here. This is the first step where it actually starts to look like a person. Four steps is getting closer, and then five is the first one that really looks like a finished person, but I don't have much of Stonehenge and she's pretty blurry. By 10 steps, we're getting a good, solid image. This is a usable image. In fact, I may like this one better than the other one. I like the rim lighting around her hair. Stonehenge doesn't look believable at all, but it at least looks dramatic. This doesn't look terribly photorealistic; it looks very painterly. 20 steps is the one that we saw earlier. Now, things are improving as I increase the number of steps, so you would think that 30 steps should be much better, and yet it's about the same. If anything, she looks a little more painterly. 40 steps. I'm sure you've noticed, if I go back to 30, then 20, that the woman's appearance is changing dramatically. Her clothes are staying the same, but her face changes a lot as Stable Diffusion continues to resample and refine its idea of what a woman standing in front of Stonehenge would look like. 50 steps. 60 steps, big change there, but not in overall quality or rendering style. 150 steps. At this point, she's starting to look a little too cartoony, although another rendering at 150 steps may not. None of these look terribly photorealistic, so I would need to work with my prompt some more, but as you can see, I'm not getting a big change as I increase the number of steps.
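You can see the diminishing returns in a toy version of that loop. The sketch below is not the real model — Stable Diffusion predicts the noise with a trained U-Net — but it mimics the idea of starting from pure noise and removing some of it on every pass, and shows how the leftover "noise" shrinks quickly over the first few steps and then barely changes:

```python
import random

def toy_denoise(target, steps, blend=0.3, seed=0):
    """Toy analogue of the sampling loop: start from random noise and,
    over `steps` passes, pull the sample toward the target "image".
    Each pass removes a fixed fraction of the remaining noise."""
    rng = random.Random(seed)
    sample = [rng.gauss(0, 1) for _ in target]
    for _ in range(steps):
        sample = [s + blend * (t - s) for s, t in zip(sample, target)]
    return sample

def error(sample, target):
    """Distance between the sample and the target (residual 'noise')."""
    return sum((s - t) ** 2 for s, t in zip(sample, target)) ** 0.5

target = [0.5] * 8  # stand-in for the "true" image
for steps in (1, 5, 10, 20, 50):
    print(steps, round(error(toy_denoise(target, steps), target), 4))
```

The residual drops sharply from 1 step to 10, then the improvement per extra step gets smaller and smaller — the same pattern as the renders above.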
If you had been standing here while I was doing that, though, you would've noticed something: the more steps I use, the longer each image takes to render. The higher the sampling steps number is, the more you're going to be waiting, so if you're on slow hardware, you really don't want to be cranking up the steps. This is why it's nice to figure out the smallest number of steps you need to get an image that you like. If you can work at that level, you're going to be able to iterate faster. Check out this image. This is the same prompt, very different result. I really like this one. Stonehenge is way in the background, and that's not something I ever imagined or envisioned when I was thinking about this image. But if our idea is that we're trying to create an image that links our product to the heritage of Stonehenge, then having Stonehenge far away kind of makes sense; it almost looks like it has more power, or is more impressive, as if you can't get too near it. I don't know, I really like this. It's not something I had thought of before, and so it's a great example of generative imaging helping me brainstorm and come up with something I wouldn't necessarily have thought of on my own. So: same prompt, same negative prompt, same number of steps, different sampler. We haven't talked about samplers yet. It turns out there are many different algorithms that Stable Diffusion can use for removing noise from that initial noise field it starts with. These are called sampling methods, and most Stable Diffusion instances give you a choice of samplers. In AUTOMATIC1111, we get this big, long list. Sampler choice can be overwhelming: there can be a lot of options, the names are meaningless, and don't you already have enough parameters and settings to worry about? The good news is that most of the time, your choice of sampler is not going to make that big a difference. Check this out.
This grid shows that same prompt rendered with these different samplers at these different step counts: columns are steps 20, 30, 40, and 50, and rows are samplers. You can see that Euler yields something very different from what I was just showing you, and if you increase the number of steps too far, you start getting a second Stonehenge in the sky. As we look over these, though, we can see that there's really not a huge difference. They're all coming up with the same layout, and some of them start to generate weird artifacts at too many steps. DDIM has this woman looking away at 20 steps, but kind of coming back to what we were thinking of. PLMS looks a little too contrasty to me down here. I know these are small on your screen; I'm going to make this document available for you to download so you can look at it up close. But even at this thumbnail size, you should be able to tell there's not a huge difference. That's good news. If you want to do some experimenting, what you'll probably find is that some samplers exhibit something called convergence: after a certain number of steps, their images don't change. Other samplers never converge; their images always alter. Some samplers, like DDIM, produce a useful image in far fewer steps than others, though it might not be as nice as what another sampler will do in more steps. If you're working with a slower system, you can prototype a prompt in DDIM using just a few steps to get quick renders, and then switch to a higher number of steps, or to a sampler that needs more steps, as you move toward a final image. There can also be a general difference in rendering speed from one sampler to another. These samplers tend to all render at about the same speed; these samplers all render slower than those, but run at about the same speed as each other.
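One way to build intuition for why samplers differ is that many of them are, under the hood, different numerical solvers stepping through the denoising process. The toy comparison below is not the actual diffusion math — it just integrates a simple equation with two classic methods, plain Euler and Heun's method (names that also appear in the sampler list) — but it shows the pattern described above: a higher-order method gets close to the true answer in fewer steps, and both converge as steps increase:

```python
import math

def euler(f, y0, t0, t1, steps):
    """Plain Euler integration -- one slope estimate per step."""
    h = (t1 - t0) / steps
    y, t = y0, t0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y

def heun(f, y0, t0, t1, steps):
    """Heun's method -- averages two slope estimates per step,
    so it converges faster than Euler for the same step count."""
    h = (t1 - t0) / steps
    y, t = y0, t0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y += h * (k1 + k2) / 2
        t += h
    return y

f = lambda t, y: -y      # toy stand-in for the denoising dynamics
exact = math.exp(-3)     # true solution of y' = -y, y(0) = 1, at t = 3
for steps in (10, 20, 50):
    e_err = abs(euler(f, 1.0, 0.0, 3.0, steps) - exact)
    h_err = abs(heun(f, 1.0, 0.0, 3.0, steps) - exact)
    print(f"{steps} steps: Euler error {e_err:.5f}, Heun error {h_err:.5f}")
```

The trade-off also mirrors rendering speed: Heun does two slope evaluations per step — roughly twice the work — which is one reason some samplers in the list run noticeably slower per step than others.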
In general, I think it's safe to pick a sampler to work with and use it at a moderate number of steps, between 15 and 20. I'm working with 20 here. When you get an image working to your liking, then fiddle with changing the steps or the sampler and see if you get something else you like. Again, I had that one image that I liked; I changed samplers and found another image that I like. That's great, I've got both of them. I've saved them both, I've got the parameters for both, and I can recreate and iterate from either of them from here. To make this choice of steps and samplers easier, and to make comparisons easier, Stable Diffusion offers a cool tool, which we'll look at next.