{"id":2549,"date":"2023-07-03T11:11:31","date_gmt":"2023-07-03T11:11:31","guid":{"rendered":"https:\/\/www.codeastar.com\/?p=2549"},"modified":"2023-07-04T17:51:21","modified_gmt":"2023-07-04T17:51:21","slug":"learn-how-to-use-stable-diffusion-part-2-sampling-methods","status":"publish","type":"post","link":"https:\/\/www.codeastar.com\/learn-how-to-use-stable-diffusion-part-2-sampling-methods\/","title":{"rendered":"Learn how to use Stable Diffusion Part 2 – Sampling Methods"},"content":{"rendered":"\n
<p>In our last post, <em>Learn how to use Stable Diffusion – Part 1<\/em>, we mentioned sampling methods and stated that this is a huge topic. Now, let's continue our Stable Diffusion learning journey. Besides discussing the main course, Sampling Methods, we are going to have appetizers as well, like prompt and checkpoint. We hope you all can enjoy the meal this time. Let's dig in!<\/p>

<h3>Appetizer I: Prompt<\/h3>

<p>Most generative art starts with text inputs, the prompts, so let us start from here. As we learned in our last post, prompts can be entered either as a full sentence or as a list of keywords. For beginners, we suggest using full sentences first, so you can read what you have entered in a human-readable way. When you are more familiar with Stable Diffusion, you can switch to keywords for a faster workflow. Speaking of keywords, there are a few major types that we use as inputs.<\/p>

<p>Other than the types of prompts, we should learn the weight of prompts as well. We can use <strong>()<\/strong> with a keyword and a value to strengthen or weaken the weight of that keyword. For example, <em>(robot: 1.2)<\/em> strengthens the "robot" keyword, and vice versa, <em>(robot: 0.9)<\/em> weakens it. We can also wrap a keyword in <strong>()<\/strong> on its own to emphasize it; the WebUI treats this as roughly a 1.1x boost.<\/p>

<p>When we group all the things together, we get our full prompt, with <em>(taco: 1.3)<\/em> among the keywords. Here comes our result:<\/p>

<figure>[generated output images]<\/figure>

<p>Since we have given a 1.3 weight to the "taco" keyword, we can see two giant tacos in our output :]] .<\/p>
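<p>To make the weight syntax concrete, below is a minimal Python sketch that pulls the keyword weights out of a prompt written in this format. It is purely for illustration: the helper name and the example prompt are made up, and the 1.1 default simply mirrors how a bare pair of parentheses behaves in the WebUI.<\/p>

<pre><code>import re

# Hypothetical helper: parse the "(keyword: weight)" emphasis syntax described above.
WEIGHT_PATTERN = re.compile(r"\((?P<word>[^():]+?)\s*(?::\s*(?P<weight>[\d.]+))?\)")

def parse_prompt_weights(prompt: str, default_emphasis: float = 1.1) -> dict:
    """Return {keyword: weight} for every parenthesised keyword in a prompt.

    A bare "(keyword)" gets default_emphasis, while "(keyword: 1.3)" gets the
    explicit value.
    """
    weights = {}
    for match in WEIGHT_PATTERN.finditer(prompt):
        word = match.group("word").strip()
        weight = match.group("weight")
        weights[word] = float(weight) if weight else default_emphasis
    return weights

if __name__ == "__main__":
    # Illustrative prompt only, not the exact prompt used for the taco image.
    prompt = "a robot eating a (taco: 1.3) on a (sunny) day, 4k"
    print(parse_prompt_weights(prompt))  # {'taco': 1.3, 'sunny': 1.1}
<\/code><\/pre>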
<h3>Appetizer II: Checkpoint<\/h3>

<p>We learned from our machine learning exercises that what we get is what we trained. So a model's drawing styles and capabilities depend entirely on the data it has been trained on. According to its GitHub page, Stable Diffusion is trained on a dataset of around 5 billion image-text pairs from the open-source LAION project. The large image dataset means Stable Diffusion knows how to draw a great many subjects and styles, but it also brings out the general issue of "the bard problem": jack of all trades, master of none.<\/p>

<p>So people started using the Stable Diffusion model as a base model and trained their own checkpoints with their preferred image datasets. In the end, the additional images steer the generation toward the preferred content and styles. There are plenty of pre-trained checkpoints with various styles and content on CivitAI. We can see the differences between those checkpoint models and the default Stable Diffusion model in the images below.<\/p>

<pre><code>Our prompts: <em>woman, outdoor, half body, side view, busy street, (winter), 4k<\/em><\/code><\/pre>
<p>Our outputs and the checkpoint models (with downloadable links):<\/p>

<figure>[outputs from the default model and the downloaded checkpoint models]<\/figure>

<p>Once we have the model files downloaded, how do we apply them to the Stable Diffusion WebUI? Simply copy them to:<\/p>

<pre><code><stable diffusion webui path>\\models\\Stable-diffusion\\<\/code><\/pre>

<p>Reload your WebUI, then you can select the installed model to generate your graphics.<\/p>
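<p>If you prefer to script that installation step, a minimal sketch like the one below does the same copy-and-reload routine. Both paths are placeholders, so point them at your own download folder and WebUI installation; newer versions of the diffusers library can also load such single-file checkpoints directly in Python, but no code is needed if you stick with the WebUI.<\/p>

<pre><code>from pathlib import Path
import shutil

# Placeholder paths -- adjust them to your own download folder and WebUI install.
downloaded_checkpoint = Path.home() / "Downloads" / "my_favourite_model.safetensors"
webui_model_dir = Path(r"C:\stable-diffusion-webui\models\Stable-diffusion")

if downloaded_checkpoint.exists():
    webui_model_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(downloaded_checkpoint, webui_model_dir / downloaded_checkpoint.name)
    print(f"Copied {downloaded_checkpoint.name} to {webui_model_dir}")
else:
    print(f"Checkpoint not found: {downloaded_checkpoint}")

# Reload the WebUI (or press the refresh button next to the checkpoint dropdown)
# and the new model will show up in the model list.
<\/code><\/pre>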
<h3>Main Course 1: Sampling Methods in Stable Diffusion<\/h3>

<p>We find a whole list of sampling methods (samplers) available in the WebUI, such as Euler, Heun, LMS, PLMS and the DPM family. So the question is always: which sampler should we use? Before we find the answer, let us quickly go through the major samplers and compare their outputs.<\/p>

<p>For a better comparison, we generate AI graphics with each of the samplers while keeping all other generation settings fixed.<\/p>

<figure>[outputs of the same scene generated with different samplers]<\/figure>

<p>All samplers can generate decent outputs in general, but I would prefer a mid-century art gallery built with arches, and it looks like the step count is not enough to generate the art pieces and the visitors. If I have to select the best 3 from the above samplers, I would pick <em>Heun<\/em>, <em>DPM2 Karras<\/em> and <em>PLMS<\/em>.<\/p>
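<p>For readers who want to reproduce this kind of sampler comparison in code rather than in the WebUI, here is a rough sketch using the Hugging Face diffusers library. The model id, prompt, seed and step count are assumptions for illustration, not the exact settings used in this post. In diffusers, samplers are called schedulers, and the use_karras_sigmas option roughly corresponds to the WebUI's "Karras" variants.<\/p>

<pre><code>import torch
from diffusers import (
    StableDiffusionPipeline,
    EulerDiscreteScheduler,
    HeunDiscreteScheduler,
    DPMSolverMultistepScheduler,
)

# Same model, same prompt, same seed and steps -- only the scheduler changes.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

schedulers = {
    "Euler": EulerDiscreteScheduler.from_config(pipe.scheduler.config),
    "Heun": HeunDiscreteScheduler.from_config(pipe.scheduler.config),
    "DPM++ 2M Karras": DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config, use_karras_sigmas=True
    ),
}

# Illustrative prompt, loosely based on the gallery example above.
prompt = "a mid-century art gallery built with arches, visitors viewing paintings"

for name, scheduler in schedulers.items():
    pipe.scheduler = scheduler
    generator = torch.Generator(device="cuda").manual_seed(42)  # fixed seed
    image = pipe(prompt, num_inference_steps=20, generator=generator).images[0]
    image.save(f"gallery_{name.replace(' ', '_').replace('+', 'p')}.png")
<\/code><\/pre>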
<h3>Main Course 2: Sampling Methods in Stable Diffusion (cont.)<\/h3>

<p>Now, let us test another topic. Here is our input:<\/p>

<p>A portrait of a woman set against a dark background. The subject is positioned in a three-quarter view, facing slightly toward the viewer. The woman is portrayed from the chest up, with her upper body and face prominently displayed. She has a serene expression, characterized by a slight smile that seems to hold a sense of mystery. Her eyes are captivating, with a gaze that follows the viewer from various angles. The woman is portrayed with remarkable realism, with delicate brushwork capturing subtle details in her face and skin tone. Her brown hair is gently layered and frames her face. She is adorned in clothing typical of the era, wearing a dark-colored garment with a veil covering her hair. The painting's composition is relatively simple, focusing primarily on the subject and her engaging presence.<\/p>

<p>The origin of the above input is the textual description of the <em>Mona Lisa<\/em>. Let's see what Stable Diffusion can give us:<\/p>

<figure>[portrait outputs generated with different samplers]<\/figure>

<p>When it comes to generating a human subject rather than a scene, we become pickier. I guess we are more familiar with facial features, so we can spot the differences more easily. And I think I see Mr. Bean in some of the above outputs… This time, my best 3 picks are <em>DPM++ 2M<\/em>, <em>DPM2 Karras<\/em> and <em>DPM2 a Karras<\/em>.<\/p>

<p>From the two inputs, we can see that certain samplers are aligned with a similar style: <em>Euler<\/em>, <em>LMS<\/em>, <em>DPM++ 2M<\/em>, <em>DPM2 Karras<\/em> and others are in one group; <em>DPM2 a<\/em>, <em>DPM++ 2S a<\/em> and <em>DPM2 a Karras<\/em> are in another group; and <em>DPM fast<\/em> is in a group of its own. Even samplers within the same group can show a wide range of variance; just look at <em>LMS<\/em> and <em>PLMS<\/em>.<\/p>
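<p>A small helper like the one below (not from the post, and assuming the file names saved by the previous sketch) tiles the per-sampler outputs into a single contact sheet, which makes the style groups above much easier to spot side by side.<\/p>

<pre><code>from pathlib import Path
from PIL import Image

def make_contact_sheet(image_paths, columns=4, thumb_size=(256, 256)):
    """Tile a list of images into one grid image for quick visual comparison."""
    thumbs = []
    for path in image_paths:
        img = Image.open(path).convert("RGB")
        img.thumbnail(thumb_size)  # shrink in place, keeping aspect ratio
        thumbs.append(img)
    rows = (len(thumbs) + columns - 1) // columns
    sheet = Image.new("RGB", (columns * thumb_size[0], rows * thumb_size[1]), "white")
    for i, thumb in enumerate(thumbs):
        x = (i % columns) * thumb_size[0]
        y = (i // columns) * thumb_size[1]
        sheet.paste(thumb, (x, y))
    return sheet

if __name__ == "__main__":
    # Assumed file pattern from the scheduler-comparison sketch above.
    outputs = sorted(Path(".").glob("gallery_*.png"))
    if outputs:
        make_contact_sheet(outputs).save("sampler_comparison.png")
<\/code><\/pre>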
<h3>Dessert: Summary of Our Findings<\/h3>

<p>Let us summarize what we have learned so far. We may use more specific keywords when constructing prompts. The selection of a model depends on the purpose of your generative AI graphics; as a rule of thumb, go to a model download site, pick your genre, then pick the highest-rated model there. On the sampler side, separate the samplers into groups according to their styles and generate a test image from each group. Once you find your favorite style, fine-tune with each sampler within that group to get your desired result.<\/p>

<p>We can take a look at the following chart to find a faster sampler in different groups.<\/p>
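<p>If you want to build your own version of such a speed chart, a rough timing sketch like the one below can help. The model, prompt and settings are assumptions for illustration, and the timings will of course vary with your hardware, image size and step count.<\/p>

<pre><code>import time

import torch
from diffusers import (
    StableDiffusionPipeline,
    EulerDiscreteScheduler,
    DPMSolverMultistepScheduler,
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# One representative sampler per style group; extend the dict as needed.
candidates = {
    "Euler": EulerDiscreteScheduler.from_config(pipe.scheduler.config),
    "DPM++ 2M": DPMSolverMultistepScheduler.from_config(pipe.scheduler.config),
}

prompt = "a portrait of a woman against a dark background, renaissance style"

for name, scheduler in candidates.items():
    pipe.scheduler = scheduler
    generator = torch.Generator(device="cuda").manual_seed(42)
    start = time.perf_counter()
    pipe(prompt, num_inference_steps=20, generator=generator)
    print(f"{name}: {time.perf_counter() - start:.1f}s for 20 steps")
<\/code><\/pre>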