The first public version of Stable Diffusion (1.4) was released in mid-August this year, dealing quite a blow to the image-generation AI market: we were just getting used to everything that proprietary tools like DALL-E 2 and Midjourney could do, and suddenly we had a very powerful open-source alternative on our hands.
By the end of the same month, the Stable Diffusion model had already been updated to version 1.5. Taken together, ‘V1’ of this AI was, as its creators remind us, an example of software with “one of the fastest climbs to 10,000 Github stars, shooting through 33,000 stars in less than two months”.
And now, less than three months after the release of 1.5, the Stability AI folks have just announced the release of Stable Diffusion V2, which “offers a number of big improvements and features compared to the original V1 version.”
“We’ve worked hard to optimize the models to run on a single GPU, making them accessible to as many people as possible right out of the box!”
“Big improvements” such as the inclusion of OpenCLIP, a new text encoder (responsible for interpreting users’ instructions) that “greatly improves the quality of the generated images”, and of a new training dataset with a corresponding, improved anti-NSFW filter (i.e., intended to prevent the generation of ‘sensitive’ images).
In addition, the text-to-image models in this version of Stable Diffusion can generate images at default resolutions of 512×512 and 768×768 pixels.
V2 also includes an upscaler model capable of multiplying the resolution of an image by four. This means that, in combination with the text-to-image models, the new version of Stable Diffusion can now generate images at resolutions of 2048×2048 or higher.
depth2img is a new depth-guided model incorporated in V2 that “infers the depth of an input image (using an existing model) and then generates new images using both the text and depth information.”
“It offers all kinds of creative new applications, delivering transformations that look radically different from the original, but still retain the consistency and depth of that image.”
“Finally, we’ve also included a new text-guided inpainting model, which makes it super easy to intelligently and quickly switch out parts of an image.”