Introduction
Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.
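A minimal generation sketch using the Hugging Face diffusers library (model id and prompt are illustrative, not the only options):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the pretrained Stable Diffusion pipeline (downloads ~4GB of weights on first run).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # move to a GPU if one is available

# Generate an image from a text prompt and save it.
image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```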
V1
The model itself and the weights (~4 GB) are open source and free to use for personal and commercial applications.
We can use a prompt interrogator to get possible prompts that’ll match a given image.
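A sketch of that reverse direction with the clip-interrogator package (the exact model name passed to the config is an assumption; adapt it to your installation):

```python
from PIL import Image
from clip_interrogator import Config, Interrogator

# Build an interrogator backed by a CLIP model (V1 used OpenAI's CLIP ViT-L/14).
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

# Ask for a candidate prompt that would plausibly generate this image.
image = Image.open("example.png").convert("RGB")
print(ci.interrogate(image))
```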
V2
Released in November 2022, it uses an open-source implementation of CLIP (OpenCLIP).
We can inspect the datasets used to train Stable Diffusion v2 with clip-retrieval (see the sketch below).
As with V1, we can use a prompt interrogator to get possible prompts that’ll match a given image.
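The dataset inspection mentioned above can be done with the clip-retrieval client; a minimal sketch, where the service URL and index name are assumptions that may change over time:

```python
from clip_retrieval.clip_client import ClipClient

# Query the hosted LAION KNN index that backs clip-retrieval.
client = ClipClient(
    url="https://knn.laion.ai/knn-service",
    indice_name="laion5B-L-14",
)

# Find training images whose CLIP embedding is close to a text query.
results = client.query(text="an orange tabby cat sleeping on a sofa")
for r in results[:5]:
    print(r["url"], r["similarity"])
```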
Derived or related work
DreamBooth - “Learn” an image concept for later use in different scenarios and prompts.
- Currently requires a GPU with a minimum of around 10 GB of VRAM.
InstructPix2Pix - Learning to Follow Image Editing Instructions
LoRA - fine-tuning roughly twice as fast as DreamBooth, with resulting weights as small as ~1 MB
ControlNet - diffusion control with edge maps, segmentation maps, keypoints, etc.
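A minimal ControlNet sketch with diffusers, assuming a precomputed Canny edge map as the control signal (checkpoint ids and file names are illustrative):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Load a ControlNet conditioned on Canny edges and attach it to a Stable Diffusion pipeline.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The edge map constrains the layout; the prompt controls the style and content.
edge_map = load_image("canny_edges.png")
image = pipe("a portrait in watercolor", image=edge_map).images[0]
image.save("controlled.png")
```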
Running locally on Mac
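A sketch of running the diffusers pipeline on Apple Silicon through PyTorch’s MPS backend (model id is illustrative; attention slicing is optional but helps on machines with limited memory):

```python
from diffusers import StableDiffusionPipeline

# Load in full precision; half precision on MPS can be unreliable.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")  # use the Metal Performance Shaders backend on Apple Silicon
pipe.enable_attention_slicing()  # trade a little speed for lower peak memory

image = pipe("a watercolor painting of a lighthouse").images[0]
image.save("lighthouse.png")
```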
Running on the cloud
How does it work?
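At a high level, the prompt is encoded by CLIP’s text encoder, a UNet iteratively denoises a small latent tensor conditioned on that encoding, and a VAE decoder turns the final latent into a full-resolution image. A rough sketch of that loop built from diffusers components, simplified (no classifier-free guidance) and with illustrative model ids:

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler

repo = "runwayml/stable-diffusion-v1-5"
tokenizer = CLIPTokenizer.from_pretrained(repo, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(repo, subfolder="text_encoder")
unet = UNet2DConditionModel.from_pretrained(repo, subfolder="unet")
vae = AutoencoderKL.from_pretrained(repo, subfolder="vae")
scheduler = PNDMScheduler.from_pretrained(repo, subfolder="scheduler")

# 1. Encode the prompt with CLIP's text encoder.
tokens = tokenizer("a photo of a cat", padding="max_length",
                   max_length=tokenizer.model_max_length,
                   truncation=True, return_tensors="pt")
text_emb = text_encoder(tokens.input_ids)[0]

# 2. Start from random noise in the 64x64 latent space (8x smaller than the 512x512 image).
latents = torch.randn((1, unet.config.in_channels, 64, 64)) * scheduler.init_noise_sigma

# 3. Iteratively denoise the latents, conditioned on the text embedding.
scheduler.set_timesteps(50)
for t in scheduler.timesteps:
    model_input = scheduler.scale_model_input(latents, t)
    with torch.no_grad():
        noise_pred = unet(model_input, t, encoder_hidden_states=text_emb).sample
    latents = scheduler.step(noise_pred, t, latents).prev_sample

# 4. Decode the final latents back into pixel space with the VAE.
with torch.no_grad():
    image = vae.decode(latents / 0.18215).sample  # 0.18215 is SD's VAE scaling factor
```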
What is it currently good or bad at?
In-painting a region of the picture with a custom color or image doesn’t work very well with either V1 or V2 (the sketch after this list shows the operation in question);
It does quite a good job with poses and single-person generation when paired with ControlNet;
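For reference, this is the in-painting setup being discussed, using the dedicated in-painting checkpoint via diffusers (checkpoint id and file names are illustrative):

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("room.png")   # original picture
mask = load_image("room_mask.png")    # white pixels mark the region to repaint

# The masked region is regenerated to match the prompt; the rest is kept.
result = pipe("a red velvet armchair", image=init_image, mask_image=mask).images[0]
result.save("inpainted.png")
```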
Some really interesting applications of the model
Animation with Stable Diffusion and Unreal Engine 5
Created with #deforum #stablediffusion from a walking animation I made in #UnrealEngine5 with #realtime clothing and hair on a #daz model. Combined the passes in Premiere. Breakdown thread 1/8 @UnrealEngine @daz3d #aiart @deforum_art #MachineLearning #aiartcommunity #aiartprocess pic.twitter.com/oeudGr0Cmq
— CoffeeVectors (@CoffeeVectors) October 6, 2022
Reactive audio based on the generated image
Experimenting with a TouchDesigner to Stable Diffusion pipeline for audio-reactive AI generated artwork#stablediffusion #touchdesigner pic.twitter.com/mEoJHaf295
— John Sabath (@jcsabath) September 11, 2022
Image-to-image + voice and video synth
@StableDiffusion Img2Img x #ebsynth x @koe_recast TEST#stablediffusion #AIart pic.twitter.com/aZgZZBRjWM
— Scott Lighthiser (@LighthiserScott) September 7, 2022
Text-to-video editing
#stablediffusion text-to-image checkpoints are now available for research purposes upon request at https://t.co/7SFUVKoUdl
Working on a more permissive release & inpainting checkpoints.
Soon™ coming to @runwayml for text-to-video-editing pic.twitter.com/7XVKydxTeD
— Patrick Esser (@pess_r) August 11, 2022
Morphing Stable Diffusion images with Unreal Engine 5 models
Took a face made in #stablediffusion driven by a video of a #metahuman in #UnrealEngine5 and animated it using Thin-Plate Spline Motion Model & GFPGAN for face fix/upscale. Breakdown follows:1/9 #aiart #ai #aiArtist #MachineLearning #deeplearning #aiartcommunity #aivideo #aifilm pic.twitter.com/hzUtJvB8IK
— CoffeeVectors (@CoffeeVectors) September 12, 2022
VFX “makeup”
I tried using an A.I. for VFX "Makeup"! So much potential for this tech.
Inspired by @LighthiserScott using #StableDiffusion with @_akhaliq's @Gradio and #ebsynth pic.twitter.com/VLIB0p3e4n
— Albert Bozesan (@AlbertBozesan) September 21, 2022
Stable Diffusion animation
"A Year"
— Dmitrii Tochilkin (@cut_pow) October 3, 2022
AI animation artwork made in colab using my custom stable 3D animation algorithm on top of #stablediffusion model. In the thread I share some details about the algo and when i plan to release it, and talk about the joy and future of AI filmmaking
🎶 DakhaBrakha - Vesna pic.twitter.com/OexCG5DCl3