Making the Perfect Profile Image with Stable Diffusion & Dreambooth.

Overview

This guide will help you create a Stable Diffusion Dreambooth model that uses text prompts to generate images of you, or of whoever you choose to model. The process is unbelievably fun & weird & narcissistic.

Stable Diffusion is a system allowing you to create detailed images based on text descriptions.

Dreambooth fine-tunes a text-to-image model on a specific subject using only a few sample images.

Prior Art

I went down this path by following Mathowie’s blog post and Aitrepreneur’s video.

Steps

  • Get an API key from Hugging Face
    You’ll need it to download the Dreambooth model.

  • Pay for some Google Colab Compute Units
    Buy the $9.99 pay-as-you-go option.

  • Copy this Colab Python Notebook
  • Pick out 20-30 pictures of yourself. Make sure you’re the only person in each photo.
  • Using birme.net,
    • Crop each photo to 512×512 pixels
    • Save your photos under a unique name that isn’t used anywhere else. This name is how you’ll refer to yourself in your text prompts. I used jbdb_1.jpg, jbdb_2.jpg, etc.


  • Go through the notebook’s steps & train your model. This will take about 45 minutes & will save a ~2 GB model to your Google Drive.
  • Once that’s done, you’ll be able to start up the notebook whenever you like & create images.

    I’ve kept a list of the textual prompts that have resulted in interesting outputs here.

    An alternative, if you have a healthy GPU at home, is to download the model you generated on Colab from Google Drive and generate images locally. The benefits of this are:
    • You won’t incur startup wait times / bootstrapping when starting Stable Diffusion via Colab.
    • You don’t need to keep a 2 GB model in Google Drive.
    • Stable Diffusion UI has a bunch of extra features you can play with.
    • You can quickly swap between generated models.
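
    The crop-and-rename step in the list above can also be scripted if you’d rather not click through birme.net. This is only a rough sketch, not what the notebook or birme.net does: it assumes Pillow is installed, the function names (center_crop_box, training_name, prepare_photos) are made up for illustration, and it always crops from the center, so check that your face actually ends up in the frame.

```python
from __future__ import annotations

from pathlib import Path


def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def training_name(token: str, index: int) -> str:
    """Build a file name such as jbdb_1.jpg from your unique token."""
    return f"{token}_{index}.jpg"


def prepare_photos(src_dir: str, dst_dir: str, token: str, size: int = 512) -> list[Path]:
    """Crop every .jpg in src_dir to size x size and save it under the token name."""
    # Assumption: Pillow is available (pip install Pillow).
    from PIL import Image

    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, path in enumerate(sorted(Path(src_dir).glob("*.jpg")), start=1):
        with Image.open(path) as img:
            # Cut the centered square first, then scale it down to the target size.
            square = img.convert("RGB").crop(center_crop_box(*img.size))
            target = out / training_name(token, index)
            square.resize((size, size)).save(target, quality=95)
            saved.append(target)
    return saved
```

    Calling prepare_photos("raw_photos", "training_photos", "jbdb") would write jbdb_1.jpg, jbdb_2.jpg, etc. into training_photos; both folder names are placeholders.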

Tips

  • When you don’t like how an image was rendered, work out what you don’t like about it (e.g. “too anime”), and then put that term (“anime”) in the negative prompt.
  • GFPGAN is a feature in the desktop version of Stable Diffusion that “corrects faces”. While it does correct faces, it also feels like it takes something away from the distinguishing features of a face.
  • Generate a series of images from each prompt to get a feel for the kind of imagery it produces.
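
If you script your generations instead of typing into a UI, the negative-prompt and series tips can be combined in a small helper. This is a loose sketch under my own assumptions: the function names and the dictionary keys ("prompt", "negative_prompt", "seed") are hypothetical, and how you feed these settings into the notebook or the desktop UI depends on the tool you’re using.

```python
from __future__ import annotations


def build_negative_prompt(dislikes: list[str]) -> str:
    """Join the traits you didn't like into one comma-separated negative prompt."""
    seen: list[str] = []
    for term in dislikes:
        cleaned = term.strip().lower()
        # Skip empty entries and repeats so the prompt stays short.
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return ", ".join(seen)


def plan_series(prompt: str, dislikes: list[str], count: int, first_seed: int = 0) -> list[dict]:
    """Return one settings dict per image: same prompts, a different seed each time."""
    negative = build_negative_prompt(dislikes)
    return [
        {"prompt": prompt, "negative_prompt": negative, "seed": first_seed + i}
        for i in range(count)
    ]


if __name__ == "__main__":
    # "jbdb" is the unique token from the training photo file names.
    for settings in plan_series("portrait photo of jbdb", ["anime", "blurry"], count=3):
        print(settings)
```

Keeping the prompt fixed and varying only the seed makes it easier to judge what a prompt tends to produce before you start tweaking it.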

