Doodles to Pictures!

Learn about GANs by playing with pix2pix

Developed by Safinah Ali and Brian Jordan at MIT Media Lab. Inspired by the pix2pix project.

Step 1: Draw a Sketch / Picture Outline


Step 2: Pick a Model

Step 3: Translate Sketch

Step 4: Look at the Result!

How does it work?

pix2pix (from Isola et al. 2017) converts images from one style to another using a machine learning model trained on pairs of images. If you train it on pairs of outline drawings (edges) and their corresponding full-color images, the resulting model is able to convert any outline drawing into what it thinks would be the corresponding full-color picture!

It accomplishes this using a clever machine learning technique known as a Generative Adversarial Network, or GAN. With the GAN technique, we train two machine learning models that compete with one another: the Generator, and the Discriminator.

You can think of the Generator as an artist, and the Discriminator as an art critic. The Generator artist is trying to fool the Discriminator critic into thinking it is creating authentic Picasso paintings, when in reality, it is just trying different outputs until the Discriminator is fooled.

The Generator learns how to fool the Discriminator by learning to make its output more and more realistic. The Discriminator learns the differences between real and Generator-created fakes. In this way, the two machine learning models improve each other through competition.

Training these models to create high-quality images takes a LOT of image pairs, and a LOT of computation (Discriminator / Generator competition!) — meaning a lot of time!

At the end, we can take the highly-skilled Generator algorithm and use it to convert images of the input style to those of the output style. In our case, we can convert outlines to full-color images!
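The alternating Generator/Discriminator competition described above can be sketched in a tiny, runnable example. This is not the pix2pix architecture (which uses convolutional networks on images); it is a deliberately minimal toy, assuming made-up one-dimensional "paintings" drawn from a Gaussian, a Generator that is just an affine map `g(z) = a*z + b`, and a Discriminator that is a logistic classifier `D(x) = sigmoid(w*x + c)`:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    # Clip to avoid overflow warnings for extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(u, -30.0, 30.0)))

# Real "paintings": scalars drawn from N(4, 0.5).
REAL_MEAN, REAL_STD = 4.0, 0.5

a, b = 1.0, 0.0   # Generator parameters: g(z) = a*z + b
w, c = 0.0, 0.0   # Discriminator parameters: D(x) = sigmoid(w*x + c)
lr = 0.05

for step in range(3000):
    real = rng.normal(REAL_MEAN, REAL_STD, size=64)
    z = rng.normal(size=64)
    fake = a * z + b

    # --- Discriminator step: push D(real) toward 1, D(fake) toward 0 ---
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    w += lr * np.mean((1 - d_real) * real - d_fake * fake)
    c += lr * np.mean((1 - d_real) - d_fake)

    # --- Generator step: push D(fake) toward 1 (fool the critic) ---
    d_fake = sigmoid(w * fake + c)
    grad = (1 - d_fake) * w          # gradient of log D(fake) w.r.t. fake
    a += lr * np.mean(grad * z)
    b += lr * np.mean(grad)

# After training, the Generator's samples should have drifted
# toward the real distribution's mean of 4.
samples = a * rng.normal(size=1000) + b
```

Even in this toy, you can see the dynamic from the text: the Generator starts out producing samples near 0, the Discriminator learns that larger values look "real," and the Generator chases that signal until its outputs overlap the real data.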

Generating a Sketch-to-Image Dataset

We need to feed the machine learning algorithm many pairs of outlines and images to learn from, which would take a lot of outline drawing by hand!

To build large datasets of image pairs automatically, researchers convert images to a sketch-like tracing using a technique known as Edge Detection.

You can play with one edge detection algorithm, known as Canny edge detection, by uploading an image above.
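To give a feel for how edge detection turns a photo into a sketch-like tracing, here is a minimal sketch of the core idea using only NumPy. Full Canny edge detection adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding on top of this; the version below keeps just the gradient step (Sobel filters plus a threshold), and the tiny test image is an assumption for illustration:

```python
import numpy as np

def sobel_edges(img, threshold=100.0):
    """Mark pixels where image intensity changes sharply.

    Computes horizontal and vertical Sobel gradients at each interior
    pixel and flags pixels whose gradient magnitude exceeds `threshold`.
    """
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)  # horizontal-change filter
    ky = kx.T                                 # vertical-change filter
    h, w = img.shape
    mag = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gx = np.sum(kx * patch)
            gy = np.sum(ky * patch)
            mag[y, x] = np.hypot(gx, gy)
    return mag > threshold

# A toy 8x8 "photo": dark left half, bright right half.
img = np.zeros((8, 8), dtype=float)
img[:, 4:] = 255.0
edges = sobel_edges(img)  # True only along the vertical boundary
```

Run over a real photo, a detector like this (or Canny) produces exactly the outline-style image that gets paired with the original photo for training.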

  • We trained the Birds model on 381 images from a search engine. This took about an hour and a half to train.
  • We trained the Flowers model on 400 images from a search engine. This took about an hour and a half to train.
  • We trained the Lollipop model on 382 images from a search engine. This took about an hour and a half to train.
  • We trained the Snakes model on only 60 images grabbed from a search engine. This took less than an hour to train.
  • The Cats model (from Christopher Hesse) was trained with 2k stock photos of cats. Full details on how this and Hesse's other models were trained are available in his write-up.
  • The Shoe model (from Hesse) was trained with 50k images from an online shoe store.
  • The Handbag model (from Hesse) was trained with 137k images from an online handbag store.

Train your Own Model!

You, too, can train your own edges2pix model! We hope to provide a more accessible way to get started with this in the future, but until then—two helpful guides for getting started are Yining Shi's and Christopher Hesse's.