WatchGAN: Advancing generated watch images with styleGANs

In my previous post I showed how to use progressive generative adversarial networks (pGANs) for image synthesis. In this post I show how to use styleGANs on larger images to create customizable images of watches. Additionally, I show how to apply styleGAN to your own custom data.

The StyleGAN paper was released just a few months ago (January 2019) and introduces some major improvements over previous generative adversarial networks. Instead of repeating what others have already explained in a detailed and easy-to-understand way, I refer to this article.

In short, the styleGAN architecture makes it possible to control the style of generated examples inside the image synthesis network. That means the high-level styles (w) of an image can be adjusted by applying different vectors from the W space. Furthermore, it is possible to transfer a style from one generated image to another. These styles are fed into the generator's LOD (level of detail) sub-networks, which means their effect ranges from coarse to fine.

In (a) we can see the architecture of a traditional GAN, where z represents one image in latent space and is fed directly into the generator. In (b), z is first mapped to an intermediate space W, which is then fed into the LOD convolutional layers through the learned transformations A, together with additional noise inputs B. (Image from the original paper, A Style-Based Generator Architecture for Generative Adversarial Networks)
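
To make the two-stage generator in (b) more concrete, here is a minimal sketch of how a trained model is sampled with the official styleGAN code, loosely following the repo's pretrained_example.py. The snapshot path is only a placeholder for whatever pickle your own training run produces, and helper names may differ slightly between repo versions.

import pickle
import numpy as np
import PIL.Image
import dnnlib.tflib as tflib

tflib.init_tf()                                     # set up TensorFlow the way the repo's scripts do

# Load a training snapshot; the path is just an example -- point it at a
# network-snapshot-*.pkl from your own results/ directory.
with open('results/00024-sgan-custom-1gpu/network-snapshot-011160.pkl', 'rb') as f:
    _G, _D, Gs = pickle.load(f)                     # Gs = long-term average of the generator

# Draw a random latent z; Gs maps it to W internally and synthesizes an image.
rnd = np.random.RandomState(42)
z = rnd.randn(1, Gs.input_shape[1])
fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
images = Gs.run(z, None, truncation_psi=0.7, randomize_noise=True, output_transform=fmt)
PIL.Image.fromarray(images[0], 'RGB').save('example_watch.png')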

The styleGAN paper uses the Flickr-Faces-HQ dataset to produce artificial human faces, where the style can be interpreted as the pose, shape and colorization of the image. The paper's results received some media attention through the website www.thispersondoesnotexist.com.

StyleGAN on watches

I used the styleGAN architecture on 110,810 images of watches (1024×1024) from chrono24. The network has seen 15 million images in almost one month of training on an RTX 2080 Ti. The results are much more detailed than in my previous post (beyond the increased resolution alone), and the learned styles are comparable to the paper's results. These images are not curated; they are simply what the GAN produces.

Style mixing

Now let's take a look at the style transfer from one generated image to another:

Pose mixing

In the first row we can see the target “style”, and on the left the watch to apply this style to. In this case the first four style layers capture some kind of pose, so the GAN applies the pose of the top-row watch to the watch in the second row.
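
If you want to reproduce this kind of mixing yourself, below is a rough sketch following the style-mixing figure code in the official repo (generate_figures.py). It assumes Gs has been loaded as in the earlier snippet and a 1024×1024 model with 18 style layers; exact component names may vary between repo versions.

import numpy as np
import dnnlib.tflib as tflib

synthesis_kwargs = dict(
    output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True),
    minibatch_size=1)

rnd = np.random.RandomState(0)
src_z = rnd.randn(1, Gs.input_shape[1])             # watch providing the target "style" (first row)
dst_z = rnd.randn(1, Gs.input_shape[1])             # watch the style is applied to

# Map both latents to W space; the result has shape [batch, num_layers, 512].
src_w = Gs.components.mapping.run(src_z, None)
dst_w = Gs.components.mapping.run(dst_z, None)

# Copy only the coarse layers (0-3) from the source -- this transfers the pose.
mixed_w = dst_w.copy()
mixed_w[:, range(0, 4)] = src_w[:, range(0, 4)]

images = Gs.components.synthesis.run(mixed_w, randomize_noise=False, **synthesis_kwargs)

The same indexing, with range(4, 8) or range(8, 18) instead of range(0, 4), produces the medium-grained watch mixing and the fine-grained color mixing shown below.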

Watch mixing

Here the medium-grained styles are applied. These styles concern the watch itself: you can see how the sub-dial changes, as well as the color and style of the dial, the case and so on.

Color mixing

Here we can see the fine-grained styles, which are responsible for color and look.

Run styleGAN on your own image data set

  • First of all, clone the git repository to your local machine
  • Make sure to install all the requirements mentioned in the readme.md file (at least 8 GB of GPU memory is required)
  • Put all your images into a directory, e.g. “E:/image_data” (alternatively, you have to change some lines of code in dataset_tool.py)
  • Navigate to the repository and run “python dataset_tool.py create_from_images datasets/mydata_tfrecord E:/image_data”. Here mydata_tfrecord is the target folder; make sure you have enough disk space (in my case about 50x the size of the source images)
  • Configure your generated dataset in train.py by adding:
desc += '-custom';                                     # suffix appended to the name of the result directory
dataset = EasyDict(tfrecord_dir='mydata_tfrecord');    # the folder created by dataset_tool.py inside datasets/
train.mirror_augment = False;                          # set to True to additionally train on horizontally mirrored images
  • Comment or un-comment the configs; most of them are self-explanatory. The important ones are the number of GPUs and minibatch_dict (the batch size for each LOD); see the sketch after this list
  • Run train.py with python and check the results directory for samples that are generated from time to time
  • In case the training crashes, you can resume it by changing the following parameters in training_loop.py; this will restart the training and create a new run:
resume_run_id   = 24,        # ID of the run to resume (the numbered subfolder in results/)
resume_snapshot = None,      # None = automatically pick the latest snapshot of that run
resume_kimg     = 11160.0,   # training progress (thousands of images) at the resumed snapshot
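
For the GPU and minibatch settings mentioned above, the relevant lines in train.py look roughly like the sketch below. The minibatch values are only illustrative; check the presets that ship with your copy of train.py and lower the entries for the higher resolutions if you run out of GPU memory.

# Pick the preset matching your hardware (a single RTX 2080 Ti corresponds to the 1-GPU preset).
# minibatch_dict maps each training resolution (LOD) to the batch size used at that resolution.
desc += '-1gpu'; submit_config.num_gpus = 1; sched.minibatch_base = 4
sched.minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}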
