Introducing GANs
The concept of GANs is typically introduced with the analogy of a two-player game between an art expert and an art forger. The goal of the art forger, or counterfeiter, is to produce a fake convincing enough to fool the art expert and thus win the game. An example of how this was first portrayed as a neural network is as follows:
The GAN architecture, as first proposed by Ian Goodfellow and others
In the preceding diagram, the Generator takes the place of the art forger, trying to best the art expert, shown as the Discriminator. The Generator uses random noise as a source to generate an image, with the goal of producing an image convincing enough to fool the Discriminator. The Discriminator is trained on both real and fake images, and its only job is to classify each image as real or fake. The Generator, in turn, is trained to build a fake convincing enough to fool the Discriminator. While this concept seems simple enough as a way of self-training a network, over the last few years, implementations of this adversarial technique have proven exceptional in many areas.
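The following is a minimal sketch of this adversarial setup in Keras. The layer sizes, optimizer settings, and the train_step helper are illustrative assumptions for this overview, not the model we will build later in the chapter:

```python
# Minimal sketch of the generator/discriminator game described above.
# All sizes and hyperparameters here are illustrative assumptions.
import numpy as np
from tensorflow.keras import layers, models, optimizers

latent_dim = 100           # size of the random noise vector fed to the generator
img_shape = (28, 28, 1)    # example image shape

# Generator: maps random noise to an image
generator = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(256, activation='relu'),
    layers.Dense(int(np.prod(img_shape)), activation='tanh'),
    layers.Reshape(img_shape),
])

# Discriminator: classifies an image as real (1) or fake (0)
discriminator = models.Sequential([
    layers.Input(shape=img_shape),
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])
discriminator.compile(optimizer=optimizers.Adam(0.0002),
                      loss='binary_crossentropy')

# Combined model: the generator is trained through a frozen discriminator,
# with the target label set to "real" so the generator learns to fool it
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer=optimizers.Adam(0.0002), loss='binary_crossentropy')

def train_step(real_images, batch_size=32):
    """One adversarial step; real_images is a batch scaled to [-1, 1]."""
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    fake_images = generator.predict(noise, verbose=0)
    # 1. Train the discriminator on real and fake images
    discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))
    # 2. Train the generator to make the discriminator output "real"
    gan.train_on_batch(noise, np.ones((batch_size, 1)))
```

Note that the discriminator is compiled before it is frozen inside the combined model; this ordering is what lets the discriminator keep learning on its own batches while the generator trains against a fixed critic.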
GANs were first developed by Ian Goodfellow and others at the University of Montreal in 2014. In only a few short years, this technique has exploded into many wide and varied applications, from generating images and text to animating static images. The following is a short summary of some of the more impressive GAN improvements/implementations currently turning heads in the deep learning community:
- Deep convolutional GANs (DCGANs): This was the first major improvement on the standard architecture we just covered. We will explore it as our first form of GAN in the next section of this chapter.
- Adversarial Autoencoder GAN: This variation of an autoencoder uses the adversarial GAN technique to isolate attributes or properties of your data. It has interesting applications for determining latent relationships in data, such as being able to tell the difference between style and content for a set of handwritten digits.
- Auxiliary Classifier GAN: This is another enhanced GAN that relates to conditioned or conditional GANs. It has been shown to synthesize higher-resolution images and is certainly worth exploring more in gaming.
- CycleGAN: This is a variation that is impressive in that it allows the translation of style from one image to another. There are plenty of examples of this form of GAN being used for everything from styling a picture as if Van Gogh had painted it to swapping celebrity faces. If this chapter piques your interest in GANs and you want to explore this form, check out this post: https://hardikbansal.github.io/CycleGANBlog/.
- Conditional GANs: These use a form of semi-supervised learning, in which the training data is labeled, but with metadata or attributes. So, instead of labeling a handwritten digit from the MNIST dataset as a 9, you may instead label the writing style (cursive or print). Then, this new form of conditioned GAN can learn not only the digits, but also whether they are cursive or print. This form of GAN has shown some interesting applications, and it is one we will explore further when we look at specific applications in gaming.
- DiscoGAN: This is yet another form of GAN showing fun results, from swapping celebrity hairstyles to swapping genders. This GAN extracts features or domains and allows you to transfer them to other images or data spaces. It has numerous applications in gaming and is certainly worth exploring further for the interested reader.
- DualGAN: This uses dual GANs to train two generators against two discriminators in order to transfer images or data to other styles. This would be very useful as a way of restyling multiple assets and would work nicely for generating different forms of art content for games.
- Least squares GAN (LSGAN): This replaces the standard loss calculation with a least squares loss and has been shown to be more effective than the DCGAN; a sketch of this loss appears after this list.
- pix2pixGAN: This is an extension to conditional GANs that allows it to transfer or generate multiple features from one image to another. This allows a sketch of an object to be turned into a realistically rendered image of the same object, or vice versa. While this is a very powerful GAN, it is still very much research-driven and may not be ready for use in games. Perhaps you will just have to wait six months or a year.
- InfoGANs: These types of GANs have, thus far, been used mostly to explore features or information about the training data. They can be used to identify the rotation of a digit in the MNIST dataset, for instance. Also, they are often used as a way of identifying attributes for conditioned GAN training.
- Stacked or SGAN: This is a form of GAN that breaks itself into layers, where each layer is a generator and discriminator battling it out. This makes the overall GAN easier to train, but it also requires you to understand each stage or layer in some detail. If you are just starting out, this is not the GAN for you, but as you build more complex networks, revisit this model.
- Wasserstein GANs: This is a state-of-the-art GAN that uses the Wasserstein distance to calculate loss, which dramatically helps with model convergence; a sketch of this loss also appears after this list. It will get further attention in its own section of this chapter.
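To make the LSGAN entry above concrete, here is a minimal sketch of a least squares GAN loss; the function names are illustrative, and the usual targets of 1 for real and 0 for fake are assumed:

```python
# A sketch of the least squares loss used by an LSGAN, which replaces the
# binary cross-entropy of a standard GAN/DCGAN with squared error on the
# discriminator's outputs.
import tensorflow as tf

mse = tf.keras.losses.MeanSquaredError()

def lsgan_discriminator_loss(real_output, fake_output):
    # Push outputs for real images toward 1 and outputs for fakes toward 0
    return (mse(tf.ones_like(real_output), real_output) +
            mse(tf.zeros_like(fake_output), fake_output))

def lsgan_generator_loss(fake_output):
    # The generator wants its fakes to be scored as 1 (real)
    return mse(tf.ones_like(fake_output), fake_output)
```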
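The Wasserstein loss mentioned in the last entry can be sketched in the same way. Here the discriminator, usually called a critic, outputs an unbounded score rather than a probability; enforcing the required Lipschitz constraint (by weight clipping or a gradient penalty) is assumed to be handled elsewhere:

```python
# A sketch of the Wasserstein loss: the critic widens the gap between the
# scores it gives real and fake samples, while the generator tries to raise
# the score of its fakes.
import tensorflow as tf

def wasserstein_critic_loss(real_scores, fake_scores):
    return tf.reduce_mean(fake_scores) - tf.reduce_mean(real_scores)

def wasserstein_generator_loss(fake_scores):
    return -tf.reduce_mean(fake_scores)
```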
We will explore further instances of specific GAN implementations as we work through this chapter, including how to generate game textures and music with a GAN. For now, though, let's move on to the next section and learn how to code a GAN in Keras.