Wasserstein GANs (W-GAN)

Ankit kumar
4 min read · Mar 29, 2024


— GANs Series Part 3


Wasserstein GAN (WGAN) is a type of Generative Adversarial Network (GAN) that uses the Wasserstein distance (also known as Earth Mover's Distance) as its loss function instead of traditional loss functions such as binary cross-entropy. WGAN addresses some of the challenges faced by conventional GAN models, such as mode collapse and training instability, by providing a more stable and reliable training signal.

Generator Network:

The generator network in WGAN is responsible for creating synthetic data samples, such as images, from random noise inputs. The goal of the generator is to produce realistic data that closely resembles the true data distribution.

1) The network typically consists of several layers of neural networks, including convolutional or fully connected layers, activation functions like ReLU or Leaky ReLU, and batch normalization.

2) The objective of the generator network is not to maximize the discriminator’s classification error (as in traditional GANs) but to minimize the Wasserstein distance between the generated and real data distributions.

3) By minimizing the Wasserstein distance, the generator learns to produce high-quality and diverse samples that span across the data distribution, reducing the risk of mode collapse.
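As a concrete illustration, a DCGAN-style generator for this setup might look like the sketch below in PyTorch. The noise size (100), the feature widths, and the 64x64 RGB output are assumptions made for the example, not choices prescribed by the article.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a noise vector z to a synthetic 64x64 RGB image.
    Layer sizes and output resolution are illustrative choices."""
    def __init__(self, nz=100, ngf=64, nc=3):
        super().__init__()
        self.net = nn.Sequential(
            # project the noise to a 4x4 feature map, then upsample step by step
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh(),  # output images scaled to [-1, 1]
        )

    def forward(self, z):
        # z has shape (batch, nz, 1, 1)
        return self.net(z)
```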

Discriminator Network (Critic):

The discriminator network in WGAN serves as a critic that evaluates the distance between the generated and real data distributions using the Wasserstein distance.

1) The network’s goal is to output values that approximate the Wasserstein distance, providing more meaningful feedback to both the generator and the discriminator during training.

2) It typically consists of multiple layers of neural networks and activation functions, and it uses weight clipping on its parameters to enforce the Lipschitz constraint required by the Wasserstein formulation and to stabilize training.

3) Instead of categorizing samples as real or fake, the network focuses on estimating the distance between the distributions, enabling better convergence and learning dynamics.
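A matching critic sketch, again with illustrative layer sizes, is shown below. Note that it ends in a single unbounded score per image rather than a sigmoid probability, since it rates samples instead of classifying them.

```python
class Critic(nn.Module):
    """Scores a 64x64 RGB image with a single real-valued output.
    No sigmoid at the end: the output is an unbounded score, not a probability."""
    def __init__(self, ndf=64, nc=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),  # -> (batch, 1, 1, 1)
        )

    def forward(self, x):
        return self.net(x).view(-1)  # one scalar score per sample
```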

Description of training losses:

Generator Loss:

The objective of the generator network in a WGAN is to minimize the Wasserstein distance between the generated distribution (generated samples) and the real data distribution. This encourages the generator network to produce realistic data samples that closely match the distribution of real data. The generator minimizes this loss by producing samples that the critic scores highly, which corresponds to a small estimated Wasserstein distance between the generated and real distributions.

The generator loss in WGAN can be formulated as:

Generator Loss = -Mean(Critic(G(z)))

Where:
- G(z) is the generated sample produced by the generator from a noise input z.
- Critic(G(z)) represents the output of the critic (discriminator) when evaluating the generated sample G(z).
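In code, this loss is essentially a one-liner. The snippet below is a sketch of a single generator update, assuming the hypothetical Generator and Critic modules sketched above; the batch size and optimizer choice (RMSprop, as suggested in the WGAN paper) are assumptions for the example.

```python
import torch

# One generator update step (sketch; `generator` and `critic` are the modules above)
batch_size, nz = 64, 100
g_opt = torch.optim.RMSprop(generator.parameters(), lr=5e-5)

z = torch.randn(batch_size, nz, 1, 1)   # noise input z
fake = generator(z)                      # G(z)
g_loss = -critic(fake).mean()            # Generator Loss = -Mean(Critic(G(z)))

g_opt.zero_grad()
g_loss.backward()
g_opt.step()
```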

Discriminator (Critic) Loss:

In WGAN, the discriminator network acts as a critic that estimates the Wasserstein distance between the generated and real data distributions. By the Kantorovich-Rubinstein duality, this distance equals the largest achievable gap between the critic’s average score on real samples and its average score on generated samples, taken over all 1-Lipschitz critic functions. The critic is therefore trained to maximize that gap: the better it separates real from generated samples, the more accurate its distance estimate, and the more useful the feedback it provides for improving the quality of generated samples.

The Wasserstein loss for the discriminator (or critic) is formulated as:

Discriminator (Critic) Loss = Mean(Critic(x)) - Mean(Critic(G(z)))

Where:
- x denotes a real data sample from the dataset.
- Critic(x) represents the Wasserstein distance estimation for the real sample x.
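A corresponding sketch of one critic update, under the same assumptions, is shown below. Since optimizers minimize, the code negates the quantity the critic wants to maximize.

```python
# One critic update step (sketch; `x` is a batch of real images, `z` a batch of noise)
c_opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

real_score = critic(x).mean()                        # Mean(Critic(x))
fake_score = critic(generator(z).detach()).mean()    # Mean(Critic(G(z))); detach so the generator is not updated

# The critic maximizes real_score - fake_score, so we minimize its negative
c_loss = -(real_score - fake_score)

c_opt.zero_grad()
c_loss.backward()
c_opt.step()
```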

WGAN Algorithm

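Since the algorithm figure does not reproduce here, the sketch below summarizes the training procedure from the original WGAN paper. The hyperparameter values (n_critic = 5, clip range 0.01, RMSprop with learning rate 5e-5) are the paper's suggested defaults; the data loader, and the reuse of one real batch across the inner critic steps, are simplifications made for this example.

```python
import torch

# Hyperparameters follow the defaults suggested in the original WGAN paper
n_critic, clip_value, lr, nz = 5, 0.01, 5e-5, 100

generator, critic = Generator(), Critic()           # the modules sketched earlier
g_opt = torch.optim.RMSprop(generator.parameters(), lr=lr)
c_opt = torch.optim.RMSprop(critic.parameters(), lr=lr)

for real in dataloader:                              # `dataloader` (assumed) yields batches of real images
    # 1) Train the critic for n_critic steps
    # (for brevity the same real batch is reused; the paper samples a fresh batch per critic step)
    for _ in range(n_critic):
        z = torch.randn(real.size(0), nz, 1, 1)
        fake = generator(z).detach()
        c_loss = -(critic(real).mean() - critic(fake).mean())
        c_opt.zero_grad()
        c_loss.backward()
        c_opt.step()
        # Weight clipping: keep critic parameters in [-c, c] to enforce the Lipschitz constraint
        for p in critic.parameters():
            p.data.clamp_(-clip_value, clip_value)

    # 2) One generator step
    z = torch.randn(real.size(0), nz, 1, 1)
    g_loss = -critic(generator(z)).mean()
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```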

How WGAN resolves the problems faced by BCE loss:

  1. Stability and Training Dynamics:
    - WGANs are designed to provide more stable training dynamics compared to traditional GANs trained with BCE loss. The use of the Wasserstein distance introduces a smoother optimization landscape, making it easier to train the generator and discriminator networks effectively.
    - The continuous and differentiable nature of the Wasserstein distance allows for more meaningful gradient updates during training, reducing issues such as vanishing gradients.
  2. Mode Collapse:
    - It refers to a situation in which the generator collapses to generate only a limited set of samples, failing to capture the full diversity of the training data distribution.
    - WGAN helps mitigate mode collapse by encouraging the generator to produce a diverse set of samples that explore more regions of the data distribution. The Wasserstein distance provides a more stable and meaningful measure of the discrepancy between the generated and real data distributions.
  3. Discriminator Training:
    - The discriminator network is trained to output a value that approximates the Wasserstein distance between the generated and real data distributions. This allows the discriminator to provide more informative feedback to the generator, guiding it towards generating data that closely matches the real data distribution.
  4. Weight Clipping:
    - To enforce the Lipschitz constraint that the Wasserstein formulation requires, WGANs clip the critic’s weights to a small fixed range after every update (the original paper suggests [-0.01, 0.01]). This keeps the critic’s scores well behaved and contributes to more stable training, as shown in the training-loop sketch above.

Conclusion:

The Wasserstein GAN loss helps address some of the limitations and challenges associated with training traditional GANs using binary cross-entropy loss. By introducing the Wasserstein distance and promoting more stable optimization, WGANs have shown improved performance in generating high-quality and diverse samples across various domains.

In the next article, we will discuss the Wasserstein distance in more detail and the issues faced by Wasserstein GANs.
