General-Purpose Image-to-Image Translation (Pix2Pix)

Keywords: Computer Vision, Generative Adversarial Networks (GANs), Domain Adaptation, PyTorch

Research Objective

The objective of this project was to evaluate how well the Conditional GAN (Pix2Pix) architecture generalizes. While often demonstrated on specific tasks such as map generation, a robust translation framework must handle different data distributions without hyperparameter re-tuning.

We analyzed the model’s performance on distinct paired datasets to evaluate its ability to learn both low-frequency structures (geometry) and high-frequency textures (photorealism).
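This setup follows the standard Pix2Pix training objective, which combines a conditional adversarial term with an $L1$ reconstruction term weighted by $\lambda$:

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D(x, G(x))\right)\right]$$

$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y}\left[\lVert y - G(x) \rVert_1\right]$$

$$G^* = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{L1}(G)$$

The $L1$ term anchors the low-frequency structure (geometry), while the adversarial term supplies the high-frequency texture (photorealism).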

Experimental Analysis

The implementation was verified across multiple translation tasks, confirming the architecture’s versatility:

1. Aerial $\leftrightarrow$ Map Translation

  • Challenge: The model must hallucinate abstract symbols (roads) from noisy satellite textures, and vice versa.
  • Finding: The U-Net skip connections were critical here: without them, the generated maps lacked road connectivity. Verification showed that the model successfully learned to suppress satellite “noise” (trees, cars) to produce clean map tiles.
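The role of the skip connections can be sketched with a single-level toy U-Net (the full Pix2Pix generator stacks eight such levels; the layer sizes below are illustrative, not the project's actual configuration):

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal one-level U-Net sketch showing a skip connection."""

    def __init__(self, in_ch=3, out_ch=3, feats=64):
        super().__init__()
        # Encoder: halve spatial resolution, expand channels.
        self.down = nn.Sequential(
            nn.Conv2d(in_ch, feats, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
        )
        # Bottleneck: compress further, then upsample back.
        self.bottleneck = nn.Sequential(
            nn.Conv2d(feats, feats * 2, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(feats * 2, feats, 4, stride=2, padding=1),
            nn.ReLU(),
        )
        # Decoder input has 2*feats channels: upsampled features
        # concatenated with the encoder activation (the skip).
        self.up = nn.Sequential(
            nn.ConvTranspose2d(feats * 2, out_ch, 4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, x):
        skip = self.down(x)           # (B, feats, H/2, W/2)
        deep = self.bottleneck(skip)  # (B, feats, H/2, W/2)
        # The skip lets low-level structure (edges, road layout) bypass
        # the bottleneck and reach the decoder directly -- removing it
        # is what degraded road connectivity in the maps experiment.
        return self.up(torch.cat([deep, skip], dim=1))

x = torch.randn(1, 3, 64, 64)
y = TinyUNet()(x)
print(y.shape)  # torch.Size([1, 3, 64, 64])
```

Without the `torch.cat`, the decoder would have to reconstruct road topology from the compressed bottleneck alone, which is exactly where connectivity is lost.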

2. Facades & Structural Verification

  • Challenge: Translating semantic labels to photorealistic building facades.
  • Finding: The PatchGAN discriminator proved essential for generating sharp architectural details (windows, balconies). Using a standard $L2$ loss produced blurry, “averaged” facades, whereas the adversarial loss forced the generator to commit to sharp, realistic edges.
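The key property of the PatchGAN is that it emits a grid of real/fake scores, each covering a local patch, rather than one scalar per image; this is what penalizes blurry local texture. A minimal sketch (the layer count here is illustrative, shallower than the standard 70x70 PatchGAN):

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Sketch of a conditional PatchGAN discriminator."""

    def __init__(self, in_ch=6, feats=64):  # 6 = label map + image, concatenated
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feats, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(feats, feats * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            # Final 1-channel conv: one real/fake logit per local patch.
            nn.Conv2d(feats * 2, 1, 4, stride=1, padding=1),
        )

    def forward(self, label_map, image):
        # Conditional: D sees both the semantic labels and the
        # (real or generated) facade, judging whether they match.
        return self.net(torch.cat([label_map, image], dim=1))

d = PatchDiscriminator()
scores = d(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
print(scores.shape)  # torch.Size([1, 1, 63, 63]) -- a grid of patch logits
```

Because every one of the 63x63 output logits is computed from a limited receptive field, the generator cannot satisfy the discriminator with a globally plausible but locally blurry facade.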

Conclusion on Architecture

The comprehensive analysis confirms that the cGAN + U-Net + PatchGAN combination is a robust general-purpose solver for image-to-image translation.

  • Structural Consistency: The model does not merely memorize the training set; it learns a translation mapping that respects the underlying geometry of the input (whether it is a road network or a building facade).
  • Loss Balance: The 100:1 weighting between $\mathcal{L}_{L1}$ and $\mathcal{L}_{GAN}$ proved stable across these diverse domains, requiring no task-specific adjustment.
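The generator-side loss combination described above can be sketched as follows; the tensors here are random placeholders standing in for one training step's outputs, not real model activations:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for one training step (shapes are illustrative).
fake = torch.randn(1, 3, 256, 256, requires_grad=True)  # G(x)
target = torch.randn(1, 3, 256, 256)                    # ground-truth y
d_logits_fake = torch.randn(1, 1, 30, 30)               # D(x, G(x)) patch logits

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()

LAMBDA_L1 = 100.0  # the fixed weighting reported stable across both domains

# Generator objective: fool D on every patch (label all patches "real")
# while staying close to the ground truth in L1.
adv_loss = bce(d_logits_fake, torch.ones_like(d_logits_fake))
recon_loss = l1(fake, target)
g_loss = adv_loss + LAMBDA_L1 * recon_loss
```

Because the $L1$ term dominates numerically, it stabilizes early training while the adversarial term gradually sharpens high-frequency detail.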

View Analysis Notebook on GitHub