RetouchingFFHQ: A Large-scale Dataset for Fine-grained Face Retouching Detection

ACM MM 2023

Qichao Ying1, Jiaxin Liu1, Sheng Li1,★, Haisheng Xu2, Zhenxing Qian1,*, Xinpeng Zhang1
1School of Computer Science, Fudan University, Shanghai, China
2NVIDIA, Shanghai, China

Dataset Acquisition

Click "Application Form", fill in the necessary information, and send the PDF to fudanmaslab@gmail.com, with haoyuewang23@m.fudan.edu.cn and lisheng@fudan.edu.cn copied. Thank you!

Examples of RetouchingFFHQ, a fine-grained face retouching dataset containing over half a million images

Abstract

The widespread use of face retouching filters on short-video platforms has raised concerns about the authenticity of digital appearances and the impact of deceptive advertising. To address these issues, there is a pressing need to develop advanced face retouching detection techniques. However, the lack of large-scale and fine-grained face retouching datasets has been a major obstacle to progress in this field. In this paper, we introduce RetouchingFFHQ, a large-scale and fine-grained face retouching dataset that contains over half a million conditionally-retouched images. RetouchingFFHQ stands out from previous datasets due to its large scale, high quality, fine-grainedness, and customization. By including four typical types of face retouching operations and different retouching levels, we extend binary face retouching detection into a fine-grained, multi-retouching-type, and multi-retouching-level estimation problem. Additionally, we propose a Multi-granularity Attention Module (MAM) as a plugin for CNN backbones for enhanced cross-scale representation learning. Extensive experiments using different baselines as well as our proposed method on RetouchingFFHQ show decent performance on face retouching detection.
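The step from binary detection to multi-type, multi-level estimation can be illustrated with a small label sketch. Everything named here is an assumption for illustration: only eye-enlarging is named on this page, so the other three type names are placeholders, and the number of levels (0 for untouched, 1 to 3 for increasing strength) is hypothetical rather than taken from the dataset specification.

```python
# Illustrative label layout only; not the dataset's actual annotation format.
# "eye_enlarging" is mentioned in the text; the other names are placeholders.
RETOUCH_TYPES = ["eye_enlarging", "type_b", "type_c", "type_d"]
NUM_LEVELS = 4  # assumption: 0 = untouched, 1..3 = increasing strength


def to_binary_label(levels):
    """Collapse a fine-grained label into the classic retouched / not-retouched bit."""
    return int(any(level > 0 for level in levels.values()))


def per_type_accuracy(predictions, targets):
    """Fraction of images whose predicted level matches the target, per retouching type."""
    accuracy = {}
    for name in RETOUCH_TYPES:
        hits = sum(p[name] == t[name] for p, t in zip(predictions, targets))
        accuracy[name] = hits / len(targets)
    return accuracy


if __name__ == "__main__":
    target = {"eye_enlarging": 2, "type_b": 0, "type_c": 0, "type_d": 1}
    print(to_binary_label(target))
```

Under this layout a binary detector only sees the output of to_binary_label, while a fine-grained detector is scored separately on every retouching type.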

Method

Network Design of the proposed MAM

We investigate how humans make predictions by scrutinizing retouched images without reference to the original faces. Besides the geometric distortion or noise-level artifacts left by retouching algorithms, we find that another critical factor is the set of features that can be learned at multiple granularities. For instance, given an image with large eyes, a closer look at the eyes alone would easily lead to the conclusion that the image has undergone eye-enlarging. However, after further considering how much of the image the face occupies and the reasonable ratio between eyes and face, we would revise that judgment and regard the image as not eye-enlarged. Similar phenomena can be observed for other retouching types. In addition, there can be a non-negligible amount of spatial redundancy within the visual representations. For example, the background and skin regions can be reduced to two tokenized representations containing the averaged statistics of lighting condition and sharpness.

For spatial redundancy reduction, we propose an adaptive token clustering method. For enhanced multi-granularity representation learning, we employ a lightweight two-layer Transformer encoder to analyze and compare multi-granularity information for detection.
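As a rough illustration of the redundancy-reduction idea, the sketch below merges similar feature tokens into a few averaged tokens. It is not the paper's adaptive token clustering: it swaps in plain k-means with a fixed number of clusters and a farthest-point initialization, and the function names, the two-dimensional toy features, and the cluster count are all assumptions. The Transformer encoder that would consume the merged tokens is not shown.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(tokens, num_clusters):
    """Deterministic init: start at the first token, then repeatedly add the
    token that is farthest from all centers chosen so far."""
    centers = [list(tokens[0])]
    while len(centers) < num_clusters:
        farthest = max(
            tokens,
            key=lambda t: min(squared_distance(t, c) for c in centers),
        )
        centers.append(list(farthest))
    return centers


def cluster_tokens(tokens, num_clusters, num_iters=10):
    """Plain k-means (a stand-in, not the paper's method).

    Returns (centers, assignment): one averaged token per cluster and the
    cluster index of every input token.
    """
    centers = farthest_point_init(tokens, num_clusters)
    assignment = [0] * len(tokens)
    for _ in range(num_iters):
        # Assign each token to its nearest center.
        assignment = [
            min(range(num_clusters), key=lambda c: squared_distance(tok, centers[c]))
            for tok in tokens
        ]
        # Replace each center by the mean of its members (skip empty clusters).
        for c in range(num_clusters):
            members = [tok for tok, a in zip(tokens, assignment) if a == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers, assignment


if __name__ == "__main__":
    # Toy tokens: hypothetical (brightness, sharpness) statistics for
    # three background patches and three skin patches.
    toy_tokens = [
        [0.10, 0.90], [0.12, 0.88], [0.11, 0.91],
        [0.80, 0.20], [0.82, 0.18], [0.79, 0.22],
    ]
    merged, labels = cluster_tokens(toy_tokens, num_clusters=2)
    print(merged, labels)
```

In this toy setting six patch tokens collapse into two averaged tokens, which mirrors the background and skin example above; a downstream encoder would then attend over the shorter token sequence.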

BibTeX

        
@article{ying2023retouching,
  title={RetouchingFFHQ: A Large-scale Dataset for Fine-grained Face Retouching Detection},
  author={Ying, Qichao and Liu, Jiaxin and Li, Sheng and Xu, Haisheng and Qian, Zhenxing and Zhang, Xinpeng},
  journal={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}