What is Real Anymore? An AI/ML Image Dataset Using Authenticity Validation and Traceable Origins for Every Data Instance
Keywords: AI-generated, AI-generated image detection, image dataset, image classification
TL;DR: An image dataset of photographs validated as authentic, each paired with a hyper-realistic AI-generated counterpart, intended to support the detection of AI-generated images and help prevent the manipulation of and harm to innocent individuals across the globe.
Abstract: This project addresses the increasing challenge of
detecting AI-generated images by creating a novel dataset titled
“What Is Real Anymore?” (WIRA). WIRA comprises two
subsets: the first includes over 2000 images, validated as authentically
real by a set criterion and sourced from photographs on
Flickr. The second subset consists of hyper-realistic AI-generated
counterparts for each validated Flickr image, aggregated through
the Leonardo.AI commercial API. All Flickr-validated images in
WIRA are credited to their respective photographers and retain
their associated rights. Commercial use of this dataset requires
permission from the photographers or adherence to the copyright
laws of each validated Flickr image used. This document details
the rationale for image authentication, the image categories, the
motive for category selection, the authenticity validation criterion,
the methodology for creating the dataset, the computational
resources used, a review of the decision records for included and
excluded images, and potential enhancements to expand WIRA.
Submission Number: 14