Abstract
The increasing development and adoption of synthetic data raises critical concerns about the perpetuation of datafication logics. In examining some of synthetic data’s core promises, this dialogue paper aims to uncover the potential harm of further de-politicizing synthetic data. Synthetic data introduces technological opportunities that promise to meet a growing demand for the data needed to train AI models. Furthermore, models trained on synthetic data are praised as more precise and effective while being cheaper than collected data (Zewe 2022). With this dialogue paper, I aim to nuance the ways in which synthetic data complicates a critique directed at AI-driven technologies. I build my argument on two elements fundamental to the debate on the promises and perils of synthetic data. The first is the notion of data scarcity, often leveraged to argue for the implementation and further development of synthetic data to train bespoke models. The second is the concern about data pollution and contamination with synthetic data. Through these entry points, I argue that synthetic data re-ignites issues previously raised by scholars in the field of critical data and surveillance studies. Therefore, the aim of this dialogue paper is to call for a critical understanding of synthetic data as living information, much like collected data, and to account for synthetic data and the conditions of its generation in the context of simulated environments.
Keywords:
- synthetic data
- data scarcity
- data pollution
- Artificial Intelligence
- datafication
- data contamination