2021-12-06T19:48:46+01:00
Dec 6, 2021
In the following scientific article on the use of datasets in AI research, the authors found that there is an “increasing concentration on fewer and fewer datasets introduced by a few elite institutions”:
We find increasing concentration on fewer and fewer datasets within most task communities. Consistent with this finding, the majority of papers within most tasks use datasets that were originally created for other tasks, instead of ones explicitly created for their own task—even though most tasks have created more datasets than they have imported. Lastly, we find that these dominant datasets have been introduced by researchers at just a handful of elite institutions.
Found via this article shared on Hacker News.

Authors
Wouter Van Rossem is a researcher at the intersection of social science and computer science. He previously worked on the ERC-funded (European Research Council) project Processing Citizenship, where he investigated how data infrastructures for population processing co-produce citizens, Europe, and territory. He completed his PhD at the University of Twente in the Netherlands and is still working on publications stemming from this project. He also has a background as a software engineer, having worked at various companies and at the European Commission’s Joint Research Centre in Italy. This mix of theoretical and hands-on knowledge reflects his interest in the interconnections between technology and society.