In a scientific article on the use of datasets in AI research, the authors found an “increasing concentration on fewer and fewer datasets introduced by a few elite institutions”:

We find increasing concentration on fewer and fewer datasets within most task communities. Consistent with this finding, the majority of papers within most tasks use datasets that were originally created for other tasks, instead of ones explicitly created for their own task—even though most tasks have created more datasets than they have imported. Lastly, we find that these dominant datasets have been introduced by researchers at just a handful of elite institutions.

Found via this article shared on Hacker News.