The authors of the following scientific article on the use of datasets in AI research found an “increasing concentration on fewer and fewer datasets introduced by a few elite institutions”:
We find increasing concentration on fewer and fewer datasets within most task communities. Consistent with this finding, the majority of papers within most tasks use datasets that were originally created for other tasks, instead of ones explicitly created for their own task—even though most tasks have created more datasets than they have imported. Lastly, we find that these dominant datasets have been introduced by researchers at just a handful of elite institutions.
Found via this article shared on Hacker News.
Auguste Comte argues that observation presupposes theory¹:
The most important of these reasons arises from the necessity that always exists for some theory to which to refer our facts, combined with the clear impossibility that, at the outset of human knowledge, men could have formed theories out of the observation of facts. All good intellects have repeated, since Bacon’s time, that there can be no real knowledge but that which is based on observed facts. This is incontestable, in our present advanced stage; but, if we look back to the primitive stage of human knowledge, we shall see that it must have been otherwise then. If it is true that every theory must be based upon observed facts, it is equally true that facts cannot be observed without the guidance of some theory. Without such guidance, our facts would be desultory and fruitless; we could not retain them: for the most part we could not even perceive them.
An article in The Conversation on the first gene-edited baby argues that we need more responsible research: “Hard work is now needed by scientists, ethicists, policymakers and the public at large to figure out how to reverse this trend and return reproductive medicine to a path of responsible research and innovation.”