The Agnostic Structure of Data Science Methods

Fiche du document

Date

2021

Type de document
Périmètre
Langue
Identifiants
Collection

Archives ouvertes

Licence

info:eu-repo/semantics/OpenAccess




Citer ce document

Domenico Napoletani et al., « The Agnostic Structure of Data Science Methods », HAL-SHS : histoire, philosophie et sociologie des sciences et des techniques, ID : 10670/1.7facej


Métriques


Partage / Export

Résumé En

In this paper we argue that data science is a coherent approach to empirical problems that, in its most general form, does not build understanding about phenomena. We start by exploring the broad structure of mathematization methods in data science, organized around the belief that if enough and sufficiently diverse data are collected regarding a certain phenomenon, it is possible to answer all relevant questions about it. We call this belief ‘the microarray paradigm’ and the approach to empirical phenomena based on it `agnostic science’.Not all computational methods dealing with large data sets are properly within the domain of agnostic science, and we give an example of an algorithm, PageRank, that relies on large data processing, but such that the significance of its output are readily intelligible.Within the new type of mathematization at work in agnostic science, mathematical methods are not selected because of any particular relevance for a problem at hand. Rather, mathematical methods are applied to a specific problem only on the basis of their ability to reorganize the data for further analysis and the intrinsic richness of their mathematical structure. We refer to this type of mathematization as `forcing’.We then show that optimization methods are used in data science by forcing them on problems. This is particularly significant since virtually all methods of data science can be reinterpreted as types of optimization methods. In particular, we argue that deep learning neural networks are best understood within the context of forcing optimality.We finally explore the broader question of the appropriateness of data science methods in solving problems. We argue that this question should not be interpreted as a search for a correspondence between phenomena and specific solutions found by data science methods. Rather, it is the internal structure of data science methods that is open to forms of understanding. As an example, we offer an analysis of ensemble methods, where distinct data science methods are combined in the search for the solution of a problem and we speculate on the general structure of the data sets that are most appropriate for such methods.

document thumbnail

Par les mêmes auteurs

Sur les mêmes sujets

Sur les mêmes disciplines

Exporter en