Enabling Reproducibility in Computational and Data-enabled Science

Victoria Stodden
University of Illinois at Urbana-Champaign

As computation becomes central to scientific research and discovery, new questions arise regarding the implementation, dissemination, and evaluation of the computational and data-enabled methods that underlie scientific claims. Reproducibility in research can be interpreted most narrowly as a simple trace of the computational steps that generate scientific findings, and most expansively as an independent re-implementation of an experiment testing the same hypothesis. In this talk I present a new framework for conceptualizing the affordances that support scientific inference, including computational reproducibility, transparency, and generalizability of findings, illustrated by two recent sets of research results on reproducibility. The first evaluates the publication standards of the journal Science, which require data and code sharing, and finds a computational reproducibility rate of approximately 25%. The second evaluates reproducibility in computational physics experiments published without data and code sharing requirements. It finds that the main barrier to computational reproducibility is lack of access to artifacts (82%); among the articles that do provide artifacts, the barriers are inadequate documentation of code, data, and workflow information (70.9%), missing code function and setting information, and missing licensing information (75%). Finally, these innovations raise important questions regarding ethics and the incentives to engage in new research practices that support computational and data-enabled research.

Back to Workshop II: HPC and Data Science for Scientific Discovery