Scientific research is currently done at an unprecedented scale. This is due to a rapidly growing scientific community and to technological advances that allow more efficient data gathering. This pace, however, prioritizes speed and impact at the expense of accessibility and replicability, a trade-off at the center of the current reproducibility crisis in science. The biological sciences have not escaped this crisis, which affects the credibility of our community and the robustness of the scientific knowledge generated. Fortunately, a variety of tools have been developed to increase the transparency and accessibility of the data and associated analyses that support the conclusions of scientific studies. These tools include online data repositories, version control and continuous integration of data, dynamic reporting, literate programming and the use of research compendiums, to name a few. This online course aims to introduce students to good research and data management practices throughout the biological research process, practices that help guarantee the transparency, accessibility and reproducibility of scientific output. It places special emphasis on hands-on experience with free software and open access tools, particularly in R. The course aims to contribute to the scientific community by promoting high quality standards, and to the professional development of students and researchers by preparing them for a job market that is beginning to favor the ability to develop open, collaborative and reproducible science.
Familiarize students with tools and good practices to guarantee transparency, data accessibility and reproducibility in biological scientific research.
Raise awareness of the current problems with scientific reproducibility.
Gain experience with free computational tools that facilitate the documentation and accessibility of research (e.g., Rmarkdown, Git, GitHub, figshare).
Provide students with hands-on experience in data management practices that improve research reproducibility, through practical exercises and short individual projects.
Continuous integration is the ability to incorporate new data and code from any collaborator into a collaborative data analysis project (for example, adding new data or code to a local Git repository and uploading the changes to GitHub without major issues). It is closely related to version control. Dynamic reporting is the use of a markup language to generate human-readable reports from a data analysis project, reports that are automatically updated when new data or code are added. We will use Rmarkdown to generate these reports. Literate programming is closely related to dynamic reporting: it is the practice of documenting code thoroughly, weaving explanations together with the code, so that anyone can understand what each step does. A research compendium refers to a specific folder structure and file organization. It implies predefined names for folders (e.g., data, scripts, output) that should be familiar to other users, which facilitates reproducibility (and is also very helpful simply for keeping things organized). We will also talk about causal analysis (DAGs), Git/GitHub, research preregistration and online open science resources. In addition, we will read several papers on reproducibility (or its crisis) and ways forward.
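As a rough illustration of the research compendium idea, the sketch below creates a project folder with the predefined subfolders named above (data, scripts, output) plus a short README. It is written in Python only for illustration (the course itself works in R), and the function name `create_compendium`, the README content and the project name `my_project` are hypothetical choices, not part of the course material.

```python
from pathlib import Path
import tempfile

# Folder names taken from the examples in the text; any consistent,
# widely recognizable set of names would serve the same purpose.
COMPENDIUM_FOLDERS = ("data", "scripts", "output")


def create_compendium(root, folders=COMPENDIUM_FOLDERS):
    """Create a project folder with predefined subfolders and a README.

    Returns the sorted names of the entries created inside `root`.
    """
    root = Path(root)
    for name in folders:
        # exist_ok=True makes the function safe to run more than once
        (root / name).mkdir(parents=True, exist_ok=True)

    readme = root / "README.md"
    if not readme.exists():
        # Hypothetical README content: records the layout for other users
        readme.write_text(
            f"# {root.name}\n\nFolders: {', '.join(folders)}\n"
        )
    return sorted(p.name for p in root.iterdir())


if __name__ == "__main__":
    # Build the skeleton in a temporary directory so nothing is left behind
    with tempfile.TemporaryDirectory() as tmp:
        print(create_compendium(Path(tmp) / "my_project"))
```

Using the same folder names across projects is the point of the exercise: a collaborator who opens the project can guess where the raw data, the analysis scripts and the results live without reading any code.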
Previous experience in R.
December 7-18, 2020, Monday to Friday, from 9:00 am to 12:00 pm CST (Central Standard Time, UTC-6)
Additional scholarships may be available for students with demonstrated financial need. If you are interested in being considered for a partial scholarship, please make sure to include a request for a partial scholarship in your application. Successful applicants will be individually assessed to determine scholarship eligibility.
Please note seats are limited.
Marcelo Araya Salas, PhD.
Marcelo is an evolutionary behavioral ecologist deeply involved in the development of computational tools for bioacoustic analyses. He is the author of the R packages warbleR, baRulho and Rraven, which provide functions to streamline high-throughput acoustic analysis of animal sounds.