Data Science: Productivity Tools
About this Course
A typical data analysis project may involve several parts, each including several data files and different scripts with code. Keeping all this organized can be challenging. Part of our Professional Certificate Program in Data Science, this course explains how to use Unix/Linux as a tool for managing files and directories on your computer and how to keep the file system organized. You will be introduced to the version control system Git, a powerful tool for keeping track of changes in your scripts and reports. We also introduce you to GitHub and demonstrate how you can use this service to keep your work in a repository that facilitates collaboration. Finally, you will learn to write reports in R Markdown, which permits you to incorporate text and code into a document. We'll put it all together using the powerful integrated development environment RStudio.
Created by: Harvard University
Level: Introductory