Data Science: The Data Tidying Process with R

Authors
Roodbergen, Anna
Issue Date
2018
Type
Thesis
Language
en_US
Abstract
An enormous amount of data is generated every day, and all of it can be analyzed using concepts from data science. Data science is the science of extracting information from a data set to gain knowledge (Nongxa, 2017). Before a data set can be analyzed, it first has to be in a tidy format. A data set is considered tidy when (1) each observation is in its own row, (2) each variable is in its own column, and (3) each value is in its own cell (Wickham, 2014). This paper discusses the key concepts of the data tidying process and uses a case study to demonstrate how they can be applied. The R for Data Science textbook by Garrett Grolemund and Hadley Wickham, together with the RStudio program, was used to learn key data science and data tidying concepts.
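As an illustration of the three tidy-data rules, the following base-R sketch tidies a small hypothetical table. The data and column names (such as `y1999`) are invented for this example and are not taken from the thesis's case study. Storing year values as column names breaks rule (2), and a `cases/population` string packs two values into one cell, breaking rule (3). The sketch reshapes the table to one row per country and year, then splits the packed cell into two columns.

```r
# Hypothetical toy data (not from the thesis): the variable "year" is spread
# across column names, and each cell packs two values as "cases/population".
messy <- data.frame(
  country = c("A", "B"),
  y1999   = c("10/1000", "20/2000"),
  y2000   = c("15/1100", "25/2100"),
  stringsAsFactors = FALSE
)

# Rules (1) and (2): one row per (country, year) observation, with year in
# its own column. unlist() stacks the year columns one after another.
year_cols <- c("y1999", "y2000")
long <- data.frame(
  country = rep(messy$country, times = length(year_cols)),
  year    = rep(as.integer(sub("^y", "", year_cols)), each = nrow(messy)),
  rate    = unlist(messy[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Rule (3): split each "cases/population" string so every value gets its own cell.
parts <- strsplit(long$rate, "/", fixed = TRUE)
long$cases      <- as.integer(vapply(parts, `[`, character(1), 1))
long$population <- as.integer(vapply(parts, `[`, character(1), 2))

tidy <- long[c("country", "year", "cases", "population")]
print(tidy)
```

Base R is used here so the sketch runs without extra packages. In the tidyverse workflow taught in R for Data Science, the tidyr functions `gather()` (superseded by `pivot_longer()` in newer tidyr releases) and `separate()` perform the same two steps.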
Description
vi, 71 p.
Publisher
Kalamazoo College
License
U.S. copyright laws protect this material. Commercial use or distribution of this material is not permitted without prior written permission of the copyright holder.