How to Upload Data Set From Desktop Into R
Importing Data into R
A tutorial about data analysis using R
Dr Jon Yearsley (School of Biology and Environmental Science, UCD)
- Objectives
- Organise yourself!
- Data Workflow
- Format your data (tidy data)
- Data frames
- Importing spreadsheet data
- Summary of the topics covered
- Further Reading
How to Read this Tutorial
This tutorial is a mixture of R code chunks and explanations of the code. The R code chunks will appear in boxes.
Below is an example of a chunk of R code:

    # This is a chunk of R code. All text after a # symbol is a comment

    # Set working directory using setwd() function
    setwd('Enter the path to my working directory')

    # Clear all variables in R's memory
    rm(list=ls())   # Standard code to clear R's memory

Sometimes the output from running this R code will be displayed after the chunk of code. R output will be preceded by ##.
Here is a chunk of code followed by the R output:

    2 + 4   # Use R to add two numbers
    ## [1] 6

Objectives
The objectives of this tutorial are:
- Demonstrate good practice in data organisation
- Introduce plain text file formats for data
- Explain data import into R
Organise yourself!
Before you start importing data into R you should take time to organise your workspace on your computer:
- Create a folder on your computer to contain all your work for this particular project (e.g. a folder called DataModule)
- Inside this project folder create another folder called data. This will hold all the raw data files. These raw data files should not be changed.
- Within this project folder create a text file called MyFirstScript.R. You can use RStudio for this (use the File->New File->R Script menu option) or any basic text editor (e.g. Notepad, TextEdit, gedit, emacs). This file will be your R script that will contain all the commands for R. The .r or .R suffix is the standard suffix for an R script.
- If you are starting a big project consider creating separate folders for: R scripts, figures, output from the R script
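The same layout can also be created from within R instead of by hand. The following is a minimal sketch, not a required step: it builds the folders inside R's temporary directory so nothing on your computer is changed, and `project_dir` is a name chosen only for this illustration. In practice you would point it at your own location.

```r
# Hypothetical location for the project (a temporary folder, for illustration)
project_dir <- file.path(tempdir(), "DataModule")

# Create the project folder and the 'data' folder inside it
dir.create(file.path(project_dir, "data"),
           recursive = TRUE, showWarnings = FALSE)

# Create an empty R script inside the project folder
script_file <- file.path(project_dir, "MyFirstScript.R")
file.create(script_file)

# List what the project folder now contains
list.files(project_dir)
```

For a bigger project, further calls to `dir.create()` could add folders for scripts, figures and output in the same way.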
Your first R script
Now you have created the file MyFirstScript.R you should put some header text at the start of the file to explain what the R script will do. This was described in tutorial 1.
Video Tutorial: Creating a new R script with RStudio (1 min)
The text should have a short explanation of the R script followed by your name and the date you wrote the R script. Each line should start with a # so that the text is not interpreted by R (this text is for humans so they understand what the file is intended to do). Here is an example:

    # ********** Start of header **************
    # Title: <The title of your R script>
    #
    # Add a short description of the R script here.
    #
    # Author: <your name> (email address)
    # Date: <today's date>
    #
    # *********** End of header ****************

    # Two common commands at the start of an R script are:
    rm(list=ls())           # Clear R's memory
    setwd('~/DataModule')   # Set the working directory
    # Replace '~/DataModule' with the name of your own directory

    # ******************************************
    # Write your commands below.
    # Remember to use comments to explain your commands

Writing clear R scripts
An R script isn't simply telling the computer how to perform calculations on your data. It is also explaining your working to other human beings.
"Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do." – Donald E. Knuth
To make your R scripts usable by humans they must be clearly commented (using the # symbol to start a comment) and clearly organised.
As you write an R script consider these questions:
- Does your R script look well organised (e.g. is it well spaced, are lines indented logically)?
- Could someone else read the R script and understand the basic idea?
- Could someone else change your R script relatively easily?
- In a couple of months' time could you quickly read and edit your own R script?
Professional data analysts take clarity very seriously. Here are some links to R coding style guides:
- Google's style guide, https://google.github.io/styleguide/Rguide.xml
- Hadley Wickham's style guide, http://adv-r.had.co.nz/Style.html
- http://www.stat.ubc.ca/~jenny/STAT545A/block19_codeFormattingOrganization.html
- http://nicercode.github.io/blog/2013-04-05-why-nice-code/
Data Workflow
Below is a schematic of the workflow for handling data.
In this tutorial we will consider formatting data, in the next tutorial we'll discuss importing data, then we'll start to consider exploring the data using graphics and numerical summaries.
Format your data (tidy data)
The workflow starts long before you analyse your data. It starts even before you have your data in some computer software.
Organising your data should follow tidy data guidelines (see below) and be planned before you collect your data. The format of the data should be finalised before importing the data into R. It is often easiest to tidy your data using a spreadsheet program before you import the data into R.
Well organised data from the start will make your life a lot easier and your data import as painless as possible.
Six guidelines for tidy data
When tidying your data you should ensure that:
- each variable has its own column
- each row is an observation
- the top of each column contains the name of the variable
- there are no blank columns or blank rows between data
- all data in a column has the same type (e.g. it is all numerical data, or it is all text data)
- data are consistent (e.g. if a binary variable can have values 'Yes' or 'No' then only these two values are allowed, with no alternatives such as 'Y' and 'N')
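The last two guidelines can also be checked in R once the data are loaded. The sketch below uses a small invented data frame (`survey`, with made-up values and column names) to show one way of spotting inconsistent entries. It is an illustration under those assumptions, not a general validation tool.

```r
# Made-up data: 'infected' should only contain 'Yes' or 'No'
survey <- data.frame(id       = 1:5,
                     infected = c("Yes", "No", "Y", "No", "N"),
                     stringsAsFactors = FALSE)

# List the distinct values that actually occur in the column
unique(survey$infected)

# Find the rows whose value is not one of the allowed values
allowed  <- c("Yes", "No")
bad_rows <- which(!(survey$infected %in% allowed))
survey[bad_rows, ]

# Check the type of each column (all values in a column share one type)
sapply(survey, class)
```

Any rows reported in `bad_rows` point to entries worth correcting at the source before the analysis starts.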
PDF Summary: This PDF document reiterates the concept of tidy data
The link to the PDF is: http://www.ucd.ie/ecomodel/pdf/TidyData.pdf
Poorly vs well formatted data
The data set shown in the figure below is an example of poorly formatted data. The data set contains data on the lead concentrations (ppm) from three species of fish (whitefish, sucker and trout). Two types of sample were collected: samples from fillets of fish and from whole fish. The data has 3 variables: lead concentration, species of fish and type of fish sample.
How would you improve the format of the poorly formatted data shown in the figure? (Hint: use the six guidelines above)
The second figure shows some well formatted data that follows the tidy data guidelines: each column represents a single variable and each row an observation.
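As a rough illustration of what a tidy layout for the fish data looks like in R, the sketch below builds a small data frame with one column per variable. The concentration values and the column names (`lead_ppm`, `species`, `sample_type`) are invented for this example and are not the values from the figures.

```r
# Invented values: one row per measurement, one column per variable
fish <- data.frame(
  lead_ppm    = c(0.3, 0.5, 1.2, 0.9, 0.4, 0.7),
  species     = c("whitefish", "whitefish", "sucker",
                  "sucker", "trout", "trout"),
  sample_type = c("fillet", "whole", "fillet",
                  "whole", "fillet", "whole"),
  stringsAsFactors = FALSE
)

# Each column holds a single type of data
str(fish)

# Count the observations for each species and sample type
table(fish$species, fish$sample_type)
```

Because every row is a single observation, adding a new measurement only means adding a new row, never a new column.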
Data frames
A data frame is R's name for spreadsheet data (e.g. data organised in a grid, like Excel). R stores the vast majority of data as a data frame and uses data frames when analysing data.
A data frame forces the data to be well organised.
- Each column is a variable. The name of this variable becomes the name of the column.
- Each row corresponds to an observation. This means that values in the same row are data collected about the same object. Rows can also have names.
Below is an example of a data frame (called airquality) that contains information on the air quality in New York from May - September 1973 (this is a data set that is built in to R).

    # The airquality data is a built-in dataset
    # First 10 rows of the airquality data frame
    head(airquality, n=10)
    ##    Ozone Solar.R Wind Temp Month Day
    ## 1     41     190  7.4   67     5   1
    ## 2     36     118  8.0   72     5   2
    ## 3     12     149 12.6   74     5   3
    ## 4     18     313 11.5   62     5   4
    ## 5     NA      NA 14.3   56     5   5
    ## 6     28      NA 14.9   66     5   6
    ## 7     23     299  8.6   65     5   7
    ## 8     19      99 13.8   59     5   8
    ## 9      8      19 20.1   61     5   9
    ## 10    NA     194  8.6   69     5  10

You can type ?airquality to display the help file for this data set. The data frame has 153 rows (observations) and six columns (variables measured). The six columns contain information on: ozone concentrations (parts per billion), solar radiation, wind speed, air temperature, month and day of observation. You can see that each column has a name corresponding to the data for that column.
The structure of the data frame can be viewed using the str() function:

    # Display the structure of the airquality data frame
    str(airquality)
    ## 'data.frame':    153 obs. of  6 variables:
    ##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
    ##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
    ##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
    ##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
    ##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
    ##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

The str() function shows that this is a data frame with 153 observations (rows) and six variables (columns). It also shows the data types of the variables: Wind is a numerical variable (i.e. continuous) and the other variables are all integers (i.e. whole numbers).
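A few other base R functions give the same information piece by piece. The short sketch below applies them to the built-in airquality data frame; it is only a sample of what is available and uses nothing beyond base R.

```r
# Number of rows and columns
dim(airquality)            # 153 rows, 6 columns

# Names of the variables (columns)
names(airquality)

# Data type of each column
sapply(airquality, class)

# Extract one column by name (first five values of Wind)
airquality$Wind[1:5]

# Extract the first three rows of two chosen columns
airquality[1:3, c("Ozone", "Temp")]
```

The `$` and `[rows, columns]` forms shown here are the two most common ways to pull values out of a data frame.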
Tidy data in R is described in more detail on this web page: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
Tibbles
A recent development (circa 2016) is an improved data frame called a tibble. We will not discuss these new data frame objects here, but you can read about them at https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html.
Don't Panic! Tibbles are very similar to data frames.
The important point to know is that if you use RStudio's GUI to import data then your data will be stored in a tibble, not a data frame.
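If a plain data frame is preferred, a tibble can be converted with as.data.frame(). The sketch below assumes the tibble package may or may not be installed, so it only creates a tibble when the package is available. The object names `imported` and `plain` are chosen for this illustration.

```r
# Start from the built-in airquality data frame
imported <- airquality

# If the tibble package is installed, turn the data into a tibble,
# mimicking what an import through RStudio's GUI would produce
if (requireNamespace("tibble", quietly = TRUE)) {
  imported <- tibble::as_tibble(airquality)
}
class(imported)

# A tibble is still a data frame, so data frame functions work on it
is.data.frame(imported)

# Convert to a plain data frame
plain <- as.data.frame(imported)
class(plain)
```

The conversion keeps the same columns and values; only the class of the object changes.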
Importing spreadsheet data
To start working with data in R you need to import your data into R. You are aiming to have a data frame that contains your data.
The simplest way to import data into R is from a text file (https://en.wikipedia.org/wiki/Text_file). Text files (sometimes called flat files) can be read by any computer operating system and by many different statistical programs. Saving data as a simple text file makes your data highly portable.
Importing data from software-specific formats (e.g. Excel's .XLSX format, Minitab's .MTW format, SPSS's .SAV format or SAS's .sas7bdat format) is possible (e.g. using RStudio's Import Dataset GUI). However, if you want your data to be easily shared with other people then use a text file to store your data.
We advise you to:
- save your data as a text file (software, such as Excel, often has an option to save data as plain text)
- organise data with columns corresponding to different variables before exporting to the text file
- use a visible text character to delimit each column (usually a comma or a semi-colon). Using an invisible character (e.g. a space or a TAB) is not recommended because these characters all look the same at first glance.
General advice on importing data into R can be found at https://cran.r-project.org/doc/manuals/r-release/R-data.html
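To see what a comma-delimited text file looks like, a data frame can be written to one from R itself. This is a minimal sketch that writes three rows of the built-in airquality data to a temporary file (so no real file is touched) and then shows the raw lines of text. The object names are chosen for illustration.

```r
# Take a small piece of the built-in airquality data frame
small <- head(airquality, n = 3)

# Write it to a temporary CSV file, without the row names
out_file <- tempfile(fileext = ".csv")
write.csv(small, file = out_file, row.names = FALSE)

# Display the file as plain lines of text:
# the first line holds the column names, commas delimit the columns
readLines(out_file)

# Read the text file back into a data frame
back <- read.csv(out_file)
str(back)
```

The file produced this way can be opened by a spreadsheet, a text editor or another statistics package, which is the portability argument made above.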
Converting data to a CSV text file
A comma separated values file (CSV file) is the most common format for a text file that contains data.
Here are a few video tutorials on converting data into a CSV text file so that it is suitable for import into R.
Video Tutorial: Converting data from EXCEL to a CSV format (3 mins)
Video Tutorial: Converting data from Googlesheets to a CSV format (1 min)
Viewing text files
Before importing a text file into any software package it is a huge help if you can look at it in a text editor. Text files can contain characters that are normally invisible (e.g. spaces, tabs and end of line markers). If a text editor is going to be of use it must be able to display all the characters in a file.
Three text editors that can do this are:
- Notepad++ is a free program for Windows operating systems
- BBEdit is a free program for Mac OS X operating systems
- emacs is a GNU open-source program primarily for Linux operating systems
On Linux systems the cat -A command from the terminal is also useful.
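R can also reveal some invisible characters before you import a file. The sketch below writes a small made-up file (invented content, temporary location) and reads it back as raw lines of text. When R prints a character string, a TAB is displayed as \t, which makes it easy to tell apart from spaces.

```r
# A made-up text file: two lines use a TAB, one line uses spaces
txt_file <- tempfile(fileext = ".txt")
writeLines(c("site\tcount", "A\t12", "B  7"), txt_file)

# Read the file as raw lines of text (no interpretation of columns)
raw_lines <- readLines(txt_file)

# Printing the strings displays each TAB as \t
print(raw_lines)

# Test which lines contain a TAB character
has_tab <- grepl("\t", raw_lines, fixed = TRUE)
has_tab
```

A file where only some lines contain a TAB, as in this example, is a sign of inconsistent delimiters that should be fixed before import.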
Here are two video tutorials on this topic
Video Tutorial: Viewing data in a text file before importing into R (4 mins)
Video Tutorial: An overview of the common data text file formats (3 mins)
Data import examples
The data we'll exist importing are described at http://www.ucd.ie/ecomodel/Resources/datasets_WebVersion.html
The files are:
- WOLF.CSV: This file is a text file of comma separated values.
- HEIGHT.CSV: This file is a text file of comma separated values.
- INSECT.TXT: This file is a text file of TAB delimited values.
- BEEKEEPER.TXT: This file is a text file with blank space delimiting the values.
- MALIN_HEAD.TXT: This file is a text file with TAB delimited values.
All these data files are simple text files that differ in the character used to distinguish columns of data.
Comma delimited files (CSV files)
CSV stands for comma separated values (note that sometimes semi-colons are used in place of commas because some countries use the comma in place of the decimal point).
The read.table() function is a flexible function for importing text data.
Video Tutorial: Importing a CSV file into R using read.table() (5 mins)

    # Import WOLF.CSV file using read.table function
    wolf = read.table('WOLF.CSV', header=TRUE, sep=',')

The wolf variable contains the imported data. It is called a data frame.
The ideal arrangement of a data frame is for each row to be an observation of some object and each column a variable that measures some property of the object. For example, each row of wolf is an observation of one individual wolf and each column of wolf gives information about where the wolf was observed and the data collected from its hair sample.
The HEIGHT.CSV file also contains comma separated values. Here is the read.table() command to read in this file:

    # Import HEIGHT.CSV file using read.table function
    human = read.table('HEIGHT.CSV', header=TRUE, sep=',')

Note: The function read.csv() is a special case of the read.table() function.
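To illustrate this relationship, the sketch below reads the same small file with both functions. read.csv() uses header=TRUE and sep=',' as defaults (among a few other defaults), so for a simple file the two results should match. The second part shows the semi-colon variant with decimal commas, using read.csv2(). The files and their contents are invented for this example.

```r
# Made-up comma separated file
csv_file <- tempfile(fileext = ".csv")
writeLines(c("name,height", "Ann,1.62", "Bob,1.80"), csv_file)

# The same import written two ways
via_table <- read.table(csv_file, header = TRUE, sep = ",")
via_csv   <- read.csv(csv_file)
all.equal(via_table, via_csv)

# Made-up file using semi-colons as delimiter and commas as decimal mark
csv2_file <- tempfile(fileext = ".csv")
writeLines(c("name;height", "Ann;1,62", "Bob;1,80"), csv2_file)

# read.csv2() assumes sep=';' and dec=','
via_csv2   <- read.csv2(csv2_file)
# The equivalent read.table() call states both arguments explicitly
via_table2 <- read.table(csv2_file, header = TRUE, sep = ";", dec = ",")
via_csv2$height
```

In both cases the height column is imported as numbers, not text, because the decimal mark has been declared correctly.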
Use the R help pages to learn more about these functions:

    ?read.table   # Display help page on read.table function

TAB delimited files (TXT files)
The INSECT.TXT data set is a text file where variables are delimited by a TAB. In addition the first 3 lines contain a data description that we do not want to import.
The read.table() function can be used to import this file. The argument skip=3 is used to ignore the first three lines. The argument sep='\t' specifies a TAB as the variable delimiter (the value must be the single TAB character, with no spaces around it):

    # Import INSECT.TXT file using read.table function (TAB delimited)
    # skipping the first 3 lines (skip=3)
    insect = read.table('INSECT.TXT', header=TRUE, skip=3, sep='\t')

The MALIN_HEAD.TXT file also contains TAB delimited data. Here is the read.table() command to read in this file:

    # Import MALIN_HEAD.TXT file using read.table function (TAB delimited)
    rainfall = read.table('MALIN_HEAD.TXT', header=TRUE, sep='\t')

Blank space delimited files
The BEEKEEPER.TXT data set uses white space to delimit the variables. The first six lines of the file contain a description of the data.
Using read.table() with the argument sep='' will interpret any white space (one or more spaces or TABs) as a variable delimiter.

    # Import BEEKEEPER.TXT file using read.table function (white space delimited)
    # skipping the first 6 lines (skip=6)
    bees = read.table('BEEKEEPER.TXT', header=TRUE, skip=6, sep='')

Summary of import commands
| Type of text file | R Command |
|---|---|
| Comma delimited (.CSV) | read.table(<filename>, header=TRUE, sep=',') |
| TAB delimited (.TXT) | read.table(<filename>, header=TRUE, sep='\t') |
| Blank space (.TXT) | read.table(<filename>, header=TRUE, sep='') |
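The three commands in the table can be tried without the tutorial's data files. The sketch below writes the same made-up data (invented site names and counts) to three temporary files, one per delimiter, and imports each one. The TAB delimited file also starts with two description lines, to show the skip argument.

```r
# The same invented data written with three different delimiters
csv_lines   <- c("site,count", "A,12", "B,7", "C,30")
tab_lines   <- c("Description line 1", "Description line 2",
                 "site\tcount", "A\t12", "B\t7", "C\t30")
space_lines <- c("site   count", "A 12", "B     7", "C  30")

csv_file   <- tempfile(fileext = ".csv")
tab_file   <- tempfile(fileext = ".txt")
space_file <- tempfile(fileext = ".txt")
writeLines(csv_lines, csv_file)
writeLines(tab_lines, tab_file)
writeLines(space_lines, space_file)

# Comma delimited
from_csv <- read.table(csv_file, header = TRUE, sep = ",")

# TAB delimited, skipping the two description lines
from_tab <- read.table(tab_file, header = TRUE, sep = "\t", skip = 2)

# White space delimited (any run of spaces or TABs)
from_space <- read.table(space_file, header = TRUE, sep = "")

# All three imports should give the same data frame
all.equal(from_csv, from_tab)
all.equal(from_csv, from_space)
```

Running str() on each result is a quick way to confirm that the delimiter was interpreted as intended: a wrong sep value typically produces a single column containing whole lines of text.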

    # Comma separated values
    wolf = read.table('WOLF.CSV', header=TRUE, sep=',')
    human = read.table('HEIGHT.CSV', header=TRUE, sep=',')

    # TAB delimited values
    insect = read.table('INSECT.TXT', header=TRUE, skip=3, sep='\t')
    rainfall = read.table('MALIN_HEAD.TXT', header=TRUE, sep='\t')

    # White space delimited values
    bees = read.table('BEEKEEPER.TXT', header=TRUE, skip=6, sep='')

Importing data using RStudio
RStudio has its own data import functionality. To use this you will need to install the R package readr. For more information about this see RStudio's guide: https://support.rstudio.com/hc/en-us/articles/218611977-Importing-Data-with-RStudio
Video Tutorial: Importing a CSV file into R using RStudio's GUI (3 mins 13 secs)
Importing data using RStudio will save the data as a modified data frame, called a tibble (tibbles are briefly discussed above).
Importing using fread()
fread() is a powerful data import function that is similar to read.table() but faster. It is part of the data.table package, which you will need to install.
You should only have to give fread() the name of the file you want to import, and fread() will try to work out the appropriate way to import the data. Try some examples and compare them with the examples above.

    # ******************************************
    # Other packages for importing data --------

    # The data.table package
    library(data.table)   # Load the data.table package

    # Import a CSV file
    wolf2 = fread('WOLF.CSV')
    human2 = fread('HEIGHT.CSV')

    # Import TAB delimited file
    insect2 = fread('INSECT.TXT')
    rainfall2 = fread('MALIN_HEAD.TXT')

    # Import white space delimited file
    bees2 = fread('BEEKEEPER.TXT')

The fread() command is simpler to use because it tries to guess the format of the data in the file.
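One difference worth knowing is that fread() returns a data.table object, which is a data frame with extra features. The sketch below compares the two import routes on a small made-up file. It assumes the data.table package may not be installed, so the fread() part only runs when the package is available.

```r
# Made-up comma separated file
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,mass", "1,31.5", "2,28.0"), csv_file)

# Import with base R
base_df <- read.table(csv_file, header = TRUE, sep = ",")
class(base_df)

# Import with fread(), if the data.table package is installed
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::fread(csv_file)   # no header or sep arguments needed
  print(class(dt))                    # a data.table is also a data frame

  # Convert to a plain data frame and compare with the base R import
  fast_df <- as.data.frame(dt)
  print(all.equal(base_df, fast_df))
}
```

Converting with as.data.frame() is optional: most functions that accept a data frame will also accept a data.table.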
Summary of the topics covered
- Organizing your files on your computer
- Best practice for formatting data
- Reading in spreadsheet information
- Data frames
Further Reading
All these books can be found in UCD's library
- Andrew P. Beckerman and Owen L. Petchey, 2012 Getting Started with R: An introduction for biologists (Oxford University Press, Oxford) [Chapter 2, 3]
- Mark Gardener, 2012 Statistics for Ecologists Using R and Excel (Pelagic, Exeter)
- Michael J. Crawley, 2015 Statistics : an introduction using R (John Wiley & Sons, Chichester) [Chapter 2]
- Tenko Raykov and George A Marcoulides, 2013 Basic statistics: an introduction with R (Rowman and Littlefield, Plymouth)
Source: https://www.ucd.ie/ecomodel/Resources/Sheet2a_data_import_WebVersion.html