Goals of this lab.

At the end of the in-class portion of the lab you will be asked to apply what you have learned to a new data set. Your solutions will be written in an R Markdown document that will contain your code and the output of your code.

We begin with the fundamental building blocks of R and RStudio: the interface, reading in data, and some basic commands.

The RStudio interface


Initially there are three panes in the RStudio interface. The pane in the upper right contains your Environment workspace as well as a History of the commands that you’ve previously entered. When you import data or define a variable they will appear in your Environment.

Any Plots that you generate or files that you upload will show up in the pane in the lower right corner. The lower right also contains a list of installed Packages that you can click on to put in your working library. R packages have complex dependency relationships, but often if you need a package installed then R will ask if you want to install it.

The pane on the left is where the action happens. It’s called the Console. Every time you launch RStudio, it will have the same text at the top of the console telling you the version of R that you’re running. Below that information is the prompt. As its name suggests, this prompt is really a request, a request for a command. Initially, interacting with R is all about typing commands and interpreting the output. These commands and their syntax have evolved over decades (literally) and now provide what many users feel is a fairly natural way to access data and organize, describe, and invoke statistical computations.

To get you started, enter the following command at the R prompt (i.e. right after > on the console). You can either type the command in manually or copy and paste it from this document.

install.packages("readr")

This command installs the ‘readr’ package so that it is available for us to use. As the name suggests, the ‘readr’ package is used to read a data file into R.

Next use the same ‘install.packages( )’ command to install three more packages: the ‘tidyverse’ package, which contains tools for creating graphics and manipulating data; the ‘rmarkdown’ package, which will be used to make the lab reports that you will turn in; and the ‘openintro’ package, which contains a template that you can use for your lab reports.

The installation step above only needs to be done once. After installing these four packages they will be listed in the lower right pane under Packages. To load the packages for use, you can put a checkmark next to each of the four or, alternatively, you can enter the library( ) commands shown below. (From here on you can copy and paste the commands.)

#copy and paste these commands into the console
library(readr)
library(tidyverse)
library(rmarkdown)
library(openintro)
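If you are not sure whether the installation step has already been done on your computer, a quick check along the following lines can help. This is only a sketch: the names `lab_packages` and `already_installed` are made up for this illustration, and the install line is left commented out so that nothing is installed by accident.

```r
# The four packages used in this lab
lab_packages <- c("readr", "tidyverse", "rmarkdown", "openintro")

# TRUE for each package that is already installed on this computer
already_installed <- lab_packages %in% rownames(installed.packages())
names(already_installed) <- lab_packages
already_installed

# Install only the packages that are missing (remove the # to run this line)
# install.packages(lab_packages[!already_installed])
```

Any package marked FALSE still needs to be installed before its library( ) command will work.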

R as a calculator

Before doing anything fancy, notice that R can be used as a calculator.

#enter these commands into R, run the code, then alter it slightly to see what happens.
x=5
y=7
z= x^2+3*y-10
z
## [1] 36
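One way to "alter it slightly" is to add parentheses. The sketch below repeats the calculation above and then changes the order of operations; the name `w` is made up for this illustration.

```r
x = 5
y = 7

# Exponentiation is done first, then multiplication, then addition and subtraction
z = x^2 + 3*y - 10
z

# Parentheses change the order of operations, and therefore the answer
w = (x^2 + 3) * (y - 10)
w
```

Here z is 25 + 21 - 10 = 36 as before, while w is 28 times -3, which is -84.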

R can also be used as a calculator on data. Before trying the calculations below, draw a dot plot of the data 2, 3, 4, 5, 6 and calculate the mean (recall that the mean is the balance point of the data).

data_1 = c(2, 3, 4, 5, 6)
data_2 = 2*data_1
mean_1 = mean(data_1)
SD_1 = sd(data_1)
mean_1
## [1] 4
SD_1
## [1] 1.581139
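The commands above create data_2 but never use it. As a small follow-up sketch, the code below computes the mean and standard deviation of data_2 and compares them with those of data_1; the names `mean_2` and `SD_2` are introduced here for illustration.

```r
data_1 = c(2, 3, 4, 5, 6)
data_2 = 2*data_1        # every value doubled: 4, 6, 8, 10, 12

mean_2 = mean(data_2)
SD_2 = sd(data_2)
mean_2
SD_2

# Compare with the original data: how many times larger is each one?
mean_2 / mean(data_1)
SD_2 / sd(data_1)
```

Doubling every data value doubles both the mean and the standard deviation, so both ratios come out to 2.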

Reading in data from a website.

Next let’s read in a data set from our textbook. The data is available at http://www.isi-stats.com/isi/data/ under Chapter 2, College Midwest. The observational units of the data are students from a college in the Midwest. After reading the data you should see a data set named ‘CollegeMidwest’ in the Environment pane, containing 2919 observational units and 2 variables.

CollegeMidwest <- read_table("http://www.isi-stats.com/isi/data/chap3/CollegeMidwest.txt")
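To see how a text file like this one turns into rows and columns, here is a small sketch that does not need an internet connection. It uses base R's read.table( ) rather than the readr command above, and it reads six lines typed in by hand. Those six lines are the first six rows of CollegeMidwest as printed by head( ) later in this lab. The names `sample_text` and `sample_rows` are made up for this illustration.

```r
# A few lines in the same layout as the data file:
# a header line, then one student per line, values separated by spaces
sample_text <- "OnCampus CumGpa
N 2.92
N 3.59
N 3.36
N 2.47
N 3.46
Y 2.98"

# header = TRUE tells R that the first line holds the variable names
sample_rows <- read.table(text = sample_text, header = TRUE)

dim(sample_rows)   # number of rows (observational units), then number of variables
str(sample_rows)   # the type of each variable
```

The same check, dim(CollegeMidwest), can be used on the full data set to confirm that it has 2919 rows and 2 variables.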

Basic commands to view the data

The following commands can be used to view parts of the data: glimpse, head, and tail.

glimpse(CollegeMidwest)
## Rows: 2,919
## Columns: 2
## $ OnCampus <chr> "N", "N", "N", "N", "N", "Y", "Y", "Y", "N", "Y", "Y", "Y", "…
## $ CumGpa   <dbl> 2.92, 3.59, 3.36, 2.47, 3.46, 2.98, 3.07, 3.79, 3.21, 3.67, 3…

To look at the first six rows of the data use the ‘head( )’ command.

head(CollegeMidwest)
## # A tibble: 6 × 2
##   OnCampus CumGpa
##   <chr>     <dbl>
## 1 N          2.92
## 2 N          3.59
## 3 N          3.36
## 4 N          2.47
## 5 N          3.46
## 6 Y          2.98

To look at the last six rows of the data use the ‘tail( )’ command.

tail(CollegeMidwest)
## # A tibble: 6 × 2
##   OnCampus CumGpa
##   <chr>     <dbl>
## 1 Y          3.09
## 2 Y          2.8 
## 3 Y          4   
## 4 N          3.35
## 5 Y          3.33
## 6 Y          2.99

You can also view the data by clicking on CollegeMidwest in the Environment pane.

Data Calculations

The commands below extract the GPA data from CollegeMidwest and print a summary of it. Try them. Can you calculate the mean GPA? The standard deviation of the GPAs? Try it!

GPA <- CollegeMidwest$CumGpa
summary(GPA)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   3.010   3.410   3.288   3.700   4.000
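As a hint for the questions above, the mean( ) and sd( ) commands from the calculator section work on any numeric vector. The sketch below applies them to just the first six GPAs printed by head( ) earlier, so that it runs without the full data set; the names `first_six_gpa`, `gpa_mean`, and `gpa_sd` are made up for this illustration.

```r
# The first six GPAs from the head( ) output above
first_six_gpa <- c(2.92, 3.59, 3.36, 2.47, 3.46, 2.98)

gpa_mean <- mean(first_six_gpa)   # balance point of these six values
gpa_sd <- sd(first_six_gpa)       # typical distance from the mean
gpa_mean
gpa_sd

# With the full data set loaded, the same commands apply to GPA:
# mean(GPA)
# sd(GPA)
```

The mean of these six values is 3.13. The mean of all 2919 GPAs is the value already shown in the Mean column of the summary output.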

Data tables and graphs

We can create a dot plot of the GPA data.
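The command used for the dot plot is not shown in this handout, so the following is only one possible way to draw it, using the base R stripchart( ) command. To keep the sketch self-contained it plots the first six GPAs printed by head( ) earlier; the name `first_six_gpa` is made up for this illustration.

```r
# The first six GPAs from the head( ) output above
first_six_gpa <- c(2.92, 3.59, 3.36, 2.47, 3.46, 2.98)

# method = "stack" piles repeated values on top of each other,
# pch = 19 draws each observation as a solid dot
stripchart(first_six_gpa, method = "stack", pch = 19,
           xlab = "Cumulative GPA", main = "Dot Plot of GPA")

# With the full data set loaded, replace first_six_gpa with GPA:
# stripchart(GPA, method = "stack", pch = 19, xlab = "Cumulative GPA")
```

The plot appears in the Plots tab of the lower right pane.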

Next we create a table of counts from the OnCampus variable in CollegeMidwest. The output consists of the number of students reporting no, they don’t live on campus and the number reporting yes, they do live on campus.

table_OC <- table(CollegeMidwest$OnCampus)
table_OC
## 
##    N    Y 
##  654 2265
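Counts are often easier to interpret as proportions. As a sketch, the code below rebuilds the table from the two counts printed above, so it runs without the data set, and converts it to proportions with prop.table( ). The names `counts_OC` and `prop_OC` are made up for this illustration.

```r
# Rebuild the table of counts from the output above
counts_OC <- as.table(c(N = 654, Y = 2265))

# Divide each count by the total number of students (654 + 2265 = 2919)
prop_OC <- prop.table(counts_OC)
round(prop_OC, 3)

# With the data loaded, the same command works on the table itself:
# prop.table(table_OC)
```

About 78% of the students in the data set report living on campus.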

From this table we can create a bar chart of the categorical OnCampus variable using the following command. Notice that this command uses the table we created above.

barplot(table_OC, main="Campus Housing Distribution")

An alternative collection of commands to create a bar chart is shown below. These commands use the ggplot2 package, which is loaded as part of the tidyverse package that we installed at the beginning of the lab. The command starts with ‘ggplot’; the ‘gg’ is an abbreviation of grammar of graphics. Inside the ggplot function is aes( ), which maps variables in the data to the aesthetics of our graphic (visual properties such as position and fill color). Next comes geom_bar, which tells R that we want to create a bar chart. And last is ggtitle, which adds the title to the graph. The plus operator, ‘+’, is used to put the pieces together, ggplot(…) + geom_bar( ) + ggtitle( ), to create the final graphic.

ggplot(data=CollegeMidwest, mapping=aes(OnCampus, fill=OnCampus)) +
  geom_bar() +
  ggtitle("Bar Chart of Housing on Campus")

Have questions about ggplot? Try the command below.

?ggplot

This next part of the lab contains exercises for you to complete and turn in using an R Markdown file. To open a new R Markdown file, use the New File button in the upper left-hand corner of the screen. Click the blank page icon with the +, and from the dropdown menu select R Markdown, select From Template, select LabReport {openintro}, and name your document FirstnameLastname-Lab1.Rmd

NationalAnthemTimes <- read_table("http://www.isi-stats.com/isi/data/prelim/NationalAnthemTimes.txt")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Year = col_double(),
##   Genre = col_character(),
##   Sex = col_character(),
##   Time = col_double()
## )
table_genre <- table(NationalAnthemTimes$Genre)
barplot(table_genre, main="National Anthem Singers by Genre, 1980-2019")

ggplot(NationalAnthemTimes, aes(Genre, fill=Genre))+geom_bar()+ggtitle("1980-2019 National Anthem Singers by Genre")