Software for Science & Math (part II): Getting started with R

Saturday, June 12, 2010 at 3:26 PM
A while back I wrote the first post in a series where I'll cover important concepts from Calculus, Probability and Statistics that (IMO) everyone should be familiar with. I wanted to occasionally involve two free software platforms (R and Maxima) in those posts, and I've finally gotten around to a post dedicated to getting started with R.

R is a handy computing platform and a great way to learn basic programming skills. It can do basic statistics, plot data or mathematical functions, and provide access to a menagerie of advanced tools via R packages. And it's all free. R's broad functionality and statistical capabilities make familiarity with R a valuable skill in the natural sciences.

Getting Started with R

If you haven't already installed R on your computer, you should check out this website on downloading and installing R, or you can just pick your nearest CRAN mirror (e.g. at UCLA, NCI in Maryland, etc.) and download and install the appropriate version per their instructions.  If the install isn't working, feel free to post questions in the comments below.

Basic Interactive Examples

Assuming you have R installed, here are a few quick examples to give you a feel for R.  I'm throwing a lot into these examples, so consider the code a nice goal to try and wrap your head around.  Reading help files and then changing or otherwise tinkering with the code is a great way to learn the basics.

Also take a minute or two to note the many resources available on the web to help you learn R.  No need to read them all; just know that they're there.  There's lots of documentation on the R Project's R Manuals page, though see the other pages listed under Documentation as well. You can also find tutorials elsewhere on the web: on course websites, help forums, and other blogs.  For example, if you wanted to know more about the histogram function hist(), google "hist [R]" or "histogram hist R" etc. until you find what you're looking for.
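
R's built-in help covers much of the same ground as a web search. As a rough sketch, the snippet below looks up hist() from the console and then tries it on made-up numbers. The variable names and the random data are only there for illustration.

```r
# Open documentation from the console (only useful in an interactive session)
if (interactive()) {
  ?hist                      # help page for hist()
  help.search("histogram")   # search the installed help pages; ??histogram is shorthand
}

# Peek at the argument names hist() accepts for ordinary numeric data
arg_names <- names(formals(hist.default))
print(arg_names)

# Try the function on made-up data (200 random draws, not real measurements)
set.seed(1)
fake_data <- rnorm(200)
h <- hist(fake_data, breaks = 10, plot = FALSE)  # plot=FALSE returns the bin counts without drawing
print(h$counts)
```

Dropping plot=FALSE draws the histogram instead of just returning the counts.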

Example #1: Importing and Plotting Data

First, download these data (as a CSV file) from this google docs folder.  Here I've taken measles incidence data from the WHO [Source] and put them into a spreadsheet format for easy import into R.

Here's the code to import and plot the data.  You'll first need to change the working directory in R to wherever you saved the .csv file.  Then, you should be able to just paste the code into the console.
#  <- This means that this line is just a comment.  R will ignore it.
# First, we read in the data
Measles = read.csv("WHOs_US_DRC_measles_reports.csv");
names(Measles);  # View the column names ...
head(Measles);   # and the first few rows of the data

# Next, we plot the data and create a legend.
?plot  # ? gets the help page for the function
plot(Measles$Year, Measles$Measles.DRCongo, type="b", pch=20, 
     main="Reported Measles Cases", xlab="Year", ylab="New Cases", col="darkred");
points(Measles$Year, Measles$Measles.US, type="b", pch=20, col="darkblue")
legend("topleft", legend=c("D.R.Congo","U.S."), pch=20, lty=1,
       col=c("darkred","darkblue"))
# How would you access the help page for the legend() function?
There's a lot in there, so how does one make any sense of it?  First, read up on how functions work in R, then read the help pages for each function in the code.  These are read.csv(), names(), head(), etc.  What arguments do they take? What values do they return?  Finally, do a little applied science and experiment with the code. Can you change the x- and y-axis labels?  Change the line type? What does pch=20 mean and what happens when you change it?
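
If you'd like a starting point for that tinkering, here is one possible sketch. It uses made-up numbers (not the WHO data) so it runs without the CSV file, and the particular labels, symbol, and line type are arbitrary choices to experiment with.

```r
# Made-up numbers, purely for experimenting with plot options
year  <- 2000:2009
cases <- c(12, 30, 25, 60, 44, 80, 72, 95, 90, 120)

# Same kind of plot() call as in the measles example, with different labels,
# a different symbol (pch=17, a filled triangle) and a dashed line (lty=2)
plot(year, cases, type = "b", pch = 17, lty = 2,
     xlab = "Calendar year", ylab = "Reported cases (made up)",
     main = "Trying out pch and lty", col = "darkgreen")

# Draw plotting symbols 0 through 25 side by side to see what each pch value gives
pch_values <- 0:25
plot(pch_values, rep(1, length(pch_values)), pch = pch_values,
     yaxt = "n", xlab = "pch value", ylab = "", main = "Plotting symbols")
```

See ?points for the full list of plotting symbols and ?par for the line types.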

Also, open up the spreadsheet in some other software and see what the data look like.  Do the plots make sense?  Do you have some other data (e.g. your credit card payments over the past year) you can enter into a CSV spreadsheet and plot?
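
To illustrate plotting your own data, here is a minimal sketch. The payment amounts are invented, and the file name my_payments.csv and the data frame names are hypothetical. To keep the example self-contained it writes the CSV file from R into a temporary folder; in practice you would type the numbers into a spreadsheet and save it as CSV.

```r
# Hypothetical monthly payments (invented numbers, not real data)
payments <- data.frame(Month = 1:12,
                       Payment = c(250, 310, 180, 420, 390, 275,
                                   300, 510, 260, 330, 295, 640))

# Write the data to a CSV file in a temporary folder, then read it back in
csv_file <- file.path(tempdir(), "my_payments.csv")
write.csv(payments, csv_file, row.names = FALSE)
payments_in <- read.csv(csv_file)

getwd()            # shows the current working directory; setwd("some/folder") changes it
head(payments_in)  # first few rows, to check that the import looks right

# Plot payments by month, in the same style as the measles plot
plot(payments_in$Month, payments_in$Payment, type = "b", pch = 20,
     xlab = "Month", ylab = "Payment", main = "Payments over one year",
     col = "darkred")
```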

Example #2: Plotting curves (x vs. y=f(x))

Next, it's often nice to plot lines or curves when all you have are the equations, and R can do that for you in two ways: by plotting a function of the form y=f(x) over a range of x values, or by interpolating a series of points specified as (x,y) coordinates.  In the first case, we need only specify the function, whereas in the second case we must define a vector of x values and a corresponding vector of y values, then let R connect the dots (as in the example above).  Here we'll use both approaches to plot two functions...

Suppose you forgot the relationship between sin(x) and cos(x).  At x=0, which function equals 1 and which equals 0?  One is a shift of the other, but is it shifted by π or π/2?  All of these questions can be answered by quickly plotting the two functions...
## Plot sin(x) and cos(x) two different ways.
## First, we'll let R do all the work for us using the curve() function
curve(sin(x), from=0, to=2*pi, col="blue", ylab="");
## Here curve() treats sin(x) as a function and does the plotting.
## Next, compute 100 x and y values for y=cos(x) and make the plot ourselves.
x=seq(0,2*pi, length=100); # see ?seq for details
y=cos(x);                  # see ?cos for details
points(x,y,type="l", col="red");
legend("bottomleft", legend=c("sin(x)","cos(x)"), col=c("blue","red"), lty=1)
abline(h=0, lty=2); # Let's draw in the horizontal line at y=0. See ?abline for details.
R has many such built-in functions including exp(), log(), inverse trigonometric functions, and probably any other standard mathematical functions you can think of.  The curve() function is handy for looking at otherwise difficult to imagine functions (e.g. if you wanted to know what sin(x)/(2+cos(x)*exp(-x)) + 1 looks like). As long as the first argument to curve() is an expression written as a function of x, it'll plot it for you.
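
As a quick sketch of that last point, here is the expression mentioned above plotted with curve(). The plotting range (0 to 10), the colors, and the function name f are my own choices for illustration.

```r
# Plot the expression directly; curve() evaluates it on a grid of n x values
pts <- curve(sin(x)/(2 + cos(x)*exp(-x)) + 1, from = 0, to = 10, n = 500,
             col = "purple", ylab = "f(x)")

# The same curve, this time via a named function (f is just an illustrative name)
f <- function(x) sin(x)/(2 + cos(x)*exp(-x)) + 1
curve(f, from = 0, to = 10, n = 500, col = "purple", ylab = "f(x)")

# Overlay two built-in functions on one plot using add=TRUE
curve(exp(-x), from = 0, to = 5, ylim = c(0, 1.5), col = "blue", ylab = "")
curve(atan(x), from = 0, to = 5, col = "red", add = TRUE)
legend("right", legend = c("exp(-x)", "atan(x)"), col = c("blue", "red"), lty = 1)
```

curve() also invisibly returns the x and y values it plotted (stored in pts here), which is handy if you want to reuse the points elsewhere.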

I hope that's enough to get you going with R for now.  In part III, we'll do a few more examples using some useful R packages, and I'll point out a few more resources you might be interested in.  Other requests are always welcome - just drop me an email or post a comment below.


Posted by: Mike Mike | 7/03/2010 12:29 AM

Thanks for this! I've been trying to learn how to run R for a while (I have very little programming experience), and this sort of tutorial is very helpful. Hope to see more!

Posted by: Paul | 7/03/2010 2:08 PM

Glad you found it useful! Hopefully I can find time to get around to part III sometime soon ;)
