R for statistical analysis in Political and Social Sciences - Basic Notes
For any question related to R, first look at the main page of the R Project, http://www.r-project.org/. It is the main starting point for documentation and help.
An important section of the project website is CRAN (the Comprehensive R Archive Network). From CRAN you can get the source code of the program and its extensions, which are distributed as packages. Select a mirror near you.
R's documentation is excellent. From the R command line, several help options are available:
?command                     # opens the help page of a specific 'command'
help.search("text-to-find")  # lists the help topics related to "text-to-find"
RSiteSearch("text")          # opens a browser and searches R-related sites for "text"
If that does not solve the problem, there are several more places to look:
- Project documentation: "An Introduction to R", the most basic manual. It has a very good section with a simple sample session.
- External documentation related to the project: "Contributed Docs". These are generally extensive documents that cover different aspects of R, in several languages. I especially recommend "R for Beginners" by Emmanuel Paradis.
- The "R-help" mailing list archives, which can be searched by keyword or phrase across the large amount of help and information posted to the list.
- Saint-Google. Nothing can stop it.
- Rweb Tutorial: a guided course on the basics of R.
In addition to that:
- The R wiki is now usable and contains a lot of useful code with examples.
- The R Graph Gallery shows the code used to generate many types of plots.
- Rseek searches R-related sites for the keyword entered.
- R Graphical Manual presents plots from all R packages
- Videos on Data Analysis with R
- Cookbook for R: a tutorial for making plots, especially with ggplot2.
- , online tutoRials.
- R course
- Computing for Data Analysis, a full course on Coursera.
- foreign: imports and exports data in the formats of other statistical packages (SPSS, Stata, ...).
- R2HTML: exports tables to HTML files.
- 10 must-have packages for social scientists. Includes packages for dealing with missing data and panel data.
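As a minimal sketch of what the foreign package does, the snippet below writes a small data frame to a Stata file and reads it back. The data frame and its variables are made up for illustration, and a temporary file stands in for a real data file. As far as I know, read.dta() only handles older Stata formats (up to version 12), so files saved by recent Stata versions may need a different package.

```r
# Minimal sketch: round trip of a data frame through a Stata .dta file
# using the 'foreign' package (a recommended package shipped with R).
library(foreign)

# Hypothetical survey data, for illustration only
survey <- data.frame(
  id     = 1:4,
  age    = c(23L, 35L, 47L, 52L),
  gender = factor(c("female", "male", "female", "male"))
)

# Write to a temporary .dta file; with real data you would use your own path
dta_file <- tempfile(fileext = ".dta")
write.dta(survey, dta_file)

# Read it back; by default factors are rebuilt from the Stata value labels
survey_back <- read.dta(dta_file)
str(survey_back)

# SPSS .sav files are read in a similar way (the file name is hypothetical):
# spss_data <- read.spss("my_survey.sav", to.data.frame = TRUE)
```

Checking the structure with str() after importing is a good habit, since value labels and missing-value codes are where conversions between packages usually differ.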
Useful complementary tools
The R console is not always the most convenient tool to work with. The simplest solution is to write the commands in a text editor and copy and paste them into R, which also keeps a record of our syntax. However, this is not the "best" solution. There are applications that work alongside R to make it easier to use:
Emacs / XEmacs and ESS (Emacs Speaks Statistics)
Tinn-R: highlights the syntax, sends code to R and offers many other useful features. Only available for Microsoft Windows (with Vim, Emacs, ESS, ... around, who needs it in the *NIX world?)
RStudio is a powerful integrated development environment for R.
Reference card (basic)
Reference card (pdf) of the most used commands.
Commented example scripts
- 01-basics.R: Create and manipulate simple objects. (plain text)
- 02-regression.R: How to specify simple regression models. (plain text)
- 03-logistic.R: How to specify simple logistic regression models. (plain text)
- 04-multilevel.R: Replicates the analysis performed by the demo (non-free) version of HLM. (plain text)
- 05-matrix_algebra.R: Operations with matrices and solving a regression using matrix algebra. (plain text)
- 06-variables.R: Recode different types of variables and specify regressions, with special emphasis on the variables used in social science surveys. (plain text)
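The following is a short, self-contained sketch in the spirit of the regression, logistic and matrix algebra scripts listed above; it is not their actual content. The data are simulated and the variable names (educ, age, income, voted) are hypothetical. It fits a linear model with lm(), recovers the same coefficients from the normal equations, beta = (X'X)^(-1) X'y, and fits a logistic model with glm().

```r
# Minimal sketch with simulated data; variable names are hypothetical.
set.seed(42)                             # makes the simulated data reproducible
n      <- 200
educ   <- rnorm(n, mean = 12, sd = 3)    # years of education
age    <- runif(n, min = 18, max = 80)   # age in years
income <- 5 + 1.5 * educ + 0.2 * age + rnorm(n, sd = 4)

# Linear regression with lm()
fit_lm <- lm(income ~ educ + age)

# Same coefficients via matrix algebra: solve (X'X) beta = X'y
X    <- cbind("(Intercept)" = 1, educ, age)  # design matrix with intercept column
y    <- income
beta <- solve(t(X) %*% X, t(X) %*% y)        # avoids computing the inverse explicitly
print(cbind(matrix_algebra = drop(beta), lm = coef(fit_lm)))

# Logistic regression: simulated binary outcome modelled with glm()
p       <- plogis(-6 + 0.4 * educ)           # true probability of voting
voted   <- rbinom(n, size = 1, prob = p)
fit_glm <- glm(voted ~ educ + age, family = binomial)
summary(fit_glm)
```

Passing the right-hand side to solve() instead of computing solve(t(X) %*% X) first is numerically a bit safer; lm() itself uses a QR decomposition, so the two columns should agree up to rounding error.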
Why should you use R?
Last, but not least, why should I use R if my university pays for a licence for SPSS/Stata/another program?
Without aiming to be exhaustive, here are some arguments for using R in research and teaching in the social sciences:
- R is a programming language in itself. It is an implementation of S, a language with an important historical tradition (S-PLUS is a commercial implementation of the same language). Being a complete programming language greatly widens what you can do with it. You can even order pizza from within R...
- R is a very well documented project.
- There is a growing number of packages for very specific tasks, so you don't have to reinvent the wheel.
- You can read its source code. At any moment we can know what each function is doing and, hence, adapt it to our needs.
- R gets along with everybody. It offers tools to import and export data, matrices, plots, tables, etc., in most open formats and some proprietary ones.
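To illustrate the point about reading the source code, here is a minimal sketch using sd() from the stats package, chosen only as an example. Typing a function's name without parentheses prints its R code. For primitives such as sum(), printing shows only a .Primitive call, because their code lives in R's C sources.

```r
# Typing a function's name without parentheses prints its R source code
sd

# The formal arguments and the body can also be extracted separately
formals(sd)
body(sd)

# Functions not exported by a package can be inspected with ':::' or
# getAnywhere(), e.g. stats:::print.lm or getAnywhere("print.lm")
```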
More arguments at Why use R?.
And a list of the signals one sends when choosing software for data analysis: Statistics Software Signal.
Apart from being a useful tool, R is also a good one, in the ethical sense:
- It is Free software (free as in freedom, not as in free beer!)
- The needs it covers are those of its community of users.
- We can show it in lectures and use it for teaching, knowing that students will also be able to use it at home. If we teach with SPSS / Stata, students have to work on university computers or buy a licence to pass the course (what would we say if we made a Montblanc pen compulsory?), or we push them to disregard licences.
- We should not teach students to be consumers of a specific product, but to do things in different ways and with different programs. In fact, we are "producing" students, not consumers of proprietary software.
- We are a public university. If there are programs that do the same tasks as others and are free (as in freedom), and most of them also free (as in free beer), is it socially responsible to spend money on those that are not free in either sense?
Command line? But we are in the age of the mouse!
There are also some reasons to prefer working with the spartan R command line instead of a mega-ultra-fancy interface that does everything with clicks... (and without any thinking at all):
- Simply by having to work with code, students acquire good habits: they have to "think" and design the process, instead of just "doing something". Usually, "doing something" means clicking buttons to generate as much output as possible without knowing what one is actually doing.
- Given that we are at the university to do science, it is absolutely essential to be able to reproduce the results we have obtained. This is only possible if we use syntax and code. And if we have to use code, let's use it in a program that is designed to work this way, has an elegant syntax and is a programming language in itself.
However, if you prefer to work with a graphical user interface, R currently has several packages for that: