rMaps Mexico map

It’s exciting when great people help each other get things done

This is a simple networking story, which might not be surprising to some but I was happily surprised by it. This is how the story goes:

Two weeks ago rMaps (Vaidyanathan, 2014) was released. After making a blog post about it I thought about using it to make a map of the homicide rate in Mexico over the recent years. First, I had the question of how to make custom maps with rMaps. @tyokota had the same question and started asking Ramnath about it in rMaps issue 6. Then I realized I needed a specific file with the map information. Google lead me to @diegovalle who has created the map from official Mexican sources, downloaded the homicide data, cleaned it, and made several maps and analyses: all his work is very impressive! I thought that it’d be very cool if @diegovalle and Ramnath connected, and they did! I saw them interacting via Twitter (here and here) and via GitHub. After sharing @diegovalle’s work with my friends, it turned out that some old friends already knew him (here and high school friends). Another friend was interested in additional features and I suggested her to contact @diegovalle via Twitter: he quickly replied as you can see here.

Beyond how impressive rMaps and @diegovalle’s work on mexican data are, I was amazed by the willingness to help each other and how great people easily connected. I believe this is one of the great features of both GitHub and Twitter where you can share your code, ask questions, try to answer them, meet people working with your tools, etc. You can even offer to PayPal a beer like @tyokota did.

After all their great work, now someone like me (aka, without knowing javascript, Datamaps, etc) can walk you through an example of making an interactive choropleth map showing the homicides rates in Mexico from 1997 to 2013.

Homicides rates in Mexico 1997-2013

The first thing we need to make a custom map using rMaps is a topojson file which in this case specifies the mexican states boundaries. This process is explained in more detail by @tyokota at custom-map which you can view here.

In this particular example, INEGI which is the National Institute of Statistics and Geography of Mexico has a map of the mexican states. @diegovalle explained how to download it here.

But before doing so, you might to install topojson like I did below following the installation instructions. In the terminal:


## Install node.js following instructions at https://github.com/mbostock/topojson/wiki/Installation
brew install node
## Install topojson
npm install -g topojson

## Download map info from INEGI (Mexican official source)
curl -o estados.zip http://mapserver.inegi.org.mx/MGN/mge2010v5_0.zip
## Decompress file
unzip estados.zip
## Create shapefile
ogr2ogr states.shp Entidades_2010_5.shp -t_srs "+proj=longlat +ellps=WGS84 +no_defs +towgs84=0,0,0"
## id-property needed so that DataMaps knows how to color the map
topojson -o mx_states.json -s 1e-7 -q 1e5 states.shp -p state_code=+CVE_ENT,name=NOM_ENT --id-property NOM_ENT

Now that we have the topojson file mx_states.json we need to get the actual homicide data. @diegovalle has gone through the whole process of acquiring the data from official mexican sources and cleaning it. Lets download it.

# Download crime data
## From crimenmexico.diegovalle.net/en/csv
## All local crimes at the state level
download.file("http://crimenmexico.diegovalle.net/en/csv/fuero-comun-estados.csv.gz", 
	"fuero-comun-estados.csv.gz")

The data is not completely ready for us to use it and we need to reshape it a bit. In particular, we want to consider only the intentional homicides and group the data by state and date. We can get this to work by using dplyr (Wickham & Francois, 2014).

## Load required packages
library("dplyr")

## Load the crime data
crime <- read.csv("fuero-comun-estados.csv.gz")

## Only intentional homicides
crime <- subset(crime, crime == "HOMICIDIOS" & type == "DOLOSOS")

## Sum homicides by firearm, etc and group by state and date
hom <- crime %.%
  filter(year %in% 1997:2013) %.%
  group_by(state_code, year, type) %.%
  summarise(total = sum(count, na.rm = TRUE),
            population = mean(population) ) %.%
  mutate(rate = total / population * 10^5) %.%
  arrange(state_code, year)

## How are states coded?
summary(hom$state_code)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    8.75   16.50   16.50   24.20   32.00

We have the slight inconvenience that states are coded as integers from 1 to 32 instead of using their names. Using another of the files supplied by @diegovalle we can merge the codes. This requires using the foreign (R Core Team) package for loading a dbf file and then merging both data sets with plyr (Wickham, 2011).

## Needed for read.dbf
library("foreign")

## The dbf from the state shapefile needed to merge state_code with state
## names
codes <- read.dbf("states.dbf")
codes$NOM_ENT <- iconv(codes$NOM_ENT, "windows-1252", "utf-8")
codes$CVE_ENT <- as.numeric(codes$CVE_ENT)
codes$OID <- NULL
names(codes) <- c("state_code", "name")

## Load plyr for join(). Loading it before creates a problem with the dplyr
## call above
library("plyr")

## Names needed for the map
hom <- join(hom, codes)

## Lets look at the data
head(hom)
##   state_code year    type total population   rate           name
## 1          1 1997 DOLOSOS   355     958126 37.051 Aguascalientes
## 2          1 1998 DOLOSOS    66     975585  6.765 Aguascalientes
## 3          1 1999 DOLOSOS    27     992515  2.720 Aguascalientes
## 4          1 2000 DOLOSOS    14    1009215  1.387 Aguascalientes
## 5          1 2001 DOLOSOS    22    1026437  2.143 Aguascalientes
## 6          1 2002 DOLOSOS    26    1044578  2.489 Aguascalientes

Great! We now have state names under name and the intentional homicide rate under rate (in homicides per 100,000) for each specific year. We can thus proceed to making the interactive choropleth map using the ichoropleth function described by Ramnath here. This requires specifying the topojson file which is specified via dataUrl, the name of the map specified via scope and the most tricky part (for me at least) is that we need to specify the setProjection. These are all properties of the Datamaps library. In particular, the wiki describes how to use custom maps but this requires some javascript knowledge.

## Make the map
library("rMaps")
d1 <- ichoropleth(rate ~ name, data = hom, ncuts = 9, pal = 'YlOrRd', 
    animate = 'year',  map = 'states'
)
## Note that I am hosting the mx_states.json in Dropbox
## but if you are doing it locally, you only need
## dataUrl = "mx_states.json"
d1$set(
  geographyConfig = list(
   dataUrl = "https://dl.dropboxusercontent.com/u/10794332/mx_states.json"
  ),
 scope = 'states',
 setProjection = '#! function( element, options ) {
   var projection, path;
   projection = d3.geo.mercator()
    .center([-89, 21]).scale(element.offsetWidth)
    .translate([element.offsetWidth / 2, element.offsetHeight / 2]);

   path = d3.geo.path().projection( projection );
   return {path: path, projection: projection};
  } !#'
)
d1$save('rMaps.html', cdn = TRUE)

The end result is shown below:

You can also share the map using the publish method as shown below:

d1$publish("Intentional homicides rates for Mexico 1997-2013")
## You'll need a GitHub account

You will get a link to the rCharts viewer such as this one or if you prefer, you can also view the result using Pagist as shown here.

The code presented in this post was written by @diegovalle which can you view here and Ramnath which is shown here. I also figured out the trick of hosting the topojson file at Dropbox from @tyokota’s code as I was running into Access-Control-Allow-Origin errors when hosting it in my academic website. Finally, but not least, Ramnath appropriately insists that all of this would not be possible without libraries such as Datamaps.

References

Citations made with knitcitations (Boettiger, 2014).

Reproducibility

sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rMaps_0.1.1         plyr_1.8            foreign_0.8-59     
## [4] dplyr_0.1.1         knitcitations_0.5-0 bibtex_0.3-6       
## [7] knitr_1.5          
## 
## loaded via a namespace (and not attached):
##  [1] assertthat_0.1     digest_0.6.4       evaluate_0.5.1    
##  [4] formatR_0.10       grid_3.0.2         httr_0.2          
##  [7] lattice_0.20-24    rCharts_0.4.2      RColorBrewer_1.0-5
## [10] Rcpp_0.11.0        RCurl_1.95-4.1     RJSONIO_1.0-3     
## [13] stringr_0.6.2      tools_3.0.2        whisker_0.3-2     
## [16] XML_3.95-0.2       xtable_1.7-1       yaml_2.1.10

Check other topics on #rstats.

Leonardo Collado-Torres
Leonardo Collado-Torres
Investigator @ LIBD, Assistant Professor, Department of Biostatistics @ JHBSPH

#rstats @Bioconductor/🧠 genomics @LieberInstitute/@lcgunam @jhubiostat @jtleek @andrewejaffe alumni/@LIBDrstats @CDSBMexico co-founder

comments powered by Disqus