Finding possible class schedules

Over the weekend my brother wanted to figure out his class schedule for the next semester. He is a veterinary medicine and zootechnology student at UNAM. For this upcoming semester there is a set of classes he has to take and each has 8 or so instructor options. The website where he finds the class times lists about 8 pre-constructed class schedules. So he normally finds one he likes quite a bit, and then manually starts checking if he can change X instructor for Y for a given class. He does this based on the referalls and information he has gathered about the instructors, plus he factors in whether it’d be an overall better schedule. For example, he might prefer to have a packed Tuesday if that means that he can leave early on Friday and avoid classes on Saturday.

The problem is that it’s very easy to make a mistake. You (well he) gets all excited thinking that he’s found the perfect schedule. Only to then realize that there is a conflict between two classes. Or that the practical portion of a class is in a location one hour away from the university, meaning that the schedule he has selected won’t work. This process is very frustrating.

I was watching him and I started to think if I could help him with some code. Turns out that it was straightforward to write some code to find which options are valid. Once I wrote a test case, it took us like half an hour to fill out the data. I know that tomorrow is when he and his classmates start registering for classes, so this information might help his classmates.

First, I define some helper functions. These are rather straightforward but I’ll be using them later on. For example, dias() is just there for typing less.

## Helper functions
dias <- function(d, i) {
    paste0(d, i)
}

extract <- function(m, p) {
    m[[p]]
}

extract_names <- function(m, p) {
    names(m)[p]
}

Next comes the input information. I organized it in a set of nested list objects. The schedule is stored as a character vector. For example, Lucia Eliana’s class meets on Wednesdays (__M__iercoles in Spanish) from 9 to 11 am. I only keep the starting hours (9 and 10 am) because otherwise the code won’t detect valid opitons that include another class that starts at 11 am. For classes that are 1 hour away from the university, we included 1 hour before and 1 hour after the class.

## Input class/prof info and schedule
materias <- list(
    repro = list(
        'lucia eliana' = c(dias('m', 9:10), dias('j', 9:10), dias('v', 8:13)),
        'esquivel lacroix' = c(dias('l', 14:15), dias('ma', 12:18), dias('m', 14:15)),
        'ismael porras' = c(dias('l', 9:10), dias('m', 8:14), dias('j', 9:10)),
        'esquivel lacroix 2' = c(dias('l', 14:15), dias('m', 14:15), dias('v', 12:18)),
        'salvador galina' = c(dias('ma', 8:13), dias('j', 9:10), dias('v', 9:10)),
        'alberto balcazar' = c(dias('l', 15:16), dias('j', 12:18), dias('v', 14:15)),
        'ana myriam boeta' = c(dias('l', 8:13), dias('m', 10:11), dias('v', 9:10)),
        'rafael eduardo paz' = c(dias('l', 11:17), dias('j', 14:15), dias('v', 16:17)),
        'juan heberth' = c(dias('ma', 9:10), dias('m', 11:17), dias('v', 11:12)),
        'vicente octavio mejia' = c(dias('ma', 8:17))
    ),
    economia = list(
        'valentin efren espinoza' = c(dias('l', 8:10), dias('ma', 9:11)),
        'maria del pilar velazquez' = c(dias('l', 16:18), dias('m', 16:18)),
        'arturo alonso pesado' = c(dias('l', 11:13), dias('j', 11:13)),
        'laura mendez' = c(dias('ma', 13:15), dias('m', 18:20)),
        'laura mendez 2' = c(dias('j', 11:13), dias('v', 11:13)),
        'manuela garcia' = c(dias('l', 17:19), dias('v', 16:18)),
        'francisco alejandro' = c(dias('ma', 7:9), dias('j', 7:9)),
        'isaac reyes' = c(dias('m', 13:15), dias('v', 13:15)),
        'jose luis tinoco' = c(dias('ma', 12:14), dias('m', 9:11)),
        'isaac reyes 2' = c(dias('l', 14:16), dias('ma', 14:16))
        
    ),
    bacterianas = list(
        'jose luis gutierrez' = dias('s', 8:11),
        'rodrigo mena' = c(dias('ma', 18:19), dias('j', 18:19)),
        'beatriz arellano' = c(dias('l', 7:8), dias('ma', 10:11)),
        'de la pena, ramirez ortega' = c(dias('j', 18:19), dias('v', 18:19)),
        'ramirez ortega' = c(dias('m', 7:8), dias('j', 7:8)),
        'rodrigo mena 2' = c(dias('ma', 16:17), dias('m', 16:17)),
        'de la pena' = dias('s', 8:11),
        'efren diaz aparicio' = dias('s', 8:11),
        'lucia del carmen favila' = dias('s', 8:11)
    ),
    parasitarias = list(
        'cintli martinez' = c(dias('j', 16:17), dias('v', 18:20)),
        'osvaldo froylan' = c(dias('ma', 18:19), dias('j', 18:20)),
        'maria quintero, agustin perez' = c(dias('ma', 13:14), dias('m', 7:9)),
        'maria quintero' = c(dias('m', 16:18), dias('j', 16:17)),
        'evangelina romero' = c(dias('ma', 7:8), dias('v', 7:9)),
        'guadarrama 01' = c(dias('m', 7:8), dias('j', 11:13)),
        'guadarrama 03' = c(dias('ma', 13:15), dias('v', 7:8)),
        'guadarrama 04' = c(dias('l', 16:17), dias('ma', 18:20)),
        'guadarrama 05' = c(dias('l', 7:9), dias('j', 7:8))
    ),
    diagnosticas = list(
        '1701' = c(dias('l', 11:13), dias('m', 11:16)),
        '1702' = c(dias('j', 13:15), dias('v', 13:18)),
        '1703' = c(dias('ma', 7:9), dias('v', 8:13)),
        '1704' = c(dias('l', 18:20), dias('j', 13:18)),
        '1705' = c(dias('l', 11:13), dias('m', 7:11)),
        '1706' = c(dias('ma', 15:17), dias('m', 15:19)),
        '1707' = c(dias('ma', 10:12), dias('j', 10:15)),
        '1708' = c(dias('l', 18:20), dias('ma', 15:19)),
        '1709' = c(dias('l', 11:13), dias('j', 8:13)),
        '1711' = c(dias('j', 13:15), dias('v', 10:13))
    )
)

Now that the input information is complete, I use expand.grid() to find out all the different possible options.

## Get all the options
options <- expand.grid(lapply(materias, function(x) { seq_len(length(x))}))
dim(options)
## [1] 81000     5

There’s 81,00 of them including the classes that meet on Saturday. You can see why it’s a frustrating process to find which combination of classes work when doing this manually.

Next, I explore all these options to find those that are valid, meaning that none of the classes overlap. I do this by finding which options have no duplicated hours from the character vectors defined earlier. Nothing fancy.

valid <- apply(options, 1, function(input) {
    info <- mapply(extract, materias, input)
    !any(duplicated(unlist(info)))
})

Now that I have the valid options, I can find the names of the instructors for them. There’s 2,847 valid schedules in the end, out of the 81,000. That’s 3.5 percent!

valid_prof <- apply(options[valid, ], 1, function(input) {
    mapply(extract_names, materias, input)
})
ncol(valid_prof)
## [1] 2847

You can search the interactive version here to select only the options with a given instructor. For example, in my brother’s case there are 30 valid options once he decided to prioritize two instructors as shown in the non-interactive table below.

## Ideally, this code would create an interactive table, but it doesn't work for some reason:
#library('DT')
#datatable(t(valid_prof), options = list(pagingType='full_numbers', pageLength=10), rownames = FALSE)
valid_prof <- t(valid_prof)
rownames(valid_prof) <- seq_len(nrow(valid_prof))
top_options <- valid_prof[valid_prof[, 1] == 'lucia eliana' & valid_prof[, 2] %in% c('isaac reyes', 'isaac reyes 2'), ]
kable(top_options, format = 'markdown', row.names = TRUE)
repro economia bacterianas parasitarias diagnosticas
5 lucia eliana isaac reyes 2 jose luis gutierrez cintli martinez 1701
11 lucia eliana isaac reyes 2 rodrigo mena cintli martinez 1701
14 lucia eliana isaac reyes 2 beatriz arellano cintli martinez 1701
19 lucia eliana isaac reyes 2 ramirez ortega cintli martinez 1701
25 lucia eliana isaac reyes 2 de la pena cintli martinez 1701
31 lucia eliana isaac reyes 2 efren diaz aparicio cintli martinez 1701
37 lucia eliana isaac reyes 2 lucia del carmen favila cintli martinez 1701
46 lucia eliana isaac reyes 2 jose luis gutierrez osvaldo froylan 1701
50 lucia eliana isaac reyes 2 beatriz arellano osvaldo froylan 1701
58 lucia eliana isaac reyes 2 ramirez ortega osvaldo froylan 1701
67 lucia eliana isaac reyes 2 de la pena osvaldo froylan 1701
76 lucia eliana isaac reyes 2 efren diaz aparicio osvaldo froylan 1701
85 lucia eliana isaac reyes 2 lucia del carmen favila osvaldo froylan 1701
123 lucia eliana isaac reyes 2 jose luis gutierrez guadarrama 01 1701
130 lucia eliana isaac reyes 2 rodrigo mena guadarrama 01 1701
134 lucia eliana isaac reyes 2 beatriz arellano guadarrama 01 1701
137 lucia eliana isaac reyes 2 de la pena, ramirez ortega guadarrama 01 1701
144 lucia eliana isaac reyes 2 de la pena guadarrama 01 1701
151 lucia eliana isaac reyes 2 efren diaz aparicio guadarrama 01 1701
158 lucia eliana isaac reyes 2 lucia del carmen favila guadarrama 01 1701
209 lucia eliana isaac reyes 2 jose luis gutierrez guadarrama 05 1701
217 lucia eliana isaac reyes 2 rodrigo mena guadarrama 05 1701
222 lucia eliana isaac reyes 2 de la pena, ramirez ortega guadarrama 05 1701
232 lucia eliana isaac reyes 2 de la pena guadarrama 05 1701
242 lucia eliana isaac reyes 2 efren diaz aparicio guadarrama 05 1701
252 lucia eliana isaac reyes 2 lucia del carmen favila guadarrama 05 1701
872 lucia eliana isaac reyes 2 jose luis gutierrez guadarrama 05 1704
885 lucia eliana isaac reyes 2 de la pena guadarrama 05 1704
894 lucia eliana isaac reyes 2 efren diaz aparicio guadarrama 05 1704
903 lucia eliana isaac reyes 2 lucia del carmen favila guadarrama 05 1704

Reproducibility

## Reproducibility info
library('devtools')
session_info()
## Session info --------------------------------------------------------------
##  setting  value                       
##  version  R version 3.3.0 (2016-05-03)
##  system   x86_64, mingw32             
##  ui       RStudio (0.99.902)          
##  language (EN)                        
##  collate  English_United States.1252  
##  tz       America/Mexico_City         
##  date     2016-08-02
## Packages ------------------------------------------------------------------
##  package   * version date       source        
##  devtools  * 1.12.0  2016-06-24 CRAN (R 3.3.1)
##  digest      0.6.9   2016-01-08 CRAN (R 3.3.0)
##  evaluate    0.9     2016-04-29 CRAN (R 3.3.0)
##  formatR     1.4     2016-05-09 CRAN (R 3.3.0)
##  highr       0.6     2016-05-09 CRAN (R 3.3.0)
##  knitr     * 1.13    2016-05-09 CRAN (R 3.3.0)
##  magrittr    1.5     2014-11-22 CRAN (R 3.3.0)
##  memoise     1.0.0   2016-01-29 CRAN (R 3.3.0)
##  rsconnect   0.4.3   2016-05-02 CRAN (R 3.3.0)
##  stringi     1.1.1   2016-05-27 CRAN (R 3.3.0)
##  stringr     1.0.0   2015-04-30 CRAN (R 3.3.0)
##  withr       1.0.2   2016-06-20 CRAN (R 3.3.1)

Want more?

Check other @jhubiostat student blogs at Bmore Biostats as well as topics on #rstats.

Leonardo Collado-Torres
Leonardo Collado-Torres
Investigator @ LIBD, Assistant Professor, Department of Biostatistics @ JHBSPH

#rstats @Bioconductor/🧠 genomics @LieberInstitute/@lcgunam @jhubiostat @jtleek @andrewejaffe alumni/@LIBDrstats @CDSBMexico co-founder

comments powered by Disqus