
R Result Analysis - Simple Reporting

posted Jun 18, 2019, 12:14 AM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated Jul 20, 2019, 11:09 PM ]
This post is the continuation of R Result Analysis - Preparing the Data.

Commands that you enter are shown in blue, after the > prompt. First, a recap:

In the previous post, the total marks for each student were computed. Each student was given a grade, and the passing students (total of 40 or more) were copied from the full data frame my to a subset called x.

> table(my$grade)

 A A- A+  B B- B+  C C- C+ D+  E
73 42 48 42 40 42 22  4 25  5  1


> x = my[my$total >=40,]
> table(x$grade)

 A A- A+  B B- B+  C C- C+ D+
73 42 48 42 40 42 22  4 25  5
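Note that table() lists the grades alphabetically (A, A-, A+, ...), not by rank. A minimal sketch of one way to get them in rank order, assuming the grading scale below (the scale and the sample grades are made up for illustration and are not from the real data):

```r
# Hypothetical grades standing in for my$grade.
grade <- c("A", "B+", "A-", "C", "A+", "B", "A", "E", "B+")

# Assumed ranking from best to worst; adjust to the actual grading scheme.
grade_levels <- c("A+", "A", "A-", "B+", "B", "B-",
                  "C+", "C", "C-", "D+", "D", "E")

# A factor with explicit levels makes table() follow the level order.
grade_f <- factor(grade, levels = grade_levels)

# droplevels() hides grades that nobody obtained.
grade_counts <- table(droplevels(grade_f))
print(grade_counts)
```

The same idea should work on the real data by replacing grade with my$grade.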


Columns clo1, clo2 and clo3 report the students' achievement on the course learning outcomes (CLOs), each normalized so that full marks equal 1. The optional options(digits=2) command limits printed output to 2 significant digits.

> x$clo1 <- (x$tests+x$final)/70
> x$clo2 <- x$mile1/15
> x$clo3 <- x$mile4/15
> options(digits=2)
> head(x,2)
      name  matric tests final mile1 mile4  prog sect total grade clo1 clo2 clo3
1 STUDENT1 MATRIC1   9.1    32    14  15.0 2SKEE    1    70    B+ 0.58 0.95 1.00
2 STUDENT2 MATRIC2  12.3    22    13   9.2 2SKEE    1    57    C+ 0.50 0.87 0.62

CLO results by section (read the last 3 columns):

> aggregate(x,by=list(x$sect),mean)
   Group.1 name matric tests final mile1 mile4 prog sect total grade clo1 clo2 clo3
1        1   NA     NA    12    30    14    14   NA    1    70    NA 0.61 0.91 0.93
2        2   NA     NA    12    31    14    14   NA    2    71    NA 0.62 0.94 0.90
3        3   NA     NA    13    30    15    13   NA    3    71    NA 0.62 1.00 0.85
4        4   NA     NA    12    29    15    14   NA    4    70    NA 0.58 0.99 0.94
5        5   NA     NA    13    36    13     9   NA    5    71    NA 0.70 0.86 0.60
6        6   NA     NA    15    40    14    13   NA    6    82    NA 0.79 0.90 0.85
7        7   NA     NA    12    33    12    12   NA    7    68    NA 0.63 0.80 0.78
8        8   NA     NA    11    31    12    12   NA    8    66    NA 0.60 0.82 0.80
9        9   NA     NA    14    35    14    13   NA    9    77    NA 0.70 0.95 0.89
10      10   NA     NA    14    35    14    14   NA   10    77    NA 0.70 0.94 0.96
11      11   NA     NA    16    41    14    13   NA   11    83    NA 0.80 0.91 0.85
12      12   NA     NA    12    32    14    13   NA   12    71    NA 0.63 0.91 0.87
13      13   NA     NA    12    29    14    13   NA   13    68    NA 0.58 0.92 0.89

There were 50 or more warnings (use warnings() to see the first 50)

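The warnings can be ignored. They appear because mean() is also applied to the non-numeric columns (name, matric, prog and grade), which is why those columns show NA. We got all the numbers we need.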
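One way to avoid the warnings altogether is to aggregate only the CLO columns. The sketch below uses the formula interface of aggregate(). The x_demo data frame and its values are made up for illustration; only the column names follow the post.

```r
# Toy data standing in for the x data frame; all values are made up.
x_demo <- data.frame(
  name = c("S1", "S2", "S3", "S4"),
  sect = c(1, 1, 2, 2),
  clo1 = c(0.50, 0.70, 0.60, 0.80),
  clo2 = c(0.90, 1.00, 0.80, 0.90),
  clo3 = c(1.00, 0.60, 0.70, 0.90),
  stringsAsFactors = FALSE
)

# Only the columns named on the left-hand side are averaged,
# so the character columns are never passed to mean().
clo_by_sect <- aggregate(cbind(clo1, clo2, clo3) ~ sect,
                         data = x_demo, FUN = mean)
print(clo_by_sect)
```

Be aware that the formula interface drops rows containing NA by default, so the result can differ from the by= form if some marks are missing.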
CLO results for all sections:

> mean(x$clo1)
[1] 0.66

> mean(x$clo2)
[1] 0.92
> mean(x$clo3)
[1] 0.87


Find how many students exceeded the KPI of 0.4:

> sum(x$clo1>.4)
[1] 314
> sum(x$clo2>.4)
[1] 343
> sum(x$clo3>.4)
[1] 329
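The three means and the three counts can also be computed in one go. This is a sketch on made-up numbers (x_demo, clo_cols and kpi are names invented for the example), using colMeans() and colSums() from base R:

```r
# Toy data standing in for the x data frame; all values are made up.
x_demo <- data.frame(
  clo1 = c(0.30, 0.50, 0.70, 0.90),
  clo2 = c(0.40, 0.80, 0.95, 1.00),
  clo3 = c(0.20, 0.60, 0.85, 0.90)
)

clo_cols <- c("clo1", "clo2", "clo3")
kpi <- 0.4  # KPI threshold from the post

# Mean of each CLO column.
clo_means <- colMeans(x_demo[, clo_cols])

# The comparison gives a logical matrix; colSums() counts the TRUEs per column.
n_above <- colSums(x_demo[, clo_cols] > kpi)

# Same counts expressed as a percentage of all students in the data frame.
pct_above <- 100 * n_above / nrow(x_demo)

print(clo_means)
print(n_above)
print(pct_above)
```

The comparison is strict, so a student scoring exactly 0.4 is not counted. Use >= instead of > if the KPI is meant to include that case.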

The academic programs differ by the 5th character in the prog column. We can create the e, l, and m data frames (for SKEE, SKEL and SKEM) as subsets of the x data frame. The following commands extract the analysis for SKEE.

> e <- subset(x,'E'==substr(x$prog,5,5))
> mean(e$clo1)
[1] 0.62
> mean(e$clo2)
[1] 0.94
> mean(e$clo3)
[1] 0.87

SKEL and SKEM have 3 and 4 subgroups, respectively. The SKEL and SKEM subsets can be extracted using the subset or filter function. Here are two more ways to perform the same operation.

> l <- filter(x, x$prog=="1SKEL" | x$prog == "2SKEL" | x$prog=="4SKEL")
> m = subset(x, substr(x$prog,5,5) == 'M')

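The filter function does not seem to work all the time. A likely cause is that filter here comes from the dplyr package: if dplyr has not been loaded with library(dplyr), R finds the unrelated time-series function stats::filter instead. subset is part of base R and is always available. The two functions also differ in subtle ways. After subsetting, find the means for SKEL and SKEM (shown below):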

> mean(m$clo2)
[1] 0.92

etc...
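The long chain of == and | tests can be shortened with %in%. Below is a sketch on made-up data (x_demo and skel_codes are names invented for the example). The dplyr part runs only if the package is installed, and it calls dplyr::filter with the namespace spelled out so that stats::filter cannot be picked up by mistake.

```r
# Toy data standing in for the x data frame; all values are made up.
x_demo <- data.frame(
  prog = c("1SKEL", "2SKEE", "4SKEL", "3SKEM", "2SKEL"),
  clo2 = c(0.90, 0.80, 1.00, 0.70, 0.95),
  stringsAsFactors = FALSE
)

# The three SKEL subgroups named in the post.
skel_codes <- c("1SKEL", "2SKEL", "4SKEL")

# Base R: keep the rows whose prog is one of the SKEL codes.
l_base <- subset(x_demo, prog %in% skel_codes)

# dplyr version, guarded so the script still runs without the package.
if (requireNamespace("dplyr", quietly = TRUE)) {
  l_dplyr <- dplyr::filter(x_demo, prog %in% skel_codes)
  # Both approaches should select the same students.
  print(identical(l_base$clo2, l_dplyr$clo2))
}

print(mean(l_base$clo2))
```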

We can also perform subgroup analysis without first extracting a subset. We can just use the x dataframe directly. To find the mean for the variable 'clo1' for the 'SKEE' cohort, look for the character 'E' in the 5th position.

> mean(x$clo1) # mean CLO1 for all students
[1] 0.66
> mean(x$clo1['E'==substr(x$prog,5,5)]) # mean CLO1 for SKEE only
[1] 0.62
> mean(x$clo1['L'==substr(x$prog,5,5)])
[1] 0.69
> mean(x$clo1['M'==substr(x$prog,5,5)])
[1] 0.68


Repeat for CLO2 and CLO3. To get the mean for only 2SKEE students, the command is:

> mean(x$clo1[x$prog=='2SKEE'])
[1] 0.62
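Instead of typing one mean() command per program, tapply() can compute all the group means at once. This is a sketch on made-up data (x_demo and prog_letter are names invented for the example):

```r
# Toy data standing in for the x data frame; all values are made up.
x_demo <- data.frame(
  prog = c("2SKEE", "1SKEE", "1SKEL", "4SKEL", "3SKEM", "2SKEM"),
  clo1 = c(0.60, 0.64, 0.70, 0.68, 0.66, 0.70),
  stringsAsFactors = FALSE
)

# The 5th character of prog identifies the program (E, L or M).
prog_letter <- substr(x_demo$prog, 5, 5)

# Mean CLO1 for each program.
clo1_by_letter <- tapply(x_demo$clo1, prog_letter, mean)
print(clo1_by_letter)

# Mean CLO1 for each full program code (1SKEE, 2SKEE, ...).
clo1_by_prog <- tapply(x_demo$clo1, x_demo$prog, mean)
print(clo1_by_prog)
```

Swapping clo1 for clo2 or clo3 gives the other outcomes.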

All the numbers required for reporting are now ready for presentation using PowerPoint. If you want R to create plots, use the ggplot2 package. That will be my homework for the future.
