Final exam grades: Paris-Centrale Spring 2017

In order to help you interpret your performance on the final exam, you’ll find an analysis of the exam scores here.  I’ll give you the R code that I used to do the analysis, so that you can see how my thinking works here even if my prose is not clear.

Brief overview of the data

First, let’s look at the overall scores on the final exam. The format of this exam was 10 basic questions, plus 10 questions that you could think of as more advanced. (We’ll return later to the question of whether or not the basic questions were really easier than the advanced questions.) By “basic questions,” I mean that if you know the answers to these, then you will be able to understand most conversations about natural language processing. By “advanced questions,” I mean that if you know the answers to those, then you will be able to be an active participant in those conversations.

How I’ll test this

Since I didn’t write any of my own functions, I don’t have a way to break this down into unit tests. So, I’ll calculate some of the scores for individual students manually, and make sure that the program calculates the scores for those students correctly. To make sure that unexpected inputs don’t do anything horrible to the calculations, I’ll also do this check for students who didn’t take the test, and therefore have NA values for both sets of questions.
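A minimal sketch of that kind of manual check, assuming a small made-up data frame (the name toy and its three students are invented for illustration and are not the real grade sheet). The expected totals are worked out by hand and then compared with what R computes, including a student with NA on both pages:

```r
# Hypothetical stand-in for the real data: three invented students.
# Student C did not take the exam, so both pages are NA.
toy <- data.frame(
  student = c("A", "B", "C"),
  page.01 = c(7.5, 10, NA),
  page.02 = c(6, 10, NA)
)

# Same calculation as the one used for the real scores.
toy.total <- toy$page.01 + toy$page.02

# Totals calculated by hand: 7.5 + 6 = 13.5, 10 + 10 = 20, and NA for the absent student.
expected.total <- c(13.5, 20, NA)

# Stops with an error if the program disagrees with the hand calculation.
stopifnot(identical(toy.total, expected.total))

# The absent student should be counted as exactly one missing value.
stopifnot(sum(is.na(toy.total)) == 1)
```

The point of including the NA row is that adding two NA values gives NA rather than 0, so an absent student is reported as missing instead of being counted as a zero.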

Statistics on the overall scores for the final exam

# the column page.01 is the scores for the first page of the exam
# the column page.02 is the scores for the second page of the exam
# ...so, the score on the final exam is the sum of those two columns.
scores.total <- data$page.01 + data$page.02
summary(scores.total)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    3.50   12.00   13.00   13.25   15.00   20.00       2

shapiro.test(scores.total)

## 
##  Shapiro-Wilk normality test
## 
## data:  scores.total
## W = 0.97045, p-value = 0.06652

hist(scores.total,
      main = "Histogram of scores on final exam")

[Figure: histogram of scores on final exam]

What do we learn from this? First of all, the typical grade was around 13, whether you look at the mean or the median. So, most people passed the final exam. We also know that some people got excellent scores–there were a couple of 20s. And, we know that some people got terrible scores. The fact that a couple of people got 20s is consistent with the idea that the materials in the course covered the materials on the final exam. The fact that some people failed, including some people who got very low scores, suggests that the exam was sufficiently difficult to be appropriate for the student population in this grande école.

Statistics on the first page (basic questions)

summary(data$page.01)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   3.500   6.625   7.500   7.519   8.875  10.000       2

shapiro.test(data$page.01)

## 
##  Shapiro-Wilk normality test
## 
## data:  data$page.01
## W = 0.96359, p-value = 0.02508

hist(data$page.01,
      main = "Histogram of scores on basic questions")

[Figure: histogram of scores on basic questions]

What do we learn from this? Bearing in mind that the highest possible score on the first page was 10 points, the fact that the mean and median scores were around 7.5 suggests that students typically got (the equivalent of) passing scores on the basic questions–that is to say, students were typically safely above a score of 5. The fact that several students got scores lower than that suggests that even the basic questions were difficult enough for this context, and again, the fact that a large number of students got scores of 9 or above suggests that the course covered the basic aspects of natural language processing thoroughly.

On the other hand, personally, I was somewhat disappointed with the scores on the basic questions. My hope was that all students would walk out of the course with a solid understanding of the basics of the subject. Although the high proportion of 9s and 10s makes me confident that I covered that material thoroughly, it’s difficult to avoid the conclusion that the large number of absences in every one of the 2nd through 6th class sessions (that is, all of the class sessions but the first) affected students’ overall performance pretty heavily. I don’t have numbers on the attendance rates, so I can’t test this hypothesis quantitatively.

Statistics on the second page (advanced questions)

summary(data$page.02)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   5.000   6.000   5.731   7.000  10.000       2

shapiro.test(data$page.02)

## 
##  Shapiro-Wilk normality test
## 
## data:  data$page.02
## W = 0.95442, p-value = 0.00714

hist(data$page.02,
      main = "Histogram of scores on advanced questions")

[Figure: histogram of scores on advanced questions]
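The three Shapiro-Wilk results reported so far can be read side by side. Here is a small sketch that copies the p-values from the outputs above and compares them with a threshold of 0.05; that threshold is my assumption (a common convention), not something fixed by the analysis itself:

```r
# p-values copied from the three shapiro.test() outputs above.
p.values <- c(total = 0.06652, basic = 0.02508, advanced = 0.00714)

# Assumed significance threshold; 0.05 is a convention, not part of the original analysis.
alpha <- 0.05

# TRUE means the test rejects normality at that threshold.
rejects.normality <- p.values < alpha
print(rejects.normality)
```

Under that assumption, normality is not rejected for the total scores, but it is rejected for each page taken separately. With the real data, the same numbers can be pulled out directly with, for example, shapiro.test(scores.total)$p.value.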

Were the advanced questions really more advanced?

If the gradient of difficulty that I was looking for was there, then very few people should have done better on the second page (advanced questions) than they did on the first page (basic questions). When we plot the difference between the first page and the second page, most people should be at zero (equally difficult) or higher (second page more difficult). Very few people should have a negative value (which would indicate that they did better on the questions that I expected to be more advanced; certainly this could happen sometimes, but it shouldn’t happen very often).

difference.between.first.and.second <- data$page.01 - data$page.02
hist(difference.between.first.and.second)

[Figure: histogram of the difference between first-page and second-page scores]
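The histogram shows the shape, but "very few people should have a negative value" can also be checked by counting. A sketch on invented scores (the data frame toy is hypothetical, not the real grades), assuming that each row holds one student's two page scores:

```r
# Hypothetical scores for five students; the last one did not sit the exam.
toy <- data.frame(
  page.01 = c(8, 7.5, 6, 9, NA),
  page.02 = c(6, 7.5, 7, 5, NA)
)

# Same subtraction as above: basic minus advanced.
toy.diff <- toy$page.01 - toy$page.02

# Number of students who did better on the advanced page (negative difference).
# na.rm = TRUE leaves out students with missing scores.
n.negative <- sum(toy.diff < 0, na.rm = TRUE)

# Share of students with a negative difference, among those who took the exam.
share.negative <- mean(toy.diff < 0, na.rm = TRUE)
```

On the real data, the same two lines applied to difference.between.first.and.second would give the count and the proportion of students for whom the advanced page went better than the basic one.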

Do differences between the basic and advanced scores correlate with the final score? Maybe people who did really poorly or really well will show unusual relationships there.

# let's visualize the data before we do anything else
plot(difference.between.first.and.second, (data$page.01 + data$page.02),
     main = "Relationship between the gap between scores on the basic and advanced questions, versus total score",
     ylab = "Total score on the exam",
     xlab = "Score on basic questions minus score on advanced questions")

[Figure: scatter plot of total score against the gap between basic and advanced scores]

Not much point in looking for a correlation here–we can see from the plot that there won’t be much of one.

However, we’ll do a t-test to see if the difference between the mean scores on the two pages is significant…

t.test(data$page.01, data$page.02)

## 
##  Welch Two Sample t-test
## 
## data:  data$page.01 and data$page.02
## t = 6.9339, df = 151.12, p-value = 1.115e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.278847 2.298077
## sample estimates:
## mean of x mean of y 
##  7.519231  5.730769

…and, yes: it is, and very much so. On average, students scored about 1.8 points higher on the first page than on the second (7.52 versus 5.73), so the advanced questions really were harder than the basic ones. I hope this is helpful!
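One caveat about the test above: the Welch test treats the two pages as independent samples, whereas here every student contributed a score to both pages. A paired t-test is the more usual choice for that design. Below is a sketch on invented scores (toy is hypothetical, and I have not rerun this on the real grades), assuming that each row holds the two scores of the same student:

```r
# Hypothetical scores, one row per student; the last student has no scores.
toy <- data.frame(
  page.01 = c(8, 7.5, 6, 9, 10, 7, NA),
  page.02 = c(6, 7, 7, 5, 8.5, 6, NA)
)

# Paired test: compares each student's first-page score with that same
# student's second-page score. Incomplete pairs are dropped.
paired.result <- t.test(toy$page.01, toy$page.02, paired = TRUE)

# Equivalent formulation: a one-sample t-test on the per-student differences.
diff.result <- t.test(toy$page.01 - toy$page.02)

# The two formulations give the same t statistic.
stopifnot(isTRUE(all.equal(unname(paired.result$statistic),
                           unname(diff.result$statistic))))
```

On the real data the call would be t.test(data$page.01, data$page.02, paired = TRUE). Its degrees of freedom would be the number of students with both scores, minus one.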
