Results

This document itself is the primary result of the project. It will be shared with members of the General Education Committee, Academic Senate, and the Department of Biological Sciences at Ferris State University. Their comments and suggestions will be included in the Discussion.

Summary statistics

A total of 973 student performances on exam 1 were collected over 13 semesters of instruction. Student scores were converted to rubric scores as described above. The overall average rubric score across all students and semesters was 2.54, and both the mode and the median were 3. The average was not statistically different from the competence threshold of 2.6, as evaluated with a one-sample, two-tailed t-test (t = -1.71, df = 972, p = 0.087). The effect size for the difference between the average and the threshold was negligible (d = -0.05). We infer from this that the overall average rubric score is not practically different from the threshold score.
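The one-sample test and effect size above can be computed directly from the individual rubric scores. The sketch below is a minimal, standard-library Python version. It assumes d was computed as the difference from the threshold divided by the sample standard deviation, which the text does not state. The function name `one_sample_t` and the score list are hypothetical illustrations, not the study data. The p-value requires the t distribution (for example, R's `t.test(scores, mu = 2.6)`), so it is omitted here.

```python
import math
import statistics


def one_sample_t(scores, mu0):
    """One-sample t statistic, degrees of freedom, and d against a fixed value."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)   # sample SD with n - 1 denominator
    se = sd / math.sqrt(n)          # standard error of the mean
    t = (mean - mu0) / se
    d = (mean - mu0) / sd           # assumed definition of d (difference / SD)
    return t, n - 1, d


# Hypothetical rubric scores on the 0-4 scale, NOT the study data.
scores = [3, 2, 4, 3, 1, 3, 2, 3]
t, df, d = one_sample_t(scores, mu0=2.6)  # 2.6 = competence threshold
print(f"t = {t:.2f}, df = {df}, d = {d:.2f}")
```

Note that d does not depend on the sample size while t grows with the square root of n, which is why a large sample can produce a borderline t statistic alongside a negligible effect size.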


Figure 2: A histogram of the distribution of individual rubric score frequencies over all thirteen semesters.

The distribution of all rubric scores is shown in Figure 2. The distribution has a moderate negative skew (skew = -0.62): most scores fall at the upper end of the scale, with a longer tail toward low scores. Viewed in aggregate, this shape is consistent with the teaching, materials, and student learning all functioning as intended. Over the semesters investigated, 583 students (59.9%) met or exceeded the competence threshold.
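For readers who want to check the shape statistics, here is a sketch of the moment coefficient of skewness, \(g_1 = m_3 / m_2^{3/2}\), and of the count of scores at or above the threshold. Software packages differ in the small-sample corrections they apply to skewness, so this definition is an assumption and may not reproduce the reported -0.62 exactly. The function names and sample values are hypothetical.

```python
def sample_skewness(scores):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5


def share_meeting(scores, threshold=2.6):
    """Number and fraction of scores at or above the competence threshold."""
    met = sum(1 for x in scores if x >= threshold)
    return met, met / len(scores)


# Hypothetical scores: most are high, and one low value forms a left tail.
scores = [0, 3, 3, 3, 4]
print(round(sample_skewness(scores), 2))  # negative value => negative (left) skew
print(share_meeting(scores))              # (4, 0.8)
```

Applied to the reported counts, 583 of 973 students gives 583 / 973 = 0.599, the 59.9% stated above.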

The distribution of rubric scores by semester is shown in Figure 3. Both the distribution of rubric scores and the class sizes differ noticeably between semesters. A one-way ANOVA was used to compare rubric scores across semesters (Table 2). As expected, the differences between semesters were statistically significant (F = 9.85, df = 12 and 960, p < 0.001). Semester of instruction, however, explained a relatively small share of the total variance (\(\eta^2\) = 0.11, or about 11%).


Figure 3: A barplot showing the distribution of rubric scores broken down by semester.

Table 2: One-way ANOVA of rubric scores by semester

| Source | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Semester | 12 | 142.7398 | 11.894980 | 9.850787 | < 0.001 |
| Residuals | 960 | 1159.2150 | 1.207516 | | |
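The derived columns of Table 2 follow from its degrees of freedom and sums of squares: each mean square is a sum of squares divided by its df, F is the ratio of the two mean squares, and \(\eta^2\) is the semester sum of squares divided by the total sum of squares. A short Python check (the function name `anova_summary` is hypothetical) reproduces the tabulated values to rounding.

```python
def anova_summary(df_between, ss_between, df_within, ss_within):
    """Mean squares, F ratio, and eta squared from one-way ANOVA sums of squares."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_value = ms_between / ms_within
    eta_sq = ss_between / (ss_between + ss_within)  # SS_semester / SS_total
    return ms_between, ms_within, f_value, eta_sq


# df and sums of squares copied from Table 2.
ms_b, ms_w, f, eta2 = anova_summary(12, 142.7398, 960, 1159.2150)
print(f"MS_between={ms_b:.3f}, MS_within={ms_w:.3f}, F={f:.2f}, eta^2={eta2:.2f}")
# MS_between=11.895, MS_within=1.208, F=9.85, eta^2=0.11
```

The two df values also sum to 972, one less than the 973 student performances, as expected for a one-way ANOVA.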

Meta-analysis

Meta-analysis of student performance was performed in R (Del Re 2015). This analysis produced a weighted average of the semester rubric scores, calculated using formula (1), where \(X_i\) is the average rubric score for semester \(i\) and \(P_i\) is its weighting factor (the semester's student enrollment).

\[\begin{equation} \bar{X}_w = \frac{\sum X_i P_i}{\sum P_i} \tag{1} \end{equation}\]
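Formula (1) can be sketched in a few lines of Python. Because the weights are enrollments, the weighted mean of the semester averages equals the pooled mean of all individual scores, which is consistent with the same value (2.54) appearing in both the summary statistics and the meta-analysis. The semester data and the name `weighted_mean` are hypothetical.

```python
import math


def weighted_mean(means, weights):
    """Formula (1): sum(X_i * P_i) / sum(P_i)."""
    return sum(x * p for x, p in zip(means, weights)) / sum(weights)


# Hypothetical individual rubric scores for three semesters.
semesters = [[3, 2, 4], [1, 3], [3, 3, 4, 2]]
means = [sum(s) / len(s) for s in semesters]   # X_i: semester averages
sizes = [len(s) for s in semesters]            # P_i: enrollments
pooled = sum(sum(s) for s in semesters) / sum(sizes)

print(math.isclose(weighted_mean(means, sizes), pooled))  # True
```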

The confidence interval for the weighted mean was calculated from the variance of the weighted mean, that is, its squared standard error \((SEM_w)^2\). This variance is not straightforward to calculate, and several approximate formulas have been compared against bootstrap estimates (Gatz and Smith 1995). The most accurate of these was originally described by Cochran (1977), and it was used in this study. The calculation is shown in formula (2), in which \(n\) is the number of semesters and \(\bar{P}\) is the mean enrollment per semester.

\[\begin{equation} \begin{split} (SEM_w)^2 = \frac{n}{(n-1)(\sum P_i)^2}\big[ \sum(P_i X_i - \bar{P}\bar{X}_w)^2 \\ - 2\bar{X}_w \sum(P_i - \bar{P})(P_i X_i - \bar{P} \bar{X}_w) + \bar{X}_w^2 \sum(P_i - \bar{P})^2 \big] \end{split} \tag{2} \end{equation}\]
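A direct, term-by-term transcription of formula (2) is sketched below, together with a 95% interval. The text does not say whether the interval uses a normal or a t critical value; the sketch assumes the normal value 1.96, exposed as the parameter `z`. The function names and example inputs are hypothetical.

```python
import math


def cochran_sem(means, weights):
    """Weighted mean and its standard error, transcribed from formula (2)."""
    n = len(means)                      # number of semesters
    p_sum = sum(weights)
    p_bar = p_sum / n                   # mean enrollment, P-bar
    x_w = sum(x * p for x, p in zip(means, weights)) / p_sum  # formula (1)
    pairs = list(zip(means, weights))
    term1 = sum((p * x - p_bar * x_w) ** 2 for x, p in pairs)
    term2 = sum((p - p_bar) * (p * x - p_bar * x_w) for x, p in pairs)
    term3 = sum((p - p_bar) ** 2 for _, p in pairs)
    var = n / ((n - 1) * p_sum ** 2) * (term1 - 2 * x_w * term2 + x_w ** 2 * term3)
    return x_w, math.sqrt(max(var, 0.0))  # guard against tiny negative rounding error


def weighted_ci(means, weights, z=1.96):
    """Approximate 95% interval; z = 1.96 is an assumed normal critical value."""
    x_w, sem = cochran_sem(means, weights)
    return x_w - z * sem, x_w + z * sem


# Hypothetical semester means and enrollments.
print(cochran_sem([2.0, 3.0, 2.5], [30, 10, 60]))
print(weighted_ci([2.0, 3.0, 2.5], [30, 10, 60]))
```

Expanding the bracket in formula (2) shows that it equals \(\sum P_i^2 (X_i - \bar{X}_w)^2\), which makes a convenient cross-check: for the example inputs, both forms give \(\bar{X}_w = 2.4\) and \(SEM_w = 0.18\). With equal weights, the formula reduces to the ordinary standard error of the mean of the semester averages.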

Figure 4: A forest plot of the average scores for each semester with a weighted mean estimate for the entire period investigated. Error bars indicate the 95% confidence intervals.

A forest plot of the meta-analysis is shown in Figure 4. Each semester is shown as a separate row. The mean and 95% confidence interval for each semester are plotted in the right panel, and the associated metadata are listed in the table on the left. The weighted average of all the data is plotted at the bottom of the figure as a diamond whose width indicates its 95% confidence interval.

The rubric scale can be divided into five interpretive bands, as shown in Table 3. Of the 13 semesters, 6 fell in the proficient range, 6 in the developing range, and 1 in the beginning range. The weighted mean score of 2.54 lies at the upper edge of the developing range, just below the competence threshold of 2.60. It was not significantly different from that threshold, as judged by a weighted, one-sample, two-tailed t-test (t = -0.57, df = 12, p = 0.58). We conclude that the weighted average score does not differ meaningfully from the competence threshold.
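The weighted test treats the 13 semester means as the observations, so the statistic takes the form \(t = (\bar{X}_w - 2.6)/SEM_w\) with \(n - 1 = 12\) degrees of freedom. The sketch below assumes this form, which is consistent with the reported df. To stay within the standard library, it compares \(|t|\) with the tabulated two-tailed 5% critical value for 12 df (2.179) instead of computing an exact p-value. The function and constant names are hypothetical.

```python
# Two-tailed 5% critical value of Student's t with 12 df (13 semesters - 1),
# taken from standard t tables.
T_CRIT_DF12 = 2.179


def weighted_t_test(x_w, sem_w, mu0, n_groups):
    """t statistic and df for H0: weighted mean == mu0, with df = n_groups - 1."""
    t = (x_w - mu0) / sem_w
    return t, n_groups - 1


def significant_at_05(t, t_crit=T_CRIT_DF12):
    """True when |t| exceeds the two-tailed 5% critical value."""
    return abs(t) > t_crit


# The reported statistic t = -0.57 (df = 12) lies well inside +/-2.179.
print(significant_at_05(-0.57))  # False
```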

Table 3: Interpretation of average rubric scores

| Average score | Interpretation |
|---|---|
| 0.00 to 0.99 | Unsatisfactory |
| 1.00 to 1.79 | Beginning |
| 1.80 to 2.59 | Developing |
| 2.60 to 3.39 | Proficient |
| 3.40 to 4.00 | Advanced |
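Table 3 can be applied as a lookup on the lower bound of each band. The sketch below assumes the bands are half-open intervals (each band runs up to the next lower bound), so an average such as 2.595 that falls between the two-decimal limits in the table is assigned to the lower band. The names `LOWER_BOUNDS`, `LABELS`, and `interpret` are hypothetical.

```python
import bisect

# Lower bound of each Table 3 band; each band runs up to the next bound.
LOWER_BOUNDS = [0.00, 1.00, 1.80, 2.60, 3.40]
LABELS = ["Unsatisfactory", "Beginning", "Developing", "Proficient", "Advanced"]


def interpret(score):
    """Map an average rubric score (0 to 4) to its Table 3 interpretation."""
    if not 0.0 <= score <= 4.0:
        raise ValueError("average rubric scores must lie between 0 and 4")
    # bisect_right counts how many lower bounds are <= score.
    return LABELS[bisect.bisect_right(LOWER_BOUNDS, score) - 1]


print(interpret(2.54))  # Developing
print(interpret(2.60))  # Proficient
```

The lower bound of the Proficient band coincides with the competence threshold of 2.6, so the weighted mean of 2.54 is classified as Developing.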