TN Letter Grades 2024-25: A Machine Learning Approach

This post continues the series on TN School Letter Grade data. You can find the main analysis post here. This post uses the same dataset.

Having looked at the preliminary results of the 2024-25 statewide school letter grade data, I wanted to use machine learning to explore a specific question: how well can we predict a school's letter grade using only demographic characteristics of the student population?

What is Machine Learning?

Machine learning is a branch of AI in which a computer learns patterns from data and uses them to predict outcomes, without being explicitly programmed with the rules. In general, the more relevant data a model is exposed to, the better its predictions tend to be.

The Question

The letter grade formula uses achievement scores (50%), growth scores (40%), and for high schools, college/career readiness rates (10%). These are the inputs that directly calculate the grade. But what if we ignored those inputs entirely and asked: can we predict a school's letter grade knowing only who attends the school?

To answer this, I built models using only demographic features:

  • Economically disadvantaged percentage

  • Limited English proficient percentage

  • Black, Hispanic, Native American (BHN) percentage

  • African American percentage

  • Asian percentage

  • Hispanic percentage

  • White percentage

  • Students with disabilities percentage

  • Homeless percentage

  • Foster percentage

  • Migrant percentage

  • Military percentage

These 13 features describe the student population but are not part of the letter grade calculation.

Unsupervised Learning: K-Means Clustering

Before building predictive models, I used K-means clustering to identify natural groupings of schools based on three key demographic variables: economically disadvantaged percentage, BHN percentage, and students with disabilities percentage.

Elbow Method

Using the elbow method, I determined that 5 clusters provided a good balance between simplicity and explanatory power.
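As a rough illustration of the elbow method, here is a minimal sketch using scikit-learn. The data is synthetic, and standardizing the three percentages before clustering is my assumption for the example, not necessarily what the original analysis did. The idea is to fit K-means for several values of k and look for the point where the inertia (within-cluster sum of squares) stops dropping quickly.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the three clustering variables (ED %, BHN %, SWD %).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(300, 3))

# Assumption: standardize so no single percentage dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-means for k = 1..10 and record the inertia for each k.
inertias = {}
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

Plotting these values against k and picking the "bend" in the curve is a judgment call, which is why the choice of 5 clusters is described as a balance rather than a single correct answer.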

Cluster Profiles

| Cluster | ED % | BHN % | SWD % | Schools | Description |
|---|---|---|---|---|---|
| 0 | 23.4% | 17.4% | 17.6% | 535 | Low poverty, moderate SWD |
| 1 | 36.4% | 14.4% | 25.7% | 249 | Moderate poverty, high SWD |
| 2 | 40.0% | 74.1% | 16.1% | 325 | Moderate poverty, high diversity |
| 3 | 64.6% | 3.9% | 12.8% | 224 | High poverty, rural |
| 4 | 13.6% | 24.4% | 10.5% | 364 | Low poverty, low SWD |

[Figure: Cluster Profiles]

Letter Grades by Cluster

Here's the key finding: letter grades are not randomly distributed across these demographic clusters.

| Cluster | A | B | C | D | F |
|---|---|---|---|---|---|
| 0 (Low poverty, mod SWD) | 25.2% | 31.4% | 29.3% | 13.3% | 0.7% |
| 1 (Mod poverty, high SWD) | 12.0% | 33.3% | 32.9% | 20.9% | 0.8% |
| 2 (Mod poverty, high diversity) | 8.0% | 22.5% | 32.6% | 28.9% | 8.0% |
| 3 (High poverty, rural) | 5.8% | 21.0% | 32.6% | 26.3% | 14.3% |
| 4 (Low poverty, low SWD) | 41.5% | 30.8% | 20.1% | 7.1% | 0.5% |

Cluster 4 (low poverty, low SWD) has 41.5% A grades and almost no F grades. Cluster 3 (high poverty, rural) has only 5.8% A grades and 14.3% F grades. The demographics of who attends a school are strongly associated with what grade that school receives.
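A table of grade shares by cluster like the one above can be built with a row-normalized cross-tabulation. This is a minimal sketch on made-up rows; the column names `cluster` and `letter_grade` are hypothetical and not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical rows: one per school, with its cluster label and letter grade.
schools = pd.DataFrame({
    "cluster": [0, 0, 0, 0, 1, 1, 1, 1],
    "letter_grade": ["A", "A", "B", "C", "C", "D", "F", "F"],
})

# normalize="index" converts counts to row shares, so each cluster sums to 100%.
grade_share = pd.crosstab(
    schools["cluster"], schools["letter_grade"], normalize="index"
) * 100

print(grade_share.round(1))
```

Normalizing by row matters here because the clusters differ in size; comparing raw counts would mostly reflect how many schools fall in each cluster.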

Supervised Learning: Predicting Letter Grades

Logistic Regression

I used logistic regression to understand which demographic features most strongly predict each letter grade.

For a letter grade of A, the coefficients were:

| Feature | Coefficient |
|---|---|
| asian_pct | +0.39 |
| hispanic_pct | +0.25 |
| white_pct | +0.22 |
| military_pct | +0.11 |
| ... | ... |
| students_with_disabilities_pct | -0.16 |
| homeless_pct | -0.45 |
| limited_english_proficient_pct | -0.46 |
| economically_disadvantaged_pct | -0.85 |

The strongest predictor of NOT receiving an A is the percentage of economically disadvantaged students (-0.85).

For a letter grade of F, the pattern flips:

| Feature | Coefficient |
|---|---|
| economically_disadvantaged_pct | +1.30 |
| limited_english_proficient_pct | +0.73 |
| black_hispanic_native_american_pct | +0.25 |
| homeless_pct | +0.20 |
| ... | ... |
| military_pct | -0.27 |
| hispanic_pct | -0.57 |

The strongest predictor of receiving an F is the percentage of economically disadvantaged students (+1.30).
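To show how per-grade coefficients like these can be read off a model, here is a minimal sketch that fits a logistic regression for a single "is this school an A?" target. The data is synthetic, the one-grade-versus-the-rest framing and the standardization step are assumptions on my part, and only two of the feature names from the tables are used.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "economically_disadvantaged_pct": rng.uniform(0, 100, n),
    "military_pct": rng.uniform(0, 10, n),
})

# Synthetic rule: the chance of an A falls as the economically disadvantaged
# share rises; military_pct has no effect on the outcome at all.
p_a = 1 / (1 + np.exp((X["economically_disadvantaged_pct"] - 40) / 10))
is_a = rng.uniform(size=n) < p_a

# Assumption: standardize features so coefficient sizes are comparable.
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression().fit(X_scaled, is_a)

coefs = pd.Series(model.coef_[0], index=X.columns).sort_values(ascending=False)
print(coefs)
```

A negative coefficient means a higher value of that feature lowers the predicted odds of the grade. Comparing magnitudes across features is only meaningful when the features are on a common scale, which is why the sketch standardizes them first.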

Comparing Models

I tested three algorithms to find the best overall predictor:

| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Decision Tree | 28.0% | 28.0% | 28.0% | 28.0% |
| Random Forest | 32.9% | 32.8% | 32.9% | 32.7% |
| Gradient Boosting | 32.5% | 33.4% | 32.5% | 32.6% |

The best accuracy is around 33%, which is only modestly better than uniform random guessing (20% for 5 letter grade categories). This tells us something important: demographics alone do not determine a school's letter grade. There is substantial variation in outcomes among schools with similar demographics.
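A comparison like this can be sketched as follows. Everything here is illustrative: the data comes from scikit-learn's `make_classification`, the hyperparameters are arbitrary, and the majority-class baseline is an addition of mine to give the accuracy numbers a reference point. The use of weighted averaging is also an assumption, although a recall column that matches accuracy is consistent with it, since weighted recall always equals accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 5-class problem standing in for the A-F letter grades.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=4,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Majority-class baseline": DummyClassifier(strategy="most_frequent"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        # Weighted averaging weights each class by its number of true samples.
        "precision": precision_score(y_test, pred, average="weighted", zero_division=0),
        "recall": recall_score(y_test, pred, average="weighted", zero_division=0),
        "f1": f1_score(y_test, pred, average="weighted", zero_division=0),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Scoring every model on the same held-out split keeps the comparison fair, and the baseline row shows how much of the accuracy comes from the features rather than from the class distribution.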

Gradient Boosting Feature Importance

The Gradient Boosting algorithm identified the relative importance of each demographic feature:

| Rank | Feature | Importance |
|---|---|---|
| 1 | economically_disadvantaged_pct | 30.3% |
| 2 | white_pct | 13.6% |
| 3 | students_with_disabilities_pct | 11.9% |
| 4 | african_american_pct | 10.4% |
| 5 | limited_english_proficient_pct | 9.1% |
| 6 | hispanic_pct | 7.4% |
| 7 | black_hispanic_native_american_pct | 7.1% |
| 8 | asian_pct | 4.0% |
| 9 | homeless_pct | 3.3% |
| 10 | military_pct | 2.7% |

[Figure: Feature Importance]

Economically disadvantaged percentage is by far the most important demographic predictor, accounting for about 30% of the model's total feature importance, more than twice the share of the next feature.
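For readers curious where importance percentages come from, here is a minimal sketch using scikit-learn's `feature_importances_` attribute, which reports each feature's share of the total impurity reduction and sums to 1. The data and the three-grade label are synthetic, built so that only the economically disadvantaged percentage matters.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "economically_disadvantaged_pct": rng.uniform(0, 100, n),
    "white_pct": rng.uniform(0, 100, n),
    "homeless_pct": rng.uniform(0, 10, n),
})

# Synthetic label driven only by the economically disadvantaged share plus noise.
noisy_ed = X["economically_disadvantaged_pct"] + rng.normal(0, 10, n)
y = pd.cut(noisy_ed, bins=[-np.inf, 33, 66, np.inf], labels=["A", "C", "F"]).astype(str)

model = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

# Shares of total importance, sorted from most to least important.
importance = pd.Series(model.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
print((importance * 100).round(1))
```

Because the shares are relative, a feature can top the ranking even when the model as a whole predicts poorly. Correlated features (such as overlapping race and ethnicity percentages) can also split importance between them.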

Confusion Matrix

[Figure: Confusion Matrix]

The per-class accuracy (the share of schools with each actual grade that the model predicted correctly) shows the model's limitations:

  • Grade A: 42.6% correct

  • Grade B: 33.3% correct

  • Grade C: 36.0% correct

  • Grade D: 18.6% correct

  • Grade F: 17.6% correct

The model does best at predicting A grades (schools with favorable demographics that earn A's) but struggles with D's and F's.
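Per-class figures like these can be computed from a confusion matrix by dividing each diagonal entry by its row total. The labels below are a tiny made-up example, not the actual predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["A", "B", "C", "D", "F"]
y_true = ["A", "A", "B", "B", "C", "C", "D", "D", "F", "F"]
y_pred = ["A", "B", "B", "B", "C", "A", "C", "D", "D", "D"]

# Rows are actual grades, columns are predicted grades.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Diagonal = correct predictions; row sum = number of schools with that grade.
per_class = cm.diagonal() / cm.sum(axis=1)

for grade, share in zip(labels, per_class):
    print(f"Grade {grade}: {share:.1%} correct")
```

In this toy example the model never predicts F, so Grade F comes out at 0% correct even though overall accuracy is 50%. Looking at each grade separately exposes that kind of gap.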

Neural Network Regression

I also trained a neural network to predict the continuous letter grade score (1-5) rather than the categorical grade. Using demographics only:

  • R-squared: 0.23

[Figure: MLP Regressor]

Demographics alone explain about 23% of the variance in letter grade scores. This leaves 77% unexplained by this model, which suggests that most of what determines a school's grade is not captured by who attends the school.
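R-squared is one minus the ratio of the model's squared error to the squared error of always predicting the mean. The sketch below trains a small scikit-learn `MLPRegressor` on synthetic data and checks that formula against `r2_score`. The network size, scaling step, and data are assumptions for illustration, not the configuration used in this analysis.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 800
X = rng.uniform(0, 100, size=(n, 3))

# Synthetic score on a 1-5 scale: partly driven by the first feature, mostly noise.
y = np.clip(3.5 - 0.015 * X[:, 0] + rng.normal(0, 1.0, n), 1, 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Neural networks train more reliably on standardized inputs.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# R-squared from the library, and the same quantity computed by hand.
r2 = r2_score(y_test, pred)
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(round(r2, 3), round(r2_manual, 3))
```

An R-squared of 0.23 therefore means the model's squared error is about 77% of what you would get by predicting the average score for every school.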

What Does This Mean?

The 33% accuracy and 0.23 R-squared are actually encouraging findings. They mean:

  1. Demographics are associated with outcomes but do not determine them. Schools serving similar populations can and do achieve very different results.

  2. The letter grade system is not simply a proxy for poverty or race. While there are correlations, the majority of variance comes from other factors.

  3. Schools serving high-need populations can beat the odds. As shown in the main analysis post, numerous high-poverty and high-diversity schools earn A grades.

That said, the patterns are undeniable. High-poverty clusters have dramatically fewer A's and dramatically more F's than low-poverty clusters. The percentage of economically disadvantaged students is the single strongest demographic predictor of letter grades. Schools serving high-need populations face steeper challenges.

Conclusions

Using only demographic characteristics of the student population, machine learning models can predict letter grades with about 33% accuracy. This is better than random but far from deterministic. Demographics explain roughly 23% of the variance in letter grade scores.

The strongest demographic predictor is the percentage of economically disadvantaged students. Schools with higher poverty rates are more likely to receive lower grades, but the relationship is not destiny. Many schools beat the demographic odds.

This analysis reinforces a key finding from the main letter grade post: while demographics correlate with outcomes, they do not dictate them. The schools that achieve excellence while serving high-need populations deserve recognition and study.

Disclaimer

Keep in mind that these scores are derived from a single test for each subject. This is not the most accurate measure of a student's knowledge, even if it is what gets used for accountability purposes.

Finally, this or any other analysis is not a substitute for doing the right things for students. Building relationships with students, teaching the things that matter in every subject, and helping students develop a love and desire for learning are always going to produce the best results for students no matter what scoring apparatus is used.

This analysis used Python with pandas, scikit-learn, matplotlib, and seaborn for data processing, machine learning, and visualization. I used Claude to do the coding and file management and to proofread my write-up of it.