TN Letter Grades: An Unsupervised Learning Clustering Approach

One of the more interesting ways to look at data is to use machine learning to sort your data into clusters. The particular tool I used for this is called K-Means Clustering, a popular algorithm in the field of data science for its simplicity and efficiency. But what exactly is K-Means Clustering? At its core, K-Means is a method that aims to partition a dataset into distinct groups (clusters) such that the data points in each group are as similar to each other as possible, while also being as different as possible from the points in other groups.

Why is this approach useful? We often deal with large amounts of data that can seem impenetrable at first glance. By organizing this data into clusters, we can identify patterns and characteristics that are not obvious to us at first. For instance, when we analyze schools across various districts, K-Means Clustering can reveal groupings of schools with similar challenges or successes, helping us to tailor support and resources more effectively.

For this analysis, I wanted to look more closely at Overall Success Rate, Economically Disadvantaged Percentage, and Black/Hispanic/Native American (BHN) percentage using the same dataset I used for my initial Letter Grades report. I reduced the dataset to those three features, and I had to convert the values in the Success Rate column from <5% to 2.5 and >95% to 97.5 and then convert those values to floats.
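Here is a minimal sketch of that preparation step, assuming a pandas dataframe with the hypothetical column names success_rate, ed_pct, and bhn_pct (the actual file's headers may differ):

import pandas as pd

df = pd.read_csv("letter_grades.csv")  # hypothetical file name
df = df[["success_rate", "ed_pct", "bhn_pct"]]

# map the censored values to their midpoints, then convert the whole column to floats
df["success_rate"] = (df["success_rate"]
                      .replace({"<5%": 2.5, ">95%": 97.5})
                      .astype(float))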

Here are the basic descriptive statistics.

Descriptive Statistics

Here is how the data is distributed.

Histograms of each feature

The Success Rate histogram shows a unimodal distribution centered around 30-40%. The distribution is slightly skewed to the right, indicating that while most schools have a success rate in the middle range, there are fewer schools with very high success rates.

The distribution of the percentage of economically disadvantaged students is also unimodal and seems to be slightly skewed to the right. Most schools have between 20% and 40% economically disadvantaged students, with fewer schools having very high or very low percentages.

The BHN histogram is different from the other two, showing a bimodal distribution. One peak is around the 0-10% range, and another, more pronounced peak, is at the 90-100% range. This suggests that schools tend to have either a very low or very high percentage of Black, Hispanic, or Native American students, with fewer schools having a moderate percentage. This histogram supports Kozol’s research that American schools are still segregated.

Next, I wanted to see how each of these values correlated. I created scatterplots and computed Pearson's r to see the relationships between the features.

Scatterplots and Pearson’s r correlation coefficients

No surprise, but the data shows the following:

Success Rate vs. Economically Disadvantaged: The correlation coefficient is -0.72, indicating a strong negative correlation. This means that as the percentage of economically disadvantaged students increases, the overall success rate tends to decrease.

Success Rate vs. BHN: The correlation coefficient is -0.56, suggesting a moderate negative correlation. So, higher percentages of BHN students are associated with lower overall success rates.

Economically Disadvantaged vs. BHN: The correlation coefficient is 0.61, showing a strong positive correlation. This implies that higher percentages of economically disadvantaged students are often found in schools with higher percentages of BHN students.
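For reference, here is a minimal sketch of how those coefficients can be computed, continuing with the dataframe from the sketch above:

import pandas as pd
import matplotlib.pyplot as plt

pd.plotting.scatter_matrix(df, figsize=(8, 8))  # pairwise scatterplots of the three features
plt.show()

print(df.corr(method="pearson"))  # Pearson's r for every pair of features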

The Clustering Model

Before running the model, I scaled the data using the standard scaler. This is crucial for K-Means Clustering because the algorithm is distance-based, so an unscaled feature with a larger range would dominate the clustering. Here is an article about that if you want to read it. And of course, I ran an elbow plot to find the optimal number of clusters.

The elbow plot
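Here is a minimal sketch of the scaling and elbow computation, assuming the three-feature dataframe from above:

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

X = StandardScaler().fit_transform(df)

# inertia (within-cluster sum of squares) for k = 1 through 10
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
            for k in range(1, 11)]

plt.plot(range(1, 11), inertias, marker="o")
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia")
plt.show()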

The elbow plot settled on five clusters. After fitting the model with five clusters, I generated a 3D scatterplot of them just to have a visual of the differences. The red star represents the centroid of each cluster.

3D Scatterplot
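Fitting the final model and profiling the clusters might look like this sketch, under the same assumptions as above (not the exact code I ran):

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)  # X is the scaled matrix from above
df["cluster"] = kmeans.labels_

print(df.groupby("cluster").mean())   # cluster profiles on the original scale
print(df["cluster"].value_counts())   # number of schools per cluster

The groupby means are what feed the cluster profiles described below.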

The clusters that it generated can be described as follows:

  • Cluster 0 (286 schools) has a relatively low overall success rate of about 25%, a moderate percentage of economically disadvantaged students (around 38%), and a very high percentage of Black, Hispanic, or Native American students (approximately 70%).

  • Cluster 1 (187 schools) is characterized by a high overall success rate of around 71%, a low percentage of economically disadvantaged students (about 8%), and a lower percentage of Black, Hispanic, or Native American students (roughly 18%).

  • Cluster 2 (490 schools) features a low-to-moderate overall success rate of about 33%, a moderate percentage of economically disadvantaged students (also around 38%), but a lower percentage of Black, Hispanic, or Native American students (about 12.5%).

  • Cluster 3 (231 schools) has the lowest overall success rate of approximately 15%, the highest percentage of economically disadvantaged students (around 66%), and a very high percentage of Black, Hispanic, or Native American students (nearly 95%).

  • Cluster 4 (476 schools) shows a moderate overall success rate of around 45%, with a lower percentage of economically disadvantaged students (about 21%) and a similar percentage of Black, Hispanic, or Native American students (around 21%).

Here is a bar chart showing the Cluster Profiles.

A bar chart of each cluster

And to illustrate how many schools are represented in each cluster, here is a humble pie chart.

Conclusions

Diversity in School Profiles: The clusters represent a wide range of school profiles, from those with high success rates and low percentages of economically disadvantaged and minority students (Cluster 1) to those facing significant challenges with high percentages of disadvantaged and minority students and low success rates (Cluster 3).

Economic Disadvantage and Success Rates: There appears to be a correlation between economic disadvantage and overall success rates, as seen in the negative correlation coefficients and the cluster characteristics. Schools with a higher percentage of economically disadvantaged students tend to have lower overall success rates (Cluster 0 and Cluster 3).

Racial and Economic Segregation: The bimodal distribution of the percentage of Black, Hispanic, and Native American students indicates potential racial and economic segregation within the school system. Some schools have very high percentages of minority students, while others have very low percentages, with fewer schools in between.

Most Schools Do Not Have High Success Rates: Cluster 1 and Cluster 4 schools have high success rates. Typically, 45% is the bar schools want to reach because that represents maximum points for success rate in the federal accountability model. These clusters represent only 39.6% of all schools, which means that 60.4% of all schools are falling below that mark.

Cluster 4 stands out

Cluster 4 stands out as a cluster with some diversity and a high success rate. Still, its means for Economically Disadvantaged (20.99) and BHN (21.24) are much lower than the overall means for those categories.

What do you see in this data?

What do you tell teachers about AI?

I have spent the past year and change exploring the possibilities, limitations, and risks of Large-Language-Model AI, especially ChatGPT. In my role at work, I haven’t done a lot on it because we don’t have an adopted Board Policy on it, and TSBA hasn’t written a model policy for boards. This leaves us in a weird space where we know that this is out there and people are using it, but we don’t have any guidance or governance for it. I don’t think ignoring it for now is the answer, and I wanted to share what I have communicated so far so that it might be helpful to other districts.


Handbook Policy

You don’t need a board policy to have a handbook policy, so we put this in our high school handbooks at the beginning of the year:


AI Handbook Entry for Academic Integrity and Honesty in the Use of Large Language Models

• Purpose

This handbook entry aims to ensure the upholding of academic integrity and honesty in the context of the use of large language models (LLMs) such as ChatGPT in our school environment.

• Scope

This handbook entry covers all students, staff, and any other individuals who interact with our school's academic programs and services, and who use LLMs for academic purposes.

• Handbook Entry Guidelines

    • Proper Citation: Students must properly acknowledge and cite the use of LLMs in their work. Any idea, phrase, or output generated by AI must be cited just as any other source would be.

    • Original Work: While LLMs can be used for assistance and guidance, the work submitted by students must fundamentally be their own. The use of AI should be to facilitate and enhance the learning process, not to replace individual effort and creativity.

    • Collaboration: While working collaboratively, students must clearly state the contributions made by AI. Collective work should reflect a clear understanding of the contributions made by each student and the AI model used.

    • Access: All students should have equitable access to AI tools to ensure fairness. The school will strive to provide the necessary resources and training for all students.

    • Educator Guidelines: Teachers should educate students about the ethical use of AI and its potential impacts on academic integrity. They should also receive regular training to stay updated on the capabilities and limitations of AI.

• Implementation and Compliance

This handbook entry should be communicated effectively to all relevant parties. The school will conduct regular checks to ensure compliance. Any violation of this handbook entry will be considered a breach of academic integrity, and the school's standard disciplinary measures will be applied.

Simply put, we have let students know ahead of time that they can't use LLMs to produce final products. We've also let teachers know that they need to be teaching students how to use LLMs to their advantage.


Where are we now?

We haven't, to my knowledge, had any issues with students getting caught cheating with LLMs, but that doesn't mean it hasn't happened. In fact, the whole inspiration for me writing this is that a student told me she wouldn't use AI to help her study French because another student in her English class had submitted an AI-written essay and received the same grade she did, and that made her angry.

Because of that conversation, I put together a document for teachers, and I thought I’d share that content here.


So why not just avoid AI for as long as we can?  

• You can tell when AI has written something, and I'm surprised when someone can't. Have you used AI enough to pick up its tone and patterns? It uses too many adverbs. In emails, it always opens with some affectation like “I hope this email finds you well.”

• AI isn't going anywhere. As a matter of fact, it is of lower quality and less integrated today than it will ever be in our students' lives. We have to learn to live with it, and students are going to need to know how to interact with AI now. It can give them a huge advantage in life if used ethically and responsibly.

• Withholding the power of any technology from our students only withholds it from certain students. Typically, only the students who are disadvantaged will not learn to use technology when it is withheld from them in school.

• We can't have students using this technology to cheat, and avoiding teaching them how to use it responsibly will not prevent them from cheating. In fact, letting students know that we are very knowledgeable about it will make them think twice about using it to cheat.

So how should students be learning to use AI?

• Helping them get organized.

• Asking it simple questions and interacting with it. For example, this student is having trouble with conversational French. It can have a conversation with her, and she can practice her French with it. You can't get that anywhere else without a pen pal or a French-speaking friend.

• Asking it to make a study guide.

• Asking it to quiz you on something.

• Asking it to help you with the pre-writing phase of writing.

• Asking it to proofread your paper (that you wrote) and give you feedback on it. You could even ask it to evaluate the paper with a rubric that the teacher gave you.

• Asking it to explain difficult concepts in simple ways.

• And many other ways…

Here are some samples:

Example: Helping them get organized.

Sample Prompt: We’re learning about cellular energy in my high school biology class in Tennessee. Can you help me get organized with an outline? I will keep you posted on what we’re studying in class so you can help me make a study guide.

Example: Asking it to quiz them on something.

Sample Prompt: We’re studying slope in Algebra I in Tennessee, can you give me some quiz questions and tell me how I did?

Example: Asking it to help with the pre-writing phase of writing.

Sample Prompt: I am writing a research paper on Romeo and Juliet and comparing it to other famous family feuds in more recent history. We’re going to the library to do research next week, and I need to get organized. Can you give me a checklist of what I should be searching? Do you know of any feuds I can research?

Example: Asking it to help with brainstorming

Sample Prompt: In US History, our teacher has asked us to explore the causes of war leading up to World War I. We are supposed to represent a country and their point of view. Help us brainstorm some ideas for this. We can’t choose Germany, Britain, France, or the US. We don’t know these other countries as well. What information do you need to help us with this?

Example: Ask it to proofread your paper and give you feedback.

Sample Prompt: I’m writing a paper for my World History class on the Ming Dynasty, but I need someone to proofread it for me. Can you proofread this and give me a list of suggestions for improving it? Please do not rewrite the paper for me; I do not want to get accused of cheating.

How do I stop cheating?

• Consider whether your assignments are easy for students to cheat on using AI.

• Get experienced enough with AI that you can spot how it writes.

• Take a writing sample at the beginning of the year for a comparison.

• Let students know that you won't tolerate them using AI for final products, but you'd love for them to use it for brainstorming, outlining, and pre-writing.

I’d love to have a deeper conversation about this, but I want to be clear that we must tackle this issue head-on, and at some point, we’re all going to have to accept that AI is a technology tool that our students need to know how to use. Just like we teach students to use TI-85 calculators, nail guns, MIG and TIG welders, and 3D printers, we have to expose students to all technologies that will help them be successful in life.

There are many AI tools other than ChatGPT that are meant specifically for the classroom. I’m trying to keep a list of them: https://www.jasonhorne.org/ai-tools

TISA Dashboard

I decided to build a TISA Dashboard to keep track of funding coming in the next year. I wanted to also keep track of numbers that inform the dashboard in order to spot potential data-entry errors.

Click here to access the dashboard.

TN Letter Grades: A Machine Learning Approach

This blog continues the series on TN School Letter Grade data. You can find the first post here. The same dataset applies.

Having looked at some of the preliminary results of statewide school letter-grade data, I wanted to take a machine-learning approach to see the importance of the different features in the data. Typically, this type of approach is used for predictive modeling of a given outcome, but it also gives a lot of insight into the data.

What is Machine Learning?

Machine Learning is an AI approach that allows a computer to learn patterns from data and become more accurate at predicting outcomes without being explicitly programmed to do so. The more data it's exposed to, the better it performs. This is why I chose this approach: it works best on a large set of data, but when applied to an individual school or school system, it can help with goal setting and interventions. If certain challenges are identified as barriers to higher performance, you can design targeted interventions to address them.

The beauty of machine learning in this context is its ability to handle complex, multifaceted data and reveal insights that might not be immediately apparent through traditional analysis. This can lead to more informed decision-making and, ultimately, better educational outcomes for the students in your school system.

Exploring Machine-Learning Methods

In Machine Learning, the process typically involves considering various algorithms and conducting testing to identify the best approach for the data. In this analysis, I began with a logistic regression model, which initially showed promising results with ROC AUC scores for individual letter grades: A = 0.97, B = 0.82, C = 0.84, D = 0.95, F = 0.99 (see the graph below). However, it's important to note that logistic regression is primarily a binary classifier, and it provided different feature scores for each letter grade individually. While these insights were valuable, I sought a more comprehensive model capable of collectively predicting all letter grades, rather than focusing on each one individually.

ROC AUC Scores for Logistic Regression
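For anyone who wants to reproduce this kind of setup, here is a minimal sketch, assuming a feature matrix X and a vector of letter grades y have already been prepared (the names are hypothetical):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_auc_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# one-vs-rest ROC AUC for each letter grade
probs = clf.predict_proba(X_test)
y_bin = label_binarize(y_test, classes=clf.classes_)
for i, grade in enumerate(clf.classes_):
    print(grade, round(roc_auc_score(y_bin[:, i], probs[:, i]), 2))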

Even though I ultimately settled on a different algorithm to look comprehensively at the data, given that the letter grades A and F had such high ROC AUC scores, I thought it would be interesting to look at the coefficients for each of those letter grades.

Keep in mind that a positive coefficient for a feature means that as the value of the feature increases, the probability of the predicted outcome also increases. The opposite is true of negative coefficients: as the value of the feature increases, the probability of the predicted outcome decreases. Also, the magnitude of the coefficients matters. Larger coefficients, whether positive or negative, imply a stronger influence of the corresponding feature on the outcome.

Class 0 (Letter grade of A):
overall_success_rate_all_students: 4.337198187954319
growth_numeracy_score: 2.2796755844815197
growth_literacy_score: 1.685213811073859
growth_social_studies_score: 1.6173875332744483
growth_science_score: 1.5986916515324763
economically_disadvantaged_pct: -1.3345206477116285
limited_english_proficient_pct: -1.1511514311464113
overall_success_rate_ed: 1.0341733483583349
growth_ela_math_score_bhn: 0.8022573064964893
growth_ela_math_score_ed: 0.7036926608187116
growth_ela_math_score_swd: 0.42283743131419366
black_hispanic_native_american_pct: -0.40066877214648894
homeless_pct: -0.4000903731301108
overall_success_rate_el: 0.34087624502095853
military_pct: -0.33136098524287755
overall_success_rate_swd: -0.2607763057899938
african_american_pct: -0.20352688080879153
asian_pct: -0.15963467577880144
multirace_pct: 0.09164345297812754
native_american_pct: 0.08977420050427559
growth_ela_math_score_el: -0.07558107837346213
white_pct: -0.06735814037053171
students_with_disabilities_pct: -0.022687826552016045
male_pct: 0.00627481315433192
migrant_pct: -0.0007504063512127447
Class 4 (Letter grade of F):
overall_success_rate_all_students: -4.321832443606035
overall_success_rate_ed: -2.1215531229812146
growth_science_score: -1.7027841781095818
growth_literacy_score: -1.6491294242431007
growth_numeracy_score: -1.6129511803548933
growth_ela_math_score_bhn: -1.22590960402722
growth_social_studies_score: -1.178614571621578
economically_disadvantaged_pct: 1.1473369934362392
growth_ela_math_score_ed: -0.7225782040444474
limited_english_proficient_pct: 0.4933486617717649
students_with_disabilities_pct: -0.46909851837241345
asian_pct: 0.34961287513000733
homeless_pct: 0.3360394684863425
growth_ela_math_score_swd: -0.3165323640351417
overall_success_rate_swd: 0.3134225518072883
black_hispanic_native_american_pct: 0.28520884553047776
male_pct: -0.13205589211946425
white_pct: -0.12722495610537785
military_pct: -0.09357110355603143
multirace_pct: 0.06537517896094194
growth_ela_math_score_el: 0.060166568067282857
overall_success_rate_el: 0.04748262863384597
african_american_pct: 0.004185611416037893
native_american_pct: 0.0030283931838529874
migrant_pct: -0.0029266625919092625
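Pulling those per-class coefficients out of the fitted model might look like this sketch, assuming feature_names is a list of the column names the model was trained on:

import pandas as pd

coefs = pd.DataFrame(clf.coef_, index=clf.classes_, columns=feature_names)

# class 0 (letter grade of A): sort coefficients by magnitude, largest first
class_a = coefs.iloc[0]
print(class_a.reindex(class_a.abs().sort_values(ascending=False).index))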

As you can see, the best predictors for a letter grade of A are overall_success_rate_all_students (4.3372), growth_numeracy_score (2.2797), growth_literacy_score (1.6852), growth_social_studies_score (1.6174), and growth_science_score (1.5987). Of course, success rate for all students is 50% of the letter grade score, and the overall growth score is 40% of it. It isn't surprising to see these in the top 5.

Looking at negative scores can be telling as well. The negative scores with the greatest magnitude for the letter grade of A were economically_disadvantaged_pct (-1.3345) and limited_english_proficient_pct (-1.1511). This suggests that having a lower percentage of economically disadvantaged students and of students with limited English proficiency was associated with scoring a letter grade of A.

For a letter grade of F, we examined the coefficients to identify the most influential factors. The results shed light on the key determinants of a low letter grade. Just as with the letter grade A analysis, we discovered both positive and negative contributors.

The most influential predictor for a letter grade of F was overall_success_rate_all_students, with a coefficient of -4.3218. This indicates that a low overall success rate for all students strongly correlates with a letter grade of F. Additionally, several growth scores had negative coefficients, including growth_science_score (-1.7028), growth_literacy_score (-1.6491), and growth_numeracy_score (-1.6130). These findings imply that poor performance in these growth areas pushes a school toward an F.

On the other hand, certain factors had positive coefficients for this class, meaning they pushed schools toward an F. For instance, economically_disadvantaged_pct had a positive coefficient of 1.1473, suggesting that a higher percentage of economically disadvantaged students was associated with a greater likelihood of receiving an F. Likewise, limited_english_proficient_pct had a positive coefficient of 0.4933, indicating that a higher percentage of students with limited English proficiency made an F more likely.

I was also curious about false negatives and false positives with logistic regression. The confusion matrix for the logistic regression showed that the model most accurately predicted schools that scored a D at a very high rate (97.5%) compared to the other letter grades. Here's the percentage of correct predictions for each class:

  • Class A: Approximately 81% correct

  • Class B: Roughly 66% correct

  • Class C: About 80% correct

  • Class D: Approximately 97.5% correct, which indicates a high accuracy for this class

  • Class F: Around 61.5% correct

While these are promising results for predicting an A, a C, and a D, the model does not predict the other grades with accuracy, which undermines the entire model as a reliable method for looking at this data. This is why other machine learning approaches needed to be explored.

Logistic Regression Confusion Matrix
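Computing those per-class rates from the confusion matrix might look like this sketch, using the fitted model from above:

import numpy as np
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, clf.predict(X_test))

# percent correct per class: the diagonal divided by each row's total
per_class = cm.diagonal() / cm.sum(axis=1)
print(np.round(per_class * 100, 1))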

Finding a better algorithm

To find an algorithm that would predict the letter grade rather than look at each of them individually, I decided to look at the following algorithms: Decision Tree, Random Forest, Gradient Boosting, and Support Vector Machine. I used the same 70/30 training/testing set that was used for Logistic Regression, and I had it compute the Accuracy, ROC AUC, Precision, Recall, and F1 Score.
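Here is a sketch of how that comparison might be set up, using the same split as above; the averaging method for the multiclass Precision, Recall, and F1 scores is my assumption (weighted averaging), since the exact settings aren't recorded here:

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.metrics import (accuracy_score, roc_auc_score,
                             precision_score, recall_score, f1_score)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Support Vector Machine": SVC(probability=True, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    auc = roc_auc_score(y_test, model.predict_proba(X_test), multi_class="ovr")
    print(name,
          "acc:", round(accuracy_score(y_test, pred), 3),
          "roc_auc:", round(auc, 3),
          "precision:", round(precision_score(y_test, pred, average="weighted"), 3),
          "recall:", round(recall_score(y_test, pred, average="weighted"), 3),
          "f1:", round(f1_score(y_test, pred, average="weighted"), 3))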

Here are the results:

Decision Tree showed an accuracy of 66.5%. The model had a Precision of 66.6%, closely matching its accuracy. Recall and F1 Score were both approximately 66.5%, indicating a balanced performance in terms of precision and recall. The ROC AUC scores were strong across the classes, with the highest being for class 4 (98.9%).

Random Forest performed better in terms of accuracy with 76.6%. Precision was notably higher at 77.3%, with a Recall of 76.6% and an F1 Score close behind at 76.3%. The ROC AUC values mirrored those of the Decision Tree, which suggests consistent performance across different thresholds.

Gradient Boosting edged out with an accuracy of 77.4%, the highest among the tested models. It also had the highest Precision at 77.5% and F1 Score at 77.3%. Recall was in line with accuracy at 77.4%. ROC AUC scores were consistent with the other models.

Support Vector Machine had an accuracy of 74.4%. It achieved a Precision of 75.1% and an F1 Score of 74.1%, with a Recall of 74.4%. ROC AUC scores for this model were also similar to the others.

Overall, Gradient Boosting stood out as the most accurate model for this task. Despite the similarity in ROC AUC scores across the models, I considered the balance between all metrics. Gradient Boosting showed the best balance, with the highest scores in Precision, Recall, and F1 Score, indicating its strength in both classifying correctly and maintaining a balance between false positives and false negatives. This balance is crucial for models where both types of errors carry significant weight, such as predicting school grades.

  • Accuracy: Approximately 77.4%, which signifies the proportion of total correct predictions made out of all predictions.

  • ROC AUC: About 93.9%, reflecting the model's ability to distinguish between the classes across different thresholds.

  • Precision: Roughly 77.5%, indicating the model's accuracy when predicting a positive class.

  • Recall: Also about 77.4%, showing the model's capability to identify all actual positives.

  • F1 Score: Approximately 77.3%, which is a harmonic mean of Precision and Recall, providing a single score to measure the model's accuracy.

Gradient Boosting Results

Using the Gradient Boosting algorithm, I did a chart of the top 10 features.

Gradient Boosting top 10 features

Top 10 Features

  1. Overall Success Rate for All Students (Feature importance: 42.91%): This is the most significant predictor, indicating that the overall success rate is strongly associated with the school's letter grade.

  2. Growth in Numeracy Score (14.49%): The second most important feature, which suggests that improvements in numeracy significantly influence the grade.

  3. Growth in ELA and Math Score for ED (6.98%): The progress in English Language Arts and Mathematics for economically disadvantaged students is also a key indicator.

  4. Growth in Science Score (6.40%): Science score growth is another substantial factor.

  5. Growth in Literacy Score (5.68%): Literacy improvements are crucial, although less so than numeracy.

  6. Growth in ELA and Math Score for BHN (5.43%): The growth in ELA and Math for Black, Hispanic, and Native American students is also a notable predictor.

  7. Growth in Social Studies Score (4.39%): This shows a moderate influence on the grade.

  8. Percentage of Economically Disadvantaged Students (2.76%): While this has a smaller weight, it's still a relevant feature.

  9. Percentage of Students with Disabilities (1.93%): This has a lesser impact but is part of the top 10 features.

  10. Overall Success Rate for ED (1.52%): The overall success rate for economically disadvantaged students rounds out the top 10 features.
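Producing that chart from the fitted model might look like this sketch, under the same assumptions as above:

import pandas as pd
import matplotlib.pyplot as plt

gb = models["Gradient Boosting"]
importances = pd.Series(gb.feature_importances_, index=feature_names)

importances.nlargest(10).sort_values().plot(kind="barh")
plt.xlabel("Feature importance")
plt.show()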

Conclusions

It turned out to be a happy accident that I ran the logistic regression first, because it allowed me to look at the coefficients for individual grades before looking at the features as predictors of any grade. Doing this shows that, beyond the common-sense finding that overall success rate largely determines the letter grade, how ED students and BHN students score on their ELA and math growth measures also influences the grades. Additionally, how students scored in science and social studies, two subjects no longer included in federal accountability, was also important to the overall letter-grade score. It will be interesting to see how these coefficients differ under federal accountability, which takes improvement and subgroup performance into account when assigning a school score.

Disclaimer

Keep in mind that these scores come from literally one test per subject, and both achievement and growth scores were derived from that one test. This is not the most accurate measure of a student's knowledge, even if it is what is used for accountability, a choice made for political expediency rather than data reliability.

Finally, this or any other analysis isn't a substitute for doing the right things for students. Building relationships with students, teaching the things that matter in every subject, and helping students develop a love and desire for learning are always going to produce the best results for students no matter what scoring apparatus is used. As the fox said in The Little Prince: “On ne voit bien qu’avec le coeur; l’essentiel est invisible pour les yeux.” (“One sees clearly only with the heart; what is essential is invisible to the eyes.”)

I had ChatGPT check this for errors and make some writing suggestions.

A preliminary look at TN School Letter Grade Data

Summary

I analyzed the statewide school letter grade data and found that poverty and race significantly impact the letter grades schools receive. The disregard for improvement and the lack of weight given to subgroup performance are evident in the letter grades.

Purpose of this study

Following the release of school letter grades, which I previously discussed here, I delved into the data. This blog is the first in a series where I will cover basics such as distributions, means, and some demographic data. Due to the urgency of disseminating this information, I will rely on basic Python graphs rather than more sophisticated visualizations.

Data Sources

The data files used are available here. I used demographic data from the 2021-2022 academic year as the 2022-2023 data is not yet accessible. The slight variation should not significantly affect the outcomes, but I intend to reanalyze with the new data upon release.

In the state demographic data, categories with less than 5% or more than 95% of students do not disclose actual numbers. For consistency, I substituted these with 2.5% and 97.5%, respectively, even though it may minimally affect correlation calculations.

Out of 1900 schools listed for letter grades, 210 (11.05%) were ineligible. These schools were excluded from this analysis.

Distribution

The letter grades were distributed as follows:

A: 294 (17%), B: 441 (26%), C: 513 (30%), D: 350 (21%), F: 92 (5%)

This distribution resembles a normal curve, skewed slightly with more A's than expected. While a normal curve would place about 68% of the data within one standard deviation of the mean, here we see 77% of schools receiving a B, C, or D grade.

distribution of grades bar chart

The distribution of letter grades across all schools in TN.

What Influences a Grade

I examined the means for each category (Achievement, Growth, and Growth25) for each letter grade to provide insights similar to our understanding of federal accountability.

Grade A:

  • The overall success rate for all students is at an average of 57.34%.

  • The average growth numeracy score stands at 4.45.

  • The average growth literacy score is 4.11.

  • The average letter grade score  is 4.85.

Grade B:

  • The overall success rate for all students averages 43.87%.

  • The average growth numeracy score is 3.72.

  • The average growth literacy score comes in at 3.36.

  • The average letter grade score is 3.90.

Grade C:

  • The overall success rate for all students is around 32.72%.

  • The growth numeracy score averages at 2.94.

  • The growth literacy score averages at 2.92.

  • The average letter grade score is 2.92.

Grade D:

  • The overall success rate for all students averages 22.76%.

  • The average growth numeracy score is 1.88.

  • The average growth literacy score is 2.31.

  • The average letter grade score is 1.95.

Grade F:

  • The overall success rate for all students is the lowest, at an average of 11.67%.

  • The average growth numeracy score is 1.55.

  • The average growth literacy score is 1.91.

  • The average letter grade score is 1.23.

For schools receiving an 'A' on the state report card, the average success rate is 57.34%, indicating that over half of their students achieved proficiency on TN Ready tests. This surpasses the full points threshold for achievement, which is set at 45% for elementary, 42.5% for middle, and 40% for high schools. In stark contrast, schools with an 'F' have only 11.67% of students reaching proficiency. These discrepancies suggest that factors beyond instruction quality, teacher performance, or student effort are at play.

To understand the breadth of these disparities, I analyzed how schools within each Letter Grade Group performed across different categories, all measured on a 5-point scale.

Average Achievement Scores by Letter Grade:

  • A: 4.85

  • B: 3.95

  • C: 2.96

  • D: 2.00

  • F: 1.02

Average Growth Scores by Letter Grade:

  • A: 4.96

  • B: 3.92

  • C: 2.80

  • D: 1.62

  • F: 1.00

Average Growth25 Scores by Letter Grade:

  • A: 4.45

  • B: 3.58

  • C: 3.10

  • D: 2.64

  • F: 3.21

Average Scores by Letter Grade by Category

It is immediately apparent that schools with an 'F' were disproportionately affected by the inability to demonstrate improvement. The Growth25 metric assesses the advancement of the lowest-performing 25% of test-takers, which often reflects the effectiveness of interventions, tutoring, and other targeted efforts aimed at fostering student progress. Although it's encouraging to see this segment of students making gains, such improvements were not factored into the overall grade assessment by the state.

Subgroups

What impact did subgroups have on this data?

Percent of economically disadvantaged students by letter grade.

Economically disadvantaged students' percentage and letter grades are closely related. For instance, "A schools" average 18.34% of economically disadvantaged students, compared to 54.52% in "F schools."

The relationship between these demographics and school grades is pronounced and merits further investigation.

Scatterplot of Economically Disadvantaged Percentage and Letter Grade Score

The correlation between the percentage of economically disadvantaged students and letter grade scores (r = -0.50, p < 0.05) indicates a moderate negative relationship, suggesting that schools with a higher percentage of economically disadvantaged students tend to have lower letter grades. This is statistically significant, with a p-value near zero.
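A minimal sketch of this calculation, assuming the hypothetical column names ed_pct and letter_grade_score:

from scipy.stats import pearsonr

r, p = pearsonr(df["ed_pct"], df["letter_grade_score"])
print(round(r, 2), p)  # returns both the coefficient and its p-value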

Other subgroup data needs to be explored as well. This is what the same reports look like for the BHN (Black, Hispanic, Native American) subgroup.

BHN by letter grade group.

This is even more extreme than the Economically Disadvantaged data in terms of the difference in means. Let’s look at the correlation data.

Scatterplot of BHN versus Letter Grade Score

For the BHN (Black, Hispanic, Native American) subgroup, the correlation (r = -0.37, p < 0.05) also indicates a moderate negative relationship, with race data showing a significant correlation with letter grades.

Finally, what impact did the percentage of Students with Disabilities have on the data?

Students with Disabilities Percentage by Letter Grade Group

It's intriguing to note that schools with 'A' and 'F' grades have almost identical percentages of students with disabilities. This raises questions that I intend to delve into in an upcoming blog post.

Students with Disabilities percentage scatterplot with letter grade score

The percentage of Students with Disabilities (SWD) shows a different trend, with a non-significant correlation (r = -0.09, p > 0.05). This may be due to the fact that this subgroup's score constitutes only 10% of the letter grade score in the Growth25 category.

Conclusions

More detailed analyses will follow, but this preliminary look suggests significant disparities that need to be addressed. The intent is to bring this to light swiftly.

This blog post was edited by ChatGPT.

TN School Letter Grades Explained

Introduction

Whenever you write something, you're always supposed to have an audience in mind. I do not have one in mind for this blog. It can be for educators or laypeople. I just want people to understand what is going on in education in the state of TN, what the implications are for their school, and how to interpret the letter-grade results they are seeing for their school.

I will not debate the merits of this letter-grade system here. It’s already done, and I already spoke publicly at the public forum offered. In addition, I have signed more than one letter regarding these letter grades, and everything we warned about them has already borne fruit in less than 24 hours. The marks are already out here marking. Having seen that happen is what has inspired me to explain how these work and what they mean.

Disclaimer

I am not going to dumb this down. It isn’t that complex, but it’s more complex than, say, basic sports stats. Laypeople shouldn’t have any problems understanding this, but I just want to ward off any criticism that I’m writing some sort of arcane nonsense. If that’s what it is to you, then that’s just what it is.

Also, to be clear, I am writing this in my capacity as a private citizen, and any opinions here are my own and are not representative of the Campbell County School System or of East Tennessee State University.

Something that needs to be said

  • We weren't informed that we would be given these grades until this Fall. This is like a teacher telling students that the way their grade is going to be calculated is totally different from what it was all semester.

  • The state has issued accountability protocols with the accountability metrics. We’re used to having these at the beginning of the school year for that year’s data, not for the previous year’s.

How were we “graded” before?

Accountability, agreed to in our ESSA plan, looked at several different factors, including Achievement, Growth, Chronic Absenteeism, Graduation Rate, Ready Graduate Rate, and English Language Proficiency Assessment. You can see how this accountability is laid out below (source).

Those are the weights and measures. The final score is calculated by weighting all students with 60% and students in subgroups (Black/Hispanic/Native American, Students with Disabilities, Economically Disadvantaged, English Language Learner) 40%. This is what a typical federal heat map looks like.

As you can see, this school scored 3.5. This is high enough to be a reward school in years past (any school with a score greater than 3.2 was designated as a reward school).

You can see they had an “Absolute” and a “Target” score, and the state takes the higher of the two. The TDOE completely did away with this. This is why many schools that are used to having high accountability scores are receiving low grades this year. The goalposts have moved.

How we are “graded” now:

This is a much simpler formula, and that might seem like a positive thing at first glance, but once you dig into how all of these metrics are determined, you really see how different a system this is.

Let’s look first at how achievement scores are determined by federal accountability.

This means that schools with 45% or better of their students scoring proficient on the state test, or meeting their double AMO (twice the amount set for their Annual Measurable Objective), will receive a score of 4 out of 4 for the achievement category. Let's look at how the state changed this for their accountability.

They split it into three grade bands, and they made the elementary and middle achievement thresholds higher, at 49.5% and 45.4% respectively, while the high school threshold is lower, at 40.1%. These are the thresholds for earning maximum points. This raises the question: why didn't they include the pathway for improvement?

Growth

Growth is measured the same way in both accountability systems; however, growth has been interesting in the state of TN since COVID. Because Shelby County and Davidson County had multiple virtual years, growth has really changed for smaller counties. Growth is a metric based on comparing students to their peers: for grades 3-8, it compares how students perform from one year to the next (have fun reading about it here), and for other assessments it compares how students perform against a predicted score. It's hard to show growth when over a hundred thousand students in the pool had artificially low scores during and right after COVID. Can you still show growth despite this? Yes, but you have to really outscore the pre-COVID means. This is why, despite showing tremendous gains in Achievement, Campbell County schools are still having trouble showing growth. Look at the improvement Campbell County has made in Achievement over the past three years. It is counterintuitive that this did not also translate into growth like it would have in pre-COVID years.

Subgroups

The following groups of students are recognized as historically underserved student groups:

  • BHN (Black/Hispanic/Native American)

  • ED (Economically Disadvantaged)

  • SWD (Students with Disabilities)

  • ELL (English Language Learners)

For federal accountability, we were held accountable for how these groups of students performed. It is a huge part of our planning. As you can see below, they’re weighted for 40% of federal accountability.

Schools that have a high percentage of these students have depended on their performance to have a high score for federal accountability. Repeating myself here, but it’s 40% of the school’s accountability score.

For state accountability, this completely changed. Now, these students only count for 10% of the accountability score, and it’s the growth score for only the students who score in the bottom 25%. If that sentence is confusing, you’re not alone. Having groups of students go from 40% of your accountability to 10% is shocking enough, but using a metric like a bottom quartile also makes this group of students fluid and not easily identified for intervention.

CCR (not the band)

College and Career Ready is a new metric that the state is using to see the percentage of students who are taking advantage of Early Postsecondary Opportunities (EPSOs). This is like the federal metric, Ready Graduate, but it differs slightly, and it’s an easier metric to attain. Here is a chart of how they differ. Also included is a metric for TISA graduates, which is something that triggers additional outcome funding, but it isn’t part of the accountability model.

Conclusion

Hopefully this will help you understand how Letter Grade accountability differs from what schools are used to. Let’s keep in mind that all of this is based on some pretty flawed logic. How a student performs on one test on one day is interesting information for a teacher, but it shouldn’t be used to evaluate a teacher or school or district.

If you want to read more about accountability, I suggest this blog post.

If you want to read a Director of School’s thoughts on all that’s happening here, please read this.

Creating GPTs and the future of EdTech

ChatGPT Plus users now have the ability to make their own GPTs. Think of these like personalized ChatGPT bots that have specific parameters. For example, the first one I created is a writing tutor for students in TN. It doesn't take any special coding to build one; you just chat your way through it with ChatGPT. You can upload documents, and that's what I did. I found the latest writing rubrics for TN and the anchor papers that are provided. Because ChatGPT doesn't care about formatting, I wrote a Python script to combine all of those dense PDFs into one RTF file, and I uploaded it.
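That combining step might look like this sketch; the folder name is hypothetical, and it assumes the pypdf library is installed (the escaping below is the bare minimum RTF needs):

from pathlib import Path
from pypdf import PdfReader

chunks = []
for pdf_path in sorted(Path("rubrics").glob("*.pdf")):  # hypothetical folder of rubric PDFs
    reader = PdfReader(pdf_path)
    chunks.extend(page.extract_text() or "" for page in reader.pages)

# escape RTF control characters, then turn newlines into RTF paragraph marks
text = "\n".join(chunks)
for ch, esc in (("\\", "\\\\"), ("{", "\\{"), ("}", "\\}")):
    text = text.replace(ch, esc)
body = text.replace("\n", "\\par ")

# ascii + errors="ignore" drops any characters plain RTF can't carry
Path("combined.rtf").write_text("{\\rtf1\\ansi " + body + "}", encoding="ascii", errors="ignore")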

I know a lot of teachers are going to fear that students will use these programs to cheat, so I told ChatGPT that I didn't want it to create writing for students, only to give feedback on it. Instantly, I built the fence around my GPT that I wanted.

People like me are going to be building a ton of these, and eventually, we will probably have the opportunity to monetize them like YouTube videos or TikToks. And honestly, that's my plan. I want to build great resources that are specific to TN, and if I make some extra dough on that, good for me. Honestly, building them is its own reward for me.

You can keep track of my GPTs here: http://jasonhorne.org/gpt

Josh Heupel's Penalty Rates

Despite Josh Heupel's success as a football coach at both the University of Central Florida (UCF) and the University of Tennessee (UT), and despite perennial gaudy offensive numbers, his teams have been among the most penalized in college football over his six years of coaching. I will look at the extent of this, but I won't get into why it is true; someone else can do that.

I took data from https://www.teamrankings.com/ to look at how Josh Heupel's teams did with penalties at UCF and UT. For comparison, I wanted to see how Gus Malzahn's UCF teams have done over the past three years, and how Jeremy Pruitt's UT teams did in the three years before Heupel became the coach.

Because penalties aren't the complete picture, I also wanted to look at winning percentage and the number of plays per game. Just taking a look at six years of UCF and UT, penalties show no clear relationship with winning percentage for a season.

So why do Josh Heupel's teams commit so many penalties? His teams' penalty rankings seem to be inversely proportional to their plays-per-game rankings. Heupel shows an average ranking of 116.2 (out of 133 teams most years) in penalties per game (a ranking near the bottom meaning more penalties per game). Conversely, he shows an average ranking of 20.7 for plays per game, even holding the number 1 ranking in 2020-2021. Because of this, I wanted to see if there was a relationship between plays per game and penalties per game.

I scraped the penalties-per-game and plays-per-game data from https://www.teamrankings.com/ for all college football teams in both categories. I combined all the data into one data frame that included the average penalties per game, average plays per game, and the rankings for both. To look at the relationship, I wanted a scatterplot of average penalties per game versus average plays per game, along with the Pearson r correlation coefficient.
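A minimal sketch of that workflow, with the caveat that the URLs and column names below are illustrative, not the site's actual ones:

import pandas as pd
from scipy.stats import pearsonr

# read_html returns every table on a page; [0] assumes the stats table comes first
pen = pd.read_html("https://www.teamrankings.com/college-football/stat/penalties-per-game")[0]
plays = pd.read_html("https://www.teamrankings.com/college-football/stat/plays-per-game")[0]

combined = pen.merge(plays, on="Team", suffixes=("_pen", "_plays"))  # "Team" is an assumed key

r, p = pearsonr(combined["2023_pen"], combined["2023_plays"])  # assumed season column names
print(round(r, 2), p)

combined.plot.scatter(x="2023_plays", y="2023_pen")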

Scatterplot of penalties vs. plays

The scatterplot shows that there is a weak relationship between average penalties and average plays per game (r = 0.16, p < 0.05). Even though the relationship is weak, it is statistically significant. Ultimately, the number of penalties is a trend with Heupel-coached teams, and it doesn't appear that he can blame it on the number of plays his teams run per game.

A deeper analysis of Tennessee’s penalties might show where they are happening. Are they happening more frequently on offense or defense? Are certain players or position groups committing these penalties? Against which teams are the most penalties committed: teams like Alabama and Georgia, or is it teams that aren’t as stiff competition?

The evidence suggests that while Heupel's offensive strategy correlates with a high volume of plays, it does not inherently lead to increased penalties, hinting at other factors at play. The nature of the penalties, their timing, and their distribution among players and game situations are dimensions still to be explored. In-depth examination could offer insights into whether the penalties are a byproduct of aggressive play-calling, lack of discipline, or strategic trade-offs deemed acceptable by the coaching staff.

Understanding the subtleties behind these penalties can be crucial for refining practice strategies and in-game decision-making. It can help in developing targeted coaching interventions to mitigate unnecessary losses while maintaining the aggressive edge that characterizes Heupel's approach. For the University of Tennessee, such insights are not just academic; they could be the key to fine-tuning a powerful offensive engine into a more efficient, disciplined unit that capitalizes on its strengths without succumbing to self-inflicted setbacks.

For now, the data provides a starting point for a more nuanced discussion on the interplay between plays and penalties under Heupel's tenure. Further research may illuminate the path to optimizing performance where it matters most — on the field where every play, and every penalty, can alter the course of the game.

Tennessee TN Ready/EOC Rankings and Average Proficiency

I wanted to see where my county ranked annually with other school systems in the state in terms of TN Ready/EOC proficiency. I also wanted to compare our county with other benchmark districts. Click here to access the Looker Studio.

I built this Looker Studio (formerly Google Data Studio) to illustrate this data. While I was at it, I also built some charts to look at proficiency over time.

I pulled the raw data from the state data downloads page here.

I used Python to concatenate the files from 2018 through 2023. I also used Python to clean the data and separate it into two different files (EOC and TN Ready). I then uploaded those files to Google Sheets for my Looker Studio.
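That step might look like this sketch; the file layout and the test-name column are assumptions, since the state's download format isn't reproduced here:

import glob
import pandas as pd

files = sorted(glob.glob("downloads/*.csv"))  # hypothetical folder of the 2018-2023 files
combined = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# split on an assumed test-name column and write the two files
eoc = combined[combined["test"] == "EOC"]
tnready = combined[combined["test"] == "TNReady"]
eoc.to_csv("eoc.csv", index=False)
tnready.to_csv("tnready.csv", index=False)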

Here is the raw TN Ready Data.
Here is the raw EOC Data.

Update: I added a spot for school ranks and average proficiency and included data up to 2022. We still do not have 2023 data for the entire state, and my local data is currently embargoed.

I will update that when it is public.

Building an Artificial Intelligence tool to predict test scores

You can find the tool here.

Using different variables, about how reliably can we predict test scores? For this project, the answer was around 40% (an R-squared of about 0.40).

Project Explanation

This project is focused on using three different datasets to predict the ScaleScore for students on their Tennessee Comprehensive Assessment Program (TCAP) exam. The data sources used for this project include NWEA MAP Data, Attendance Data, and TCAP data from the years 2020-2021 and 2021-2022. NWEA MAP is a benchmarking test series that provides a snapshot of student progress through a grade level and is also used to predict achievement levels. Attendance data shows how many days a student attended school in a year, and includes demographic data that adds additional variables for multiple regression analysis. Finally, TCAP data includes the ScaleScore for students. The data files are merged using the student's state ID as the primary key.

The datasets used for this project include:

  • 2020-2021 Absentee Data: 2021_abs_data.csv

  • 2021-2022 Absentee Data: 2022_abs_data.csv

  • 2020-2021 MAP Test Data: MAP2021W_results.csv

  • 2021-2022 MAP Test Data: MAP2122W_results.csv

  • 2020-2021 TCAP Test Data: TCAP2021_results.csv

  • 2021-2022 TCAP Test Data: TCAP2122_results.csv

This document will cover the following sections:

  1. Project Explanation (this section)

  2. Data Cleaning

  3. Data Visualization

  4. Data Training

  5. Data Predictions

  6. Website Encoding

  7. Reflection

Data Cleaning

When cleaning the data, I opted to do some of it manually in Excel by removing unnecessary columns and creating a new column in the absentee data files for the number of days present. I did this to ensure that any identifying information was removed from the files and to get a more accurate picture of the relationship between attendance and Scale Scores. However, I understand that it's best practice to do all the cleaning in the code, and I did the remaining cleaning in Python.

When I tried to merge the files together, I ran into some challenges because I overcomplicated the process. At first, I thought an outer join would work, then I considered a left join. But both approaches resulted in a lot of NaN values and over 140k rows of data. I eventually realized that I only needed to merge the Attendance, MAP, and TCAP files together after concatenating them, because the year of the test was not important for the final result.
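A minimal sketch of that concatenate-then-merge approach, assuming the state ID column is named state_id (the real column name may differ):

import pandas as pd

att = pd.concat([pd.read_csv("2021_abs_data.csv"),
                 pd.read_csv("2022_abs_data.csv")], ignore_index=True)
map_df = pd.concat([pd.read_csv("MAP2021W_results.csv"),
                    pd.read_csv("MAP2122W_results.csv")], ignore_index=True)
tcap = pd.concat([pd.read_csv("TCAP2021_results.csv"),
                  pd.read_csv("TCAP2122_results.csv")], ignore_index=True)

# inner joins keep only students who appear in all three sources
merged = att.merge(map_df, on="state_id").merge(tcap, on="state_id")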

To prepare the data for analysis, I had to make some changes. First, I converted the Course and ContentAreaCode columns from strings into numeric data. Next, I removed rows for Science and Social Studies because my analysis was only interested in Math and English scores. After that, I realized that I didn't need both the Course and ContentAreaCode columns, so I deleted the latter. Additionally, I coded Math courses as 1 and English courses as 2. However, the Course column didn't show a strong correlation with the target variable, so I ultimately decided to exclude it from the analysis.

To clean the data, I deleted more than 100 columns from the original files using Excel. While best practices suggest working with fewer files and keeping them consistent, for this project, I only needed one-off files. Therefore, it was easier for me to clean the data in Excel than to type it all into Python. Although this method may not be ideal for automation, it suited my project's purposes.

Data Visualization

In the visualization stage of the project, I performed a Pearson correlation to determine the correlation between each variable and the TCAP Scale Score, which is the target variable. Based on the results (see below), only a few variables showed any correlation with the Scale Score. These variables were TestDurationMinutes, TestRITScore, EnrolledGrade, ED, SWD, and n_days_present. However, I excluded TestDurationMinutes as it was not logical to use data from the MAP test for predicting the TCAP score. Ultimately, I focused on RIT Score, Grade Level, the number of days present, the economically disadvantaged status, and the student with disabilities status. I chose to work with these variables as they showed the most significant correlation with the Scale Score.

Course                 0.013974
TestDurationMinutes    0.199303
TestRITScore           0.523997
SchoolNumber          -0.048202
EnrolledGrade         -0.125896
TestGrade             -0.126014
ScaleScore             1.000000
school                -0.067635
n_absences            -0.166537
Black                 -0.037727
Hispanic              -0.029100
Native                 0.029105
BHN                   -0.041469
HPI                   -0.012874
Asian                  0.033111
White                  0.026005
ED                    -0.158510
SWD                   -0.316250
EL                    -0.029312
n_days_present         0.172471
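The listing above is the kind of output you get from a one-liner like this, assuming the merged dataframe from earlier:

print(merged.corr(numeric_only=True)["ScaleScore"])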

After I selected the variables I wanted to focus on, I created scatterplots to visualize the relationships between the data. I specifically focused on the attendance and MAP data, as grade level, SWD, and ED wouldn't be suitable for a scatterplot. The scatterplot for TestRITScore and ScaleScore displayed a clear linear relationship, while the attendance data was more scattered, with more outliers for ScaleScore than RITScore. These visualizations helped me confirm that my data was suitable for analysis and gave me confidence in my approach.

Scatterplot

For the other data, since ED and SWD had binary (0 or 1) choices, and since grade levels are on a 3-8 scale, I chose to use BoxPlots to visualize those. A 0 means that the student isn’t Economically Disadvantaged or a Student with Disabilities.

Economically Disadvantaged BoxPlot

Students with Disabilities BoxPlot

Grade-Level BoxPlot

I also did a heat map that shows how the correlations for each of these compare.

Heat Map

Data Training

To prepare the data for machine learning, I needed to split it into two sets: a training set and a testing set. With 63,655 rows of data, the training set was composed of 20% (12,731 rows), and the testing set of 80% (50,924 rows). I plan to experiment with these numbers to see how different splits affect the results of the machine learning model.

During the process of splitting the data into training and testing sets, I needed to ensure that there were no NaN values present in either set. It took me a while to troubleshoot the issue because a particular variable kept showing NaN and Infinite values, and I couldn't locate them for some reason. Eventually, I realized that I had forgotten to remove the NaN values from the x_train and x_pred variables, even though they were not present in other sets. This was a valuable lesson for me to learn in terms of checking all relevant variables for data inconsistencies.

I ended up running a multiple regression, and here are the results it returned:

Training/Testing split: 20%/80%
Coefficients: [0.26169216, -7.37082399, 1.17469616, -2.25918833, -4.18241256]
Intercept: 70.8358039289374
R-squared: 0.39895715901860773
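A minimal sketch of the model fit behind these numbers, assuming scikit-learn and the split above:

    from sklearn.linear_model import LinearRegression

    model = LinearRegression()
    model.fit(X_train, y_train)

    print(model.coef_)                  # one weight per feature, in feature order
    print(model.intercept_)             # predicted score when every feature is zero
    print(model.score(X_test, y_test))  # R-squared on the held-out data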

The coefficients are the weights the multiple regression model assigned to each variable when trained on the 20% set. In other words, each coefficient indicates how much the predicted target variable (TCAP Scale Score) changes for a one-unit change in that variable, holding the others constant. (Because the variables are on different scales, the raw coefficients aren’t directly comparable as measures of importance.)

In this case, the coefficients are for n_days_present, EnrolledGrade, TestRITScore, ED, and SWD in that order. A positive coefficient indicates that the variable has a positive effect on the target variable, while a negative coefficient indicates that the variable has a negative effect on the target variable.

For example, the coefficient for n_days_present is 0.2617, which means that for each additional day a student is present in school, the predicted TCAP Scale Score increases by 0.2617 points, all else being equal. Similarly, the coefficient for EnrolledGrade is -7.3708, which means that each one-grade increase in a student’s enrolled grade decreases the predicted TCAP Scale Score by 7.3708 points.

The intercept value of 70.8358 represents the predicted TCAP Scale Score when all of the predictor variables in the model are equal to zero.

The R-squared value of 0.3990 indicates that the model explains about 39.9% of the variance in the target variable, meaning it is only moderately accurate at predicting TCAP Scale Scores from the selected variables.

Website Encoding

Although I had prior experience with HTML dating back to the late 1990s, I struggled to implement my project on my website using Squarespace due to my limited coding skills. Consequently, I sought assistance from ChatGPT to create a JavaScript feature that could take input from a form and use the coefficients to generate a predicted score. The feature turned out to be a great addition to my website, and it worked seamlessly.
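For anyone curious what that widget computes, here is the same prediction written as a Python function; the JavaScript version presumably just evaluates this formula with the form inputs:

    def predict_scale_score(n_days_present, enrolled_grade, rit_score, ed, swd):
        # Coefficients and intercept from the regression above.
        # ed and swd are 1 if the status applies, otherwise 0.
        return (
            70.8358039289374
            + 0.26169216 * n_days_present
            - 7.37082399 * enrolled_grade
            + 1.17469616 * rit_score
            - 2.25918833 * ed
            - 4.18241256 * swd
        )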

Reflection

As I reflect upon the completion of this project, I have gained valuable insights in a relatively short span of time about utilizing Python for data analysis and visualization. This experience has been truly enjoyable and has revealed an intriguing parallel that I had never considered before. Just as English is my native language, Microsoft Excel is my primary data language. I tend to visualize all data in Excel sheets in the same way that I perceive foreign languages through the framework of English. When learning Spanish, I contemplate its relation to French and subsequently to English. Even while studying French, despite being proficient enough to think in the language, I occasionally revert to my native tongue during the thought process. This phenomenon is identical to my experience with Excel, which is why I opted to modify my CSV files in Excel prior to working with them in Python. If I had learned Python first, I might have preferred it for handling data. This unexpected mental revelation has left me wondering when I will begin to perceive the world through Python. Experiencing my first Python dream last night was a noteworthy milestone, as dreaming in a foreign language is often an indicator of the formation of appropriate neural pathways in the brain.

I have thoroughly enjoyed this project and eagerly anticipate creating more with an expanded range of variables. This approach can provide substantial insights into the mathematical dynamics of a student cohort, and incorporating additional years and data types will enable us to further train and test the model, ultimately achieving a greater degree of certainty in our predictions. The moderate correlation (r = 0.52) between RIT and TCAP is already promising, indicating the reliability of RIT as a benchmark exam. I am enthusiastic about broadening the scope of this project over time and discovering new possibilities in the realm of school data analysis.

Three Months of ChatGPT Part III: Learning

Of all the uses that ChatGPT has, how it has helped me learn things is my favorite. Currently, I’m working on a second master’s degree in Data Science, and it has proved invaluable in helping me learn.

School Learning

I’m taking a course, CSCI 5260: Artificial Intelligence, which uses the book Artificial Intelligence: A Modern Approach. Because the newest edition of this book was written in 2022, ChatGPT doesn’t know it, but it does know the previous edition. Before I do my reading, I ask it to outline the chapter for me with the key points, and I use this outline to fill in my notes. I’ll show you what this looks like.

Reading Notes

This really prepares me to read the chapter, and it also gives me a guide for taking notes.

Once my professor posts his PowerPoint, I export it to an RTF file and have ChatGPT outline it as well. This gives me something to work from when I’m taking notes during his lecture. By the time the lecture is finished, I have my notes from the book and from him, plus the ChatGPT chapter summary and PowerPoint summary. I put all of that information back into ChatGPT and ask it to make a study guide, which I use on my open-notes quiz. Note that I’m not taking any shortcuts here; I’m only deepening my learning by using ChatGPT to do the part that nobody has time for (making summaries and study guides).

A potential shortcut would be to record the audio of the lecture and use a voice-to-text program to feed the transcript straight into ChatGPT, but I’m not there yet. Those extra steps could keep me from having to take notes at all, but I find that taking notes improves my understanding.

During lectures, if the professor is explaining something that’s out of my depth, I ask ChatGPT to explain it to me. Here is an example from my last class.

Explaining Something

As you can see, I didn’t really understand how these two types of search were related, and ChatGPT explained it to me. So far, I’ve only shown you examples from a graduate-level Computer Science course.

Let’s see how it could help a third-grade student understand fractions.

Explain Equivalent Fractions to a Third Grader

Or if you need help helping them, it can suggest that, too.

Help a Third-Grader

What about high school stuff?

Let’s see how it does with something that nearly everyone has to learn. I’m going to let the whole thing play out.

A freshman reading Romeo and Juliet for the first time could benefit from having ChatGPT within reach rather than just struggling through the text and giving up on it. This is just like CliffsNotes, except you can interact with it, and if it throws a word at you, just ask what it means. Instead of getting a dictionary definition, it will tell you what the word means in context.

Also, you can get loopy and ask it to rewrite the scene as if it’s an episode of Seinfeld or Friends just for fun.

Coding

I’m in two different classes that require me to use Python to do my work. One, CSCI 5010: Programming for Data Analytics, is basically an intro to Python class. I don’t use ChatGPT much for help in it because it’s designed for us to go through several steps to learn how to code with Python. If I used ChatGPT for that, I would never learn the basic code, and it would really stunt my growth in Python. The usefulness of ChatGPT depends heavily on the user’s level of knowledge: it becomes increasingly powerful for those who are knowledgeable and stays relatively useless for those who lack knowledge. I won’t be able to use it for advanced coding if I make it do my simple coding, too. That disclaimer given, the coding required in CSCI 5260 is much more complex than the coding in 5010, and I have used ChatGPT several times to explain what’s going on in the code so that I can learn more and understand it. Here is an example:

Analyzing Code

And if your code gets an error message, you can ask it to explain the error message to you, and it will even suggest code to fix it.

ChatGPT Fixing a Problem

It isn’t limited to Python, of course. You can have it write in other languages, which could be very helpful with HTML. For example, if I wanted some HTML to insert into my site that displays “LOOK AT THIS HTML” in big letters using a cursive font, I could just ask ChatGPT how to do that.

LOOK AT THIS HTML

If only we had this back in the MySpace days.

For the uninitiated, the way everyone has fixed their code for the past two decades is to go to sites like GitHub or Stack Overflow, or to just Google it. That would rarely produce a 1:1 fix for your code; you would have to extrapolate how to fix your own code from someone else’s code fixing something adjacent. And while that was always a pretty decent learning experience, it just doesn’t compare to having ChatGPT look at your code for you. The more you talk to it about what you’re trying to accomplish, the more precise it gets with your code.

Laws and Policies

I would not use ChatGPT in place of hiring an attorney, but sometimes you just need to understand what a law means. Let’s say you’re watching TV and someone pleads the Fifth. Let’s see what ChatGPT says about that.

I plead the fifth

Maybe you want to figure out how that got into the Bill of Rights in the first place.

Still Pleading the Fifth

Let’s dig even deeper. Let’s say you’re a big Hamilton fan and want to see what Hamilton wrote about the Fifth Amendment in the Federalist Papers.

Hamilton Pleads the Fifth

I wonder if there are any exceptions to the Fifth Amendment.

Exceptions to the Fifth Amendment

This all happens in one conversation that takes a minute or two. And if you check it against the primary source or other sources, you can see that it stands up. I like how it includes quotes from the sources if you ask; for example, if you ask about the Federalist Papers, it will quote them directly. Still, I would be cautious about using ChatGPT as a source, and I definitely wouldn’t ask it to tell you how to cite something. I asked it to cite the Federalist Papers in APA format, and here’s what happened. I had to correct it.

APA Oops

Suggestions for Students

  • Understand that this is still in its infancy, but it is already a powerful tool that you need to know how to use.

  • Don’t trust it as a source any more than you would anything else, but it is fairly reliable, especially if you focus it on the specific information you want.

  • Try to get class materials into copyable text that you can paste in for it to organize and analyze.

  • Tell it the name of your textbook and have it create guides for each chapter to make it easier to take notes on your reading.

  • Have it help you make study guides.

  • Feed your own typed notes into it and have it organize them for you.

  • Have it create a quiz for you and grade your responses. This works best if you give it one response at a time.

  • If your professor provides you with a rubric, you can feed it into ChatGPT, then paste in your paper and have it evaluated against the rubric. Ask it to give you feedback. It will be different from a human’s, but it will still be insightful.

  • Have it help you write emails to your teachers or professors that sound professional.

  • Have it proofread your papers for you. If you have it write a paper for you, it will likely get caught by an AI detector. Just tell it to show you where your mistakes are and to put them in a table (it’s easier to keep track of them there).

  • Tell it to explain something complex to you like you’re ten years old if you don’t really understand it.

  • If you don’t understand something, tell it what you don’t understand about it.

  • Ask it to explain books to you and why they’re important.

  • Pretend it’s a friend who read the same book as you and talk to it like you’re hanging out discussing the book. If its tone is too impersonal, tell it to pretend it’s your friend.

  • If you are given an open-ended assignment, have it give you ideas on what you could do. (Think sixth-grade science fair.)

  • Have it help you brainstorm during the writing process.

  • Free write ideas and have it organize them for you.

That’s enough homework for now.

Three Months of ChatGPT Part II: Teaching

Generally, I teach two different classes: MEDA 3570 - Educational Technology for Pre-Service Teachers, and ELPA 5400/6400 - Developing Learners and Instructional Leaders for future PreK-12 Administrators. I want to write about how ChatGPT has impacted my teaching this semester.

Planning

In my MEDA 3570 class, I redid my projects with the help of ChatGPT, which not only helped me organize the instructions better than they were but also helped me develop a rubric to grade the projects. I could have done this by myself, but it was good having ChatGPT to think it through with.

What was really great is that I had assignments from another class that I fed into ChatGPT and said, “make my assignments more like these.” I have found that ChatGPT is best when it’s improving something, not creating from nothing. Teachers who want to enhance existing materials will find it more useful, with fewer obstacles, than those who prefer to create from scratch.

Collecting Qualitative Data

I really like keeping a log of all my feedback to students on an assignment, then having ChatGPT look through that feedback for commonalities. I think of it like the item analysis you would run on an assessment, but instead of looking at who got #1 or #12 wrong, I’m looking at what I’ve said over and over to students. This frees me to give feedback in a loose, casual way, because ChatGPT will see through that and just tell me what I’ve had to say to students. It’s really worth trying, especially if you’re teaching something like writing or art that has an element of subjectivity to it.

Getting to Know Your Students as Learners

I also use this as a way to get to know my students better. In my MEDA 3570 class, I always give a beginning-of-the-year survey to get to know them. As in my previous blog post, I take the open-ended answers and feed them into ChatGPT for analysis, and it gives me a summary of my class. The insights gained from that analysis can be invaluable, and they can be compared across different classes and semesters, providing a useful tool for tracking progress and identifying areas for improvement.

In my ELPA 5400/6400 class, part of the program is to use StrengthsFinder 2.0 for students to identify their strengths. It’s always interesting to look at a matrix of the strengths of a class. It gives each student five strengths, so there is a lot of information to process, and it’s a good tool for putting together groups, but I wanted to see how ChatGPT would handle these. I didn’t go too deep with it, but I did have it make a table of the strengths and then summarize them. This is what it returned:

“This information can give you a general understanding of the dominant strengths of the class and can inform your teaching and learning activities, such as creating group projects that utilize the strengths of individual students or creating a positive and supportive learning environment that plays to the strengths of the group.

 Based on the strengths data provided, this class seems to be a group of students with diverse skills and abilities. The strengths of the students include a mix of interpersonal skills (such as Relator, Woo, and Connectedness), analytical skills (such as Analytical, Input, and Intellection), leadership skills (such as Achiever, Activator, and Responsibility), and strategic skills (such as Strategic, Ideation, and Futuristic). Additionally, the class includes individuals who are adaptable (Adaptability), positive (Positivity), and self-assured (Self-Assurance).

Overall, this group of students seems to have a good balance of different strengths, which could make for a collaborative and dynamic learning environment. The presence of different skills could also allow for students to work together on projects, leveraging their diverse strengths to achieve their goals.”

While these applications of ChatGPT may not seem groundbreaking, they provide a starting point for its potential in the K-12 space. By inputting data such as learning styles and readiness levels, ChatGPT could be used to create small groups and perform other analyses to save teachers time. The true power of ChatGPT lies not just in what it can do with one simple query and answer, but in what it can accomplish during a lengthy conversation.

The Administrative Part of Teaching

Whether you’re an adjunct, a full professor, or a K-12 teacher, you’re going to get emails or phone calls from upset students, parents, or both. It’s just part of the job. ChatGPT can really help you respond to these in a balanced and professional way.

I’m going to feed this email into ChatGPT, and let’s see how it suggests I respond to it. Here is the email:

“Dr. Horne,

I’m very upset that my son made a 55 in your class. He has done all of his work and he’s a straight-A student. You gave him zeros those tests from when we was at Disneyland for a week and you wouldn’t come to school at 6:30am for him to make them up. You need to give my son and A. He’s a straight A student! I want to know what you’re going to do about this!

Upset Parent”

Sample Email

This type of tool is very helpful because it doesn’t do what a template does. A template doesn’t react to a specific situation. And honestly, this is one of those emails that, prior to ChatGPT, I would have typed 20 times until all of my anger and frustration were gone from it.

Coming at this from a different angle, if you’re writing an email and need it proofed or for someone to look it over, ChatGPT can also do that for you more quickly and more precisely than another human can. Also, it keeps you from having to bother a co-worker.

Letters of Recommendation

I encourage you to be careful when using it to write letters of recommendation because they’re all going to end up sounding the same. I’m joking. They all sound the same already. Here is a good process to follow when using it to write letters of recommendation:

  • Input the student’s CV, GPA, ACT/SAT score, etc. Don’t put the student’s name or your name because you don’t want that in the deep recesses of an AI.

  • Have the student send you their letter of intent or whatever they had to write, and input that as well.

  • Tell ChatGPT to write a letter of recommendation for whatever it is, but leave room for you to add personal comments at the end.

  • This will save you a lot of time, and it will give you something better than your usual letter of recommendation: personalized for the student, with all the information you normally include, plus room for your personal comments. How you add those is up to you, but this can help you do it quickly and effectively.

Here is an example. I’m going to input my own CV and tell ChatGPT that I’m writing a letter for a colleague who is applying to be Headmaster of the Van Buren School for Boys.

Sample Letter of Recommendation

I then asked it to recommend the same person for an Associate Professor position at the University of American Samoa. Look at how that letter differs from the first.

Sample Recommendation Letter

Thought Partner

One of my favorite uses of ChatGPT is as a thought partner. This does not replace good colleagues with diverse experiences, but it does provide someone with whom to talk and think when you don’t have a colleague available, and it also provides a different perspective than colleagues, all of whom have their own biases and quirks.

Here is an example:

Many of my students were not following my instructions, which I thought were very clear. I asked ChatGPT about it, pasting in the instructions I had given, and it gave me suggestions for improving them.

Instructions Suggestions

And in its previous response, it gave some advice on what to think about when students are not following your instructions.

ChatGPT Instructions Advice

I know that isn’t exactly mind-blowing, but again, it’s just a start. It’s a simple thing that every teacher encounters, and it’s a quick and easy way to think through it.

What to try

Here are some things I would definitely try in ChatGPT to see how they work out for you.

  • Feed it your Evaluation rubric scores and see what feedback it gives you.

  • Ask it to help you with an issue you’ve been having in your classroom.

  • Feed your student feedback comments and scores into it and see what it tells you.

  • Get advice on helping a student with a specific disability (be careful not to give it confidential information like a student’s name).

  • Ask it to help you write a lesson on something you don’t enjoy teaching. Ask it to help you find a way to really enjoy teaching it.

  • Ask it to help you write a lesson on something your students really dislike learning. Ask it to help you find a way to get them to really enjoy it.

  • Ask it to analyze your syllabus.

  • Ask it to make suggestions for your students on how to be successful in your class after you’ve had it analyze your syllabus.

  • Input your writing rubric into ChatGPT, then have it analyze student writing samples. See if it scores them any differently than you do. How does its feedback differ from yours?

  • Have it analyze your classroom procedures and rules for clarity and fairness.

  • Have it analyze a dress code for sexism.

  • Have it analyze a student handbook for clarity and bias.

  • Have it write a letter to a parent about a student who is failing your class.

  • Using a student’s CV and other information, have it write a letter of recommendation.

  • Using your CV, have it write a letter of intent for you.

  • Using your CV, have it write a short biography for you.

There is always so much more to say about this tool, but this should be a good starting place for anyone.

Three Months of ChatGPT Part I: Work

Admittedly, I’m not going to use ChatGPT to write this. One reason is that I need as many samples of my own writing as I can get in order to train it to write like me; the other is that I think that would be a little too meta and beyond good taste.

I have been using ChatGPT for three months now, and I want to talk about how it has transformed my work, my education, my teaching, and my life. It really serves several different roles for me, and it’s worth it to document them here.

Work

In my role as a central office administrator, I do a lot of reading and a lot of writing. The writing typically involves communication, interpretation, and extrapolation. I have found that ChatGPT can do a lot of the hard work of my job: comparing a situation to policy, comparing an updated policy to an old one, and pulling key details from a text or an email.

The most powerful aspect of this for me so far has been the work it did on the qualitative section of a survey that I sent out. Instead of having to pore over hundreds of qualitative responses and make meaning of them myself, I had ChatGPT categorize them for me. Of course, I didn’t just blindly copy and paste; everything I do with ChatGPT starts off as a conversation. I call him “Greg” in my head. For this situation in particular, I started off with why I was doing the survey, what the questions were, and what I hoped to learn from it. I didn’t need Greg for the quantitative part, but Greg made the qualitative part happen in a matter of minutes. This saved me hours of work, and it did the job more precisely than I would have, given the low priority of this work and the time constraints.

Here is what this looks like. First, here is what a typical Microsoft (or Google) Forms spreadsheet looks like:

Microsoft Forms Excel Export

Taking the information from this spreadsheet and turning it into data is easy for the quantitative parts; in fact, Microsoft Forms does that for you anyway. The qualitative parts aren’t quite as easy. In real research, a sophisticated coding process is typically used for this. I haven’t tried to use ChatGPT for that, and what follows is a very watered-down version of it: I took all the qualitative responses to an individual question and had Greg look at them.

Here is what happened when I gave this input: “In a survey given to teachers about a new grading scale, they were asked. Have you noticed any positive or negative effects on student motivation since implementing the new grading scale? (open-ended) Please categorize their answers and quantify the results,” and then I pasted the text from the column.

Greg replied, “Categorized answers and their corresponding quantities:

Positive Effects: 10
Negative Effects: 4
Mixed/Neutral Effects: 10
Not Applicable/Not Noticed: 5

The most common type of answer is mixed/neutral effects, with 10 responses falling under this category. Meanwhile, positive effects come in second place with 10 responses, and negative effects come in third with 4 responses. There are also 5 responses indicating that the respondent did not notice any effects or that the question was not applicable to their situation.”

Then I asked, “Can you make me a list of the responses in these categories in a table?” Here is what it gave:

Now, imagine doing this for ten questions like this and then having Greg write a summary. It gave this summary:

“Key Themes in the qualitative responses:

  1. Concern about the long-term effects of the new grading scale: Some respondents expressed concern about the potential long-term consequences of the new grading scale, such as lower expectations for students and a false sense of accomplishment.

  2. Mixed feelings about the benefits of the new grading scale: Some respondents saw the new grading scale as a positive development that gave students more opportunities to succeed, while others were worried that it could lower standards and expectations.

  3. Importance of setting high expectations and holding students accountable: Some respondents emphasized the importance of setting high expectations for students and holding them accountable for their work, even if the grading scale is more lenient.

  4. Differences in opinion about the appropriateness of the new grading scale: There were differing opinions about whether the new grading scale was a good idea or not, with some respondents expressing support and others expressing concern.

  5. Need for clear communication about the grading scale: Some respondents mentioned the importance of clearly communicating the new grading scale to students, parents, and other teachers.

  6. Concern about the impact on college readiness: Some respondents were worried that the new grading scale could give students a false sense of their abilities and preparedness for college.”

This is just one example of many, but this type of analysis work is the real time-saver with Greg. I know a lot of folks use it for email, but I do not like for Greg to write my emails for me, generally, unless they are very low-stakes. I would never send out something heartfelt and personal that was generated by AI. That’s how you wind up in a situation like this. However, some emails that are purely professional with no emotional stakes are perfectly fine for Greg. With high-stakes emails, I like for Greg to look over them for me and to look for bias, unprofessional language, and of course, for any typos. It’s hard to proof your own work, and instead of tying up a colleague to do it, now I have Greg for that (that’s a really poorly-worded sentence, and I wish I hadn’t already boxed myself in by saying I wouldn’t have Greg look at it).

I’ve also noticed when someone “GPTs me” at work. Like, seriously? You had GPT write that to me? It makes me feel some kind of way, but I’m not sure which kind of way yet. I’m mindful of that when I’m using it for email.

Greg has saved me countless hours at work. I started keeping track of everything I used it for, but it became way too much for me to keep up with. This is what I had before I quit keeping track of it.

In my next installment in this series, I’ll talk about how I use it to enhance my teaching as an adjunct.

ChatGPT Discussion Question

This is the discussion question I’m going to post for my undergrad Educational Technology class.

How do you think AI will change education in the future? Play around with ChatGPT. Here are some things to get it to do for you: 

  1. Have it write a lesson plan for something you think you might teach.

    1. Keep chatting with it to refine the lesson plan.

  2. Paste your biography from the introduction page from your website and get it to rewrite it for you.

    1. Do you like its writing better than yours? 

  3. Ask it to write a study guide for the Google Certified Educator Exam. 

  4. Ask it to explain a concept that you plan on teaching and ask it to explain it on the level you plan on teaching (for example: "explain mitochondria as if I were a high school freshman."). 

  5. Ask it to rewrite one of your tool descriptions in instructional tools. 

  6. Ask it to write descriptions for all of your tools in the Assessment/Management section. 

Let me know how this goes for you. How is this going to change education? Please don't give solely positive or negative views.