Sung-Woo Cho, Ph.D. (swcho@uoregon.edu) & Nathan Greenstein (ngreenst@uoregon.edu), Academic Data Analytics, University of Oregon
Introduction
Research shows that timely interventions can help large public institutions like the University of Oregon (UO) improve student success and achieve more equitable outcomes. At their best, such interventions are provided directly to the students who need them most, and are proactive rather than reactive (Astin, 1984; Kuh et al., 2011; Tinto, 1987). Taken together, these points present a challenge: it is often difficult to know which students are likely to struggle before they are struggling. This is especially true of student outcomes like retention, which are driven by a host of academic, cultural, and financial factors. To address this challenge, the Academic Data Analytics Unit (ADA) partnered with the Division of Undergraduate Education and Student Success (UESS) to launch this project in the fall of 2021. Together, we set out to use artificial intelligence (AI) to predict which first-year students are at risk of not returning for their second term, and to present our predictions before these students matriculate, allowing UESS ample time to offer advising support proactively. For this effort, we used AI in the form of machine learning classifiers to predict persistence as a yes/no outcome using the information available. This form of AI is different from the generative AI that has captured widespread attention this year (e.g., ChatGPT), but it shares the same foundation of using historical information, trained algorithms, and substantial computational power to predict outcomes.
This project has two goals: to serve students by facilitating timely intervention, and to pilot a process of transparent, responsible AI performed by UO’s own subject matter experts. Our model is designed to be used each summer to identify incoming students who we predict are the least likely to return for their second academic term, based on available data and historical cohorts’ information. During the first weeks of fall, UESS and other advisors use this resource to focus their outreach such that students with higher predicted risk receive the support they need to remain enrolled and succeed academically. While developing the model, ADA worked closely with UESS to prioritize transparency and equity. To ensure that the model combats existing inequities instead of reinforcing them, we included a range of voices when making key decisions, and we carefully tested our model for bias.
The model we developed performs several times better than non-AI alternatives, across multiple measures of performance. This essay discusses the motivation and context for this project, our team’s approach to AI, and the performance and fairness of the model.
Motivation & Context
Higher education administrators know that early intervention can improve student success and equity, but logistical realities force institutions to prioritize which students receive a given intervention earliest, or at all. Similarly, the literature suggests that negative outcomes experienced at the beginning of a student’s college career can disrupt their progress in ways that are difficult to recover from. By the same token, interventions generally work best when they prevent a negative outcome, rather than react to it (Astin, 1984; Kuh et al., 2011; Tinto, 1987). However, constraints on resources like staff, facility space, and funding mean that not every student can receive every intervention, and not all students can receive any given intervention at the same moment in time. These facts make it important to have timely insight into which students are at greatest risk for negative outcomes, especially since the burden of such outcomes tends to fall disproportionately on the most vulnerable students, compounding existing inequities (Swail et al., 2003).
One such outcome is non-retention, which is damaging to both individual students and their institutions. At the UO, UESS strives to improve retention by offering outreach to our first-year students early during fall term. However, because there are naturally far fewer advisors than students, not all students can meet with their advisors within the first few weeks of the fall term. Thus, to have the greatest possible impact, UESS sought a way to identify students at highest risk for negative outcomes so that it could offer them prioritized outreach for early advising. Our team set out to meet this need.
Before this project, UESS prioritized students with low predicted first-year GPAs. These predictions came from a linear regression model using eleven variables, with high school GPA as the main driver among them. Although UESS hoped to instead prioritize their early advising intervention based on winter retention likelihood, the existing model could not predict retention accurately enough to allow this. This limitation is likely driven by several factors. Most simply, winter non-retention is rare at the University of Oregon: on average, only 4% of first-year students were not retained over the past ten years. Moreover, retention is a relatively erratic outcome, driven by a blend of academic, social, cultural, familial, and financial factors unique to each student’s experience (Ziskin et al., 2009). These difficulties were compounded by the fact that predictions were needed before students matriculate, when only limited data is available, to support the accelerated timeline that is essential to this intervention.
This challenge – to predict a mercurial outcome using only the data available before students’ college careers have begun – struck our team as one that might be met through AI. However, some stakeholders maintained healthy skepticism towards AI, especially regarding transparency and equity. Such concerns are understandable given the past shortcomings of high-profile projects by private companies, but we believe that such issues are consequences of faulty processes, not inherent to AI. Like many tools, AI can do good or harm, depending on how it is wielded. We set out to wield it for good by meeting the need for challenging predictions while piloting a process that is transparent, inclusive, and fair.
A Process for Responsible AI
Past applications of AI, both within the UO and beyond, have aroused reasonable misgivings among those concerned with accountability and student equity. One such application was a tool to help predict student risk that was produced for the UO by a third-party vendor. Due to its proprietary nature, this product offered sparse visibility into how it was developed, which variables it considered, and how well it performed for students belonging to vulnerable groups. This lack of transparency undermined trust in the model’s ability to support equitable advising practices. More broadly, it has come to light that certain AI solutions serve some groups better than others (Hardesty, 2018), or worse, exacerbate systemic inequities (Angwin et al., 2016; Dastin, 2018; Feathers, 2021). Those aware of such issues might reasonably worry about the unintended consequences of a new AI initiative. We applaud this regard for the wellbeing of potentially vulnerable students, and we set out to complete this work in a manner that places concerns like these front and center.
Since the earliest stages of the predictive modeling process, our ADA unit has collaborated closely with UESS, demonstrating a major advantage of performing AI work in-house. An AI solution is most valuable when designed around the needs of those who will use it, and we believe that it is only truly successful if it inspires trust in users and other stakeholders – namely advisors and students, in this case. To that end, ADA and UESS convened frequently, and each group openly asked questions and offered feedback to the other. As decision points arose, ADA outlined technical considerations and trade-offs, and UESS responded based on their knowledge of student and advisor needs. This allowed us to answer fundamental questions, along with myriad smaller ones, with full transparency and the trust that it fosters.
We applied this inclusive and transparent approach through each stage of model development. To gather data, we worked with numerous offices within the UO. Throughout, UESS reviewed the data we selected to ensure that it was suitable from a student equity perspective and would not undermine the trust and privacy of students and advisors. To reduce administrative burden and the potential for error, we developed infrastructure to automatically import and process approved data elements in a data lake hosted in Microsoft’s Azure cloud platform, depicted below in Figure 1. Instead of relying on manual pulls of static data, we moved to a more automated workflow in which data are processed and analyzed as continuous flows.
Figure 1: Sample Data Pipeline in UO’s Microsoft Azure Data Lake
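To make the flow-based processing concrete, the following is a minimal Python sketch of a single ingestion step, assuming the azure-storage-file-datalake and pandas packages. The container names, file paths, environment variable names, and cleaning step are hypothetical placeholders for the actual pipeline configuration, which orchestrates many such steps inside the data lake.

```python
"""Minimal sketch of one automated ingestion step for an approved data element.

Illustrative only: container names, paths, and environment variables are hypothetical,
and the real pipeline chains many such steps rather than relying on manual data pulls.
"""
import io
import os

import pandas as pd
from azure.storage.filedatalake import DataLakeServiceClient

# Connect to the data lake; credentials come from the environment, never hard-coded.
service = DataLakeServiceClient(
    account_url=os.environ["ADLS_ACCOUNT_URL"],      # hypothetical, e.g. https://<account>.dfs.core.windows.net
    credential=os.environ["ADLS_ACCOUNT_KEY"],        # hypothetical secret name
)
raw = service.get_file_system_client("raw")            # landing zone for approved extracts
processed = service.get_file_system_client("processed")  # curated zone used for modeling


def ingest(element_path: str) -> pd.DataFrame:
    """Download one approved data element, apply light cleaning, and return it."""
    downloaded = raw.get_file_client(element_path).download_file().readall()
    df = pd.read_csv(io.BytesIO(downloaded))
    # Example processing step: normalize column names before joining with other elements.
    df.columns = [c.strip().lower() for c in df.columns]
    return df


def publish(df: pd.DataFrame, element_path: str) -> None:
    """Write the processed element back to the curated zone as part of the automated flow."""
    buffer = df.to_csv(index=False).encode("utf-8")
    processed.get_file_client(element_path).upload_data(buffer, overwrite=True)
```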
Finally, as our model began to produce results, ADA and UESS worked together to ensure that it was fair and equitable. Our unit outlined theoretical approaches to defining and measuring bias, and UESS helped select the approaches best suited to our particular context. This stage also relied on input from faculty in the philosophy department who study data ethics, and, crucially, members of the student body. After we arrived at a shared understanding of what a suitable model should look like, our team evaluated the model against the agreed-upon criteria for fairness and equity. The model was then put into action only after ADA and UESS were satisfied that it met or exceeded all criteria. This process is covered in greater depth below.
Model Performance
Depending on how performance is measured, our model predicts winter term retention up to five times as well as alternatives, and it performs especially well for potentially vulnerable students. The model leverages 80 variables from university data and 33 variables from public data, captured over the 12 years represented by the 2010 through 2021 student cohorts.
To evaluate the model’s performance, we compared its predicted outcomes against actual outcomes within the 2021 student cohort, which was withheld from the model during development.[1] This is designed to simulate the model’s use with incoming student cohorts, whose data it had never seen before. In discussing the model’s performance, we present two metrics: (a) the proportion of true non-returners found among students identified by the model vs. the proportion among those not identified, and (b) the proportion of all non-returners that are identified by the model. We contextualize these results by comparing them to the same metrics as taken from two hypothetical alternative approaches, one based on low high school GPA (the main input to the previous prioritization scheme), and one based on a random lottery.
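Before turning to the results, the two metrics can be summarized in a short sketch. This is illustrative only: the column names `flagged` and `non_returner` are hypothetical stand-ins for whatever the actual evaluation code uses.

```python
"""Sketch of the two evaluation metrics described above, computed on a held-out cohort.

Assumes a dataframe with boolean columns `flagged` (the model identified the student)
and `non_returner` (the student did not return for winter term); names are hypothetical.
"""
import pandas as pd


def evaluate(holdout: pd.DataFrame) -> dict:
    flagged = holdout[holdout["flagged"]]
    not_flagged = holdout[~holdout["flagged"]]

    # Metric (a): proportion of true non-returners inside vs. outside the flagged group.
    rate_flagged = flagged["non_returner"].mean()
    rate_not_flagged = not_flagged["non_returner"].mean()

    # Metric (b): share of all non-returners that the model identifies.
    share_identified = flagged["non_returner"].sum() / holdout["non_returner"].sum()

    return {
        "non_returner_rate_flagged": rate_flagged,
        "non_returner_rate_not_flagged": rate_not_flagged,
        "lift_over_unflagged_group": rate_flagged / rate_not_flagged,
        "share_of_non_returners_identified": share_identified,
    }
```

The same function can be applied to the two comparison strategies by replacing the `flagged` column with a selection based on low high school GPA or a random lottery of equal size.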
Among the 2021 cohort – the one withheld from the model during training – non-returners made up 14% of the group of students selected by the model (i.e., 64 among 450), whereas non-returners make up 3% of the group not selected. In other words, the model flags a group of students that is 4.6 times as rich in non-returners than the group it does not flag. For comparison, a hypothetical approach based directly on high school GPA provides no appreciable benefit, yielding two groups with non-returners making up 4% of each. An approach based on a random lottery does the same. This can also be considered from an advisor’s perspective: when meeting with students identified by our team’s AI model, an advisor would expect to encounter a non-returner once for every six students seen. When meeting with students identified by high school GPA or random lottery, an advisor would expect to encounter a non-returner once for every 25 students seen. These results are visualized in Figure 2.
Figure 2: Model Performance, Dot Visualization
Put differently, among the 2021 cohort, the model identifies 35% of all non-returners. Focusing specifically on potentially vulnerable students – simplistically defined here as first-generation students and those belonging to traditionally underserved races and ethnicities – the model identifies 45%. For comparison, a hypothetical approach based directly on high school GPA identifies only 11% of non-returners, and a random lottery identifies only 10%. In other words, results suggest that, by identifying students with our model rather than by low high school GPA, UESS can proactively reach out to 3.2 times as large a share of non-returners, and 3.4 times as large a share of potentially vulnerable non-returners. These results are visualized in Figure 3.
Figure 3: Model Performance, Area Visualization
* The 2021 cohort was hidden from the model during development. Each cohort’s performance is based on a model trained with all cohorts’ data, except the cohort in question and 2021.
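The per-cohort results referenced in the note above follow a leave-one-cohort-out pattern. The sketch below illustrates that loop under stated assumptions: the article does not specify the classifier, so a gradient boosting model stands in as a placeholder, and the column names and 450-student selection size (quoted above for 2021) are hypothetical.

```python
"""Sketch of the per-cohort evaluation noted above: each cohort is scored by a model
trained on every other cohort, with 2021 always withheld. The estimator, columns,
and selection size are placeholders for the actual modeling code."""
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier


def per_cohort_performance(data: pd.DataFrame, feature_cols: list[str]) -> dict:
    results = {}
    for cohort in sorted(data["cohort"].unique()):
        if cohort == 2021:
            continue  # 2021 is reserved as the final hold-out throughout development
        train = data[(data["cohort"] != cohort) & (data["cohort"] != 2021)]
        test = data[data["cohort"] == cohort]

        model = GradientBoostingClassifier().fit(train[feature_cols], train["non_returner"])
        scores = pd.Series(model.predict_proba(test[feature_cols])[:, 1], index=test.index)

        # Flag the highest-risk students; 450 mirrors the 2021 selection size above,
        # though the actual per-cohort cut-off may have differed.
        flagged = test.loc[scores.nlargest(450).index]
        results[cohort] = flagged["non_returner"].sum() / test["non_returner"].sum()
    return results
```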
These results establish our model as a valuable tool to identify students at risk of not returning for their second academic term. Although not every non-returner is identified, many intimate aspects of the student experience that contribute to non-retention cannot practically – and should not, ethically – be known to us. For example, a parent’s layoff or a sibling’s declining health could precipitate non-retention, but we do not know a student’s risk for these events, nor could we learn it without violating student privacy. Thus, we see great value in even an imperfect model that allows advisors to work with a non-returner once for every six students seen, as opposed to once for every 25 students seen. This gain has the potential to make a deeply meaningful difference in students’ lives.
Equity & Fairness
As described above, our team centered equity and fairness in our AI process. Before beginning this project, we pledged to put the final model into service only if it met our standards for fairness and demonstrably resisted existing biases, and to do this in close consultation with stakeholders like UESS. This section describes how we conceived of equity in the context of this model, the specific standards we set, and the model’s performance relative to them. We conclude with the limitations of this work and potential future enhancements.
When considering a new application of AI, we strongly believe that developers must conceive of equity in a way that is tailored to the project’s specific context. Our team undertook this work with the help of UESS, scholars of data ethics, and student leaders. The relevant deliberations cannot be covered fully here, but two crucial decisions merit mention.
First, we determined that it is acceptable – desirable, even – for the model to predict non-retention at higher rates for potentially vulnerable groups than for their less-vulnerable counterparts. In a contrasting example, evaluators might forbid a model used for bank loan decisions from issuing approvals at a higher rate for white applicants than for applicants of color. However, since our model aims to allocate a supportive resource where it is needed most, and we know that need can relate to traits like race due to existing systemic inequities, we deemed it sensible and unproblematic for our model to allocate early advising at higher rates for some groups than for others, including along sensitive lines such as race and gender. In other words, to be equitable, our model need not flag students for early advising at equal rates across all groups.[2]
Second, we determined that it is unacceptable for the model to serve potentially vulnerable groups with a substantially lower level of predictive accuracy than their less-vulnerable counterparts. Setting predicted rates of non-retention aside, one can envision a scenario where a model produces accurate predictions for white students but relatively inaccurate predictions for students of color. This scenario is of particular concern when a majority of the historical data used to train the model describes white students. With that in mind, we deemed it problematic for our model to perform substantially worse for certain groups, especially along sensitive lines such as race and first-generation status. In other words, to be equitable, our model must not substantially under-serve potentially vulnerable groups in terms of accuracy.
Having arrived at an understanding of equity in the context of this model, ADA and UESS developed a set of criteria against which the model would be evaluated to test whether it aligned with that understanding. Because these criteria lay out concrete ranges within which our model’s behavior must lie before it is put into service, we refer to them as “guardrails.” Without requiring any modification, our model fell comfortably within the guardrails we established. Had this not been the case, ADA would have employed one of the peer-reviewed bias-mitigation algorithms available in the literature (AI Fairness 360 – Resources, n.d.) until the adjusted model’s performance fell within the guardrails.
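To illustrate the kind of check involved, the following is a minimal sketch of one guardrail along the lines of the second decision above: the model’s ability to identify non-returners should not be substantially lower for potentially vulnerable groups. The column names and tolerance value are hypothetical, and the actual guardrails covered additional metrics and group definitions.

```python
"""Sketch of one fairness guardrail check: per-group share of non-returners identified
should not fall substantially below that of the best-served group. Column names and
the tolerance are hypothetical placeholders for the agreed-upon criteria."""
import pandas as pd


def recall_by_group(holdout: pd.DataFrame, group_col: str) -> pd.Series:
    """Share of actual non-returners identified by the model, computed per group."""
    return holdout.groupby(group_col).apply(
        lambda g: g.loc[g["flagged"], "non_returner"].sum() / g["non_returner"].sum()
    )


def within_guardrail(holdout: pd.DataFrame, group_col: str, tolerance: float = 0.10) -> bool:
    """True if no group falls more than `tolerance` below the best-served group."""
    recalls = recall_by_group(holdout, group_col)
    return (recalls.max() - recalls.min()) <= tolerance
```

In practice, a check like this would be run along each sensitive dimension (for example, a hypothetical `first_generation` or `race_ethnicity` column) before the model is cleared for use.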
Although this process has left us confident that our model behaves in a manner consistent with our equity goals, there is room to explore extended equity review in the future. For example, although students may belong to multiple potentially vulnerable groups, our current analysis does not consider intersectionality. Additionally, we hope to incorporate input from a broader range of stakeholders in the future, especially during the process of establishing concrete guardrails. Our team aims to move in this direction when this model is employed in future years, and when developing additional models.
Conclusion
This project sought to promote student success and advance equity by facilitating timely intervention, and to pilot a process of transparent, responsible AI. Consulting with a range of stakeholders, from UESS to faculty and students, we developed an AI model that performs up to five times as well as alternatives. We then intentionally set requirements for equity and fairness and tested the model against them. This work has laid a foundation on which we can incorporate more data into our model, particularly from student life, to help improve our predictive accuracy. Our work on the data lake will eventually allow us to incorporate textual data in the form of aggregated student responses (e.g., from a chatbot), which can further improve our model’s performance while giving us a better pulse on students’ needs.
We hope that this initiative can serve as a blueprint for responsible use of AI to promote student success and advance equity, both at our institution and beyond. If you are interested in learning more about this project, our preliminary results from our model’s implementation on the 2022 entering cohort, or how responsible AI might serve your institution, please do not hesitate to reach out to us at swcho@uoregon.edu and ngreenst@uoregon.edu.
References
AI Fairness 360 – Resources. (n.d.). Retrieved January 13, 2023, from https://aif360.mybluemix.net/resources
Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016, May 23). Machine Bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing?token=p-v0T1xjfOJ8jrJHzc08UxDKSQrKgWJk
Astin, A. W. (1984). Student Involvement: A Developmental Theory for Higher Education. Journal of College Student Personnel, 25(4), 297-308.
Dastin, J. (2018, October 10). Amazon scraps secret AI recruiting tool that showed bias against women. Reuters. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G
Feathers, T. (2021, March 2). Major Universities Are Using Race as a “High Impact Predictor” of Student Success. The Markup. https://themarkup.org/machine-learning/2021/03/02/major-universities-are-using-race-as-a-high-impact-predictor-of-student-success
Hardesty, L. (2018, February 11). Study finds gender and skin-type bias in commercial artificial-intelligence systems. MIT News | Massachusetts Institute of Technology. https://news.mit.edu/2018/study-finds-gender-skin-type-bias-artificial-intelligence-systems-0212
Kuh, G. D., Kinzie, J., Schuh, J. H., & Whitt, E. J. (2011). Student Success in College: Creating Conditions That Matter. John Wiley & Sons.
Swail, W. S., Redd, K. E., & Perna, L. W. (2003). Retaining Minority Students in Higher Education: A Framework for Success: ASHE-ERIC Higher Education Report. Wiley.
Tinto, V. (1987). Leaving College: Rethinking the Causes and Cures of Student Attrition. University of Chicago Press.
Ziskin, M., Hossler, D., & Kim, S. (2009). The Study of Institutional Practices Related to Student Persistence. Journal of College Student Retention: Research, Theory & Practice, 11(1), 101-121. https://doi.org/10.2190/CS.11.1.f
[1] As described, 2021 data was withheld during model development and testing. All assessments of performance and equity found in this document are based on the model’s handling of 2021 data, which it had never encountered before. However, at the very end of the modeling process, the model was re-trained with 2021 data included so that it could make the best possible predictions for the incoming 2022 cohort. This is standard practice in AI modeling.
[2] Under the same rationale, we chose to supply the model with race and other sensitive demographics, rather than withhold them. This choice is especially sensible given the well-documented ability of complex models to infer sensitive attributes from other data points, e.g. zip code.