How “Small Data” Can Address the Limitations of Predictive Analytics

October 2017

Ioakim Boutakidis is Associate Professor of Child & Adolescent Studies and Faculty Fellow of Student Success at CSU Fullerton and a member of the Interim Advisory Board of the CSU Student Success Network. In May 2017, the Network held a convening on data use in the CSU, where faculty, staff, and administrators from ten CSU campuses shared their challenges and successes in using data to improve student learning and created plans for next steps once they returned to campus. In this blog, Ioakim summarizes some promises and limitations of using “big data”—and describes how universities can use “small data” to broaden and apply their big data findings.

The California State University (CSU) has entered the age of big data. At a data convening by the CSU Student Success Network last year, an informal survey found that 30 of 31 staff and faculty from CSU campuses “agreed” or “strongly agreed” that they are considering the use of predictive analytics to improve student success. Over half (54%) said that using predictive analytics was making a difference for students on their campus. Nationally, between 40% and 50% of colleges and universities report using student data for predictive analytics. As these numbers grow, it is crucial that faculty and staff understand the kinds of questions that big data can—and cannot—answer about students.

At CSU Fullerton, predictive analytics are primarily used to help answer “what” questions. For example: What are our barriers to progression for a specific population of students? Or: What are the most promising interventions for a student, given his or her specific set of circumstances? These questions can lead to powerful insights about the appropriateness of specific student interventions. For us, they represent a first level of inquiry.

However, we also ask the next level of research questions, which typically begin with “why” or “how.” For example: Why are these particular students facing these challenges at these junctures? Or: How have particular students responded to a particular kind of intervention? These questions are more qualitative and require time and resources to interview students, hold focus groups, and analyze the results. The responses we receive, however, tell us about student motivations, challenges, and successes. Because this work brings us closer to students, I call it a “small data” approach, although the term is metaphorical rather than literal.

Big Data and Predictive Analytics in Higher Education

The term big data commonly refers to datasets that are so large and constantly expanding that their management continually requires new strategies for collection and sorting. The emergence of affordable software has brought these data management capabilities to universities large and small. Predictive analytics describes a particular use of these datasets to make estimates based on past patterns. Weather forecasts are a version of predictive analytics, as are the browser ads that pop up based on your search history.

One of the most promising uses of predictive analytics in universities involves projecting outcomes for individuals rather than aggregated groups. For example, asking whether peer mentoring is effective in general is an interesting question, but it typically leads to results of limited utility. However, determining the efficacy of a particular peer mentoring intervention for a particular student, given his or her set of circumstances, can be very useful because it can help decision-makers determine which aspects of the peer mentoring program were effective, which were not, and why. As another example, software applications can analyze a student’s prior coursework and current status to predict relative risk for academic challenges. As more databases tracking student status are linked to course management systems, institutions will be able to provide earlier notification of student difficulties. These are effective uses of big data.

There are limitations, however, and one of the most important is that quantitative data, by its nature, is de-contextualized and often atheoretical. In fact, decision-making that is too data-driven rather than data-informed can cement existing inequities. A chilling account of this can be found in Cathy O’Neil’s Weapons of Math Destruction, which describes how the use of data analytics can lead criminal courts to predict probation failure, and with that more jail time, for defendants in low-income, high-crime neighborhoods. In contrast, the same analytics tend to predict a greater likelihood of probation success, and therefore less jail time, for those in neighborhoods with less crime. The effect is to create a self-fulfilling dynamic that perpetuates inequities in the criminal justice system. An example closer to home would be advisor-facing tools that analyze a student’s past performance to determine whether a change of major would lead to greater success. Because some majors already suffer from a lack of diversity, these tools, when over-relied upon, risk cementing and perpetuating that lack of diversity. The alternative is to use these tools to determine the types of support students would need to succeed in the majors they legitimately want to pursue, rather than trying to carve an “easier” path for them.

The Importance of Analyzing Small Data

Public universities serve the public good and are thereby accountable for equity as well as quality. A national retailer does not need to know why the demand for strawberry Pop-Tarts climbs in regions that recently experienced natural disasters. Managers simply need to stock inventories to maximize profits based on those conditions. Institutions of higher learning, however, need to know many whys. For example, why are Latino students—despite recent improvements—still significantly less likely to graduate from four-year institutions compared with their white non-Latino and Asian American peers? Answering this requires having a broad professional perspective—including research, management, and education theory—to guide data analysis and decision-making.

At CSU Fullerton, we supplement quantitative metrics with students’ own stories and reflections, which are rich, personal, and narrative. For example, people might assume that one reason Latino students may struggle is that they are more likely to come from low-income backgrounds. In interviewing students, however, we found that family obligations were also a primary reason given for acute challenges to completing coursework. Our structured interviews and focus groups with students helped inform our quantitative data analysis and pointed toward opportunities in faculty development.

There is no denying the potential of big data and predictive modeling in contributing to student success. The apparent definitiveness of big data, however, may be its biggest weakness. If its use cements rather than addresses inequities, it can undermine the academic and public missions of public universities. To guard against this danger, decision-makers can use small data to listen to students and better understand their needs.