Analytics helps you form hypotheses. It improves the quality of your questions.
Statistics helps you test hypotheses. It improves the quality of your answers.
Cassie Kozyrkov, 2019
Over the past few months, my team and I have been focused on machine learning projects centered primarily on understanding student populations rather than merely predicting outcomes. Such is the world of higher education, where understanding student behaviors, and being able to react to them to improve outcomes, is paramount to the long-term success of both students and institutions. What has stood out most during our work is the delicate dance the analyst (or data scientist, for that matter) has to perform: presenting the model findings alongside rich exploratory results that give stakeholders a complete picture from top to bottom. In a world where budgets are tight, resources are scarce, and the demand for value is high, that dance has to be performed with the precision of a gymnastics floor exercise. While I will concede that some data science projects are concerned only with the overall prediction and high levels of accuracy, many are missing the painful but necessary data investigation that the data analyst can provide. The real question is this: are we missing opportunities to supplement data science and machine learning projects with the critical skills and work the traditional analyst can provide?
Going Beyond the Initial Question
Let’s take the classic example put forth by Joe, the VP of Operations, to his data science team: “Guys, I want to see if we can start predicting which products will be hot sellers each month so we can put them front and center on our website, and I want something in place in two weeks!” Joe’s team will go forth and, provided the data is available, create that algorithm for the web team to implement.
While the work to produce the model will hopefully yield the desired outcomes, it may miss the wider and deeper understanding of the data that helps shape the modeling process AND gives Joe greater insight into his customer base: insight that the data scientist can use to adjust the model and that Joe can use to improve a process. These are the types of results the traditional data analyst can provide because, as Ronald Coase puts it, they enjoy torturing data until it confesses. It is within that torture that the analyst surfaces information that can help the data scientist refine a hypothesis, prompt variables to be created, tweaked, or adjusted, and allow for deeper insights into the population under investigation. This is not to say that data scientists do not or cannot perform this level of data investigation; quite the contrary. The typical data science process will undoubtedly include an exploratory step. However, the analyst possesses additional skills that attack the data from different directions and enhance the process without burdening it. Part of the reason is that the typical data analyst produces reports and answers questions from a broader perspective, as opposed to the narrow question and outcome a data science project seeks to address. There is amazing diversity in both the types of people the analyst typically responds to and the questions they ask. This knowledge and understanding can greatly inform the data science project.
The Case for Both – Democratization of Analysis
The data scientist and the analyst attack the data differently, which can provide richness up and down the data science life cycle. The analyst is also more adept at visualizing the story, including the modeling outcomes, for stakeholders in a way that gets at the heart of what they are looking for. These stories within stories are not necessarily front and center during exploration because of time and other project constraints. So, I would simply submit that if you don’t have an analyst to supplement your data science team, find one. Worse, if you have an analyst and don’t involve them in your data science projects, change that dynamic ASAP! We consistently read and hear about the wonders of the democratization of analytics, but that does not exist in a world where data science activities are not informed by the work of the analyst and do not include a different perspective on the data at the outset. The analyst is adept at finding bits of diamond insight in a sea of coal: the kind of diamond bits the data scientist can use to hone their work and deliver a result that increases revenue, or just plain reaches the target audience, and drives business value.