Good vs. GREAT data scientists

Sudeep Gowrishankar
4 min read · Nov 16, 2020
Photo by Campaign Creators on Unsplash

It’s one of the hottest jobs right now. And it comes with a cool-sounding title — Data Scientist! Sure, there is data involved, but is it really science? That’s a topic for another time.

However, we know that generally, the role of a data scientist is to use data to draw insights and build algorithms that can be applied to optimize a system. So what makes a good data scientist? And what about a great data scientist?

Good data scientists can do many things well. At a high level, they have knowledge of statistics, machine learning, optimization, and programming. They can also translate a business problem into one or more sub-problems that can be partially or fully solved by analyzing data. Many other disciplines are useful to a data scientist too, but that list is long: it includes many sub-fields of artificial intelligence, data engineering, mathematics, and software engineering.

However, excellence in these skills and fields still doesn’t make a GREAT data scientist. It only improves the efficacy and efficiency of a good data scientist. Great data scientists have other qualities too. The most important of them, in my opinion, are domain expertise and empathic communication.

Domain Expertise

Domain expertise is a critical requirement for a great data scientist, since their role always involves solving a problem in some domain other than data science. Improving a system within a domain that has a vast body of knowledge is generally not as easy as applying a canned machine learning algorithm to some data. At a minimum, it requires considering and understanding the following questions:

  1. What does the business do? Where does its competitive advantage lie?
  2. What are the aspects of the system that are difficult to model with conventional techniques such as first-principles models, rules, or heuristics?
  3. What are the current models that are in use? And how can a data-based model add value?
  4. Is there enough data to actually solve this problem? If yes, why hasn’t it been solved before? Does the reason have anything to do with a lack of advanced statistical or artificial intelligence capability? Or is it due to a cultural, political, or logistical challenge?
  5. What other types of data should be collected to solve this problem and how much would that data collection cost?
  6. Is there enough value that can be realized by setting up an advanced data system?

This list is not exhaustive, but hopefully, you can see that answering these questions involves a deep understanding of the domain as well as a deep understanding of data-related technologies.

Building this level of understanding often takes time for someone who is new to a particular domain, and traditional domain experts often aren’t experts in data-related technologies. Therefore, finding this quality in your next data scientist candidate can be difficult. Of course, the way to work around this is to have data scientists work very closely with domain experts and build their own expertise in the domain. But this only succeeds if the data scientist has the second quality that makes them great: empathic communication.

Empathic Communication

Like most fields that involve some level of mathematics, the intricacies and details of data science are often daunting to the uninitiated. Besides, when there is a large amount of data, our brains simply cannot comprehend all the different interactions that are going on in the system. To add to the confusion, the high non-linearity of some machine learning algorithms reinforces the perception that data is being fed into a black box. And sometimes, this does not sit well with customers who want to feel confident about what they are buying. Even within an organization, there can be pockets of doubting stakeholders who do not want to take on the burden of maintaining something that they don’t fully understand.

Therefore, a great data scientist has to excel at listening to the anxieties of their internal and external customers, anticipating their needs, and speaking their language. This requires a great deal of empathy throughout the data science project. Some of the things that a great data scientist would be able to do are:

  1. Communicate how existing systems can be improved using data-based modelling, from both a business perspective and a domain perspective.
  2. Explain how their tools and algorithms work at a high level, and dig deeper when required.
  3. Explain the repercussions of bringing in a new system into the business and how it affects existing domain processes, IT systems, and cultural practices.
  4. Set up tools and processes to help non-data-scientists maintain new data systems.

Summary

Mastery of the wide array of fields and tools that data scientists work with is non-negotiable. However, the really great data scientists are also domain experts and can communicate with empathy throughout the data science process.
