By Michelle Caler

Freya’s Michelle Caler asks: do you need a Ph.D. to be a Data Scientist?

Hello world! My name is Michelle Caler, and I am Freya’s newest Data Scientist. This is something of a career change for me, as I used to be an Instructor in the Department of Physics and Engineering at West Chester University of Pennsylvania. Making the jump from academia to the business world could be the subject of a whole other blog post entirely, but today I would like to focus on a different question:

Do you really need a Ph.D. in order to be a data scientist?

At first glance, I may seem like the worst person in the world to offer an opinion on this question. After all, I have a Ph.D., so you may be expecting me to lecture you about how absolutely necessary having one is in order to be a good, qualified data scientist. But my opinion may surprise you.

Whether you really need a Ph.D. to do good data science depends on a number of different factors, including your education, background, professional training, and area of interest in the field. Generally speaking, I think having a Ph.D. doesn’t hurt, but for many data science jobs it is not absolutely necessary. What I think is more important is your approach to problems, your ability to construct a logical argument using data, your algorithmic thinking ability, and your willingness to continually check your work for reasonableness.

When Having a Ph.D. Makes Sense

There are certain areas of data science where I think requiring a Ph.D. makes sense, particularly when it comes to neural network development and deep learning networks. I think it also makes sense to require a Ph.D. or similar training for applications requiring a good deal of domain-specific knowledge, like developing medical diagnostic machine learning models. But even for these cases, a reasonable argument could be made that years of direct experience in those areas will “make up” for not having a Ph.D. (On the other hand, a reasonable counterargument could be made that the deep foundational training in the appropriate area that a Ph.D. brings to the table makes a Ph.D. holder the stronger choice.)

However, for a wide variety of data science jobs, I do not think a Ph.D. is required per se. In fact, I will be so bold as to put forth the controversial opinion that a degree in the natural sciences, mathematics, computer science, or data science isn’t necessarily required.

That having been said, of course you need to know how to code, of course you need to know how to appropriately clean, manipulate, and sample data, of course you need to know how to read and construct clear, informative graphs, of course you need to understand how to perform statistical calculations, and of course you need a fundamental understanding of the algorithms and ideas involved. A decent understanding of linear algebra and basic calculus isn’t the worst idea in the world either. You need to understand the foundations of what you’re doing well enough to know when a method applies to the situation at hand and when it doesn’t. This understanding does not necessarily need to be built in a classroom.

Pros and Cons of a Formal Education in Data Science

There are very good reasons to get this understanding from a classroom, of course. And I’m not just saying that because I am a former university instructor. One of the primary reasons is trust. Having a degree in data science (or mathematics or statistics or computer science or one of the natural sciences) or having attended a well-regarded bootcamp tells a potential employer that you trained under someone who knows in detail what you need to know, what the fundamental knowledge base is, which key skills are vital to cultivate, which sources of information are reliable, and above all how to correct you if you aren’t doing things properly.

A degree gives a potential employer a good idea of what you should know and know how to do, as well as an expert’s verification that you in fact know these things and can do these things. That level of trust in background and training is hard to get from any other source.

It is entirely possible, of course, to get foundational knowledge and cultivate marketable skills outside of a university classroom or bootcamp environment. However, as a non-expert seeking to gain expertise in a field, there are many pitfalls in charting your own training that you might not be aware of.

Are you finding trustworthy, pedagogically sound sources to learn from?

How can you be sure that you are building the skills you need to have, and that you are building those skills correctly?

How will you know if you have a misunderstanding of something fundamental to a concept, statistical method, or commonly implemented algorithm?

There are some structured online learning environments that offer some guidance in these areas, but they can have issues with pedagogical cohesion, and they naturally have limitations in their treatment of topics in the field, as well as background areas, in which you may need more training. But the main question to ask when going about things on your own is this: how are you going to convince an employer that you, on your own, gave yourself training comparable to what a college or university degree would offer? They trust a university, they trust a college, they trust many of the major coding and data science bootcamps. How do they know they can trust you?

My Personal Take

Personally, I know my training in physics and astronomy has been vital to my success as a budding data scientist. From it, I learned good model building and evaluation skills, how to check my work for reasonableness and self-consistency, how to construct a plot that tells a clear story from the data, when to drop in a well-placed calculation to add to an argument, and how to tell if something is negligible enough to be reasonably ignored. I’ve also found a surprising number of ways to directly apply the coding skills I developed in my scientific research to my current work. Could I have developed those skills without a Ph.D.? Perhaps. But there’s no doubt that my training made the transition from academia to the business world easier.

So, do you really need a Ph.D. to be a good data scientist?

Not necessarily. Having the core skills, the foundational knowledge, and the ability to deploy those skills and that knowledge makes a good data scientist. A Ph.D. will help with these things, no doubt, but you don’t necessarily need to have a Ph.D. to have them. All this having been said, there are a lot of Ph.D.s out there, and many Ph.D. programs that offer training in the skills and knowledge required for data science, so if a company wants that level of education there is certainly a pool of applicants out there to support it. As to why such a pool of Ph.D. applicants exists … well, that is the subject of another blog post entirely.