About the author: Leon is a senior data science executive at a FAANG company in Silicon Valley. Before his current role, he managed a team of data scientists at Chegg and worked as a research scientist at Amazon.com, building large-scale machine learning systems.


As a hiring manager and hiring committee member, I have interviewed thousands of data scientist candidates. Generally speaking, I've found there are two different tracks of data scientists in Silicon Valley:

1: Analytics/Inference Track

For people on this track, the daily job usually includes statistical analysis, reporting, dashboard management, product analytics, scorecards, and experimental design.

They often use data to ‘tell a story’ and share their findings with leaders to convince them to adopt a new strategy or change a product feature. Their ultimate goal is to drive product decisions and create a better customer experience from data.

People on this track mostly come from statistics, mathematics, economics, or other quantitative backgrounds outside computer science and engineering.

2: Machine Learning/Algorithm Track

This track is usually for hardcore computer and software engineers who write production-quality code for real production systems. They are often also asked to help productionize proof-of-concept (POC) models from research scientist teams, take care of model deployment, and maintain model refresh cycles, e.g., deploying the model into production after it’s retrained with new data.

Successful candidates often come from a computer science or engineering background, and are really good with programming.

It is very important to be a good programmer to succeed in either track. For the analytics track (#1), you don’t have to master an object-oriented programming language such as Java, Scala, or C/C++. The machine learning track (#2), however, requires you to master at least one general-purpose programming language and to have a very good understanding of object-oriented programming, data structures, and system design.

Table 1: Important skills for the two data scientist tracks: Analytics/Inference vs. Machine Learning/Algorithm

In this article, I will focus on the analytics/inference track and walk you through how to prepare for your interview in 7 steps.

Compared to the algorithm track, analytics track candidates don’t have to master an object-oriented programming language such as Java, Scala, or C/C++, or advanced computer science concepts in data structures, algorithms, and system design.

Whether you are a first-time job seeker or a professional who wants to make a change, you can hire me to help you prepare for your job interviews. Feel free to visit my 1:1 mentorship program and reach out to me if you have any questions.


1. SQL

A must-know (programming) language for any analytics track data scientist, SQL is the lingua franca for processing and managing data in the industry. I’ve been using SQL for many years, and it is still my go-to language for preparing and managing data.

However, typing rules vary from engine to engine, and there are many different database systems with different syntaxes and built-in functions. It is therefore very important to tell your interviewer which SQL engine (e.g., Postgres, MS SQL Server) your code is based on during a coding interview.

As a hiring manager or part of the hiring committee, I often ask a lot of SQL questions during a data scientist job interview to make sure the candidate will be hands-on at work.

However, in my 15-year career, I have met many fresh college graduates and young professionals who started their job search without solid SQL skills and, of course, didn’t get the job offer.

Sample questions

Number of days to become a happy customer

  • Given a rental transaction table, write a query to return the average number of days it takes for a customer to make their 10th rental.
  • Any customer who has made 10 movie rentals is a happy customer.
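
One possible way to approach this question is sketched below, run through Python's built-in sqlite3 module so it is easy to try locally. The rental table and its columns (rental_id, customer_id, rental_ts) are hypothetical, and the sketch assumes "number of days" means the days between a customer's first and 10th rental; in a real interview, confirm both the schema and that definition with your interviewer. It needs SQLite 3.25 or newer for window functions.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema: one row per rental, with the customer and the rental date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rental ("
    "rental_id INTEGER PRIMARY KEY, customer_id INTEGER, rental_ts TEXT)"
)


def make_rentals(customer_id, start, gap_days, count):
    """Build `count` rentals for one customer, spaced `gap_days` apart."""
    return [
        (customer_id, (start + timedelta(days=i * gap_days)).isoformat())
        for i in range(count)
    ]


rows = (
    make_rentals(1, date(2020, 1, 1), 1, 10)    # 10th rental 9 days after the 1st
    + make_rentals(2, date(2020, 1, 1), 2, 10)  # 10th rental 18 days after the 1st
    + make_rentals(3, date(2020, 1, 1), 5, 3)   # only 3 rentals, never "happy"
)
conn.executemany("INSERT INTO rental (customer_id, rental_ts) VALUES (?, ?)", rows)

HAPPY_CUSTOMER_SQL = """
WITH ranked AS (
    SELECT customer_id,
           rental_ts,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY rental_ts) AS rental_seq,
           MIN(rental_ts) OVER (PARTITION BY customer_id) AS first_rental_ts
    FROM rental
)
SELECT AVG(julianday(rental_ts) - julianday(first_rental_ts)) AS avg_days
FROM ranked
WHERE rental_seq = 10  -- keep only each customer's 10th rental
"""

avg_days = conn.execute(HAPPY_CUSTOMER_SQL).fetchone()[0]
print(avg_days)
```

With this toy data, the two happy customers take 9 and 18 days, so the average is 13.5. Note that julianday is SQLite-specific; in Postgres you would typically subtract the two dates directly, which is exactly the kind of engine difference worth stating out loud during the interview.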

Customer who spent the most

  • Write a query to return the first and last name of the customer who spent the most on movie rentals in Feb 2020.
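
A minimal sketch of one answer follows, again using Python's sqlite3 module. The customer and payment tables, their columns, and the sample rows are all assumptions made up for illustration; the real schema would come from the interviewer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data, for illustration only.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT
);
CREATE TABLE payment (
    payment_id INTEGER PRIMARY KEY, customer_id INTEGER,
    amount REAL, payment_ts TEXT
);
INSERT INTO customer VALUES
    (1, 'Ada', 'Lovelace'), (2, 'Grace', 'Hopper'), (3, 'Alan', 'Turing');
INSERT INTO payment (customer_id, amount, payment_ts) VALUES
    (1, 5.0,   '2020-02-03 10:00:00'),
    (1, 7.0,   '2020-02-29 23:59:59'),
    (2, 100.0, '2020-01-31 09:00:00'),  -- January, outside the window
    (2, 4.0,   '2020-02-10 12:00:00'),
    (3, 11.0,  '2020-02-14 18:30:00'),
    (3, 50.0,  '2020-03-01 00:00:00');  -- March, outside the window
""")

TOP_SPENDER_SQL = """
SELECT c.first_name, c.last_name
FROM payment AS p
JOIN customer AS c ON c.customer_id = p.customer_id
WHERE p.payment_ts >= '2020-02-01'   -- half-open range covers all of Feb 2020
  AND p.payment_ts <  '2020-03-01'
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY SUM(p.amount) DESC
LIMIT 1
"""

top_spender = conn.execute(TOP_SPENDER_SQL).fetchone()
print(top_spender)
```

Two details are worth mentioning to the interviewer. The half-open date range avoids hard-coding the last day of February (2020 is a leap year). LIMIT 1 works in SQLite, Postgres, and MySQL, while MS SQL Server uses TOP 1 instead; it also returns only one row when several customers tie, so a ranking function is the safer choice if ties matter.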

How to prepare

Online SQL practice: SQLPad is an online SQL playground for practicing and improving your SQL skills. You can practice 80 SQL coding interview questions for free. The problems are categorized into 3 groups based on difficulty.

Once you can easily solve most of the easy and medium problems and are fluent in window functions such as row_number and rank, you are ready to schedule a SQL interview.

Online courses: if you want some extra help, you can try my Cracking the Data Scientist SQL Interview course. We launched it recently, and if you sign up today, there is a special 50% discount.
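
As a quick self-check on window functions, the sketch below contrasts row_number and rank on tied values, using Python's sqlite3 module (SQLite 3.25 or newer). The spend table is a made-up example, and dense_rank is included only for comparison; it is not mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: total spend per customer, with A and B tied.
conn.executescript("""
CREATE TABLE spend (customer TEXT, total INTEGER);
INSERT INTO spend VALUES ('A', 30), ('B', 30), ('C', 20);
""")

RANKING_SQL = """
SELECT customer,
       total,
       -- row_number always yields 1, 2, 3, ...; the tie-breaker makes it deterministic
       ROW_NUMBER() OVER (ORDER BY total DESC, customer) AS row_num,
       -- rank gives tied rows the same value and then skips ahead
       RANK()       OVER (ORDER BY total DESC)           AS rnk,
       -- dense_rank gives tied rows the same value without skipping
       DENSE_RANK() OVER (ORDER BY total DESC)           AS dense_rnk
FROM spend
ORDER BY customer
"""

ranking = conn.execute(RANKING_SQL).fetchall()
for row in ranking:
    print(row)
```

Here A and B both spend 30: row_number assigns them 1 and 2, while rank assigns both 1 and gives C a 3. Knowing which behavior a question calls for ("the top customer" versus "all customers tied for the top") is a common interview follow-up.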

2. Product Sense