How to Ace a Data Science Interview

As I mentioned in my first post, I have just finished an extensive tech job search, which featured eight on-sites, along with countless phone screens and informal chats. I was interviewing for a combination of data science and software engineering (machine learning) positions, and I got a pretty good sense of what those interviews are like. In this post, I give an overview of what you should expect in a data science interview, and some suggestions for how to prepare.

An interview is not a pop quiz. You should know what to expect going in, and you can take the time to prepare for it. During the interview phase of the process, your recruiter is on your side and can usually tell you what types of interviews you’ll have. Even if the recruiter is reluctant to share that, common practices in the industry are a good guide to what you’re likely to see.

In this post, I’ll go over the types of data science interviews I’ve encountered, and offer my advice on how to prepare for them. Data science roles generally fall into two broad areas of focus: statistics and machine learning. I only applied to the latter category, so that’s the type of position discussed in this post. My experience is also limited to tech companies, so I can’t offer guidance for data science in finance, biotech, etc.

Here are the types of interviews (or parts of interviews) I’ve come across.

Always:

  • Coding (usually whiteboard)
  • Applied machine learning
  • Your background

Often:

  • Culture fit
  • Machine learning theory
  • Dataset analysis
  • Stats

You will encounter a similar set of interviews for a machine learning software engineering position, though more of the questions will fall in the coding category.

Coding (usually whiteboard)

This is the same type of interview you’d have for any software engineering position, though the expectations may be less stringent. There are lots of websites and books that will tell you how to prepare. Practice your coding skills if they’re rusty. Don’t forget to practice coding away from the computer (e.g. on paper); that skill is almost certainly rusty. Review the data structures you may never have used outside of school: binary search trees, linked lists, heaps. Be comfortable with recursion. Know how to reason about algorithm running times. You can generally use any “real” language you want in an interview (Matlab doesn’t count, unfortunately); Python’s succinct syntax makes it a great language for coding interviews.

Prep tips:

  • If you get nervous in interviews, try doing some practice problems under time pressure.
  • If you don’t have much software engineering experience, see if you can get a friend to look over your practice code and provide feedback.

During the interview:

  • Make sure you understand exactly what problem you’re trying to solve. Ask the interviewer questions if anything is unclear or underspecified.
  • Make sure you explain your plan to the interviewer before you start writing any code, so that they can help you avoid spending time going down less-than-ideal paths.
  • If you can’t think of a good way to do something, it often helps to start by talking through a dumb way to do it.
  • Mention what invalid inputs you’d want to check for (e.g. input variable type check). Don’t bother writing the code to do so unless the interviewer asks. In all my interviews, nobody has ever asked.
  • Before declaring that your code is finished, think about variable initialization, end conditions, and boundary cases (e.g. empty inputs). If it seems helpful, run through an example. You’ll score points by catching your bugs yourself, rather than having the interviewer point them out.
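
To make the last point concrete, here is a minimal sketch of the kind of check I mean, using binary search as a stand-in problem (my choice for illustration, not a question from any particular interview). The comments mark where initialization, the end condition, and the boundary cases show up.

```python
def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent."""
    lo, hi = 0, len(sorted_values) - 1  # initialization: inclusive bounds
    while lo <= hi:  # end condition: stop once the search window is empty
        mid = (lo + hi) // 2
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


# Walk through the boundary cases before calling the code finished.
print(binary_search([], 3))         # empty input
print(binary_search([3], 3))        # single element, present
print(binary_search([1, 3, 5], 5))  # target at the last index
print(binary_search([1, 3, 5], 4))  # target absent
```

Running through the empty list by hand is what catches the classic mistake here: with an empty input, hi starts at -1, so the loop body never runs and the function falls through to the final return.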

Applied machine learning

All the applied machine learning interviews I’ve had focused on supervised learning. The interviewer will present you with a prediction problem, and ask you to explain how you would set up an algorithm to make that prediction. The problem selected is often relevant to the company you’re interviewing at (e.g. figuring out which product to recommend to a user, which users are going to stop using the site, which ad to display, etc.), but can also be a toy example (e.g. recommending board games to a friend). This type of interview doesn’t depend on much background knowledge, other than having a general understanding of machine learning concepts (see below). However, it definitely helps to prepare by brainstorming the types of problems a particular company might ask you to solve. Even if you miss the mark, the brainstorming session will help with the culture fit interview (also see below).

When answering this type of question, I’ve found it helpful to start by laying out the setup of the problem. What are the inputs? What are the labels you’re trying to predict? What machine learning algorithms could you run on the data? Sometimes the setup will be obvious from the question, but sometimes you’ll need to figure out how to define the problem. In the latter case, you’ll generally have a discussion with the interviewer about some plausible definitions (e.g., what does it mean for a user to “stop using the site”?).

The main component of your answer will be feature engineering. There is nothing magical about brainstorming features. Think about what might be predictive of the variable you are trying to predict, and what information you would actually have available. I’ve found it helpful to give context around what I’m trying to capture, and to what extent the features I’m proposing reflect that information.

For the sake of concreteness, here’s an example. Suppose Amazon is trying to figure out what books to recommend to you. (Note: I did not interview at Amazon, and have no idea what they actually ask in their interviews.) To predict what books you’re likely to buy, Amazon can look for books that are similar to your past Amazon purchases. But maybe some purchases were mistakes, and you vowed to never buy a book like that again. Well, Amazon knows how you’ve interacted with your Kindle books. If there’s a book you started but never finished, it might be a positive signal for general areas you’re interested in, but a negative signal for the particular author. Or maybe some categories of books deserve different treatment. For example, if a year ago you were buying books targeted at one-year-olds, Amazon could deduce that nowadays you’re looking for books for two-year-olds. It’s easy to see how you can spend a while exploring the space between what you’d like to know and what you can actually find out.
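
If it helps to see that brainstorm as code, here is a rough sketch of turning a purchase history into features. Everything about the data is an assumption made up for illustration: the record keys ('category', 'author', 'fraction_read'), the 0.9 cutoff for "finished", and the function name. It is not how Amazon or anyone else actually does it.

```python
from collections import Counter


def reader_features(purchases, finished_threshold=0.9):
    """Turn a purchase history into simple per-user features.

    Each purchase is a dict with hypothetical keys: 'category', 'author',
    and 'fraction_read' (a number from 0.0 to 1.0).
    """
    # Positive signal: which general areas the reader keeps buying in.
    category_counts = Counter(p["category"] for p in purchases)
    # Negative signal: authors whose books were started but not finished.
    abandoned_authors = sorted(
        {p["author"] for p in purchases if p["fraction_read"] < finished_threshold}
    )
    finished = sum(1 for p in purchases if p["fraction_read"] >= finished_threshold)
    return {
        "category_counts": dict(category_counts),
        "abandoned_authors": abandoned_authors,
        "finish_rate": finished / len(purchases) if purchases else 0.0,
    }


# Made-up purchase history.
purchases = [
    {"category": "mystery", "author": "A. Author", "fraction_read": 1.0},
    {"category": "mystery", "author": "B. Writer", "fraction_read": 0.2},
    {"category": "cooking", "author": "C. Chef", "fraction_read": 0.95},
]
features = reader_features(purchases)
print(features)
```

In an interview you would describe features like these out loud rather than write them, but the same split applies: what you want to capture (taste, disappointment) versus what the logs actually record.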

Your background

You should be prepared to give a high-level summary of your career, as well as to do a deep-dive into a project you’ve worked on. The project doesn’t have to be directly related to the position you’re interviewing for (though it can’t hurt), but it needs to be the kind of work you can have an in-depth technical discussion about.

To prepare:

  • Review any papers/presentations that came out of your projects to refresh your mind on the technical details.
  • Practice explaining your project to a friend in order to make sure you are telling a coherent story. Keep in mind that you’ll probably be talking to someone who’s smart but doesn’t have expertise in your particular field.
  • Be prepared to answer questions as to why you chose the approach that you did, and about your individual contribution to the project.

Culture fit

Here are some culture fit questions your interviewers are likely to be interested in. These questions might come up as part of other interviews, and will likely be asked indirectly. It helps to keep what the interviewer is looking for in the back of your mind.

  • Are you specifically interested in the product/company/space you’d be working in? It helps to prepare by thinking about the problems the company is trying to solve, and how you and the team you’d be part of could make a difference.
  • Do you care about impact? Even in a research-oriented corporate environment, I wouldn’t recommend saying that you don’t care about company metrics, and that you’d love to just play with data and write papers.
  • Will you work well with other people? I know it’s a cliché, but most work is collaborative, and companies are trying to assess this as best they can. Avoid bad-mouthing former colleagues, and show appreciation for their contributions to your projects.
  • Are you willing to get your hands dirty? If there’s annoying work that needs to be done (e.g. cleaning up messy data), will you take care of it?
  • Are you someone the team will be happy to have around on a personal level? Even though you might be stressed, try to be friendly, positive, enthusiastic and genuine throughout the interview process.

You may also get broad questions about what kinds of work you enjoy and what motivates you. It’s useful to have an answer ready, but there may not be a “right” answer the interviewer is looking for.

Machine learning theory

This type of interview will test your understanding of basic machine learning concepts, generally with a focus on supervised learning. You should understand:

  • The general setup for a supervised learning system
  • Why you want to split data into training and test sets
  • The idea that models that aren’t powerful enough can’t capture the right generalizations about the data, and ways to address this (e.g. different model or projection into a higher-dimensional space)
  • The idea that models that are too powerful suffer from overfitting, and ways to address this (e.g. regularization)

You don’t need to know a lot of machine learning algorithms, but you definitely need to understand logistic regression, which seems to be what most companies are using. I also had some in-depth discussions of SVMs, but that may just be because I brought them up.
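
As a refresher on how these pieces fit together, here is a minimal sketch of logistic regression with an L2 penalty and a held-out test set. It uses only the standard library so it runs anywhere; in practice you would reach for a library instead. The synthetic data, the function names, and the hyperparameter values (l2, lr, epochs) are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, l2=0.1, lr=0.1, epochs=200):
    """Batch gradient descent on the mean log loss plus an L2 penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The l2 term shrinks the weights toward zero; the bias is not penalized.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b):
    preds = [1 if predict_proba(xi, w, b) >= 0.5 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Synthetic data: two Gaussian features, label is 1 when x0 + x1 > 0.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [1 if x0 + x1 > 0 else 0 for x0, x1 in X]

# Hold out the last 25% as a test set; the model never sees it in training.
split = int(0.75 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

w, b = train_logistic(X_train, y_train)
print("train accuracy:", accuracy(X_train, y_train, w, b))
print("test accuracy:", accuracy(X_test, y_test, w, b))
```

The test accuracy is the number to trust, since the training accuracy is measured on data the model has already fit. Raising l2 trades some training accuracy for smaller weights, which is the regularization knob mentioned above.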

Dataset analysis

In this type of interview, you will be given a data set, and asked to write a script to pull out features for some prediction task. You may be asked to then plug the features into a machine learning algorithm. This interview essentially adds an implementation component to the applied machine learning interview (see above). Of course, your features may now be inspired by what you see in the data. Do the distributions for each feature you’re considering differ between the labels you’re trying to predict?

I found these interviews hardest to prepare for, because the recruiter often wouldn’t tell me what format the data would be in, and what exactly I’d need to do with it. (For example, do I need to review Python’s csv module? Should I look over the syntax for training a model in scikit-learn?) I also had one recruiter tell me I’d be analyzing “big data”, which was a bit intimidating (am I going to be working with distributed databases or something?) until I discovered at the interview that the “big data” set had all of 11,000 examples. I encourage you to push for as much info as possible about what you’ll actually be doing.

If you plan to use Python, working through the scikit-learn tutorial is a good way to prepare.
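
Here is a small sketch of the kind of warm-up I have in mind: parse a CSV with Python’s csv module and check whether a feature’s distribution differs between labels. The data, the column names (visits_last_week, churned), and the helper name are all made up for illustration; the real file will look different.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up data standing in for whatever file the interviewer provides.
RAW = """visits_last_week,churned
0,1
1,1
5,0
7,0
2,1
6,0
"""


def feature_mean_by_label(csv_text, feature, label):
    """Group one numeric column by label value and return each group's mean."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[label]].append(float(row[feature]))
    return {lab: mean(vals) for lab, vals in groups.items()}


means = feature_mean_by_label(RAW, "visits_last_week", "churned")
print(means)  # {'1': 1.0, '0': 6.0}
```

A large gap between the per-label means, as in this toy data, suggests the feature is worth keeping; with a real file you would also want to look at the spread, not just the mean.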

Stats

I have a decent intuitive understanding of statistics, but very little formal knowledge. Most of the time, this sufficed, though I’m sure knowing more wouldn’t have hurt. You should understand how to set up an A/B test, including random sampling, confounding variables, summary statistics (e.g. mean), and measuring statistical significance.
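
For the significance part, here is a minimal sketch of one common approach, a two-proportion z-test on conversion counts from the two arms of an A/B test. I am assuming users were randomly assigned to the arms and that the samples are large enough for the normal approximation; the function name and the counts are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both arms share one rate, so pool the counts.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF written in terms of the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


# Made-up counts: 100 of 1000 users converted in arm A, 130 of 1000 in arm B.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value says the observed gap would be unlikely if the two arms truly converted at the same rate. It says nothing about confounders, which is why the random assignment matters.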

Preparation Checklist & Resources

Here is a summary list of tips for preparing for data science interviews, along with a few helpful resources.

  1. Coding (usually whiteboard)
  2. Applied machine learning
    • Think about the machine learning problems that are relevant for each company you’re interviewing at. Use these problems as practice questions.
  3. Your background
    • Think through how to summarize your experience.
    • Prepare to give an in-depth technical explanation of a project you’ve worked on. Try it out on a friend.
  4. Culture fit
    • Think about the problems each company is trying to solve, and how you and the team you’d be part of could make a difference.
    • Be prepared to answer broad questions about what kind of work you enjoy and what motivates you.
  5. Machine learning theory
  6. Dataset analysis
    • Get comfortable with a set of technical tools for working with data.
    • Resources:
  7. Stats

Are there things I missed? Other resources you’d recommend? Please comment!

The Curious Candidate Gets the Job

Remember the last time you were looking for a job… A job search can be a stressful experience, whether or not things are going well. It’s hard not to agonize about things you could have said in your last interview, problems you could have solved, how you could have prepared better, etc. I’ve really enjoyed my multi-week job search extravaganza (see note at the beginning of my first post), and a large part of that has come from keeping the following in mind:

Be curious!

A job search is an opportunity to learn about people, companies, and yourself. I firmly believe that treating it as such is not just a luxury for those of us lucky enough to be confident that we’ll get a job. It actually makes you a better candidate by helping you prepare better, be more engaged throughout the interview, and improve for the next interview.

Preparation

Think about the company. Poke around their website. What’s cool about what they’re doing? What’s cool about what you might get to do if you worked there? What are their unique advantages? What challenges might they have? Are there things about the technology or the business that you don’t understand?

At the interview: Ask questions you’re genuinely interested in.

You probably thought of questions you’re curious about while you were preparing for the interview. More questions will come up if you engage with what you’re hearing when you’re there. Pull on the threads that seem important, and get multiple perspectives. Here are some things you might be curious to learn.

  • Maybe you’re trying to imagine yourself working on the team. What would affect your experience? Projects, organizational structure, tech talks, culture, …?
  • Maybe you’re talking to a manager. Are there things you want to know about the overall direction for the company, and how the team you’re interviewing with fits in that picture?
  • Do you want to know how your interviewer decided to join this company, and whether they encountered any surprises?
  • Maybe you’ve already decided you’re unlikely to join the company. You still have a unique opportunity to learn how things work there.

By the way, if you get an offer, you’ll almost certainly get a chance to talk to more people. It would be reasonable to leave the more down-in-the-weeds questions until then.

After the interview: Agonize productively

If you’re like me, after an interview, you’re likely to be agonizing about all the things you could’ve done better. You can make that productive rather than miserable.

  • Were there things you didn’t know that you should have known, or would like to know? Go out and learn them! My interviews have caused me to go through the Django tutorial, review the math behind SVMs, read up on Spark, etc.
  • Were there interactions you can handle better next time? I had an interview where I rushed to code up a solution without first making sure the interviewer and I were on the same page about the right approach. I’ll be sure to watch out for that temptation in the future.

Positive attitude

Being curious will make your job search a less stressful experience, which will make you more effective. If you go into an interview thinking of it as a learning opportunity, and the interview doesn’t go well, at least you still learned something!

Why Companies Write Terrible Job Posts

Are you ready to “demonstrate an understanding of the interplay between physics-based signal and image analysis and text summarization and interpretation analysis”? I know I am!

The goal of a job posting is to solicit applications from qualified candidates. In that light, compare the following two job postings.

The classic:

[Image: big_data_analyst]

The modern:

[Image: yelp_posting_exerpt]

Are you psyched to work in a group where, “Technical disciplines required include machine-assisted information fusion methods based upon physical, statistical and graphical models, as well as the software engineering skills to deploy them in open, distributed and service-oriented, cloud-based systems”? I worked in that group, and it was a great experience, but you’d never know it from the job post. A place where, “They banter about Bandits, know their way down a Gradient, and aren’t too Naïve to kick back in our Bay(es) Area offices,” sounds quite a bit more fun.

The author of the classic job posting above never stopped to consider the objective: getting qualified candidates to apply. They got the “qualified” part: the job posting does specify what qualifications are required (cropped out above). What they forgot about is the “get candidates to apply” part. A job posting is not just a description of requirements; it’s an advertisement. The young tech companies seem to have figured that out, but this bit of wisdom hasn’t permeated parts of the established corporate world. A few minutes of browsing revealed the following gems:

“You will perform professional software development tasks associated with the developing, designing and debugging of software applications or operating systems.” — Software Developer 5 at Oracle

“The main responsibilities of the developer include quickly diagnosing and resolving defects associated with moderately complex to complex intranet applications.” — Web Developer 5 at Wells Fargo

How do such awful job postings come to be? Here are some thoughts:

  • Internal vs. external: A company needs to have a concept of why it’s hiring and what the specific requirements for the role are. What we’re seeing here is just an internal spec sheet that hasn’t been translated into a sales pitch. At Oracle, job postings actually double as internal promotion/transfer opportunities.
  • Image: A Silicon Valley tech company like Yelp (modern job posting) isn’t expected to sound professional, but a government laboratory in Massachusetts (classic job posting) is. Just as a candidate can’t show up to a consulting interview in jeans, a “serious” organization will not write a pun-filled job ad.
  • Generic requirements: It’s hard to write a good job posting when the requirements are too general. In the case of the classic job posting above, the group is essentially looking for smart technical people, and does not want to discourage a broad set of potential applicants. At large companies like Oracle, while the job requirements might be specific, much of the text for a job posting is standard and was not written with a particular role in mind. The unfortunate consequence is that the job ads suffer from at least one of two flaws:
    • Phrases that are so general that they are meaningless, e.g., “professional software development tasks associated with the developing… of software applications.”
    • Long, convoluted lists of possibilities, e.g., “Projects involve feature extraction from multi-source data; pattern recognition and behavior analysis; information-theoretic analysis of machine-aided decision effectiveness; semi-automated sensor cueing and resource optimization; modeling and simulation of surveillance and reconnaissance networks; design, implementation and analysis of laboratory and field experiments.”
  • Lack of perspective: Having stewed in company-internal lingo for a while, it’s hard to step back and remember just how opaque it sounds to an outsider.

Have you come across job postings that are about as dynamic as a software license agreement? Please comment!

The Interview Process: What a Company Wants

I have just finished a more extensive tech job search than anyone should really do. It featured eight on-sites, along with countless phone screens and informal chats. There were a few reasons why I ended up doing things this way: (a) I quit my job when my husband and I moved from Boston to San Francisco a few months ago, so I had the time; (b) I wasn’t sure what I was looking for — big company vs. small, data scientist vs. software engineer on a machine learning system, etc.; (c) I wasn’t sure how well it would all go. 

This way of doing a job search turned out to be an awesome learning experience. In this series of posts, I’ve tried to jot down some thoughts on what makes for a good interview process, both for the company and for the candidate. I was interviewing for a combination of data science and software engineering positions, but many observations should be more broadly applicable.

What are we trying to do here, anyway?

Before we can talk about what is a good or bad interview process, we need to understand the company’s objectives. Here are some things your company might be trying to do, or perhaps should be trying to do. Note that I’m focusing on the interview stage here; there are many separate questions about finding/filtering candidates.

Hire or no hire: Decide whether to give the candidate an offer. 

  1. Qualification check: Figure out whether the candidate is qualified for the position they applied for. This is the most basic objective of the interview process. To check someone’s qualifications, you first need to define what it means to be qualified for the position. In addition to technical skills, many companies look for a “culture fit”, which can help maintain the work and social environment at the company — or change it, if that’s what’s needed.
  2. Potential check: If the candidate isn’t qualified right now, can they become excellent at this job anyway? Companies have very different philosophies on whether this is a question they care to ask. In many cases, there are good reasons to ask it. I was told a story about someone who was hired as a machine learning expert, but soon got excited about infrastructure challenges, and before long became the head of an infrastructure team. At that point, what does it matter precisely what set of skills he originally came in with, as long as he’s smart and capable of learning new things?
  3. Opportunity check: If the candidate isn’t ideally suited to the position they applied for, are there other roles in the company where we’d love to have them? More than one place I interviewed at came back with an offer for a different role from the one I applied for (in my case, “data scientist” instead of “engineer”). They weren’t advertising for that job, but they were thinking opportunistically.

Leave a good impression.

There are two major components to this.

  1. Be cool: Make sure the candidate comes away with a positive view of the company. Part of doing this effectively is figuring out what counts as “cool” to this particular candidate.
  2. Be nice: Make sure the candidate has a positive overall experience.

Doing this well has an obvious benefit when the candidate is qualified: they’ll be more likely to take the offer. But it also has some less obvious benefits that apply to all candidates:

  • The candidate will be more likely to refer friends to your company. I heard about a candidate who was rejected but went on to recommend two friends who ended up joining the company.
  • The candidate will be more positive when discussing your company with their friends. It’s a small world.
  • Even if you don’t want to hire the candidate right now, you might want to hire them in a year.
  • There is intrinsic merit in being nice to people as they’re going through what is often a stressful experience.

Feel good doing it: Make sure the interviewers have a positive interview experience.

As someone on the other side of the fence, this one is harder for me to reason about. But here are some thoughts on why this is important:

  • Your employees might be spending a lot of time interviewing (as much as 10 hours a week during the fall recruiting season), and you don’t want them to be miserable doing it.
  • If the interviewer is grumpy, the candidate will be less likely to think well of the company (see above). One of the companies I interviewed at requires interviewers to submit detailed written feedback, which resulted in them dedicating much of their attention to typing up my whiteboard code during the interview. More than one interviewer expressed their frustration with the process. Even if they were pretty happy with their job most of the time, it certainly didn’t come across that way.

In the next post, I’ll take a look at some job postings. Do you have thoughts on other goals companies should strive for? Please comment!