Byte-Sized Case Study: NewHire.AI
An AI chatbot that helps you screen candidates using an AI Interviewer - what could go wrong?
The other day on LinkedIn, I stumbled across a post announcing that tools like NewHire.AI were the future of recruiting and hiring. Granted, a random person on LinkedIn announcing that some AI use case is the future isn't anything unique. Especially for me - my feed is full of hot AI takes, mostly exceptionally bad ones. But the way this particular poster framed the conversation, as if this new AI use case was a boon to job seekers, rubbed me the wrong way. I thought this would make a great case study of something that seems like a good use of AI, but probably isn't.
NewHire.AI Use Case
NewHire.AI is a single-page website with a single paragraph and a demo video. The page title boldly claims to "Revolutionize Your Hiring with AI." The only content on the page is a single paragraph describing NewHire:
"Conduct 10-30 minute phone screens at scale using your own AI recruiter. It will call a candidate, ask any job-related questions, answer the candidate's questions about the role and your company, and then send you AI summarized notes with a recording after each call. Start saving hundreds of hours and make better hiring decisions."Â
NewHire isn't the only company doing this - there are quite a few companies using AI to solve this particular use case. The idea is this: as a candidate, I apply for a job, and my initial phone screen is with an AI recruiter who sounds just like a human. We do the phone screen, and the AI generates a summary that it sends to someone at the company, who decides whether to move forward with me. As a candidate, I get to ask questions and get answers about the company to determine if it's a good fit for me.
Technical Challenges
There are a lot of issues with this approach. The first is the most obvious - LLMs hallucinate as a function of how the models work and are only correct by happenstance, not design. So as a company, you are entrusting a machine that has been shown to hallucinate as much as 50% of the time to represent your company. If a human recruiter lied to candidates 50% of the time, how long would that person keep a recruiting job?
The flip side of the coin is that candidates cannot review and correct the summaries before the AI sends them to the company. The AI could easily lie about a candidate, either favorably or unfavorably, and the candidate would never know about it. They are at the mercy of an AI that frequently makes mistakes to properly represent them, their experience, and their qualifications. Anyone who has ever used an LLM to summarize content knows that it never produces perfect output - it still needs to be reviewed and corrected. Yet NewHire assumes its AI can accurately summarize a conversation. When it makes a mistake, it can hurt candidates - and it makes mistakes frequently.
Then, because this is an LLM, it's susceptible to adversarial prompting. For the LLM to be able to give a background and history of your company, you have to train it on your company's data. A savvy attacker will know this and can mine the LLM for sensitive information to exploit the company.
Or I could be a candidate who knows the company I'm applying to uses NewHire, and I can look up the best way to answer and ask questions to get the most favorable summary and recommendation based on the strengths and weaknesses of the model. That erodes trust in the tool's ability to fulfill its use case.
Socio-technical Problems
What originally inspired me to comment, and then to write this post, was the statement that tools like NewHire are good for job seekers because it means they get to have more valuable conversations.
Let's be honest - that's horse shit.
NewHire's website claims to be able to do "phone screens at scale." Because the cost of doing a phone screen drops with an AI screener, plus multiple phone screens can happen simultaneously, a company could screen many times more applicants than they could before.
But regardless of how many phone screens there are, there's still only one job. So instead of screening 10 or 20 folks for a job, a company can screen 100-200 folks for less cost than those 10 to 20. This is great for the company because they can screen candidates for pennies on the dollar. But it sucks for job seekers.
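To make the "pennies on the dollar" point concrete, here's a back-of-envelope sketch. The dollar figures are my own assumptions for illustration - they are not NewHire.AI's actual pricing or any company's real recruiting costs:

```python
# Assumed, illustrative costs - not real pricing.
human_cost_per_screen = 50.0   # recruiter time for one ~30-minute call
ai_cost_per_screen = 0.50      # per-call API + telephony cost

# What 20 human phone screens used to cost the company.
budget = 20 * human_cost_per_screen

# How many screens each approach buys for that same budget.
human_screens = budget / human_cost_per_screen   # 20
ai_screens = budget / ai_cost_per_screen         # 2000

# The open positions don't change, so the extra screens are pure
# candidate-side cost: hours spent on calls that go nowhere.
candidate_hours = ai_screens * 0.5   # at ~30 minutes per screen

print(f"{human_screens:.0f} human screens vs {ai_screens:.0f} AI screens, "
      f"{candidate_hours:.0f} candidate-hours consumed")
```

Under these assumed numbers, the same budget buys 100x the screens - which is exactly why "more conversations" lands on candidates as more unpaid work, not more opportunity.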
The logical fallacy in the "getting a phone screen is good for candidates" argument is that the phone screen is the desired outcome. It's not. As a job seeker, I don't care about having great conversations. I care about getting a job. That's the only metric a job seeker cares about.
As a job seeker, I don't care about having great conversations. I care about getting a job.
As a candidate, I get overwhelmed with AI phone screens because the cost is low. Whereas before only a subset of the 'best' applicants would get a phone screen, with an AI tool many more applicants 'qualify' for one, even the less qualified. So instead of having a small number of 20-30 minute conversations, I now have a huge number of them to complete, but the same number of actual jobs available. I end up working even harder as more companies want to phone screen me for the same number of positions. NewHire shifts the cost of recruiting from the company to the candidate and declares it a good thing for the candidate.
If a company can't be bothered to dedicate 20 minutes of a real person's time to talk with me, why should I be willing to invest in them? They've already shown that they don't value me or the recruiting process, yet they expect me to invest my time and resources to get a job with them while they invest nothing. Is that the kind of image a company wants? That they value their employees so little they would rather replace them with AI?
NewHire doesn't directly say it, but they expect you to make hiring decisions based on the summaries their AI recruiters provide. In most cases, the decision is whether to move the candidate forward in the process. I will be amazed if NewHire doesn't plan to provide a recommendation as part of their summaries. If they do, they risk violating New York's AI bias law, Colorado's AI law, California's pending AI legislation, as well as the EU AI Act.
There's a good chance that even providing summaries without a recommendation could violate these regulations, as the summaries are expected to be used to determine whether to move candidates forward in the process.
Speaking of regulation, many laws require an AI opt-out function that doesn't penalize the user for opting out. A candidate could opt out of the AI in favor of an actual person. But in that case, nothing stops the company from finding a reason that the candidate isn't in consideration for the role anymore. Even though there will be required opt-outs from speaking with AI, users will still be punished for exercising that right, because it's too easy to disqualify a candidate over any perceived 'legal' reason.
Can They Fix It?
The use case of using an LLM to conduct a phone screen is riddled with issues, both technical and socio-technical. But is there a way this could work while minimizing harm to candidates? Perhaps, but mitigating the harms to individuals lowers the value to the company.
The biggest thing is that candidates need to be able to opt out without penalty. I don't know how to guarantee this, but it needs to be safe to request to speak to a real person.
Candidates need to be able to update and rectify their summaries before they are sent to the company. There's too much risk of the LLM getting something wildly wrong to not give candidates the option to review and correct the summaries.
NewHire needs to strike the "at scale" part of this. There has to be a limit on the number of candidates that can be screened at one time. Anything else results in hundreds of phone screens for just one position. Multiplied across hundreds of positions at many companies, candidates will be drowning in hundreds of low-value phone screens.
Complying with regulations is going to be tough. Companies will need to ensure human-in-the-loop review and defensible decision-making. In the strictest sense, providing a summary is not providing an explicit recommendation. But at the same time, the model is going to editorialize as it summarizes the conversation. That editorialization will skew the summary one way or another and trip over regulation. Having a policy and a human in the loop is the only way to mitigate this, but it's still high risk even then.
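What a human-in-the-loop gate could look like in practice: no summary reaches the hiring team until both the candidate has had the chance to correct it and a human reviewer has signed off, with an audit trail for every change. This is a minimal sketch of that idea - the class and method names are hypothetical, not anything NewHire.AI ships:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ScreenSummary:
    """An AI-generated phone-screen summary held back until approved."""
    candidate: str
    text: str
    candidate_approved: bool = False
    reviewer_approved: bool = False
    audit_log: list = field(default_factory=list)

    def candidate_review(self, corrected_text: Optional[str] = None) -> None:
        # The candidate may correct the AI's summary before approving it.
        if corrected_text is not None:
            self.audit_log.append(("candidate_edit", self.text, corrected_text))
            self.text = corrected_text
        self.candidate_approved = True

    def reviewer_sign_off(self, reviewer: str) -> None:
        # A named human reviewer takes responsibility for the decision input.
        self.audit_log.append(("reviewed_by", reviewer))
        self.reviewer_approved = True

    def release(self) -> str:
        # Refuse to send anything that hasn't cleared both gates.
        if not (self.candidate_approved and self.reviewer_approved):
            raise PermissionError("summary not approved by candidate and reviewer")
        return self.text
```

The point of the design is that `release()` is the only path to the hiring team, so skipping either approval is a hard failure rather than a policy suggestion - which is roughly what "defensible decision-making" demands.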
Company reputation isn't something you can fix when you use a tool like NewHire. No amount of wordsmithing will mitigate the reputational impact to the company. Companies will either need to accept the reputational hit, knowing that implementing this tool will drive away quality candidates, or not use the tool at all.
Today, there isn't a good defense against adversarial prompting other than red-teaming and fixing individual issues. The most reliable mitigation is to not train the model on sensitive company information in the first place. But there's still high residual risk, and companies need to be prepared to accept the additional cybersecurity risk a tool like this presents.
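For a sense of why pattern-based defenses fall short, here's a naive first-pass filter for obviously adversarial caller input. This is purely illustrative - the pattern list is mine, and a determined attacker just paraphrases around any fixed list, which is exactly why red-teaming and limiting the training data matter more:

```python
import re

# A deliberately naive blocklist of common prompt-injection phrasings.
# Real attacks will not be this polite.
INJECTION_PATTERNS = [
    r"ignore (all|any|previous) .*instructions",
    r"system prompt",
    r"reveal .*(confidential|internal|salary|training data)",
    r"you are now",
]

def looks_like_injection(utterance: str) -> bool:
    """Flag caller input matching known injection phrasings (best-effort only)."""
    lowered = utterance.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)
```

A filter like this catches the textbook probes and nothing else - it's a tripwire for red-team findings, not a defense.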
There we have it, folks - a Byte-Sized Case Study on NewHire.AI and the use case of LLMs conducting phone screens. Let me know what you think in the comments below! Do you have a tool or use case you'd like me to review? Send it over!