The Interview Loop

Before you study a single concept, map the terrain. AI engineer loops vary more than traditional SWE loops — a frontier lab, an AI-native scaleup, and an enterprise team will test you in genuinely different ways — but they assemble from a small set of recurring rounds. Learn the eight archetypes and you can predict almost any loop from the job description.

The single most important thing to know going in: most loops are bimodal. They pair at least one AI-specific round (build a RAG pipeline, design an agent, defend an eval strategy) with at least one classic round (a data-structures problem, a behavioral panel). Preparing for only one half is the most common way strong builders fail.

  • 8: round archetypes that compose nearly every loop
  • Take-home: increasingly the highest-weight, decision-making round
  • AI-assisted: the 2025–26 trend of coding *with* a model, graded on judgment
  • Project: the past-work deep-dive is often the hardest round of all

The eight round archetypes

1. Recruiter / mission screen (universal)

Thirty minutes on fit and motivation. At the labs, genuine interest in the mission and a coherent view on AI safety are real signals, not formalities.

2. Practical coding screen (not pure LeetCode)

Build a small API, an iterator, a spreadsheet-with-formulas, a text editor — or extend a codebase. Tests whether you write clean, working code on a realistic task, not whether you memorized algorithms.

3. Classic DSA round (still here)

Trees, graphs, binary search, an in-memory database, an LRU cache. Alive and well even at AI-first labs — sometimes the same problem dressed in an AI-systems costume.
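
For a sense of scale, an LRU cache of the kind named above fits in about twenty lines of Python. This is a minimal sketch, not a reference answer: the class name `LRUCache`, its `get` and `put` methods, and the choice of `collections.OrderedDict` are illustrative assumptions, and some interviewers will ask for the hand-rolled hash map plus doubly linked list version instead.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # OrderedDict remembers insertion order; we treat the end as "most recent".
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # A read counts as a use, so move the key to the most-recent end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # popitem(last=False) removes the oldest (least recently used) entry.
            self._items.popitem(last=False)


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # touches "a", so "b" is now the least recently used
cache.put("c", 3)   # evicts "b"
```

Both operations run in O(1) time. Be ready to explain why, and what changes if the interviewer takes `OrderedDict` away.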

4. Take-home / build (often decisive)

A multi-hour or multi-day build (sometimes paid), defended in a follow-up walkthrough. Build a RAG bot, an agent, an eval harness. The most production-realistic signal a loop can collect.

5. AI / LLM system design (the ADEPT round)

Design a RAG system, an agent, an eval platform. RAG, orchestration, evals, cost and latency tradeoffs — run the ADEPT framework here.

6. Project deep-dive (often the hardest)

Present and defend real work. Why this choice, what broke, what you would change. At OpenAI this is reported as the round people most underestimate.

7. Customer / discovery sim (forward-deployed only)

A roleplay where you do real sales discovery, not engineering. Distinctive to FDE roles (Anthropic Applied AI, Harvey). High-fail because candidates treat it as a tech interview.

8. Behavioral / values (culture gate)

Values and collaboration — Netflix's culture rounds, Scale's "Credo," lab safety-and-ethics discussions. Often gates the offer regardless of technical scores.

A typical AI-native scaleup loop: recruiter → practical screen → take-home build → AI system design → project deep-dive → behavioral. A frontier-lab applied loop swaps in a paid work trial and a safety discussion. An enterprise loop leans on DSA + integration coding + classic system design.

The 2025–26 shift: AI-assisted coding

The biggest recent change is that several companies now have you code with an AI model in the room — and grade you on how well you drive it. This inverts the old signal: it is no longer "can you write this from memory" but "can you specify, review, and verify faster than the model can mislead you."

Meta: AI-Assisted Coding
One coding round is now done with an in-editor AI assistant

As of late 2025, Meta replaced one traditional coding round with an AI-enabled one: a CoderPad with a built-in assistant (defaulting to Llama, switchable to other frontier models mid-interview). The format is usually one thematic problem in escalating parts — review and fix a bug, add a feature, then handle edge cases and scale — over a codebase larger than you could write by hand in the time. Because the model can produce volume, the bar rises on code review and verification: you are expected to find what the AI got wrong and justify the design. Sierra, Cursor, and Notion run variations of the same idea.

The signal moved from writing code to reviewing it. Candidates who paste model output without catching its mistakes fail; candidates who drive the model on a larger, multi-part problem and verify its work pass.

Real loops, by archetype

The cleanest persona-matched loops — the ones hiring AI builders rather than researchers — cluster at the labs' applied teams and the AI-native scaleups. A few documented shapes:

Documented loop shapes (verify specifics with your recruiter; loops change)

  • Sierra: Removed traditional algorithm interviews. The onsite is Plan → Build → Review: ideate a product, get ~2 hours to build it with any AI tooling (interviewers leave the room), then demo and defend it, plus a system-design round. Grades agency, judgment, and product sense over syntax. [documented, official blog]
  • OpenAI (Applied): Recruiter → 1–2 practical phone screens → a paid take-home / work trial (reported as the highest-weight round) → onsite with coding, system design on a whiteboard tool, a project deep-dive, behavioral, and an AI-safety discussion. [anecdotal + aggregator]
  • Anthropic (Applied AI / FDE): Screen → live coding or take-home → a customer-discovery simulation reported as the highest-signal, highest-fail round → system design that leans on evaluation harnesses over RAG architecture. [anecdotal + aggregator]
  • Cursor / Anysphere: Recruiter → two technical screens (no AI tools, autocomplete only) → a paid multi-hour onsite project or 2-day in-office build with the team. Later rounds expect AI use; pasting raw output without judgment is a fast reject. Hires to "ship from week one." [aggregator + press]
  • Mistral: ~5–6 rounds including LeetCode-style coding, system design on RAG / agentic workflows, and a distinctive LLM-internals quiz (KV caching, embedding retrieval, transformer internals) that goes deeper than most applied roles. [aggregator + Glassdoor]
  • Netflix (MLE): Recruiter → hiring-manager project review → technical screens (coding + ML fundamentals) → an end-to-end ML system-design round → culture rounds on Netflix values, including a "partner" manager from another team. [aggregator + Glassdoor]

Sierra and Meta are the most fully documented (official sources). The rest blend candidate reports and prep-site write-ups, so treat the shape as durable and the specifics as subject to change.

What separates scaleups from incumbents

A useful heuristic when you don't know the loop: scaleups grill on agents and evals; incumbents grill on RAG and integration. A young AI-native company wants to know you can build an autonomous system and prove it works. A large enterprise wants to know you can wire an LLM into existing infrastructure without breaking it.

| Round | Frontier lab (applied) | AI-native scaleup | Enterprise / big tech |
|---|---|---|---|
| Practical coding | yes | yes | yes |
| Classic DSA | sometimes | rare | yes |
| Take-home / build | yes | yes | sometimes |
| AI system design | yes | yes | yes |
| Customer / discovery sim | FDE roles | FDE roles | no |
| Eval-heavy design | yes | yes | sometimes |
| Safety / values gate | yes | yes | yes |

Read the company before you prep. A Sierra or Anthropic-Applied loop rewards agent design and eval fluency; an enterprise loop still rewards data-structures fundamentals and clean integration. The middle column is where this guide's center of gravity sits.

A field note on the round people underestimate

Field note

The coding was fine. System design was fine. Where candidates lose us is the project deep-dive — we ask them to walk through something they built and then we push: why this database, why this chunk size, what happened when it failed in production, what would you do differently. People who actually shipped the thing light up and go deeper. People who supervised it or read about it run out of road in about four minutes. You cannot fake operational scar tissue, and that is exactly what we are probing for.

a hiring manager, frontier-lab applied team

Prepare for the whole loop

The loop is not one skill tested five ways. It is five different skills, and the rejection usually comes from the one you did not prepare — not the one you did.

  • Drill a practical coding task AND a classic DSA problem — the bimodal trap catches people who prepped only one.
  • Have two or three real projects you can defend to four levels of "why" — the deep-dive rewards depth you cannot improvise.
  • If the role says forward-deployed, rehearse the customer conversation as seriously as the system design. It fails more candidates than the code.

Now learn the script for the design round: the ADEPT Framework.