# AI in CATI — Capability Map & Vendor Questions (2026)

A one-page reference for deciding which AI applications to adopt in a telephone
field operation, in what order, and what to ask before believing a vendor claim.

Reviewed quarterly — the maturity ratings move fast. Last review: June 2026.

---

## The capability map

| AI application | Maturity (2026) | Where the edge still is |
| --- | --- | --- |
| Transcription / STT | Production-ready (commodity) | Accents, crosstalk, code-switching on poor lines; numbers and names need verification |
| QC / call monitoring | Production-ready | Calibration to your protocols; the verdict stays human |
| Open-end coding | Partly production-ready | Clustering and first-pass codes yes; nuanced interpretation and new frames no |
| Dialer / contact-center AI | Production-ready | Operational gains (pacing, routing), not survey-data gains |
| AI interviewers (respondent-facing) | Demo-ready | Solid on structured questions; weak on probing, pace, nonresponse conversion |

Rule of thumb: **maturity is inversely correlated with how disruptive the
application sounds.** The back-office uses are done; replacing the interviewer is
the least proven.

---

## Adoption order (most to least reversible)

1. **Transcription** — the foundation everything else depends on. Lowest risk; a wrong transcript is caught by the human reading it.
2. **AI quality control / call monitoring** — highest return on an existing pain point; forgiving error model (over-flagging wastes reviewer minutes, under-flagging is caught on review). This is also the step where monitoring 100% of calls, rather than a sample, becomes affordable.
3. **Open-end coding assist** — saves real hours on the repetitive 70% of responses; humans own the codeframe and the edges. Mind the verbatim-accuracy caveat: codes are only as reliable as the transcript they are assigned from.
4. **Dialer / contact AI** — operational efficiency, low methodological risk. Sequence by contact-rate pain, not data-quality pain.
5. **AI interviewers** — last and selectively. Pilot only on short, fully structured studies; measure break-off and data quality against your human benchmark; keep away from eligibility judgment and probing until the frontier moves.

Each earlier step is more reversible, and its errors are more likely to be caught
by a human before they reach a client or a respondent.
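Step 3's division of labor (the machine proposes first-pass codes, humans own the codeframe and the edges) can be sketched as a routing rule. This is a minimal illustration, not any vendor's method: `Suggestion`, `route`, the codeframe, and the 0.9 confidence threshold are all hypothetical, and the sketch assumes the model's confidence scores have been checked against human coding on your own data.

```python
from dataclasses import dataclass

# Hypothetical codeframe: owned and versioned by human researchers, never edited by the model.
CODEFRAME = {"PRICE", "SERVICE", "QUALITY", "OTHER"}


@dataclass
class Suggestion:
    response_id: str
    verbatim: str
    code: str          # code proposed by a (hypothetical) AI coder
    confidence: float  # model-reported confidence in [0, 1]


def route(suggestions, threshold=0.9):
    """Split AI code suggestions into auto-accepted codes and a human review queue."""
    accepted, review = [], []
    for s in suggestions:
        if s.code not in CODEFRAME:
            # Outside the frame: possibly a new theme, which is a human decision.
            review.append(s)
        elif s.confidence >= threshold:
            accepted.append(s)
        else:
            # Low confidence: the "edges" that human coders own.
            review.append(s)
    return accepted, review


batch = [
    Suggestion("r1", "too expensive for what you get", "PRICE", 0.97),
    Suggestion("r2", "the lady on the phone was rude", "SERVICE", 0.72),
    Suggestion("r3", "worried about what you do with my data", "PRIVACY", 0.95),
]
accepted, review = route(batch)
print([s.response_id for s in accepted])  # ['r1']
print([s.response_id for s in review])    # ['r2', 'r3']
```

Two choices carry the point: a code outside the frame is never auto-accepted, however confident the model is, and the threshold is a parameter you set from your own audit rather than a vendor default.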

---

## What AI can't replace

- **Respondent eligibility judgment** — the messy reality behind clean screener logic.
- **Back-check design and the falsification net** — deciding where verification risk concentrates.
- **Complex probing decisions** — when to follow up, and how to probe without leading.
- **Client relationship management** — defending a wave with documented procedure, not a model output.

---

## Questions to ask before you believe a claim

**For any AI vendor:**
- Show me this on a batch of *my own* recorded calls, not a demo reel.
- What's the per-call (or per-minute) processing cost at my actual volume?
- What's the false-positive rate, and how do you measure it?
- How long does calibration to my protocols take — demonstrated, not asserted?
- What happens when the questionnaire changes mid-wave?
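"False-positive rate" is worth pinning down because it gets used for two different ratios. The sketch below is a hypothetical helper, assuming a human reviewer has given a verdict on every call in a test batch of your own calls; it computes both ratios, plus the miss rate that measures under-flagging. The counts are toy numbers, not measurements.

```python
def flag_metrics(flags, verdicts):
    """Compare AI QC flags with human verdicts on the same batch of calls.

    flags:    dict call_id -> True if the AI flagged the call
    verdicts: dict call_id -> True if a human reviewer confirmed a real problem
    """
    tp = sum(1 for c in flags if flags[c] and verdicts[c])
    fp = sum(1 for c in flags if flags[c] and not verdicts[c])
    fn = sum(1 for c in flags if not flags[c] and verdicts[c])
    tn = sum(1 for c in flags if not flags[c] and not verdicts[c])
    return {
        # Share of clean calls that were flagged anyway (the textbook definition).
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        # Share of flags a reviewer rejects: the minutes over-flagging wastes.
        "false_discovery_rate": fp / (fp + tp) if fp + tp else 0.0,
        # Share of real problems the AI missed: the under-flagging side.
        "miss_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


# Toy batch of 100 calls (hypothetical): 10 have a real problem,
# the AI catches 8 of them and also flags 12 clean calls.
flags, verdicts = {}, {}
for i in range(100):
    verdicts[i] = i < 10
    flags[i] = i < 8 or 10 <= i < 22

metrics = flag_metrics(flags, verdicts)
print({k: round(v, 3) for k, v in metrics.items()})
# {'false_positive_rate': 0.133, 'false_discovery_rate': 0.6, 'miss_rate': 0.2}
```

On the toy batch, 12 wrong flags among 90 clean calls is a 13% false-positive rate, yet 60% of the flags a reviewer opens are wrong. Ask which one the vendor is quoting.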

**For transcription specifically:**
- Word error rate on hard audio (accents, crosstalk, mobile lines)?
- How are numbers, names, and verbatims flagged for human verification?
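Word error rate is the word-level edit distance between a human reference transcript and the machine transcript, divided by the number of reference words. A minimal pure-Python sketch, assuming both transcripts were normalized the same way beforehand (case, punctuation, how numbers are written); normalization rules differ between vendors, so ask what theirs are. The example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # d[i][j]: minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


# Hypothetical human reference vs. machine transcript of one answer
wer = word_error_rate(
    "i paid fifteen dollars last month",
    "i paid fifty dollars last month",
)
print(round(wer, 3))  # 0.167
```

A WER of about 17% on this line hides that the one wrong word is the answer itself, which is why numbers and names need their own verification path regardless of the headline WER.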

**For QC / monitoring:**
- Does the system check the answer the interviewer keyed in against what the respondent actually said, or does it only analyze the audio?
- Is every flag reviewable by a human before it reaches an interviewer's file?
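The difference between checking the keyed answer and checking only the audio, plus the human gate on flags, can be sketched as follows. This assumes a hypothetical upstream step has already extracted a spoken answer per question from the transcript; `Flag`, `compare_answers`, and `release_to_interviewer_file` are illustrative names, and exact string matching stands in for the normalization ("yeah" vs. "Yes") a real system would need.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    call_id: str
    question: str
    keyed: str                # answer the interviewer entered in the CATI system
    spoken: str               # answer extracted from the transcript (hypothetical upstream step)
    status: str = "pending"   # a human reviewer sets "confirmed" or "dismissed"


def compare_answers(call_id, keyed, spoken):
    """Flag every question where the keyed answer differs from the spoken one."""
    return [
        Flag(call_id, q, keyed[q], spoken.get(q, "<not found in audio>"))
        for q in keyed
        if keyed[q] != spoken.get(q)
    ]


def release_to_interviewer_file(flags):
    """Only flags a human reviewer has confirmed ever reach an interviewer's record."""
    return [f for f in flags if f.status == "confirmed"]


flags = compare_answers(
    "call-0042",
    keyed={"Q1": "Yes", "Q2": "3", "Q3": "Very satisfied"},
    spoken={"Q1": "Yes", "Q2": "4", "Q3": "Very satisfied"},
)
print([(f.question, f.keyed, f.spoken) for f in flags])  # [('Q2', '3', '4')]
print(release_to_interviewer_file(flags))                # [] until a reviewer confirms
flags[0].status = "confirmed"
print(len(release_to_interviewer_file(flags)))           # 1
```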

**For AI interviewers:**
- What's the break-off rate vs. my human benchmark?
- How does data quality compare on *open-ended* items specifically?
- How does it handle a respondent who half-qualifies on the screener?
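One way to put the break-off comparison on a statistical footing is a two-proportion z-test. A minimal sketch, assuming the AI and human arms ran the same questionnaire on comparable (ideally randomly assigned) samples, that break-off is defined identically in both arms, and that counts are large enough for the normal approximation; `breakoff_comparison` and the pilot counts are hypothetical.

```python
import math


def breakoff_comparison(ai_breakoffs, ai_starts, human_breakoffs, human_starts):
    """Two-proportion z-test on break-off rates (normal approximation)."""
    p_ai = ai_breakoffs / ai_starts
    p_human = human_breakoffs / human_starts
    # Pooled rate under the null hypothesis that both arms break off equally often
    pooled = (ai_breakoffs + human_breakoffs) / (ai_starts + human_starts)
    se = math.sqrt(pooled * (1 - pooled) * (1 / ai_starts + 1 / human_starts))
    if se == 0:
        raise ValueError("no variation: break-off rate is 0% or 100% in both arms")
    z = (p_ai - p_human) / se
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_ai, p_human, z, p_value


# Hypothetical pilot: 60 of 400 AI-interviewed starts broke off, vs. 40 of 400 human-interviewed
p_ai, p_human, z, p = breakoff_comparison(60, 400, 40, 400)
print(f"AI {p_ai:.1%} vs. human {p_human:.1%}: z = {z:.2f}, two-sided p = {p:.3f}")
```

A significant result only says the gap is unlikely to be chance; whether a gap of that size is acceptable for the study remains a methodological call.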

---

*From VeriCATI — AI quality control for CATI surveys. https://vericati.com*
