
The short version

  • Leading legal AI platforms still hallucinate. A Stanford–Yale study published in the Journal of Empirical Legal Studies in June 2025 found that Lexis+ AI, Westlaw AI-Assisted Research, and Ask Practical Law AI each hallucinate between 17% and 33% of the time, with Lexis+ AI achieving the highest overall accuracy at roughly 65%.

  • Research and drafting sort cleanly into the two-layer stack once you ask one question: did confidential client information enter the prompt?

  • Intake is the highest-leverage workflow to fix first. Industry sources suggest AI-driven intake and automated follow-up can meaningfully lift conversion, though the specific lift depends on firm, practice area, and lead source.

Last week I laid out the two-layer stack. General-purpose AI for work that does not touch confidential client information. Domain-specific AI for work that does.

This week I am answering the question I promised: which research, drafting, and intake tasks belong in which layer? First, a problem that applies to both layers equally.

The reason is simple: large language models generate plausible-sounding text, not verified facts, and no commercial legal platform has fully solved that problem yet.

In AI, the word for this is "hallucination." It means the tool generates something that is not real — a case citation, a statute, a holding, a quote — and presents it as fact. It does not flag the answer as uncertain. It does not tell you it is guessing. It hands you a citation in the right format, with plausible language, from what looks like a real court. The only problem is it does not exist.

The National Center for State Courts and the Thomson Reuters Institute published A Legal Practitioner's Guide to AI and Hallucinations in February 2026 through their joint AI Policy Consortium for Law & Courts. Their description of the problem is consistent with the pattern attorneys keep running into: fabricated case citations, distorted holdings, misrepresented facts, invented statutes, and false procedural information — all delivered in language that sounds completely authoritative.

This is not a glitch that happens once in a while. In the peer-reviewed Stanford–Yale study Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools, published in the Journal of Empirical Legal Studies vol. 22, issue 2 (June 2025), pp. 216–242, the authors found that Lexis+ AI, Westlaw AI-Assisted Research, and Ask Practical Law AI each hallucinate between 17% and 33% of the time on legal queries, even though these are the three leading retrieval-augmented research tools built specifically for lawyers. Lexis+ AI achieved the highest overall accuracy at about 65%; Westlaw's tools ranked lower.

I spent most of my career writing press releases for renowned powersports and marine brands. Technical specifications, product launches, performance data — the kind of material where every number has to be right because the client's engineers will check. If we got it wrong a third of the time, I would not have kept the account for three decades.

The consequences in law are worse than losing an account.

What Happens When Lawyers File AI-Generated Fake Citations?

Courts are sanctioning them, and penalties are escalating on repeat offenders.

On September 12, 2025, the California Second District Court of Appeal, Division Three, sanctioned Los Angeles attorney Amir Mostafavi $10,000 in Noland v. Land of the Free, L.P. after finding that 21 of the 23 case quotations in his opening brief were fabricated, with additional fabricated citations in his reply brief, as reported by CalMatters and Datamation. Mostafavi told the court he ran the appeal through ChatGPT to improve the writing and did not review the AI-generated output before filing. The three-judge panel called the filing frivolous and a waste of the court's time and taxpayer money. He paid within days, and in February 2026 a State Bar Court judge recommended further discipline, with the California Supreme Court to have the final say.

U.S. District Judge Kai N. Scott sanctioned New Jersey attorney Raja Rajan $5,000 for filing AI-generated fake citations in Bunce v. Visual Technology Innovations, his second sanction for the same conduct in the same case; Bloomberg Law reported the earlier sanction in April 2026.

The Stanford–Yale team identified two kinds of errors. Factual hallucinations, where the AI just gets the law wrong. And citation errors, where the AI describes the law correctly but points you to a source that does not actually support the claim. That second kind is harder to catch. It reads right. It looks right. And if you file it without checking, you end up in front of a judge explaining why you trusted a machine to do your job.

If last week's issue was about verifying confidentiality before you use a tool, this week's is about verifying accuracy after.

Which Research Tasks Belong in Which Layer?

Public legal research belongs in Layer 1; any research that requires client names or case facts in the prompt belongs in Layer 2.

Research is where the two-layer stack works most cleanly, because the line between public and confidential is usually obvious.

Public legal research belongs in Layer 1. General legal standards. Regulatory tracking. Publicly filed opinions. Jurisdiction comparisons. Background for a CLE talk or a client-facing FAQ. None of that requires client names or case facts in the prompt. I use Perplexity for this in my own firm. A solo attorney could research the current state of non-compete enforceability in Georgia without entering a single client detail. That is Layer 1 work.

Client-specific research belongs in Layer 2. Analyzing how a ruling applies to your client's facts. Uploading a deposition transcript. Running a specific set of case details against a body of precedent. That work goes into a tool with proprietary data, audit trails, and confidentiality terms you have read.

CoCounsel, Thomson Reuters' legal AI, runs a feature called Deep Research that builds multi-step research plans and delivers cited reports drawn from Westlaw's proprietary database. Lexis+ AI does comparable work on the LexisNexis side. Even inside Layer 2, the Stanford–Yale hallucination rates still apply — these tools give you a better starting point and tighter confidentiality controls, not a finished product. You still check every citation before it goes out the door.

Where Do Small Firms Get Drafting Wrong?

At the line between template work and client work, especially under deadline pressure.

Most of the mistakes I hear about do not happen in research. They happen in drafting, because the line between "template work" and "client work" blurs fast at 6 p.m. on a Thursday.

Documents without client facts belong in Layer 1. Internal checklists. Engagement letter templates with blank fields. Blog posts. Client alerts about a new regulation. An outline for a brief using hypothetical facts.

Everything with real client information belongs in Layer 2. Motions. Briefs. Discovery requests and responses. Demand letters. Anything with names, dates, medical records, financial figures, or case facts that belong to an actual person.

For litigation-heavy small firms, Briefpoint automates discovery requests and responses. MyCase's own integration page claims it saves up to three hours per discovery document — a vendor figure worth testing against your own workflow. For transactional practices, Spellbook handles contract drafting and review inside Microsoft Word. CoCounsel includes drafting support alongside its research features.

Which tool you pick matters less than which layer you put the work in. A paralegal who pastes a client's medical history into ChatGPT to speed up a demand letter has created the same problem Bradley Heppner created. Different context, same exposure.

And regardless of layer, every AI-drafted document needs a human review — not a skim, a real review. Mostafavi told the court he did not read the output before he filed it. That cost him $10,000, a published opinion with his name on it, and a pending State Bar discipline proceeding.

Research, Drafting, and Intake by Layer

| Workflow | Layer 1: General-Purpose AI | Layer 2: Domain-Specific or Controlled AI |
| --- | --- | --- |
| Research | General legal standards, regulatory tracking, public case law, jurisdiction comparisons, CLE background | Applying rulings to client facts, deposition analysis, client-specific precedent research |
| Drafting | Internal checklists, engagement letter templates, blog posts, client alerts, outlines with hypothetical facts | Motions, briefs, discovery, demand letters, anything with real names, dates, or records |
| Intake | Designing the questionnaire, follow-up email sequences, qualification criteria, scripts | Live prospect data, case details, financial or medical information, CRM records |
| Example tools | Perplexity, ChatGPT, Claude | CoCounsel, Lexis+ AI, Spellbook, Briefpoint, Lawmatics, Clio |
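The routing question behind this table reduces to a single gate. Here is a minimal, purely illustrative Python sketch; the function name and layer labels are my own shorthand, not anything from a vendor's product:

```python
# Illustrative sketch of the two-layer routing test.
# The only question that matters: did confidential client
# information enter the prompt?

def route_task(prompt_contains_client_info: bool) -> str:
    """Return the layer a research, drafting, or intake task belongs in."""
    if prompt_contains_client_info:
        # Names, case facts, medical records, financial figures, CRM data
        return "Layer 2: domain-specific or controlled AI"
    # Public standards, templates, scripts, hypothetical facts
    return "Layer 1: general-purpose AI"

# Researching non-compete enforceability in Georgia, no client details
print(route_task(False))
# Drafting a demand letter that includes a client's medical history
print(route_task(True))
```

The point of writing it this way is that the test takes one input. Which tool, which practice area, which time of day: none of that changes the answer.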

Why Is Intake the AI Workflow Small Firms Should Fix First?

Because response speed is the single biggest lever on lead-to-client conversion, and AI is unusually well-suited to it.

I have run a small firm for more than 30 years. In PR, the pitch window is measured in minutes. A journalist calls three brands for comment. The first one back with a usable quote gets the placement. The other two get nothing.

Law firm intake works the same way.

The often-cited five-minute rule — that prospects who get a real response within five minutes convert at dramatically higher rates than those who wait 30 minutes or more — traces back to lead-response research originally conducted on B2B sales leads, not legal intake specifically. Legal Brand Marketing and other legal-marketing firms have applied the framework to law firm intake. The underlying logic holds: the longer a prospect waits, the colder the lead gets. The specific multiplier should be treated as directional, not precise.

Traditional legal intake conversion rates are commonly reported in the 15% to 25% range in vendor case studies and legal-marketing research, with vendors like Lawmatics and Smith.ai citing meaningful lifts when AI-driven intake and automated follow-up are added. Those figures are vendor and marketing-firm sourced and should be treated as directional. For a baseline on the broader legal-tech landscape, the Clio Legal Trends Report remains the most frequently cited industry benchmark.

The Layer 1 work here is designing the system. Drafting your intake questionnaire. Writing follow-up email sequences. Building qualification criteria. Creating scripts. No client data involved.

The Layer 2 work starts when real prospects fill in real information. Names, case details, financial situations, medical histories. That data needs to live in a system built for it.

Lawmatics, a legal CRM, launched QualifyAI in October 2025, which LawNext described as a sophisticated lead-scoring platform that analyzes a firm's own historical case data and builds a scoring model from its actual intake patterns, not a one-size-fits-all algorithm. Clio handles intake inside its broader practice management suite. Smith.ai, reviewed by Lawyerist in 2025, pairs AI receptionists with live human agents so firms have 24/7 first-response coverage without losing the human touch on complex calls.

Legal intake platforms run roughly $200 to $800 a month depending on firm size. Against even one or two additional retained clients per month, the math tends to work — but run your own numbers before signing.

How Much Can a Three-Attorney Firm Save With the Two-Layer Stack?

Illustrative example only — your firm's numbers will differ based on rates, practice area, case mix, and existing tooling.

Assume two associates billing $250 an hour and one partner at $400.

Layer 1 savings (illustrative). Each associate recovers 3 hours a week on non-confidential research, drafting, and admin. That is $1,500 a week across the two associates. The partner recovers 2 hours, worth $800. Monthly recovered capacity: roughly $9,200.

Layer 2 savings (illustrative). If the intake system converts one additional case per month worth $5,000, the platform pays for itself several times over.

Tool costs. Perplexity Max at $167 a month billed annually for the general-purpose layer. A legal AI platform at roughly $100 to $225 per user per month depending on vendor and tier. An intake platform at $200 to $800. For a three-attorney firm, total AI spend typically lands between $800 and $1,875 a month.

These numbers are a planning framework, not a forecast. Plug in your own billing rates, your own recovered hours, and your own historical conversion rate before you commit to tooling.
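The illustrative math above fits in a few lines. Here is a minimal Python sketch using the article's stated assumptions (two associates at $250, one partner at $400, four billing weeks per month); every input is a planning assumption, not a benchmark:

```python
# Illustrative two-layer ROI math for a three-attorney firm.
# All inputs are the article's example assumptions.

ASSOCIATE_RATE = 250   # $/hour, two associates
PARTNER_RATE = 400     # $/hour, one partner
WEEKS_PER_MONTH = 4    # simplifying assumption

# Layer 1: recovered billable capacity
# (3 hrs/week per associate + 2 hrs/week for the partner)
weekly_capacity = 2 * 3 * ASSOCIATE_RATE + 2 * PARTNER_RATE
monthly_capacity = weekly_capacity * WEEKS_PER_MONTH

# Layer 2: one additional retained case per month
intake_upside = 5000

# Tool costs: low and high ends of the ranges in the text
tool_cost_low, tool_cost_high = 800, 1875

print(f"Weekly recovered capacity:  ${weekly_capacity:,}")    # $2,300
print(f"Monthly recovered capacity: ${monthly_capacity:,}")   # $9,200
worst_case = monthly_capacity + intake_upside - tool_cost_high
print(f"Monthly upside less tools:  ${worst_case:,} (worst case)")  # $12,325
```

Swap in your own rates, recovered hours, and conversion history; if the worst-case line is still comfortably positive, the stack pays for itself.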

What Should a Small Firm Do This Week?

Audit your last two weeks of research tasks, move one drafting workflow to the right layer, and time your actual intake response.

Go through your last two weeks of research tasks. For each one, write down whether confidential client information entered the prompt. If you cannot answer that question for every task, the work defaulted to whatever tool was convenient. That is the gap you need to close.

Move one drafting workflow to the correct layer. If associates draft templates in CoCounsel, that is wasted spend. If they draft client motions in ChatGPT, that is unmanaged risk. Pick one. Fix it this week.

Measure your actual intake response time. Not what you think it is. Time the gap between a new inquiry arriving and a real answer going out. If that number is more than five minutes during business hours, the lead-response research suggests you are losing retained clients to firms that answer faster.

Read the NCSC and Thomson Reuters Institute guide A Legal Practitioner's Guide to AI and Hallucinations. It is written for practitioners, not computer scientists. Then look at whatever AI tool your firm uses for research and ask: what is our review process for catching the roughly one-in-six to one-in-three queries that come back wrong? If you do not have an answer, build one before you file anything else.

Frequently Asked Questions

How accurate are the leading legal AI research tools?

Not as accurate as the marketing suggests. The 2025 Stanford–Yale study in the Journal of Empirical Legal Studies found that Lexis+ AI, Westlaw AI-Assisted Research, and Ask Practical Law AI each hallucinate between 17% and 33% of the time; Lexis+ AI achieved the highest overall accuracy, at roughly 65%.

Can I use ChatGPT or Perplexity for legal research?

Yes, for public legal research with no client facts in the prompt. General legal standards, regulatory tracking, and public case law are appropriate Layer 1 tasks. Client-specific analysis is not.

Which AI tools are built for client-confidential legal work?

CoCounsel, Lexis+ AI, Spellbook, Briefpoint, and Clio Duo are commonly used Layer 2 tools. They offer proprietary data, audit trails, and contractual confidentiality protections that consumer AI tools do not.

What is the five-minute rule in legal intake?

A directional benchmark from B2B lead-response research stating that prospects contacted within five minutes convert at dramatically higher rates than those contacted 30 minutes or later. Legal marketing firms have applied it to law firm intake, but the specific multipliers are not drawn from legal-specific studies.

How much should a small firm budget for AI tools?

As an illustrative planning range, a three-attorney firm should expect $800 to $1,875 a month in total AI spend, depending on vendors and tiers; plug in your own numbers before committing.
