I built an AI assistant for my own work. I call her Gal Friday.
Most of the AI industry is a bubble.
The financial structure runs on hype and capital expenditure cycles propped up by a single chip company's growth story.
The technology hallucinates, fabricates, and validates anything the user puts in front of it.
The press — with a few exceptions — never seriously questions any of it, and most of the coverage reads like marketing copy with bylines.
The labor-disruption narrative is mostly a lie used by managers to justify cuts they wanted to make anyway.
I think most of this is hype.
I still use AI.
The case study is about the narrow reason why.
Gal Friday is the AI assistant I built for my own work.
The name is a nod to Iron Man's F.R.I.D.A.Y. — Tony Stark's second AI, the one he built after JARVIS's consciousness got folded into Vision. Yes, that's a deep Marvel cut. I'm aware.
F.R.I.D.A.Y.'s own name was a callback to "Girl Friday" — 1940s slang for the assistant who runs the office. The phrase was popularized by gossip columnist Walter Winchell and cemented in the lexicon by the 1940 Cary Grant screwball comedy His Girl Friday.
Tony Stark named a tool after the human role it played. I did the same thing.
The tool is useful because of how it was built and how it gets used. Not because of what the model can do.
Default AI use produces slop.
The well-known failure modes — hallucination, sycophancy, voice drift, padding — are the surface. Everyone with a Twitter account (I refuse to call it X) has heard those words.
The harder problem is what to do about them in a real workflow that runs every day.
The fix is not better prompts. The fix is operational architecture.
A tool that's wrong 5% of the time is a tool that produces a wrong output every twenty messages. If your job depends on the output, you need a system that catches the wrong one before it ships.
Five operational layers. Each one exists because of a specific way the underlying tool fails.
Gal Friday operates in two registers. Real Talk Mode for internal strategy work — direct, tactical, no padding. Safe Corporate Mode for anything that could get forwarded or screenshotted — professional, neutral, process-compliant.
I switch modes with labels in the conversation. "Hard mode." "Draft for [Name]." "Tactical analysis."
The reason this layer exists: without it, the tool slowly reverts to default corporate register over a long session and starts softening directives. An internal strategy note ends up reading like a press release. Mode switching makes the register a deliberate input, not an emergent property.
Two layers, not one. Layer 1 is institutional — process documentation, brand standards, compliance rules, the source-of-truth files my work runs against. Layer 2 is operational — decision protocols, blocker logs, decision rationale files, contradictions surfaced over time.
Most AI assistants only have a single context store. Mine separates what's known from what's been decided.
Why this matters: when I ask her something, she should be able to tell me whether the answer comes from a documented standard or from a decision I made last Tuesday. Those are different kinds of authority. Treating them as the same is how AI tools end up presenting invention as observation.
A running log of five things, captured as I work: operational excellence moments, brand governance wins, rework prevented, blockers caused by others, and wins worth keeping.
I dictate. She structures. Each entry gets a date, a source citation, and a category.
Why this matters: in any role with annual review cycles, what gets remembered is what got documented. The ledger exists so that what shows up at the end of the year is captured in real time, not reconstructed from memory.
A rule that runs against every output before delivery. Is this genuine signal, or is it AI slop dressed up as analysis?
Specifically: a 500-word document that says nothing gets summarized into the one sentence it actually contains, with the inefficiency flagged. Generic strategies that lack specific execution mechanics get rejected.
The filter forces the tool to differentiate between sounding smart and being useful. It catches the failure mode that's hardest to catch by feel — AI output that reads as substantive because it follows the rhythm of substantive writing.
Every proposed idea, system, or hire comes with an invisible operational tax — maintenance, training, fixing errors. The audit names the tax before the decision gets made.
Specifically: what will break three weeks from now? What's the unspoken cost of saying yes?
Gal Friday doesn't deliver a "solution" without listing its predicted failure points. That single rule has killed more bad recommendations than any other layer in the system.
The known AI failure modes are not what makes a tool unusable. They're what makes the tool dangerous if you don't build around them.
Here's the specific operational corrective for each, as it's actually wired into the protocol.
The tool's failure: when asked a specific attribution question, defaults to a confident-sounding answer.
The architectural fix: the knowledge layer separates documented facts from inferences. Any claim that cites a source must reference Layer 1. Inferences from Layer 2 must be flagged as inferences. Unverifiable claims must return "I'm not certain. Let me check."
The tool's failure: draws conclusions before it has enough information.
The architectural fix: dump-and-hold mode. When I'm transferring context from a voice memo or a long document, no synthesis happens until I explicitly direct it with "go." The default is silent intake.
The tool's failure: long sessions degrade output quality as the assistant reverts to default corporate register.
The architectural fix: a forbidden-phrases list runs as a check on every output. Specific words and patterns get flagged before delivery. The list is version-controlled — when I catch a new drift pattern, the phrase goes on the list, and it doesn't come back.
The tool's failure: generates plausible-sounding tradecraft that reads like my voice but has no project evidence behind it.
The architectural fix: source isolation. Every claim about my work must cite a filename and a line. Outputs that can't cite get labeled as hypothesis, not extraction.
The tool's failure: performs closure with phrases like "that should do it."
The architectural fix: no completion language unless I've explicitly said the work is done. The tool delivers the output and stops. I decide when something is finished.
In a normal week, Gal Friday helps me:
Hold context across projects I haven't touched in three weeks. Draft stakeholder updates that match my voice. Catch compliance issues in copy before they go to legal review.
Process voice memos into structured notes I can act on. Build evidence files for major work moments as they happen.
Surface contradictions between what a stakeholder said in one meeting and what they said in the next.
She makes me better at my job because she forces me to define what good looks like before I can ask for it.
That last part is the unspoken benefit.
Most AI users skip the work of defining the brief. They type a prompt, accept the output, ship the result.
Gal Friday requires me to think clearly before I ask, which is the same discipline I've been applying to every brief I write.
I started Gal Friday on one major LLM provider.
I migrated her to a different one when the second platform proved better at the work.
The rule book moved. The knowledge layers moved. The evidence ledger moved. She kept being Gal Friday.
If the platform she runs on now stops earning its keep, I'll migrate her again. I have no loyalty to any AI vendor.
The architecture is the work. The platform is a rental car.
Most AI deployment will produce slop. Most AI companies will fail. Most of the labor-disruption narrative will not survive contact with reality.
Some of the tools, applied with discipline against specific operational problems, will earn their keep.
Gal Friday is mine.