A practical guide to HIPAA LLM inference
Healthcare has some of the most valuable LLM use cases — clinical documentation, prior authorization, patient messaging — and some of the strictest data rules. Here's how to run models on protected health information without tripping over HIPAA.
Compliance Lead
This is practical guidance, not legal advice. Work with your compliance counsel on your specific obligations.
When a model vendor becomes a business associate
Under HIPAA, any vendor that creates, receives, maintains or transmits protected health information (PHI) on your behalf is a business associate. The moment a prompt containing PHI hits an inference provider, that provider is handling PHI — and you need a Business Associate Agreement (BAA) in place before a single request goes out. No BAA, no PHI. Full stop.
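One way to make "no BAA, no PHI" enforceable rather than aspirational is a guard in the request path. The sketch below is a minimal illustration, not a real library: `Provider`, `MissingBAAError` and `assert_phi_allowed` are hypothetical names, and the `baa_signed` flag is assumed to come from whatever vendor registry your compliance team maintains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """Hypothetical registry entry for an inference vendor."""
    name: str
    baa_signed: bool  # assumed to be maintained by your compliance team


class MissingBAAError(RuntimeError):
    """Raised when PHI would be sent to a provider without a signed BAA."""


def assert_phi_allowed(provider: Provider, contains_phi: bool) -> None:
    # Block the request before any PHI leaves your boundary.
    if contains_phi and not provider.baa_signed:
        raise MissingBAAError(
            f"Refusing to send PHI to {provider.name}: no BAA on file"
        )
```

Calling `assert_phi_allowed` immediately before the network call keeps the check close to the point where PHI would actually leave your systems.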
The retention trap
Here's where most AI deployments quietly create risk. If your inference vendor logs prompts, caches completions, or uses your data to improve their models, then PHI is now sitting in their systems — expanding your breach surface, your audit scope, and your liability under the Breach Notification Rule. Every system that stores PHI is a system you have to secure, monitor and account for.
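Zero data retention collapses this problem. If request content is never written to disk, never logged, and never used for training, then there is no stored PHI on the inference side to breach or to audit. PHI still passes through the provider's memory while a request is being processed, so the BAA and transport security still matter, but you've removed the at-rest category of risk instead of managing it.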
The minimum-necessary principle
HIPAA's minimum-necessary standard says you should use and disclose only the PHI required for the task. For LLM workloads, that means sending less PHI in the first place and limiting where it is processed and who can submit it. In practice:
- De-identify where you can. Strip identifiers that the model doesn't need before the prompt is built.
- Pin your region. Keep inference in a known, compliant location rather than a global pool.
- Isolate compute. Single-tenant GPUs mean PHI is never processed on hardware shared with another customer.
- Scope access tightly. Use RBAC and scoped API keys so that only the right services can submit PHI.
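The first practice, stripping identifiers before the prompt is built, can be sketched as below. This is an illustrative example under stated assumptions: the regex patterns, the `redact` and `build_prompt` helpers, and the prompt wording are all hypothetical, and pattern matching of this kind catches only a few obvious identifier formats. It is not a complete de-identification method.

```python
import re

# Illustrative patterns only. Real de-identification covers many more
# identifier types (names, dates, addresses, ...) and needs human review.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
}


def redact(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def build_prompt(note: str) -> str:
    """Redact first, then build the prompt, so raw identifiers never reach it."""
    return "Summarize the following clinical note:\n" + redact(note)
```

Redacting inside `build_prompt` rather than at each call site means there is a single place to audit and extend as more identifier types are added.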
A compliant architecture, in short
Putting it together, a defensible HIPAA inference setup has four properties: a signed BAA with the provider; zero retention so no PHI is stored downstream; single-tenant, in-region processing; and an audit trail of access that records that a request happened without logging what was in it. With those in place, the inference layer stops being the scary part of your healthcare AI stack.
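The last property, an audit trail that records that a request happened without recording its content, might look like the following sketch. The `AuditRecord` fields and the `audit_inference` helper are assumptions chosen for illustration; the point is that only metadata (who called, which model, when, how large) is serialized, and the prompt text itself is never copied into the record.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    request_id: str
    timestamp: float
    caller: str        # identity of the calling service, not the patient
    model: str
    prompt_chars: int  # size only; the prompt text is never stored
    status: str


def audit_inference(caller: str, model: str, prompt: str, status: str = "ok") -> str:
    """Return a JSON audit line containing metadata about one request."""
    record = AuditRecord(
        request_id=str(uuid.uuid4()),
        timestamp=time.time(),
        caller=caller,
        model=model,
        prompt_chars=len(prompt),
        status=status,
    )
    return json.dumps(asdict(record), sort_keys=True)
```

The returned line can be written to whatever append-only log store you already use; because the record holds no request content, that store does not become another system holding PHI.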
Why this is easier than it sounds
Teams often assume HIPAA rules out modern open models. It doesn't. It rules out careless data handling. Run DeepSeek-V4 or GLM-5.2 on dedicated GPUs inside your boundary, under a BAA, with zero retention, and you get frontier capability with a smaller compliance footprint than many legacy systems already in your environment.