Agentic AI Clinical Risk Pipeline

⚡ TL;DR — What This Project Does

An AWS-deployed, agentic LLM pipeline that scans free-text clinical notes, flags patients at elevated risk, and sends structured, review-ready email summaries to clinicians — all while keeping clinicians fully in the loop.

Processes patient notes in ≈ 3 seconds each
Sequential, retrieval-augmented, agentic LLM workflow (retrieve → detect → score → summarize → recommend)
Clinician alerts via SES email, with structured outputs and full audit logs persisted in S3

Using OpenAI GPT models deployed on AWS infrastructure, this project analyzes unstructured clinical notes to identify high-risk patients and generate actionable insights for care teams.

This tool is designed to assist clinicians, not replace them — all flagged patients are intended to be reviewed manually by a qualified clinician before any action is taken.

🌐 Workflow Overview

The pipeline automatically processes incoming patient notes, surfaces those at greatest risk, and delivers summarized insights to clinicians, helping save time and prioritize follow-up.

On a scheduled basis (e.g., weekly), the application automatically scans patient notes from a configurable lookback window.

The application can also be run on demand from the command line, with a custom timeframe, one or more clinician IDs, and a custom risk threshold.

This application is designed to augment, not replace, the clinician. It automates note review, a time-consuming yet crucial part of a clinician's role, at a fraction of the time and cost. The time saved on review can be reallocated to direct contact with patients.

[Figure: Agentic AI Clinical Risk Pipeline - Architecture Overview]

📚 Retrieval-Augmented Clinical Reasoning (RAG)

To improve factual grounding, consistency, and explainability, the pipeline incorporates retrieval-augmented generation (RAG) as a first-class component of the clinical risk assessment workflow.

Prior to risk scoring and summarization, each patient note is compared against a curated clinical knowledge base focused on common high-risk medical scenarios and red-flag patterns. The most semantically relevant reference snippets are retrieved and injected into the LLM context, where they inform downstream reasoning and summarization.
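The retrieval step described above can be sketched as follows. This is a simplified stand-in, not the project's implementation: it ranks snippets by bag-of-words cosine similarity, whereas a production system would more likely use a learned embedding model and a vector store. The knowledge-base entries are placeholder text, not clinical guidance, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Placeholder knowledge base (illustrative text only, not clinical guidance).
KNOWLEDGE_BASE = [
    "headache with new focal weakness or altered mental status",
    "chest pain radiating to the arm with shortness of breath",
    "fever with neck stiffness",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(note: str, snippets: list[str], k: int = 2) -> list[str]:
    """Return up to k snippets most similar to the note, dropping zero-overlap ones."""
    note_vec = Counter(tokenize(note))
    scored = [(cosine(note_vec, Counter(tokenize(s))), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:k] if score > 0]


def build_context(note: str, retrieved: list[str]) -> str:
    """Inject the retrieved snippets ahead of the note in the LLM context."""
    refs = "\n".join(f"- {s}" for s in retrieved)
    return f"Reference snippets:\n{refs}\n\nPatient note:\n{note}"
```

Filtering out zero-similarity snippets keeps unrelated references out of the prompt when the note matches little in the knowledge base.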

Retrieved reference snippets are logged together with model inputs and outputs, enabling clinician review, auditability, and retrospective analysis of how specific risk assessments were generated.

🛠️ Design Principles

🧰 Technologies & Stack

⚙️ Key Capabilities

🛡 Safeguards & Failure Handling

⚠️ Limitations & Responsible Use

This project is intended as a demonstration of how large language models can help prioritize clinical review workloads. It is not a diagnostic tool and does not independently determine care plans or patient outcomes. All outputs must be interpreted by licensed clinicians familiar with the patient’s history and context.

Accordingly, this system is human-in-the-loop by default: it prioritizes cases for clinician review but does not replace clinical judgment, and no model outputs are ever sent directly to patients.

🧪 Evaluation & Validation Approach

Because this system is designed as decision-support rather than a diagnostic tool, evaluation focuses on workflow impact and usability rather than clinical outcome prediction. Validation to date has emphasized:

A lightweight internal test set of patient-note examples was used to iterate on prompts and thresholds. Future extensions may include blinded clinician comparison studies, inter-rater agreement metrics, and validation against structured outcomes when appropriate approvals and de-identification procedures are in place.

🔍 LLM Prompt Design & Review Workflow

The RISEN framework (Role, Input, Steps, Expectation, Narrowing) guided the design of the sequential prompts used across the workflow. Each prompt plays a distinct role: for example, an initial prompt identifies high-risk patients, and a second prompt summarizes findings and makes recommendations for each high-risk patient.
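To make the RISEN structure concrete, here is a minimal sketch of a prompt template with one field per RISEN component. The class name, section labels, and example wording are all hypothetical; this is not the project's actual prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RisenPrompt:
    """One prompt organized by the five RISEN components."""
    role: str
    input_text: str
    steps: list[str]
    expectation: str
    narrowing: str

    def render(self) -> str:
        """Render the components as labeled sections, with numbered steps."""
        numbered = "\n".join(
            f"{i}. {step}" for i, step in enumerate(self.steps, start=1)
        )
        sections = [
            ("ROLE", self.role),
            ("INPUT", self.input_text),
            ("STEPS", numbered),
            ("EXPECTATION", self.expectation),
            ("NARROWING", self.narrowing),
        ]
        return "\n\n".join(f"{name}:\n{body}" for name, body in sections)


# Illustrative wording only.
example = RisenPrompt(
    role="You assist clinicians by triaging free-text clinical notes.",
    input_text="A single patient note and retrieved reference snippets.",
    steps=[
        "Identify findings that may indicate elevated risk.",
        "State whether the note should be flagged for clinician review.",
    ],
    expectation="Return a structured answer with a flag and supporting findings.",
    narrowing="Do not diagnose or recommend treatment; cite only the note text.",
)
```

Keeping each component in its own field makes it easier to revise one part of a prompt during iteration without disturbing the others.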

During development, these prompts were revised iteratively until they produced consistent results for their intended purpose. An example of the initial risk-assessment prompt is listed below:

This agentic workflow combines multiple LLM prompts, NLP/regex processing, and numeric calculations in sequence to produce the analysis.

The workflow is described as agentic because multiple prompts operate in sequence, with the output of one step guiding the next. Rather than a single call to an LLM, the system performs staged reasoning: it first detects whether a note includes potential high-risk findings, then assigns a numeric risk score, then generates a human-readable clinical summary, and finally recommends potential follow-up actions for clinician review.
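The staged flow described above (detect, then score, then summarize, then recommend) can be sketched as a chain of functions that each read the previous stage's output. In this sketch the LLM calls are replaced by simple regex and arithmetic stubs so that it runs on its own; the red-flag terms, weights, threshold, and function names are hypothetical, and the retrieval stage is omitted.

```python
import re
from dataclasses import dataclass, field

# Hypothetical red-flag terms and weights, standing in for LLM judgments.
RED_FLAGS = {"altered mental": 0.4, "hemiparesis": 0.4, "fever": 0.2}


@dataclass
class Assessment:
    note: str
    findings: list[str] = field(default_factory=list)
    risk_score: float = 0.0
    summary: str = ""
    recommendations: list[str] = field(default_factory=list)


def detect(a: Assessment) -> Assessment:
    """Stage 1: look for potential high-risk findings in the note."""
    a.findings = [
        term for term in RED_FLAGS
        if re.search(re.escape(term), a.note, re.IGNORECASE)
    ]
    return a


def score(a: Assessment) -> Assessment:
    """Stage 2: turn the detected findings into a numeric score capped at 1.0."""
    a.risk_score = min(1.0, sum(RED_FLAGS[f] for f in a.findings))
    return a


def summarize(a: Assessment) -> Assessment:
    """Stage 3: produce a human-readable summary from findings and score."""
    a.summary = f"Risk {a.risk_score:.2f}; findings: {', '.join(a.findings)}"
    return a


def recommend(a: Assessment) -> Assessment:
    """Stage 4: suggest follow-up actions for clinician review."""
    a.recommendations = [f"Clinician review of: {f}" for f in a.findings]
    return a


def run_pipeline(note: str, threshold: float = 0.5) -> Assessment:
    """Run the stages in order, stopping early when a note is not high risk."""
    a = detect(Assessment(note))
    if not a.findings:
        return a
    a = score(a)
    if a.risk_score < threshold:
        return a
    return recommend(summarize(a))
```

The early exits reflect the staged design: notes with no detected findings, or with a score below the threshold, never reach the summarization and recommendation stages.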

🔔 Why This Matters

Clinicians face an overwhelming volume of unstructured documentation, and published estimates suggest a substantial share of clinician time is spent on paperwork and documentation review. Much of this time is consumed by manual triage of free-text notes to determine which patients require follow-up attention.

This pipeline is designed to reduce that burden while keeping clinicians fully in the loop:

The result is not automation of medical decision-making, but meaningful reallocation of clinician time — less time searching through notes, more time engaging directly with patients.

🚀 Future Directions

Together, these extensions would support more rigorous validation, enable continuous model improvement through clinician feedback, and move the system closer to real-world deployment while maintaining strong human oversight.

📊 Example of Application Input/Output

A 59-year-old male patient with chronic alcoholism and hepatitis B virus carrier was diagnosed with alcoholic liver cirrhosis and hepatocellular carcinoma (HCC) two years ago. Then, he received transcatheter arterial chemoembolization therapy three times and has been living without recurrence. The patient visited our emergency department with the symptoms of headache beginning 10 days prior and progressive left hemiparesis, altered mentality occurring two days prior. He was afebrile and his vital signs were stable. There were no leukocytosis and C-reactive protein (CRP) was 4.04 mg/L of blood. Upon a neurological examination, he was drowsy with disorientation and revealed decreased upper and lower extremities motor power to grade IV. DWI of the brain was performed because of suspicion of cerebral infarction. It showed a multi-lobulated cystic mass lesion and associated mild edema located in the right parieto-occipital lobe. We considered the possibility of a metastatic brain tumor at the first impression owing to negative diffusion restriction sign and a history of HCC. Contrast enhanced MRI combined with DWI revealed a multi-lobulated cystic rim-enhancing mass with surrounding edema and hypointensity in the cystic cavity on the DWI. Stereotactic biopsy with aspiration was performed on the assumption of HCC multiple metastasis in the brain and the result revealed BA involving multiple bacterial colonies. However, because the bacteria was not cultured, an initial antimicrobial therapy was started on the basis of the standard empirical treatment that consists of vancomycin plus a third-generation cephalosporin and metronidazole. Despite the use of the above antimicrobial therapy, clinical deterioration with an increasing abscess size on cranial imaging made further stereotactic aspiration and cultures including fungus, parasite and tuberculosis mycobacterium. 
The amount of vancomycin dosage was increased in order to increase the CSF concentration of vancomycin but intermittent spiking fever continued and patient's clinical symptoms did not improve. Even though there were no bacterial growth in the cultures, considering the situation that antimicrobial-resistant gram-positive strains is increased, we had to change the previous antibiotics to line.

Note: Retrieved clinical reference snippets used during risk assessment are persisted alongside model inputs and outputs in S3 for audit and review, but are not included verbatim in clinician-facing emails to avoid unnecessary cognitive load.
