Presentation Details
Using Large Language Models for Time Studies on Preoperative Screening Phone Calls
Danielle LaPointe, MD, Alex Clark, Stephen M Breneman, MD, PhD. University of Rochester, Rochester, NY, USA
Abstract
BACKGROUND: Time studies are a powerful way to review the work of a call center, and preoperative phone call screening remains an integral part of the preoperative process. We have 18 regular screening nurses completing approximately 4000 screens per month (5000 calls). We attempted a self-reported time study last year but found that 20% of the data were missing or uninterpretable and that self-reporting was burdensome to the screeners. The purpose of this innovation abstract is to describe how advances in artificial intelligence can support the quality assurance process of a large screening program.
PURPOSE: We used NEC softphones and required all calls made by the screeners to be placed with these phones. Calls were transcribed with OpenAI’s Whisper speech-to-text. Two hundred transcripts were classified into nine call types by A.C. and reviewed by S.B. Classification prompts were created and tested with large language models (LLMs) (GPT 4.0, GPT 4.1, Mistral, and Qwen 3-4B) (HIPAA compliant). Two months of calls were then classified and combined with the screeners’ reported work hours to create the time study.
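The comparison of model output against the reviewed labels can be sketched as below. This is a minimal illustration, not the authors' code: the label names, the sample transcripts, and `stub_classifier` (a keyword rule standing in for an LLM call) are all hypothetical, since the abstract does not list the nine call types or the prompts used.

```python
from typing import Callable, Sequence


def classification_accuracy(
    transcripts: Sequence[str],
    reference_labels: Sequence[str],
    classify: Callable[[str], str],
) -> float:
    """Return the fraction of transcripts whose predicted label matches the reviewed label."""
    if len(transcripts) != len(reference_labels):
        raise ValueError("transcripts and reference_labels must have equal length")
    if not transcripts:
        raise ValueError("at least one transcript is required")
    correct = sum(
        classify(text) == label
        for text, label in zip(transcripts, reference_labels)
    )
    return correct / len(transcripts)


def stub_classifier(text: str) -> str:
    """Hypothetical stand-in for an LLM call; a real classifier would send a prompt."""
    lowered = text.lower()
    if "voicemail" in lowered:
        return "voicemail"
    if "reschedule" in lowered:
        return "reschedule"
    return "screen"


# Hypothetical transcripts and human-reviewed labels (assumed, not from the study).
sample_transcripts = [
    "You have reached the voicemail of the patient.",
    "I need to reschedule my surgery date.",
    "Let's go through your medication list.",
    "Please leave a message after the tone.",
]
sample_labels = ["voicemail", "reschedule", "screen", "voicemail"]

score = classification_accuracy(sample_transcripts, sample_labels, stub_classifier)
print(f"accuracy: {score:.1%}")
```

Passing the classifier in as a function keeps the scoring identical across models, so each candidate LLM can be swapped in without touching the evaluation logic.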
RESULTS: Transcription time with Whisper was 1:12, with no incremental cost. OpenAI’s GPT 4.0 achieved 97% accuracy in classifying the 200 screening calls, compared with 89% for GPT 4.1, 88.5% for Mistral, and 78% for Qwen. Classification ran at 500 transcripts per hour at a cost of $0.01 per transcript. By combining the call-type label with the call length and the hours worked by each screener, we were able to reproduce a manual time study with over 10,000 calls.
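The combination of call type, call length, and hours worked can be sketched as a simple aggregation. This is an assumed structure for illustration only: the `Call` record, the field names, the screener identifiers, and the sample values are hypothetical and do not come from the study data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    screener: str        # hypothetical screener identifier
    call_type: str       # label assigned by the classifier
    duration_min: float  # call length in minutes


def time_study(calls, hours_worked):
    """Summarize calls per type and the share of worked time each screener spent on calls."""
    by_type = defaultdict(lambda: {"calls": 0, "minutes": 0.0})
    minutes_by_screener = defaultdict(float)
    for call in calls:
        by_type[call.call_type]["calls"] += 1
        by_type[call.call_type]["minutes"] += call.duration_min
        minutes_by_screener[call.screener] += call.duration_min

    # Per call type: count, total minutes, and mean minutes per call.
    type_summary = {
        call_type: {**totals, "mean_minutes": totals["minutes"] / totals["calls"]}
        for call_type, totals in by_type.items()
    }
    # Per screener: minutes on calls divided by minutes worked.
    phone_share = {
        screener: minutes_by_screener.get(screener, 0.0) / (hours * 60.0)
        for screener, hours in hours_worked.items()
        if hours > 0
    }
    return type_summary, phone_share


# Hypothetical sample data (assumed values, not study results).
calls = [
    Call("nurse_1", "screen", 12.0),
    Call("nurse_1", "voicemail", 3.0),
    Call("nurse_2", "screen", 18.0),
    Call("nurse_2", "screen", 12.0),
]
hours_worked = {"nurse_1": 1.0, "nurse_2": 2.0}

type_summary, phone_share = time_study(calls, hours_worked)
print(type_summary)
print(phone_share)
```

As a rough check on scale, at the reported 500 transcripts per hour and $0.01 per transcript, classifying 10,000 transcripts would take about 20 hours and cost about $100.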
CONCLUSIONS: Perioperative screening phone calls will continue to be an integral part of patient preparation for surgery. We have demonstrated an inexpensive, low-burden method to perform ongoing time studies and quality assurance on large screening programs.
No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the author.