arxiv:2501.13687

Question Answering on Patient Medical Records with Private Fine-Tuned LLMs

Published on Jan 23
· Submitted by ayushgs on Jan 27

Abstract

Healthcare systems continuously generate vast amounts of electronic health records (EHRs), commonly stored in the Fast Healthcare Interoperability Resources (FHIR) standard. Despite the wealth of information in these records, their complexity and volume make it difficult for users to retrieve and interpret crucial health insights. Recent advances in Large Language Models (LLMs) offer a solution, enabling semantic question answering (QA) over medical data and allowing users to interact with their health records more effectively. However, ensuring privacy and compliance requires edge and private deployments of LLMs. This paper proposes a novel approach to semantic QA over EHRs by first identifying the most relevant FHIR resources for a user query (Task1) and subsequently answering the query based on these resources (Task2). We explore the performance of privately hosted, fine-tuned LLMs, evaluating them against benchmark models such as GPT-4 and GPT-4o. Our results demonstrate that fine-tuned LLMs, despite being 250x smaller, outperform GPT-4 family models by 0.55% in F1 score on Task1 and by 42% in METEOR score on Task2. Additionally, we examine advanced aspects of LLM usage, including sequential fine-tuning, model self-evaluation (narcissistic evaluation), and the impact of training data size on performance. The models and datasets are available here: https://huggingface.co/genloop
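The two-stage setup described in the abstract (Task1: selecting the relevant FHIR resources, Task2: answering from those resources) can be sketched as follows. This is a minimal illustration with assumed names and toy data, not the authors' implementation: the keyword-overlap retriever stands in for the fine-tuned LLM, and `FHIR_BUNDLE`, `select_resources`, and `answer_query` are hypothetical.

```python
import json
import re

# Toy FHIR bundle keyed by resource reference (hypothetical data).
FHIR_BUNDLE = {
    "Observation/bp-1": {"resourceType": "Observation",
                         "code": "blood-pressure", "value": "120/80 mmHg"},
    "MedicationRequest/m-1": {"resourceType": "MedicationRequest",
                              "medication": "lisinopril"},
    "Condition/c-1": {"resourceType": "Condition", "code": "hypertension"},
}

def select_resources(query: str, bundle: dict) -> list:
    """Task1 stand-in: pick the FHIR resources relevant to the query.

    Here a simple keyword overlap replaces the paper's fine-tuned LLM.
    """
    terms = [t for t in re.findall(r"[a-z-]+", query.lower()) if len(t) > 3]
    return [ref for ref, res in bundle.items()
            if any(t in json.dumps(res).lower() for t in terms)]

def answer_query(query: str, refs: list, bundle: dict) -> str:
    """Task2 stand-in: answer the query from the selected resources only.

    A privately hosted LLM would generate a natural-language answer; this
    sketch just echoes the retrieved context to show the data flow.
    """
    context = "; ".join(json.dumps(bundle[r]) for r in refs)
    return f"Answer to {query!r} from {len(refs)} resource(s): {context}"

refs = select_resources("What is my blood-pressure?", FHIR_BUNDLE)
answer = answer_query("What is my blood-pressure?", refs, FHIR_BUNDLE)
```

Restricting Task2 to the resources chosen in Task1 keeps the prompt small enough for an edge-deployed model and grounds the answer in the patient's own records, which is the core motivation for the two-task split.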


Models citing this paper 14


Datasets citing this paper 2

Spaces citing this paper 0


Collections including this paper 1