TempoQL: A Readable, Precise, and Portable Query System for Electronic Health Record Data
Ziyong Ma, Richard D. Boyce, Adam Perer, Venkatesh Sivaraman
Electronic health record (EHR) data is an essential data source for machine learning for health, but researchers and clinicians face steep barriers in extracting and validating EHR data for modeling. Existing tools incur trade-offs between expressivity and usability and are typically specialized to a single data standard, making it difficult to write temporal queries that are ready for modern model-building pipelines and adaptable to new datasets. This paper introduces TempoQL, a Python-based toolkit designed to lower these barriers. TempoQL provides a simple, human-readable language for temporal queries; support for multiple EHR data standards, including OMOP, MEDS, and others; and an interactive notebook-based query interface with optional large language model (LLM) authoring assistance. Through a performance evaluation and two use cases on different datasets, we demonstrate that TempoQL simplifies the creation of cohorts for machine learning while maintaining precision, speed, and reproducibility.