Disease diagnosis using a combination of structured and unstructured data in electronic health records

Degree

Master's (M.Sc.)

Field of study

Computer Engineering - Software

Faculty

Computer Engineering

Defense date

1404/11/06 (Solar Hijri calendar)

Page count

84 pp.

Supervisors

Zahra Zojaji, Maryam Lotfi

Persian keywords

Electronic health record, disease diagnosis, natural language processing, unstructured data, recurrent neural network, language model

Persian abstract

Today, with the worldwide digitization of medical records and advances in technology, the electronic health record has become an important source of real-world data. The electronic health record stores patients' health information, which comprises structured data (such as vital signs, age, blood pressure, and gender) and unstructured data (such as notes, clinical reports, and medical prescriptions). The unstructured category contains highly important information, and combining both kinds of data can lead to more accurate disease diagnosis. Because unstructured data make up the larger share of patient information, using and analyzing them is especially important. This study considers both the unstructured and the structured data of the electronic health record for disease diagnosis, using the MIMIC-III dataset, which contains a variety of structured and unstructured temporal data. In the first step, suitable columns of structured data were extracted from the dataset. In the second step, the unstructured data and medical notes of the dataset were analyzed, and the unstructured data were converted into embedding vectors using the ClinicalBERT model. In the third step, a table forming a new dataset was created from the structured data and the results obtained from the unstructured data. In the fourth step, disease diagnosis was performed and analyzed on this selected dataset, with the combined data modeled using an LSTM model and a BERT model. The model's input is the combined data and its output is the disease diagnosis code. The results show that diagnosing the disease code with all data types taken into account achieves a precision of 0.94, a recall of 0.93, and an accuracy of 0.94. The disease code identifies the exact disease name according to the MIMIC-III dataset. A model with these results in the health domain would greatly support the treatment process, early detection of diseases, and prevention of mortality.
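The third step above, building one table from structured columns and note embeddings, can be pictured with a small sketch. It assumes the ClinicalBERT embeddings have already been computed for each note, averages (mean-pools) the embeddings of all notes belonging to one admission, and concatenates the result with the structured columns. The column names, the `hadm_id` join key (modeled on MIMIC-III's `HADM_ID`), the mean pooling, and the example diagnosis codes are illustrative assumptions, not details taken from the thesis; the toy 3-dimensional vectors stand in for the 768-dimensional vectors a BERT-base model produces.

```python
from statistics import fmean

# Hypothetical structured columns; the thesis does not list the exact columns it extracted.
# Gender is assumed to be already encoded as 0/1.
STRUCTURED_COLUMNS = ["age", "gender", "heart_rate", "systolic_bp"]


def mean_pool(vectors):
    """Average several equal-length note embeddings into a single vector."""
    return [fmean(values) for values in zip(*vectors)]


def build_combined_table(structured_rows, note_embeddings, labels):
    """Join structured features and pooled note embeddings on an admission ID.

    structured_rows: {hadm_id: {column: numeric value}}
    note_embeddings: {hadm_id: [embedding vector, ...]} (assumed precomputed, e.g. by ClinicalBERT)
    labels:          {hadm_id: diagnosis code}
    Returns a list of (hadm_id, feature_vector, label) tuples.
    """
    table = []
    for hadm_id in sorted(structured_rows):
        # Skip admissions that have no notes or no diagnosis label.
        if not note_embeddings.get(hadm_id) or hadm_id not in labels:
            continue
        structured = [float(structured_rows[hadm_id][c]) for c in STRUCTURED_COLUMNS]
        text_vector = mean_pool(note_embeddings[hadm_id])
        # Early fusion: structured features followed by the pooled text embedding.
        table.append((hadm_id, structured + text_vector, labels[hadm_id]))
    return table


# Toy data; the codes are illustrative ICD-9-style labels, not results from the thesis.
structured_rows = {
    101: {"age": 67, "gender": 1, "heart_rate": 88, "systolic_bp": 142},
    102: {"age": 54, "gender": 0, "heart_rate": 72, "systolic_bp": 118},
}
note_embeddings = {
    101: [[0.25, 0.5, 0.0], [0.75, 0.0, 0.5]],
    102: [[0.125, 0.125, 0.125]],
}
labels = {101: "4280", 102: "41401"}

for row in build_combined_table(structured_rows, note_embeddings, labels):
    print(row)
```

Concatenating the two feature groups is the simplest form of early fusion; the resulting rows could be fed to a sequence model such as an LSTM or to any other classifier.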

English keywords

Electronic health record, disease diagnosis, natural language processing, unstructured data, recurrent neural network, language model

English title

Disease diagnosis using a combination of structured and unstructured data in electronic health records

Department

Software Engineering

English abstract

Today, with the worldwide digitization of medical records and advances in technology, electronic health records have become an important source of real-world data. Electronic health records store patients' health information, including structured data (such as vital signs, age, blood pressure, and gender) and unstructured information (such as notes, clinical reports, and medical prescriptions). The unstructured data contain highly important information, and combining the two kinds of data can lead to more accurate disease diagnosis. Since unstructured data make up the larger share of patient information, their use and analysis are of great importance. This study considers both the unstructured and the structured data in electronic health records for disease diagnosis, using the MIMIC-III dataset, which includes a variety of structured and unstructured temporal data. In the first step, appropriate columns of structured data were extracted from the dataset. In the second step, the unstructured data and medical notes of the dataset were analyzed and transformed into embedding vectors using the ClinicalBERT model. In the third step, a new dataset was created as a table combining the structured data with the results obtained from the unstructured data. In the fourth step, disease diagnosis was performed and analyzed on this selected dataset, with the combined data modeled using an LSTM model and a BERT model. The model's input is the combined data and its output is the disease diagnosis code. The results show that diagnosing the disease code with all data types taken into account achieves a precision of 0.94, a recall of 0.93, and an accuracy of 0.94. The disease code is the exact identifier of the disease name in the MIMIC-III dataset.
Having a model with these results in the health field would greatly contribute to the treatment process, early detection of diseases, and prevention of mortality.
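The reported figures are standard classification metrics computed over predicted diagnosis codes. The record does not say how precision and recall were averaged across codes, so the sketch below assumes macro averaging (an unweighted mean over codes) for single-label, multi-class predictions. Under micro averaging, precision, recall, and accuracy would all be equal for single-label predictions, so the reported values presumably come from some other scheme. The diagnosis codes in the example are illustrative only.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision and recall for single-label, multi-class predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)  # TP + FP per code
    true_count = Counter(y_true)  # TP + FN per code

    # A code that is never predicted (or never present) contributes 0 to the mean.
    precision = sum(
        true_pos[c] / pred_count[c] if pred_count[c] else 0.0 for c in labels
    ) / len(labels)
    recall = sum(
        true_pos[c] / true_count[c] if true_count[c] else 0.0 for c in labels
    ) / len(labels)
    accuracy = sum(true_pos.values()) / len(y_true)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Illustrative ICD-9-style codes; not data from the thesis.
y_true = ["4280", "4280", "41401", "5849", "5849", "5849"]
y_pred = ["4280", "41401", "41401", "5849", "5849", "4280"]
print(classification_metrics(y_true, y_pred))
```

scikit-learn's `precision_score` and `recall_score` with `average="macro"` compute the same per-code means; this standard-library version only makes the arithmetic explicit.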

Number of chapters


Author

Amooshahi, Maedeh

Link to this document

https://lib.ui.ac.ir/dl/search/default.aspx?Term=25669&Field=0&DTC=3