Institutional Repository [SANDBOX]
Technical University of Crete

Mining distributed and heterogeneous data sources in the medical domain

Moustakis Vasilis, Hristofis K., Potamias G., Orphanoudakis S.

URI: http://purl.tuc.gr/dl/dias/6A3F9FF1-F21B-4D18-BC9B-5759BBADCAB5
Identifier: http://www.logistics.tuc.gr/Contents/Publications/43.pdf
Language: en
Title: Mining distributed and heterogeneous data sources in the medical domain [en]
Creator: Moustakis Vasilis [en]
Creator: Μουστακης Βασιλης [el]
Creator: Hristofis K. [en]
Creator: Potamias G. [en]
Creator: Orphanoudakis S. [en]
Content Summary: With the current explosion of data, the problem of how to combine distributed and heterogeneous (D&H) information sources is becoming increasingly critical. Besides collecting enormous amounts of data, it is important to address the general need for semantic integration of, and knowledge discovery from, these sources, a necessary challenge for machine learning (ML) and data mining/knowledge discovery (DM/KDD) researchers. The main differences from single, static, and homogeneous information sources, and consequently the grand challenges, are: (a) the scale of the problem is much larger than anything attempted before in ML and DM/KDD, and (b) the need to integrate multiple knowledge representations (e.g., domain ontologies and data models) becomes more important (Wah et al., 1993). While the distributed nature of data has a more or less clear definition (even if it is hard, and most of the time tedious, to achieve), heterogeneity is a more complex concept. Consider, for example, the situation where the same or different database applications are installed and run at different remote locations. In such a set-up, users may enter and record data in a format that is neither pre-specified nor homogeneous. This is a common situation in an Integrated Electronic Health Care Record (I-EHCR) environment (Forslund and Kilman, 1996; InterCare, 1999, pp. 7-13; Grimson et al., 1997). A physician who accesses a patient's healthcare record needs an overview of the patient's EHCR segments, since in most cases only a small fraction of the complete record will be selected and presented in detail. This also means that, when accessing a particular clinical information system, only a subset of the information stored in it needs to be extracted. The real issue here is not only how to access the specific information systems that maintain EHCR segments, but also how to identify and index the essential information in them.
A promising approach to this integration problem is to gain control of the organization's information resources at the meta-data level, while allowing individual systems autonomy at the data-instance level. The objective of the meta-database model is to achieve enterprise information integration over distributed and potentially heterogeneous systems, while allowing these systems to operate independently and concurrently (Hsu, 1992). However, achieving integration at the semantic level is challenging, mainly because the logic, knowledge, and data structures used in the various systems are complex and often incompatible (Sciore, 1994). In addition, the more heterogeneity one wishes to hide, the more semantic integration issues one has to deal with. Thus, a realistic solution should hide heterogeneity at the top level, while making the individual sources of information appear to end users as a large collection of objects that behave uniformly (Baldonado, 1996). This paper presents the problem of discovering and acquiring knowledge from D&H clinical data sources. In particular, we tackle the problem of inducing interesting associations between data items stored in remote clinical information systems. The test-bed environment for our approach is HYGEIAnet, the Integrated Health Care Network of Crete (Tsiknakis, 1997; HYGEIAnet Web site). One of the basic healthcare services offered within HYGEIAnet is access to patients' clinical information stored in autonomous (legacy) clinical information systems. Although the focus is on the medical domain, the proposed methodology and solutions could be extended to other application domains. In the next section, we present the architecture of an integrated environment for mining D&H data sources.
Section 3 presents the basic technology for accessing distributed and structured data sources, as well as the processes for the semantic homogenization and integration of heterogeneous data sources and items. In Section 4, we present the information and data representation framework, which is based on XML technology. Section 5 presents the machine learning and data mining processes, which are adapted to operate on flexible data representation structures. In Section 6, some preliminary experimental results are presented. In the last section, we conclude and discuss the future research and development agenda. [en]
Type of Item: Conference Short Paper [en]
License: http://creativecommons.org/licenses/by/4.0/ [en]
Date of Item: 2015-11-05
Date of Publication: 2000
Bibliographic Citation: K. Hristofis, G. Potamias, M. Tsiknakis, V. Moustakis, S. Orphanoudakis, "Mining Distributed and Heterogeneous Data Sources in the Medical Domain," presented at the European Conference on Machine Learning, Barcelona, Spain, 2000. [en]
