Abstract
Purpose – This study aims to develop a methodology for predicting the resilience of individual firms to economic crisis, using historical government data. The goal is to optimize one of the most important and costly interventions that governments undertake, the large economic stimulus programs implemented to mitigate the consequences of economic crises, by focusing them on the less resilient, more vulnerable firms, which have the greatest need for government assistance and support.
Design/methodology/approach – The authors leverage existing firm-level data for economic crisis periods from government agencies with competencies/responsibilities in the domain of the economy, such as Ministries of Finance and Statistical Authorities, to construct prediction models of the resilience of individual firms to economic crisis based on firms’ characteristics (such as human resources, technology, strategies, processes and structure), using artificial intelligence (AI) techniques from the area of machine learning (ML).
Findings – The methodology has been applied using data from the Greek Ministry of Finance and Statistical Authority about 363 firms for the Greek economic crisis period 2009–2014 and has provided a satisfactory prediction of a measure of the resilience of individual firms to an economic crisis.
Research limitations/implications – The authors’ study opens up new research directions concerning the exploitation of AI/ML in government for a critical, high-importance government intervention that mobilizes and spends huge financial resources. The main limitation is that this first application of the proposed methodology is based on a rather small data set from a single national context (Greece), so further applications using larger data sets and different national contexts are necessary.
Practical implications – The proposed methodology enables government agencies responsible for implementing such economic stimulus programs to radically transform them by predicting the resilience to economic crisis of the firms applying for government assistance and then directing the scarce available financial resources to the ones predicted to be more vulnerable, substantially increasing the effectiveness of these programs and the economic/social value they generate.
Originality/value – To the best of the authors’ knowledge, this study is the first application of AI/ML in government that leverages existing data for economic crisis periods to optimize and increase the effectiveness of the largest and most important and costly economic intervention that governments repeatedly have to
make: the economic stimulus programs for mitigating the consequences of economic crises.
Abstract
Purpose: The public sector has started exploiting artificial intelligence (AI) techniques, though mainly for operational and, to a lesser extent, tactical-level tasks. The purpose of this study is to exploit AI for the highest, strategic-level task of government: to develop an AI-based public sector data analytics methodology for supporting policy-making for one of the most serious and large-scale challenges that governments repeatedly face, the economic crises that give rise to economic recessions (though the proposed methodology has much wider applicability).
Design/Methodology/Approach: A public sector data analytics methodology has been developed, which enables the exploitation of existing public and private sector data, through advanced processing of them using a big data oriented AI technique, ‘all-relevant’ feature selection, in order to identify characteristics of firms and their external environment that affect (positively or negatively) their resilience to economic crisis.
Findings: A first application of the proposed public sector data analytics methodology has been conducted, using Greek firms’ data concerning the economic crisis period 2009-2014, which has led to interesting conclusions and insights, revealing some factors that affect the extent of sales revenue decrease in Greek firms during the above crisis period, and providing a first validation of our methodology.
Research Implications: Our research contributes to the advancement of two emerging digital government research domains that are highly important for society but minimally researched: public sector data analytics (and especially policy analytics) and government exploitation of AI. It exploits an AI feature selection algorithm, the Boruta ‘all-relevant’ variable identification algorithm, which has been minimally exploited in the past for public sector data analytics, in order to support the design of public policies for addressing one of the most serious and large-scale economic challenges that governments repeatedly face: economic crises.
Practical Implications: The proposed methodology allows the identification of characteristics of firms and their external environment that affect positively or negatively their resilience to economic crisis. This enables a better understanding of the kinds of firms that are more strongly hit by the crisis, which is quite useful for the design of public policies for supporting them; and at the same time reveals firms’ resources, capabilities and practices that enhance their ability to cope with economic crisis, in order to design policies for promoting them through educational and support activities.
Social Implications: This methodology can be very useful for the design of more effective public policies for reducing the negative impacts of economic crises on firms, and therefore mitigating their negative consequences for the society, such as unemployment, poverty and social exclusion.
Originality/Value: Our research develops a novel approach to the exploitation of public and private sector data, based on an AI technique that has been minimally exploited for such purposes (‘all-relevant’ feature selection), in order to support the design of public policies for addressing one of the most threatening disruptions that modern economies and societies repeatedly face: economic crises.
Abstract
Storytelling is an important asset in today's society. Digital platforms for storytelling can facilitate collaborative development of stories. The storytelling process, if properly facilitated, can lead to the creation of stories that improve the relations between the players. Moreover, stories convey important information about the players and their interaction. More knowledge and better tools are needed on how to facilitate storytelling for good and how to analyze the generated data to exploit its power. Research Question: How to facilitate Digital Storytelling for good? Method: The investigation is based on a case study approach in which participants have been engaged in the creation of stories. The study is based on empirical data collection and analysis: from the stories recorded, we extract the storytelling features and performance. We have provided qualitative (Domain Expert) and quantitative (Machine Learning) analysis of the stories. In total, 58 users played the game in 15 sessions. Results and Conclusions: The main result is a framework for analysing digital stories. The analysis gives an indication of which game building blocks lead to stories for good. Future work will include a redesign of the game and of the building blocks that lead to stories for good, as well as further analyses.
Abstract
This paper presents a privacy preserving protocol for the computation of a Radial Basis Function (RBF) neural network model between N participants which share horizontally partitioned datasets. The RBF model is used for regression analysis tasks. The novel aspect of the proposed protocol lies in the fact that it assumes a malicious user model and does not use homomorphic cryptographic methods, which are inherently only suited for a semi-trusted user environment. The performance analysis shows that the communication overhead is low enough to warrant its use, while the computational complexity is identical in most cases to that of the centralized computation scenario (e.g. a trusted third party). The accuracy of the output model is only marginally inferior to that of a centralized computation on the union of all datasets.
Abstract
Emerging pervasive assistive environment applications for remote home healthcare monitoring of the elderly, the disabled and patients with various chronic diseases generate massive amounts of sensor signal data, which are transmitted from numerous homes to local health centers or hospitals. While it is critical to process this data efficiently (in a fast and accurate manner) and cost-effectively, in a large scale application of the above technologies it is not possible to do so manually using specialized human resources. This paper proposes a methodology for automatic real-time screening of heart sound signals (one of the most widely acquired signals from the human body for diagnostic purposes) and identification of those that are abnormal and require some action to be taken, which can be applied to many other similar types of bio-signals generated in assistive environments. It is based on a novel Markov Chain Monte Carlo (MCMC) Bayesian Inference approach, which estimates conditional probability distributions in structures obtained from a Tree-Augmented Naïve Bayes (TAN) algorithm. It has been applied and validated on a highly ‘difficult’ heterogeneous dataset of 198 heart sound signals, which come from both healthy medical cases and unhealthy ones having Aortic Stenosis, Mitral Regurgitation, Aortic Regurgitation or Mitral Stenosis. The proposed methodology achieved high classification performance in this difficult screening problem. It outperforms other widely used classifiers, showing great potential for contributing to a cost-effective large scale application of ICT-based assistive environment technologies.
Abstract
The extraction and exploitation of existing knowledge assets for supporting decision making and increasing the effectiveness of various internal and external interventions is of critical importance for the success of modern organizations. The use of advanced Operational Research based quantitative methods in combination with high capabilities information systems can be very useful for this purpose. In this paper we are investigating the use of Ensemble Random Forests for extracting, codifying and exploiting existing organizational knowledge on gas turbine blading faults identification, in the form of a large number of decision trees (called a ‘forest’); each of them has internal nodes corresponding to various tests on features of signals acquired from the gas turbine and leaf nodes corresponding to classifications to the healthy condition or particular faults. Two heterogeneous kinds of inserting randomness to the development of these forest trees, based on different theoretical assumptions, have been examined (Random Input Forests and Random Combination Forests). Using data from a large power gas turbine the performance of Ensemble Random Forests has been evaluated, and also compared against other machine learning classification methods, such as Neural Networks, Classification and Regression Trees and K-Nearest Neighbor. The Ensemble Random Forests reached a level of 97% in terms of precision and recall in engine condition diagnosis from new signals acquired from the gas turbine, which was higher than the performance of all the other examined classification methods. These results provide some first evidence that Ensemble Random Forest can be an effective tool for the extraction, codification and exploitation of the technological knowledge assets of modern organizations, and contribute significantly to the improvement of organizations’ decision making and interventions in this area.
Abstract
Intrusion Detection Systems (IDS) have nowadays become a necessary component of almost every security infrastructure. So far, many different approaches have been followed in order to increase the efficiency of IDS. Swarm Intelligence (SI), a relatively new bio-inspired family of methods, seeks inspiration in the behavior of swarms of insects or other animals. After being applied successfully in other fields, SI started to attract the interest of researchers working in the field of intrusion detection. In this paper we explore the reasons that led to the application of SI in intrusion detection, and present SI methods that have been used for constructing IDS. A major contribution of this work is also a detailed comparison of several SI-based IDS in terms of efficiency. This gives a clear idea of which solution is more appropriate for each particular case.
Abstract
With the proliferation of the Web and ICT technologies there have been concerns about the handling and use of sensitive information by data mining systems. Recent research has focused on distributed environments where the participants in the system may also be mutually mistrustful. In this paper we discuss the design and security requirements for large-scale privacy-preserving data mining (PPDM) systems in a fully distributed setting, where each client possesses its own records of private data. To this end we argue in favor of using some well-known cryptographic primitives, borrowed from the literature on Internet elections. More specifically, our framework is based on the classical homomorphic election model, and particularly on an extension for supporting multi-candidate elections. We also review a recent scheme [Z. Yang, S. Zhong, R.N. Wright, Privacy-preserving classification of customer data without loss of accuracy, in: SDM’ 2005 SIAM International Conference on Data Mining, 2005] which was the first scheme that used the homomorphic encryption primitive for PPDM in the fully distributed setting. Finally, we show how our approach can be used as a building block to obtain Random Forests classification with enhanced prediction performance.
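The counting idea behind multi-candidate homomorphic tallying can be sketched in a few lines. This is only an illustration under stated assumptions: the encryption layer is omitted entirely (in a real system each encoded value would be encrypted under an additively homomorphic scheme and only the aggregate decrypted), and the function names `encode_vote`, `decode_tally` and `tally` are hypothetical, not taken from the paper. Each client's private value (for instance a class label) is encoded as a power of a base larger than the number of clients, so that the sum of all encodings carries one frequency count per "candidate".

```python
def encode_vote(candidate: int, base: int) -> int:
    """Encode a choice of candidate index `candidate` as base**candidate."""
    return base ** candidate


def decode_tally(total: int, base: int, num_candidates: int) -> list[int]:
    """Read the per-candidate counts back as the base-`base` digits of the sum."""
    counts = []
    for _ in range(num_candidates):
        total, digit = divmod(total, base)
        counts.append(digit)
    return counts


def tally(votes: list[int], num_candidates: int) -> list[int]:
    """Aggregate all choices with a single addition-only pass.

    Only additions are performed on the encoded values, which is the
    operation an additively homomorphic cryptosystem supports on ciphertexts
    (the encryption itself is deliberately left out of this sketch).
    """
    base = len(votes) + 1  # must exceed the number of clients to avoid digit overflow
    total = sum(encode_vote(v, base) for v in votes)
    return decode_tally(total, base, num_candidates)


# Five hypothetical clients, each holding one of three class labels (0, 1, 2).
print(tally([0, 2, 2, 1, 2], 3))  # [1, 1, 3]
```

The resulting frequency counts are the kind of aggregate statistics a classifier can be trained from without any party seeing an individual record.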
Abstract
The rapid growth of Information and Communication Technologies raises deep concerns about how data mining techniques and intelligent systems parse, analyze and manage enormous amounts of data. Because of the sensitive information they contain, data can be exploited by potential aggressors. Previous research has shown that the most accurate approach to acquiring knowledge from data while simultaneously preserving privacy is the exploitation of cryptography. In this paper we introduce an extension of a privacy preserving data mining algorithm designed and developed for both horizontally and vertically partitioned databases. The proposed algorithm exploits the multi-candidate election schema and its capabilities to build a privacy preserving Tree Augmented Naive Bayesian classifier. Security analysis and experimental results confirm the preservation of private data throughout the mining process.
Abstract
The COVID-19 pandemic is expected to lead to a severe recessionary economic crisis with quite negative consequences for large numbers of firms and citizens; however, this is an ‘old story’: recessionary economic crises have appeared repeatedly over the last 100 years in market-based economies, and they are recognized as one of their most severe and threatening weaknesses. They can result in the closure of numerous firms and a decrease in the activities of many more, as well as poverty and social exclusion for large parts of the population, and finally lead to political upheaval and instability; so they constitute one of the most threatening and difficult problems that governments often face. For the above reasons it is imperative that governments develop effective public policies and make drastic interventions for addressing these economic crises. The prediction of the vulnerability of individual firms to recessionary economic crisis can be quite useful for these interventions, so that government can focus its attention as well as its scarce economic resources on the most vulnerable ones. In this direction our paper presents a methodology for using existing government data in order to predict the vulnerability of individual firms to economic crisis, based on Artificial Intelligence (AI) Machine Learning (ML) algorithms. Furthermore, a first application of the proposed methodology is presented, based on existing data from the Greek Ministry of Finance and Statistical Authority concerning 363 firms for the economic crisis period 2009-2014, which gives encouraging results.
Abstract
An important trend in the area of digital government is its expansion beyond the support of internal processes and operations, as well as transactions and consultations with citizens and firms, which were the main objectives of its first generations, towards the support of higher-level functions of government agencies, with main emphasis on public policy making. This gives rise to the gradual development of policy analytics. Another important trend in the area of digital government is the increasing exploitation of artificial intelligence techniques by government agencies, mainly for the automation, support and enhancement of operational tasks and lower-level decision making, but only to a
very limited extent for the support of higher-level functions, and especially policy making. Our paper contributes towards the advancement and the combination of these two important trends: it proposes a policy analytics methodology for the exploitation of existing public and private sector data, using a big
data oriented artificial intelligence technique, feature selection, in order to
support policy making concerning one of the most serious problems that governments face, the economic crises. In particular, we present a methodology for exploiting existing data of taxation authorities, statistical agencies, and also of private sector business information and consulting firms, in order to identify
characteristics of a firm (e.g. with respect to strategic directions, resources, capabilities, practices, etc.) as well as its external environment (e.g. with respect to competition, dynamism, etc.) that affect (positively or negatively) its resilience to the crisis with respect to sales revenue; for this purpose an advanced artificial intelligence feature selection algorithm, the Boruta ‘all-relevant’ variables identification one, is used. Furthermore, an application of the proposed
economic crisis policy analytics methodology is presented, which provides a first validation of the usefulness of our methodology.
Abstract
Since 2009, the European Union (EU) has been facing a multi-year financial crisis affecting the stability of the countries involved. Our goal is to gain useful insights on the societal impact of such a strong political issue through the exploitation of topic modeling and stance classification techniques. To this end, we unravel the public’s stance towards this event and empower citizens’ participation in the decision making process, taking the policy life cycle as a baseline. The paper introduces and evaluates a bilingual stance classification architecture, enabling a deeper understanding of how citizens’ sentiment polarity changes based on the critical political decisions taken among European countries. Through three novel empirical studies, we aim to explore and answer whether stance classification can be used to: i) determine citizens’ sentiment polarity for a series of political events by observing the diversity of opinion among European citizens, ii) predict the outcome of political decisions made by citizens, such as a referendum call, iii) examine whether citizens’ sentiments agree with governmental decisions during each stage of a policy life cycle.
Abstract
Cloud computing (CC) can offer significant benefits to enterprises. However, it can pose some risks as well, and this has led to lower adoption than initially expected. For this reason, it would be very useful to predict which enterprises will exhibit a propensity for CC adoption. In this direction, we investigate the use of six well-established classifiers (fast large margin Support Vector Machine, Naive Bayes, Decision Tree, Random Forest, k-Nearest Neighbor, and Linear Regression) for the prediction of enterprise level propensity for CC adoption. Having as our theoretical foundation the Technology – Organization – Environment (TOE) framework, we use for this purpose a set of technological (concerning enterprise information systems), organizational and environmental characteristics. Our first results, using a dataset of 676 manufacturing firms of the glass, ceramic and cement sectors from six European countries (Germany, France, Italy, Poland, Spain, and UK), collected through the e-Business W@tch Survey of the European Commission, are encouraging. It is concluded that among the examined characteristics the technological ones, concerning enterprise systems, seem to be the most important predictors.
Abstract
In the last decade there has been extensive and continuously growing creation of political content on the Internet, and especially in Web 2.0 social media, which can be quite useful for government agencies in order to understand the needs and problems of societies and formulate effective public policies for addressing them. So a variety of ICT-based methods have been developed for the exploitation of this political content by governments (‘citizensourcing’), initially simpler and later more sophisticated ones. These ICT-based methods are increasingly based on the use of opinion mining (OM) and sentiment analysis (SA) techniques, in order to process the extensive political content collected from numerous sources. This paper describes a novel approach to OM and SA use, created as part of an advanced ICT-based method of exploiting political content created on the Internet, and especially in social media, by experts (‘expertsourcing’), aiming to leverage the extensive policy community of the European Union, which is being developed in the European EU-Community project. Furthermore, some first experimental results are presented.
Abstract
The EU Community project seeks to promote, facilitate, and ultimately exploit the synergy of a cutting-edge intelligent collaboration platform with a community of institutional actors, stakeholders, scientists, consultants, media analysts and other individuals that can make valuable contributions to EU policy debates. Its ultimate goal is to effectuate a transformation in the modus operandi of EU politics and move closer to achieving the elusive goals of improved transparency, efficiency, awareness and engagement, ultimately leading to better policies for a better European Union.
Abstract
One of the major issues that Greek Higher Education Institutes (HEIs) face is the delayed completion of studies by their students. For example, in the case of the Technological Educational Institute of Athens, in the academic year 2012-2013, the percentage of graduates with a length of studies of more than 6 years was 53%. This "problem" becomes harder if we consider that, according to the new legislation, Greek HEIs must cut off access to students who "linger" too long. This means that many of these graduates would not have been able to complete their studies. While many institutes have systems to quantify and report the length of studies of all graduates, far less attention is typically paid to each student's reason(s) for delayed graduation. In this paper, we focus on examining the question of why students delay the completion of their studies, using several data mining techniques. Through the application of data mining techniques, new knowledge will be provided to the administration of a HEI that could be used for solving this problem. The data used in our case study come from a questionnaire distributed to graduates of the institute and also from educational data stored in the Institute's student database.
Abstract
Cyberbullying is a new phenomenon resulting from the advance of new communication technologies including the Internet, cell phones and Personal Digital Assistants. It is a challenging bullying problem occurring in a new territory. Online bullying can be particularly damaging and upsetting because it's usually anonymous or hard to trace. In this paper, the proposed method utilizes a dataset of real-world conversations (i.e. pairs of questions and answers between a cyber predator and the victim), in which each predator question is manually annotated in terms of severity using a numeric label. We approach the issue as a sequential data modelling problem, in which the predator’s questions are represented using Singular Value Decomposition (SVD). The motivation for this procedure is to study the accuracy of predicting the level of a cyberbullying attack using classification methods and also to examine potential patterns in the linguistic style of each predator. More specifically, unlike previous approaches that consider a fixed window of a cyber-predator’s questions within a dialogue, we exploit the whole question set and model it as a signal, whose magnitude depends on the degree of bullying content. Using feature weighting and dimensionality reduction techniques, each signal is parsed by a neural network that forecasts the level of insult within a question given a window of two to three previous questions. During the time series modeling experiments, an interesting discovery was made. By applying SVD to the time series data and taking into account the second dimension (since the first usually models trivial dependencies between instances and attributes), we observed that its plot was very similar to the plot of the class attribute. Applying a Dynamic Time Warping algorithm confirmed the similarity of these two signals, providing an immediate indicator of the severity of cyberbullying within a given dialogue.
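As a rough illustration of the comparison described above, the classic dynamic-programming form of Dynamic Time Warping can be sketched as follows. This is a minimal textbook version, not the authors' implementation: the absolute difference as local cost, the function name `dtw_distance`, and the two short example series are all assumptions made for illustration.

```python
def dtw_distance(a: list[float], b: list[float]) -> float:
    """Return the DTW distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first
    i points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost: absolute difference
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matched to the previous point of b
                cost[i][j - 1],      # b[j-1] also matched to the previous point of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Hypothetical signals: severity labels and a stretched copy of the same shape.
severity = [1.0, 2.0, 3.0]
svd_component = [1.0, 2.0, 2.0, 3.0]
print(dtw_distance(severity, svd_component))  # 0.0
```

A small distance between two series of different lengths indicates that they follow the same shape up to local stretching, which is the sense in which two plots can be called similar here.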
Abstract
One of the biggest challenges that Higher Education Institutions (HEIs) face is to improve the quality of their educational processes. Thus, it is crucial for the administration of the institutions to set new strategies and plans for better management of the current processes. Furthermore, managerial decision making is becoming more difficult as the complexity of educational entities increases. The purpose of this study is to suggest a way to support the administration of a HEI by providing new knowledge related to the educational processes using data mining techniques. This knowledge can be extracted, among other sources, from educational data that derive from the evaluation processes that each department of a HEI conducts. These data can be found in educational databases, in students’ questionnaires or in faculty members’ records. This paper presents the capabilities of data mining in the context of a Higher Education Institute and tries to discover new explicit knowledge by applying data mining techniques to educational data of the Technological Educational Institute of Athens. The data used for this study come from students’ questionnaires distributed in the classes within the evaluation process of each department of the Institute.
Abstract
Privacy preserving analysis of a social network aims at a better understanding of the network and its behavior, while at the same time protecting the privacy of its individuals. We propose an anonymization method for weighted graphs, i.e., for social networks where the strengths of links are important. This is in contrast with many previous studies which only consider unweighted graphs. Weights can be essential for social network analysis, but they pose new challenges to privacy preserving network analysis. In this paper, we mainly consider prevention of identity disclosure, but we also touch on edge and edge weight disclosure in weighted graphs. We propose a method that provides k-anonymity of nodes against attacks where the adversary has information about the structure of the network, including its edge weights. The method is efficient, and it has been evaluated in terms of privacy and utility on real-world datasets.
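To make the k-anonymity requirement concrete, the sketch below checks whether every node in a weighted graph is indistinguishable from at least k-1 others. It is not the anonymization method itself, and it rests on a loud assumption: the adversary's knowledge about a node is modeled here as the sorted multiset of weights on its incident edges, which is only one possible, hypothetical choice of structural "signature". The names `node_signatures` and `is_k_anonymous` are likewise illustrative.

```python
from collections import Counter


def node_signatures(edges: list[tuple[str, str, float]]) -> dict[str, tuple[float, ...]]:
    """Map each node to the sorted tuple of weights on its incident edges."""
    incident: dict[str, list[float]] = {}
    for u, v, w in edges:
        incident.setdefault(u, []).append(w)
        incident.setdefault(v, []).append(w)
    return {node: tuple(sorted(ws)) for node, ws in incident.items()}


def is_k_anonymous(edges: list[tuple[str, str, float]], k: int) -> bool:
    """True if every signature is shared by at least k nodes."""
    counts = Counter(node_signatures(edges).values())
    return all(c >= k for c in counts.values())


# A 4-cycle with equal weights: all four nodes look alike.
cycle = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0), ("d", "a", 1.0)]
print(is_k_anonymous(cycle, 4))  # True

# A path with distinct weights: every node has a unique signature.
path = [("a", "b", 1.0), ("b", "c", 2.0)]
print(is_k_anonymous(path, 2))  # False
```

An anonymization procedure would modify edges or weights until a check of this kind passes, while trying to keep the utility of the graph for analysis.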
Abstract
Electronic Participation (eParticipation), both in its traditional form and in its emerging Web 2.0 based form, results in the production of large quantities of textual contributions by citizens concerning government policies and decisions under formation. These contain valuable relevant opinions and knowledge of the society, but are exploited only to a limited extent. It is of critical importance to analyze these contributions in order to extract the opinions and knowledge they contain in a cost-efficient way. This paper reviews a wide range of opinion mining methods, which have been developed for analyzing commercial product opinions and reviews posted on the Web, with respect to the capabilities they can offer for meeting the above challenges. The review has revealed the great potential of these methods for the analysis of citizens’ textual contributions in public policy debates, both for assessing contributors’ general attitudes-sentiments (positive, negative or neutral) towards the policy/decision under discussion, and also for extracting the main issues they raise (e.g. negative and positive aspects and effects, implementation barriers, improvement suggestions) and the corresponding attitudes-sentiments. Based on the conclusions of this review, a basic framework for the use of opinion mining methods in eParticipation has been formulated.
Abstract
The evolution of new technologies and the spread of the Internet have led to the exchange and elaboration of massive amounts of data. Simultaneously, intelligent systems that parse and analyze patterns within data are gaining popularity. Many of these data contain sensitive information, which raises serious concerns about how such data should be managed and used by data mining techniques. Extracting knowledge from statistical databases is an essential step towards deploying intelligent systems that assist in making decisions, but it must also preserve the privacy of the parties involved. In this paper, we present a novel privacy preserving data mining algorithm for statistical databases that are horizontally partitioned. The novelty lies in the multi-candidate election schema and its capability to serve as a foundation for a privacy preserving Tree Augmented Naïve Bayesian (TAN) classifier, in order to prevent disclosure of personal information.
Abstract
The large scale application of ICT-based assistive environment technologies for the home care of elderly and disabled people is going to generate huge numbers of signals transmitted from homes to local health centers or hospitals in order to be monitored by medical personnel. This task is going to be of critical importance and, if manually performed, quite demanding for specialized human resources and costly. In order to perform it in a cost-efficient manner it is necessary to develop mechanisms and methods for automated screening of these signals in order to identify abnormal ones that require some action to be taken. This paper proposes a method for automatic screening of heart sound signals, which are the most widely acquired signals from the human body for diagnostic purposes in both 'traditional' medicine and the emerging ICT-based assistive environments. It is based on a novel Markov Chain Monte Carlo (MCMC) Bayesian Inference approach, which estimates conditional probability distributions in structures obtained from a Tree-Augmented Naïve Bayes (TAN) algorithm. The proposed approach has been applied to and validated on a difficult heterogeneous dataset of 198 heart sound signals, which comes from both healthy medical cases and unhealthy ones having Aortic Stenosis, Mitral Regurgitation, Aortic Regurgitation or Mitral Stenosis. The proposed approach achieved a good performance in this difficult screening problem, higher than that of other widely used alternative classifiers, showing great potential for contributing to a cost-effective large scale application of ICT-based assistive environment technologies.
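As background on the structures mentioned above, the standard TAN construction connects the features by a maximum-weight spanning tree over their pairwise conditional mutual information given the class, and then makes the class a parent of every feature. The minimal sketch below covers only that tree-building step using Prim's algorithm. The feature names and the mutual information values are made up for illustration, and the MCMC estimation of the conditional probability distributions described in the abstract is not attempted here.

```python
def tan_tree(features, cmi, root=None):
    """Return {feature: parent feature} for a maximum-weight spanning tree.

    `cmi` maps frozenset({u, v}) to the conditional mutual information of
    features u and v given the class (hypothetical values in this sketch).
    The class node, which is a parent of every feature, is left implicit.
    """
    features = list(features)
    root = features[0] if root is None else root
    parent = {root: None}
    remaining = [f for f in features if f != root]
    while remaining:
        # Prim's step: strongest edge from the current tree to a new feature.
        u, v = max(
            ((u, v) for u in parent for v in remaining),
            key=lambda edge: cmi[frozenset(edge)],
        )
        parent[v] = u
        remaining.remove(v)
    return parent


# Hypothetical conditional mutual information values for four features.
cmi = {
    frozenset({"f1", "f2"}): 0.9,
    frozenset({"f1", "f3"}): 0.1,
    frozenset({"f1", "f4"}): 0.2,
    frozenset({"f2", "f3"}): 0.8,
    frozenset({"f2", "f4"}): 0.3,
    frozenset({"f3", "f4"}): 0.7,
}
print(tan_tree(["f1", "f2", "f3", "f4"], cmi))
```

Each feature ends up with at most one other feature as parent, which is what keeps the conditional probability tables of a TAN model small.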
Abstract
The development of ‘intelligent’ medical equipment, which can not only acquire various signals from the human body, but also process them and provide recommendations as to probable pathological conditions, will be highly beneficial for both the medical personnel and the patients. However, this necessitates the development and exploitation of advanced, highly efficient classification techniques. In this direction this paper presents a novel ensemble classification technique, combining Random Forests with the ‘Markov Blanket’ notion, which is used for the automated diagnosis of aortic and mitral heart valve diseases from low-cost and easily acquired heart sound signals. It has been tested on a highly ‘difficult’ global and heterogeneous dataset of 198 heart sound signals, which have been acquired from both healthy and pathological medical cases. The proposed ensemble classification technique exhibited a higher classification performance than the classical Random Forest algorithms and other widely used classification algorithms.
Abstract
The aging population in many countries, in combination with high government deficits and financial resources limitations, necessitates new methods for the home care of the elderly at reasonable costs based on the exploitation of modern information and communication technologies (ICT). This requires the installation of assistive environments at the homes of elderly people, which include various types of sensors, generating biosignals or other types of signals, which are transferred through networks to local health centers or hospitals in order to be monitored. However, scaling up the application of such ICT-based methods of elderly home care is going to increase tremendously the workload of the medical staff of local health centers or hospitals. Therefore it is of critical importance to develop capabilities for an automated first screening of these signals and identification of abnormal elements and diseases. In this direction the present paper proposes a system for the automatic identification of murmurs in heart sound signals, and also for the classification of them as systolic or diastolic, using a new generation of advanced Random Forests classification algorithms, which aggregate the predictions of multiple classifiers (ensemble classification). The proposed system has been applied to and validated on a representative global dataset of 198 heart sound signals, which come both from healthy medical cases and from cases having systolic and diastolic murmurs. Also, some alternative classifiers have been applied to the same data for comparison purposes. It has been concluded that the proposed system shows a good performance, which is higher than that of the examined alternative classifiers.
Abstract
Data mining technology raises concerns about the handling and use of sensitive information, especially in highly distributed environments where the participants in the system may be mutually mistrustful. In this paper we argue in favor of using some well-known cryptographic primitives, borrowed from the literature on large-scale Internet elections, in order to preserve accuracy in privacy-preserving data mining (PPDM) systems. Our approach is based on the classical homomorphic model for online elections, and more particularly on some extensions of the model for supporting multi-candidate elections. We also describe some weaknesses and present an attack on a recent scheme [1] which was the first to use a variation of the homomorphic model in the PPDM setting. In addition, we show how PPDM can be used as a building block to obtain a Random Forests classification algorithm over a set of homogeneous databases with horizontally partitioned data.
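The idea of homomorphic tallying for multiple candidates can be illustrated with a toy sketch in which each database holder encrypts its local class counts and only the aggregate is ever decrypted. The use of the Paillier cryptosystem, the tiny hard-coded primes, the single key holder and all function names below are assumptions made for illustration. The abstract does not name a specific cryptosystem, and a real system would need large keys and threshold decryption.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair (insecure demo primes, generator g = n + 1)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, requires Python 3.8+
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Encrypt m as (1 + n)^m * r^n mod n^2 with a random unit r."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n


def pack(counts, base):
    """Multi-candidate encoding: count j becomes the digit of base**j."""
    return sum(c * base**j for j, c in enumerate(counts))


def unpack(total, base, k):
    digits = []
    for _ in range(k):
        total, digit = divmod(total, base)
        digits.append(digit)
    return digits


def aggregate_counts(local_counts, base=100, seed=0):
    """Sum per-class counts across parties without decrypting any one input."""
    rng = random.Random(seed)  # fixed seed only to keep the demo repeatable
    n, priv = keygen()
    tally = 1  # multiplicative identity, i.e. a trivial encryption of 0
    for counts in local_counts:
        # Multiplying ciphertexts adds the underlying plaintexts.
        tally = tally * encrypt(n, pack(counts, base), rng) % (n * n)
    return unpack(decrypt(n, priv, tally), base, len(local_counts[0]))


# Three hypothetical horizontally partitioned databases, three classes.
print(aggregate_counts([[3, 1, 0], [2, 2, 1], [0, 4, 2]]))
```

The packing only works while every summed count stays below `base` and the packed total stays below `n`. Aggregated counts of this kind are the statistics a Naive Bayes or decision tree learner needs from horizontally partitioned data.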
Abstract
In the present paper, Random Forests are used in a critical and at the same time non-trivial problem concerning the diagnosis of Gas Turbine blading faults, portraying promising results. Random Forests-based fault diagnosis is treated as a Pattern Recognition problem, based on measurements and feature selection. Two different types of inserting randomness to the trees are studied, based on different theoretical assumptions. The classifier is compared against other Machine Learning algorithms such as Neural Networks, Classification and Regression Trees, Naive Bayes and K-Nearest Neighbor. The performance of the prediction model reaches a level of 97% in terms of precision and recall, improving the existing state-of-the-art levels achieved by Neural Networks by 1.5%-2%. Furthermore, emphasis is given to the pre-processing phase, where feature selection and outlier identification are carried out, in order to provide the basis of a high performance automated diagnostic system. The conclusions derived are of more general interest and applicability.
Abstract
In the present paper, Random Forests are used in a critical and at the same time non-trivial problem concerning the diagnosis of Gas Turbine blading faults, portraying promising results. Random Forests-based fault diagnosis is treated as a Pattern Recognition problem, based on measurements and feature selection. Two different types of inserting randomness to the trees are studied, based on different theoretical assumptions. The classifier is compared against other Machine Learning algorithms such as Neural Networks, Classification and Regression Trees, Naive Bayes and K-Nearest Neighbor. The performance of the prediction model reaches a level of 97% in terms of precision and recall, improving the existing state-of-the-art levels achieved by Neural Networks by a factor of 1.5%-2%.
Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted or mass reproduced without the explicit permission of the copyright holder.