Journals


Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted or mass reproduced without the explicit permission of the copyright holder.


[1]
D. Pritsos, E. Stamatatos, Open Set Evaluation in Web Genre Identification, Language Resources and Evaluation, Vol. 52, No. 4, pp. 949–968, 2018, Springer, http://dx.doi.org/10.1007/s10579-018-941...

Conferences




[1]
D. Pritsos, A. Rocha, E. Stamatatos, Open-set Web Genre Identification Using Distributional Features and Nearest Neighbors Distance Ratio, 41st European Conference on Information Retrieval (ECIR), pp. 3-11, Dec, 2019, Springer, http://dx.doi.org/10.1007/978-3-030-1571...
[2]
D. Pritsos, E. Stamatatos, The Impact of Noise in Web Genre Identification, 6th International Conference of the CLEF Association (CLEF-2015), pp. 268-273, Dec, 2015, Springer LNCS 9283, http://link.springer.com/chapter/10.1007...
[3]
D. Pritsos, E. Stamatatos, The Impact of Noise in Web Genre Identification, 6th International Conference of the CLEF Association, Josiane Mothe, Jacques Savoy, Jaap Kamps, Karen Pinel-Sauvagnat, Gareth J. F. Jones, Eric SanJuan, Linda Cappellato, Nicola Ferro, (eds), pp. 268-273, Dec, 2014, Springer, http://dx.doi.org/10.1007/978-3-319-2402...
[4]
D. Pritsos, E. Stamatatos, Open-Set Classification for Automated Genre Identification, Advances in Information Retrieval - 35th European Conference on IR Research (ECIR 2013), pp. 207-217, Dec, 2013, Springer LNCS.
 

Abstract
Automated Genre Identification (AGI) of web pages is a problem of increasing importance, since web genre information (e.g. blogs, news, e-shops) can enhance modern Information Retrieval (IR) systems. The state of the art in this field treats AGI as a closed-set classification problem, for which a variety of web page representations and machine learning models have been intensively studied. In this paper, we study AGI as an open-set classification problem, which better reflects the real-world conditions of applying AGI in practice. Focusing on content information, we test different text representation methods (words and character n-grams). Moreover, two classification methods are examined: one-class SVM learners, used as a baseline, and an ensemble of classifiers based on random feature subspacing, originally proposed for author identification. It is demonstrated that very high precision can be achieved in open-set AGI while recall remains relatively high.
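
The idea of an open-set ensemble over random feature subspaces can be illustrated with a small sketch. This is not the implementation from the paper: the function names (`char_ngrams`, `train`, `predict`), the cosine-similarity base learner, the aggregated per-genre profiles, and all parameter values (`iterations`, `subspace`, `threshold`) are assumptions chosen only to keep the example short. Each round draws a random subset of character n-gram features and votes for the most similar known genre; a page is labelled "unknown" when no genre collects enough votes.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Return a Counter of overlapping character n-grams (lowercased)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b, features):
    """Cosine similarity of two sparse vectors restricted to `features`."""
    dot = sum(a.get(f, 0) * b.get(f, 0) for f in features)
    norm_a = math.sqrt(sum(a.get(f, 0) ** 2 for f in features))
    norm_b = math.sqrt(sum(b.get(f, 0) ** 2 for f in features))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(genre_texts, n=3):
    """Build one aggregated n-gram profile per known genre (an assumption)."""
    profiles = {}
    for genre, texts in genre_texts.items():
        profile = Counter()
        for t in texts:
            profile.update(char_ngrams(t, n))
        profiles[genre] = profile
    return profiles


def predict(profiles, text, n=3, iterations=100, subspace=0.5,
            threshold=0.8, seed=0):
    """Vote over random feature subsets; abstain when no genre wins clearly."""
    rng = random.Random(seed)
    doc = char_ngrams(text, n)
    vocab = sorted(set().union(*profiles.values()))
    if not vocab:
        return "unknown"
    k = max(1, int(len(vocab) * subspace))
    votes = Counter()
    for _ in range(iterations):
        features = rng.sample(vocab, k)  # one random feature subspace
        scores = {g: cosine(doc, p, features) for g, p in profiles.items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0:  # no overlap at all means no vote this round
            votes[best] += 1
    if not votes:
        return "unknown"
    genre, count = votes.most_common(1)[0]
    # Open-set decision: accept only if the vote share reaches the threshold.
    return genre if count / iterations >= threshold else "unknown"
```

Raising `threshold` in this sketch makes the ensemble abstain more often, which trades recall for precision; the specific values used here are illustrative and were not taken from the paper.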

Books




Chapters in Books




Conference Proceedings Editor

