Web page Classification by Using PCA and Neural Network

Author: Laith R. Flaih
Computer Science Department, College of Science, Cihan University-Erbil

DOI: http://dx.doi.org/10.24086/cuesj.v1n1a12


The explosive growth of the World Wide Web has made the classification of web pages increasingly challenging. This research proposes a new web page classification method that combines Principal Component Analysis (PCA) with a neural network. PCA is applied to the web documents to obtain reduced feature vectors, and these vectors are then used as the inputs to the neural network, which performs the classification. The experimental evaluation demonstrates that the proposed system achieves high classification accuracy on the sports news datasets.

Keywords: PCA, Neural Networks, Web pages Classification, classifiers
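
The pipeline described in the abstract (PCA feature vectors fed to a neural network) can be illustrated with a minimal sketch. It is not the author's implementation: the vocabulary, the term counts, the two sports classes, the network size, and the training settings are all hypothetical choices made for illustration, and PCA is computed here by power iteration with deflation purely to keep the example dependency-free.

```python
import math
import random

# Hypothetical vocabulary and term counts (illustrative only, not from the paper).
VOCAB = ["goal", "striker", "serve", "racket", "match"]
DOCS = [
    [3, 2, 0, 0, 1], [4, 1, 0, 0, 2], [2, 3, 0, 1, 1], [5, 2, 0, 0, 1],  # football
    [0, 0, 3, 2, 1], [0, 0, 4, 3, 2], [1, 0, 2, 4, 1], [0, 0, 5, 2, 1],  # tennis
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = football, 1 = tennis (assumed classes)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pca_fit(rows, k, iters=300, seed=0):
    """Return (means, components): the top-k eigenvectors of the sample
    covariance matrix, found by power iteration with deflation."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        for _ in range(iters):
            w = [dot(row, v) for row in cov]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        eigval = dot(v, [dot(row, v) for row in cov])
        components.append(v)
        # Deflate: remove the direction just found so the next pass finds the next one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, components

def pca_transform(rows, means, components):
    """Project mean-centered rows onto the principal components."""
    return [[dot([x - m for x, m in zip(r, means)], c) for c in components]
            for r in rows]

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_mlp(X, y, hidden=3, epochs=2000, lr=0.1, seed=1):
    """One tanh hidden layer and one sigmoid output, trained by plain SGD
    on the cross-entropy loss."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h = [math.tanh(dot(w, x) + b) for w, b in zip(W1, b1)]
            p = sigmoid(dot(W2, h) + b2)
            delta_out = p - t  # loss gradient at the output pre-activation
            for j in range(hidden):
                delta_h = delta_out * W2[j] * (1.0 - h[j] ** 2)
                W2[j] -= lr * delta_out * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * delta_h * x[i]
                b1[j] -= lr * delta_h
            b2 -= lr * delta_out
    return W1, b1, W2, b2

def predict(model, x):
    W1, b1, W2, b2 = model
    h = [math.tanh(dot(w, x) + b) for w, b in zip(W1, b1)]
    return 1 if sigmoid(dot(W2, h) + b2) >= 0.5 else 0

# Step 1: reduce the 5-dimensional term counts to 2 PCA features.
means, comps = pca_fit(DOCS, k=2)
features = pca_transform(DOCS, means, comps)

# Step 2: train the network on the PCA features, not on the raw counts.
model = train_mlp(features, LABELS)
train_preds = [predict(model, f) for f in features]
accuracy = sum(p == t for p, t in zip(train_preds, LABELS)) / len(LABELS)

def classify(doc):
    """Classify a new term-count vector using the fitted PCA and network."""
    return predict(model, pca_transform([doc], means, comps)[0])

print("training accuracy:", accuracy)
```

A new page would be classified by calling `classify` on its term-count vector over the same vocabulary, for example `classify([4, 2, 0, 0, 1])`. The point of the design is that the network sees only the low-dimensional projection, so its input layer stays small however large the vocabulary grows.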



