A study of Document object Model and removing noise from web pages - Softcover

Sultan, Bisma

 
9786139843664: A study of Document object Model and removing noise from web pages

Synopsis

Violating the official rules of HTML results in defects or errors. Furthermore, when a Word document is converted into a web page, the resulting code contains unnecessary HTML tags as well as proprietary tags. Such undesirable, redundant, and irrelevant tags are considered noise. These noisy elements disturb the web page and make its contents difficult to read. Noise adversely affects web data mining, and eliminating it reduces storage and indexing requirements. Noise removal also helps improve the performance of web page clustering, classification, content mining, and summarization. In the proposed work, web page noise is identified using four popular web browsers (Google Chrome, Internet Explorer 7, Mozilla Firefox, and Opera) and three web authoring tools (MS Word, Dreamweaver 8, and Microsoft Expression Web 4). Once identified, the noise is classified into categories based on the source of the Word document. The experiment was conducted by running 40 web pages on the four browsers, and the results show that web page noise depends to a large extent on the source

The synopsis may refer to a different edition of this title.

About the author

The author is pursuing a PhD in Computer Science at the University of Kashmir, Srinagar. She holds an M.Tech in CSE from the University of Jammu and a B.Tech in CSE from the University of Kashmir. Her areas of expertise are Web Technologies and Deep Learning.

"About this title" may refer to a different edition of this title.