A Hidden Topic-Based Framework toward Building Applications with Short Web Documents



Abstract:
This paper introduces a hidden topic-based framework for processing short and sparse documents on the Web, such as search result snippets, product descriptions, book/movie summaries, and advertising messages. The framework addresses two main challenges posed by these documents: 1) data sparseness and 2) synonyms/homonyms. The former leads to a lack of shared words and contexts among documents, while the latter is a major linguistic obstacle in natural language processing (NLP) and information retrieval (IR). The underlying idea is that common hidden topics discovered from large external data sets (universal data sets), when added to short documents, make them less sparse and more topic-oriented. Hidden topics from universal data sets also help handle unseen data better. The proposed framework can be applied to different natural languages and data domains. We evaluated the framework in two experiments on two important online applications (Web search result classification and matching/ranking for contextual advertising) with large-scale universal data sets, and achieved significant results.

Existing System:
- Existing systems do not address the two main challenges of short Web documents: 1) data sparseness and 2) synonyms/homonyms.
- Data sparseness leads to a lack of shared words and contexts among documents, while synonyms and homonyms are major linguistic obstacles in natural language processing (NLP) and information retrieval (IR).
- As a result, existing systems fail to achieve the desired accuracy on short, sparse documents.
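
To make the sparseness problem concrete, here is a minimal sketch in Python (chosen only for brevity; it is not the project's C#.NET code). It computes the bag-of-words cosine similarity between two short snippets. The snippets and the helper names `bag_of_words` and `cosine` are made up for illustration. The two snippets describe the same kind of product but share no words, so a purely word-based comparison sees them as unrelated.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count word occurrences."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical snippets: same subject, no words in common.
snippet_a = bag_of_words("cheap notebook deals")
snippet_b = bag_of_words("discount laptop offers")

print(cosine(snippet_a, snippet_b))  # prints 0.0
```

With only three words per snippet, there is very little chance of overlap, and synonyms such as "notebook" and "laptop" contribute nothing to the score.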

Proposed System:
- The proposed system addresses both challenges: short, sparse data and synonyms/homonyms.
- It provides a general framework for classification and contextual matching with hidden topics.
- It analyzes large-scale text/Web data collections that serve as universal data sets in the framework.
- It builds a matching and ranking model with hidden topics for online contextual advertising.
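
The idea behind matching and ranking with hidden topics can be sketched as follows, in Python for brevity rather than the project's C#.NET. This is only an illustration under strong assumptions: the `WORD_TOPIC` table is a hand-written stand-in for a topic model estimated on a universal data set, it assigns each word to a single topic instead of inferring a topic distribution, and the names `enrich`, `rank_ads`, and the `topic:` labels are hypothetical. Each text is extended with pseudo-features for the topics of its words, and ads are then ranked by cosine similarity to the page text.

```python
import math
from collections import Counter

# ASSUMPTION: toy word -> hidden-topic table standing in for a topic model
# estimated on a large universal data set. Topic labels are hypothetical.
WORD_TOPIC = {
    "notebook": "topic:computers", "laptop": "topic:computers",
    "cheap": "topic:shopping", "discount": "topic:shopping",
    "deals": "topic:shopping", "offers": "topic:shopping",
    "flight": "topic:travel", "hotel": "topic:travel",
    "tickets": "topic:travel",
}

def enrich(text, topic_weight=1.0):
    """Bag of words plus one pseudo-feature per topic of each known word."""
    words = text.lower().split()
    features = Counter(words)
    for w in words:
        topic = WORD_TOPIC.get(w)
        if topic is not None:
            features[topic] += topic_weight
    return features

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_ads(page_text, ads):
    """Return (score, ad) pairs sorted from best to worst match."""
    page = enrich(page_text)
    scored = [(cosine(page, enrich(ad)), ad) for ad in ads]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

ads = ["discount laptop offers", "hotel and flight tickets"]
for score, ad in rank_ads("cheap notebook deals", ads):
    print(f"{score:.3f}  {ad}")
# prints:
# 0.625  discount laptop offers
# 0.000  hotel and flight tickets
```

The page text and the first ad share no words, yet they match through the shared shopping and computers topics. The `topic_weight` parameter controls how strongly topic features count relative to plain words; its value here is arbitrary.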


KEYWORDS:
Generic Technology Keywords: Database, User Interface, Programming
Specific Technology Keywords: C#.NET, ASP.NET, MS SQL Server 2008
Project Keywords: Presentation, Business Object, Data Access Layer
SDLC Keywords: Analysis, Design, Code, Testing, Implementation, Maintenance







SYSTEM CONFIGURATION
HARDWARE CONFIGURATION
S.NO  HARDWARE                CONFIGURATIONS
1     Operating System        Windows 2000 & XP
2     RAM                     1 GB
3     Processor (with Speed)  Intel Pentium IV (3.0 GHz) and upwards
4     Hard Disk Size          40 GB and above
5     Monitor                 15" CRT

SOFTWARE CONFIGURATION
S.NO  SOFTWARE   CONFIGURATIONS
1     Platform   Microsoft Visual Studio
2     Framework  .NET Framework 4.0
3     Language   C#.NET
4     Front End  ASP.NET, HTML
5     Back End   SQL Server 2008
