A Hidden Topic-Based Framework toward Building Applications with Short Web Documents
Abstract:
This paper introduces a hidden topic-based framework for processing short and sparse
documents (e.g., search result snippets, product descriptions, book/movie summaries,
and advertising messages) on the Web. The framework focuses on solving two main
challenges posed by these kinds of documents: 1) data sparseness and 2)
synonyms/homonyms. The former leads to the lack of shared words and contexts
among documents while the latter are big linguistic obstacles in natural
language processing (NLP) and information retrieval (IR). The underlying idea
of the framework is that common hidden topics discovered from large external
data sets (universal data sets), when included, can make short documents less
sparse and more topic-oriented. Furthermore, hidden topics from universal data
sets help handle unseen data better. The proposed framework can also be applied
to different natural languages and data domains. We carefully evaluated the
framework by carrying out two experiments for two important online applications
(Web search result classification and matching/ranking for contextual
advertising) with large-scale universal data sets and we achieved significant
results.
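
To make the underlying idea concrete, the following is a minimal sketch of how a short document might be enriched with hidden topics. It assumes that a topic model has already been estimated on a universal data set and that a topic distribution has been inferred for the snippet. The function name `enrich_with_topics`, the `topic:<id>` pseudo-words, the `threshold` and `scale` parameters, and the sample probabilities are all hypothetical and are not taken from the framework itself.

```python
def enrich_with_topics(tokens, topic_probs, threshold=0.05, scale=10):
    """Return the tokens plus pseudo-words for the salient hidden topics.

    tokens      -- words of the short document
    topic_probs -- assumed output of a topic model: {topic_id: probability}
    threshold   -- topics below this probability are ignored (assumption)
    scale       -- controls how many pseudo-words a topic contributes (assumption)
    """
    enriched = list(tokens)
    for topic_id, prob in sorted(topic_probs.items()):
        if prob >= threshold:
            # A more probable topic contributes more copies of its pseudo-word.
            enriched.extend([f"topic:{topic_id}"] * round(prob * scale))
    return enriched


# Illustrative values only: a three-word snippet and a made-up topic distribution.
snippet = ["jaguar", "dealer", "price"]
topic_probs = {3: 0.62, 17: 0.30, 41: 0.02}
enriched = enrich_with_topics(snippet, topic_probs)
```

Under these assumptions the snippet grows from three words to twelve tokens, so two snippets that share no words can still overlap through their topic pseudo-words.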
Existing System:
- The existing system does not address the two main challenges of short Web documents: 1) data sparseness and 2) synonyms/homonyms.
- The former leads to a lack of shared words and contexts among documents, while the latter are major linguistic obstacles in natural language processing (NLP) and information retrieval (IR).
- It fails to achieve the desired accuracy because of data sparseness.
Proposed System:
- The proposed system addresses the two main challenges: short and sparse data, and synonyms/homonyms.
- It proposes a general framework for classification and contextual matching with hidden topics.
- It describes the analysis of large-scale text/Web data collections that serve as universal data sets in the framework.
- It describes how to build a matching and ranking model with hidden topics for online contextual advertising.
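
The matching and ranking step for contextual advertising can be sketched as follows. This is an illustration under stated assumptions, not the system's actual model: it assumes pages and ads have already been turned into token lists that include hidden-topic pseudo-words, and it uses plain cosine similarity over term counts as the matching score. The names `cosine` and `rank_ads`, the `topic:` tokens, and the sample ads are hypothetical.

```python
import math
from collections import Counter


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_ads(page_tokens, ads):
    """Return ad ids ordered from most to least similar to the page."""
    scored = [(cosine(page_tokens, tokens), ad_id) for ad_id, tokens in ads.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ad_id for _, ad_id in scored]


# Illustrative data: the page and the notebook ad share no ordinary words,
# only the (assumed) hidden-topic pseudo-word.
page = ["laptop", "review", "topic:computers", "topic:computers"]
ads = {
    "ad_shoes": ["running", "shoes", "topic:sports"],
    "ad_notebook": ["notebook", "sale", "topic:computers", "topic:computers"],
}
ranking = rank_ads(page, ads)
```

In this example the notebook ad is ranked first even though "laptop" and "notebook" never co-occur, which is the kind of synonym mismatch that hidden topics are meant to bridge.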
KEYWORDS:
Generic Technology Keywords: Database, User Interface, Programming
Specific Technology Keywords: C#.Net, ASP.Net, MS SQL Server 2008
Project Keywords: Presentation, Business Object, Data Access Layer
SDLC Keywords: Analysis, Design, Code, Testing, Implementation, Maintenance
SYSTEM CONFIGURATION
HARDWARE CONFIGURATION
S.NO | HARDWARE | CONFIGURATIONS
1 | Operating System | Windows 2000 & XP
2 | RAM | 1 GB
3 | Processor (with Speed) | Intel Pentium IV (3.0 GHz) and upwards
4 | Hard Disk Size | 40 GB and above
5 | Monitor | 15" CRT
SOFTWARE CONFIGURATION
S.NO | SOFTWARE | CONFIGURATIONS
1 | Platform | Microsoft Visual Studio
2 | Framework | .Net Framework 4.0
3 | Language | C#.Net
4 | Front End | ASP.NET, HTML
5 | Back End | SQL Server 2008