IRM: Integrated File Replication and Consistency Maintenance in P2P Systems




Abstract:


In peer-to-peer file sharing systems, file replication and consistency maintenance are widely used techniques for high system performance. Despite significant interdependencies between them, these two issues are typically addressed separately. Most file replication methods rigidly specify replica nodes, leading to low replica utilization, unnecessary replicas, and hence extra consistency maintenance overhead. Most consistency maintenance methods propagate update messages based on message spreading or a structure without considering file replication dynamism, leading to inefficient file updates and hence a high probability of outdated file responses. This paper presents an Integrated file Replication and consistency Maintenance mechanism (IRM) that integrates the two techniques in a systematic and harmonized manner. It achieves high efficiency in file replication and consistency maintenance at a significantly low cost. Instead of passively accepting replicas and updates, each node determines file replication and update polling by dynamically adapting to time-varying file query and update rates, which avoids unnecessary file replications and updates. Simulation results demonstrate the effectiveness of IRM in comparison with other approaches. It dramatically reduces overhead and yields significant improvements in the efficiency of both file replication and consistency maintenance approaches.



Existing System:
Over the past years, the immense popularity of the Internet has produced a significant stimulus to peer-to-peer (P2P) file sharing systems. A recent large-scale characterization of HTTP traffic has shown that more than 75 percent of Internet traffic is generated by P2P applications. The percentage of P2P traffic has increased significantly as files such as video and audio have become almost pervasive. The study also shows that access to these files is highly repetitive and skewed towards the most popular ones. Such objects can exhaust the capacity of a node, leading to delayed responses. File replication is an effective method to deal with overload caused by flash crowds or hot files. It distributes load over replica nodes and improves file query efficiency. File consistency maintenance, which keeps a file and its replicas consistent, is indispensable to file replication. Requiring that the replica nodes be reliably informed of all updates could be prohibitively costly in a large system. Thus, file replication should proactively reduce unnecessary replicas to minimize the overhead of consistency maintenance, which in turn guarantees the fidelity of consistency among file replicas under file replication dynamism. File replication dynamism refers to frequent replica node generation, deletion, and failure. Fig. 1 demonstrates the interrelationship between file replication and consistency maintenance.
Disadvantages:-
1) Traditional file replication and consistency maintenance methods either are not sufficiently effective or incur prohibitively high overhead.

2) IRM, which relies on polling file owners, still cannot guarantee that all file requesters receive up-to-date files, although it performs better than other consistency maintenance algorithms.
Proposed System:
This paper presents an Integrated file Replication and consistency Maintenance mechanism (IRM) that achieves high efficiency in file replication and consistency maintenance at a significantly lower cost. IRM integrates file replication and consistency maintenance in a harmonized and coordinated manner. Each node actively decides whether to create or delete a replica and when to poll for updates, based on file query and update rates, in a fully decentralized and autonomous manner. It replicates highly queried files and polls at a high frequency for frequently updated and queried files. IRM avoids unnecessary file replications and updates by dynamically adapting to time-varying file query and update rates. It improves replica utilization, file query efficiency, and consistency fidelity. A significant feature of IRM is that it achieves an optimized trade-off between overhead and query efficiency as well as consistency guarantees. IRM is well suited to P2P systems for a number of reasons. First, IRM does not require a file owner to keep track of replica nodes. Therefore, it is resilient to node joins and leaves, and thus suitable for highly dynamic P2P systems. Second, since each node determines its need for a file replication or replica update autonomously, the decisions can be made based on its actual query rate, eliminating unnecessary replications and validations. This coincides in spirit with the node autonomy of P2P systems. Third, IRM enhances the guarantee of file consistency. It offers the flexibility to use different replica update rates to cater to different consistency requirements determined by the nature of files and user needs. A faster update rate leads to a higher consistency guarantee, and vice versa. Fourth, IRM ensures a high probability of up-to-date file responses.



Advantages:-
1) IRM integrates file replication and consistency maintenance in a systematic and harmonized manner.

2) It dramatically reduces overhead and yields significant improvements in the efficiency of both file replication and consistency maintenance approaches.


Software Requirements: 
The major software requirements of the project are as follows.
Language          : Java (JDK 1.6)
Operating System  : Microsoft Windows XP Service Pack 2
IDE               : Eclipse IDE 8.0
Front End         : JSP & Servlets
Services          : SOA architecture
Hardware Requirements:
Processor         : Intel Pentium 4
RAM               : 256 MB
Hard Disk         : 40 GB
Module Description:
1. Adaptive file replication
   i.  Replica node determination
   ii. Replica creation
2. File consistency maintenance
   i.  Polling frequency reduction
   ii. Poll reduction
3. Impact of file replication on consistency maintenance

Adaptive File replication:
The replication algorithm achieves an optimized trade-off between query efficiency and overhead in file replication. In addition, it dynamically tunes to time-varying file popularity and node interest, and adaptively determines replica nodes based on query traffic. In the following, we introduce IRM’s file replication component by addressing two main problems in file replication:
1) Where to replicate files so that the file query can be significantly expedited and the replicas can be fully utilized?
2) How to remove underutilized file replicas so that the overhead for consistency maintenance is minimized?
Replica Node Determination:
Frequent requesters of a file and traffic junction nodes (i.e., hot routing spots) in query paths are the ideal file replica nodes for high utilization of file replicas. Based on this, IRM replicates a file in nodes that frequently request the file or in routing nodes that carry a large share of the query traffic for the file. The former arrangement enables frequent requesters of a file to get the file without query routing, and the latter increases the probability that queries from different directions encounter the replica nodes, thus making full use of file replicas. In addition, replicating a file in the middle rather than at the ends of a query path speeds up file queries.
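The two kinds of candidate nodes described above can be illustrated with a small bookkeeping class. This is only a minimal sketch under stated assumptions, not IRM's actual implementation: the class name `QueryRateTrackerSketch`, its methods, the fixed measurement window, and the two threshold parameters are all hypothetical names introduced for illustration. It is written to compile on Java 6.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-node bookkeeping; all names here are illustrative. */
final class QueryRateTrackerSketch {
    // fileId -> {queries initiated by this node, queries routed through this node}
    private final Map<String, long[]> counts = new HashMap<String, long[]>();

    private long[] slot(String fileId) {
        long[] c = counts.get(fileId);
        if (c == null) {
            c = new long[2];
            counts.put(fileId, c);
        }
        return c;
    }

    void recordInitiated(String fileId) { slot(fileId)[0]++; }

    void recordPassed(String fileId) { slot(fileId)[1]++; }

    /** Queries per second that this node itself issued for the file. */
    double initiatingRate(String fileId, double windowSeconds) {
        return slot(fileId)[0] / windowSeconds;
    }

    /** Queries per second that this node forwarded for the file. */
    double passingRate(String fileId, double windowSeconds) {
        return slot(fileId)[1] / windowSeconds;
    }

    /** Frequent requester or traffic junction: either rate exceeds its (assumed) threshold. */
    boolean isReplicaCandidate(String fileId, double windowSeconds,
                               double initiatingThreshold, double passingThreshold) {
        return initiatingRate(fileId, windowSeconds) > initiatingThreshold
                || passingRate(fileId, windowSeconds) > passingThreshold;
    }
}
```

A node would call `recordInitiated` when it issues a query and `recordPassed` when it forwards one. A real deployment would presumably also reset or decay the counters so that the rates follow time-varying popularity; that part is left out of the sketch.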
Replica Creation:
IRM sets a threshold Tl on the query passing rate of a file: the product of a constant factor and the normal query passing rate in the system. When a routing node receives a query for file f, it checks lf, its query passing rate for f. If lf > Tl and the node has available capacity for a file replica, it adds a file replication request, with its IP address, to the original file request. When the file destination receives the query and is overloaded, it checks whether the query carries file replication requests. If so, it sends the file to the replication requesters in addition to the query initiator. Otherwise, it replicates file f to the neighbors that forward queries for f most frequently.
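The replica creation steps can be sketched in Java as follows. This is an illustrative sketch, not the authors' code: the class `ReplicaCreationSketch`, the `Query` type, and every method name are assumptions made for this example, and the way the destination learns its busiest forwarding neighbors is simply passed in as a parameter. Only lf, Tl, and the decision rules come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the replica creation steps; names are illustrative. */
final class ReplicaCreationSketch {

    /** Tl: a constant factor times the normal query passing rate in the system. */
    static double passingThreshold(double constantFactor, double normalPassingRate) {
        return constantFactor * normalPassingRate;
    }

    /** Minimal query message: the initiator plus piggybacked replication requests. */
    static final class Query {
        final String fileId;
        final String initiatorIp;
        final List<String> replicationRequesters = new ArrayList<String>();

        Query(String fileId, String initiatorIp) {
            this.fileId = fileId;
            this.initiatorIp = initiatorIp;
        }
    }

    /** At a routing node: request a replica when lf > Tl and capacity is available. */
    static void onForward(Query q, String nodeIp, double lf, double tl, boolean hasCapacity) {
        if (lf > tl && hasCapacity) {
            q.replicationRequesters.add(nodeIp);
        }
    }

    /** At the file destination: decide which nodes receive a copy of the file. */
    static List<String> recipients(Query q, boolean overloaded, List<String> busiestForwarders) {
        List<String> out = new ArrayList<String>();
        out.add(q.initiatorIp);                       // the query is always answered
        if (!overloaded) {
            return out;                               // no replication needed
        }
        if (!q.replicationRequesters.isEmpty()) {
            out.addAll(q.replicationRequesters);      // serve piggybacked requests
        } else {
            out.addAll(busiestForwarders);            // fall back to busiest neighbors
        }
        return out;
    }
}
```

Piggybacking the request on the query itself means the file owner never has to track which nodes want replicas, which matches the decentralized design described in the proposed system.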
File Consistency Maintenance:
File replication dynamism poses a challenge for timely updates in structure-based consistency maintenance methods. On the other hand, consistency maintenance that relies on message spreading generates high overhead due to highly redundant messages. Rather than relying on a structure or message spreading, IRM employs adaptive polling for file consistency maintenance to cater to file replication dynamism. A poll approach puts the burden of consistency maintenance on individual nodes. Unlike push, a poll approach can achieve good consistency for distant nodes and is less sensitive to P2P dynamism, network size, and the connectivity of a node.
Poll Reduction:
Besides the file change rate, the file query rate is also a main factor to consider in consistency maintenance. Even when a file changes frequently, if a replica node receives no or very few queries for the file during a time period, polling the file's owner for validation during that period is wasted overhead. However, most current consistency maintenance methods neglect the important role that the file query rate plays in reducing overhead.
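One way to picture this idea is a replica that validates itself only when a query actually needs it. The sketch below is an assumption-laden illustration rather than IRM's published algorithm: the class `PollReductionSketch`, the `ttrSeconds` parameter (how long a validation is trusted), and the counter standing in for a real poll message are all hypothetical.

```java
/** Hypothetical sketch: validate a replica only when a query actually needs it. */
final class PollReductionSketch {
    private final long ttrSeconds;   // how long a validation is trusted (assumed parameter)
    private long lastValidated;      // time of the last poll, in seconds
    private int pollsSent;

    PollReductionSketch(long ttrSeconds, long createdAt) {
        this.ttrSeconds = ttrSeconds;
        this.lastValidated = createdAt;
    }

    /** Handles a query at time 'now'; returns true if the owner had to be polled first. */
    boolean onQuery(long now) {
        if (now - lastValidated < ttrSeconds) {
            return false;            // replica was validated recently enough
        }
        pollsSent++;                 // stand-in for a real poll message to the file owner
        lastValidated = now;
        return true;
    }

    int pollsSent() { return pollsSent; }
}
```

With a 10-second trust period and queries at seconds 5, 12, 15, and 100, this sketch polls twice, whereas a replica polling on a fixed 10-second timer would poll about ten times over the same interval, mostly while nobody was asking for the file.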
Polling frequency reduction:
Suppose a file changes at most once every Δt seconds. In this case, a file replica node can ensure that a replica is never outdated by more than Δt seconds by polling the owner every Δt seconds. Since the rate of file change varies over time as hot files become cold and vice versa, a replica node should be able to adapt its polling frequency in response to the variations. In IRM, a replica node tailors its polling frequency so that it polls at approximately the same frequency as the file changes.
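The text does not spell out the adaptation rule, so the following is only one plausible sketch: a linear-increase, multiplicative-decrease update of the polling interval, a scheme commonly used for adaptive time-to-refresh values. The class `AdaptiveTtrSketch`, the method `nextInterval`, the halving factor, and the additive step are all assumptions chosen for illustration.

```java
/** Hypothetical polling-interval adaptation (linear increase, multiplicative decrease). */
final class AdaptiveTtrSketch {

    /**
     * Returns the next polling interval in seconds.
     * If the last poll found a change, the file is changing faster than we poll,
     * so the interval is halved; otherwise it grows by one minimum step.
     * The result is clamped to [minSeconds, maxSeconds].
     */
    static double nextInterval(double currentSeconds, boolean changedSinceLastPoll,
                               double minSeconds, double maxSeconds) {
        double next = changedSinceLastPoll
                ? currentSeconds / 2.0
                : currentSeconds + minSeconds;
        return Math.max(minSeconds, Math.min(maxSeconds, next));
    }
}
```

For example, starting from an 8-second interval with bounds of 1 and 60 seconds, a poll that finds the file changed shortens the interval to 4 seconds, while a poll that finds it unchanged lengthens it to 9 seconds. Over time the interval drifts toward the file's actual change period.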
Impact of File Replication on Consistency Maintenance:
IRM minimizes the number of replicas while maintaining high efficiency and effectiveness of file replication. First, instead of having the file server track the query rates of nodes in a centralized manner, IRM enables each node to autonomously keep track of its own load status. Thus, the file server is not easily overloaded, which leads to fewer replicas. Second, IRM replicates files in nodes with a high query passing rate or query initiating rate. This gives a request a high probability of encountering a replica node and keeps every replica highly utilized.


