Load balancing in MapReduce on homogeneous and heterogeneous clusters: an in-depth review
A number of programming models have been proposed in recent years to process big data. Among them, MapReduce is the best-known programming model in cloud computing environments; despite its many advantages, it still presents several challenges. Load imbalance is considered one of the most significant drawbacks of MapReduce: where no appropriate balancing mechanism is in place, it increases application runtime and thereby reduces efficiency. Although data locality and data skew are known as the two main factors that determine load balance, load balance also depends heavily on whether the computational clusters are homogeneous or heterogeneous. This paper examines the effect of these two factors, data locality and data skew, on homogeneous and heterogeneous clusters. In addition, it reviews recent literature on load balancing improvements in Hadoop MapReduce. Finally, the surveyed studies are compared to highlight their differences in load balancing method, optimisation phase, cluster type and main challenges.
Mohammad Javad Kargar
Mesyam Vakili
Inderscience
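The two claims in the abstract, that data skew unbalances reducers and that an even split is only balanced on a homogeneous cluster, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a method from this paper or from any surveyed study: `zlib.crc32` stands in for the `key.hashCode()` used by Hadoop's default hash partitioner, node speeds are hypothetical records-per-second figures, and `speed_proportional_shares` is a hypothetical heterogeneity-aware split included only for contrast.

```python
import zlib
from typing import Iterable, List


def partition(key: str, num_reducers: int) -> int:
    """Deterministic hash partitioner; crc32 is an assumed stand-in for key.hashCode()."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(keys: Iterable[str], num_reducers: int) -> List[int]:
    """Count how many intermediate records each reducer receives."""
    loads = [0] * num_reducers
    for key in keys:
        loads[partition(key, num_reducers)] += 1
    return loads


def finish_times(loads: List[float], speeds: List[float]) -> List[float]:
    """Time per reducer = records assigned / records per second of its (hypothetical) node."""
    return [load / speed for load, speed in zip(loads, speeds)]


def imbalance(values: List[float]) -> float:
    """Max-to-mean ratio; 1.0 means perfectly balanced."""
    return max(values) / (sum(values) / len(values))


def speed_proportional_shares(total: float, speeds: List[float]) -> List[float]:
    """Hypothetical heterogeneity-aware split: work proportional to node speed."""
    total_speed = sum(speeds)
    return [total * s / total_speed for s in speeds]


if __name__ == "__main__":
    # Skewed input: one "hot" key accounts for 700 of 1000 records, and every
    # record with the same key goes to the same reducer.
    keys = ["hot"] * 700 + [f"k{i}" for i in range(300)]
    loads = reducer_loads(keys, 4)
    print("skewed loads:", loads, "imbalance:", round(imbalance(loads), 2))

    # Equal loads finish together on a homogeneous cluster...
    equal = [100, 100, 100, 100]
    print(finish_times(equal, [1, 1, 1, 1]))      # [100.0, 100.0, 100.0, 100.0]
    # ...but the slowest node dominates on a heterogeneous one.
    print(finish_times(equal, [2, 1, 1, 0.5]))    # [50.0, 100.0, 100.0, 200.0]

    # Splitting work by node speed evens out completion times again.
    shares = speed_proportional_shares(450, [2, 1, 1, 0.5])
    print(finish_times(shares, [2, 1, 1, 0.5]))   # [100.0, 100.0, 100.0, 100.0]
```

Because all records sharing a key land on one reducer, no choice of hash function removes the skew caused by the hot key; the heterogeneous case shows separately that balancing record counts is not the same as balancing completion time.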