For all its advantages in Big Data research, Hadoop has one major drawback, it struggles to achieve 20% server utilization, writes Jeff Kelly in his latest Wikibon Peer Incite, “Improving Hardware Efficiency Important to Overall Hadoop ROI”. This makes large, production Hadoop installations running on hundreds of nodes into a drain on IT CapEx. Earlier this summer VMware announced Project Serengeti, including the contribution of code to Apache Hadoop to make HDFS and MapReduce “virtualization aware”, to support Hadoop virtualization to attack this issue. The only problem – each node of a Hadoop cluster needs its own copy of vSphere Enterprise Edition, which adds a major “virtualization tax” to the implementation.


Por Editorial