Apache Hadoop 3.0 is coming! As the next major release, it is drawing attention because it showcases several bleeding-edge technologies and significant features across all components of Apache Hadoop, including Erasure Coding in HDFS, multiple Standby NameNodes, YARN Timeline Service v2, JNI-based shuffle in MapReduce, Apache Slider integration with first-class support for services, Hadoop library updates, and client-side classpath isolation.
In this talk, we will give an update on the status of Hadoop 3, especially the release work in the community, and then dive deep into the new features included in Hadoop 3.0. As a new major release, Hadoop 3 also includes some incompatible changes; we will go through most of them and explore their impact on existing Hadoop users and operators. In the last part of the session, we will discuss ongoing efforts in the Hadoop 3 era and show the big picture of how Hadoop 3 could shape the big data landscape.
it enables online EC, which bypasses the conversion phase and immediately saves storage space; this is especially desirable in clusters with high-end networking. Second, it naturally distributes a small file to multiple DataNodes and eliminates the need to bundle multiple files into a single coding group.
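
To make the second point concrete, here is a minimal Python sketch of how striping might spread even a small file across several storage units. It assumes fixed-size cells placed round-robin. The function name `stripe_cells` and its parameters are hypothetical, chosen for illustration only, and parity computation is omitted entirely. This is not HDFS code.

```python
from typing import List


def stripe_cells(data: bytes, data_units: int, cell_size: int) -> List[List[bytes]]:
    """Split data into fixed-size cells and assign them round-robin.

    Hypothetical helper for illustration: each inner list stands in for
    the cells that one DataNode would hold. Parity cells are not computed.
    """
    if data_units <= 0 or cell_size <= 0:
        raise ValueError("data_units and cell_size must be positive")

    units: List[List[bytes]] = [[] for _ in range(data_units)]
    # Walk the data one cell at a time; cell i goes to unit i mod data_units.
    for index, offset in enumerate(range(0, len(data), cell_size)):
        units[index % data_units].append(data[offset:offset + cell_size])
    return units


if __name__ == "__main__":
    # A 10-byte "file" with 2-byte cells across 3 units (toy sizes).
    layout = stripe_cells(b"0123456789", data_units=3, cell_size=2)
    for unit, cells in enumerate(layout):
        print(f"unit {unit}: {cells}")
```

With these toy sizes, a file only five cells long already touches all three units, so no other files need to be grouped with it to form a coding group.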