Youngwoo Kim edited this page Aug 3, 2018 · 17 revisions

Hadoop 3.x

Hadoop 3.0.3+
- Hive 3.0+
- Tez 0.9.1+
- Spark 2.4.0+ (unconfirmed; tracked in SPARK-23534)
- HBase 2.0.0+ (Hadoop 3.0 + HBase 2.0.0 + Spark 2.3.0 + Phoenix 5.0)

HBase 2.0.1 + Phoenix 5.x -> https://github.com/apache/phoenix/commit/a4f93eb458c516206cc3ed25978fb025d752a2a7

WebHDFS

hdfs-over-ftp

https://github.com/chia7712/hof

# cat conf/hdfs-over-ftp.properties 

#uncomment this to run ftp server
port = 2222
data-ports = 2223-2225

#uncomment this to run ssl ftp server
#ssl-port = 2226
#ssl-data-ports = 2227-2229

# hdfs uri
#hdfs-uri = hdfs://localhost:8020
hdfs-uri = hdfs://mnode1:9000

# must be the user that runs HDFS
# this lets you start the ftp server as root (to bind port 21)
# while accessing hdfs as the superuser
superuser = hdfs

# cat conf/users.properties 

ftpserver.user.test.userpassword=5f4dcc3b5aa765d61d8327deb882cf99
ftpserver.user.test.homedirectory=/user/test
ftpserver.user.test.enableflag=true
ftpserver.user.test.writepermission=true
ftpserver.user.test.maxloginnumber=0
ftpserver.user.test.maxloginperip=0
ftpserver.user.test.idletime=0
ftpserver.user.test.uploadrate=0
ftpserver.user.test.downloadrate=0
ftpserver.user.test.groups=test,users
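The `userpassword` value above appears to be an unsalted MD5 hex digest: it matches the digest of the string `password`, which is the password used in the Test step. Assuming hof expects that encoding, a minimal Python sketch for generating the line for a new user could look like this. The helper names `ftp_password_hash` and `user_password_line` are hypothetical and not part of hof.

```python
import hashlib


def ftp_password_hash(plaintext: str) -> str:
    """Return the lowercase hex MD5 digest of a plaintext password.

    Assumption: users.properties stores plain, unsalted MD5 digests.
    """
    return hashlib.md5(plaintext.encode("utf-8")).hexdigest()


def user_password_line(user: str, plaintext: str) -> str:
    """Build the userpassword line for conf/users.properties."""
    return f"ftpserver.user.{user}.userpassword={ftp_password_hash(plaintext)}"


if __name__ == "__main__":
    # Reproduces the entry for the "test" user shown above.
    print(user_password_line("test", "password"))
```

Unsalted MD5 is weak by current standards, so this is only suitable for a test account like the one in these notes.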

-- Start the FTP server as the hdfs user

su -s /bin/bash hdfs -c "cd /opt/hof-0.1.1/; bin/hof hof conf/hdfs-over-ftp.properties conf/users.properties"

ps aux | grep hof

-- HDFS

sudo -u hdfs hdfs dfs -mkdir -p /user/test
sudo -u hdfs hdfs dfs -chown test /user/test

-- Test

$ ftp ftp://test:password@HOSTNAME:2222
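The same check can be scripted. The sketch below uses Python's standard `ftplib` to log in and list the user's home directory. It assumes the port, user, and password from the configuration above; `list_home` is a hypothetical helper name and `HOSTNAME` remains a placeholder.

```python
from ftplib import FTP
from typing import List


def list_home(host: str, port: int = 2222, user: str = "test",
              password: str = "password", timeout: float = 10.0) -> List[str]:
    """Log in to the FTP gateway and return the names in the home directory."""
    ftp = FTP()
    # connect() takes the non-default port configured in hdfs-over-ftp.properties
    ftp.connect(host, port, timeout=timeout)
    try:
        ftp.login(user, password)
        # NLST returns bare file names; some servers raise an error
        # instead of returning an empty list for an empty directory
        return ftp.nlst()
    finally:
        ftp.close()


# Example call (needs a running server):
#   print(list_home("HOSTNAME"))
```

A successful listing confirms both the login from users.properties and that /user/test exists in HDFS with the right owner.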

MiniHDFS

TBD

hdfs, data locality, HP moonshot, BDRA

WebHDFS, curl example:

curl -L -i -X PUT -T hs_err_pid103121.log "http://icdasdat07:50075/webhdfs/v1/tmp/2.txt?op=CREATE&namenoderpcaddress=icdas&overwrite=true&permission=777"
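The curl call above writes straight to a DataNode and passes the NameNode through `namenoderpcaddress`. As a rough sketch, the same URL can be built and used from Python with only the standard library. The function names `webhdfs_create_url` and `upload_file` are hypothetical, and the host names are the ones from the curl example.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def webhdfs_create_url(datanode: str, port: int, hdfs_path: str,
                       namenode_rpc_address: str, overwrite: bool = True,
                       permission: str = "777") -> str:
    """Build a WebHDFS op=CREATE URL aimed directly at a DataNode."""
    query = urlencode({
        "op": "CREATE",
        "namenoderpcaddress": namenode_rpc_address,
        "overwrite": "true" if overwrite else "false",
        "permission": permission,
    })
    # quote() keeps "/" intact and escapes anything unsafe in the path
    return f"http://{datanode}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


def upload_file(url: str, local_path: str, timeout: float = 30.0) -> int:
    """PUT a local file to the given URL and return the HTTP status code."""
    with open(local_path, "rb") as f:
        data = f.read()
    req = Request(url, data=data, method="PUT",
                  headers={"Content-Type": "application/octet-stream"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    print(webhdfs_create_url("icdasdat07", 50075, "/tmp/2.txt", "icdas"))
```

Note that `urlopen` does not follow redirects for PUT requests, so this sketch only fits the direct-to-DataNode form used here, not the two-step flow that starts at the NameNode.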

HDP 2.6 Docs:

HDFS & Fuse:

Fair scheduler: http://blog.cloudera.com/blog/2015/09/untangling-apache-hadoop-yarn-part-1/

http://blog.cloudera.com/blog/2015/10/untangling-apache-hadoop-yarn-part-2/

http://blog.cloudera.com/blog/2016/01/untangling-apache-hadoop-yarn-part-3/

http://blog.cloudera.com/blog/2016/06/untangling-apache-hadoop-yarn-part-4-fair-scheduler-queue-basics/

HDFS iNotify

https://developer.ibm.com/hadoop/2017/03/10/yarn-node-labels/

https://issues.apache.org/jira/browse/YARN-3214 https://issues.apache.org/jira/browse/HDFS-7285

HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /

Fix 'under replicated blocks', https://community.hortonworks.com/articles/4427/fix-under-replicated-blocks-in-hdfs-manually.html

https://www.datadoghq.com/blog/collecting-hadoop-metrics/

https://github.com/miguel10/YARN-Memory-Calculator

MRUnit

https://github.com/tomwhite/hadoop-book/blob/master/ch08-mr-types/src/main/java/WholeFileRecordReader.java

http://appsintheopen.com/posts/38-maven-config-for-cloudera-map-reduce-programs

YARN Fair scheduler

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W265aa64a4f21_43ee_b236_c42a1c875961

http://dewoods.com/blog/hadoop-kerberos-guide

http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/

Pig

Hive 2.0, https://issues.apache.org/jira/browse/PIG-4764

HCatalog, https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore
