
Unable to read MR job output using R #18

Open
varsharaveendran opened this issue Oct 7, 2015 · 1 comment
@varsharaveendran

Environment:
Hortonworks 2.1 cluster integrated with Kerberos and Active Directory
R version: 3.1.3

Issue:
I am trying to run a simple MR job using R on a Kerberos-enabled Hadoop cluster. The R code is given below:
# Point rmr2/rhdfs at the cluster's Hadoop installation
Sys.setenv(HADOOP_STREAMING = "/usr/lib/hadoop-mapreduce/hadoop-streaming-2.4.0.2.1.5.0-695.jar")
Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf")
library(rhdfs)
library(rmr2)
hdfs.init()
ints = to.dfs(1:100)                                               # write 1..100 to HDFS
calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v)) # emit each value and its double

Up to this point the mapreduce job runs successfully, but when I try to read the results back with the following command, an error is thrown:
from.dfs(calc)

The error is:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 8 elements
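The message suggests scan() is failing on eight fixed columns while rmr2 parses a file listing, not while reading the job output itself. As a minimal sketch (the listing lines are invented for illustration), one stray non-listing line, such as a warning printed into the hadoop fs -ls output, reproduces exactly this error:

# Hypothetical reproduction: mimic the read.table() call from the
# traceback below, which expects eight whitespace-separated fields per
# listing line. The WARN line has only seven fields, so scan() fails
# with "line 1 did not have 8 elements".
listing <- c("Found 1 items",
             "WARN util.NativeCodeLoader: unable to load native-hadoop library",
             "-rw-r--r--   3 varsha hdfs 1234 2015-10-07 10:00 /tmp/out/part-00000")
read.table(textConnection(listing), skip = 1,
           col.names = c("permissions", "links", "owner", "group",
                         "size", "date", "time", "path"),
           stringsAsFactors = FALSE)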

The same error is thrown when accessing the output of any MR job (wordcount, pi estimation).

The traceback() function displays the following:
7: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
       nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,
       fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip,
       multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes,
       flush = flush, encoding = encoding, skipNul = skipNul)
6: read.table(textConnection(hdfs("ls", fname, intern = TRUE)),
       skip = 1, col.names = c("permissions", "links", "owner",
           "group", "size", "date", "time", "path"), stringsAsFactors = FALSE)
5: hdfs.ls(fname)
4: part.list(fname)
3: lapply(src, function(x) system(paste(hadoop.streaming(), "dumptb",
       rmr.normalize.path(x), ">>", rmr.normalize.path(dest))))
2: dumptb(part.list(fname), tmp)
1: from.dfs(calc)
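Frames 5-7 show the failure happens while parsing the output of hdfs("ls", fname, intern = TRUE), before any job output is read. To see what rmr2 is actually parsing, the raw listing can be dumped with the same kind of call; a minimal sketch, where /tmp/output-dir is a placeholder for the job's HDFS output path:

# Sketch: print the raw `hadoop fs -ls` listing that rmr2's hdfs.ls
# parses. "/tmp/output-dir" is a placeholder for the real output path.
# Any line here that is not a standard eight-column listing entry
# (e.g. a Kerberos or log4j warning written to stdout) would explain
# the scan() failure in frame 7.
raw <- system(paste(Sys.getenv("HADOOP_CMD"), "fs -ls /tmp/output-dir"),
              intern = TRUE)
cat(raw, sep = "\n")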

Please let me know how to resolve this issue.

@anjalishejul

I am also facing the same issue. Please help me resolve it.
