[SparkR-237, 238] Fix cleanClosure by including private function checks in package namespaces. #229

Open · wants to merge 4 commits into master

Conversation

@hlin09 (Contributor) commented Mar 20, 2015

  1. Fixes 237 by including private function checks in package namespaces (see the sketch below).
  2. Adds a test for this.
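
For context, a runnable sketch of the distinction the fix relies on (illustrative only, not the actual cleanClosure code): an attached package environment exposes only a package's exported names, while the package's namespace also holds its private functions.

```r
# print.lm is defined in stats but not exported (it is only registered
# as an S3 method), so the attached "package:stats" environment cannot
# see it:
exists("print.lm", envir = as.environment("package:stats"), inherits = FALSE)
#> FALSE

# The namespace environment holds every function defined in the package,
# private ones included; that is where a closure cleaner has to look:
exists("print.lm", envir = getNamespace("stats"), inherits = FALSE)
#> TRUE
```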

@shivaram (Contributor)

@piccolbo It would be great if you could help test this patch.

@piccolbo (Contributor)

Will do

@piccolbo (Contributor)

It does not pass; it fails a little later, in the same way. Details to follow shortly.

@piccolbo (Contributor)

This is the first failure I got:

> ### ** Examples
> 
> as.data.frame(
+   where(
+     input(mtcars),
+     cyl > 4))
Error in kv2rdd.list(if (ncol(k) == 0) f1(kv) else do.call(rbind, lapply(unname(split(kv,  : 
  could not find function "keys.spark"
Calls: source ... computeFunc -> <Anonymous> -> FUN -> FUN -> kv2rdd.list
Execution halted
15/03/20 14:03:32 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: R computation failed with
 Error in kv2rdd.list(if (ncol(k) == 0) f1(kv) else do.call(rbind, lapply(unname(split(kv,  : 
  could not find function "keys.spark"
Calls: source ... computeFunc -> <Anonymous> -> FUN -> FUN -> kv2rdd.list

So this is progress: kv2rdd.list is private and is now found, but keys.spark is also private and is not found. I patched the code to fully qualify that name, and it moves to a later point of failure.
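
For reference, "fully qualifying" a private function in R means using the ::: operator, which looks a name up directly in the package namespace rather than among the exports. A quick illustration with a private function from stats (the actual plyrmr call site is not shown in the log):

```r
# `::` reaches only exported names, so it fails for a private function:
try(stats::t.test.default)
#> Error: 't.test.default' is not an exported object from 'namespace:stats'

# `:::` fetches the object straight from the package namespace instead:
f <- stats:::t.test.default   # works: private functions live in the namespace
```

With that workaround in place, the failure moves here: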

Error in lazy_eval(x, c(data, list(.data = data))) : 
  could not find function "as.lazy"
Calls: source ... f1 -> do.call -> <Anonymous> -> lazy.eval -> lazy_eval
Execution halted
15/03/20 13:56:23 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: R computation failed with
 Error in lazy_eval(x, c(data, list(.data = data))) : 
  could not find function "as.lazy"
Calls: source ... f1 -> do.call -> <Anonymous> -> lazy.eval -> lazy_eval
Execution halted
    at edu.berkeley.cs.amplab.sparkr.BaseRRDD.compute(RRDD.scala:80)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Now this is getting complicated. lazy_eval is in the package lazyeval, which plyrmr imports (just as it imports SparkR). as.lazy is an exported function in that package, so my initial idea that this is an issue with private functions only is not correct.
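
A minimal sketch of why an exported function can still go missing, under the working hypothesis that the serialized closure loses its link to the defining namespace and names are re-resolved on the worker's search path (assumes lazyeval is installed; illustrative only):

```r
# In-process, lazy_eval finds as.lazy through its enclosing environment,
# which is the lazyeval namespace (it holds exported and private names alike):
env <- environment(lazyeval::lazy_eval)
exists("as.lazy", envir = env)           # TRUE

# A closure rebuilt with the global environment as its environment can only
# see attached packages; if lazyeval was never attached on the worker, even
# the *exported* as.lazy is not found:
exists("as.lazy", envir = globalenv())   # FALSE unless lazyeval is attached
```

On that reading, export status is beside the point; what matters is whether the rebuilt closure's environment chain still reaches the defining namespace.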

@hlin09 (Contributor, Author) commented Mar 20, 2015

Thanks @piccolbo for reporting. Let me do more debugging on this later today.

@shivaram (Contributor)

BTW @hlin09, one thing you could try is to debug using plyrmr directly. @piccolbo can probably tell us if there are any setup instructions we should use.

@piccolbo (Contributor)

I think it's a good idea, also because of SPARKR-238. The instructions talk about installing Hadoop first; I think you can just ignore that.

I think if you cut to the chase, you just need to run:

library(devtools)
install_github("RevolutionAnalytics/rmr2", subdir = "pkg")
install_github("RevolutionAnalytics/plyrmr", subdir = "pkg")

We don't normally give these instructions because we need regular users to install the official latest version; you guys are not regular users. I see some warnings but it seems to work. Otherwise let me know and I will point you to the long way.

Then R CMD check path-to-plyrmr will reproduce 237.


@hlin09 (Contributor, Author) commented Mar 22, 2015

@piccolbo Thanks for the helpful instructions. I have just done some tests. Please try this patch and let me know if it works.

@hlin09 changed the title from "Fix 237 by including private function checks in package namespaces." to "[SparkR-237, 238] Fix cleanClosure by including private function checks in package namespaces." on Mar 22, 2015.
@piccolbo (Contributor)

The tests get past the original error point but fail elsewhere. From the error message, it seems to be an instance of SparkR-238, not 237. My suggestion is that we consider 237 fixed and focus on 238, but maybe wait to close it until all plyrmr tests pass cleanly (I am assuming that all the problems will prove to be related to changes in SparkR, which is only a working hypothesis).
