Chapter 13 - Advanced RDD: example of custom Partitioner may need correction #43

Open
izayarniy opened this issue Jun 23, 2019 · 1 comment


izayarniy commented Jun 23, 2019

I'm studying the Spark advanced RDD API and got a little confused by one example:
```scala
// in Scala
import org.apache.spark.Partitioner

class DomainPartitioner extends Partitioner {
  def numPartitions = 3
  def getPartition(key: Any): Int = {
    val customerId = key.asInstanceOf[Double].toInt
    if (customerId == 17850.0 || customerId == 12583.0) {
      return 0
    } else {
      return new java.util.Random().nextInt(2) + 1
    }
  }
}
```
As far as I can tell from the code documentation, a partitioner must return the same partition ID for the same partition key. That does not hold for the example above. Doesn't returning a random ID for a key break the `Partitioner` contract?
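One way to keep the example's intent (customers 17850 and 12583 in partition 0, all other customers spread over partitions 1 and 2) while honoring that contract is to derive the partition from the key itself rather than from `java.util.Random`. The sketch below assumes `spark-core` is on the classpath and that keys are `Double` customer IDs, as in the original; `DeterministicDomainPartitioner` and its helper are hypothetical names, not code from the book.

```scala
import org.apache.spark.Partitioner

// Hypothetical deterministic variant of the book's DomainPartitioner (a sketch).
// Assumes keys are customer IDs stored as Double, as in the original example.
class DeterministicDomainPartitioner extends Partitioner {
  // The two customer IDs singled out by the original example.
  private val specialCustomers = Set(17850, 12583)

  override def numPartitions: Int = 3

  override def getPartition(key: Any): Int = {
    val customerId = key.asInstanceOf[Double].toInt
    if (specialCustomers.contains(customerId)) {
      0
    } else {
      // Compute the partition from the key instead of a random draw,
      // so the same key always lands in the same partition (1 or 2).
      1 + nonNegativeMod(customerId.hashCode, numPartitions - 1)
    }
  }

  // Like x % mod, but never negative (Scala's % keeps the sign of x).
  private def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Every instance partitions keys identically, so treat all instances as equal.
  override def equals(other: Any): Boolean =
    other.isInstanceOf[DeterministicDomainPartitioner]

  override def hashCode(): Int = numPartitions
}
```

It would be used like the original, e.g. `keyedRDD.partitionBy(new DeterministicDomainPartitioner)` on a pair RDD keyed by customer ID (`keyedRDD` is a placeholder). Because the result depends only on the key, a retried or recomputed task sends each record to the same partition as before. The `equals`/`hashCode` overrides follow the pattern of Spark's built-in `HashPartitioner`, which lets Spark recognize two RDDs partitioned this way as co-partitioned.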

subhmita commented

Hi there,

`java.util.Random.nextInt(2)` returns a number between 0 (inclusive) and 2 (exclusive), i.e. 0 or 1. So I assume the idea is that the code singles out the two customer IDs in the first `if` block, and all other customer IDs are mapped to partitions 1 and 2.
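The range claim can be checked with a short sketch that reproduces the logic of the book's `getPartition`, with the `Random` passed in so a fixed seed makes the run reproducible. `NextIntRangeDemo`, `bookStylePartition`, and the key `14000.0` are arbitrary, hypothetical choices for illustration.

```scala
import java.util.Random

object NextIntRangeDemo {
  // Hypothetical helper: same logic as the book's getPartition, except the
  // Random instance is injected so the demo can use a fixed seed.
  def bookStylePartition(key: Any, rng: Random): Int = {
    val customerId = key.asInstanceOf[Double].toInt
    if (customerId == 17850 || customerId == 12583) 0
    else rng.nextInt(2) + 1 // nextInt(2) is 0 or 1, so this yields 1 or 2
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(42L) // fixed seed only so the run is reproducible
    // Ask for the partition of the same non-special key many times.
    val results = Seq.fill(1000)(bookStylePartition(14000.0, rng))
    println(s"Partitions observed for key 14000.0: ${results.distinct.sorted}")
  }
}
```

Every value produced by the `else` branch is 1 or 2, which confirms the range. The same run also shows that repeated calls for one key return both values, so the range being correct does not by itself make the key-to-partition mapping deterministic.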
