Chapter 13 - Advanced RDD example of Custom partitioner may need correction #43

izayarniy · 2019-06-23T23:54:31Z

I'm studying Spark's advanced RDD API and am a little confused by one example.
```scala
// in Scala
import org.apache.spark.Partitioner

class DomainPartitioner extends Partitioner {
  def numPartitions = 3
  def getPartition(key: Any): Int = {
    val customerId = key.asInstanceOf[Double].toInt
    if (customerId == 17850.0 || customerId == 12583.0) {
      return 0
    } else {
      return new java.util.Random().nextInt(2) + 1
    }
  }
}
```
As far as I can tell from the code documentation, a partitioner must return the same partition ID for the same key. The example above does not: every customer other than the two special cases gets a randomly chosen partition on each call. Doesn't assigning a "random" ID to a key break the `Partitioner` contract?
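For comparison, here is a minimal sketch of a deterministic variant, assuming the intent is only to isolate the two large customers in partition 0 and split everyone else across partitions 1 and 2. The class name `DeterministicDomainPartitioner` is hypothetical and not from the book, and the modulo rule is just one possible choice:

```scala
import org.apache.spark.Partitioner

// Hypothetical name: a deterministic take on the DomainPartitioner quoted above.
class DeterministicDomainPartitioner extends Partitioner {
  // Partition 0 is reserved for the two special customers; 1 and 2 hold the rest.
  override def numPartitions: Int = 3

  override def getPartition(key: Any): Int = {
    // Assumes keys are Double customer IDs, as in the quoted example.
    val customerId = key.asInstanceOf[Double].toInt
    if (customerId == 17850 || customerId == 12583) 0
    // floorMod is never negative, so the result is always 1 or 2,
    // and the same key always maps to the same partition.
    else 1 + Math.floorMod(customerId, 2)
  }

  // The partitioner is stateless, so any two instances partition identically.
  override def equals(other: Any): Boolean =
    other.isInstanceOf[DeterministicDomainPartitioner]
  override def hashCode: Int = numPartitions
}
```

It would be used the same way as the original, for example `keyedRDD.partitionBy(new DeterministicDomainPartitioner)`, where `keyedRDD` stands for whatever pair RDD is keyed by customer ID. The trade-off is that records sharing a key now always land together, which may or may not be what the book's example wanted.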

subhmita · 2019-09-26T14:47:18Z

Hi there,

`new java.util.Random().nextInt(2)` returns either 0 or 1 (the bound 2 is exclusive), so adding 1 yields partition 1 or 2. I assume the idea is that the code singles out the customer IDs named in the first `if` branch for partition 0, and the rest of the customer data is mapped to partitions 1 and 2.
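The range is easy to check outside Spark. The sketch below mirrors the `else` branch of the quoted partitioner; the object name `NextIntRange` and its helpers are made up for illustration, and the seed is there only to make runs repeatable:

```scala
import java.util.Random

// Hypothetical helper object, not part of the book's code.
object NextIntRange {
  // Same expression as the else-branch: nextInt(2) is 0 or 1, so +1 gives 1 or 2.
  def randomPartition(rng: Random): Int = rng.nextInt(2) + 1

  // Draw n partition ids from a seeded generator to see which values occur.
  def draws(n: Int, seed: Long): Seq[Int] = {
    val rng = new Random(seed)
    Seq.fill(n)(randomPartition(rng))
  }
}
```

Every value in `NextIntRange.draws(1000, 42L)` is 1 or 2, never 0 and never 3. Note that the drawn value does not depend on any key, so in the partitioner the same customer ID can land in partition 1 on one call and partition 2 on the next.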
