Tags: sparkIMF
If you can write code in any programming language, you already know roughly 60% of big data with Spark and Hadoop!
##Transformations and Actions in Spark Streaming
###Transformations on DStreams
- map(func)
- flatMap(func)
- filter(func)
- repartition(numPartitions)
- union(otherStream)
- count()
- reduce(func)
- countByValue()
- reduceByKey(func, [numTasks])
- join(otherStream, [numTasks])
- cogroup(otherStream, [numTasks])
- transform(func): operate directly on the RDDs underlying the DStream
- updateStateByKey(func): maintains the previous state alongside the current one; the "current state" is just the data from the most recent batch interval, which is used to update the accumulated previous value (see the sketch after this list)
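The last two items are the ones people trip over, so here is a minimal sketch, not from the original post: a word count whose totals survive across batches, using transform to drop into RDD code and updateStateByKey to keep a running total. The socket source on localhost:9999 and the /tmp/checkpoint directory are assumptions for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StatefulWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("StatefulWordCount")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/checkpoint") // stateful operations require checkpointing

    val lines = ssc.socketTextStream("localhost", 9999)
    val pairs = lines.flatMap(_.split(" ")).map(word => (word, 1))

    // transform: drop down to plain RDD code for each micro-batch,
    // e.g. to sort the batch by key before the stateful update.
    val sorted = pairs.transform(rdd => rdd.sortByKey())

    // updateStateByKey: fold this batch's counts (newValues) into the
    // running total (running) carried over from all previous batches.
    val totals = sorted.updateStateByKey[Int] {
      (newValues: Seq[Int], running: Option[Int]) =>
        Some(newValues.sum + running.getOrElse(0))
    }
    totals.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```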
###Window Operations
- window(windowLength, slideInterval)
- countByWindow(windowLength, slideInterval)
- reduceByWindow(func, windowLength, slideInterval)
- reduceByKeyAndWindow(func, windowLength, slideInterval, [numTasks])
- reduceByKeyAndWindow(func, invFunc, windowLength, slideInterval, [numTasks]): the invFunc variant computes each window incrementally, using invFunc to "subtract" the batches that slide out of the window (see the sketch after this list)
- countByValueAndWindow(windowLength, slideInterval, [numTasks])
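Continuing the `pairs` stream from the sketch above, a minimal example of the invFunc variant (the window and slide durations are illustrative, and checkpointing must already be enabled as above):

```scala
// Word counts over the last 60 seconds, recomputed every 10 seconds.
// With invFunc, Spark updates the previous window incrementally: it
// adds the batches sliding in and "subtracts" the batches sliding out,
// instead of re-reducing the whole window from scratch.
val windowedCounts = pairs.reduceByKeyAndWindow(
  (a: Int, b: Int) => a + b, // func: merge values entering the window
  (a: Int, b: Int) => a - b, // invFunc: remove values leaving the window
  Seconds(60),               // windowLength
  Seconds(10)                // slideInterval
)
windowedCounts.print()
```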
###Output Operations on DStreams
- print()
- saveAsTextFiles(prefix, [suffix])
- saveAsObjectFiles(prefix, [suffix])
- saveAsHadoopFiles(prefix, [suffix])
- foreachRDD(func): the most general output operator; applies func to every RDD generated from the stream (see the sketch below)
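Because func runs on the driver while the data lives on the executors, connections to external systems should be created inside foreachPartition rather than serialized from the driver. A minimal sketch of that pattern, where `createConnection` and `send` are hypothetical placeholders for your own client code:

```scala
// Push each micro-batch to an external store.
// `createConnection` and `send` are hypothetical, not a real API.
totals.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val conn = createConnection() // hypothetical: open once per partition
    records.foreach(record => send(conn, record))
    conn.close()
  }
}
```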
##State Management: mapWithState, New in 1.6.x
```scala
// Signature of mapWithState as defined in PairDStreamFunctions (Spark 1.6.x).
def mapWithState[StateType: ClassTag, MappedType: ClassTag](
    spec: StateSpec[K, V, StateType, MappedType]
  ): MapWithStateDStream[K, V, StateType, MappedType] = {
  new MapWithStateDStreamImpl[K, V, StateType, MappedType](
    self,
    spec.asInstanceOf[StateSpecImpl[K, V, StateType, MappedType]]
  )
}
```
mapWithState and updateStateByKey are the same kind of stateful operation: both keep per-key state alive across batch intervals. The practical difference is that updateStateByKey rewrites the entire state on every batch, while mapWithState only calls the mapping function for the keys that actually appear in the current batch.
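As a minimal usage sketch of the signature above, continuing the `pairs` DStream from the earlier word-count sketch: the mapping function receives the key, the new value for this batch, and a State handle it can read and update.

```scala
import org.apache.spark.streaming.{State, StateSpec}

// Mapping function: the key, the value arriving in this batch (if any),
// and a handle to the per-key state. Returns the mapped record.
val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
  val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
  state.update(sum) // persist the new running total
  (word, sum)       // emitted into the resulting DStream
}

// Only keys present in the current batch (or whose timeout fires)
// invoke the function, unlike updateStateByKey.
val wordTotals = pairs.mapWithState(StateSpec.function(mappingFunc))
wordTotals.print()
```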
##Source Code to Read
- PairDStreamFunctions
- StateDStream