Sampling-based estimation of the number of distinct values in a distributed environment

Environment

Simulated Experiment

  • Ubuntu
  • C++11
  • GCC 4.8

Experiments on Spark

  • Scala
  • Python 3.7
  • JDK
  • Hadoop 3.3
  • Spark 3.1

Preparation

Generate the sample data from a Poisson distribution and a Zipfian distribution:

python genpoi.py
python genzipf.py
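
The contents of `genpoi.py` and `genzipf.py` are not reproduced here. As a rough illustration of what data of this kind can look like, the following is a minimal standard-library sketch, not the repository's scripts. The function names, the parameters (`lam`, `s`, `domain_size`, `n`), and the bounded Zipfian domain are all assumptions made for this example.

```python
import math
import random


def sample_poisson(lam, n, rng):
    """Draw n Poisson(lam) variates with Knuth's multiplication method.

    Hypothetical helper; suitable for small-to-moderate lam only.
    """
    threshold = math.exp(-lam)
    out = []
    for _ in range(n):
        k, p = 0, rng.random()
        # Multiply uniforms until the product drops to exp(-lam) or below.
        while p > threshold:
            k += 1
            p *= rng.random()
        out.append(k)
    return out


def sample_zipf(s, domain_size, n, rng):
    """Draw n values from {1..domain_size} with P(k) proportional to k**(-s).

    Hypothetical helper; the finite domain is an assumption of this sketch.
    """
    values = range(1, domain_size + 1)
    weights = [k ** -s for k in values]
    return rng.choices(values, weights=weights, k=n)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    poisson = sample_poisson(lam=20.0, n=10_000, rng=rng)
    zipf = sample_zipf(s=1.2, domain_size=1_000, n=10_000, rng=rng)
    # The exact number of distinct values is what a sampling-based
    # estimator would try to approximate from a subset of the data.
    print("Poisson: distinct values =", len(set(poisson)))
    print("Zipf: distinct values =", len(set(zipf)))
```

Counting `len(set(...))` on the full data gives the ground truth against which an estimate computed from a sample could be compared.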