Sampling-based estimation of the number of distinct values in distributed environment

Environment

Simulated Experiment
- Ubuntu
- C++ 11
- GCC 4.8

Experiments on Spark
- Scala
- Python 3.7
- JDK
- Hadoop 3.3
- Spark 3.1

Preparation

Generate sampling data from Poisson Distribution and Zipfian Distribution.

    python genpoi.py
    python genzipf.py
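
The contents of genpoi.py and genzipf.py are not reproduced in this README. As a rough illustration of what generating such data can involve, here is a minimal standard-library sketch. The function names, parameters (`lam`, `s`, `num_values`), and sizes are all hypothetical and are not taken from the repository's scripts, which may work differently.

```python
import math
import random


def sample_poisson(lam, n, rng):
    """Draw n Poisson(lam) values with Knuth's multiplication method.

    Hypothetical helper; suitable only for small lam, since exp(-lam)
    underflows for very large rates.
    """
    threshold = math.exp(-lam)
    values = []
    for _ in range(n):
        k, p = 0, rng.random()
        # Keep multiplying uniforms until the product drops below exp(-lam).
        while p > threshold:
            k += 1
            p *= rng.random()
        values.append(k)
    return values


def sample_zipf(s, num_values, n, rng):
    """Draw n values from a Zipfian distribution truncated to 1..num_values.

    Rank r is chosen with probability proportional to 1 / r**s.
    Hypothetical helper, not the repository's implementation.
    """
    ranks = range(1, num_values + 1)
    weights = [1.0 / (r ** s) for r in ranks]
    return rng.choices(ranks, weights=weights, k=n)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    poisson_data = sample_poisson(lam=4.0, n=10_000, rng=rng)
    zipf_data = sample_zipf(s=1.2, num_values=1_000, n=10_000, rng=rng)
    # The exact number of distinct values is the ground truth that a
    # sampling-based estimator would be compared against.
    print("distinct Poisson values:", len(set(poisson_data)))
    print("distinct Zipfian values:", len(set(zipf_data)))
```

Both distributions are skewed, so a small sample tends to miss rare values. This is what makes distinct-value estimation from samples nontrivial.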