Code for paper "Sampling-based Estimation of the Number of Distinct Values in Distributed Environment"

RUC-ALGO/NDV_Estimation_in_distributed_environment

 
 

Sampling-based estimation of the number of distinct values in distributed environment

Environment

Simulated Experiment

  • Ubuntu
  • C++11
  • GCC 4.8

Experiments on Spark

  • Scala
  • Python 3.7
  • JDK
  • Hadoop 3.3
  • Spark 3.1

Preparation

Generate the sample data from a Poisson distribution and a Zipfian distribution:

python genpoi.py
python genzipf.py
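
The contents of genpoi.py and genzipf.py are not reproduced here. As a rough illustration of what generating such data can look like, the following is a minimal sketch using only the Python standard library. The function names (`sample_poisson`, `sample_zipf`, `count_distinct`), the parameters, and the bounded-domain form of the Zipfian distribution are all assumptions made for this example, not the repository's actual code.

```python
import math
import random


def sample_poisson(lam, n, rng):
    """Draw n Poisson(lam) variates with Knuth's multiplication method.

    Hypothetical helper; suitable for small lam only, since exp(-lam)
    underflows for very large rates.
    """
    threshold = math.exp(-lam)
    out = []
    for _ in range(n):
        k, p = 0, rng.random()
        # Keep multiplying uniforms until the product drops below exp(-lam).
        while p > threshold:
            k += 1
            p *= rng.random()
        out.append(k)
    return out


def sample_zipf(num_values, exponent, n, rng):
    """Draw n values from {1, ..., num_values} with P(k) proportional to 1 / k**exponent.

    Hypothetical helper; assumes a bounded Zipfian domain.
    """
    population = range(1, num_values + 1)
    weights = [1.0 / (rank ** exponent) for rank in population]
    return rng.choices(population, weights=weights, k=n)


def count_distinct(values):
    """Exact number of distinct values, useful as ground truth for an estimator."""
    return len(set(values))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    poisson_data = sample_poisson(lam=4.0, n=20000, rng=rng)
    zipf_data = sample_zipf(num_values=1000, exponent=1.2, n=20000, rng=rng)
    print("Poisson distinct values:", count_distinct(poisson_data))
    print("Zipfian distinct values:", count_distinct(zipf_data))
```

The exact distinct count from `count_distinct` can serve as the reference value when checking a sampling-based estimate on the generated data.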

Languages

  • Jupyter Notebook 47.4%
  • C++ 44.4%
  • Python 6.2%
  • Cython 2.0%