Sampling is the process of selecting a subset of individuals, items, or observations from a larger population. The goal of sampling is to obtain a representative subset of the population that can be used to estimate population characteristics with a reasonable level of accuracy.
There are several different sampling techniques that can be used, including:
In simple random sampling, each member of the population has an equal chance of being selected for the sample.
Stratified sampling involves dividing the population into subgroups (strata) based on some relevant characteristic, and then selecting a random sample from each stratum in proportion to the size of the stratum.
In cluster sampling, the population is divided into clusters (e.g., geographic regions or schools) and a random sample of clusters is selected. Data are then collected from all individuals within the selected clusters.
Systematic sampling involves selecting every nth member of the population after randomly selecting a starting point.
Weighted sampling is a technique in which each element in the population is assigned a weight that reflects its importance or representativeness, and elements are selected with probability proportional to their weights. The goal is to increase the representation of certain elements in the sample so that it better reflects the population as a whole.
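The five techniques above can be sketched in a few lines each. This is a minimal illustration using only Python's standard library, not the project's actual code: the toy population, the field names (id, region, is_fraud), the sample sizes, and the 10:1 weighting are all assumptions made up for the example.

```python
import random
from collections import defaultdict

def simple_random_sample(rows, n, rng):
    """Draw n rows without replacement; every row is equally likely."""
    return rng.sample(rows, n)

def stratified_sample(rows, n, key, rng):
    """Sample each stratum in proportion to its share of the population."""
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for members in strata.values():
        # Proportional allocation, keeping at least one row per stratum.
        # Rounding means the total can differ slightly from n in general.
        k = max(1, round(n * len(members) / len(rows)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

def cluster_sample(rows, n_clusters, key, rng):
    """Pick whole clusters at random and keep every row inside them."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[key(row)].append(row)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [row for c in chosen for row in clusters[c]]

def systematic_sample(rows, step, rng):
    """Take every step-th row after a random start in [0, step)."""
    start = rng.randrange(step)
    return rows[start::step]

def weighted_sample(rows, n, weight, rng):
    """Draw n rows with replacement, with probability proportional to weight."""
    return rng.choices(rows, weights=[weight(r) for r in rows], k=n)

# Hypothetical toy population: 1000 rows, 10 regions of 100 rows, 2% fraud.
rng = random.Random(42)
population = [
    {"id": i, "region": i // 100, "is_fraud": int(i % 50 == 0)}
    for i in range(1000)
]

srs = simple_random_sample(population, 100, rng)
strat = stratified_sample(population, 100, lambda r: r["is_fraud"], rng)
clus = cluster_sample(population, 3, lambda r: r["region"], rng)
syst = systematic_sample(population, 10, rng)
# Assumed weighting: fraud rows are 10 times as likely to be drawn.
wtd = weighted_sample(population, 100, lambda r: 10 if r["is_fraud"] else 1, rng)
```

Note that the weighted draw here samples with replacement, so the same row can appear more than once; the other four functions never repeat a row.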
In this project, I took a Credit Card Fraud Detection dataset and created 5 different samples using the sampling techniques described above. I then applied 5 different ML algorithms/models and measured the accuracy of each model on each sample.
Among the 5 ML models, XGBoost and Random Forest Classifier both give the best accuracy on every sample.
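The evaluation procedure described above (every model scored on every sample) can be sketched as a simple grid. This is only an outline of the loop, under stated assumptions: the data is synthetic, the 70/30 train/test split is an assumption, and the two models are deliberately trivial pure-Python stand-ins rather than XGBoost, Random Forest, or any of the other classifiers used in the project.

```python
import random

def majority_model(train):
    """Stand-in model: always predicts the most common training label."""
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top

def nearest_neighbour_model(train):
    """Stand-in model: predicts the label of the closest training point."""
    def predict(x):
        _, label = min(train, key=lambda pair: abs(pair[0] - x))
        return label
    return predict

def accuracy(model, test):
    """Fraction of test rows whose label is predicted correctly."""
    return sum(model(x) == y for x, y in test) / len(test)

def evaluate(samples, models, rng, test_fraction=0.3):
    """Return {(sample_name, model_name): accuracy} for every pairing."""
    results = {}
    for sample_name, rows in samples.items():
        rows = rows[:]  # copy so the caller's sample is not reordered
        rng.shuffle(rows)
        cut = int(len(rows) * (1 - test_fraction))
        train, test = rows[:cut], rows[cut:]
        for model_name, fit in models.items():
            results[(sample_name, model_name)] = accuracy(fit(train), test)
    return results

# Synthetic data (an assumption): one numeric feature, a rare positive label.
rng = random.Random(0)
data = [(x, int(x > 0.9)) for x in (rng.random() for _ in range(2000))]

# Two of the sampling techniques, applied to the synthetic data.
samples = {
    "simple_random": rng.sample(data, 200),
    "systematic": data[rng.randrange(10)::10],
}
models = {
    "majority": majority_model,
    "nearest_neighbour": nearest_neighbour_model,
}

results = evaluate(samples, models, rng)
for (sample_name, model_name), acc in sorted(results.items()):
    print(f"{sample_name:15s} {model_name:18s} {acc:.3f}")
```

To reproduce the project's comparison, the stand-in entries in models would be replaced by the five real classifiers and samples by the five samples drawn from the fraud dataset.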