Welcome to the official BiSPER documentation. This guide provides a comprehensive walkthrough to replicate our experiments, covering everything from dataset preparation and code compilation to running experiments and plotting results.
- Tools: gcc (version 7.5 or newer), Python 3.x, Eigen 3.4
- Clone the repository to your local machine.
- Install the necessary Python packages using the command below:
pip install numpy scipy pandas matplotlib seaborn networkx
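After installation, a quick check that the packages are importable can save a failed run later. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and is not part of the BiSPER repository.

```python
import importlib.util

# Package names taken from the pip command above.
REQUIRED = ["numpy", "scipy", "pandas", "matplotlib", "seaborn", "networkx"]


def missing_packages(names):
    """Return the names that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```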
Note: To save time, it is recommended to skip directly to the third option below (Preprocessed Datasets).
- Original Real-World Datasets: Obtain these from the SNAP dataset page. The initial run of our code will preprocess these datasets, generating files named <dataset_name>-compressed_sorted.bin.
- Generated Synthetic Datasets: Create Erdős-Rényi random graphs by executing generate_synthetic.py. This process generates an edge list file named synthetic.txt.
- Preprocessed Datasets: For convenience, you can directly download our preprocessed datasets (in adjacency list and binary formats) from this link.
Ensure all datasets are placed in the BiSPER/datasets directory.
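For readers unfamiliar with the synthetic option, the following is a rough sketch of how an Erdős-Rényi G(n, p) edge list can be produced. It is not the code of generate_synthetic.py, whose internals are not shown here. The function names, the values of `n` and `p`, and the one-pair-per-line text format are all assumptions made for illustration.

```python
import itertools
import random


def erdos_renyi_edges(n, p, seed=None):
    """Return the edges of a G(n, p) graph as (u, v) pairs with u < v.

    Each of the n*(n-1)/2 possible undirected edges is kept
    independently with probability p.
    """
    rng = random.Random(seed)
    return [
        (u, v)
        for u, v in itertools.combinations(range(n), 2)
        if rng.random() < p
    ]


def write_edge_list(edges, path):
    """Write one 'u v' pair per line (assumed whitespace-separated format)."""
    with open(path, "w") as f:
        for u, v in edges:
            f.write(f"{u} {v}\n")


# n and p are illustrative values, not the settings used in the experiments.
edges = erdos_renyi_edges(n=200, p=0.05, seed=42)
print(f"sampled {len(edges)} edges")
```

Calling `write_edge_list(edges, "synthetic.txt")` would then write the sampled edges to a text file. The quadratic loop over all node pairs is fine for small graphs; larger graphs would call for a skip-based sampler.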
Credit: Original datasets are courtesy of SNAP.
Pre-calculated lambda values are found in BiSPER/datasets/<dataset_name>.lambda. To recalculate:
- Ensure the existence of <dataset_name>-compressed_sorted.bin (refer to Dataset Preparation).
- Execute python get_lambda.py <dataset_name> from the BiSPER root directory. The lambda value will be recalculated and saved to BiSPER/datasets/<dataset_name>.lambda.
Navigate to the BiSPER/ directory and compile the code with the following commands:
cd src
g++ -std=c++14 -O3 *.cpp -o SPER -lstdc++fs -pthread
mv SPER ..
cd ..
Credit: The code for the AMC / GEER algorithms is from their official GitHub page, and the code for the Bipush / Push algorithms is from their official GitHub page.
Follow these steps to replicate our experiments:
- Grant execution permissions to the shell scripts:
chmod +x *.sh
- Execute a specific experiment script:
./Experiment-[0/I/II/III].sh
Note: Experiment 0 stands for the $L_{\max}$ truncated value vs. $L_{\max}$ experiment in Table 2.
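The bracketed name ./Experiment-[0/I/II/III].sh stands for four separate scripts, one per experiment. As a small sketch, the names can be expanded and launched from Python as shown below. The helpers `script_path` and `run_experiment` are hypothetical, and the sketch assumes it is run from the directory that contains the scripts.

```python
import subprocess

# Experiment identifiers taken from the script name pattern.
EXPERIMENTS = ["0", "I", "II", "III"]


def script_path(experiment):
    """Map an experiment identifier to its shell script path."""
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment!r}")
    return f"./Experiment-{experiment}.sh"


def run_experiment(experiment):
    """Run one experiment script and raise if it exits with an error."""
    subprocess.run([script_path(experiment)], check=True)


for name in EXPERIMENTS:
    print(script_path(name))
```

Calling `run_experiment("I")` would start the second script; the loop above only prints the expanded paths and runs nothing.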
You can utilize the compiled SPER executable for custom experiments. The usage syntax is:
SPER
[--dataset <dataset_name>]
[--algorithm <algorithm_name>]
[--num_query <num_query>]
[--L_max <L_max>]
[--eps <eps>]
[--p_f <p_f>]
[--num_landmarks <num_landmarks>]
[--num_samples <num_samples>]
[--r_max <r_max>]
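As an illustration of the syntax above, the sketch below assembles a SPER command line from keyword options. The helper `build_sper_command` is hypothetical, and the `./SPER` path assumes the executable sits in the current directory, as it does after the compile step. The option values shown are only an example.

```python
import shlex


def build_sper_command(executable="./SPER", **options):
    """Build an argv list for SPER from keyword options.

    Keys are flag names without the leading dashes, so L_max="auto"
    becomes ["--L_max", "auto"]. Options set to None are omitted.
    """
    argv = [executable]
    for name, value in options.items():
        if value is None:
            continue
        argv += [f"--{name}", str(value)]
    return argv


cmd = build_sper_command(
    dataset="Facebook",
    algorithm="BiSPER",
    num_query=100,
    L_max="auto",
    eps=1e-2,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell quoting problems with dataset names.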
Below are detailed explanations of the arguments that can be used with the program:
- --dataset: Specifies the dataset to be used. The program initially attempts to locate the compressed and sorted binary file at BiSPER/datasets/<dataset_name>-compressed_sorted.bin. If this file cannot be found, it then searches for a text file at BiSPER/datasets/<dataset_name>.txt. Should both files be unavailable, the program will raise an exception. The default dataset is Facebook.
- --algorithm: Determines the algorithm to be employed. Currently supported algorithms include: BiSPER, AMC, GEER, Bipush, and Push. The default algorithm is BiSPER.
- --num_query: Sets the number of query pairs to generate. Both the pre-sampled queries and their corresponding ground-truth values are stored in the BiSPER/samples/ directory. The default number is 100.
- --L_max: Defines the maximum number of steps, denoted as $L_{\max}$. Users can input an integer or specify auto, which calculates $L_{\max}$ based on lambda and epsilon values. This argument is applicable only to the BiSPER, AMC, and GEER algorithms. The default value is 100.
- --eps: Sets the desired absolute error guarantee, $\epsilon$. This parameter is relevant only for the BiSPER, AMC, and GEER algorithms. The default value is 1e-2.
- --num_landmarks: Indicates the number of landmark nodes in the Bipush-vl, Push-vl, and RW-vl algorithms. The default value is 100.
- --num_samples: Indicates the number of random walks to sample in the Bipush algorithm. The default value is 10,000.
- --r_max: Specifies the push threshold for both the Bipush and Push algorithms. The default value is 1e-4.
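The lookup order described for --dataset (binary file first, then text file, then an error) can be sketched as follows. This is an illustration in Python, not the program's actual C++ code. The function name `resolve_dataset` and the choice of `FileNotFoundError` are assumptions.

```python
import tempfile
from pathlib import Path


def resolve_dataset(name, datasets_dir="BiSPER/datasets"):
    """Return the first existing dataset file, preferring the binary form."""
    root = Path(datasets_dir)
    candidates = [
        root / f"{name}-compressed_sorted.bin",  # preprocessed binary
        root / f"{name}.txt",                    # raw text edge list
    ]
    for path in candidates:
        if path.is_file():
            return path
    raise FileNotFoundError(f"no dataset file found for {name!r} in {root}")


# Demonstration in a throwaway directory holding only a text file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Facebook.txt").write_text("0 1\n")
    print(resolve_dataset("Facebook", tmp).name)
```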
Results are stored in BiSPER/results/ as .out files, which can be opened with any text editor.
To recreate the line charts from the experiments in our paper, execute the following command within the BiSPER/results/ directory:
python plot.py
This generates and saves PGF files for visualizations.