When I reproduced the AutoEncoder model using main.py and the structured Amazon-Google dataset, the recall rate I got is 94.4%. I don't know how to get the 97.1% reported in the Amazon-Google row, DL column of Table 6 in this paper. #6

Open
SovereignLin opened this issue Nov 8, 2023 · 0 comments


When I ran main.py from https://github.com/saravanan-thirumuruganathan/DeepBlocker on the structured Amazon-Google dataset downloaded from https://github.com/anhaidgroup/deepmatcher/blob/master/Datasets.md#fodors-zagats, the recall rate I got with the AutoEncoder model was 94.4%. But the recall rate in the paper is 97.1%, found in the Amazon-Google row, DL column of Table 6.
I used 'wiki.en.bin' from https://fasttext.cc/docs/en/pretrained-vectors.html, and I changed the activation function from ReLU to Tanh, as described for the Autoencoder in Section 3.4.
The configuration is:

```python
FASTTEXT_EMBEDDIG_PATH = "embedding/wiki.en.bin"
# Dimension of the word embeddings.
EMB_DIMENSION_SIZE = 300
# Embedding size of AutoEncoder embedding
AE_EMB_DIMENSION_SIZE = 150
NUM_EPOCHS = 100
BATCH_SIZE = 256
RANDOM_SEED = 1234
LEARNING_RATE = 1e-3
K = 50
```
The aggregator used was SIF.
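For reference, this is a minimal sketch of SIF-style weighted averaging (Arora et al.), where each word vector is weighted by a / (a + p(w)) with p(w) the word's unigram probability. The function name, the example values, and the omission of the common-component removal step are my own simplifications, not DeepBlocker's implementation:

```python
import numpy as np

# Hedged sketch of SIF aggregation: down-weight frequent words by
# a / (a + p(w)), then average. Illustrative only, not the repo's code.
def sif_embedding(word_vecs, word_freqs, a=1e-3):
    weights = a / (a + np.asarray(word_freqs))      # shape (num_words,)
    weighted = weights[:, None] * word_vecs          # scale each word vector
    return weighted.sum(axis=0) / len(word_vecs)     # average into one vector

vecs = np.ones((3, 300))            # three 300-d fastText word vectors (toy)
freqs = [0.01, 0.001, 0.0001]       # unigram probabilities (toy)
emb = sif_embedding(vecs, freqs)
print(emb.shape)  # (300,)
```

The full SIF method also subtracts the projection onto the first principal component of the sentence embeddings; whether the repo does that is part of what this issue is asking about.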
So I don't know how to reproduce the 97.1% recall rate on the structured Amazon-Google dataset.
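To make the metric under discussion concrete, here is a minimal sketch of how blocking recall is typically computed for top-K candidate retrieval: the fraction of gold matching pairs that survive in the retrieved candidate set. The function and the toy data are illustrative assumptions, not the paper's evaluation code:

```python
# Hedged sketch: recall of a blocking step = |candidates ∩ gold| / |gold|.
def blocking_recall(candidate_pairs, gold_pairs):
    """Fraction of true matching pairs retained in the candidate set."""
    found = gold_pairs & candidate_pairs
    return len(found) / len(gold_pairs)

# Toy example: 4 gold matches, the top-K candidates recover 3 of them.
gold = {(0, 10), (1, 11), (2, 12), (3, 13)}
candidates = {(0, 10), (1, 11), (2, 12), (0, 11), (5, 13)}
print(blocking_recall(candidates, gold))  # 0.75
```

A 94.4% vs. 97.1% gap on Amazon-Google corresponds to a few dozen gold pairs missing from the top-K candidate sets, so small differences in preprocessing or architecture can plausibly account for it.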
In conclusion, I have a few questions:
1. Does the structured Amazon-Google dataset at https://github.com/saravanan-thirumuruganathan/DeepBlocker refer to the raw unprocessed dataset (4 attributes: title, description, manufacturer, price) or the processed dataset (3 attributes: title, manufacturer, price)? I think a structured dataset cannot include the description attribute, but in that case the structured Amazon-Google dataset would have only 3 attributes, while Table 4 in Section 5 of this paper lists 4.
2. In the AutoEncoder model, is the encoder a two-layer feed-forward NN with sizes 300-300-150 and the decoder a two-layer feed-forward NN with sizes 150-300-300?
3. Does anything else need to be changed in the code to achieve the 97.1% recall rate on the structured Amazon-Google dataset?
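For question 2, this is a NumPy sketch of the forward pass with exactly the layer sizes asked about (encoder 300 → 300 → 150, decoder 150 → 300 → 300) and Tanh activations as described in Section 3.4. The weight initialization and helper names are my own assumptions, not the repo's PyTorch implementation:

```python
import numpy as np

# Hedged sketch of the autoencoder shapes in question 2 (not the repo's code).
rng = np.random.default_rng(1234)

def layer(n_in, n_out):
    """One dense layer: small random weights plus zero bias."""
    return rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out)

enc = [layer(300, 300), layer(300, 150)]   # encoder: 300 -> 300 -> 150
dec = [layer(150, 300), layer(300, 300)]   # decoder: 150 -> 300 -> 300

def forward(x, layers):
    for W, b in layers:
        x = np.tanh(x @ W + b)             # Tanh activation per Section 3.4
    return x

x = rng.standard_normal((8, 300))          # batch of 8 aggregated tuple embeddings
z = forward(x, enc)                        # latent code, shape (8, 150)
x_hat = forward(z, dec)                    # reconstruction, shape (8, 300)
print(z.shape, x_hat.shape)
```

If the paper's intended architecture differs (e.g. a hidden size other than 300, or a linear output layer on the decoder), that alone could explain part of the recall gap.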
