When I reproduce the AutoEncoder model using main.py and the structured Amazon-Google dataset, the recall I get is 94.4%. I don't know how to reach the 97.1% reported in the Amazon-Google row, DL column of Table 6 in the paper.
#6 · Open · SovereignLin opened this issue on Nov 8, 2023 · 0 comments
When I ran main.py from https://github.com/saravanan-thirumuruganathan/DeepBlocker on the structured Amazon-Google dataset downloaded from https://github.com/anhaidgroup/deepmatcher/blob/master/Datasets.md#fodors-zagats, the recall I got with the AutoEncoder model was 94.4%. But the recall in the paper is 97.1%, which can be found in the Amazon-Google row, DL column of Table 6.
I used 'wiki.en.bin' from https://fasttext.cc/docs/en/pretrained-vectors.html, and I changed the activation function from ReLU to Tanh, as specified for the Autoencoder in Section 3.4.
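For clarity, this is the architecture I am assuming after that change. It is only a minimal PyTorch sketch (the class and attribute names are mine, not the repo's); the dimensions follow EMB_DIMENSION_SIZE = 300 and AE_EMB_DIMENSION_SIZE = 150 from the configuration below, and match question 2 at the end of this issue.

```python
# Sketch only: my assumed autoencoder after the ReLU -> Tanh change.
# Encoder: 300 -> 300 -> 150; decoder: 150 -> 300 -> 300 (see question 2).
import torch.nn as nn

class AutoEncoderSketch(nn.Module):
    def __init__(self, input_dim=300, hidden_dim=300, emb_dim=150):
        super().__init__()
        # Two feed-forward layers compress the fastText embedding.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, emb_dim),
            nn.Tanh(),
        )
        # Two feed-forward layers reconstruct the original embedding.
        self.decoder = nn.Sequential(
            nn.Linear(emb_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)       # tuple embedding used for blocking
        return self.decoder(z)    # reconstruction used for the training loss
```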
The configuration is:

```python
FASTTEXT_EMBEDDIG_PATH = "embedding/wiki.en.bin"
# Dimension of the word embeddings.
EMB_DIMENSION_SIZE = 300
# Embedding size of the AutoEncoder embedding.
AE_EMB_DIMENSION_SIZE = 150
NUM_EPOCHS = 100
BATCH_SIZE = 256
RANDOM_SEED = 1234
LEARNING_RATE = 1e-3
K = 50
```
The aggregator used was SIF.
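For reproducibility, here is a minimal sketch of how I invoke the blocking pipeline. It assumes the class names exposed by the repo's main.py (DeepBlocker, AutoEncoderTupleEmbedding, ExactTopKVectorPairing); the file paths and the exact column list are placeholders for my local copy of the dataset.

```python
# Minimal sketch, assuming the DeepBlocker repo's API; paths and the
# exact column list are placeholders (see question 1 below).
import pandas as pd
from deep_blocker import DeepBlocker
from tuple_embedding_models import AutoEncoderTupleEmbedding
from vector_pairing_models import ExactTopKVectorPairing

left_df = pd.read_csv("data/Amazon-Google/tableA.csv")
right_df = pd.read_csv("data/Amazon-Google/tableB.csv")
cols_to_block = ["title", "manufacturer", "price"]  # with or without "description"?

tuple_embedding_model = AutoEncoderTupleEmbedding()
vector_pairing_model = ExactTopKVectorPairing(K=50)
db = DeepBlocker(tuple_embedding_model, vector_pairing_model)
candidate_set_df = db.block_datasets(left_df, right_df, cols_to_block)
```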
So I don't know how to reproduce the 97.1% recall on the structured Amazon-Google dataset.
In conclusion, I have a few questions:
1. Does the structured Amazon-Google experiment use the raw unprocessed dataset (4 attributes: title, description, manufacturer, price) or the processed dataset (3 attributes: title, manufacturer, price) from https://github.com/saravanan-thirumuruganathan/DeepBlocker? I think the structured dataset should not have the description attribute, but in that case the structured Amazon-Google dataset would only have 3 attributes, while Table 4 in Section 5 of the paper lists 4.
2. In the AutoEncoder model, is the encoder a two-layer feed-forward NN with dimensions 300-300-150 and the decoder a two-layer feed-forward NN with dimensions 150-300-300 (as in the sketch above)?
3. Is there anything else that needs to be changed in the code to achieve the 97.1% recall on the structured Amazon-Google dataset? (The recall computation I use is sketched below.)
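For reference, this is how I compute the 94.4% figure, a sketch of the standard blocking-recall definition. The column names ltable_id / rtable_id follow the deepmatcher dataset format and are an assumption on my part; the repo's own blocking utilities should compute the equivalent.

```python
# Sketch of the recall I report: the fraction of golden matches retained
# in the candidate set after blocking. Column names (ltable_id, rtable_id)
# are assumed to follow the deepmatcher format.
import pandas as pd

def blocking_recall(candidate_set_df: pd.DataFrame, golden_df: pd.DataFrame) -> float:
    candidates = set(zip(candidate_set_df["ltable_id"], candidate_set_df["rtable_id"]))
    golden = set(zip(golden_df["ltable_id"], golden_df["rtable_id"]))
    return len(candidates & golden) / len(golden)
```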