
How are scores calculated? Also, last names with higher scores are ignored in the presence of first names. #26

Open
Otpid opened this issue Jul 5, 2022 · 1 comment


Otpid commented Jul 5, 2022

Hi, thanks for the awesome dataset. I actually have 3 questions:

  1. How are the scores calculated? I read the older issue about this, but it wasn't really answered there. If possible, could you disclose the formula you used to calculate the score of a name? Is the score normalized? If a name has a score of 0.35 for country A and 0.35 for country B, what does that indicate? If a name has a score of 0.35 as a first name for country A and 0.35 as a surname for the same country A, what does that mean?

  2. Is the search_first_name method no longer available?

  3. It seems last names are ignored when deciding the country association. For the name "anderson", the result indicates it's a male name from Brazil, even though "anderson" has a higher max last name score (0.59 for USA) than its max first name score (0.42 for Brazil).

philipperemy (Owner) commented Jul 20, 2022

@Otpid thanks!

  1. It's a bit involved. I'd like to give more explanation but I need to dig into it again, which I will do later. The code is here:
    https://github.com/philipperemy/name-dataset/blob/master/generation/filter_records.py

  2. search_first_name was removed. Now there's only search and it retrieves both first and last names.

    def search(self, name: str):
        ...  # body elided; returns both first-name and last-name matches

  3. Yes, only the first_name has been used in deciding the country association (when using .describe). The last name is discarded. We can update the behavior because it might be confusing.
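To make point 3 concrete, here is a minimal sketch of the difference between using only the first-name scores and also weighing the last-name scores. It works on a plain dictionary, not on the library itself: the dictionary layout and the helper names `top_country` and `best_association` are assumptions made for illustration, and only the two scores (0.42 for Brazil, 0.59 for USA) come from this issue. It also assumes first-name and last-name scores are directly comparable, which has not been confirmed here.

```python
from typing import Optional, Tuple

# Hypothetical lookup result for "anderson". The layout is an assumption;
# only the two scores are taken from this issue.
anderson = {
    "first_name": {"country": {"Brazil": 0.42}},
    "last_name": {"country": {"USA": 0.59}},
}


def top_country(result: dict, role: str) -> Optional[Tuple[str, float]]:
    """Return (country, score) with the highest score for one role, or None."""
    entry = result.get(role) or {}
    scores = entry.get("country") or {}
    if not scores:
        return None
    country = max(scores, key=scores.get)
    return country, scores[country]


def best_association(result: dict) -> Optional[Tuple[str, str, float]]:
    """Return (role, country, score) with the highest score across both roles."""
    candidates = []
    for role in ("first_name", "last_name"):
        top = top_country(result, role)
        if top is not None:
            candidates.append((role, top[0], top[1]))
    if not candidates:
        return None
    # Assumes scores of different roles can be compared on the same scale.
    return max(candidates, key=lambda c: c[2])


# First-name-only association, as described in point 3.
print(top_country(anderson, "first_name"))
# Alternative that also considers the last-name scores.
print(best_association(anderson))
```

With the numbers quoted above, the first call picks Brazil and the second picks USA via the last-name role, which is the discrepancy the question describes.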
