Mixmosa was a dating service that organized 2,961 in-person dates. For each date, it observed:
- the outcomes (e.g. did both people like each other?)
- participant attributes (e.g. physical attractiveness, Big 5 Personality, intelligence, income, drug use, height, values, etc.)
I am sharing the (anonymized) dataset here because it may be of value to people interested in evolutionary psychology, machine learning, mate choice, dating & relationships, etc.
- Here is an example of some analysis done using this dataset.
- Is it possible to predict the outcomes of first dates in advance?
The dataset is available in two formats:
- CSV files
- Postgres SQL
There are two tables I'd recommend you start with:
user_features
describes the participants: age, gender, sexual orientation, IQ, personality, physical attractiveness, etc.interactions
describes the outcomes of each date: did the male like the female? did the female like the male?
Data was collected by running in-person speed dating events. Each date lasted 8 minutes.
-
users
: people that created an account in the Mixmosa app -
events
: speed dating event metadata (note that age ranges were not strictly enforced) -
attendance
: which users went to which events -
event_feedback
: attendees were asked to rate their experience after each event on a 5-star scale -
questions
: the set of questions users were required to complete before attending an event -
responses
: users' answers toquestions
-
swiping
: speed dating attendees were asked who they liked and disliked after 8-minute-long dates -
picture_ratings
: users photos were scored on physical attractiveness by the opposite sex on a 1 - 10 scale, with 1 being least attractive, 5 being average, and 10 being most attractive. Independent raters were used (Mixmosa users did not rate other Mixmosa users). -
user_features
: derived table that aggregates all user attributes (e.g. the user's average physical attractiveness score frompicture_ratings
, their IQ score given theirresponses
to intelligencequestions
, their neuroticism score given theirresponses
to Big 5questions
). See below for definitions for these features. You can also generate your own features from the raw data. -
interactions
: each row represents a date. derived fromswiping
. contains date outcomes (e.g. did the man like the woman? did the woman like the man?).
Self Explanatory
age
gender
attracted_to
height_inches
Intelligence
iq_share_correct
: share of intelligence questions answered correctly by user
Big 5 Personality Traits:
- sum of individual question scores, with each question ranging from 1 to 3. Note that I decided to score "strongly agree" and "agree", and "strongly disagree" and "disagree" the same, which is why this is on a 3 point scale. If you'd like to take these differences into account and move to a 5 point scale, you can regenerate these features yourself. See code below.
- Specific features:
agreeableness
conscientiousness
extraversion
neuroticism
openness
def get_personality_by_user(self):
query = """
select
user_id,
q.id as question_id,
category,
lower(misc->>'big_5') as big_5_category,
subcategory,
misc->>'key' as key,
prompt,
responses.id as responses_id,
responses.text_response
from responses
inner join questions q on responses.question_id = q.id and q.category = 'Personality'
inner join users u on responses.user_id = u.id;
"""
personality_responses = pd.read_sql(query, self.engine)
def score_personality_response(row):
# TODO currently not differentiating between "strongly {agree, disagree}", and just {agree, disagree}
response = row["text_response"]
if row["key"] == "1":
if response in ["Strongly Disagree", "Disagree"]:
return 1
elif response == "Neutral":
return 2
elif response in ["Strongly Agree", "Agree"]:
return 3
else:
return None
elif row["key"] == "-1":
if response in ["Strongly Disagree", "Disagree"]:
return 3
elif response == "Neutral":
return 2
elif response in ["Strongly Agree", "Agree"]:
return 1
else:
return None
else:
return None
personality_responses["score"] = personality_responses.apply(
score_personality_response, axis=1
)
personality_users = personality_responses.groupby(
"user_id"
).responses_id.count()
# currently 65 personality questions. filter for where users have answered all personality questions.
users_completing_all_personality_questions = list(
personality_users[personality_users >= 65].index
)
filtered_personality_responses = personality_responses[
personality_responses.user_id.isin(
users_completing_all_personality_questions
)
]
return pd.pivot_table(
filtered_personality_responses,
values="score",
index=["user_id"],
columns=["big_5_category"],
aggfunc=np.sum,
)
Physical Attractiveness
avg_physical_attractiveness_rating
: user's average physical attractiveness score frompicture_ratings
(1 - 10 scale)
Income:
income_bucket
def score_income(income_bucket: str):
if income_bucket == "Under $15,000":
return 1
elif income_bucket == "$15,000 - $24,999":
return 2
elif income_bucket == "$25,000 - $34,999":
return 3
elif income_bucket == "$35,000 - $49,999":
return 4
elif income_bucket == "$50,000 - $74,999":
return 5
elif income_bucket == "$75,000 - $99,999":
return 6
elif income_bucket == "$100,000 - $149,999":
return 7
elif income_bucket == "$150,000 - $199,999":
return 8
elif income_bucket == "$200,000 and over":
return 9
else:
return None
Drug Use
nicotine_score
:
def score_nicotine(text_response):
if text_response == "Never":
return 0
elif text_response in ["Daily", "Weekly", "Monthly"]:
return 1
else:
return None
alcohol_score
def score_alcohol(text_response):
if text_response == "Never":
return 0
elif text_response == "Monthly":
return 1
elif text_response == "Weekly":
return 2
elif text_response == "Daily":
return 3
else:
return None
marijuana_score
def score_marijuana(text_response):
if text_response == "Never":
return 0
elif text_response == "Monthly":
return 1
elif text_response == "Weekly":
return 2
elif text_response == "Daily":
return 3
else:
return None
has_used_psychedelics
psychedelics["has_used_psychedelics"] = psychedelics.text_response.apply(
lambda r: 1 if r == "Yes" else 0
)
Health
behavioral_health_score
: how strongly does the person agree with the following: i eat healthy, exercise, and am not overweight?
def score_health(text_response):
if text_response == "Strongly Disagree":
return 0
elif text_response == "Disagree":
return 1
elif text_response == "Neutral":
return 2
elif text_response == "Agree":
return 3
elif text_response == "Strongly Agree":
return 4
else:
return None
Sex
sex_partners_score
def score_sex(text_response):
if text_response == "0":
return 0
elif text_response == "1":
return 1
elif text_response == "2-4":
return 2
elif text_response == "5-9":
return 3
elif text_response == "10+":
return 4
else:
return None
casual_sex_score
: this is question 116, i.e.: "How strongly do you agree with the following: I do NOT want to have sex with a person until I am sure that we will have a long-term, serious relationship"
def score_casual_sex(text_response):
if text_response == "Strongly Disagree":
return 4
elif text_response == "Disagree":
return 3
elif text_response == "Neutral":
return 2
elif text_response == "Agree":
return 1
elif text_response == "Strongly Agree":
return 0
else:
return None
Kids
num_children_wanted_score
def score_num_children_wanted(text_response):
if text_response == "0":
return 0
elif text_response == "1":
return 1
elif text_response == "2-3":
return 2
elif text_response == "4+":
return 3
else:
return None
already_has_kids
has_kids["already_has_kids"] = has_kids.text_response.apply(
lambda r: 1 if r == "Yes" else 0
)
Importance of {ethnicity, religion, politics} in a Romantic Partner: Users were asked: How important is it that your romantic partner shares your {ethnicity, religion, political beliefs}?
ethnicity_importance_score
religion_importance_score
politics_importance_score
def score_importance(text_response):
if text_response == "Not Important":
return 0
elif text_response == "Somewhat Important":
return 1
elif text_response == "Very Important":
return 2
else:
return None
Misc
political_tolerance_score
: this is question 179, i.e.: To what degree does the following phrase describe you? Comfortable being friends with someone that disagrees with me on important political topics.
def score_political_tolerance(text_response):
if text_response == "Strongly Disagree":
return 0
elif text_response == "Disagree":
return 1
elif text_response == "Neutral":
return 2
elif text_response == "Agree":
return 3
elif text_response == "Strongly Agree":
return 4
else:
return None
get_along_well_with_family_score
def score_family(text_response):
if text_response in ["Strongly Disagree", "Disagree", "Neutral"]:
return 0
elif text_response in ["Agree", "Strongly Agree"]:
return 1
else:
return None
hard_work_and_success_belief_score
def score_does_hard_work_lead_to_success(text_response):
if text_response == "Strongly Disagree":
return 0
elif text_response == "Disagree":
return 1
elif text_response == "Neutral":
return 2
elif text_response == "Agree":
return 3
elif text_response == "Strongly Agree":
return 4
else:
return None
I'd be happy to answer any questions: first dot last on gmail.