Column.get(map=) now returns dtype of labels #402

Merged: 10 commits, Nov 2, 2023

@hagenw hagenw commented Oct 31, 2023

Closes #400

As discussed in #400 (comment), audformat.Column.get() with the map argument should return the original dtype of the mapped labels. If the labels come from a misc table and a scheme is assigned to the corresponding column, the dtype of that scheme is used; otherwise the dtype is inferred from the actual labels.

The example from #400

import audformat
import numpy as np
import pandas as pd

db = audformat.Database('db')
db.schemes['age'] = audformat.Scheme('int')

db['speaker'] = audformat.MiscTable(pd.Index(['s1', 's2'], dtype='string', name='speaker'))
db['speaker']['age'] = audformat.Column(scheme_id='age')
db['speaker']['age'].set([2, np.nan])

db.schemes['speaker'] = audformat.Scheme('str', labels='speaker')

db['files'] = audformat.Table(audformat.filewise_index(['f1', 'f2']))
db['files']['speaker'] = audformat.Column(scheme_id='speaker')
db['files']['speaker'].set(['s1', 's2'])

db['files']['speaker'].get(map='age')

now returns

file
f1       2
f2    <NA>
Name: age, dtype: Int64
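
For context, here is a pandas-only sketch of why the nullable Int64 dtype matters for this example. It does not use audformat; the dictionary and variable names are made up for illustration and only mimic the speaker-to-age mapping above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the labels stored in the misc table: speaker -> age.
labels = {"s1": 2, "s2": np.nan}
speakers = pd.Series(
    ["s1", "s2"],
    index=pd.Index(["f1", "f2"], name="file"),
)

# A plain mapping falls back to float64, because NaN cannot be stored in int64.
mapped = speakers.map(labels)

# Casting to the nullable extension dtype keeps integers
# and represents the missing value as <NA>.
mapped_int = mapped.astype("Int64")

print(mapped.dtype)
print(mapped_int.dtype)
```

Without the cast, the age 2 would be shown as 2.0, which no longer matches the 'int' scheme.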

And the example from #317

db = audformat.Database('db')
db.schemes['text'] = audformat.Scheme(dtype='str')
db['misc'] = audformat.MiscTable(
    pd.Index(
        ['a', 'b', 'c'],
        name='speaker',
        dtype='string',
    )
)
db['misc']['text'] = audformat.Column(scheme_id='text')
db['misc']['text'].set(['A', 'B', 'C'])

db.schemes['scheme'] = audformat.Scheme('str', labels='misc')
db['table'] = audformat.MiscTable(audformat.filewise_index(['f1', 'f2', 'f3']))
db['table']['column'] = audformat.Column(scheme_id='scheme')
db['table']['column'].set(['a', 'b', 'c'])

db['table']['column'].get(map='text')

now returns

file
f1    A
f2    B
f3    C
Name: text, dtype: string

To be able to use audformat.core.common.to_audformat_dtype() here, I extended it to also handle the strings returned by pd.api.types.infer_dtype(), which is used to infer the dtype from a list of values.
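
A rough sketch of the idea, not the actual code in audformat/core/common.py: pd.api.types.infer_dtype() returns a string such as 'integer' or 'string', which then has to be translated into an audformat dtype name. The lookup table and the function infer_audformat_dtype() below are hypothetical, and the audformat dtype names on the right-hand side are assumed.

```python
import pandas as pd

# Hypothetical lookup table, for illustration only:
# result of pd.api.types.infer_dtype() -> assumed audformat dtype name.
INFERRED_TO_AUDFORMAT = {
    "boolean": "bool",
    "datetime": "date",
    "datetime64": "date",
    "floating": "float",
    "integer": "int",
    "string": "str",
    "timedelta": "time",
    "timedelta64": "time",
}


def infer_audformat_dtype(values):
    """Infer a dtype name from a list of labels (hypothetical helper)."""
    inferred = pd.api.types.infer_dtype(values)
    # Anything not covered by the table is treated as a generic object.
    return INFERRED_TO_AUDFORMAT.get(inferred, "object")


print(infer_audformat_dtype(["A", "B", "C"]))
print(infer_audformat_dtype([1, 2, 3]))
print(infer_audformat_dtype([1.0, 2.5]))
print(infer_audformat_dtype([True, False]))
```

Mixed or otherwise unrecognized labels fall back to 'object' in this sketch; how the real function treats them is not shown in this pull request description.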

@hagenw hagenw marked this pull request as draft October 31, 2023 08:55

codecov bot commented Oct 31, 2023

Codecov Report

Merging #402 (89c59cc) into main (77f69bf) will not change coverage.
The diff coverage is 100.0%.

Files Coverage Δ
audformat/core/column.py 100.0% <100.0%> (ø)
audformat/core/common.py 100.0% <100.0%> (ø)

@hagenw hagenw marked this pull request as ready for review October 31, 2023 09:27
@hagenw hagenw requested a review from frankenjoe October 31, 2023 09:27
@hagenw hagenw marked this pull request as draft October 31, 2023 10:11

hagenw commented Oct 31, 2023

Handling of datetime and timedelta is still missing.

@hagenw hagenw marked this pull request as ready for review October 31, 2023 10:52

hagenw commented Oct 31, 2023

Now, all dtypes we support in audformat are covered.

@frankenjoe frankenjoe merged commit 5152318 into main Nov 2, 2023
10 checks passed
@frankenjoe frankenjoe deleted the fix-map-dtype branch November 2, 2023 11:03
Successfully merging this pull request may close these issues.

get() with mapping can return wrong categorical data type with newer pandas versions