Currently, if a user creates a pandas DataFrame and passes it into Woodwork, certain dtypes have already been inferred by pandas, which makes Woodwork's inference significantly easier. However, there may be cases where all incoming data is text and every column has a dtype of string.
For a dataframe initialized like this:
import numpy as np
import pandas as pd

df = pd.DataFrame()
df["ints"] = list(range(100))
df["floats"] = [i * 1.1 for i in range(100)]
df["bools"] = [True, False, False, True, False] * 20
df["bools_nan"] = [True, False, False, True, pd.NA] * 20
df["strings"] = [f"{i}" for i in range(100)]
df["categoricals"] = np.random.choice(["Yellow", "Blue", "Red"], 100)
Subsequent Woodwork initialization yields the expected result:
But converting every column to the string dtype prior to Woodwork initialization
for col in df.columns:
    df[col] = df[col].astype("string")
Yields this:
This spike covers investigating what solutions exist for this, and how and in what order it should be tackled (one logical type at a time, or with a single approach that handles all types at once).
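As a starting point for discussion, here is a minimal sketch of the "one logical type at a time" option: try progressively looser parsers on the text values and report the first one that accepts every non-missing value. This is not Woodwork's implementation. The function name `guess_type_from_strings`, the returned labels, and the `categorical_threshold` default are all hypothetical, and missing values are assumed to arrive as `None`.

```python
# Hypothetical sketch: guess a coarse type label for a column of text values.
# Names, labels, and the threshold are illustrative assumptions, not Woodwork API.

BOOL_TOKENS = {"true", "false"}


def _all_parse(values, parser):
    """Return True if `parser` accepts every value without raising ValueError."""
    for value in values:
        try:
            parser(value)
        except ValueError:
            return False
    return True


def guess_type_from_strings(values, categorical_threshold=0.2):
    """Check candidate types from strictest to loosest and return the first match."""
    # Drop missing entries (assumed to be None) and surrounding whitespace.
    present = [v.strip() for v in values if v is not None]
    if not present:
        return "unknown"
    if all(v.lower() in BOOL_TOKENS for v in present):
        return "boolean"
    if _all_parse(present, int):
        return "integer"
    if _all_parse(present, float):
        return "double"
    # Few distinct values relative to the column length suggests categories.
    if len(set(present)) / len(present) <= categorical_threshold:
        return "categorical"
    return "unknown"


if __name__ == "__main__":
    columns = {
        "ints": [str(i) for i in range(100)],
        "floats": [str(i * 1.1) for i in range(100)],
        "bools_nan": ["True", "False", "False", "True", None] * 20,
        "categoricals": ["Yellow", "Blue", "Red", "Blue"] * 25,
    }
    for name, values in columns.items():
        print(name, guess_type_from_strings(values))
```

One limitation is visible immediately: a column like strings above, which holds digit-only text, is reported as integer, because once everything is text it cannot be told apart from ints. Any solution would need to decide whether that outcome is acceptable. With a pandas column, something like `df[col].dropna()` could be passed in place of the plain lists used here.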