Releases: mfarragher/appelpy
Discovery
- `model_selection_stats` attribute for models now has keys in snake case format.
- Functions for test statistics, e.g. the heteroskedasticity test, now return a dictionary instead of a tuple so that the output is more explicit. More information is now returned in the output (e.g. degrees of freedom and distribution). Keys will be consistent across test functions.
- `rvp_plot` and `rvf_plot` now function as intended.
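To illustrate why a dictionary return is more explicit than a tuple, here is a minimal sketch. The function names, the key names (`test_stat`, `p_value`, `distribution`, `dof`) and the numbers are hypothetical placeholders, not appelpy's actual functions, keys or results.

```python
# Hypothetical stand-ins: neither the functions nor the key names come from appelpy.
# The numeric values are placeholders chosen only for illustration.

def some_test_as_tuple():
    # The caller must remember that position 0 is the statistic
    # and position 1 is the p-value.
    return (3.84, 0.05)


def some_test_as_dict():
    # Each value is labelled, and extra context can be added later
    # without breaking callers that unpack by position.
    return {
        "test_stat": 3.84,
        "p_value": 0.05,
        "distribution": "chi2",
        "dof": 1,
    }


stat, p_value = some_test_as_tuple()
result = some_test_as_dict()
print(result["distribution"], result["dof"])
```

With the dictionary form, code that reads `result["p_value"]` keeps working even if more entries are added to the output later.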
New features:
- Wald test function for joint hypothesis testing with native Python data structures
- Residual vs predictor plot (the rvp_plot) implemented properly
- Support for covariance keyword arguments (e.g. specify group column for cluster standard errors).
- `X_list` attribute for model objects
Campanino
- Fix for one-regressor models: more robust predict method for linear models and Logit; X attribute for Logit is a dataframe
- Note: the predict method for models requires a NumPy array of shape (# examples, # regressors)
- Improve test coverage
Campanino
- Fluent interface for classes: OLS, WLS, Logit & BadApples objects now require a `fit` call in order to do calculations. DummyEncoder and InteractionEncoder objects now have a `transform` method for returning dataframes with encoded columns, instead of the encode method (the dictionary parameters in the old encode method now sit in the object initialization).

For example:
- OLS models are now set up via the `model = OLS(df, y_list, X_list).fit()` pattern.
- DummyEncoder output dataframes can be set up via the `df_transformed = DummyEncoder(df, categorical_col_base_levels).transform()` pattern.
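The fluent pattern above relies on `fit` returning the object itself. The following is a minimal sketch of that idea using a toy class; `MeanModel` and its attributes are hypothetical and are not part of appelpy.

```python
class MeanModel:
    """Toy model (hypothetical, not part of appelpy): predicts the mean of y."""

    def __init__(self, y_values):
        # Construction only stores the inputs; nothing is computed yet.
        self.y_values = list(y_values)
        self.is_fitted = False
        self.mean_ = None

    def fit(self):
        # Calculations happen here; returning self allows MeanModel(...).fit().
        self.mean_ = sum(self.y_values) / len(self.y_values)
        self.is_fitted = True
        return self


model = MeanModel([1.0, 2.0, 6.0]).fit()
print(model.mean_)  # 3.0
```

Separating construction from fitting means an object can be created and inspected before any calculation is run.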
New features:
- `get_dataframe_columns_diff` utils function for returning the diff between two dataframes' columns. The columns_added and columns_removed attributes have been removed from encoder objects, as this is a more general way of comparing dataframes during pre-processing.
- Partial regression plot function handles the case where the regressor is already in the dataframe.
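The idea of a column diff can be sketched with plain lists of column names. These notes do not show the signature or return format of `get_dataframe_columns_diff`, so `columns_diff` below is a hypothetical stand-in rather than the library function; with pandas, the inputs would be something like `df.columns`.

```python
def columns_diff(columns_before, columns_after):
    """Hypothetical helper: compare two sequences of column names."""
    before, after = set(columns_before), set(columns_after)
    return {
        # Preserve the order in which columns appear in each input.
        "added": [c for c in columns_after if c not in before],
        "removed": [c for c in columns_before if c not in after],
    }


raw_cols = ["price", "region"]
encoded_cols = ["price", "region_north", "region_south"]
diff = columns_diff(raw_cols, encoded_cols)
print(diff)
# {'added': ['region_north', 'region_south'], 'removed': ['region']}
```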
Eve
- Fix studentized residuals (now `resid_studentized`) and make available only for OLS
- Make weight attributes consistent for OLS and WLS
- Fix types for plot functions so that they return Figure instances
- Fix calculation of variance in `statistical_moments` function
Bonus feature: `breusch_pagan_studentized` option for heteroskedasticity test
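For context, the studentized (Koenker) form of the Breusch-Pagan statistic is usually computed as n times the R² from regressing the squared OLS residuals on the regressors. The sketch below is a textbook version for a single regressor in pure Python. It is not appelpy's implementation, and the helper names and data are made up for illustration.

```python
def ols_simple(x, y):
    """Fit y = a + b*x by least squares and return (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def r_squared(x, y):
    """R^2 of the simple regression of y on x (with intercept)."""
    a, b = ols_simple(x, y)
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def bp_studentized_stat(x, y):
    """Koenker's statistic: n * R^2 from regressing squared residuals on x."""
    a, b = ols_simple(x, y)
    resid_sq = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, resid_sq)


# Made-up illustrative data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.7, 5.5, 5.2, 8.0, 6.5]
lm = bp_studentized_stat(x, y)
print(lm)
```

Under the null hypothesis of homoskedasticity the statistic is compared against a chi-squared distribution with one degree of freedom per regressor (one here).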
Eve
- Attributes of model objects that end in "_model" no longer have that suffix, e.g. "X_model" becomes "X", "resid_model" becomes "resid". If model objects are given the "model" name this makes text more parsimonious: model.resid is more pleasing than model.resid_model.
- Observations with NaN values are no longer dropped before modelling. Errors are now raised where the model dataset has any of these cases: NaN values; +inf or -inf values; string data; Pandas Category dtype.
- Fix: Jinja2 now a clear requirement (has been used for pd.Styler in standardized estimates)
- API now supports Python 3.6 or higher. Updates to dependencies.
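The stricter input checks described above (raising errors instead of silently dropping rows) can be sketched as follows. `validate_values` is a hypothetical function and `ValueError` is an assumed exception type; neither is taken from appelpy. The Pandas Category check is left out because the sketch uses only the standard library.

```python
import math


def validate_values(values):
    """Hypothetical check: raise ValueError on strings, NaN or infinities."""
    for value in values:
        if isinstance(value, str):
            raise ValueError(f"string data found: {value!r}")
        if math.isnan(value):
            raise ValueError("NaN value found")
        if math.isinf(value):
            raise ValueError("+inf or -inf value found")
    return True


validate_values([1.0, 2.5, 3.0])  # clean data passes

try:
    validate_values([1.0, float("nan")])
except ValueError as err:
    print(err)  # NaN value found
```

Failing loudly makes the handling of missing data an explicit pre-processing decision rather than a side effect of model fitting.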
New features:
- Partial regression plot
- Print statements for model fitting are now an optional parameter
Adam
Minor updates
- Add tests for features (replicate results from easily accessible datasets)
- significant_regressors fix
Adam
Now due for proper versioning to reflect evolution of features, this release has enhanced regression diagnostics.
Main additions:
- `BadApples` class in the `diagnostics` module takes a model object and calculates measures of influence, leverage and outliers. It includes a method for a leverage vs residuals squared plot.
- Heteroskedasticity test in the `diagnostics` module: supports Breusch-Pagan and White tests.
- Models support having an index for X that is beyond the typical RangeIndex.
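As background on the leverage measure mentioned above, the sketch below computes leverage for a simple regression with an intercept, using the standard formula h_i = 1/n + (x_i - mean)² / Sxx. It is a standalone illustration with made-up data, not the `BadApples` implementation, and `leverage_simple` is a hypothetical helper.

```python
def leverage_simple(x):
    """Leverage h_i = 1/n + (x_i - mean)^2 / Sxx for the model y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Made-up data: the last observation sits far from the others.
x = [1.0, 2.0, 3.0, 4.0, 20.0]
h = leverage_simple(x)

# The leverages sum to the number of fitted parameters (2 here), and the
# observation far from the mean (x = 20) has by far the largest value.
print([round(hi, 3) for hi in h])
```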
Eden
New module (`discrete_model`) with class for logistic regression `Logit`.
Features for Logit include: standardized estimation (via Long's method); odds ratios; model selection stats; prediction.
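For readers less familiar with odds ratios: in a logistic regression they are obtained by exponentiating the fitted coefficients. The sketch below shows only that arithmetic; the coefficient names and values are hypothetical and the code does not use appelpy.

```python
import math

# Hypothetical fitted log-odds coefficients, for illustration only.
coefficients = {"age": 0.0, "smoker": math.log(2.0)}

# An odds ratio is exp(coefficient): the multiplicative change in the odds
# for a one-unit increase in the regressor, holding the others fixed.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
print(odds_ratios)
```

A coefficient of zero gives an odds ratio of 1 (no change in the odds), while a coefficient of log(2) doubles the odds.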
Eden
New module (utils) with classes to encode columns of data:
- `DummyEncoder`: create dummy columns from categorical columns. Deals with NaN data in three different ways.
- `InteractionEncoder`: create interaction effects between two columns. Deals with many scenarios for interactions between Boolean, categorical and continuous variables.
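The two kinds of encoding can be sketched in plain Python. The functions below are hypothetical illustrations of the general techniques (dummy columns with a dropped base level, and an interaction as an element-wise product). They do not reproduce the encoders' API or column naming, and they leave out NaN handling.

```python
def dummy_encode(values, base_level):
    """Hypothetical sketch: one 0/1 column per level except the base level."""
    levels = sorted(set(values) - {base_level})
    return {f"x_{level}": [int(v == level) for v in values] for level in levels}


def interact(col_a, col_b):
    """Interaction effect as the element-wise product of two columns."""
    return [a * b for a, b in zip(col_a, col_b)]


region = ["north", "south", "east", "south"]
dummies = dummy_encode(region, base_level="north")

income = [10.0, 20.0, 30.0, 40.0]
print(interact(income, dummies["x_south"]))  # [0.0, 20.0, 0.0, 40.0]
```

Dropping the base level avoids perfect collinearity with the intercept when the dummy columns are used as regressors.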
Eden
Main features:
- OLS and WLS models (results, predict, model selection, etc.)
- Initial regression diagnostics and EDA functionality