This repository contains the code and PDF output for Project 2 of Practical Data Analysis, Fall 2023, Biostatistics Department, Brown University. Instructor: Alice Paul; Author: Yu Yan.
-
Objective: Develop regression models to predict tracheostomy placement or mortality in patients with severe bronchopulmonary dysplasia (sBPD). Data Source: the BPD Collaborative Registry, which includes demographic, diagnostic, and respiratory parameters.
-
Methodology:
- Data assessment for quality and completeness.
- Exploratory data analysis with transformations, variable selection, and outlier detection.
- Construction of regression models.
-
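The data assessment step listed in the methodology can be sketched as a per-variable missingness summary. This is a minimal illustration, not the repository's code: the column names (`center`, `birth_weight`, `vent_support_36wk`, `vent_support_44wk`) and the four records are hypothetical and invented for the example.

```python
def missing_fraction(records):
    """Return {column: fraction of records whose value is None or absent}."""
    columns = sorted({key for row in records for key in row})
    n = len(records)
    return {
        col: sum(row.get(col) is None for row in records) / n
        for col in columns
    }

# Hypothetical toy records; None marks a missing measurement.
records = [
    {"center": 1, "birth_weight": 750, "vent_support_36wk": 1, "vent_support_44wk": None},
    {"center": 1, "birth_weight": None, "vent_support_36wk": 0, "vent_support_44wk": 0},
    {"center": 2, "birth_weight": 820, "vent_support_36wk": 1, "vent_support_44wk": None},
    {"center": 2, "birth_weight": 690, "vent_support_36wk": 1, "vent_support_44wk": 1},
]

summary = missing_fraction(records)
for column, fraction in summary.items():
    print(f"{column}: {fraction:.0%} missing")
```

A summary like this makes it easy to see which variables would need imputation or exclusion before modeling.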
Models Presented: Lasso and Best Subset regression models.
-
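To show how a lasso penalty produces a sparse model for a binary outcome, here is a minimal sketch of L1-penalized logistic regression fitted by proximal gradient descent. It is not the repository's implementation: the data are synthetic, the function `fit_lasso_logistic` and its parameters (`lam`, `step`, `n_iter`) are hypothetical names chosen for the example, and the features are assumed to be already standardized.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrink toward zero, clip at zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_logistic(X, y, lam, step=0.5, n_iter=300):
    """Minimize mean logistic loss + lam * sum(|beta_j|); intercept unpenalized."""
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed outcome.
        resid = [
            sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        intercept -= step * sum(resid) / n
        # Gradient step followed by soft-thresholding.
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return intercept, beta

# Synthetic data (assumption): only the first two of five features matter.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(2.0 * row[0] - 1.5 * row[1]) else 0 for row in X]

_, beta_light = fit_lasso_logistic(X, y, lam=0.02)
_, beta_heavy = fit_lasso_logistic(X, y, lam=10.0)
print("light penalty:", [round(b, 3) for b in beta_light])
print("heavy penalty:", [round(b, 3) for b in beta_heavy])
```

Increasing `lam` drives more coefficients exactly to zero, which is the sparsity versus predictive power trade-off a penalized model has to balance. Best subset selection instead searches over subsets of predictors directly and is not shown here.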
Performance: Both models show strong performance in various metrics.
-
Limitations & Future Directions:
- Explore additional machine learning methods (e.g., random forests).
- Consider multilevel mixed-effect models for center-related effects.
- Address the challenge of missing 44-week data.
- Evaluate model sparsity vs. predictive power trade-offs for practical clinical use.