This project uses Apache Spark to process the data and trains Linear Regression and Random Forest models to estimate the fare and duration of taxi trips. You can examine the data dictionary of the dataset here.
The table below shows the schema of the DataFrame used in this project: