NYC Taxi Fare Prediction

Technologies Used
Python | Pandas | Seaborn | Matplotlib | sklearn | ExtraTreesRegressor | XGBoost Regressor
Description

Objective: Predict New York City taxi fare amounts from trip-related factors using machine learning.

Input Features:
• Pickup and drop-off locations
• Distance traveled
• Time of day (hourly granularity)
• Additional engineered features

Data Preprocessing:
• Data cleaning: handled missing values and applied feature engineering to improve predictive power.

Machine Learning Models:
• Model 1: ExtraTreesRegressor
  • Trained a sklearn.ensemble.ExtraTreesRegressor on the preprocessed dataset to predict taxi fares.
• Model 2: XGBoost Regressor
  • Trained an XGBoost regressor and evaluated it with the following metrics:
    • R² (coefficient of determination): 0.9399
    • Mean Absolute Error (MAE): 1.4400
    • Mean Squared Error (MSE): 6.6736
    • Root Mean Squared Error (RMSE): 2.5833

Hyperparameter Tuning:
• Used RandomizedSearchCV to search hyperparameters for the XGBoost regressor.
• Evaluated the tuned model with the same metrics:
  • R²: 0.9270
  • MAE: 1.5565
  • MSE: 8.0516
  • RMSE: 2.8375

Graphical Analysis:
• Plotted predicted against actual fares with matplotlib to assess model accuracy and reveal trends.

Conclusion:
• Achieved high prediction accuracy with both the ExtraTreesRegressor and XGBoost regressor models.
• The tuned XGBoost model scored slightly below the untuned model on every reported metric (R² 0.9270 vs. 0.9399, RMSE 2.8375 vs. 2.5833), so in this run the randomized search did not improve on the initial configuration.
• Discussed potential real-world applications, such as fare estimation tools for taxi passengers and drivers.
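The distance and time-of-day inputs listed under Input Features could be derived from raw trip records roughly as in the sketch below. It is a minimal illustration, not the project's actual pipeline: it assumes each trip is a dict with hypothetical keys (pickup_datetime, pickup_latitude, and so on), treats "handled missing values" as simply dropping incomplete records, and uses the haversine great-circle distance as a stand-in for distance traveled. The description does not say how distance was computed, and a straight-line distance will underestimate the real street route.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical raw field names (assumption); adjust to the dataset's real columns.
REQUIRED_FIELDS = (
    "pickup_datetime",
    "pickup_latitude",
    "pickup_longitude",
    "dropoff_latitude",
    "dropoff_longitude",
)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def drop_incomplete(trips):
    """Cleaning step (assumed): keep only trips where every required field is present."""
    return [t for t in trips if all(t.get(k) is not None for k in REQUIRED_FIELDS)]


def engineer_features(trip):
    """Map one raw trip dict to model inputs: locations, distance, hour of day."""
    pickup = datetime.strptime(trip["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
    return {
        "pickup_latitude": trip["pickup_latitude"],
        "pickup_longitude": trip["pickup_longitude"],
        "dropoff_latitude": trip["dropoff_latitude"],
        "dropoff_longitude": trip["dropoff_longitude"],
        "distance_km": haversine_km(
            trip["pickup_latitude"], trip["pickup_longitude"],
            trip["dropoff_latitude"], trip["dropoff_longitude"],
        ),
        "hour": pickup.hour,  # hourly granularity, 0-23
    }


# Usage with two made-up trips; the second has a missing pickup latitude.
trips = [
    {"pickup_datetime": "2013-07-01 18:45:00",
     "pickup_latitude": 40.7580, "pickup_longitude": -73.9855,
     "dropoff_latitude": 40.6413, "dropoff_longitude": -73.7781},
    {"pickup_datetime": "2013-07-01 19:10:00",
     "pickup_latitude": None, "pickup_longitude": -73.9855,
     "dropoff_latitude": 40.6413, "dropoff_longitude": -73.7781},
]
features = [engineer_features(t) for t in drop_incomplete(trips)]
# Only the first trip survives cleaning; its hour is 18 and distance_km is about 21.8.
```

Keeping the cleaning and feature steps as separate functions makes it easy to swap in a different distance measure or missing-value strategy without touching the rest.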
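The four evaluation metrics reported for the XGBoost models follow standard definitions, sketched below in plain Python for paired actual and predicted fares. In a scikit-learn workflow the same values would normally come from sklearn.metrics.r2_score, mean_absolute_error and mean_squared_error; the function name regression_metrics here is illustrative only.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MAE, MSE and RMSE for paired lists of actual and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    ss_res = sum(e * e for e in errors)        # residual sum of squares
    mse = ss_res / n                           # mean squared error
    rmse = math.sqrt(mse)                      # same units as the target (fare)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are identical")
    r2 = 1 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "mse": mse, "rmse": rmse}


# Usage on a toy example: errors are -1, +1 and 0.
actual = [5.0, 10.0, 15.0]
predicted = [6.0, 9.0, 15.0]
scores = regression_metrics(actual, predicted)
# MAE = MSE = 2/3, RMSE ≈ 0.8165, R² = 1 - 2/50 = 0.96
```

As a consistency check on the reported numbers, math.sqrt(6.6736) ≈ 2.5833 and math.sqrt(8.0516) ≈ 2.8375, so each reported RMSE matches its MSE.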