Early Detection of Heart Disease Using Machine Learning

  • Tools & Technologies: Python, Pandas, Scikit-Learn, XGBoost, Random Forest, Logistic Regression, Seaborn, Matplotlib
  • Role: Data Scientist
  • Report URL: Project Link

Description

Developed an end-to-end predictive modeling pipeline on the Cleveland Heart Disease dataset. Performed data profiling, missing-value treatment, outlier detection, correlation analysis, and feature engineering to improve model stability. Benchmarked Logistic Regression, Random Forest, XGBoost, and Lasso-regularized (L1) classifiers using k-fold cross-validation, hyperparameter tuning, and threshold optimization. Interpreted the models with SHAP values and feature importance analysis.
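
A minimal sketch of the cross-validated benchmarking and threshold-selection steps, not the project's actual code. It assumes scikit-learn only, uses a synthetic stand-in for the Cleveland data (303 rows and 13 features are assumed here to mirror the commonly used version of the dataset), leaves out XGBoost to avoid an extra dependency, and picks the threshold with Youden's J statistic, which is an assumed criterion rather than one stated in this write-up.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_predict,
    cross_val_score,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data (assumption): swap in the real feature matrix
# and binary target when running against the Cleveland dataset.
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=6, random_state=0
)

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Candidate models; hyperparameter values here are illustrative only.
models = {
    "lasso_logreg": make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Benchmark each model by mean and spread of ROC AUC across folds.
cv_auc = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    cv_auc[name] = (scores.mean(), scores.std())
    print(f"{name}: AUC {scores.mean():.3f} +/- {scores.std():.3f}")

# Threshold optimization on out-of-fold probabilities:
# choose the cut-off that maximizes Youden's J = TPR - FPR.
proba = cross_val_predict(
    models["lasso_logreg"], X, y, cv=cv, method="predict_proba"
)[:, 1]
fpr, tpr, thresholds = roc_curve(y, proba)
best_threshold = thresholds[np.argmax(tpr - fpr)]
print(f"selected threshold: {best_threshold:.3f}")
```

Comparing the fold-to-fold standard deviation alongside the mean is one way to judge the "stable performance across folds" criterion; in a screening context the threshold could instead be chosen to meet a minimum sensitivity.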

The project focused on understanding risk-driving features such as chest pain type, cholesterol, and resting ECG results, and on translating statistical patterns into clinically meaningful insights.

Outcome

Lasso Logistic Regression offered the best interpretability while maintaining strong accuracy and stable performance across folds. The model highlighted key clinical predictors that support early identification of individuals at high cardiovascular risk.
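
A hedged illustration of why an L1 (Lasso) penalty aids interpretability: it can shrink some coefficients exactly to zero, leaving a shorter list of predictors to review. The sketch below assumes scikit-learn, synthetic placeholder data, generic feature names, and an illustrative grid for the regularization strength C; none of these values come from the project itself.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data with generic feature names (assumption).
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=4, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Standardize first so coefficient magnitudes are comparable across features.
pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(penalty="l1", solver="liblinear")),
    ]
)

# Tune the inverse regularization strength C; smaller C means a sparser model.
param_grid = {"clf__C": [0.01, 0.05, 0.1, 0.5, 1.0]}
search = GridSearchCV(
    pipe,
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
)
search.fit(X, y)

# Rank features by absolute coefficient and keep those the penalty retained.
coefs = search.best_estimator_.named_steps["clf"].coef_.ravel()
ranked = sorted(zip(feature_names, coefs), key=lambda t: abs(t[1]), reverse=True)
selected = [(name, coef) for name, coef in ranked if coef != 0.0]

print(f"best C: {search.best_params_['clf__C']}")
for name, coef in selected:
    print(f"{name}: {coef:+.3f}")
```

On standardized inputs, the sign of each retained coefficient indicates the direction of association with the positive class and its magnitude gives a rough ranking, which is the kind of summary that can be reviewed alongside SHAP values.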