About
Pursuing an M.S. in Data Science at the University of Michigan-Dearborn with some experience in machine learning, NLP, and data analysis.
Work Experience
Skills
Check out my latest work
I've worked on a variety of projects, from language translation pipelines to complex machine learning applications. Here are a few of my favorites.
Brain Tumor Classification with CNNs and Transfer Learning
● Built and evaluated four deep learning models (Custom CNN, VGG16, EfficientNetB0, MobileNetV2) for classifying brain MRI images into four categories: glioma, meningioma, pituitary tumor, and no tumor.
● Used a labeled dataset of 3,268 training images and 1,052 testing images, preprocessed with grayscale conversion and resizing to 128×128, then augmented to improve generalization.
● MobileNetV2 outperformed all other models, achieving ~92% accuracy and a macro-averaged F1-score above 0.90 with little overfitting, making it well suited for deployment in resource-constrained environments.
● Implemented a full ML pipeline: loading and preprocessing data, visualizing class distributions, training models, evaluating with classification reports and confusion matrices, and comparing performance across models.
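The evaluation step above (confusion matrices, accuracy, macro-averaged F1) can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names and the toy confusion matrix are assumptions made for the example, and the counts are invented rather than taken from the project's results.

```python
from typing import List

# Class order is an assumption for this sketch.
CLASSES = ["glioma", "meningioma", "pituitary", "no_tumor"]


def per_class_f1(cm: List[List[int]]) -> List[float]:
    """Return one F1 score per class.

    cm[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(cm[k]) - tp                       # truly k, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


# Toy confusion matrix with invented counts (NOT the project's results).
TOY_CM = [
    [9, 1, 0, 0],
    [1, 8, 1, 0],
    [0, 0, 10, 0],
    [0, 1, 0, 9],
]

if __name__ == "__main__":
    for name, score in zip(CLASSES, per_class_f1(TOY_CM)):
        print(f"{name}: F1 = {score:.3f}")
    print(f"accuracy = {accuracy(TOY_CM):.3f}, macro F1 = {macro_f1(TOY_CM):.3f}")
```

Macro averaging weights every class equally, so a model that does poorly on a rare tumor type is penalized even when overall accuracy looks high.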
Infant Mortality Prediction
● Conducted exploratory data analysis on a dataset spanning 2,923 country-year health records to assess socioeconomic and healthcare determinants of infant mortality.
● Engineered a predictive log-log linear regression model achieving R² = 0.988 and RMSE = 0.16, outperforming standard linear and stepwise models.
● Diagnosed model fit through residual analysis and the Shapiro-Wilk and Durbin-Watson tests, and recommended hierarchical and panel-data extensions for future modeling.
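A log-log model regresses log(y) on log(x), so the slope reads as an elasticity. The sketch below fits a single-predictor version with ordinary least squares and computes R², RMSE, and the Durbin-Watson statistic from the residuals. It is a simplified illustration under stated assumptions: the project's model presumably used several predictors, the function names are hypothetical, and the metrics here are computed on the log scale.

```python
import math
from typing import Dict, List, Tuple


def fit_log_log(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Fit log(y) = intercept + slope * log(x) by ordinary least squares.

    All x and y values must be strictly positive.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def diagnostics(x: List[float], y: List[float],
                intercept: float, slope: float) -> Dict[str, float]:
    """Return R^2, RMSE, and Durbin-Watson, all on the log scale.

    Durbin-Watson is only meaningful when observations are in a
    natural order (for example, by year).
    """
    ly = [math.log(v) for v in y]
    fitted = [intercept + slope * math.log(v) for v in x]
    resid = [a - b for a, b in zip(ly, fitted)]
    n = len(resid)
    my = sum(ly) / n
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((a - my) ** 2 for a in ly)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    # Values near 2 suggest little first-order autocorrelation in the residuals.
    if ss_res > 0:
        dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / ss_res
    else:
        dw = math.nan  # undefined for a perfect fit
    return {"r2": r2, "rmse": rmse, "durbin_watson": dw}
```

A slope of -0.5, for example, would mean that a 1% increase in the predictor is associated with roughly a 0.5% decrease in the response.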
Fraud Detection Patterns in Insurance Claims
This project identifies fraudulent insurance claims using supervised machine learning. After extensive preprocessing and feature engineering on detailed claim data, we developed and compared Random Forest, Logistic Regression, XGBoost, Naive Bayes, and KNN models. Random Forest proved the most effective, achieving the highest accuracy and AUC scores. The work highlights the potential of data-driven approaches for real-world problems like fraud detection.
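Comparing classifiers by AUC can be sketched as follows. This is an illustrative stand-in rather than the project's code: the helper names are hypothetical and the predicted scores are invented. AUC is computed here as the probability that a randomly chosen fraudulent claim receives a higher score than a randomly chosen legitimate one, with ties counted as half.

```python
from typing import Dict, List, Tuple


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for fraud, 0 for legitimate.
    scores: the model's predicted fraud score for each claim.
    Runs in O(n_pos * n_neg), which is fine for a small illustration.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_by_auc(labels: List[int],
                model_scores: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the (model name, AUC) pair with the highest AUC."""
    aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
    name = max(aucs, key=aucs.get)
    return name, aucs[name]


if __name__ == "__main__":
    # Invented labels and scores, for illustration only.
    y_true = [0, 0, 1, 1, 0, 1]
    candidates = {
        "model_a": [0.1, 0.4, 0.35, 0.8, 0.2, 0.7],
        "model_b": [0.3, 0.6, 0.4, 0.5, 0.7, 0.2],
    }
    print(best_by_auc(y_true, candidates))
```

AUC is often preferred over raw accuracy for fraud data because fraudulent claims are usually a small minority, and a model that labels everything legitimate can still score high accuracy.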
Diabetes Prediction Using AutoML and BigQuery
This project builds a diabetes prediction system on Google Cloud, using BigQuery for data preprocessing, AutoML for model training, Vertex AI for deployment, and Looker Studio for data visualization. It emphasizes early detection of diabetes, aiming to improve patient outcomes and resource efficiency. The project takes a dataset from Kaggle, cleans and transforms it into a structured format, trains prediction models, and creates dashboards that present actionable insights, all while considering data privacy and ethical implications.
Translating Tunisian Dialect to French using Machine Learning
Developed a machine learning pipeline to automate the translation of Tunisian dialect sentences into French, addressing the operational inefficiencies of manual translation at Emrhod Consulting. The pipeline combined fine-tuned MarianMT and mBART models with semi-supervised learning and a transliteration module that converts alphanumeric input to Arabic script. This approach significantly reduced translation time, improved accuracy, and contributed to advancing NLP solutions for low-resource languages.
Get in Touch
Have a question or want to connect? Feel free to reach out to me on LinkedIn with a direct message and I'll do my best to respond promptly.