Hi, I'm Rayan 👋
I'm a data science grad student who's into machine learning and software engineering. I just really enjoy figuring out how to use AI to solve real-world stuff.

About

Pursuing an M.S. in Data Science at the University of Michigan-Dearborn with some experience in machine learning, NLP, and data analysis.

Work Experience

Math Learning Center (University of Michigan)

September 2023 - Present
Math and Statistics Tutor
Providing drop-in tutoring for a variety of mathematics courses (calculus, statistics, and biostatistics using R). Facilitating one-on-one and group appointments, along with project management, to support students with course-related questions.

Emrhod Consulting

January 2022 - Present
Part-Time Junior Research Analyst
Implemented quantitative (CATI, internet surveys, etc.) and qualitative (focus groups, in-depth interviews) research methodologies to gather insights. Analyzed and cleaned survey data to ensure accuracy and reliability, generated comprehensive reports with actionable insights to support decision-making, and collaborated with cross-functional teams to deliver high-quality data analysis on tight deadlines.

Vermeg

June 2024 - August 2024
AI/ML Intern
Conducted a comparative analysis of key tools for designing RAG pipelines, including LlamaIndex, LangChain, and vector databases. Defined performance and quality criteria (ease of use, scalability, integration) and created a consistent testing framework to evaluate each tool. Drafted a detailed report with benchmark results and formulated recommendations, presenting findings to Development and Management teams.

Emrhod Consulting

January 2024 - June 2024
Machine Learning Intern (Capstone Project)
Fine-tuned transformer models (mBART, MarianMT) to transliterate alphanumeric characters into Arabic letters and translate Tunisian dialect to French. Used web scraping, data preprocessing, augmentation, and NLP techniques to enhance the dataset. Evaluated model performance using BLEU, ROUGE, METEOR, and TER scores.

Skills

Python
R
SQL
Java
C
C#
JavaScript
Scikit-learn
TensorFlow
PyTorch
GCP
Spark
Tableau
Power BI
Matplotlib
NumPy
Pandas

My Projects

Check out my latest work

I've worked on a variety of projects, from language translation pipelines to complex machine learning applications. Here are a few of my favorites.

Fraud Detection Patterns in Insurance Claims

This project focuses on identifying fraudulent insurance claims using advanced machine learning techniques. By leveraging detailed claim data and employing extensive preprocessing and feature engineering, we developed and compared models such as Random Forest, Logistic Regression, XGBoost, Naive Bayes, and KNN. The Random Forest model emerged as the most effective, achieving high accuracy and AUC scores. This work highlights the potential of data-driven approaches in addressing real-world challenges like fraud detection.

Python
PySpark
Scikit-learn
XGBoost
Pandas
NumPy
Apache Spark
Matplotlib
Seaborn
Gaussian Noise Data Augmentation
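
As a rough illustration of the Gaussian noise data augmentation listed above, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the function name `augment_with_gaussian_noise`, the `noise_scale` parameter, and the sample claim fields (`claim_amount`, `policy_age_months`, `is_fraud`) are all hypothetical, and the sketch assumes the goal was to create extra samples of the rarer fraud class by jittering numeric features.

```python
import random


def augment_with_gaussian_noise(rows, numeric_keys, noise_scale=0.05, copies=1, seed=0):
    """Return noisy copies of `rows`.

    Each numeric feature gets zero-mean Gaussian noise whose standard deviation
    is `noise_scale` times that feature's standard deviation in `rows`.
    Non-numeric fields (such as the label) are copied unchanged.
    """
    rng = random.Random(seed)

    # Per-feature standard deviation, so the noise is proportional to each scale.
    stds = {}
    for key in numeric_keys:
        values = [row[key] for row in rows]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        stds[key] = variance ** 0.5

    augmented = []
    for _ in range(copies):
        for row in rows:
            noisy = dict(row)  # copy, so the original row is left untouched
            for key in numeric_keys:
                noisy[key] = row[key] + rng.gauss(0.0, noise_scale * stds[key])
            augmented.append(noisy)
    return augmented


# Hypothetical minority-class (fraud) rows, for illustration only.
fraud_claims = [
    {"claim_amount": 5200.0, "policy_age_months": 3.0, "is_fraud": 1},
    {"claim_amount": 7800.0, "policy_age_months": 5.0, "is_fraud": 1},
    {"claim_amount": 6100.0, "policy_age_months": 2.0, "is_fraud": 1},
]

synthetic = augment_with_gaussian_noise(
    fraud_claims, ["claim_amount", "policy_age_months"], copies=2
)
print(len(synthetic))  # two noisy copies of each of the three rows
```

Scaling the noise by each feature's own spread keeps large-valued columns (like amounts) and small-valued columns (like ages) perturbed by a comparable relative amount. In practice the noisy rows would be added to the training split only, never to the evaluation data.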

Diabetes Prediction Using AutoML and BigQuery

This project focuses on developing a diabetes prediction system using Google Cloud services, leveraging BigQuery for data preprocessing, AutoML for machine learning model training, Vertex AI for deployment, and Looker Studio for data visualization. It emphasizes early detection of diabetes, aiming to improve patient outcomes and resource efficiency. The project utilizes a dataset from Kaggle, cleans and transforms it into a structured format, trains models for predictions, and creates dashboards to present actionable insights, all while considering data privacy and ethical implications.

GCP
AutoML
Vertex AI
BigQuery
Looker Studio

Translating Tunisian Dialect to French using Machine Learning

Developed a machine learning pipeline to automate the translation of Tunisian dialect sentences into French, addressing the operational inefficiencies of manual translation at Emrhod Consulting. The project utilized advanced NLP techniques, including fine-tuned MarianMT and mBART models, combined with semi-supervised learning and a transliteration module for converting alphanumeric inputs to Arabic script. This approach significantly reduced translation time, improved accuracy, and contributed to advancing NLP solutions for low-resource languages.

Python
Google Colab Pro
Selenium
Pandas
NumPy
Matplotlib
Scikit-learn
PyTorch
TensorFlow

ProExam: Intelligent Assessment Generation System

This project aims to develop an AI-powered exam generation system tailored for professors, using a Retrieval-Augmented Generation (RAG) pipeline. Professors can upload their teaching materials and past exams, enabling the system to generate new, personalized exams aligned with their teaching style and focused on specific chapters or concepts where students previously struggled.

Python
Google Colab Pro
FAISS
LangChain
Pandas
NumPy
Matplotlib
Scikit-learn
PyTorch
TensorFlow
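
To give a feel for the retrieval step in a RAG pipeline like the one ProExam describes, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: word-count cosine similarity stands in for the embedding search that FAISS would perform, the helper names (`tokenize`, `retrieve`, `build_prompt`) and the sample course chunks are hypothetical, and the call to a language model is left out entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (stand-in for a vector search)."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine_similarity(query_vec, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(topic, chunks, k=2):
    """Assemble a prompt that grounds the exam question in the retrieved material."""
    context = "\n".join(f"- {c}" for c in retrieve(topic, chunks, k))
    return (
        f"Using only the course material below, write one exam question on: {topic}\n\n"
        f"Course material:\n{context}"
    )


# Hypothetical chunks from uploaded teaching materials.
course_chunks = [
    "Chapter 3: Hypothesis testing compares a test statistic against a critical value.",
    "Chapter 5: Linear regression fits a line by minimizing squared residuals.",
    "Chapter 7: Confidence intervals quantify uncertainty around an estimate.",
]

top = retrieve("linear regression residuals", course_chunks, k=1)
print(top[0])
print(build_prompt("linear regression residuals", course_chunks, k=1))
```

The prompt returned by `build_prompt` is what would be passed to a language model. Restricting the context to retrieved chunks is what keeps generated questions tied to the professor's own material and to the chapters being targeted.
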
Contact

Get in Touch

Have a question or want to connect? Feel free to reach out to me on LinkedIn with a direct message and I'll do my best to respond promptly.