Personal Statement

Data analyst and research health data scientist focused on clinical analytics, healthcare data workflows, biostatistics, machine learning, and research workflow design. I work with structured EHR data, public clinical datasets, and short clinical narratives, with emphasis on cohort construction, SQL-based extraction, data cleaning, ETL pipelines, applied modeling, and interpretable results. My work centers on translating clinical and research questions into analysis-ready datasets, reproducible code, tables, figures, manuscripts, and reviewer-facing evidence, with particular attention to whether the data, timing, assumptions, and interpretation can survive clinical and methodological review.

Education

PhD in Environmental Engineering, Data Science Concentration, Carnegie Mellon University, Pittsburgh, PA. Aug 2021 - Aug 2025.
MS in Data Analytics, Georgia Institute of Technology, Atlanta, GA. Aug 2024 - July 2026.
MS in Computer Science, Georgia Institute of Technology, Atlanta, GA. Expected start Aug 2026.

Work Experience

Research Health Data Scientist, University of Pittsburgh. Jan 2026 - present. Develop the main clinical analytics thread of my current work, covering clinical NLP, multimodal prediction, EHR and public clinical data workflows, model evaluation, and reproducible evidence for emergency care and real-world healthcare research.
Data Analyst, University of Pittsburgh. Jan 2026 - present. Support Alzheimer’s disease and cognitive-health research through AI literature synthesis, clinical NLP, network analysis of symptoms and comorbidities, dataset preparation, statistical analysis, and manuscript-oriented evidence summaries.
Data Scientist, Peachy Day. Oct 2025 - Dec 2026. Built SQL-based pipelines for user health, weather, and app activity data; developed migraine-risk and recurrent-event modeling workflows; supported product analytics, feature attribution, and user-level reporting for personalized insights.
Graduate Research Assistant, Carnegie Mellon University. Aug 2021 - Oct 2025. Developed computer vision, Gaussian process regression, active learning, geospatial modeling, and uncertainty-aware workflows for environmental sensing and system-level analytics.

Selected Research and Project Experience

Clinical NLP and multimodal ED prediction: Built workflows using structured emergency department variables and text-derived features for IV fluid utilization, EKG use, and hospital admission prediction. Compared simple text methods, embeddings, and gradient boosting models with attention to clinical interpretation.
Migraine prediction and digital health analytics: Defined product metrics, built SQL analytics workflows, designed physician-guided migraine forecasting logic, and developed personalized data-story outputs for patient engagement.
Environmental computer vision and spatial modeling: Applied Mask R-CNN, Gaussian process regression, active learning, geospatial modeling, and sensor-data integration for environmental monitoring and uncertainty-aware sampling.
Applied ML and BI systems: Built a real-time flight delay prediction workflow combining historical records, live weather data, TabPy, Tableau, and model explanation outputs.

Skills

  • Healthcare Data Analytics: EHR/clinical data analysis, clinical narratives, cohort construction, analytic dataset development, NHAMCS-ED, real-world data boundaries, and clinical research workflows.
  • Programming & Data Analysis: Python, Pandas, NumPy, R, SAS, SQL, data cleaning, ETL pipelines, reproducible analysis files, and dashboard-oriented reporting.
  • Statistical Modeling: Descriptive and inferential statistics, hypothesis testing, linear regression, logistic regression, generalized linear models, mixed models, recurrent-event survival analysis, and Bayesian/statistical study design concepts.
  • Predictive Modeling & Clinical NLP: Scikit-learn, XGBoost, gradient boosting, clinical text analysis/NLP, multimodal modeling, model evaluation, calibration thinking, SHAP/permutation-style interpretation, and interpretable machine learning.
  • Research & Communication: Matplotlib, Tableau, manuscript support, interdisciplinary collaboration, reviewer-response evidence preparation, and research workflow documentation.

Relevant Coursework

Big Data for Healthcare; Computing for Data Systems; Natural Language Processing; Computational Data Analytics; High-Dimensional Data Analytics; Data and Visual Analytics; Simulation; Practicum; regression analysis; generalized linear models; machine learning; Bayesian statistics; experimental design; survival-oriented modeling.

Selected Publications and Manuscripts

Healthcare Data Science

  1. Machine learning-driven prediction of hospital admissions using gradient boosting and GPT-2
    Zhang, X., Wang, H., Yu, G., & Zhang, W. (2025). DIGITAL HEALTH, 11.

  2. Integrating multimodal clinical data to predict intravenous (IV) fluid utilization: a comparative analysis of natural language processing techniques
    Wang, H., Ling, H., & Zhang, X. (2025). PeerJ Computer Science, 11, e3441.

  3. Machine Learning for Personalized Prediction of Electrocardiogram (EKG) Use in Emergency Care
    Wang, H., & Zhang, X. (2025). Journal of Personalized Medicine, 15(8), 358.

  4. Mapping Acute Encounters in End-Stage Renal Disease: A Multi-scale Network Analysis of Presenting Reasons and Diagnoses in NHAMCS-ED (2020-2022)
    Wang, H., & Zhang, X. (2026). Under review.

  5. From Algorithms to Empathy: A Review of AI-Enabled Ecosystems for Alzheimer’s Diagnosis, Rehabilitation, and Care
    Wang, H., & Zhang, X. (2026). Under review.

  6. Network Analysis of Alzheimer’s Comorbidities and Symptoms in the Emergency Department
    Wang, H., Fetia, J., Jiang, Y., Zhang, W., & Zhang, X. (2026). Under review.

  7. CLEAR Team for amyloid-targeting therapy screening in Veterans with early symptomatic Alzheimer’s disease: protocol for a randomized controlled trial
    Wang, H., Zhang, X., Fetia, J., & O’Donnell, A. (2026). In preparation.

Environmental Data Science

  1. A stock-based framework for monitoring fossil persistence and renewable expansion in global power systems
    Wang, H., & Hong, C. (2026). Energy, Ecology and Environment.

  2. Comparison of robot-deployable sensing methods for autonomous in-field screening of total petroleum hydrocarbons
    Wang, H., Rajesh, L., Ganesh, K., Lopes, A. R., Hoelen, T. P., & Lowry, G. V. (2026). Journal of Hazardous Materials, 503, 141208.

  3. AI-assisted screening for asbestos fibers in soil using Mask R-CNN and computer vision on polarized light micrography
    Wang, H., Piao, W., & Gregory, L. (2025). Under review.

  4. Integrating machine learning into life cycle assessment: Review and future outlook
    Wang, H. (2025). PLOS Climate.

  5. Applications of microbial induced calcium carbonate precipitation in historical architecture restoration - a mini review
    Wang, H., & Wang, S. (2025). Journal of Infrastructure Preservation and Resilience, 6.

  6. High Aspect Ratio Polymer Nanocarriers for Gene Delivery and Expression in Plants
    Zhang, Y., Shin, J., Sun, H., Chang, H.-F., Martinez, M. R., Perkins, L. A., Yan, J., Cao, Y., Wang, H., Giraldo, J. P., Matyjaszewski, K., Sheen, J., Tilton, R. D., Marelli, B., & Lowry, G. V. (2025). Nano Letters, 25(2), 681-690.

  7. Path to autonomous soil sampling and analysis by ground-based robots
    Norby, J., Wang, S., Wang, H., Deng, S., Jones, N., Mishra, A., Pavlov, C., He, H., Subramanian, S., Thangavelu, V., Sihota, N., Hoelen, T., Johnson, A. M., & Lowry, G. V. (2024). Journal of Environmental Management, 360, 121130.

  8. Impact of polymer molecular weight on the efficiency of temperature swing solvent extraction for desalination of concentrated brines
    Lopes, A. R., Wang, H., Dong, J., Han, J., Hatakeyama, E. S., Hoelen, T. P., & Lowry, G. V. (2022). Desalination, 543, 116104.