Hongyi Liu

PhD Student in Biostatistics | University of Michigan
liutom@umich.edu | LinkedIn | GitHub


Professional Summary

I am a PhD student in Biostatistics at the University of Michigan. My research interests lie in causal inference, randomized clinical trials (RCTs), optimization, machine learning, and real-world evidence (RWE). My work so far has used large-scale Electronic Health Record (EHR) data to develop statistical models for healthcare applications. I am also interested in how Large Language Models (LLMs) can complement traditional machine learning methods for treatment effect estimation.


Education

  • University of Michigan, Ann Arbor, MI
    PhD in Biostatistics | Expected 2029
    Advisor: Bingkai Wang, Ph.D. | Qualifying Exam passed June 2025
  • Macalester College, St. Paul, MN
    BA with Honors in Statistics | May 2024
    Minors: Computer Science, French | GPA: 3.9/4.0

Technical Skills

  • Statistical Methods: Causal Inference (HAIPW, Doubly Robust Estimation), Survival Analysis, Machine Learning, High-Dimensional Data Analysis, Clinical Trial Design, Real-World Evidence.
  • Programming: R (advanced, Rcpp), Python, SAS, SQL, C++, Stan, Julia, Git/GitHub.
  • Machine Learning & AI: Random Forests, XGBoost, Lasso Regression, LLM Fine-tuning (GPT-4, DeepSeek, Llama-3), Deep Learning.
  • Data Engineering: Large-scale EHR data processing, SQL querying, ETL pipelines, data visualization (ggplot2, matplotlib).
  • Languages: English (fluent), Mandarin (native), French (intermediate).

Research Experience

  • Graduate Student Researcher | University of Michigan (May 2025 – Present)
    Developing high-performance convex optimization algorithms for cell-type deconvolution in spatial transcriptomics. Implementing FISTA with efficient sparse matrix operations in C++ (Rcpp/Armadillo) to reduce computation time.
  • Graduate Student Researcher | University of Michigan (Jan 2025 – Present)
    Studying how LLMs (GPT-4, Llama-3) can support causal inference in clinical trials, focusing on reducing the feature-engineering time required by traditional machine learning methods.
  • Graduate Student Researcher | University of Michigan (Aug 2024 – Dec 2024)
    Contributed to the analysis of COVID-19 vaccine effectiveness across SARS-CoV-2 variants using large-scale EHR data from the Michigan Medicine health system.
  • Data Science Research Intern | Minnesota Department of Health (May 2023 – Aug 2023)
    Supported the Injury & Violence Prevention Unit by analyzing records from the Minnesota Violent Death Reporting System and developing predictive models to help identify high-risk populations.

Publications & Presentations

  • Wang, B., Yu, M., Liu, H., et al. (2024). Test-negative designs with various reasons for testing: statistical bias and solution. Epidemiology (In Press).
  • Liu, H. (2024). A discussion on estimation of the best constant for spherical restriction inequalities. Macalester College Digital Commons.
  • Oral Presentation: “Test-negative designs with various reasons for testing.” Michigan Student Symposium for Interdisciplinary Statistical Sciences (MSSISS), March 2025.

Honors & Awards

  • Finalist Award (Top 2%), Mathematical Contest in Modeling (2023).
  • Society of Actuaries (SOA): Passed Exams P, FM, IFM, SRM.
  • Full-Tuition Scholarships: Kofi Annan Scholarship & Charles J. Turck Presidential Honor Scholarship (2020–2024).