I am an Associate Professor in the Department of Biostatistics at Columbia University Mailman School of Public Health. My primary research interests lie in survival analysis, longitudinal data, and statistical learning. I have been developing statistical methods to address the challenges posed by complex datasets from electronic health records (EHRs), observational studies, clinical trials, and the integration of multiple data sources. My CV can be found here.

Contact Info:

722 W 168 St, R653

New York, NY, 10069

Email: ys3072@cumc.columbia.edu


Papers

† Student/Postdoctoral fellow under my supervision

Selected works in progress (submitted/in revision)

  • Sun Y and Sheng Y. Statistical inference for counting processes under shape heterogeneity.

  • Sun Y, Moghekar A, Soldan A, Pettigrew C, Greenberg B, Albert MS, Wang M-C. Cerebrospinal Alzheimer’s disease biomarker patterns of change prior to the onset of mild cognitive impairment.

  • Sun Y, Zhao X, Chan KCG, Xu W, Allore H, Zhao Y. Semiparametric Joint Modeling for biomarker trajectory before disease onset.

  • Zhu H, Sun Y, Wei Y. Hybrid censored quantile regression forest to assess the heterogeneous effects.

Publications

  • Sheng Y†, Sun Y, McCulloch CE, Huang C-Y (2023). Scalable estimation for high velocity survival data able to accommodate adding covariates. Statistica Sinica. In press.

  • See SB, Yang X†, Burger C, Lamarthée B, Snanoudj R, Shihab R, Tsapepas DS, Roy P, Larivière-Beaudoin S, Hamelin K, Mendoza Rojas A, van Besouw NM, Bartosic A, Daniel N, Vasilescu ER, Mohan S, Cohen D, Ratner L, Baan CC, Bromberg JS, Cardinal H, Anglicheau D, Sun Y, Zorn E (2023). Natural antibodies are associated with rejection and long-term renal allograft loss in a multi-center international cohort. Transplantation. In press.

  • Sun Y, Chiou SH, McGarry M, Huang C-Y (2023). Dynamic risk prediction triggered by intermediate clinical events using survival tree ensembles. Annals of Applied Statistics. 17(2), 1375-1397.

    - The work was selected for presentation in the session “The Best of AOAS” at JSM 2023

  • Sun Y, He X, Hu J (2022). An omnibus test for treatment effects when many subgroups are generated via data partitioning. Annals of Applied Statistics. 16(4): 2266-2278.

  • Sun Y, Chiou SH, Marr KA, Huang C-Y (2022). Statistical inference for the shape and size indexes of counting processes. Biometrika. 109(1):195-208.

  • Eaton A†, Sun Y, Neaton J, Luo X (2022). Nonparametric estimation in an illness-death model with component-wise censoring. Biometrics. 78(3):1168-1180.

  • Sheng Y†, Sun Y, Huang C-Y, and Kim M-O (2022). Synthesizing external aggregated information in the presence of population heterogeneity: a penalized empirical likelihood approach. Biometrics. 78(2):679-690.

  • Clague M, Kim C, Zucker J, Green DA, Sun Y, Whittier S, Thakur KT (2022). Impact of implementing the cerebrospinal fluid film array meningitis/encephalitis panel on duration of intravenous acyclovir treatment. Open Forum Infectious Diseases. 9(8), ofac356.

  • Bhatt SP, Balte PP, Schwartz JE, Jaeger BC, Cassano PA, Chaves PH, Couper D, Jacobs Jr DR, Kalhan R, Kaplan R, Lloyd-Jones D, Newman AB, O'Connor G, Sanders JL, Smith BM, Sun Y, Umans JG, White WB, Yende S, Oelsner EC (2022). Pooled Cohort Probability Score for Subclinical Airflow Obstruction. Annals of the American Thoracic Society. 19(8):1294-1304.

  • Taskiran NP, Hiura GT, Zhang X, Barr RG, Dashnaw SM, Hoffman EA, Malinsky D, Oelsner EC, Prince MR, Smith BM, Sun Y, Sun Y, Wild JM, Shen W, Hughes EW (2022). Mapping Alveolar Oxygen Partial Pressure in COPD Using Hyperpolarized Helium-3: The Multi-Ethnic Study of Atherosclerosis (MESA) COPD Study. Tomography. 8(5), 2268-2284.

  • Hermann EA, Motahari A, Hoffman EA, Allen N, Bertoni AG, Bluemke DA, Eskandari A, Gerard SE, Guo J, Hiura GT, Kaczka DW, Michos ED, Nagpal P, Pankow J, Shah S, Smith BM, Stukovsky KH, Sun Y, Watson K, Barr RG (2022). Pulmonary Blood Volume Among Older Adults in the Community: The MESA Lung Study. Circulation: Cardiovascular Imaging. 15(8), e014380.

  • Lyu T†, Luo X, Huang C-Y, Sun Y (2021). Additive rates model for recurrent event data with intermittently observed time-dependent covariates. Statistical Methods in Medical Research. 30(10):2239–2255.

  • Lyu T†, Luo X, Sun Y (2021). Additive-Multiplicative rates model for recurrent event data with intermittently observed time-dependent covariates. Journal of Data Science. 19(4):615-633.

  • Sheng Y†, Sun Y, Huang C-Y, Kim M-O (2021). Synthesizing external aggregated information in the penalized Cox regression under population heterogeneity. Statistics in Medicine. 40(23):4915-4930.

  • Goldsmith J, Sun Y, Fried L, Wing J, Miller GW, and Berhane K (2021). The emergence and future of public health data science. Public Health Reviews. 42:1604023.

  • Jung YE, Sun Y, Schluger N (2021). Impact and reach of papers posted on pre-print servers during the COVID-19 pandemic. JAMA Internal Medicine. 181(3):395-397.

  • Sun Y, McCulloch CE, Marr KA, Huang C-Y (2021). Recurrent events analysis with data collected at informative clinical visits in electronic health records. Journal of the American Statistical Association. 116(534):594-604.

  • Ye S, Hiura G, Fleck E, Garcia A, Geleris JD, Lee P, Liyanage-Don N, Moise N, Schluger N, Singer J, Sobieszczyk M, Sun Y, West H, Kronish IM (2021). Hospital Readmissions after Implementation of a Discharge Care Program for Patients with COVID-19 Illness. Journal of General Internal Medicine. 36(3):722-729.

  • Wysoczanski A, Angelini E, Smith BM, Hoffman E, Hiura G, Sun Y, Barr RG, Laine AF (2021). Unsupervised Clustering of Airway Tree Structures on High-Resolution CT: the MESA Lung Study. ISBI ’21: IEEE International Symposium on Biomedical Imaging.

  • Sun Y, Chiou SH, Wang M-C (2020). ROC-Guided survival trees and ensembles. Biometrics. 76(4):1177-1189.

  • Geleris J, Sun Y, Platt J, Zucker J, Baldwin M, Hripcsak G, Lee P, Labella A, Manson D, Kubin C, Barr RG, Sobieszczyk M, Schluger N (2020). Observational study of hydroxychloroquine in hospitalized patients with COVID-19. New England Journal of Medicine. 382:2411-2418.

  • Taskiran NP, Hiura GT, Zhang X, Dashnaw SM, Hoffman EA, Malinsky D, Oelsner EC, Prince MR, Smith BM, Sun Y, Sun Y, Wild JM, Shen W, Barr RG, Hughes EW (2021). Estimation of the Alveolar Partial Pressure of Oxygen using Hyperpolarized Helium-3: The Multi-Ethnic Study of Atherosclerosis (MESA) COPD Study. European Respiratory Journal. 58: OA1566.

  • Sheng Y†, Sun Y, Deng D, Huang C-Y (2020). Censored linear regression in the presence or absence of auxiliary survival information. Biometrics. 76(3): 734-745.

  • Namale VS, Kim C, Sun Y, Curcio A, Navis A, Idro R, Thakur KT (2020). Etiologies of community acquired bacterial meningitis and antibiotic resistance patterns in Africa over the last 30 years: a systematic review. Journal of Neurology & Neurophysiology, 11(6), 1-6.

  • Marr KA, Sun Y, Spec A, Lu N, Panackal A, Bennett J, Pappas P, Ostrander D, Datta K, Zhang SX, Williamson PR; On Behalf of the Cryptococcus Infection Network Cohort Study Working Group (2019). A multicenter, longitudinal cohort study of cryptococcosis in HIV-negative people in the United States. Clinical Infectious Diseases. 70(2): 252-261.

  • Bai J, Sun Y, Schrack JA, Crainiceanu CM, Wang M-C (2018). A two-stage model for wearable device data. Biometrics. 74: 744-752.

  • Sun Y, Chan G, Qin J (2018). Simple and fast overidentified rank estimation for right-censored length-biased data. Biometrics. 74: 77-85.

  • Sun Y, Qin J, Huang C-Y (2018). Missing information principle: a unified approach for general left-truncated and/or right-censored survival data problems. Statistical Science. 33: 261-276.

  • Shinohara RT, Sun Y, Wang M-C (2018). Alternating event processes during lifetimes: population dynamics and statistical inference. Lifetime Data Analysis. 24(1):110-125.

    - Invited submission to the special issue dedicated to Professor Jack Kalbfleisch for his contribution to survival analysis

  • Sun Y and Wang M-C (2017). Evaluating utility measurement with recurrent marker processes in the presence of competing terminal events. Journal of the American Statistical Association. 112(518): 745-756.

  • Sun Y, Huang C-Y, Wang M-C (2017). Nonparametric benefit-risk assessment using marker process in the presence of a terminal event. Journal of the American Statistical Association. 112(518): 826-836.

  • Shrestha B, Sun Y, Faisal F, Kim V, Soares K, Blair A, Hermen JM, Narang A, Dholakia AS, Rosati L, Hacker-Prietz A, Chen L, Laheru DA, De Jesus-Acosta A, Le DT, Donehower R, Azad N, Diaz LA, Murphy A, Lee V, Fishman EK, Hruban RH, Liang T, Cameron JL, Makary M, Weiss MJ, Ahuja N, He J, Wolfgang CL, Huang C-Y, Zheng L (2017). Long-term survival benefit of upfront chemotherapy in newly diagnosed borderline resectable pancreatic cancer. Cancer Medicine, 6(7):1552-1562.

  • Wang M-C and Sun Y (2017). Nonparametric estimation of medical cost quantiles in the presence of competing terminal events. Biostatistics & Epidemiology, 1(1):78-91.

  • Li S, Sun Y, Huang C-Y, Follmann DA, Krause R (2016). Recurrent event data analysis with intermittently observed time-varying covariates. Statistics in Medicine. 35(18): 3049-3065.

  • Hong X, Ladd-Acosta C, Hao K, Sherwood B, Ji H, Keet CA, Kumar R, Caruso D, Liu X, Wang G, Chen Z, Ji Y, Mao G, Walker SO, Bartell TR, Ji Z, Sun Y, Tsai HJ, Pongracic JA, Weeks DE, Wang X (2016). Epigenome-wide association study links site-specific DNA methylation changes with cow's milk allergy. The Journal of Allergy and Clinical Immunology, 138(3): 908-911.

  • Hong X, Hao K, Ladd-Acosta C, Hansen KD, Tsai H-J, Liu X, Xu X, Thornton TA, Caruso D, Keet CA, Sun Y, Wang G, Luo W, Kumar R, Fuleihan R, Singh AM, Kim JS, Story RE, Gupta RS, Gao P, Chen Z, Walker SO, Bartell TR, Beaty TH, Fallin MD, Schleimer R, Holt PG, Nadeau KC, Wood RA, Pongracic JA, Weeks DE and Wang X (2015). Genome-wide association study identifies peanut allergy-specific loci and evidence of epigenetic mediation in US children. Nature Communications. 6 Article number: 6304.