An up-to-date list of publications can be found on my Google Scholar profile.
Boman, M., Jhun, P., & Schaekermann, M. (2025) Scaffolding for success: Blending learning with and about Generative AI in medical education. Medical Teacher [Google Research Blog]
Palepu, A., Dhillon, V., Niravath, P., Weng, W.-H., Prasad, P., Saab, K., Tanno, R., Cheng, Y., Mai, H., Burns, E., Ajmal, Z., Kulkarni, K., Mansfield, P., Webster, D., Barral, J., Gottweis, J., Schaekermann, M., Mahdavi, S., Natarajan, V., Karthikesalingam, A., & Tu, T. (2025) Exploring Large Language Models for Specialist-Level Oncology Care. New England Journal of Medicine (NEJM) AI [Google Research Blog]
Sayres, R., Hao, Y., Ward, A., Wang, A., Freeman, B., Zhan, S., Ardila, D., Li, J., Lee, I.-C., Iurchenko, A., Kou, S., Badola, K., Hu, J., Kumar, B., Johnson, K., Vijay, S., Krogue, J., Hassidim, A., Matias, Y., Webster, D. R., Virmani, S., Liu, Y., Duong, Q., & Schaekermann, M. (2025) Towards Better Health Conversations: The Benefits of Context-seeking [Last Author] [Google Research Blog]
Vedadi, E., Barrett, D., Harris, N., Wulczyn, E., Reddy, S., Ruparel, R., Schaekermann, M., Strother, T., Tanno, R., Sharma, Y., Lee, J., Hughes, C., Slack, D., Palepu, A., Freyberg, J., Saab, K., Liévin, V., Weng, W.-H., Tu, T., Liu, Y., Tomasev, N., Kulkarni, K., Mahdavi, S., Guu, K., Barral, J., Webster, D. R., Manyika, J., Hassidim, A., Chou, K., Matias, Y., Kohli, P., Rodman, A., Natarajan, V., Karthikesalingam, A., & Stutz, D. (2025) Towards physician-centered oversight of conversational diagnostic AI [Google Research Blog]
Saab, K., Freyberg, J., Park, C.-J., Strother, T., Cheng, Y., Weng, W.-H., Barrett, D., Stutz, D., Tomasev, N., Palepu, A., Liévin, V., Sharma, Y., Ruparel, R., Ahmed, A., Vedadi, E., Kanada, K., Hughes, C., Liu, Y., Brown, G., Gao, Y., Li, S., Mahdavi, S., Manyika, J., Chou, K., Matias, Y., Hassidim, A., Webster, D. R., Kohli, P., Eslami, S. M., Barral, J., Rodman, A., Natarajan, V., Schaekermann, M., Tu, T., Karthikesalingam, A., & Tanno, R. (2025) Advancing Conversational Diagnostic AI with Multimodal Reasoning [Google Research Blog]
Comanici, G., Bieber, E., Schaekermann, M., Pasupat, I., Sachdeva, N., Dhillon, I., Blistein, M., Ram, O., Zhang, D., Rosen, E., Marris, L., Petulla, S., Gaffney, C., Aharoni, A., Lintz, N., Cardal Pais, T., Jacobsson, H., Szpektor, I., Jiang, N., Haridasan, K., Omran, A., Saunshi, N., Bahri, D., Mishra, G., Chu, E., Boyd, T., Hekman, B., Parisi, A., Zhang, C., Kawintiranon, K., Bedrax-Weiss, T., Wang, O., Xu, Y., Purkiss, O., Mendlovic, U., Deutel, I., Nguyen, N., Langley, A., Korn, F., Rossazza, L., Ramé, A., Waghmare, S., Miller, H. and 3384 other authors (2025) Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. [The "M" in Gemini and part of the first-name-initials easter egg in the author list 🐣] [Google Keyword Blog]
Palepu, A., Liévin, V., Weng, W.-H., Saab, K., Stutz, D., Cheng, Y., Kulkarni, K., Mahdavi, S.S., Barral, J., Webster, D. R., Chou, K., Hassidim, A., Matias, Y., Manyika, J., Tanno, R., Natarajan, V., Rodman, A., Tu, T., Karthikesalingam, A.*, & Schaekermann, M.* (2025) Towards Conversational AI for Disease Management. [*Co-last Author] [Google Research Blog]
Tu, T.*, Schaekermann, M.*, Palepu, A.*, Saab, K., Freyberg, J., Tanno, R., Wang, A., Li, B., Amin, M., Tomasev, N., Azizi, S., Singhal, K., Cheng, Y., Hou, L., Webson, A., Kulkarni, K., Mahdavi, S. S., Semturs, C., Gottweis, J., Barral, J., Chou, K., Corrado, G.S., Matias, Y., Karthikesalingam, A., & Natarajan, V. (2025). Towards Conversational Diagnostic AI. Nature [*Co-first Author] [Google Research Blog]
McDuff, D.*, Schaekermann, M.*, Tu, T.*, Palepu, A.*, Wang, A., Garrison, J., Singhal, K., Sharma, Y., Azizi, S., Kulkarni, K., Hou, L., Cheng, Y., Liu, Y., Mahdavi, S. S., Prakash, S., Pathak, A., Semturs, C., Patel, S., Webster, D. R., Dominowska, E., Gottweis, J., Barral, J., Chou, K., Corrado, G.S., Matias, Y., Sunshine, J., Karthikesalingam, A., & Natarajan, V. (2025). Towards Accurate Differential Diagnosis with Large Language Models. Nature [*Co-first Author] [Google Research Blog]
Wang, A., Ruparel, R., Iurchenko, A., Jhun, P., Séguin, J. A., Strachan, P., Wong, R., Karthikesalingam, A., Matias, Y., Hassidim, A., Webster, D., Semturs, C., Krause, J., & Schaekermann, M. (2025) Generative AI for medical education: Insights from a case study with medical students and an AI tutor for clinical reasoning. CHI Extended Abstracts [Last Author] [Google Research Blog]
Li, B., Wang, A., Strachan, P., Séguin, J. A., Lachgar, S., Schroeder, K. C., Fleck, M. S., Wong, R., Karthikesalingam, A., Natarajan, V., Matias, Y., Corrado, G. S., Webster, D., Liu, Y., Hammel, N., Sayres, R., Semturs, C.*, & Schaekermann, M.* (2024). Conversational AI in health: Design considerations from a Wizard-of-Oz dermatology case study with users, clinicians and a medical LLM. CHI Extended Abstracts [*Co-Last Author]
Modi, A., Veerubhotla, A. S., Rysbek, A., Huber, A., Wiltshire, B., Veprek, B., Gillick, D., Kasenberg, D., Ahmed, D., Jurenka, I., Cohan, J., She, J., Wilkowski, J., Alarakyia, K., McKee, K. R., Wang, L., Kunesch, M., Schaekermann, M., Pîslar, M., Joshi, N., Mahmoudieh, P., Jhun, P., Wiltberger, S., Mohamed, S., Agarwal, S., Phal, S. M., Lee, S. J., Strinopoulos, T., Ko, W.-J., Wang, A., Anand, A., Bhoopchand, A., Wild, D., Pandya, D., Bar, F., Graham, G., Winnemoeller, H., Nagda, M., Kolhar, P., Schneider, R., Zhu, S., Chan, S., Yadlowsky, S., Sounderajah, V., & Assael, Y. (2025) LearnLM: Improving Gemini for Learning [Google Research Blog]
Singhal, K., Tu, T., Gottweis, J., Sayres, R., Wulczyn, E., Hou, L., Clark, K., Pfohl, S., Cole-Lewis, H., Neal, D., Schaekermann, M., Wang, A., Amin, M., Lachgar, S., Mansfield, P., Prakash, S., Green, B., Dominowska, E., Arcas, B. A. y, Tomasev, N., Liu, Y., Wong, R., Semturs, C., Mahdavi, S. S., Barral, J., Webster, D., Corrado, G. S., Matias, Y., Azizi, S., Karthikesalingam, A., & Natarajan, V. (2025). Towards Expert-Level Medical Question Answering with Large Language Models. Nature Medicine [Google Cloud Blog] [Med-PaLM Website]
Schaekermann, M.*, Spitz, T.*, Pyles, M.*, Cole-Lewis, H., Wulczyn, E., Pfohl, S.R., Martin, D. Jr., Jaroensri, R., Keeling, G., Liu, Y., Farquhar, S., Xue, Q., Lester, J., Hughes, C., Strachan, P., Tan, F., Bui, P., Mermel, C. H., Peng, L. H., Matias, Y., Corrado, G. S., Webster, D. R., Virmani, S., Semturs, C., Liu, Y., Horn, I., & Chen, P. H. C. (2024). Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study. The Lancet eClinicalMedicine [*Co-first Author] [Google Research Blog] [Google Keyword Blog]
Pfohl, S. R., Cole-Lewis, H., Sayres, R., Neal, D., Asiedu, M., Dieng, A., Tomasev, N., Rashid, Q. M., Azizi, S., Rostamzadeh, N., McCoy, L. G., Celi, L. A., Liu, Y., Schaekermann, M., Walton, A., Parrish, A., Nagpal, C., Singh, P., Dewitt, A., Mansfield, P., Prakash, S., Heller, K., Karthikesalingam, A., Semturs, C., Barral, J., Corrado, G., Matias, Y., Smith-Loud, J., Horn, I., & Singhal, K. (2024) A toolbox for surfacing health equity harms and biases in large language models. Nature Medicine. [Google Keyword Blog]
Saab, K., Tao, T., Weng, W., Tanno, R., Stutz, D., Wulczyn, E., Zhang, F., Strother, T., Park, C., Vedadi, E., Chaves, J. Z., Hu, S., Schaekermann, M., Kamath, A., Cheng, Y., Barrett, D. G. T., Cheung, C., Mustafa, B., Palepu, A., McDuff, D., Hou, L., Golany, T., Liu, L., Alayrac, J., Houlsby, N., Tomasev, N., Freyberg, J., Lau, C., Kemp, J., Lai, J., Azizi, S., Kanada, K., Man, S., Kulkarni, K., Sun, R., Shakeri, S., He, L., Caine, B., Webson, A., Latysheva, N., Johnson, M., Mansfield, P., Lu, J., Rivlin, E., Anderson, J., Green, B., Wong, R., Krause, J., Shlens, J., Dominowska, E., Eslami, S. M. A., Chou, K., Cui, C., Vinyals, O., Kavukcuoglu, K., Manyika, J., Dean, J., Hassabis, D., Matias, Y., Webster, D., Barral, J., Corrado, G., Semturs, C., Mahdavi, S. S., Gottweis, J., Karthikesalingam, A., & Natarajan, V. (2024). Capabilities of Gemini Models in Medicine. [Google Research Blog]
Tu, T., Azizi, S., Driess, D., Schaekermann, M., Amin, M., Chang, P.-C., Carroll, A., Lau, C., Tanno, R., Ktena, I., Mustafa, B., Chowdhery, A., Liu, Y., Kornblith, S., Fleet, D., Mansfield, P., Prakash, S., Wong, R., Virmani, S., Semturs, C., Mahdavi, S. S., Green, B., Dominowska, E., Arcas, B. A. y, Barral, J., Webster, D., Corrado, G. S., Matias, Y., Singhal, K., Florence, P., Karthikesalingam, A., & Natarajan, V. (2023). Towards Generalist Biomedical AI. New England Journal of Medicine (NEJM) AI. [Google Research Blog]
Tanno, R., Barrett, D. G. T., Sellergren, A., Ghaisas, S., Dathathri, S., See, A., Welbl, J., Singhal, K., Azizi, S., Tu, T., Schaekermann, M., May, R., Lee, R., Man, S., Ahmed, Z., Mahdavi, S. S., Matias, Y., Barral, J., Eslami, A., Belgrave, D., Natarajan, V., Shetty, S., Kohli, P., Huang, P., Karthikesalingam, A., & Ktena, I. (2024). Collaboration between clinicians and vision–language models in radiology report generation. Nature Medicine
Freyberg, J., Roy, A. G., Spitz, T., Freeman, B., Schaekermann, M., Strachan, P., Schnider, E., Wong, R., Webster, D. R., Karthikesalingam, A., Liu, Y., Dvijotham, K., & Telang, U. (2024). MINT: A wrapper to make multi-modal and multi-image AI models interactive.
Stutz, D., Cemgil, A. T., Roy, A. G., Matejovicova, T., Barsbey, M., Strachan, P., Schaekermann, M., Freyberg, J., Rikhye, R., Freeman, B., Matos, J. P., Telang, U., Webster, D. R., Liu, Y., Corrado, G. S., Matias, Y., Kohli, P., Liu, Y., Doucet, A., & Karthikesalingam, A. (2023). Evaluating AI systems under uncertain ground truth: a case study in dermatology. Medical Image Analysis.
Aroyo, L., Lease, M., Paritosh, P., & Schaekermann, M. (2022). Data Excellence for AI: Why Should You Care? Interactions, 29(2), 66–69.
Ruamviboonsuk, P., Tiwari, R., Sayres, R., Nganthavee, V., Hemarat, K., Kongprayoon, A., Raman, R., Levinstein, B., Liu, Y., Schaekermann, M., Lee, R., Virmani, S., Widner, K., Chambers, J., Hersch, F., Peng, L., & Webster, D. R. (2022). Real-time diabetic retinopathy screening by deep learning in a multisite national screening programme: a prospective interventional cohort study. The Lancet Digital Health.
Pradhan, V. K., Schaekermann, M., & Lease, M. (2022). In Search of Ambiguity: A Three-Stage Workflow Design to Clarify Annotation Guidelines for Crowd Workers. Frontiers in Artificial Intelligence, 5.
Hettiachchi, D., Sanderson, M., Goncalves, J., Hosio, S., Kazai, G., Lease, M., Schaekermann, M., & Yilmaz, E. (2021). Investigating and Mitigating Biases in Crowdsourced Data. Proceedings of the CSCW 2021 Workshop.
Hettiachchi, D., Schaekermann, M., McKinney, T. J., & Lease, M. (2021). The Challenge of Variable Effort Crowdsourcing and How Visible Gold Can Help. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 1–26.
Schaekermann, M. (2020). Human-AI Interaction in the Presence of Ambiguity: From Deliberation-based Labeling to Ambiguity-aware AI. Doctoral Thesis. University of Waterloo, Canada.
Schaekermann, M., Beaton, G., Sanoubari, E., Lim, A., Larson, K., & Law, E. (2020). Ambiguity-aware AI Assistants for Medical Data Analysis. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems - CHI ’20. [First Author]
Schaekermann, M., Cai, C. J., Huang, A. E., & Sayres, R. (2020). Expert Discussions Improve Comprehension of Difficult Cases in Medical Image Assessment. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems - CHI ’20. [First Author]
Limwattanayingyong, J., Nganthavee, V., Seresirikachorn, K., Singalavanija, T., Soonthornworasiri, N., Ruamviboonsuk, V., Rao, C., Raman, R., Grzybowski, A., Schaekermann, M., Peng, L. H., Webster, D. R., Semturs, C., Krause, J., Sayres, R., Hersch, F., Tiwari, R., Liu, Y., & Ruamviboonsuk, P. (2020). Longitudinal Screening for Diabetic Retinopathy in a Nationwide Screening Program: Comparing Deep Learning and Human Graders. Journal of Diabetes Research, 2020, 1–8.
Schaekermann, M., Homan, C. M., Aroyo, L., Paritosh, P., Bollacker, K., & Welty, C. (2020). Place Your Bets: Will Machine Learning Outgrow Human Labeling? AI Magazine, 41(4), 123–126. [First Author]
Sokolov, E., Abdoul Bachir, D. H., Sakadi, F., Williams, J., Vogel, A. C., Schaekermann, M., Tassiou, N., Bah, A. K., Khatri, V., Hotan, G. C., Ayub, N., Leung, E., Fantaneanu, T. A., Patel, A., Vyas, M., Milligan, T., Villamar, M. F., Hoch, D., Purves, S., Esmaeili, B., Stanley, M., Lehn-Schioler, T., Tellez-Zenteno, J., Gonzalez-Giraldo, E., Tolokh, I., Heidarian, L., Worden, L., Jadeja, N., Fridinger, S., Lee, L., Law, E., Fodé Abass, C., & Mateen, F. J. (2020). Tablet‐based electroencephalography diagnostics for patients with epilepsy in the West African Republic of Guinea. European Journal of Neurology, 27(8), 1570–1577.
Cohen, R., Schaekermann, M., Liu, S., & Cormier, M. (2019). Trusted AI and the Contribution of Trust Modeling in Multiagent Systems. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 1644–1648.
Schaekermann, M., Beaton, G., Habib, M., Lim, A., Larson, K., & Law, E. (2019). Capturing Expert Arguments from Medical Adjudication Discussions in a Machine-readable Format. Companion Proceedings of The 2019 World Wide Web Conference - WWW ’19, 2, 1131–1137. [First Author]
Schaekermann, M., Beaton, G., Habib, M., Lim, A., Larson, K., & Law, E. (2019). crowdEEG: A Platform for Structured Consensus Formation in Medical Time Series Analysis. 8th Workshop on Interactive Systems in Healthcare (WISH) at CHI 2019. [First Author]
Williams, J., Cisse, F. A., Schaekermann, M., Sakadi, F., Tassiou, N. R., Bah, A. K., Hamani, A. B. D., Lim, A., Leung, E. C. W., Fantaneanu, T. A., Milligan, T., Khatri, V., Hoch, D., Vyas, M., Lam, A., Hotan, G., Cohen, J., Law, E., & Mateen, F. (2019). Utilizing a wearable smartphone-based EEG for pediatric epilepsy patients in the resource poor environment of Guinea: A prospective study. Neurology, 92(15 Supplement).
Williams, J. A., Cisse, F. A., Schaekermann, M., Sakadi, F., Tassiou, N. R., Hotan, G. C., Bah, A. K., Hamani, A. B. D., Lim, A., Leung, E. C. W., Fantaneanu, T. A., Milligan, T. A., Khatri, V., Hoch, D. B., Vyas, M. V., Lam, A. D., Cohen, J. M., Vogel, A. C., Law, E., & Mateen, F. J. (2019). Smartphone EEG and remote online interpretation for children with epilepsy in the Republic of Guinea: Quality, characteristics, and practice implications. Seizure, 71, 93–99.
Schaekermann, M., Hammel, N., Basham, B., Campana, B., Law, E., Peng, L., Webster, D. R., & Sayres, R. (2019). Asynchronous Remote Adjudication for Grading Diabetic Retinopathy. Investigative Ophthalmology & Visual Science, 60(9), 158. [First Author]
Phene, S., Dunn, R. C., Hammel, N., Liu, Y., Krause, J., Kitade, N., Schaekermann, M., Sayres, R., Wu, D. J., Bora, A., Semturs, C., Misra, A., Huang, A. E., Spitze, A., Medeiros, F. A., Maa, A. Y., Gandhi, M., Corrado, G. S., Peng, L., & Webster, D. R. (2019). Deep Learning and Glaucoma Specialists: The Relative Importance of Optic Disc Features to Predict Glaucoma Referral in Fundus Photographs. Ophthalmology.
Hammel, N., Schaekermann, M., Phene, S., Dunn, C., Peng, L., Webster, D. R., & Sayres, R. (2019). A Study of Feature-based Consensus Formation for Glaucoma Risk Assessment. Investigative Ophthalmology & Visual Science, 60(9), 164.
Schaekermann, M., Hammel, N., Terry, M., Ali, T. K., Liu, Y., Basham, B., Campana, B., Chen, W., Ji, X., Krause, J., Corrado, G. S., Peng, L., Webster, D. R., Law, E., & Sayres, R. (2019). Remote Tool-Based Adjudication for Grading Diabetic Retinopathy. Translational Vision Science & Technology, 8(6), 40. [First Author]
Schaekermann, M., Beaton, G., Habib, M., Lim, A., Larson, K., & Law, E. (2019). Understanding Expert Disagreement in Medical Data Analysis through Structured Adjudication. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), 1–23. [First Author]
Schaekermann, M., Law, E., Larson, K., & Lim, A. (2018). Expert Disagreement in Sequential Labeling: A Case Study on Adjudication in Medical Time Series Analysis. 1st Workshop on Subjectivity, Ambiguity and Disagreement in Crowdsourcing at HCOMP 2018. [First Author]
Schaekermann, M., Goh, J., Larson, K., & Law, E. (2018). Resolvable vs. Irresolvable Disagreement: A Study on Worker Deliberation in Crowd Work. Proceedings of the 2018 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW 2018), 2(CSCW), 1–19. [First Author] [Best Paper Award]
Wehbe, R. R., Mekler, E. D., Schaekermann, M., Lank, E., & Nacke, L. E. (2017). Testing Incremental Difficulty Design in Platformer Games. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems - CHI ’17, 5109–5113.
Schaekermann, M., Ribeiro, G., Wallner, G., Kriglstein, S., Johnson, D., Drachen, A., Sifa, R., & Nacke, L. E. (2017). Curiously Motivated: Profiling Curiosity with Self-Reports and Behaviour Metrics in the Game “Destiny.” Proceedings of the Annual Symposium on Computer-Human Interaction in Play - CHI PLAY ’17, 143–156. [First Author]
Williams, A. C., Bradshaw, J., Schaekermann, M., Tse, T., Callaghan, W., & Law, E. (2016). The Big Picture: Preserving Context in the Decomposition of Complex Expert Tasks. 1st Workshop on Microproductivity at SIGCHI 2016.
Schaekermann, M., Law, E., Williams, A. C., & Callaghan, W. (2016). Resolvable vs. Irresolvable Ambiguity: A New Hybrid Framework for Dealing with Uncertain Ground Truth. 1st Workshop on Human-Centered Machine Learning at SIGCHI 2016. [First Author]
Schaekermann, M. (2014). Implementation of a Collaborative Web Application for Annotating Gameplay Videos Based on Biometric Player Data. Bachelor's Thesis. University of Applied Sciences Salzburg, Austria.