Publications
This page contains a list of manuscripts that I have authored. * indicates equal contribution.
2023
- Pratyusha Ria Kalluri*, William Agnew*, Myra Cheng*, Kentrell Owens*, Luca Soldaini*, Abeba Birhane*. “The Surveillance AI Pipeline”. ArXiv 2309.15084.
- Orion Weller, Kyle Lo, David Wadden, Dawn Lawrie, Benjamin Van Durme, Arman Cohan, and Luca Soldaini. “When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets”. ArXiv 2309.08541.
- Organizers of Queer in AI, Nathan Dennler, Anaelia Ovalle, Ashwin Singh, Luca Soldaini, Arjun Subramonian, Huy Tu, William Agnew, Avijit Ghosh, Kyra Yee, Irene Font Peradejordi, Zeerak Talat, Mayra Russo, Jess de Jesus de Pinho Pinhal. “Bound to the Bounty: Collaboratively Shaping Evaluation Processes for Queer AI Harms”. AIES 2023.
- Benjamin Newman, Luca Soldaini, Raymond Fok, Arman Cohan, and Kyle Lo. “A Controllable QA-based Framework for Decontextualization”. ArXiv 2305.14772.
- Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, and Eugene Yang. “Overview of the TREC 2022 NeuCLIR Track”. TREC 2022.
- Organizers of Queer in AI. “Queer In AI: A Case Study in Community-Led Participatory AI”. FAccT 2023. Best paper award.
- Sean MacAvaney* and Luca Soldaini*. “One-Shot Labeling for Automatic Relevance Estimation”. Short paper, SIGIR 2023.
- Kyle Lo, Joseph Chee Chang. “The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces”. ArXiv 2303.14334.
- Rodney Michael Kinney. “The Semantic Scholar Open Data Platform”. ArXiv 2302.11266.
- Raymond Fok, Hita Kambhamettu, Luca Soldaini, Jonathan Bragg, Kyle Lo, Andrew Head, Marti A. Hearst, and Daniel S. Weld. “SCIM: Intelligent Skimming Support for Scientific Papers”. IUI 2023.
- Jon Saad-Falcon, Amanpreet Singh, Luca Soldaini, Mike D’Arcy, Arman Cohan, and Doug Downey. “Embedding Recycling for Language Models”. Findings of EACL 2023.
2022
- John Giorgi, Luca Soldaini, Bo Wang, Gary Bader, Kyle Lo, Lucy Lu Wang, and Arman Cohan. “Exploring the Challenges of Open Domain Multi-Document Summarization”. ArXiv 2212.10526.
- Matteo Gabburo, Rik Koncel-Kedziorski, Siddhant Garg, Luca Soldaini, and Alessandro Moschitti. “Knowledge Transfer from Answer Ranking to Answer Generation”. EMNLP 2022.
- Luca Di Liello, Siddhant Garg, Luca Soldaini, and Alessandro Moschitti. “Pre-training Transformer Models with Sentence-Level Objectives for Answer Sentence Selection”. Short paper, EMNLP 2022.
- Yoshitomo Matsubara, Luca Soldaini, Eric Lind, and Alessandro Moschitti. “Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems”. Findings of EMNLP 2022.
- Benjamin Muller, Luca Soldaini, Rik Koncel-Kedziorski, Eric Lind, and Alessandro Moschitti. “Cross-Lingual GenQA: A Language-Agnostic Generative Question Answering Approach for Open-Domain Question Answering”. AACL 2022.
- Luca Di Liello, Siddhant Garg, Luca Soldaini, and Alessandro Moschitti. “Paragraph-based Transformer Pre-training for Multi-Sentence Inference”. Short paper, NAACL-HLT 2022.
2021
- Chao-Chun Hsu, Eric Lind, Luca Soldaini, and Alessandro Moschitti. “Answer Generation for Retrieval-based Question Answering Systems”. Findings of ACL 2021.
- Rujun Han, Luca Soldaini, and Alessandro Moschitti. “Modeling Context in Answer Sentence Selection Systems on a Latency Budget”. EACL 2021.
2020
- Mingda Li, Xinyue Liu, Weitong Ruan, Luca Soldaini, Wael Hamza, and Chengwei Su. “Multi-task Learning of Spoken Language Understanding by Integrating N-Best Hypotheses with Hierarchical Attention”. COLING 2020.
- Luca Soldaini and Alessandro Moschitti. “The Cascade Transformer: Efficient Answer Sentence Selection”. ACL 2020.
- Subendhu Rongali, Luca Soldaini, Emilio Monti, and Wael Hamza. “Don’t Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing”. Short paper, WWW 2020.
- Sean MacAvaney, Luca Soldaini, and Nazli Goharian. “Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning”. Short paper, ECIR 2020.
2018
- Sean MacAvaney, Andrew Yates, Arman Cohan, Luca Soldaini, Kai Hui, Nazli Goharian, and Ophir Frieder. “Overcoming Low-Utility Facets for Complex Answer Retrieval”. Information Retrieval Journal, 2018.
- Ziling Fan, Luca Soldaini, Arman Cohan, and Nazli Goharian. “Relation Extraction for Protein-Protein Interactions Affected by Mutation”. Short paper, ACM-BCB 2018.
- Arman Cohan*, Bart Desmet*, Andrew Yates*, Luca Soldaini, Sean MacAvaney, and Nazli Goharian. “SMHD: a Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions”. COLING 2018.
- Luca Soldaini, Timothy Walsh, Arman Cohan, Julien Han, and Nazli Goharian. “Helping or Hurting? Predicting Changes in Users’ Risk of Self-Harm Through Online Community Interactions”. CLPsych Workshop, NAACL-HLT 2018.
- Luca Soldaini. “The Knowledge and Language Gap in Medical Information Seeking”. PhD Thesis, Georgetown University. 2018.
- Sean MacAvaney, Bart Desmet, Arman Cohan, Luca Soldaini, Andrew Yates, Ayah Zirikly, and Nazli Goharian. “RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses”. CLPsych Workshop, NAACL-HLT 2018.
- Sean MacAvaney, Andrew Yates, Arman Cohan, Luca Soldaini, Kai Hui, Nazli Goharian, and Ophir Frieder. “Characterizing Question Facets for Complex Answer Retrieval”. SIGIR 2018.
- Sean MacAvaney, Luca Soldaini, Arman Cohan, and Nazli Goharian. “Tree-LSTMs for Scientific Relation Classification”. SemEval Workshop, NAACL-HLT 2018.
2017
- Luca Soldaini, Andrew Yates, and Nazli Goharian. “Denoising Clinical Notes for Medical Literature Retrieval with Convolutional Neural Model”. Short paper, CIKM 2017.
- Luca Soldaini, Andrew Yates, and Nazli Goharian. “Learning to Reformulate Long Queries for Clinical Decision Support”. JASIST 2017.
- Luca Soldaini and Elad Yom-Tov. “Inferring Individual Attributes from Search Engine Queries and Auxiliary Information”. WWW 2017.
- Luca Soldaini and Nazli Goharian. “Learning to Rank for Consumer Health Search: a Semantic Approach”. Short paper, ECIR 2017.
2016
- Luca Soldaini and Nazli Goharian. “QuickUMLS: a Fast, Unsupervised Approach for Medical Concept Extraction”. MedIR workshop, SIGIR 2016.
- Arman Cohan, Luca Soldaini, and Nazli Goharian. “Identifying Significance of Discrepancies in Radiology Reports”. DMMH Workshop, SDM 2016.
- Luca Soldaini, Andrew Yates, Elad Yom-Tov, Ophir Frieder, and Nazli Goharian. “Enhancing Web Search in the Medical Domain via Query Clarification”. Information Retrieval Journal, 2016.
2015
- Arman Cohan, Luca Soldaini, and Nazli Goharian. “Matching Citation Text and Cited Spans in Biomedical Literature: a Search-Oriented Approach”. Short paper, NAACL-HLT 2015.
- Luca Soldaini, Arman Cohan, Andrew Yates, Nazli Goharian, and Ophir Frieder. “Retrieving Medical Literature for Clinical Decision Support”. ECIR 2015.
2014
- Arman Cohan, Luca Soldaini, Andrew Yates, Nazli Goharian, and Ophir Frieder. “On Clinical Decision Support”. Short paper, ACM-BCB 2014.