Deep Research and AI Lit Review Tools

This thread of research explores how AI can support researchers in discovering, organizing, and synthesizing scientific literature. Our position paper, Beyond Summarization, frames expository writing (such as literature reviews) as evidence-based and knowledge-generating, and argues for modular AI support across reading, synthesis, and composition rather than end-to-end automation. Building on this vision, several deep research systems have emerged: OpenScholar introduces a retrieval-augmented LM over 45M papers that outperforms GPT-4o in correctness and citation accuracy, while ScholarQA offers an open-source pipeline achieving state-of-the-art results on scientific QA benchmarks. SciArena provides a community-driven evaluation platform with 20,000+ human votes for ranking models on literature-grounded tasks. MyScholarQA and Intent-aware LFQA further improve deep research through personalization and intent-aware generation, respectively. For literature review table generation, ArxivDIGESTables curates 2,228 reference tables and benchmarks LLM-based schema and value generation; Intent-aware Schema shows that conditioning on synthesized table intents and applying refinement techniques significantly improves schema quality; and DimInd scaffolds literature review through successive structured representations (tables, taxonomies, and narratives), reducing cognitive load compared to chat-based baselines.

Papers

SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks

Yilun Zhao·Kaiyan Zhang·Tiansheng Hu...Joseph Chee Chang...

NeurIPS·2025·15 citations·🏆 Spotlight

TLDR: This work presents SciArena, an open and collaborative platform for evaluating foundation models on scientific literature-grounded tasks, and SciArena-Eval, a meta-evaluation benchmark built from the collected preference data that measures how accurately models judge answer quality by comparing their pairwise assessments with human votes.

Language Models Don't Know What You Want: Evaluating Personalization in Deep Research Needs Real Users

Nishant Balepur·Malachi Hamada·V. Kishore...Joseph Chee Chang...

ACL (main)·2026·1 citation

TLDR: This work introduces MyScholarQA (MySQA), a personalized deep research (DR) agent that infers a profile of a user's research interests, proposes personalized actions for the user's input query, and writes a multi-section report for the query that follows the user-approved actions.

Improving Attributed Long-form Question Answering with Intent Awareness

Xinran Zhao·Aakanksha Naik·Jay DeYoung·Joseph Chee Chang

ICLR·2026·3 citations

TLDR: This work develops and employs structured, tag-based schemes to better elicit the implicit intents behind writing and citing, and demonstrates that these extracted intents both enhance zero-shot generation in LLMs and enable the creation of high-quality synthetic data for fine-tuning smaller models.

Synthesizing scientific literature with retrieval-augmented language models.

Akari Asai·Jacqueline He·Rulin Shao...Joseph Chee Chang...

Nature·2026·50 citations

TLDR: This work introduces OpenScholar, a specialized retrieval-augmented language model that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses; it improves off-the-shelf LMs by 12%.

Facets, Taxonomies, and Syntheses: Navigating Structured Representations in LLM-Assisted Literature Review

Raymond Fok·Joseph Chee Chang·Marissa Radensky

ArXiv·2025·4 citations

TLDR: DimInd is an interactive system that scaffolds literature review across large paper collections through LLM-generated structured representations. It supported participants in extracting information and conceptually organizing papers with less effort than a ChatGPT-assisted baseline workflow.

Ai2 Scholar QA: Organized Literature Synthesis with Attribution

Amanpreet Singh·Joseph Chee Chang·Chloe Anastasiades

ACL·2025·23 citations

TLDR: This work introduces Ai2 Scholar QA, a free online scientific question answering application that outperforms competing systems on a recent scientific QA benchmark, and makes the entire pipeline public to facilitate research.

Intent-aware Schema Generation and Refinement for Literature Review Tables

Vishakh Padmakumar·Joseph Chee Chang·Kyle Lo

EMNLP·2025·5 citations

TLDR: This work presents an approach for augmenting unannotated table corpora with synthesized intents and applies it to create a dataset for studying schema generation conditioned on a given information need, which reduces ambiguity. It also comprehensively benchmarks several single-shot schema generation methods.

ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models

Benjamin Newman·Yoonjoo Lee·Aakanksha Naik...Joseph Chee Chang...

EMNLP·2024·7 citations

TLDR: This work introduces a framework that uses LMs to synthesize scientific literature into tables by decomposing the task into separate schema and value generation steps, and finds that even when LMs fail to fully reconstruct a reference table, the novel aspects they generate can still be useful.

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

Akari Asai·Jacqueline He·Rulin Shao...Joseph Chee Chang...

ArXiv -> Nature·2024·69 citations

TLDR: This work introduces OpenScholar, a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses, and that achieves citation accuracy on par with human experts.

Beyond Summarization: Designing AI Support for Real-World Expository Writing Tasks

Shannon Zejiang Shen·Tal August·Pao Siangliulue...Joseph Chee Chang

CHI - In2Writing Workshop·2023·26 citations

TLDR: This paper argues that developing AI support for expository writing poses unique and exciting research challenges and can lead to high real-world impact.
