Deep Research and AI Lit Review Tools
10 papers in this research thread
This research thread explores how AI can support researchers in discovering, organizing, and synthesizing scientific literature. Our position paper Beyond Summarization frames expository writing, such as literature reviews, as evidence-based and knowledge-generating, and argues for modular AI support across reading, synthesis, and composition rather than end-to-end automation. Building on this vision, several deep research systems have emerged: OpenScholar introduces a retrieval-augmented LM over 45M papers that outperforms GPT-4o in correctness and citation accuracy, while Ai2 Scholar QA offers an open-source pipeline that achieves state-of-the-art results on scientific QA benchmarks. SciArena provides a community-driven evaluation platform with 20,000+ human votes for ranking models on literature-grounded tasks. MyScholarQA and Intent-aware LFQA further improve deep research through personalization and intent-aware generation, respectively.
For literature review table generation, ArxivDIGESTables curates 2,228 reference tables and benchmarks LLM-based schema and value generation. Intent-aware Schema shows that conditioning on synthesized table intents and applying refinement techniques significantly improves schema quality. DimInd scaffolds literature review through successive structured representations (tables, taxonomies, and narratives), reducing cognitive load compared to chat-based baselines.
Papers
SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks
TLDR: SciArena is an open and collaborative platform for evaluating foundation models on scientific literature-grounded tasks. The paper also presents SciArena-Eval, a meta-evaluation benchmark built from the collected preference data, which measures how accurately models judge answer quality by comparing their pairwise assessments with human votes.
Language Models Don't Know What You Want: Evaluating Personalization in Deep Research Needs Real Users
TLDR: This work presents MyScholarQA (MySQA), a personalized deep research (DR) agent that infers a profile of a user's research interests, proposes personalized actions for the user's input query, and writes a multi-section report for the query that follows the user-approved actions.
Improving Attributed Long-form Question Answering with Intent Awareness
TLDR: This work develops and employs structured, tag-based schemes to better elicit the implicit intents behind decisions to write or cite, and demonstrates that these extracted intents both enhance zero-shot generation in LLMs and enable the creation of high-quality synthetic data for fine-tuning smaller models.
Synthesizing scientific literature with retrieval-augmented language models
TLDR: OpenScholar is a specialized retrieval-augmented language model that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses; it improves off-the-shelf LMs by 12%.
Facets, Taxonomies, and Syntheses: Navigating Structured Representations in LLM-Assisted Literature Review
TLDR: DimInd is an interactive system that scaffolds literature review across large paper collections through LLM-generated structured representations. It supported participants in extracting information and conceptually organizing papers with less effort than a ChatGPT-assisted baseline workflow.
Ai2 Scholar QA: Organized Literature Synthesis with Attribution
TLDR: This work introduces Ai2 Scholar QA, a free online scientific question answering application that outperforms competing systems on a recent scientific QA benchmark, and makes the entire pipeline public to facilitate research.
Intent-aware Schema Generation and Refinement for Literature Review Tables
TLDR: This work presents an approach for augmenting unannotated table corpora with synthesized intents and applies it to create a dataset for studying schema generation conditioned on a given information need, thus reducing ambiguity. It also comprehensively benchmarks several single-shot schema generation methods.
ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models
TLDR: This work introduces a framework that leverages LMs to synthesize scientific literature into tables by decomposing the task into separate schema and value generation steps, and finds that even when LMs fail to fully reconstruct a reference table, the novel aspects they generate can still be useful.
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
TLDR: OpenScholar is a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses; it achieves citation accuracy on par with human experts.
Beyond Summarization: Designing AI Support for Real-World Expository Writing Tasks
TLDR: This paper argues that developing AI support for expository writing poses unique and exciting research challenges and can have high real-world impact.