OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
TLDR: OpenScholar is a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses. It achieves citation accuracy on par with human experts.
How do people cite this paper?
OpenScholar's retrieval-augmented architecture, open-source retriever, and 45-million-paper datastore have been directly integrated as components in downstream scientific AI systems, including agentic biomedical analysis pipelines, novelty verification in research idea generation, and biomedical QA embedding models. ScholarQABench has been adopted as a benchmark for evaluating literature synthesis and multi-document scientific reasoning, and its evaluation rubrics (coverage, faithfulness, citation correctness) have informed assessment frameworks for query-based summarization. The system has also served as a platform extended with new user-facing features such as attribution gradients, a baseline for scientific explanation generation, a test case for plagiarism detection capabilities, and a representative example motivating research on RAG pipeline design (including query decomposition and iterative retrieval), exploration-capable information access agents, and scalable evaluation of knowledge-intensive AI systems.
Mentions
- Science: Open-source AI program can answer science questions better than humans. Developed by and for academics, OpenScholar aims to improve searches of the ballooning scientific literature. — Jeffrey Brainard
- Ai2 Blog: Scientific literature synthesis with retrieval-augmented language models — Akari Asai