OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

Akari Asai·Jacqueline He·Rulin Shao...Joseph Chee Chang...

ArXiv -> Nature·2024·64 citations

TLDR: OpenScholar is a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses. It achieves citation accuracy on par with human experts.

Code & Resources

Code: github.com/AkariAsai/OpenScholar
Benchmark: github.com/AkariAsai/ScholarQABench
Expert Eval: github.com/AkariAsai/OpenScholar_ExpertEval

How do people cite this paper?

(generated 3 months ago)

OpenScholar's retrieval-augmented architecture, open-source retriever, and 45-million-paper datastore have been directly integrated as components in downstream scientific AI systems, including agentic biomedical analysis pipelines, novelty verification in research idea generation, and biomedical QA embedding models. ScholarQABench has been adopted as a benchmark for evaluating literature synthesis and multi-document scientific reasoning, and its evaluation rubrics (coverage, faithfulness, citation correctness) have informed assessment frameworks for query-based summarization. The system has also served as a platform extended with new user-facing features such as attribution gradients, a baseline for scientific explanation generation, a test case for plagiarism detection capabilities, and a representative example motivating research on RAG pipeline design (including query decomposition and iterative retrieval), exploration-capable information access agents, and scalable evaluation of knowledge-intensive AI systems.

Mentions

Science: Open-source AI program can answer science questions better than humans. Developed by and for academics, OpenScholar aims to improve searches of the ballooning scientific literature. — Jeffrey Brainard
Ai2 Blog: Scientific literature synthesis with retrieval-augmented language models — Akari Asai
