OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

Akari Asai·Jacqueline He·Rulin Shao...Joseph Chee Chang...
ArXiv -> Nature·2024·50 citations

TLDR: OpenScholar is a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses; it achieves citation accuracy on par with human experts.

How do people cite this paper?

(generated 20 days ago)

OpenScholar's retrieval-augmented architecture, open-source retriever, and 45-million-paper datastore have been directly integrated as components in downstream scientific AI systems, including agentic biomedical analysis pipelines, novelty verification in research idea generation, and biomedical QA embedding models. ScholarQABench has been adopted as a benchmark for evaluating literature synthesis and multi-document scientific reasoning, and its evaluation rubrics (coverage, faithfulness, citation correctness) have informed assessment frameworks for query-based summarization. The system has also served as a platform extended with new user-facing features such as attribution gradients, as a baseline for scientific explanation generation, as a test case for plagiarism detection capabilities, and as a representative example motivating research on RAG pipeline design (including query decomposition and iterative retrieval), exploration-capable information access agents, and scalable evaluation of knowledge-intensive AI systems.
