Papers
arxiv:2508.04612

A Reproducible, Scalable Pipeline for Synthesizing Autoregressive Model Literature

Published on Aug 6, 2025
Authors:
,

Abstract

An automated pipeline for retrieving, analyzing, and reproducing research papers on autoregressive generative models demonstrates high accuracy in relevance classification and hyper-parameter extraction while supporting scalable and reproducible experimental workflows.

The accelerating pace of research on autoregressive generative models has produced thousands of papers, making manual literature surveys and reproduction studies increasingly impractical. We present a fully open-source, reproducible pipeline that automatically retrieves candidate documents from public repositories, filters them for relevance, extracts metadata, hyper-parameters and reported results, clusters topics, produces retrieval-augmented summaries and generates containerised scripts for re-running selected experiments. Quantitative evaluation on 50 manually-annotated papers shows F1 scores above 0.85 for relevance classification, hyper-parameter extraction and citation identification. Experiments on corpora of up to 1000 papers demonstrate near-linear scalability with eight CPU workers. Three case studies -- AWD-LSTM on WikiText-2, Transformer-XL on WikiText-103 and an autoregressive music model on the Lakh MIDI dataset -- confirm that the extracted settings support faithful reproduction, achieving test perplexities within 1--3% of the original reports.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2508.04612
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2508.04612 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2508.04612 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2508.04612 in a Space README.md to link it from this page.

Collections including this paper 1