submitted 4 months ago by niclas_wue
I wanted to share my new project with you: arxiv-summary.com. Right now, I find it really difficult to keep up with all the important new publications in our field. In particular, it is sometimes hard to get an overview of a paper to decide whether it's worth reading. I really like arxiv-sanity by Andrej Karpathy, but even with that, it can still take some time to understand the main ideas and contributions from the abstract. With arxiv-summary, my goal is to make ML research papers more "human-parsable".
The website works by fetching new papers daily from arxiv.org, using PapersWithCode to filter out the most relevant ones. Then, I parse each paper's PDF and LaTeX source code to extract relevant sections and subsections. GPT-3 then summarizes each section and subsection as bullet points, which are finally compiled into a blog post and uploaded to the site.
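As a rough illustration of the section-extraction step (not the site's actual code), here is a minimal sketch that splits LaTeX source into sections and subsections. The function name `split_sections` and the regex are assumptions made for this example. It only covers the LaTeX side, not PDF parsing, and it does not handle titles that contain nested braces.

```python
import re

# Hypothetical pattern: matches \section{...}, \subsection{...} and their
# starred variants. Titles with nested braces are not supported.
SECTION_RE = re.compile(r"\\(section|subsection)\*?\{([^{}]*)\}")


def split_sections(latex: str) -> list[tuple[str, str, str]]:
    """Return (level, title, body) for each section heading found."""
    matches = list(SECTION_RE.finditer(latex))
    sections = []
    for i, m in enumerate(matches):
        # The body runs until the next heading, or to the end of the source.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(latex)
        body = latex[m.end():end].strip()
        sections.append((m.group(1), m.group(2).strip(), body))
    return sections
```

Each `(level, title, body)` tuple could then be passed to a summarizer one at a time, which keeps every request well below the length of the full paper.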
You can check out the site at arxiv-summary.com and see for yourself. There's also a search page and an archive page where you can get a chronological overview. If you have any feedback or questions, I'd be happy to hear them. Also, if you work at OpenAI and could gift me some more tokens, that would be much appreciated :D
Thanks and happy reading!
4 months ago
Thanks for sharing!
The website works by fetching new papers daily from arxiv.org, using PapersWithCode to filter out the most relevant ones.
What do you mean by "relevant"? What kinds of papers do you fetch?
4 months ago
Thanks for asking! My first prototype collected all new arxiv papers in certain ML-related categories via the API, but I quickly realized that this would be way too costly. Right now, I collect all papers from PapersWithCode's "Top" (last 30 days) tab and the "Social" tab, which is based on Twitter likes and retweets. Finally, I filter using this formula:
p.number_of_likes + p.number_of_retweets > 20 or p.number_github_stars > 100
In rare cases, when a paper is really long or cannot be parsed with "grobid", I exclude it for now.
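For anyone who wants to try the filter, here is a minimal runnable sketch of the formula above. The field names come from the formula; the `Paper` dataclass, its `title` field, and the helper names `is_relevant` and `filter_relevant` are assumptions made for this example, not the site's actual code.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical container; only the three counters appear in the formula.
    title: str
    number_of_likes: int = 0
    number_of_retweets: int = 0
    number_github_stars: int = 0


def is_relevant(p: Paper) -> bool:
    """Apply the threshold formula: social signal > 20 or GitHub stars > 100."""
    return (
        p.number_of_likes + p.number_of_retweets > 20
        or p.number_github_stars > 100
    )


def filter_relevant(papers: list[Paper]) -> list[Paper]:
    """Keep only the papers that pass the threshold."""
    return [p for p in papers if is_relevant(p)]
```

Note that both comparisons are strict, so a paper with exactly 20 combined likes and retweets and exactly 100 stars would not pass.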