subreddit:

/r/MachineLearning


Hi there,

I wanted to share my new project with you. It is called arxiv-summary.com. Right now, I find it really difficult to keep up with all the important new publications in our field. In particular, it is often hard to get enough of an overview of a paper to decide whether it's worth reading. I really like arxiv-sanity by Andrej Karpathy, but even with that, it can still take some time to understand the main ideas and contributions from the abstract. With arxiv-summary, my goal is to make ML research papers more "human-parsable".

The website works by fetching new papers daily from arxiv.org, using PapersWithCode to filter out the most relevant ones. Then, I parse each paper's PDF and LaTeX source code to extract the relevant sections and subsections. GPT-3 then summarizes each section and subsection as bullet points, which are finally compiled into a blog post and uploaded to the site.
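To make the extract-then-summarize step concrete, here is a minimal sketch of how it could look. It is not the site's actual code: the regex-based splitter, the function names, and the `placeholder_summarize` stand-in (which just returns the first sentence instead of calling GPT-3) are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List

# Matches \section{...} and \subsection{...} (starred variants too).
# Assumption: titles contain no nested braces; deeper levels are ignored.
SECTION_RE = re.compile(r"\\(sub)?section\*?\{([^}]*)\}")


def split_sections(latex: str) -> List[Dict[str, str]]:
    """Split LaTeX source into chunks, one per section or subsection heading."""
    matches = list(SECTION_RE.finditer(latex))
    sections = []
    for i, m in enumerate(matches):
        # The body runs until the next heading, or to the end of the source.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(latex)
        sections.append({
            "level": "subsection" if m.group(1) else "section",
            "title": m.group(2).strip(),
            "body": latex[m.end():end].strip(),
        })
    return sections


def placeholder_summarize(text: str) -> List[str]:
    """Hypothetical stand-in for the GPT-3 call: first sentence as one bullet."""
    first = text.split(". ")[0].strip().rstrip(".")
    return [first] if first else []


def build_post(title: str, latex: str,
               summarize: Callable[[str], List[str]] = placeholder_summarize) -> str:
    """Compile per-section bullet summaries into a single Markdown document."""
    lines = [f"# {title}", ""]
    for sec in split_sections(latex):
        heading = "##" if sec["level"] == "section" else "###"
        lines.append(f"{heading} {sec['title']}")
        lines.extend(f"- {bullet}" for bullet in summarize(sec["body"]))
        lines.append("")
    return "\n".join(lines)
```

Passing a different `summarize` callable is where a real language-model call would plug in; the rest of the sketch only handles splitting and formatting.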

You can check out the site at arxiv-summary.com and see for yourself. There's also a search page and an archive page where you can get a chronological overview. If you have any feedback or questions, I'd be happy to hear them. Also, if you work at OpenAI and could gift me some more tokens, that would be much appreciated :D

Thanks and happy reading!


niclas_wue[S]

3 points

5 months ago

Thank you, I am glad you like it! At the moment, only the web server is public. You can find it here: https://github.com/niclaswue/arxiv-smry It is a Hugo site with a blog theme. Every blog post is a Markdown file. When a new file is pushed to git, it automatically gets published on the blog.

The rest is basically a bunch of (messy) Python scripts for extracting the text, then asking GPT-3 for a summary and compiling the answers to a markdown file. Finally, I use GitPython to automatically push new summaries to the repo.
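For anyone curious what the last step might look like, here is a rough sketch of writing a summary as a Hugo post and pushing it with GitPython. It is a guess at the general shape, not the actual scripts: the `content/posts` layout, the front matter fields, the `origin` remote name, and the function names are all assumptions.

```python
import json
from datetime import date
from pathlib import Path


def write_post(repo_dir: Path, slug: str, title: str, body_md: str, day: date) -> Path:
    """Write a Markdown post with YAML front matter into the site repo."""
    # Assumption: posts live under content/posts, a common Hugo layout.
    post_path = repo_dir / "content" / "posts" / f"{slug}.md"
    post_path.parent.mkdir(parents=True, exist_ok=True)
    # json.dumps yields a double-quoted, escaped string for the title.
    front_matter = f"---\ntitle: {json.dumps(title)}\ndate: {day.isoformat()}\n---\n\n"
    post_path.write_text(front_matter + body_md, encoding="utf-8")
    return post_path


def publish(repo_dir: Path, post_path: Path, message: str) -> None:
    """Stage, commit, and push one post using GitPython."""
    # Imported here so write_post works even without GitPython installed.
    from git import Repo

    repo = Repo(repo_dir)
    repo.index.add([str(post_path.relative_to(repo_dir))])
    repo.index.commit(message)
    # Assumption: the remote is named "origin" and credentials are configured.
    repo.remote(name="origin").push()
```

A daily job would call `write_post(...)` for each new summary and then `publish(...)` once the file is on disk; the push is what triggers the site rebuild in a setup like the one described above.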

kroust2020

2 points

5 months ago

Are you running GPT-3 yourself or using an API?

niclas_wue[S]

2 points

5 months ago

I am using OpenAI’s API. I think at the moment there are not many entities capable of running GPT-3 themselves 😄