add_subcategories
Thanks for the excellent work!
I have added an option to filter by category using checkboxes. For very extensive categories such as astro-ph, this is useful for the user.
The arXiv categories and subcategories are saved in a .json file and loaded from there.
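For illustration only, here is a minimal sketch of how such a file could be turned into checkbox labels. The JSON layout, the inline excerpt, and the helper names `load_categories` and `checkbox_choices` are assumptions, not the actual contents of this PR.

```python
import json

# Hypothetical excerpt of the categories file; the real file's layout may differ.
CATEGORIES_JSON = """
{
  "astro-ph": {
    "CO": "Cosmology and Nongalactic Astrophysics",
    "EP": "Earth and Planetary Astrophysics",
    "GA": "Astrophysics of Galaxies"
  },
  "hep-th": {}
}
"""


def load_categories(text):
    """Parse the JSON text into a {category: {code: name}} mapping."""
    return json.loads(text)


def checkbox_choices(categories, category):
    """Return one label per subcategory, or the category itself if it has none."""
    subs = categories.get(category, {})
    if not subs:
        return [category]
    return [f"{category}.{code}" for code in sorted(subs)]


categories = load_categories(CATEGORIES_JSON)
print(checkbox_choices(categories, "astro-ph"))
```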
Please review this PR, thanks!
I am glad you like skimarxiv! Thanks also for suggesting sub-categories. I have created a private space to test your request.
I did some testing. For instance, selecting the EP sub-field of astro-ph: there are currently 30 papers in that sub-field, but the results only showed 27. I also tested GA, where there are 40 papers but only 24 are shown. Also, for some sub-fields like CO, an error always comes up.
I am happy to add this since I see it as a very valuable feature, but we probably need to do extensive testing before merging the request. This would make the results faster, but the risk of missing some papers is an issue. Another approach would be to update the prompt to keep track of the sub-field and then use that to filter accordingly.
In both cases, I believe the filtering should happen at the beginning, before prompting, e.g. in the filtered "scraped_text". Otherwise we lose efficiency: we should filter by sub-field first and only then generate summaries for the remaining papers, which should be the faster approach. Let me know what you think. Thanks.
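A minimal sketch of the filter-first idea, assuming each scraped paper carries a list of category labels. The `Paper` class, `filter_by_subcategories`, `summarize_selected`, and the `summarize` callable are hypothetical names, not the app's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Paper:
    title: str
    abstract: str
    categories: List[str]  # assumed labels such as "astro-ph.EP"


def filter_by_subcategories(papers: Iterable[Paper], selected: Iterable[str]) -> List[Paper]:
    """Keep only papers listed in at least one selected sub-field."""
    wanted = set(selected)
    return [p for p in papers if wanted.intersection(p.categories)]


def summarize_selected(
    papers: Iterable[Paper],
    selected: Iterable[str],
    summarize: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Filter first, then call the (expensive) summarizer only on what is left."""
    kept = filter_by_subcategories(papers, selected)
    return [(p.title, summarize(p.abstract)) for p in kept]
```

With this ordering, the summarizer (standing in for the LLM call) runs once per kept paper rather than once per scraped paper.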
Hi, thanks for checking this! I agree that extensive testing is definitely important. What I don't understand is what could be causing the discrepancy between the number of papers shown in this app for a subcategory and the total number of papers in that subcategory on the arXiv. In principle, we are requesting the corresponding subcategory using the correct URL. Could it be the cross-lists or replacements, or something like that? Another option would be to look for a proper Python API for the arXiv.
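One way to check the cross-list and replacement hypothesis is to break the count for a subcategory down by listing type. This is a minimal sketch assuming each scraped entry is a dict with a `categories` list (primary category first) and an optional `is_replacement` flag; both field names and the function `count_by_listing_type` are hypothetical.

```python
from collections import Counter


def count_by_listing_type(entries, subcategory):
    """Count entries in a subcategory, split into primary, cross-list and replacement."""
    counts = Counter()
    for entry in entries:
        cats = entry["categories"]
        if subcategory not in cats:
            continue  # not listed in this subcategory at all
        if entry.get("is_replacement", False):
            counts["replacement"] += 1
        elif cats[0] == subcategory:
            counts["primary"] += 1  # assumed: first label is the primary category
        else:
            counts["cross-list"] += 1
    return counts
```

If the app's count matches the primary count alone while the arXiv total also includes cross-lists or replacements, that would point to the listing type as the source of the gap.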
Regarding your last point, I must say I didn't fully understand it. Are you suggesting using a modified prompt to filter the papers by subcategory?
I think it is definitely better to retrieve the subcategories properly from the arXiv and then generate the summaries with the LLM. That is faster for the interested user and less resource-intensive.
Happy to keep iterating on this!