Roy and Niels

Friday, November 16, 2012

Medical physics papers on the arXiv: Some stats

Last year I made a post about the state of open access in medical physics and the use of the arXiv to make medical physics papers freely available to everyone. Today I've decided to follow that up by taking a look at what is going on over at the arXiv and sorting through some of their data.

The arXiv is "an openly accessible, moderated repository for scholarly articles in specific scientific disciplines". Beyond simply allowing all comers to read and download the articles it hosts, the arXiv also makes information about its vast collection easily available via a set of APIs. I decided to use the API to download metadata (author, title, abstract, and so on) for all articles in the medical physics category. Let's take a look!

First some raw numbers. As of mid-November 2012, the medical physics category had (about) 980 articles. Alongside the medical physics category, those 980 articles were co-listed in 76 of the 126 other arXiv categories. In Figure 1 I've plotted the number of submissions by year (with a partial count for 2012). It's clear that the submission rate to the category has greatly increased. On the plot I've fit a saturating growth curve (Gompertz), as I'm assuming that the submission number will level off at some point. You can see faint exponential and linear fits as well. The Gompertz model predicts 174 submissions for 2012, but 140-145 seems more likely this year.
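For readers who haven't met the Gompertz curve, here is a minimal sketch of its functional form. The parameter names `a`, `b`, and `c` and the values in the usage note are illustrative assumptions, not the fitted values behind Figure 1.

```python
import math


def gompertz(t, a, b, c):
    """Gompertz growth curve: a * exp(-b * exp(-c * t)).

    a is the ceiling the curve saturates at, b shifts the curve along
    the time axis, and c sets how quickly it approaches the ceiling.
    All three are assumed to be positive.
    """
    return a * math.exp(-b * math.exp(-c * t))
```

With hypothetical parameters such as `gompertz(t, 200, 5, 0.5)`, the curve starts near zero, climbs steeply, and then flattens out just below 200. That flattening is the saturation an exponential or linear fit cannot capture.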

Figure 1. Submissions per year to the medical physics category on the arXiv.

Another item of interest is how the medical physics category fits into the arXiv. If you've browsed it before, you know that it contains a broader range of topics than the popular "medical physics" journals, such as Physics in Medicine and Biology or Medical Physics. As mentioned above, many of the 980 articles were listed in multiple categories (663 to be exact). Figure 2 shows the most popular co-categories for articles in the medical physics category. As might be expected, biophysics was the most popular co-category, along with topics that seem well aligned, such as instrumentation and detectors (physics.ins-det), organs and tissues (q-bio.TO), and computational physics (physics.comp-ph). But less obvious categories also turned up, such as chaotic dynamics (nlin.CD).
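The co-category tally behind a plot like Figure 2 can be sketched in a few lines. This assumes each article has already been reduced to a dict with a `categories` list, and that the medical physics category identifier is `physics.med-ph`; both the record layout and the function name are my own illustrative choices.

```python
from collections import Counter


def co_category_counts(records, home="physics.med-ph"):
    """Count how often each other category appears alongside `home`.

    Returns (counts, multi), where counts maps a co-category to the
    number of articles listing it and multi is the number of articles
    listed in at least one category besides `home`.
    """
    counts = Counter()
    multi = 0
    for rec in records:
        # Use a set so a category repeated within one article counts once.
        others = set(rec["categories"]) - {home}
        if others:
            multi += 1
        counts.update(others)
    return counts, multi
```

Calling `counts.most_common(10)` on the result gives the ten most popular co-categories, ready to feed to a bar chart.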

Figure 2. "Co-categories" of papers submitted to the medical physics category.

To get a better idea of how these categories interplay with one another, I made some simple network graph visualizations. Figures 3 and 4 show the connections between the co-categories. All of the papers are in the medical physics category, so every other category is linked to it. The other lines in the network plots represent papers that were simultaneously in more than two categories (e.g. medical physics, biophysics, and physics - data analysis).
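One way to build the edge list for such a network is to link every pair of categories that appear on the same paper. The sketch below assumes the same hypothetical record layout (a dict with a `categories` list) and is not necessarily how Figures 3 and 4 were produced.

```python
from collections import Counter
from itertools import combinations


def co_category_edges(records):
    """Map each unordered category pair to the number of papers sharing it.

    Pairs are stored as sorted tuples, so (a, b) and (b, a) are the same edge.
    """
    edges = Counter()
    for rec in records:
        cats = sorted(set(rec["categories"]))
        # A paper in n categories contributes n * (n - 1) / 2 edges.
        edges.update(combinations(cats, 2))
    return edges
```

A paper listed in only two categories contributes a single edge to the home category, while a paper in three categories also adds an edge between its two co-categories. Those extra edges are the "other lines" in the plots. The resulting counts can be passed as edge weights to whatever graph library you prefer.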

Figure 3. Network graph of categories for papers submitted to the medical physics category. Click for larger version.
Figure 4. Network graph of categories for papers submitted to the medical physics category. Click to view larger version with category labels on the nodes.

In Figure 4, the categories near the center of the graph are the ones most likely to be co-listed with other categories. As you might expect, the categories with a dense set of lines connecting them also tend to be the most frequently occurring categories, as seen in Figure 2. This is easier to see in the larger version of Figure 4 with category names (click the figure above).

As you can see, the arXiv medical physics category is a wide-ranging and growing home for open access articles related to medical physics. It will be interesting to see how it fits in with the wider trends of funder mandates for open access and the growing acceptance of, and demand for, open access in our community.

N.B. This arXiv meta-data is relatively easily pulled down and processed with Python tools using the arXiv API, going from XML to JSON to your computer screen at home!
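To give a flavour of that XML-to-JSON step, here is a minimal standard-library sketch rather than the script used for this post. It assumes the public query endpoint at `export.arxiv.org`, the category identifier `physics.med-ph`, and an Atom feed response. The function names and the choice of fields to keep are mine.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the feed


def build_query_url(category, start=0, max_results=100):
    """Build a query URL for one page of results in a category."""
    params = {
        "search_query": f"cat:{category}",
        "start": start,
        "max_results": max_results,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_page(category, start=0, max_results=100):
    """Download one page of the feed as text (makes a network request)."""
    url = build_query_url(category, start, max_results)
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_feed(xml_text):
    """Turn an Atom feed into a list of plain dicts, one per article."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        records.append({
            "id": entry.findtext(ATOM + "id", default=""),
            "title": " ".join(title.split()),  # collapse line breaks
            "published": entry.findtext(ATOM + "published", default=""),
            "authors": [a.findtext(ATOM + "name", default="")
                        for a in entry.findall(ATOM + "author")],
            "categories": [c.get("term")
                           for c in entry.findall(ATOM + "category")],
        })
    return records


def to_json(records):
    """Serialize the parsed records for saving or later analysis."""
    return json.dumps(records, indent=2)
```

Something like `to_json(parse_feed(fetch_page("physics.med-ph")))` would fetch and convert the first page. To collect the whole category, step `start` forward by `max_results` until a page comes back with no entries, and pause between requests to be polite to the server.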