Journal Articles Need Interactive Graphics

Author  Pascal Mickelson

I should have thought of it earlier: In a day and age when we are increasingly reading scientific literature on computer screens, why is it that we limit our peer-reviewed data representation to static, unchanging graphs and plots? Why do we not try to create dynamic visualizations of our rich and varied data sets? Would we not derive benefits in the quality and clarity of scientific discourse from publishing these visualizations?

An article in the very good (and under-appreciated, in my opinion) American Scientist magazine written by Brian Hayes started me thinking about these questions. "Pixels or Perish" begins by recapping the evolution of graphics in scientific publications, noting that before people were good at making plots digitally, they were good at making figures using photographic techniques; and before that, with elaborate engravings. Clearly, the state-of-the-art in scientific publishing is a moving target.

Hayes points out that one of the primary advantages of static images is that everyone knows how to use them and that almost no one lacks the tools to view them. That is, printed images in a magazine or static digital images in the portable document format (pdf) are easily viewed on paper or on a screen and can be readily interpreted by a wide audience. While I agree that this feature is very important, why have we not, as scientists, moved to the next level? We do not lack the ability to interpret data--it is our job to do so--not to mention that we are some of the heaviest generators of data in the first place.

The obstacles to progress towards interactive data visualization are twofold. First, generating dynamic visualizations is not as easy as generating static plots. The tools simply are not as well developed, and they do not show up as frequently in the programming environments in which scientists work. One example Hayes cites is that ideas from programs such as D3 have not yet made an appearance in the software that more scientists use, like R and Matlab. This is one reason why I am so excited by the work that our very own Scott has been doing with this Recology blog, trying to promote awareness of such tools in R.
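That said, a few bridges between R and interactive graphics do already exist. As a small, hedged sketch of what is possible today, the googleVis package can render interactive Google charts directly from R (the Fruits demo data set ships with the package):

require(googleVis)

# build an interactive motion chart from the package's demo data;
# plot() opens the chart in a web browser
M <- gvisMotionChart(Fruits, idvar = "Fruit", timevar = "Year")
plot(M)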

The second is that neither of our currently dominant publishing formats (physical paper and digital pdf files) supports dynamic graphics. Hayes says it better than I could: "…the Web is not where scientists publish…[publications are]…available through the Web, not on the Web." So, not many current publications really take advantage of the new capabilities that the Web has offered us to showcase dynamic data sets. In fact, while Science and Nature--to name just two prominent examples of scientific journals--make HTML versions of their articles available, most of the interactivity seems limited to looking at larger versions of the figures in the articles*. I myself usually just download the pdf version of articles rather than viewing the HTML version. This obstacle, however, is not a fundamental one; it is only the current situation.

The more serious obstacle that Hayes foresees in transitioning to dynamic graphics is one of archiving. Figures in journal articles printed in 1900 are still readable today, but there is no guarantee that a particular file format will survive in usable form to 2100, or even 2020. I do not know the answer to this conundrum. A balance might need to be struck between static and dynamic representations of data. At least in the medium term, papers should probably also contain static versions of figures representing dynamic data sets. It is inelegant, but it could avoid the situation where we lose access to information that was once there.

That said, if the New York Times can do it, so can we. We should not wait to make our data presentation more dynamic and interactive. At first, it will be difficult to incorporate these kinds of figures into the articles themselves, and they will likely be relegated to the "supplemental material" dead zone that is infrequently viewed. But the more dynamic material that journals receive from authors, the more incentive they will have to expand upon their current offerings. Ultimately, doing so will greatly improve the quality of scientific discourse.

* Whether the lack of dynamic data visualization on these journals' websites is due to the authors not submitting such material or due to restrictions from the journals themselves, I do not know. I suspect the burden falls more on the authors' shoulders at this point than the journals'.

Posted in  datavisualization publishing interactivegraphics


Take the INNGE survey on math and ecology

Author  Scott Chamberlain

Many ecologists are R users, but we vary in our understanding of the math and statistical theory behind the models we use. There is no clear consensus on what the basic mathematical training of ecologists should be.

To learn what the community thinks, we invite you to fill out a short and anonymous questionnaire on this topic here.

The questionnaire was designed by Frédéric Barraquand, a graduate student at Université Pierre et Marie Curie, in collaboration with the International Network of Next-Generation Ecologists (INNGE).

Posted in  R math statistics ecology


Scraping Flora of North America

Author  Scott Chamberlain

Flora of North America is an awesome collection of taxonomic information for plants across the continent. However, the information within is not easily machine-readable.

So, a little web scraping is called for.

rfna is an R package to collect information from the Flora of North America.

So far, you can:

  1. get taxonomic names from web pages that index those names;
  2. get daughter URLs for those taxa, which in turn have their own second-order daughter URLs you can scrape, or scrape the first-order daughter page directly;
  3. query Asteraceae taxa for whether they have paleate or epaleate receptacles. This last function is one I needed myself, but more functions like it will be added to retrieve other specific traits.

Further functions will add search and other capabilities.

You can install the package like so:

# install devtools, then rfna from the rOpenSci GitHub account
install.packages("devtools")
require(devtools)
install_github("rfna", "rOpenSci")
require(rfna)

Here is an example in which a set of URLs is acquired using the function getdaughterURLs, and then the function receptacle is used to ask whether each of the taxa at those URLs has paleate or epaleate receptacles.
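Something along these lines should work (the starting URL and argument names here are my illustrative assumptions, not necessarily the package's exact interface):

require(rfna)

# a Flora of North America page indexing some Asteraceae taxa (placeholder URL)
url <- "http://www.efloras.org/florataxon.aspx?flora_id=1&taxon_id=10074"

# collect the daughter URLs linked from that page
daughters <- getdaughterURLs(url)

# ask whether each of the taxa at those URLs has paleate or epaleate receptacles
receptacle(daughters)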

Posted in  R scraping


RNetLogo - A package for running NetLogo from R

Author  Scott Chamberlain

Described in a new Methods in Ecology and Evolution paper here, the R package RNetLogo allows you to run NetLogo from R.

NetLogo is a "multi-agent programmable modeling environment". NetLogo can be used for individual- and agent-based modeling, and is used in the book Agent-based and Individual-based Modeling: A Practical Introduction by Railsback & Grimm.

I have not tried the package yet, but it looks interesting. I am always a fan of running stand-alone programs from R when possible.
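From the package documentation, the basic workflow looks roughly like the sketch below (the NetLogo installation and model paths are placeholders for your own setup):

require(RNetLogo)

# start a headless NetLogo session (installation path is a placeholder)
NLStart("C:/Program Files/NetLogo 5.0", gui = FALSE)

# load one of NetLogo's sample models, initialize it, and run 100 ticks
NLLoadModel("C:/Program Files/NetLogo 5.0/models/Sample Models/Biology/Ants.nlogo")
NLCommand("setup")
NLDoCommand(100, "go")

# pull a value from the running model back into R
turtles <- NLReport("count turtles")

# close the NetLogo session
NLQuit()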

Posted in  NetLogo R


Taking a Closer Look at Peer Review

Author  Pascal Mickelson

This post is only tangentially about open science. It is more directly about the process of peer review and how it might be improved. I am working on a follow-up post about how these points can be addressed in an open publishing environment.

A recent paper on the arXiv got me thinking about the sticking points in the publishing pipeline. As it stands, most scientists have a pretty good understanding of how peer reviewed publishing is supposed to work. Once an author—or more likely, a group of authors—decides that a manuscript is ready for action, the following series of events will occur:

  1. the authors submit the manuscript to the journal of choice;
  2. the journal's editor makes a general decision about whether the article is appropriate for the journal;
  3. in the affirmative case, the editor selects referees for the manuscript and sends them the text for review;
  4. the referees return reviews of the manuscript (the referees are not typically identified to the authors);
  5. the editor makes the decision to reject the manuscript, accept it with minor revisions, or accept it with major revisions. Rejected manuscripts usually start over the process in another journal. Minor revisions to accepted manuscripts are usually made quickly and publication proceeds. In the case of major revisions, the suggested changes are made, if possible, and the manuscript is returned to the editor. At this point, the referees may get a second crack at the material (but not necessarily), before the editor makes a final accept/reject decision based on the feedback from the referees.

Peer review of manuscripts exists for several reasons. For one, self-regulation determines the suitability of the material for publication if it is not already obvious to the editor of the journal. Having peer reviewers also improves the material and its presentation. Furthermore, having expert reviewers lends credibility to the work and ensures that misleading, wrong, or crackpot material does not receive the stamp of credibility. Finally, finding appropriately skilled referees spreads the workload beyond the editors, who may not have the resources to evaluate every paper arriving at their desk.

Though peer review has a storied history, it also has its drawbacks. First, and perhaps foremost, the process is often a slow one, with many months elapsing during even one round of communications between the authors, the editor, and the referees. Peer review is not always an objective process either: referees have the power to delay, or outright reject, work that their competitors have completed, and thus they may lose their impartiality in the process. Additionally, the publishing process does not reveal the feedback process that occurs between authors and referees, which can be a scientifically and pedagogically valuable exchange.

One proposal to address the shortcomings of the peer review process (alluded to in the first paragraph) was posted by Sergey Bozhevolnyi on the arXiv, a pre-publication website for many physics-related manuscripts. Bozhevolnyi calls his model of publishing Rapid, Impartial, and Comprehensive (RIC) publishing. To him, "rapid" means that editors should approve or reject manuscripts before the manuscripts are sent to the referees for review. Then, "impartial" means that referees, who might otherwise have an interest in rejecting a perfectly fine paper, lose the power to dictate whether or not a manuscript is published. Instead, the referees critique the paper without assessing whether it is publication-worthy. Lastly, "comprehensive" involves publishing everything having to do with the manuscript. That is, all positive and negative reviews are published in conjunction with all versions of the manuscript.

The primary benefit of RIC, according to Bozhevolnyi, is that it saves the energies of authors, editors, and referees, thus allowing them all to do more research and less wrangling. Since most papers are ultimately accepted somewhere, we should not cause additional delays in publishing by first rejecting them in multiple places. Instead, collate the manuscript and the reviews and publish them all together, along with any revisions to the manuscript. Having the reviews be publicly viewable will encourage referees to be more careful about writing their critiques and supporting their assertions, and the process as a whole will be more transparent than it currently is.

Before I critique the RIC publishing proposal, I should point out that some aspects of the proposal are very appealing. I particularly like the idea of publishing all reviews in addition to the manuscript. That said, I find it difficult to believe that the incentives for authors and referees change for the better under this proposal. For example, what happens if authors receive feedback, do not wish to invest the time to address the critique, and subsequently allow the original manuscript and the reviews to stand as they are? This situation seems like a moral hazard for authors that does a disservice to the quality of scientific literature. On the part of the referees, does removing decision-making authority make reviewing less appealing? Disempowering the referees by potentially ignoring their critique and only counting it as a minor part of the publishing process will not motivate them to write better reviews. In the case of editors, what makes us believe that an editor, or an editorial board, has the background to properly evaluate the suitability of work for acceptance into a journal? The reason we have referees in the current peer review system is because they have the very expertise and familiarity needed for this task.

Does the fact that Bozhevolnyi's RIC proposal does not make sense mean that peer review is fine as it is? I do not think so. Instead, it is worth asking which parts of peer review we like and which parts we would like to improve. I posit that rejection, or the threat of rejection, is the greatest motivator for authors to make necessary changes to their manuscript. As such, rejection by peers is still the best way to require and receive revisions. Though I think that referees should retain their rejecting power (and their anonymity!), I feel strongly that the entire peer review process would benefit from the increased transparency and accountability that publishing unsigned reviews would add. As for editors, they play a role in shaping the kind of journal they run by selecting appropriate material on a general level, but they should not play too large a role in determining the "important" research in any field. The model used by the journal Public Library of Science One (PLoS ONE) is a promising one in this regard, with the only acceptance criterion being whether the science is sound.

The amount of time that it takes to publish is one of the most frustrating aspects of peer review, however. Journals could voluntarily publish time-to-publication figures, a number which could then be used by authors—along with impact factors and acceptance rates—to decide which journals to submit to. For instance, an editor of the Journal of Orthodontics writes about just this issue in an editorial. A Google search for "journal time to publication" reveals that people have been thinking about this problem for a while (e.g. computer science comparisons), but no general standard exists across journals. In fact, I suspect these are numbers most journals are afraid will hurt them more than help them. Nevertheless, journals acknowledge the demand for rapid publication when they offer services like Springer's Fast Track publishing or Physical Review's Rapid Communications.

Ultimately, it may not matter what journals do, because authors are routing around this problem via pre-publication archives such as the arXiv for physics-related subject matter. Though not without complications, especially in the health sciences (see, for example, "The Promise and Perils of Pre-Publication Review"), pre-publication allows authors to communicate results and establish priority without stressing about getting through the peer review process as fast as possible. Instead, the process takes its normal, slower course while the authors move along with their ongoing research.

I will conclude by leaving an open question that I may address in a future post: how do you encourage peer reviewers to do the best possible job, in a timely manner, without relying solely on their altruism toward doing good science and being good members of the community? It is this question about peer review, I feel, that is most fraught with complication and most subject to the law of unintended consequences if the incentives are changed.

Posted in  thoughts publishing peerreview

