- Heather Caslin
Understanding scientific papers: the audience and what's inside!
With the huge rush to understand COVID-19, develop a vaccine, and develop therapeutics, we've seen a huge rush to publish data within the scientific community and a huge rush to report it to the public since the beginning of the pandemic. And the urgency makes sense, because this virus is new and we need to communicate accurate health advice and make the best possible decisions in order to preserve life and our livelihoods. We are still making major decisions like whether or not to mandate masks, keep industries closed (like the concert industry), or put kids in schools vs. lose childcare and socialization, and we are still debating how we can return to normal-ish to keep our economy chugging along vs. how badly the virus will spread. We need to use all of the most up-to-date data available to do so.
But is reporting all of the data as fast as possible always the best option?
Only if we fully report and understand what the data means.
Individual data and scientific articles add to our overall knowledge, but alone they are just a small piece of the puzzle (as Samantha Yammine demonstrated here). Different study designs and study populations have their own benefits and limitations, and a combination of different study designs and reproducibility is important for complete understanding. You are seeing this play out in action now. Incremental knowledge is gained with each new study on COVID-19, but this doesn't always look like a straight path. Sometimes the data contradict other data, and sometimes we learn that we actually weren't certain about something once it is examined from a different angle. The research process really looks more like a game of Chutes and Ladders than a walk down the yellow brick road in The Wizard of Oz, and it often takes years to decades, depending on interest, financial support, and just how complicated a virus is to understand and treat.
In support of science, we've learned a lot really fast, and the core premises are all still true: COVID-19 is fairly transmissible, COVID-19 has a morbidity and mortality rate that should categorize it as a serious threat, COVID-19 requires widespread action to mitigate (in our individual lives and at the federal level), and our political, economic, and healthcare systems were not prepared for a widespread outbreak.
So why does it seem like the advice and the science are constantly changing?
You can blame some of this on communication, both from scientists and from journalists, and partly on the fact that we've been reporting results really early. When a scientist writes a scientific paper, we are writing it for our peers: the people who are in our field of research, already know most of the background, and are familiar with the methods and potential limitations. This is often the reason that jargon is used so heavily, and the reason that we assume a lot is known on a topic when writing the paper. And as scientists, we are also more likely to be recognized for our work and more likely to get grants and promotions if we can get more people to read and cite our papers, so we write our papers to communicate how exciting and important the results are. Scientists have also moved towards uploading their papers online as "pre-prints" before they've been peer reviewed, that is, critiqued by other researchers. So with COVID-19 research, you're seeing a rush to collect data and a rush to publish, which sometimes means that we're publishing incomplete stories or publishing mostly correlative data without knowing causal factors. And when a journalist writes an article, they are hoping to be the first to cover a hot new topic or take, and pre-prints are especially new and hot. While there are pros to why and how scientists and journalists communicate research, you can probably see how the publicized conclusions may be less than accurate when they are fairly preliminary, when they leave out assumed limitations, and when every story is sold as a hot topic. Science communication is often like a game of telephone, but with each person having different interests and different stakes in the message being spread. Samantha Yammine has more information about science communication in the era of COVID-19 here.
So what should we do with this information? Let me first say that you should always be wary of pre-prints. Continue to follow up and see if the article actually makes it through peer review (a process that can take many months to a year or more) and if the resulting conclusions remain the same. Many will, but a few may not.
I will also quickly break down a scientific article and what you should expect to find in all of them, both pre-prints and published articles. While the journals that publish scientific research often have their own standard formats, there aren't a lot of required practices in scientific writing, so some of this will differ by journal or field or even by author. There's also a large variation in how we write. Ten or twenty years ago, it was more common to see passive voice and third person, but we have shifted to more active and first person language, which does improve readability! Style is very individualized based on author preferences, so you'll find a lot of variation the more you read. But generally, you can expect to find the sections below.
The abstract is the summary of the paper. It should tell you the main question and main conclusions so that you can decide whether to spend the time to read the paper. This is helpful because we're often reading hundreds of papers to truly understand what's known in a field, and it can be a lot of work to read additional papers that aren't directly relevant to our interests and questions. It's also assumed that if you want to cite a paper or discuss the data or conclusions, you'll read the whole paper, so limitations and how the data fit in with the other data in the field are rarely discussed here.
The introduction is the best place to go to find out what research question the author set out to answer. A good introduction should introduce you to what's known, what's unknown, and why it's important to study. Research often aims to fill in a gap in knowledge in a way that incrementally or substantially advances our understanding of the field. For this reason, introductions either introduce the reader to a debate over contradictory data in the field, or build up to their question of interest with a series of earlier studies. Either way, some contradictory data and controversy are often left out of the introduction to set up a clear question, so expect to dig further into that in the discussion!
The methods should provide enough information about how experiments were completed for other scientists to understand and replicate the data. However, what's included is again dependent on the authors and the reviewers (who approve the paper for publication and can ask for more information in areas where they think it is needed). Important information in the methods includes the subject population or samples (population of humans, type of mice, type of cells, etc.), the type of research (basic science, clinical trial, retrospective study), the treatments or experimental groups (dose, length of treatment), and the primary outcomes measured. Most of the other nitty gritty details will be specific to the methods and analyses used, but unless you know the techniques or statistical analyses, you shouldn't worry about them too much on the first pass reading the paper. Because the methods and analyses could have major limitations though, I often look on Twitter to see if experts are discussing any of the limitations if I don't know them myself (at least for a brand new paper; it's harder to find public commentary on papers older than a few weeks).
The results and associated figures and graphs are probably the most important part of a paper. The information provided will depend on the field though. Some fields just list the statistical analyses for the different data reported, while others provide more background, context, and conclusions. I personally like seeing the data listed in the context of the question it answers and the information it provides, but the data isn't less accurate when not reported that way. And either way, it's important to double check if the data actually match the author's conclusions and if the differences are statistically significant and clinically relevant. For example, a change of 5% in the immune cell messenger IL-6 would be rather small, as it's not highly expressed during homeostasis (and it regularly ranges from 0 to a few hundred or thousand with infection), but a change of 5% in HbA1c, which is used to detect diabetes, is very large (as a healthy number is below 5 or 6%, diabetes is diagnosed above 7%, and values can range up to ~12%). I would also look for controls and pay attention to the axes (especially the y axis) on each graph. More often than not, there's a good reason for normalizing data and using the variables chosen for graphing, but there may still be limitations. This page has a few introductory discussions of misleading axes, which provide examples of what to look out for.
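To make the IL-6 vs. HbA1c comparison a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names are made up for this example, the IL-6 resting value of 2 (arbitrary units) and the starting HbA1c of 5.5% are hypothetical, the HbA1c change is read as 5 percentage points, and the ranges and the 7% cutoff are the rough ones quoted above.

```python
def relative_change(baseline, percent):
    """Return the value after changing `baseline` by `percent` percent of itself."""
    return baseline * (1 + percent / 100)


def fraction_of_range(change, low, high):
    """Return the share of the span from `low` to `high` that a change covers."""
    return abs(change) / (high - low)


# IL-6: hypothetical resting value; the text gives a range of 0 to roughly 1000.
il6_baseline = 2.0
il6_after = relative_change(il6_baseline, 5)  # a 5% relative change
il6_share = fraction_of_range(il6_after - il6_baseline, 0, 1000)

# HbA1c: hypothetical healthy starting value, raised by 5 percentage points.
hba1c_baseline = 5.5
hba1c_after = hba1c_baseline + 5
hba1c_share = fraction_of_range(5, 5, 12)  # rough 5% to 12% span from the text
crosses_cutoff = hba1c_baseline < 7 <= hba1c_after  # 7% cutoff as quoted above

print(f"IL-6 change covers {il6_share:.4%} of its range")
print(f"HbA1c change covers {hba1c_share:.0%} of its range")
print(f"HbA1c change crosses the diabetes cutoff: {crosses_cutoff}")
```

The point of the sketch is that the same "5%" can be a sliver of one measurement's range and most of another's, so the size of a change only means something next to the typical values for that measurement.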
Finally, the discussion section should discuss the data within the context of the field. It should dive deeper than the results section, even if the results provide context, and limitations and future studies should be proposed here. That being said, authors have a lot of creative freedom to choose what to say and how to frame their data within the field. Basic limitations may not be included, as they are often assumed, such as a) these results were found in mice and need to be studied in humans, b) these results are specific to the subjects we recruited and need to be studied in other populations, and c) these results are specific to the conditions we used or measured and should be supported by additional data. So I would just be aware that even the discussion may not frame the data within the entire view of the field. Scientists often disagree, and there's a lot of data out there to discuss!
So for anyone interested in learning more about a study that you see in the news, or on social media, or find when doing your own research on the topic, I would be sure to at least browse the paper itself, even if others have already provided a summary. Look to be sure that the basic question being studied and the data provided match the conclusions reported. Look at the subjects or samples (humans, mice, cells) and the experimental conditions. Look to see how the data fit into what's already known (and maybe do your own quick search here too), as it's possible that the original authors left out a few papers or that new work has been published since the initial publication.
And if you're a scientist or an editor of a scientific journal, consider writing or requiring a short summary statement that's free from jargon and helps others interpret the significance of your research in relationship to the field and the unknowns (tip by Samantha Yammine as well!).
Happy reading! And happy science communicating!