15 Writing papers

On 2020-04-06 I presented at how to write papers using Google Docs. The video and notes mostly highlight these tools including Lean Library Workspace (formerly SciWheel and F1000Workspace) and CrossReference.

Most recently I presented at the LIBD rstats club more ideas behind writing papers. Here are the notes.

Here’s a <2 min video on exporting references from Lean Library Workspace (formerly SciWheel).

15.1 Things to keep in mind

While you’ll want to check those videos and notes, a lot of it comes down to being organized and using the right tool for the job.

In more detail, you’ll want to keep track of software versions as you develop your project so you have the information readily available when you write the methods. You’ll want to use big font sizes and consistent colors across your figures so you don’t have to re-make them later. Once you have a journal in mind, you’ll want to check the guidelines from that journal and potentially use their LaTeX template in Overleaf, although for a general journal, Google Docs works quite well and is easier to use by everyone. With both Overleaf or Google Docs (through the Cross Reference addin), you’ll definitely want to use automatic figure, table, equation, and citation numbers. Doing so will enable you to easily re-arrange your figures/tables/equations/citations and avoid having to manually update all numbers which would be very painful. Most journals will also require that you cite figures and tables in order of appearance and that all supplementary materials are referred to in the main text (introduction, results, discussion sections) or even be linked to a main figure/table. Funding is quite important too, so make sure that you check with all co-authors that all their funding sources are listed. Once the authors are defined, you’ll want to have their email, affiliation name(s), affiliation address(es), and ORCiD so you can fill that information in the journal website ²². Similarly, you and the team will want to think about potential reviewers which involves getting their emails and affiliations.

A paper has a lot of components and you’ll need to check that the information provided is consistent, include small things such as (A) vs (a) in the figures. So it takes a while to write a paper and it’s best to subdivide it into many small tasks so you don’t feel overwhelmed. That way you can check off items off your list. If “write the paper” is the item in your list, it’ll feel like it never ends.

15.2 Writing process

When we start a paper, we like to write bullet points for the different sections or results we want to show, as well as write in text the figures we would like to make. This makes it easier to then write one short paragraph at a time and to grow the paper that way while keeping an overall idea of the flow of the paper in mind.

Editing is also easier to do than writing from scratch. So you can expect to receive edits ²³ in the process of writing a paper. Other labs ask the first (co-first) to write a full draft. We prefer to write the paper together, although the lead writers might write most of the first versions for each paragraph while others will mostly edit. Given this style we use, it’s useful to re-read the final draft to make sure that we are consistent throughout the paper. For example, data set vs dataset.

Also, don’t be shy with the methods and supplementary material. Typically, journals allow us to include as many details as we want. That way, you’ll be able to refer back to sections of you paper in the future when people ask you how you did X or Y part of the paper.

In a similar sense, don’t be shy about tagging others in your paper in case you need feedback on a section you just wrote or you need others to help you write. We are a team in the end and are in this together =)

15.3 OneDrive: publication files

In OneDrive ²⁴, use a directory like submission_01 for the first submission as there will be multiple rounds for any paper. Follow this general structure for the publication files:

submission_01
- bioRxiv author information
- Manuscript: for the PDF/docx version of the manuscript and other files required by the journal such as the table describing software and methods used, the cover letter, etc.
- Figures: include a directory for every single Figure such that we can keep all files related to that figure in that directory and use Adobe Illustrator to link the different panels together.
  - Name the directory with the same name you use for CrossReference. For example, fig_overview. if overview is the name you use in CrossReference. That makes it easy to remember what are the names you are using in CrossReference.
- Tables: if the tables are small, consider combining them into a single Excel file where you include one sheet with the table legends. Keep the advice from Karl Broman and Kara Woo in mind (DOI 10.1080/00031305.2017.1375989) when making the table files.
  - In many journals, a table counts the same as a figure in terms of display items. We thus tend to not use tables (only supplementary tables).
- SupplementaryFigures: similar to Figures, you want to have each directory name to include the CrossReference code used for the figure. See the Mendelay Data example mentioned further below.
- SupplementaryTables: similar to SupplementaryFigures.
  - If you have related supplementary tables, you could use an Excel file with multiple sheets.
  - Remember to describe all columns, such that others can understand them without absolutely having to send an email asking about the table: you want to minimize getting lots of emails about them! Though of course, getting the occassional email because something wasn’t clear is ok.
submission_02
- Manuscript: revised manuscript, response to reviewers, cover letter, etc.
- Figures: any revised figures. Sometimes we copy them all if most of them have some modifications or if the numbers changed.
- Tables: any revised tables, similar to Figures.
- SupplementaryFigures: similar to Figures.
- SupplementaryTables: similar to SupplementaryFigures

Unlike version controlled files in GitHub, for all files related to the publication, we will keep multiple copies of them across submission versions.

15.3.1 Examples

See Regional heterogeneity in gene expression, regulation and coherence in hippocampus and prefrontal cortex across development and in schizophrenia. Collado-Torres et al, Neuron, 2019 on Mendeley Data which contains our final Figures/, SupplementaryFigures/ and SupplementaryTables/ directories. We shared all full resolution images as some were large files.

A second example is Figures_Maynard_Collado-Torres_et_al_Nature_Neuroscience_2021.zip that contains the Figures/ and SupplementaryFigures/ directories for that paper. This included large .tiff images with RNAscope results that we thought others might be interested in using at some point, since the versions of the images on the paper itself are not high-resolution enough to extract data from them.

Both Mendeley Data and Figshare give you a DOI that you can refer to in the manuscript of the paper associated with the figures and tables that you are sharing.

15.4 Revisions

When responding to reviewers, make a new Google Doc where the feedback from the reviewers is in bold or some other font, and add placeholders for responses. You can then tag the co-author(s) that can help the most with the response for each point.

As examples, check:

the BrainSEQ Phase II Google Doc with code at LieberInstitute/brainseq_phase2 as well through Zenodo. The final version is available at DOI 10.1016/j.neuron.2019.05.013.
the spatialLIBD project Google Doc with code at LieberInstitute/HumanPilot as well as through Zenodo. The final version is available at DOI 10.1038/s41593-020-00787-0. For an Overleaf example, check the spatialLIBD software paper.

15.5 Details

Here are more details:

Title: check character limits.
Author list: affiliations numbers have to appear in order.
Affiliations: include zip code and country.
Abstract / summary: check word limits, don’t include citations.
Introduction: motivate the problem we are trying to solve and give an short summary of what we did. Remember to define any acronyms.
Results: describe what we found. For how we did things, refer to the appropriate methods section.
Discussion: any speculation goes here as well as potential drawbacks to the work we did. You’ll also want to highlight the main results and conclusions from our work.
Acknowledgements: information about the donors as well as any requests from consortiums like GTEx.
Funding: include full grant IDs and who they supported.
Author contributions: use the CRediT contributor roles taxonomy
Conflicts of interest: in case any authors worked for companies, list that info here.
Figure/table legends: figure/table number and figure/table title should be in bold font. Then the description in plain text. I also like to use (A) instead of (A) whenever the description of a sub-panel begins. Some journals will ask that you also link to all supplementary figure/tables from the main figures. Check also the allowed text length for figure legends.
Methods: use sub-sections for the different parts of the work and sort them in order of appearance from the Results section. Use version numbers and a different font style for software / functions to make it easier to identify those words as names of tools. Cite every single software you used! Don’t interpret results here, that should be in the Results section.
Data and software availability: link to GitHub and cite using Zenodo which provides DOIs for GitHub repositories as shown in this guide. Note that we start most of our GitHub repositories as private, but we make them public by the time we post a pre-print. You’ll want to have a high quality README.md in your repository that shows how the work should be cited and guides visitors to the different components included in the repository. Several journals will require to you also complete more detailed forms such as this Reporting Summary example from one of our papers.

Finally, feel free to sign up for a Data Science guidance session with anyone in the team that recently wrote a paper.

15.6 Styling

Below we describe some styling guides that are useful to write papers uniformly across projects. These are based on our experience across a few journals as well as being aimed at helping our readers digest our results.

Google Docs:

use heading 1 for big section names like Introduction/Background, Results, Methods, etc.
use heading 2 for subsections like specific subsections of the Results. Change the styling such that heading 2 uses an underline.

General styling guides:

For function names or code elements (such as R object names) use code font (Google Docs: Courier New). That will make it easier for readers to notice if a word is linked to a coding concept, as function names can sometimes be English verbs / words that someone who is not familiar with them might confuse for actual English words. That is, use this different font to provide context to the readers.
- Only specify the non-default arguments of the key functions used. In general, there’s no need to spell out the default arguments: the software should document them and readers can find them for specific versions of software we used.
Use italics for software names. This context will again help users differentiate software names from actual English words. Think of recount versus recount. recount is referring to an actual software, whereas recount refers to an English verb.
Show the version number of the software ²⁵ such as “recount3 v1.12.0”. Use just the v prefix instead of typing version, particularly when word count limits are in play.
Cite the software used when the version of it is mentioned (even if was cited previously in a different Methods subsection).
- Check that you are spelling the software name correctly, particularly as capitalization matters quite a bit: it’s not uncommon to find two tools with the same name but different capitalization. Text editors sometimes auto-capitalize the first letter when you start a sentence, even though it should have been lower case. For example: “recount3 v1.12.0 was used for this analysis.” is correct instead of “Recount3 v1.12.0 was used for this analysis.”.
- For R packages you can use citation("packageName") to find the citation information. For Bioconductor packages, you can also see this information at https://bioconductor.org/packages/packageName.
  - Some packages like spatialLIBD might specify more than one citation. You need to manually check which are the correct citation(s) for what you use the package for.
- R functions can specify how to cite specific functions. Check the documentation with ?packageName::functionName to see if such details are specified. See for example ?stats::p.adjust where you can find that the citation for Benjamini and Hochberg’s method multiple testing adjusting via control of the the false discovery rate is from doi: 10.1111/j.2517-6161.1995.tb02031.x.
Use the Oxford comma for iterations, such as “hello, hi, and goodbye” where , highlights the Oxford comma.
Do not use words such as interestingly, remarkably, surprisingly as noted here and elsewhere.
Do not use synonyms for technical terms. Just stick to one term (and maybe one acronym for that technical term) even if this makes the writing a bit boring. Words have many definitions that are context dependent, and if you try to use many synonyms you are likely going to confuse your readers more, even though your goal might have been to keep the readers entertained.
Be consistent: avoid multiple acronyms on the text and figures for the same concept. Just stick to one and make sure the spelling is consistent throughout the paper: you likely will have to double check if others are editing the paper (or adding figures / tables) and introducing this type of typo.
Figure captions should follow a syntax like this:
- Figure 1: Figure title in sentence case. (A) Description of the (A) panel. (B) Description of the (B) panel, which could refer back to the (A) panel without using bold.
- Note how in the description of Figure1B we call back to the (A) panel with regular font, whereas to describe the (A) panel we did use a bold font for the letter A. This use of the bold font for the first mention (aka description) of a panel helps readers visually find where the description of each panel starts. Here are some incorrect formats:
  - Figure 1. That is, there was no title for this figure. (A) Description of the (A) panel. (B) Description of the (B) panel, which could refer back to the (A) panel without using bold.
  - Figure 1: Figure title in sentence case. (A) Description of the (A) panel. (B) Description of the (B) panel, which could refer back to the (A) panel without using bold but now we don’t easily know where (A)’s description starts since we have many mentions of (A).
  - Figure 1: Figure title in sentence case. Description of the (A) panel (A). Description of the (B) panel, which could refer back to the (A) panel without using bold. But then has more information (B). Aka, we no longer know where the description for (B) starts. It’s best to have the panel in bold at the beginning of its description than at the end.
- Keep the figure titles short: ideally 1 line, max like 1.5 lines long.
Supplemental figures or tables should refer back to a main figure and/or table at the end of the descriptions.
- Example: “Supplementary Figure 1: supp fig title. Supp fig 1 description. Related to Figure 1.”
- Using this “Related to …” syntax will help users easily build a mental network of how the different display items are related to each other. That is, it will help them navigate through the main / supplementary display items in the paper.
- On the reverse, main display items can also use a “See also …” syntax as was the case in the BrainSEQ Phase II paper.
For supplementary tables, think about it from a reader / user perspective: “is this data that I would like to be able to easily access?” If the answer is yes, then you should add a supplementary table with that information.
- As a bonus, on GitHub, you likely want to document where each supplementary table was made as showcased in the living brain re-analysis study.

Let us know if we are missing any styling details that we haven’t documented!

15.7 Promoting our work

If you are new to Bluesky or LinkedIn or Twitter/X or other (work) social media platforms for sharing your work, you should check out this excellent talk by Mara Averick at RStudio Conf 2018 on Phrasing: Communicating data science through tweets, gifs, and classic misdirection.

On several platforms, you can draft your new messages (even a chain of them typically known as a thread or tweetorial 🧵) which is very useful to see how each platform is counting the number of characters you can use. Particularly, different platforms count differently how many characters are needed for a link. It’s best to draft messages using each platform and save your drafts before asking anyone in the team (through a DSgs) for help reviewing a message before you make it public. Typically, you cannot edit messages and thus cannot fix typos that easily (unless you delete the message and re-post it), so you should take your time to review carefully any message you post.

Use of emojis is encouraged 🙌🏽. Using photos / screenshots or gifs can also help make your message more visually appealing. You should also share about your own personal experience from the project: what you liked, what you learned, and what you are looking forward to. It doesn’t all have to be about science. In general, keeping a positive tone is best when promoting your work. Use hashtags for common concepts others might use to group messages (search them first as they could be NSFW words you didn’t know about!). Ask around if you are unfamiliar with any appropriate hashtags. Some generic ones are #RStats and #AcademicSky. Tag accounts from other co-authors or other entities involved: aka, share the credit and encourage others to re-share your messages. You can find several of our work social media accounts in the Google Sheet with information about our team members.

As your first message on a thread / tweetorial 🧵 is the one that gets shared the most, it’s best to include the link to the bioRxiv / medRxiv / arRxiv pre-print or the journal DOI on that first message. Doing so will help the AltMetric score of your paper. See https://biorxiv.altmetric.com/details/142601373 for an example. We have sometimes quoted Tweets, blog posts, and/or AltMetric scores of our pre-prints on the cover letters to journal editors as examples of interest from our peers in our work.

To further help spread the word, ask your co-authors to quote-tweet (aka include a link) your first message in your thread / tweetorial. That way, they don’t have to include the link to the pre-print / journal since you already have that link on your first thread message. Co-authors can then write their own threads / tweetorials 🧵 to highlight their specific contributions and experience with a project. That really helps highlight the diversity of our contributions and expertise, plus what the project meant for everyone.

Check https://lcolladotor.github.io/cv/#publications for example of previous Twitter summaries for papers (aka threads / tweetorials 🧵) if you want to see some specific examples. Some of them are:

spatialDLPFC by Louise https://twitter.com/lahuuki/status/1626686409313763328 which was quote-tweeted by several co-authors https://twitter.com/lahuuki/status/1626686409313763328/quotes.
habenulaPilot where Leo shared on Twitter/X, Bluesky, and LinkedIn to reach different co-authors and audiences. Not all co-authors or people whom Leo acknowledged use all social media platforms, which you can notice as only some co-authors are tagged in each social media platform.

If you are promoting a LIBD rstats club video, make sure to tag it so it can re-share your work. Aka https://twitter.com/LIBDrstats or https://bsky.app/profile/libdrstats.bsky.social.

When in doubt, sign up for a DSgs with those in the team whom have promoted our work in the past! Just have your draft message(s) ready.

Note that bioRxiv has an author information template table that you can use.↩︎
Likely using suggestion mode in Google Docs. Also note that both Google Docs and Overleaf allow you to name versions, which makes it easier to refer back to them and compare against them.↩︎
In the past we used Dropbox, but because OneDrive is free through LIBD, we switched to it.↩︎
For software with no version numbers but that is available from GitHub, use the SHA from the Git commit identifying the exact version you used. For example, “SPEAQeasy v51ffb22” for this specific version of SPEAQeasy.↩︎