Initial impressions from testing Posit Connect
Two weeks ago I finished testing Posit Connect as a replacement for shinyapps.io. At the end of the trial period, I presented my conclusions during a LIBD rstats club session. Here you can read the detailed notes and watch the resulting video:
The video and notes dive deeper into the technical details, and I also performed some live tests there. Below I summarize the main points about our use case and the solutions we have experience with.
Our use case
Overall, my team and colleagues at LIBD are in a position where we have:
- Over 75 `iSEE` and `spatialLIBD` web apps powered by `shiny` and hosted on `shinyapps.io`.
- These apps have been deployed by 14 different people so far. That is, we have been able to teach new members or students how to deploy apps: anyone can do so with a bit of training.
- Our apps are typically shared publicly, so we don’t really need password-controlled access.
- We don’t share URLs for studies under development, relying on the assumption that no one will correctly guess them.
- Our data is stored in `SummarizedExperiment`, `SingleCellExperiment`, or `SpatialExperiment` objects. The single cell and spatially-resolved transcriptomics data typically involves over 50,000 cells or Visium spots across 20,000 or so genes. Some larger studies have over 100,000 or 200,000 cells/spots. The cells/spots are the columns and the genes are the rows in these large matrices of expression data. Many cells/spots don’t have expression data for a given gene, so the matrices are sparse. For more on these types of objects, check the video below.
- For spatially-resolved transcriptomics data, `SpatialExperiment` objects also contain images per sample (defined as a Visium capture area). These are typically `lowres` images of at most 600 x 600 pixels. One of our largest studies has 120 samples.
  - There are other images that we typically remove from the objects, such as this case where we saved 3.29 GB of memory by dropping non-essential images in a study with 36 samples. The initial `SpatialExperiment` object used 8.81 GB of memory for this study.
- The expression matrices are typically saved as `dgCMatrix` objects from the `Matrix` package. These are great for storing sparse data if you want it all in memory. However, for our larger studies, we typically keep only the matrix with the library-size normalized counts (called `logcounts`) and drop the un-normalized `counts` (see the first sketch after this list).
  - In the same dataset with 36 samples, this meant another 2.6 GB saved in memory as shown here. This resulted in a reduced `SpatialExperiment` object of 2.92 GB in memory for this study, containing only the `logcounts` normalized gene expression data and the `lowres` images.
- For larger studies, we can use `HDF5Array` objects to store the expression data on disk and only load the data into memory when needed (see the second sketch after this list).
  - For a dataset with 120 samples, this results in a 505.20 MB `SpatialExperiment` object in memory that has the `counts`, the `lowres` images, 36,601 genes, and 599,034 spots, as shown here. The files on disk use about 1.5 GB of space: 1.4 GB for the `.h5` file, as noted here.
  - Actually, for this study, we have another `SpatialExperiment` object with `logcounts` and `counts`, no images, 28,965 genes, and 535,248 spots that uses 154.04 MB in memory, as noted here. However, the files on disk grow to 13 GB for the `.h5` file, as shown here. I believe that’s because the `logcounts`, which contain decimal numbers, take much more disk space to store. This same object stored as a sparse in-memory object with the `lowres` images added used 10.69 GB of memory, as noted here. `HDF5Array` is awesome for saving memory, but it can take up a lot of disk space. It’s a great resource developed by Hervé Pagès from Bioconductor’s core team.
  - For more on `HDF5Array` and compressed data, check this video:
- We have observed that `iSEE` and `spatialLIBD` need more memory than what is needed just for loading the object, as reported on the memory usage graphs provided by `shinyapps.io`. That’s because of some internal operations, and while both packages try to minimize memory duplication, it’s best to have more than twice the object size in memory to prevent the apps from crashing: if they are unreliable, people won’t use them.
  - In practice, we try to deploy objects that use up to 3 GB of memory when using the 8 GB memory instances on `shinyapps.io`.
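To make the slimming steps above concrete, here is a minimal sketch, assuming a `SpatialExperiment` object `spe` that has both `counts` and `logcounts` assays plus several images per sample; it is an illustration rather than our exact code.

```r
library("SpatialExperiment")
library("lobstr")

## Memory used by the full object
lobstr::obj_size(spe)

## Keep only the library-size normalized logcounts,
## dropping the un-normalized counts
assays(spe) <- assays(spe)["logcounts"]

## Keep only the "lowres" image for each sample; rmvImg() removes
## images by their sample_id / image_id pair
img_info <- imgData(spe)
for (i in which(img_info$image_id != "lowres")) {
    spe <- rmvImg(
        spe,
        sample_id = img_info$sample_id[i],
        image_id = img_info$image_id[i]
    )
}

## Memory used by the slimmed-down object
lobstr::obj_size(spe)
```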
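Similarly, here is a hedged sketch of the `HDF5Array` route for larger studies, using `saveHDF5SummarizedExperiment()` and `loadHDF5SummarizedExperiment()` from `HDF5Array`; the `"spe_hdf5"` directory name is made up for illustration.

```r
library("HDF5Array")

## Write the assays to disk as HDF5 and get back an object whose
## expression matrices are HDF5-backed, that is, loaded into memory
## only when accessed ("spe_hdf5" is a hypothetical directory)
spe_disk <- saveHDF5SummarizedExperiment(spe, dir = "spe_hdf5", replace = TRUE)

## The in-memory footprint is now much smaller, at the cost of
## potentially large .h5 files on disk
lobstr::obj_size(spe_disk)

## Later (for example, when a shiny app starts), reload the
## HDF5-backed object without reading the matrices into RAM
spe_disk <- loadHDF5SummarizedExperiment(dir = "spe_hdf5")
```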
Use case summary
Overall, we have been able to use `shinyapps.io` for studies with R objects that use up to 3 GB of memory, storing the `logcounts` normalized expression data in sparse `dgCMatrix` objects. In the case of spatially-resolved transcriptomics projects, we only include the `lowres` images. These objects can also be uploaded within the 3 to 5 GB file size limit noted in the official shinyapps documentation. However, we have datasets where this is no longer possible: you either need more memory or the ability to save data in files larger than 5 GB.
Deployment options
We have considered the following options for deploying our apps:
- AWS-hosted shiny-server
  - You choose how much disk space and memory you need on a given AWS instance.
  - You have to install R yourself, along with the open source `shiny-server`.
  - You can only have 1 app deployed per AWS instance.
  - You upload the data and R code for the app yourself.
  - For us, a LIBD system admin has dealt with the AWS and `shiny-server` setup, plus the R updates. Then a second person (myself so far) has had to upload the data and keep the R code up to date. In practice, we rarely update this app.
  - Example: http://spatial.libd.org/spatialLIBD/, which at some point was deployed on an AWS instance with 128 GB of RAM. We wanted to make sure that this app could handle several dozen users.
- shinyapps.io
  - If you have the best account (`Professional`; check their pricing information), you can have apps deployed with up to 8 GB of memory and a max of 3-5 GB of disk space.
  - For our apps, we typically configure them to have 1 R process per instance, with 2 max connections per instance. That is, up to 2 users per instance.
  - Each app can run on up to 10 instances, so you can have up to 20 users per app.
  - You can deploy the same R code and data under two apps (that is, two URLs) or more if you want even more capacity.
  - Deployment is done via `rsconnect::deployApp()`, which has to upload your data every time and install packages too. While compiled R packages have increased the speed of this process, it can easily take 15-30 minutes per app (see the sketch after this list).
  - Example: libd.shinyapps.io/spatialDLPFC_Visium_Sp09 from the spatialDLPFC project.
- Posit Connect
  - You can upload data to the server separately from the R code for the apps.
    - This requires `ssh` access, but that’s fairly easy to implement.
  - It can auto-sync the R code from GitHub repositories every 15 minutes.
  - It can auto-sync changes to the list of R packages, also through GitHub repositories. This involves `manifest.json` files made with `rsconnect::writeManifest()` (see the sketch after this list).
    - It makes it incredibly easy to update many applications at once by just updating the `manifest.json` files.
    - This enables a much faster update process than `shinyapps.io` as we make updates to `spatialLIBD` to add new features or fix bugs.
  - Your server can have 200 GB of disk space and 31 GB of RAM (as the one I tested) or more resources.
    - This unlocks the ability to upload `HDF5Array`-compressed data, which has lower memory requirements but larger disk space requirements.
    - Note that the server resources are shared among all apps on the server!
      - My understanding is that Posit Connect has ways to deal with resources running out on a server, but I haven’t tested this feature.
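For reference, here is a minimal sketch of the two `rsconnect` calls mentioned above; the app directory name is only for illustration.

```r
library("rsconnect")

## shinyapps.io route: deployApp() re-uploads the app's data and
## installs its packages on every deployment, which is why it can
## easily take 15-30 minutes per app
rsconnect::deployApp(appDir = "spatialDLPFC_Visium_Sp09")

## Posit Connect git-backed route: write a manifest.json that records
## the app's R package dependencies; Posit Connect can then re-deploy
## automatically when the GitHub repository (including this file) changes
rsconnect::writeManifest(appDir = "spatialDLPFC_Visium_Sp09")
```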
Conclusions
Overall, I’m hopeful that we’ll get long-term access to a Posit Connect server. It will allow us to deploy our apps with larger datasets and more memory. It will also make it easier to update our apps and keep them in sync with our GitHub repositories. This will be especially useful as we continue to develop new features and fix bugs in our apps. For example, I typically test new `spatialLIBD` features on one dataset. But we now have many datasets, each with different properties, and it’s hard to test them all. Instead, with Posit Connect, I could modify `spatialLIBD`, then easily update several apps on Posit Connect and check whether the new features work as expected. This will make it much easier to maintain our apps and keep them up to date.
If I were to start today, I would use the organization setup that I tested at github.com/LieberInstitute/Posit_Connect_shiny_apps, using only `HDF5Array` objects to take advantage of the memory savings and faster app load times you get with those objects compared to `dgCMatrix` objects (it takes several dozen seconds more to load a ~10 GB object compared to a 500 MB one). With just these changes, I think that our users will be happier with the apps. Simply being able to visualize our largest datasets would be a great win for us.
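To make that load-time difference concrete, here is a hedged sketch of the kind of comparison behind it; the file and directory names are hypothetical.

```r
## In-memory dgCMatrix route: the whole ~10 GB serialized object
## must be read before the app can respond
system.time(spe_mem <- readRDS("spe_sparse.rds"))

## HDF5-backed route: only the small object shell is read; the
## expression matrices stay on disk until accessed
system.time(
    spe_disk <- HDF5Array::loadHDF5SummarizedExperiment(dir = "spe_hdf5")
)
```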
Of course, I don’t have pricing information for Posit Connect yet. But we have reached the limits of what the best `shinyapps.io` accounts offer. I believe that other options exist, but I haven’t tested them yet. From what I’ve seen, Posit Connect works, and I’m very confident that we could teach our team members and collaborators how to use it effectively.
Regardless of where our apps are deployed, we also need to work on a way to document them all and make it easier for others to find them. I’m aware of `iSEEhub`, which provides an app that launches `iSEE` apps from data hosted on `ExperimentHub`. We might need to make something like a `spatialLIBDhub` for our datasets.
PS. The two members of the Posit team with whom I interacted for this test were very friendly and helpful. Thanks again!