The OpenScPCA Project: What We've Built Together in Year One
The Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project is one year old, and there is much to celebrate! For the past year, we’ve worked closely with pediatric cancer experts to analyze data from the ScPCA Portal, improving its utility for researchers everywhere. Our focus has been on adding reliable cell type annotations across samples on the Portal, but the journey has been much more than that.
As we’ve collaborated with an expanding community of contributors, we’ve gained valuable insights that have led to the development of new features, making it easier than ever to participate in OpenScPCA. We've also reached a point where the broader research community can begin using our analysis modules to fuel their own experiments. In this post, we’ll look deeper at these milestones and share new ways for contributors to join this ongoing effort.
Developing features to accelerate contributions
From a researcher’s initial idea to their final submission, careful review and ongoing feedback from the Data Lab team are integral to the contribution process. Through this approach, we’ve identified commonalities and trends across planned and active analysis modules, highlighting areas for improvement and uncovering opportunities to support contributors better. Here are some examples of modules we’ve developed to help accelerate contributions!
doublet-detection
Many contributors wanted to filter doublets in their proposed data processing pipelines. Ideally, to ensure consistency among analysis modules, we would like contributors to use the existing processed ScPCA datasets as a starting point. However, our ScPCA pipeline was not previously set up to detect or filter doublets. We decided to create the [.inline-snippet]doublet-detection[.inline-snippet] module!
We put this module into production in [.inline-snippet]OpenScPCA-nf[.inline-snippet], the pipeline that generates results for mature modules. With the [.inline-snippet]doublet-detection[.inline-snippet] module, we ran [.inline-snippet]scDblFinder[.inline-snippet] across all ScPCA samples and made these results available to contributors.
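As a rough illustration of how a contributor might use per-cell doublet calls, here is a minimal Python sketch that keeps only the cells called singlets. The table layout, the column names (`barcodes`, `score`, `class`), and the function name `singlet_barcodes` are assumptions made for this example, not the actual format of the module's results files.

```python
import csv
import io

# Hypothetical example table: one row per cell with a doublet score and call.
# The real results files may use different column names.
EXAMPLE_TSV = (
    "barcodes\tscore\tclass\n"
    "AAAC\t0.02\tsinglet\n"
    "AAAG\t0.97\tdoublet\n"
    "AACT\t0.10\tsinglet\n"
)


def singlet_barcodes(tsv_text, class_column="class"):
    """Return the barcodes of cells called singlets, in file order."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["barcodes"] for row in reader if row[class_column] == "singlet"]


print(singlet_barcodes(EXAMPLE_TSV))
```

The resulting list of barcodes could then be used to subset an expression object before downstream analysis.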
rOpenScPCA
We also found that many researchers wanted to use clustering to explore and/or validate cell type annotations. However, many different approaches for clustering exist, and we know from previous experience that getting reliable clusters is challenging. We also wanted a platform to provide functions to support other common analyses. We decided to write an R package called [.inline-snippet]rOpenScPCA[.inline-snippet] to offer a set of consistent functions for contributors!
So far, we've added functions in this package to perform and evaluate graph-based clustering, map Ensembl gene identifiers to gene symbols, and convert SingleCellExperiment objects to Seurat objects. We’ve even created usage examples for [.inline-snippet]rOpenScPCA[.inline-snippet] clustering functions, available in our [.inline-snippet]hello-clusters[.inline-snippet] analysis module. We plan to expand this package with more functions as we learn from future contributors!
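To make the identifier-mapping idea concrete, here is a small Python sketch of converting Ensembl gene identifiers to gene symbols. It is not the [.inline-snippet]rOpenScPCA[.inline-snippet] implementation (which is written in R); the function name `ensembl_to_symbols`, the lookup table, and the choice to keep the Ensembl identifier when no symbol is known are all assumptions for illustration.

```python
# Hypothetical lookup table; a real analysis would use a full gene annotation.
GENE_TABLE = {
    "ENSG00000141510": "TP53",
    "ENSG00000012048": "BRCA1",
}


def ensembl_to_symbols(ensembl_ids, gene_table):
    """Map Ensembl IDs to gene symbols, keeping the ID when no symbol is known."""
    return [gene_table.get(gene_id) or gene_id for gene_id in ensembl_ids]


print(ensembl_to_symbols(["ENSG00000141510", "ENSG00000000000"], GENE_TABLE))
```

Falling back to the original identifier means no gene is silently dropped from the object, which is one reasonable design choice among several.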
seurat-conversion
Previously, all ScPCA data were provided as [.inline-snippet]SingleCellExperiment[.inline-snippet] objects. But we expected that some contributors would prefer to work with [.inline-snippet]Seurat[.inline-snippet] objects. We recently made [.inline-snippet]Seurat[.inline-snippet] files available for all processed objects from the ScPCA Portal using the steps outlined in the [.inline-snippet]seurat-conversion[.inline-snippet] module. The [.inline-snippet]Seurat[.inline-snippet] objects are not directly available for download from the Portal, but contributors can quickly obtain them from the OpenScPCA project results bucket. Now, contributors can directly start their analysis from a Seurat object!
cell-type-consensus
The [.inline-snippet]cell-type-consensus[.inline-snippet] module generates an ontology-aware consensus cell type label for every cell across all ScPCA samples. We obtain cell type annotations for samples from the ScPCA Portal using [.inline-snippet]SingleR[.inline-snippet] and [.inline-snippet]CellAssign[.inline-snippet]. When both methods agree on the cell type annotation for a cell, this module creates a unified consensus label. Contributors can easily access a TSV file from the workflow results bucket, which includes the consensus cell type labels for all samples within an ScPCA project.
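One way to picture an ontology-aware consensus is the simplified Python sketch below. It assumes a tiny, made-up ontology stored as a child-to-parent map and treats two labels as agreeing when they match exactly or share an ancestor more specific than the root. The ontology, the function names, and the "Unknown" fallback are hypothetical, and the module's actual rules may differ.

```python
# Hypothetical toy ontology (child -> parent); not the real Cell Ontology.
TOY_PARENTS = {
    "T cell": "lymphocyte",
    "B cell": "lymphocyte",
    "lymphocyte": "leukocyte",
    "macrophage": "leukocyte",
    "leukocyte": "cell",
    "endothelial cell": "cell",
}


def lineage(label):
    """Return the label followed by its ancestors, nearest first."""
    chain = [label]
    while chain[-1] in TOY_PARENTS:
        chain.append(TOY_PARENTS[chain[-1]])
    return chain


def consensus_label(singler, cellassign, root="cell"):
    """Return a shared label for one cell, or "Unknown" if the methods disagree."""
    if singler is None or cellassign is None:
        return "Unknown"
    if singler == cellassign:
        return singler
    other_lineage = set(lineage(cellassign))
    # Walk up from the first label and stop at the nearest shared, non-root term.
    for term in lineage(singler):
        if term in other_lineage and term != root:
            return term
    return "Unknown"


print(consensus_label("T cell", "B cell"))
```

In this sketch, a cell labeled "T cell" by one method and "B cell" by the other receives the broader shared label, while labels that only meet at the ontology root are left unassigned.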
OpenScPCA wouldn’t be possible without the researchers who have contributed so far. We’ll continue expanding features that enable contributors to work more efficiently and to improve their overall experience!
Delivering results for the pediatric cancer community
Creating publicly available cell type reference datasets across 55 pediatric cancer types is an ambitious goal, but we’re making progress! Researchers have submitted 14 analysis proposals, with four of them (so far) resulting in completed analysis modules. In addition to supporting contributors, the Data Lab team is also contributing our own analysis ideas.
🚀 Even if you’re not an active OpenScPCA contributor, you can still use these results in your downstream analysis! Anyone can download data and cell types from the ScPCA Portal and run the code by following along with the OpenScPCA module you’re interested in. Here are some analysis modules developed to assign cell types to an ScPCA project!
cell-type-ewings
If you read our last blog post by Data Scientist Ally Hawkins, PhD, you’ve heard about the [.inline-snippet]cell-type-ewings[.inline-snippet] module she developed as a reference for future OpenScPCA contributors. This was also the Data Lab’s first attempt at annotating cell types for an entire ScPCA project! Ally has now wrapped up this module, which assigns cell types, including normal cell types and tumor cell states, to all Ewing sarcoma samples in [.inline-snippet]SCPCP000015[.inline-snippet]. Explore the [.inline-snippet]cell-type-ewings[.inline-snippet] module!
Community-contributed modules
Last year, we awarded grants to three OpenScPCA contributors who assigned cell types to four ScPCA projects. These researchers openly analyzed data from acute lymphoblastic leukemia (ALL) and Wilms tumor and made their results available in the [.inline-snippet]OpenScPCA-analysis[.inline-snippet] repository.
Early T cell Precursor Acute Lymphoblastic Leukemia (ETP T-ALL) Annotation
- Project ID: [.inline-snippet]SCPCP000003[.inline-snippet]
- Number of samples: 30
- Contributed by: Jui Wan Loh, PhD, UT Southwestern Medical Center
- Explore this module
Non-ETP T-ALL Annotation
- Project ID: [.inline-snippet]SCPCP000003[.inline-snippet]
- Number of samples: 11
- Contributed by: Jui Wan Loh, PhD, UT Southwestern Medical Center
- Explore this module
Wilms Tumor Annotation
- Project ID: [.inline-snippet]SCPCP000006[.inline-snippet]
- Number of samples: 40
- Contributed by: Maud Plaschka, PhD, St. Anna Children's Cancer Research Institute (CCRI)
- Explore this module
Wilms Tumor Annotation
- Project ID: [.inline-snippet]SCPCP000014[.inline-snippet]
- Number of samples: 10
- Contributed by: Jingxuan Chen, PhD, UT Southwestern Medical Center
- Explore this module
Check out GitHub Discussions to see more analysis ideas that researchers have proposed or are currently working on!
Awards for sharing your expertise!
We are thrilled with the outcomes from our first year of OpenScPCA, but we are just getting started! We know there are more ways for researchers to contribute their expertise, so we have relaunched grant opportunities by adding a new and unique category.
Introducing the Advisor Award:
Do you want to contribute your expertise without having to complete an analysis? Researchers can become eligible for a grant by submitting a detailed protocol for cell type annotation for all diagnoses assigned to a specified group of samples from the ScPCA Portal. You will explain step-by-step how you would perform cell type annotation, and the Data Lab team will follow your guidance to complete the analysis! Get started by filing a GitHub Discussion using the template to submit a detailed protocol.
(Re)Introducing the Analyst Award:
Build your analysis portfolio while learning reproducible research practices and working directly with ScPCA data! Researchers will complete cell type annotation for a specified group of samples from the ScPCA Portal. The researcher must complete cell type annotation analysis openly and collaboratively by contributing their analysis to the OpenScPCA project. Get started by filing a GitHub Discussion using the template to propose a new analysis.
Important Dates:
- Advisor Awards: The final protocol must be approved by 5 PM EDT on October 31, 2025.
- Analyst Awards: The final table and Description section of the README must be approved by 5 PM EDT on October 31, 2025.
- Application Submission Deadline, if eligible: 5 PM EDT on November 14, 2025 or within ten days of notification of eligibility, whichever is sooner.
Visit our full announcement below for more details including the sample table, grant amounts, and eligibility requirements.
Questions? Email us at scpca@ccdatalab.org or post in the questions category on GitHub Discussions!