Diving into cell type annotation: Insights from the OpenScPCA project
Launching the Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project in April 2024 was a highlight of our year! This community-driven initiative aims to analyze data from the ScPCA Portal, which currently holds 700 samples from 55 pediatric cancer types. The project is a step forward in advancing our knowledge of pediatric cancers through single-cell analysis, and we're excited to expand OpenScPCA in 2025! To that end, we're reflecting on some of our recent accomplishments and how we can keep that momentum going into next year.
One of our main goals is to improve the utility of the ScPCA data, with a particular interest in adding reliable cell type annotations across samples in the Portal. To complement the Portal's cell type annotations from automated methods, we launched our first call for contributions to bring expertise from the pediatric cancer research community to the cell type annotation process. Since then, several contributors have joined OpenScPCA and proposed different analysis frameworks for annotating cell types. Through working with new contributors, we've learned a lot about how the research community approaches cell type analysis, as well as ways we can improve OpenScPCA and better support future contributions. In this blog post, we'll share a little about what we learned from these interactions, including a few successes we had along the way!
We learned how researchers approach cell type annotation
New OpenScPCA contributors are encouraged to begin analyses by filing a Discussion post outlining how they anticipate performing their analysis. This gives us a chance to offer some initial feedback before they dive in, including both logistical guidance about contributing to OpenScPCA and scientific discussion about their planned approach. This also gives us a broad sense of the approaches used by folks in the pediatric cancer research community for single-cell data processing and cell type annotation. We observed several trends throughout these discussions.
Most proposed analyses involved a multilayered approach using a variety of tools. Commonly, researchers proposed to first perform some form of cell type annotation using label transfer, reference-based, or marker-gene–based methods, followed by a secondary analysis to distinguish between malignant and normal cells. Many researchers planned to use copy-number variation (CNV) inference methods for this task (we talk more about this later!), while others proposed to distinguish certain normal cell populations based on their degree of mixing with other cells, reasoning that normal cells are relatively more homogeneous than malignant cells.
We also learned how researchers think about validating their cell type labels. Some researchers, for example, proposed using marker-gene expression to validate cell types obtained from a complementary cell annotation approach like label transfer. Others suggested using clustering and visualization (either before or after integrating samples together) to identify and re-annotate putatively mislabeled cells.
We were particularly excited that many contributors hoped to bring their own lab's domain expertise in a given cancer type to OpenScPCA! For example, several contributors already had a set of putative marker genes for cell types they expected to see in a given sample. We think that bringing this information to an open science framework like OpenScPCA is a great way to accelerate research in the community more broadly, and to efficiently establish a reliable list of genes for use with marker-gene–based cell type annotation approaches.
Above all, working with external contributors helped us get a pulse on how the research community is thinking about cell type annotation, which will help us tackle even more projects in the future.
We developed ideas for future work in OpenScPCA
Seeing how researchers approach cell type annotation also gave us some analysis ideas! As mentioned earlier, several researchers proposed to distinguish between malignant and normal cells using copy-number variation (CNV) detection tools (e.g., [.inline-snippet]inferCNV[.inline-snippet] and [.inline-snippet]copyKAT[.inline-snippet]), with the assumption that cells with CNV are more likely to be malignant, and cells without CNV are more likely to be normal. When reviewing results from these methods, we gained a deeper appreciation for the challenges that CNV-based methods present when working with pediatric cancer genomes, which are relatively quiet compared to adult cancer genomes. In addition, these tools often require, or at least benefit from, a set of known normal cells to use as a reference.
Across both planned and active modules, a common approach that contributors considered to identify a set of normal cells was to use results from existing cell type annotations. Specifically, they planned to use putatively normal cells (e.g., cells previously annotated as endothelial or immune) as input to CNV detection methods. We wondered whether we could make identifying normal cells more efficient by leveraging existing results in the ScPCA Portal. We had previously performed automated cell type annotation on all samples using two complementary tools: the reference-based method [.inline-snippet]SingleR[.inline-snippet], and the marker-gene–based method [.inline-snippet]CellAssign[.inline-snippet]. Perhaps we could use the results from these automated methods to identify putatively normal cells that could be provided as a normal reference to CNV detection methods.
To this end, we've opened a GitHub Discussion that outlines our scientific goals of using consensus results from [.inline-snippet]SingleR[.inline-snippet] and [.inline-snippet]CellAssign[.inline-snippet]: we plan to first identify cells where these methods agree, and then determine whether any of those cell types are unlikely to be malignant. Those cells would be indicated for use as a normal reference for CNV inference methods. We expect this analysis will take some time and careful thought, but in the end, we'll gain a deeper understanding of how existing cell type annotations relate to one another and how existing ScPCA data can inform future analyses. We're still in the early stages of this analysis, but you can follow along with our progress in the [.inline-snippet]cell-type-consensus[.inline-snippet] module!
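To make the idea concrete, here is a minimal Python sketch of the two steps described above: keep cells where both methods agree, then keep only agreed labels that are unlikely to be malignant. This is an illustration under several assumptions and not the implementation in the [.inline-snippet]cell-type-consensus[.inline-snippet] module. The function name, the barcode-to-label dictionaries, and the list of putatively normal cell types are all hypothetical. In practice the two tools draw labels from different references, so agreement would need a mapping between vocabularies rather than the simple case-insensitive string match used here.

```python
from typing import Dict, Iterable, Optional

# Hypothetical set of cell types treated as unlikely to be malignant.
# Illustrative only; a real list would need per-cancer-type review.
PUTATIVE_NORMAL_TYPES = ("endothelial cell", "T cell", "B cell", "macrophage")


def _normalize(label: Optional[str]) -> Optional[str]:
    """Lowercase and trim a label; treat missing or empty labels as None."""
    if label is None:
        return None
    cleaned = label.strip().lower()
    return cleaned or None


def consensus_normal_cells(
    singler_labels: Dict[str, Optional[str]],
    cellassign_labels: Dict[str, Optional[str]],
    normal_types: Iterable[str] = PUTATIVE_NORMAL_TYPES,
) -> Dict[str, str]:
    """Return {barcode: label} for cells where both methods agree on a
    label that is in the putatively normal set.

    Assumption: both inputs map cell barcodes to labels that share one
    vocabulary, so agreement is a case-insensitive exact match.
    """
    normal = {t.strip().lower() for t in normal_types}
    selected: Dict[str, str] = {}
    for barcode, raw_singler in singler_labels.items():
        s_label = _normalize(raw_singler)
        c_label = _normalize(cellassign_labels.get(barcode))
        # Step 1: both methods must assign the same label.
        if s_label is None or s_label != c_label:
            continue
        # Step 2: the agreed label must be unlikely to be malignant.
        if s_label in normal:
            selected[barcode] = s_label
    return selected
```

The barcodes returned by a function like this could then be passed as the normal reference to a CNV inference method. Cells where the methods disagree, or where either method gives no label, are simply left out of the reference rather than being called malignant.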
Do you have ideas for cell type annotation? Interested in using any of the results or tools we've introduced here? We invite you to explore and contribute your ideas to OpenScPCA!
- Check out the OpenScPCA-analysis repository on GitHub.
- Join the conversation on GitHub Discussions.
- Let us know you’re interested by filling out the OpenScPCA intake form.