Diving into cell type annotation: Insights from the OpenScPCA project

December 17, 2024

STEPHANIE SPIELMAN

Launching the Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project in April 2024 was a highlight of our year! This community-driven initiative aims to analyze data from the ScPCA Portal, which currently holds 700 samples from 55 pediatric cancer types. The project is a step forward in advancing our knowledge of pediatric cancers through single-cell analysis, and we're excited to expand OpenScPCA in 2025! To that end, we're reflecting on some of our recent accomplishments and how we can keep that momentum going into next year.

One of our main goals is to improve the utility of the ScPCA data, with a particular interest in adding reliable cell type annotations across samples in the Portal. To complement the Portal's cell type annotations from automated methods, we launched our first call for contributions to bring expertise from the pediatric cancer research community to the cell type annotation process. Since then, several contributors have joined OpenScPCA and proposed different analysis frameworks for annotating cell types. Through working with new contributors, we've learned a lot about how the research community approaches cell type analysis, as well as ways we can improve OpenScPCA and better support future contributions. In this blog post, we'll share a little about what we learned from these interactions, including a few successes we had along the way!

We learned how researchers approach cell type annotation

New OpenScPCA contributors are encouraged to begin analyses by filing a Discussion post outlining how they anticipate performing their analysis. This gives us a chance to offer some initial feedback before they dive in, including both logistical guidance about contributing to OpenScPCA and scientific discussion about their planned approach. This also gives us a broad sense of the approaches used by folks in the pediatric cancer research community for single-cell data processing and cell type annotation. We observed several trends throughout these discussions.

*Part of an analysis proposal from an OpenScPCA contributor.* *View the entire proposal on GitHub Discussions.*

*The Data Lab provides initial feedback and helps the contributor develop their analysis plans.* *View the entire discussion*.

Most proposed analyses involved a multilayered approach using a variety of tools. Commonly, researchers proposed to first perform some form of cell type annotation using label transfer, reference-based, or marker-gene–based methods, followed by a secondary analysis to distinguish between malignant and normal cells. Many researchers planned to use copy-number variation (CNV) inference methods for this task (we talk more about this later!), while others proposed to distinguish certain normal cell populations based on their degree of mixing with other cells, reasoning that normal cells are relatively more homogeneous than malignant cells.

We also learned how researchers think about validating their cell type labels. Some researchers, for example, proposed using marker-gene expression to validate cell types obtained from a complementary cell annotation approach like label transfer. Others suggested using clustering and visualization (either before or after integrating samples together) to identify and re-annotate putatively mislabeled cells.

We were particularly excited that many contributors hoped to bring their own lab's domain expertise in a given cancer type to OpenScPCA! For example, several contributors already had a set of putative marker genes in hand for cell types they expected to see in a given sample. We think that bringing this information to an open science framework like OpenScPCA is a great way to accelerate research in the community more broadly, and to efficiently establish a reliable list of genes for use with marker-gene–based cell type annotation approaches.

Above all, working with external contributors helped us get a pulse on how the research community is thinking about cell type annotation, which will help us tackle even more projects in the future.

We developed ideas for future work in OpenScPCA

Seeing how researchers approach cell type annotation also gave us some analysis ideas! As mentioned earlier, several researchers proposed to distinguish between malignant and normal cells using copy-number variation (CNV) detection tools (e.g., `inferCNV` and `copyKAT`), with the assumption that cells with CNV are more likely to be malignant, and cells without CNV are more likely to be normal. When reviewing results from these methods, we gained a deeper appreciation for the challenges that CNV-based methods present when working with pediatric cancer genomes, which are relatively quiet compared to adult cancer genomes. In addition, these tools often require, or at least benefit from, a set of known normal cells to use as a reference.

Across both planned and active modules, a common approach that contributors considered to identify a set of normal cells was to use results from existing cell type annotations. Specifically, they planned to use putatively normal cells (e.g., cells previously annotated as endothelium or immune) as input to CNV detection methods. We wondered whether we could make identifying normal cells more efficient by leveraging existing results in the ScPCA Portal. We had previously performed automated cell type annotation on all samples using two complementary tools: the reference-based method `SingleR` and the marker-gene–based method `CellAssign`. Perhaps we could use the results from these automated methods to identify putatively normal cells that could be provided as a normal reference to CNV detection methods.

To this end, we've opened a GitHub Discussion that outlines our scientific goals for using consensus results from `SingleR` and `CellAssign`. We plan to first identify cells where these methods agree, and then determine whether any of those cell types are unlikely to be malignant. Those cells would be flagged for use as a normal reference for CNV inference methods. We expect this analysis will take some time and careful thought, but in the end, we'll gain a deeper understanding of how existing cell type annotations relate to one another and how existing ScPCA data can inform future analyses. We're still in the early stages of this analysis, but you can follow along with our progress in the `cell-type-consensus` module!
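To make the consensus idea a bit more concrete, here is a minimal Python sketch of the two steps described above. It is only an illustration, not the code in the module: the barcodes, the labels, the `PUTATIVELY_NORMAL` set, and both helper functions are hypothetical, and it assumes the labels from the two methods have already been mapped onto a shared vocabulary, which in practice is a substantial step of its own.

```python
# Hypothetical per-cell labels keyed by barcode. We assume both methods'
# labels were already harmonized to one shared vocabulary.
singler_labels = {
    "AAAC": "T cell",
    "AAAG": "endothelial cell",
    "AACT": "macrophage",
    "AAGT": "neuron",
}
cellassign_labels = {
    "AAAC": "T cell",
    "AAAG": "endothelial cell",
    "AACT": "monocyte",
    "AAGT": "neuron",
}

# Illustrative assumption: cell types treated as unlikely to be malignant.
PUTATIVELY_NORMAL = {"endothelial cell", "T cell", "macrophage"}


def consensus_labels(labels_a, labels_b):
    """Keep only cells that both methods labeled, and labeled identically."""
    return {
        cell: label
        for cell, label in labels_a.items()
        if cell in labels_b and labels_b[cell] == label
    }


def normal_reference(consensus, normal_types):
    """Return sorted barcodes of consensus cells whose type is in normal_types."""
    return sorted(
        cell for cell, label in consensus.items() if label in normal_types
    )


consensus = consensus_labels(singler_labels, cellassign_labels)
reference_cells = normal_reference(consensus, PUTATIVELY_NORMAL)
print(reference_cells)
```

In this toy example, the cell labeled macrophage by one method and monocyte by the other is dropped because the methods disagree, and the consensus neuron is excluded because it is not in the assumed set of normal cell types. The remaining barcodes are the ones that could be passed to a CNV inference tool as a normal reference.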

Do you have ideas for cell type annotation? Interested in using any of the results or tools we've introduced here? We invite you to explore and contribute your ideas to OpenScPCA!

Check out the OpenScPCA-analysis repository on GitHub.
Join the conversation on GitHub Discussions.
Let us know you’re interested by filling out the OpenScPCA intake form.
