Introducing the first community-contributed datasets on the ScPCA Portal!

May 1, 2024

In March 2022, we launched the Single-cell Pediatric Cancer Atlas (ScPCA) Portal to make uniformly processed single-cell and single-nuclei RNA-seq data widely available to the childhood cancer research community. Initially, all data available on the Portal were generated through grants funded by Alex’s Lemonade Stand Foundation (ALSF) as part of the ScPCA project. But enabling access to ALSF-funded data was just the beginning of our vision.

Sharing is key to ensuring the Portal’s continued growth. We set our sights on allowing more pediatric cancer researchers to contribute data to the ScPCA Portal.

Enabling the community to share

scpca-nf is the open-source, Nextflow-based pipeline we built to ensure that all data available on the Portal is uniformly processed. To add external data to the Portal, we need to ensure that the data is processed in the same way. By designing a fast, reproducible, and affordable-to-use pipeline, we made it easier for the community to process their own data for inclusion in the Portal. Using our existing pipeline is a requirement for anyone who wishes to contribute data.

Here are a few ways we designed the pipeline with others in mind!

  • We asked the community for feedback. Early on, we conducted usability testing with community members. Based on user feedback, we implemented a number of improvements and wrote effective user-facing documentation.
  • We prioritized speed and efficiency. 
    • Using Nextflow makes it possible for users to process multiple samples in parallel. The workflow also uses alevin-fry, which processes single-cell data quickly and with less computing power than other comparable technologies. 
    • To facilitate rerunning the workflow as the pipeline is updated, we segmented off the most computationally intensive steps so they can be skipped when possible. We can also rerun sections as we add new analyses that do not depend on the raw data.
  • We ensured the pipeline is portable and adaptable. scpca-nf supports different high-performance computing platforms, such as local clusters or the cloud, so that users can run the workflow using the infrastructure that they have available.
  • We wrote more about the pipeline here.

By May 2023, the Data Lab was prepared to accept external data submissions, and we launched our first call for community contributions.

Now available on the ScPCA Portal!

Two pediatric cancer research labs with eligible datasets are now the first to share data on the Portal whose generation was not funded through the initial ScPCA grants.

Profiling pediatric and young adolescent (Ped-AYA) high-grade gliomas (HGGs)

Dr. Jo Lynne Rokita from the Center for Data-Driven Discovery in Biomedicine (D3b) at Children’s Hospital of Philadelphia contributed a single-nuclei RNA-seq dataset comprising 15 high-grade glioma and glioblastoma samples obtained from multiple patients.

This is not the first time the Data Lab and D3b have worked together to maximize the impact of available data! We co-organized the Open Pediatric Brain Tumor Atlas (OpenPBTA) project, a global open science initiative to analyze data from more than 1,000 pediatric brain tumors and collaboratively author a manuscript. Today, the project is complete, and all the processed data is publicly available. Some of the samples analyzed as part of the OpenPBTA are included in this ScPCA dataset, making the data traceable across multiple sources!

Identification of drug-resistance-related cell states in paired pre- and post-treatment neuroblastoma cell lines

Dr. Joae Wu from the University of Massachusetts Chan Medical School submitted a single-nuclei RNA-seq dataset comprising four neuroblastoma cell lines derived from one patient. These are the first cell lines to be made available on the Portal!

Thank you to D3b and the Wu Lab for helping us put more data in the hands of researchers. These community-contributed datasets (along with data from over 500 samples across 51 pediatric cancer types!) are available for immediate download on the Portal.  

Interested in contributing data?

Openly sharing your data on the ScPCA Portal can increase the impact of your research and enhance the visibility of your work. Over 300 users have downloaded data from the Portal since it launched, and thousands more have browsed the available projects!

We accept submissions of 10x Genomics single-cell or single-nuclei profiling data from childhood and adolescent cancers (ages 0-19), broadly defined to include tumor data, relevant animal models, patient-derived xenografts, or cell lines. Researchers who submit data may also be eligible to receive a small one-time grant of unrestricted funds to be used for childhood cancer research.

Requirements

If your data is eligible for inclusion on the Portal, you must:

  • Generate summarized gene expression data using the Data Lab's scpca-nf pipeline. (The Data Lab will provide support for setting up and running the pipeline, as needed!)
  • Submit the output, along with project, sample, and cell metadata.
  • Ensure a Data Transfer Agreement between your institution and ALSF is executed.

View the full contribution guidelines to learn more about the requirements and the type of data we can accept.

Get started

If you have reviewed the guidelines and are interested in submitting data to the ScPCA Portal, the next step is to complete an intake form to determine the eligibility of your dataset.

After you complete the form, the Data Lab will contact you within 3 business days to notify you of your eligibility and provide the additional information required for submission.

Important deadlines for potential submitters to keep in mind:

  • Dataset submission deadline: September 1, 2024
  • Application submission deadline (if eligible): September 15, 2024

Please contact us at scpca@ccdatalab.org with any questions. You can also reach us by joining the #scpca-contributions channel on Cancer Data Science Slack. We look forward to connecting with more potential ScPCA contributors!
