Behind the scenes with an OpenScPCA contributor

March 5, 2025

In April 2024, we launched the Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project. This project aims to analyze the data on the ScPCA Portal in an open and collaborative format, similar to our OpenPBTA project, by inviting members of the pediatric research community to contribute to analyses and share results in real time.

Before we launched OpenScPCA, we had to outline the process for contributing to analyses and then document that process for others (see the OpenScPCA documentation). When designing that process, we also implemented strategies to ensure reproducibility over the life cycle of the project. You can read more about how we are implementing reproducible practices in Jaclyn Taroni’s recent blog post.

After planning and documenting expectations for contributors, we prepared to launch our first call for contributions, where we asked pediatric cancer experts to help us assign cell type annotations for all samples on the Portal. We thought it would be helpful to have an existing analysis module that other contributors could reference, so we picked a member of our science team (it’s me, hi 👋) to go through the process of developing an analysis module. 

I have previous experience studying Ewing sarcoma, so in Spring 2024, I started on the journey of assigning cell types to all of the samples in the Portal that were obtained from Ewing sarcoma patients and developed the [.inline-snippet]cell-type-ewings[.inline-snippet] module. Using this module as an example, this blog post will walk through what it was like to contribute to OpenScPCA. 

Overview of the process for contributing to OpenScPCA.

Getting set up as a contributor 

Before I could start any analysis, I had to plan it out and get set up to develop within the OpenScPCA framework. This included:

Filing a discussion post: The first thing I did was file a discussion post to propose my analysis and outline the approach and methods I planned to use. Filing a discussion post before starting any analysis allows members of the Data Lab to comment and provide feedback on the analysis strategy. 

Discussion post outlining my proposed analysis for annotating Ewing sarcoma samples.

Accessing data and resources: The Data Lab provides every contributor with an AWS account, which gives them access to the data and to a Linux virtual computer through Lightsail for Research. Each contributor also receives a monthly budget for computational costs, so contributors like me can run computationally intensive analyses on AWS rather than on a local computer. Once my account was set up, I could easily download the ScPCA data in the format I planned to use.

Setting up my environment: Before starting the analysis, I followed the technical setup instructions. This included forking the OpenScPCA-analysis repository, installing and setting up [.inline-snippet]conda[.inline-snippet], and ensuring I had set up [.inline-snippet]pre-commit[.inline-snippet]. 

Initiating my analysis module: Before writing any code, I had to create and initialize my module. To do this, I used our handy script, which creates a new module and fills it in with a skeleton file structure. What’s really great is that it also creates the files needed to manage your software environment, making it easier for contributors to set up their own reproducible environment.

Developing the analysis module

Once I completed the setup steps, I was ready to get started and dive into analysis! For every new step of the analysis I was adding to my module, I followed the same general approach:

Start with an issue: In OpenScPCA, we prefer to break up analyses into small, bite-size chunks, allowing members of the Data Lab to easily review each part of the analysis and provide feedback before it gets incorporated into the main branch of [.inline-snippet]OpenScPCA-analysis[.inline-snippet]. Before I started on any analysis proposed in my discussion post, I broke it up into smaller chunks and filed an issue for each piece of work I planned to complete. You can find an example of an issue outlining an exploratory analysis I proposed here.

Develop code in a new branch: After filing the issue outlining what I planned to work on, I was ready to start developing that analysis! In my fork of [.inline-snippet]OpenScPCA-analysis[.inline-snippet], I created a new branch that I used to work on that part of the analysis. Any new files or changes to existing files were then saved to that branch. P.S. If you’re new to Git and not familiar with branches, be sure to check out the OpenScPCA documentation on working with Git. 

File a pull request: Once I was happy with the changes that I was making, I was ready to file a pull request (PR), requesting that the new changes be incorporated into the main code base of OpenScPCA. Generally, each of my PRs corresponded to a single issue and contained a single unit of work (see the OpenScPCA documentation for scoping PRs for more insight). You can find an example of a PR that I filed, adding an exploratory notebook and supporting documentation for identifying tumor cell states here.

Respond to reviews: Every time I filed a PR, a Data Lab team member reviewed the code changes to provide feedback on the analysis and ensure that the changes being added were clear, correct, and reproducible. These comments were always really helpful in refining my analysis and raising questions that I had not yet thought of! Once I had addressed all the comments, my PR was merged into the main code base of OpenScPCA.

Example review comment from a Data Lab team member.

File a new issue: Because each PR contains only a single unit of work and not an entire analysis, you’re likely to have some additional steps that need to be completed after your PR is merged. If you are working on an exploratory part of your analysis, you may even have additional questions that you want to look into further that didn’t get addressed as part of the first PR, and that’s okay. Write up any continued analysis you want to do in a new issue and start the process over again! 
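For readers who like to see the loop spelled out, here is a minimal Python sketch of the Git side of one pass through the cycle above: one issue, one branch, one PR-sized change. It only builds the commands and prints them; it does not run them. The issue number, the title, the file path, and the branch-naming scheme are all hypothetical choices made for illustration, not OpenScPCA conventions, so check the OpenScPCA documentation on working with Git for the project's actual guidance.

```python
import re
import shlex


def branch_name(issue_number: int, title: str) -> str:
    """Build a branch name from an issue (hypothetical naming scheme)."""
    # Lowercase the title and collapse anything non-alphanumeric into hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{issue_number}-{slug}"


def unit_of_work_commands(issue_number: int, title: str, paths: list[str]) -> list[list[str]]:
    """Return the git commands for one issue-sized change, without running them."""
    branch = branch_name(issue_number, title)
    return [
        ["git", "switch", "-c", branch],                      # new branch in your fork
        ["git", "add", *paths],                               # stage only this unit of work
        ["git", "commit", "-m", f"{title} (#{issue_number})"],
        ["git", "push", "--set-upstream", "origin", branch],  # then open the PR on GitHub
    ]


if __name__ == "__main__":
    # Hypothetical issue number, title, and path, for illustration only.
    commands = unit_of_work_commands(
        123, "Explore tumor cell states", ["path/to/my-module/notebook.Rmd"]
    )
    for command in commands:
        print(shlex.join(command))
```

Keeping each branch tied to a single issue is what makes the resulting PR small enough to review comfortably. Once it is merged, the next issue gets its own fresh branch.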

Collaboration is the key to success 

While working on the [.inline-snippet]cell-type-ewings[.inline-snippet] module, I came across some results that I wasn’t quite sure about. Although I have experience working with Ewing sarcoma, I am not an expert, and as we all know, biology (especially tumor biology) can be tricky!

One of the benefits of working on this in OpenScPCA is that the analysis happens in an open format in which anyone in the community can contribute. Contributors can use discussion posts to share results and seek feedback from other members of the community. While working on the Ewing samples, I had some questions that neither I nor other Data Lab team members could answer, so I compiled my thoughts in a discussion post. I received replies from members of the community who had run into some of the same challenges, and through this discussion I learned about new tools I could use for my analysis. Following this discussion, I was able to successfully implement these new tools in my analysis module!

Using discussion posts to relay results and get feedback from pediatric cancer experts.

Completing the module… for now 

Ultimately, I completed the analysis that I proposed in my initial discussion post and annotated all of the cells in the Ewing sarcoma samples. You can find the culmination of this effort in a final exploratory notebook in the module.

The best part is that my analysis and results are now available for other contributors to use in real time. Even though I have wrapped up the [.inline-snippet]cell-type-ewings[.inline-snippet] module, the results from this module can be used in downstream analysis by others! The analysis possibilities are endless!

Join the OpenScPCA Project

Are you ready to make your contribution? 

Did you know that by contributing your expertise to OpenScPCA, you may become eligible for a small one-time grant? Learn more about grants for advancing pediatric cancer research!
