Resources

Current blog category

Recents post

From Blog

January 11, 2020

Help You Help Future You: Organizing your research projects reproducibly

It's that time of year again – it's time to start a new research project! Planning for the research itself can be daunting, but planning for your work to be fully reproducible is a whole other layer that isn't always emphasized. In this blog post, we'll outline a few project organization principles we teach in our Reproducible Research Practices workshop that we think are really helpful for ensuring reproducibility (among other benefits!).

January 11, 2020

Pediatric Cancer Researchers Driving Progress Through Data, Training, and Collaboration

At the Childhood Cancer Data Lab, we support pediatric cancer researchers by providing data science training, collaborating on data-intensive projects, and designing open-source tools. Through this work, we’ve had the opportunity to engage with scientists who are applying rigorous, data-driven approaches to address some of the most pressing challenges in childhood cancer.

January 11, 2020

Prototyping process with journey maps

The Open Single-cell Pediatric Cancer Atlas (OpenScPCA) is an open, collaborative project to analyze data from the Single-cell Pediatric Cancer Atlas (ScPCA) Portal, which currently holds over 500 samples from over 50 pediatric cancer types. OpenScPCA uses an open contribution model designed to allow experts worldwide to contribute and rapidly share the results of analyses in real time. The project was officially launched in April 2024.

January 11, 2020

Git workflows for scientific projects and when we use them

Writing source code is a significant part of data-intensive biomedical research. Everything from cleaning and pre-processing data to generating publication figures can be accomplished programmatically. Increasingly, funding agencies and journals require researchers to share their code. To pick a few examples, the Data Lab’s parent organization, Alex’s Lemonade Stand Foundation (ALSF), has such a requirement for awardees, and PLoS Computational Biology requires authors to make code underlying results and conclusions available.

January 11, 2020

I’m terrible with names…but I’m using ontologies to try to be better

There is an old joke in computer science about how there are only two hard things: cache invalidation, naming things, and off-by-one errors. I’ll leave aside the first one as beyond my own expertise, but the second comes up all the time in my work as a biological data scientist. Naming variables and functions in my code is a constant struggle, but one I have to deal with on my own or with my team. Much bigger problems come up when trying to deal with all the various ways that people across the world use names when talking about the diseases they work on, the types of cells they are looking at, the experimental methods they are using, and just about every other aspect of their studies.

January 11, 2020

Don't Make Me Read: Tips for Writing Effective Documentation

Writing effective documentation is challenging. Users might not always read every word in the documentation. They might even just scroll past large chunks of text, but we can accommodate those behaviors by structuring and formatting content appropriately.

January 11, 2020

Scientific Community Bulletin: What’s happening in December?

Welcome to the Data Lab’s December Scientific Community Bulletin! Each month we share upcoming opportunities from Alex’s Lemonade Stand Foundation (ALSF), the Data Lab, and other events that we have gathered from a variety of science and research organizations. Subscribe to our blog to be alerted about future Scientific Community Bulletin posts!

January 11, 2020

Scientific Community Bulletin: What’s happening in November?

Welcome to the Data Lab’s November Scientific Community Bulletin! Each month we share upcoming opportunities from Alex’s Lemonade Stand Foundation (ALSF), the Data Lab, and other events that we have gathered from a variety of science and research organizations. Subscribe to our blog to be alerted about future Scientific Community Bulletin posts!

January 11, 2020

Cataloging the CCDI Childhood Cancer Data Catalog (CCDC)

Here at the Data Lab, we're all about, well, data! We believe that data sharing and accessibility is key to accelerating the research process, and ultimately to improving outcomes for childhood cancer patients. So, we were excited to learn that one of the goals of the NCI/NIH initiative, the Childhood Cancer Data Initiative (CCDI), is to build up a Data Ecosystem that will facilitate pediatric cancer researchers' ability to explore and collect data from disparate resources. Although this Ecosystem is still in the early stages, several components are already being developed and are available for researchers to use! One component that is particularly interesting to us is the CCDI's Childhood Cancer Data Catalog (CCDC).

January 11, 2020

Scientific Community Bulletin: What’s happening in October?

Welcome to the October Scientific Community Bulletin! Each month we share upcoming opportunities from Alex’s Lemonade Stand Foundation (ALSF), the Data Lab, and other events that we have gathered from a variety of science and research organizations. Subscribe to our blog to be alerted about future Scientific Community Bulletin posts!

January 11, 2020

Scientific Community Bulletin: What's happening in September?

Welcome to the September Scientific Community Bulletin! Each month we share upcoming opportunities from Alex’s Lemonade Stand Foundation (ALSF), the Data Lab, and other events that we have gathered from a variety of science and research organizations. Subscribe to our blog to be alerted about future Scientific Community Bulletin posts!

January 11, 2020

Scientific Community Bulletin: What's happening in August?

Welcome to the August Scientific Community Bulletin! Each month we share upcoming opportunities from Alex’s Lemonade Stand Foundation (ALSF), the Data Lab, and other events that we have gathered from a variety of science and research organizations.

January 11, 2020

Queueing Javascript Promises

Often when building a server-client web application, we will encounter a situation where we want to send requests to our API in the chronological order that they occur on the client. Due to the asynchronous nature of these requests, it might not be possible to send them in the same callback for the event that triggered them. This is because we want to use the response from the previous request to craft our current one. A solution to this problem would be to implement a queue. Instead of calling the API immediately after events occur, implementing a queue ensures the latest data is sent with any request.

January 11, 2020

Scientific Community Bulletin: What's happening in July?

Welcome to the July Scientific Community Bulletin! Each month we share upcoming opportunities from Alex’s Lemonade Stand Foundation (ALSF), the Data Lab, and other events that we have gathered from a variety of science and research organizations. Subscribe to our blog to be alerted about future Scientific Community Bulletin posts!

January 11, 2020

Scientific Community Bulletin: What’s happening in June?

Welcome to the Childhood Cancer Data Lab’s new blog feature, the monthly Scientific Community Bulletin! At the start of each month, we will share upcoming opportunities from Alex’s Lemonade Stand Foundation (ALSF), the Data Lab, and other events that we have gathered from a variety of science and research organizations. Our goal is to promote learning opportunities and highlight some of the excellent resources that our community provides.

January 11, 2020

How we integrate science and engineering

The CCDL team includes science, engineering, and design expertise. Combining these three disciplines in different ways across projects enables us to carry out our mission.

January 11, 2020

Pinning transitive R dependencies for fun and reproducible builds

Like many teams that work with large amounts of external software, we run into issues with our transitive dependencies. In general, transitive dependencies are a hard problem to solve.

January 11, 2020

Overcoming the steep data science learning curve in childhood cancer research using workshops

Though technology can introduce great benefit into our lives, it is often accompanied by a substantial amount of time and some expected frustration before we can reap the rewards. The time spent learning a new technology is what we usually call a learning curve.

January 11, 2020

Gene Expression Repositories Explained

The goal of our refine.bio project is to download, process, and make available gene expression datasets that can be analyzed together, or in parts, depending on a researcher’s need. Childhood cancer researchers need to be able to use data generated through multiple profiling technologies including microarrays and RNA-sequencing.

January 11, 2020

Better Logging in Python

There are countless log blog posts out there about the benefits of good logging, how to log well, and how much to log. Going through them all can be a real log blog slog. Wouldn't it be cool if you could log like this:logger.info("Something happened!", job=job.id, user=user.id) and get an easily searchable output.

January 11, 2020

Method for the preparation of a caffeine-containing solution from dehydrated magic beans

Caffeine is a stimulant that can induce alertness in certain individuals when consumed at an appropriate quantity. Caffeine is often obtained by ingesting caffeine-containing solutions. However, no protocol for obtaining caffeine from dehydrated, roasted beans using materials typically available in a Philadelphia office has been described in the published literature.

January 11, 2020

Why ALSF Views Resource Sharing as Important

Alex’s Lemonade Stand Foundation (ALSF) staunchly believes that stronger scientific sharing practices will accelerate the pace of discovery and finding cures for children with cancer. Robust sharing improves reproducibility, minimizes redundant studies and maximizes our return on research investment.

January 11, 2020

How we set goals

Our particular process is designed to source opportunities from our team members and external stakeholders, convert those opportunities into a set of potential goals, and then select the goals that we expect will most advance our mission.

January 11, 2020

Automatic scroll restoration in Single Page Applications (SPA)

The ability to restore scroll position is often critical for website usability. It helps users keep the flow of navigation when going back and forth between different pages. Most modern browsers take care of restoring the scroll position automatically, but it doesn’t always work for Single Page Applications where the content is generated on the client’s side, often asynchronously.

January 11, 2020

How we train: Going remote

When the CCDL (along with everyone else) realized that we would have to conduct our bioinformatics training workshops remotely, we had to make some quick decisions about how we were going to do it. Most of the instructional materials for our in person workshops were already online, so we knew we had a good base to work from. We just needed to figure how to adapt the live instruction.

January 11, 2020

Why We Must Share Research and Resources

When my daughter Alex was diagnosed with cancer and throughout her battle, we saw how our community of people rallied around our family. No one knew quite how to help, but they were willing to do whatever was needed to ease the burden we faced.

January 11, 2020

The Childhood Cancer Data Lab's not-so-secret sauce for efficient workflows — aka Philadelphia’s third most famous process

'Work smarter not harder’ is useless advice if you don’t know how to ‘work smarter’. But the Childhood Cancer Data Lab's work and processes may be the smartest I’ve ever had the pleasure of learning and adopting.

January 11, 2020

Automating analyses with workflow managers

At the Data Lab, we are big proponents of automating the boring stuff so we can spend more time thinking about the fun stuff. But how exactly do we do that, and what does it mean to automate the boring stuff?

January 11, 2020

Building, Improving, and Collaborating: A Look Back at Training Workshops in 2021

November marked the final Childhood Cancer Data Lab training workshop for 2021. We held four week-long virtual workshops this year, teaching 88 researchers the data science skills they need to examine their own data.

January 11, 2020

Setting your research up for success in a data driven world

Before working as a Data Scientist at the Childhood Cancer Data Lab, I spent time in my PhD and post-doctoral fellowship in two very different research environments. Each had their own unique way of doing research. I found that some things worked really well and others were not as successful.