The CanPath Story: Unveiling the Future of Canadian Health

Canada’s Largest Population Health Study
CanPath (the Canadian Partnership for Tomorrow’s Health) is Canada’s largest population health study, tracking over 330,000 adult volunteers aged 30-74 across all ten provinces to unlock the mysteries of chronic disease and cancer.
Quick Facts About CanPath:
- Scale: 331,359 participants nationwide
- Data: Over 1 billion data points collected and growing
- Biosamples: 178,065+ DNA source materials
- Regional Reach: Seven regional cohorts spanning all Canadian provinces
- Study Duration: Longitudinal follow-up for 30+ years (potentially 50 years)
- Primary Goal: Understanding how genetics, environment, lifestyle, and behavior interact to cause chronic diseases and cancer
What Makes CanPath Unique:
| Feature | Description |
|---|---|
| Purpose | Study the biology, behaviors, and environments of Canadians to learn about chronic disease and cancer causes |
| Vision | Become the national platform for population-level health research in Canada and globally |
| Mission | Improve and save lives by taking action today for a healthier future through research-driven insights |
| Data Types | Questionnaires, physical measurements, biological samples (blood, urine, saliva, toenails), linked provincial health records |
| Access | Available to researchers worldwide through an independent Access Committee and online portal |
CanPath represents a living laboratory that enables researchers globally to explore disease risk factors and identify prevention strategies. The platform harmonizes data across regional cohorts, creating an unprecedented pan-Canadian resource with over 2,300 measures of participant health and lifestyle factors.
As Dr. Maria Chatzou Dunford, CEO and Co-founder of Lifebit, I’ve spent over 15 years working in computational biology and genomic data platforms, building secure, federated systems that enable researchers to unlock insights from complex health datasets like CanPath without compromising privacy or compliance. Our work at Lifebit helps power the kind of large-scale, compliant data analysis that makes population health studies actionable for drug discovery, public health policy, and precision medicine.

The Bedrock of Discovery: A Deep Dive into CanPath’s Data
At the heart of CanPath’s groundbreaking potential lies its immense and carefully collected dataset. We’re talking about a treasure trove of information, generously provided by hundreds of thousands of Canadians, all contributing to a clearer understanding of health and disease. This is not just data; it’s a living, breathing testament to collective action for a healthier tomorrow.
The study boasts an incredible 331,359 adult participant volunteers from across Canada, specifically individuals aged 30-74. This age range is crucial, as it captures a demographic where chronic diseases and cancers often begin to manifest, allowing researchers to observe the long-term progression and contributing factors. These participants are drawn from seven regional cohorts spanning all ten provinces, ensuring a truly pan-Canadian representation. These cohorts, each with its own regional focus while contributing to the national whole, include the BC Generations Project, Alberta’s Tomorrow Project, the Manitoba Tomorrow Project, the Ontario Health Study, Quebec’s CARTaGENE, the Atlantic PATH project covering all four Atlantic provinces, and the Healthy Future Sask cohort. This federated structure allows the study to capture the unique environmental, lifestyle, and genetic diversity across Canada, from dense urban centres to rural and remote communities. This vast geographical and demographic scope is what makes CanPath a uniquely powerful tool for population health research.

Our data collection methods are comprehensive, designed to capture a holistic view of each participant’s health journey. This involves a multi-faceted approach, beginning with detailed questionnaires that delve into various aspects of life, from socio-demographics to lifestyle habits and medical histories. Beyond self-reported information, a significant subset of participants (over 90,000) have also visited study centres to provide physical measurements, offering objective health indicators. Crucially, nearly half of all participants have also provided a biological sample, adding an invaluable layer of scientific depth to the data.
The Richness of the Dataset
The sheer volume of data is staggering: over 1 billion data points and growing. Imagine the stories these numbers tell, the patterns they reveal! This massive dataset is not just raw information; it’s harmonized across all regional cohorts. Data harmonization is a meticulous and critical process where data from different sources are transformed into a common format with shared definitions. For CanPath, this means that an answer about physical activity from a participant in British Columbia is directly comparable to an answer from a participant in Newfoundland. This process involves creating common data dictionaries, standardizing variable names and formats, and applying consistent quality control checks. This standardization makes it an incredibly robust resource for pan-Canadian analysis, allowing researchers to confidently pool data and achieve the statistical power needed to study complex interactions. The longitudinal nature of CanPath, following participants for 30+ years, means we can track changes over time, observe disease onset, and identify risk factors that might emerge decades before a diagnosis. This long-term perspective is a game-changer in understanding complex diseases.
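To make the idea of harmonization concrete, here is a minimal sketch in Python. It is an illustration only, not CanPath’s actual pipeline: the cohort labels, source variable names, and codes below are all hypothetical and do not come from CanPath’s data dictionaries. It shows the three steps described above: a shared dictionary, standardized variable names, and a simple quality control check.

```python
# Hypothetical per-cohort codings of one physical-activity question.
# None of these names or codes are taken from CanPath's data dictionaries.
COHORT_MAPPINGS = {
    "cohort_a": {"source_var": "PA_LEVEL",
                 "codes": {1: "low", 2: "moderate", 3: "high"}},
    "cohort_b": {"source_var": "phys_act",
                 "codes": {"L": "low", "M": "moderate", "H": "high"}},
}

HARMONIZED_VAR = "physical_activity"          # shared variable name
ALLOWED_VALUES = {"low", "moderate", "high"}  # shared value set


def harmonize(record, cohort):
    """Translate one cohort-specific record into the common format."""
    mapping = COHORT_MAPPINGS[cohort]
    raw = record.get(mapping["source_var"])
    # Unrecognized or missing codes become None rather than being guessed.
    value = mapping["codes"].get(raw)
    return {
        "participant_id": record["participant_id"],
        "cohort": cohort,
        HARMONIZED_VAR: value,
    }


def quality_check(rows):
    """Return the harmonized rows whose value is outside the shared set."""
    return [r for r in rows if r[HARMONIZED_VAR] not in ALLOWED_VALUES]


pooled = [
    harmonize({"participant_id": "A1", "PA_LEVEL": 3}, "cohort_a"),
    harmonize({"participant_id": "B1", "phys_act": "H"}, "cohort_b"),
    harmonize({"participant_id": "B2", "phys_act": "?"}, "cohort_b"),
]
flagged = quality_check(pooled)  # only B2 needs manual review
```

After harmonization, records A1 and B1 carry the same value under the same variable name even though their cohorts coded the answer differently, which is what makes pooled analysis possible.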
To give you a clearer picture of the depth and breadth of information we collect, here’s a summary of the data types:
| Data Type | Examples of Information Collected |
|---|---|
| Socio-demographics | Age, gender, education, income, ethnicity, marital status |
| Lifestyle Behaviours | Diet and nutrition, physical activity, alcohol and tobacco use, sun exposure |
| Medical History | Personal and family disease histories (including cancer, diabetes, cardiovascular disease, arthritis), medication use, reproductive health |
| Physical Measures | Height, weight, blood pressure, waist circumference, body mass index (BMI) |
| Biosamples | Blood (including DNA, plasma, serum), urine, saliva, toenails |
| Environmental Data | Material deprivation index, annual average exposure to ambient air pollution (from CANUE) |
This comprehensive approach allows us to explore the intricate interplay between genetics, environment, and lifestyle, moving beyond single-factor analyses to understand the true complexity of human health.
Biosamples: A Biological Time Capsule
Perhaps one of the most exciting aspects of CanPath is its extensive collection of biosamples. We have over 178,065 biosamples of DNA source material, along with blood, urine, saliva, and toenail samples from various participants. Specifically, more than 150,000 participants have provided non-fasting venous blood samples, over 101,000 urine samples, over 31,000 toenail samples, and over 18,000 saliva samples. Each sample type provides a unique window into a participant’s health:
- Blood: Blood samples are a cornerstone of the collection, fractionated into components like plasma, serum, and buffy coat (a source of DNA). These allow for a vast array of analyses, from genetic sequencing and proteomic studies to measuring levels of vitamins, hormones, and inflammatory markers.
- Urine: Urine samples are invaluable for studying metabolic function and kidney health. They can reveal biomarkers for diseases like diabetes and provide insights into a person’s diet and exposure to certain environmental chemicals.
- Saliva: For many participants, saliva provides a non-invasive way to collect high-quality DNA for genetic studies, making large-scale recruitment for genomic research more feasible.
- Toenails: Toenail clippings offer a unique long-term record of exposure to certain trace elements and heavy metals, such as arsenic or mercury. As nails grow slowly, they incorporate these substances over months, providing a historical log that is not available from blood or urine.
These biosamples are veritable biological time capsules. They offer an unparalleled opportunity to delve into the genetic and molecular underpinnings of disease. With advances in technologies like Next Generation Sequencing, researchers can analyze DNA to identify genetic predispositions, discover biomarkers, and understand how genes interact with environmental factors. These biological materials are critical for uncovering novel insights into disease mechanisms and developing new diagnostic tools and therapies.
How CanPath is Revolutionizing Canadian and Global Health Research
The significance of CanPath in the field of population health research, both in Canada and globally, cannot be overstated. It stands as a monumental effort to understand the complex mix of health and disease, providing a bedrock for identifying risk factors, tracking long-term health trends, and ultimately, paving the way for prevention and personalized medicine. With such a rich dataset, we’re moving closer to a future where Precision Medicine Data Analysis becomes a standard, tailored to individual needs.

Historically, studies like the British Doctors Study (which famously linked smoking to lung cancer) and the Framingham Heart Study (connecting obesity and heart disease) revolutionized our understanding of public health. CanPath aims to build on this legacy, but on an even grander scale, tackling the broader spectrum of chronic diseases and cancers that affect Canadians.
Key Research Outcomes from the CanPath Cohort
The data from CanPath is already yielding profound insights across a spectrum of health challenges:
- Cancer Research: As a core focus, CanPath is a powerful engine for studies into the origins, progression, and prevention of various cancers. Its longitudinal design is particularly crucial for prospective studies, where biosamples collected years before a diagnosis can be analyzed for early warning signs. For example, researchers are actively using CanPath data to identify novel biomarkers for cancers like breast, colorectal, and lung cancer. One groundbreaking study using samples from the Ontario Health Study demonstrated the feasibility of detecting cancer-associated protein biomarkers in blood samples collected up to seven years before a clinical diagnosis. This type of research, which seeks to find a “biological echo” of a developing tumour long before symptoms appear, holds immense promise for developing new screening tests that could dramatically improve survival rates through earlier detection and intervention.
- Chronic Diseases: Beyond cancer, CanPath is a vital resource for understanding chronic conditions like diabetes, cardiovascular disease, osteoporosis, and arthritis. For instance, the Atlantic PATH cohort is currently undertaking a project on the “Current Management and Health Care Quality for Patients with Hip and Knee Osteoarthritis.”
- Mental Health: Recognizing mental health as a serious concern, CanPath tracks mental health status and family history through its baseline, follow-up, and even COVID-19 questionnaires. Research using CanPath data, such as a study from Holland, is exploring the relationship between cancer and depression or anxiety.
- Healthy Aging: With Canada’s aging population, understanding the determinants of healthy aging is paramount. CanPath data, especially when combined with provincial health records, offers unique opportunities to study the complex factors that contribute to health and disease in later life. This includes research into neurodegenerative diseases, cardiovascular health, and mobility. For instance, a key area of investigation is Small Vessel Disease, a condition affecting tiny blood vessels in the brain that is linked to stroke and cognitive decline. A project led by Dr. Hertzel Gerstein through the Canadian Alliance for Healthy Hearts and Minds used CanPath data to reveal that this microvascular damage is a major pathway through which diabetes leads to vascular brain injury. Other studies leveraging the cohort have explored the link between visceral adiposity (deep belly fat) and cognitive function, finding that higher levels of this fat are associated with poorer cognitive performance, highlighting a modifiable risk factor for cognitive decline.
- COVID-19 Antibody Study: In response to the pandemic, CanPath quickly pivoted to conduct COVID-19 questionnaires and collect blood spot samples, providing critical data on the virus’s impact and antibody prevalence within the Canadian population.
These are just a few examples of how CanPath data is being leveraged to address pressing health questions, providing evidence that can inform public health policies and clinical practices.
Collaboration: The Power of Partnership
No single entity can tackle the complexities of population health alone. CanPath thrives on collaboration, forging powerful partnerships that amplify its impact. One of the most significant collaborations is with Health Data Research Network Canada (HDRN Canada). This partnership is a game-changer, enabling the linkage of CanPath’s rich self-reported survey data to provincial health records and related administrative data across multiple regions. This linkage transforms the dataset, moving beyond self-reported conditions to include clinically verified health outcomes. The administrative data includes a wealth of information such as:
- Physician billing records: Capturing every visit to a doctor and the associated diagnosis.
- Hospital discharge abstracts: Providing detailed information on hospitalizations, procedures, and diagnoses.
- Prescription drug dispensing records: Tracking dispensed medications, offering insights into treatment patterns and medication adherence.
- Cancer registries: Supplying gold-standard data on cancer incidence, type, stage, and treatment.
This allows researchers to get a much fuller, more objective picture of health outcomes and healthcare utilization. For example, a researcher can now connect a participant’s reported diet and exercise habits from a CanPath questionnaire with their subsequent hospitalization for a heart attack, as recorded in administrative data.
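The linkage described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual HDRN Canada or CanPath linkage process: the study IDs, field names, and diagnosis labels are hypothetical, and real administrative data would use coded diagnoses rather than plain-text labels.

```python
from datetime import date

# Hypothetical questionnaire data, keyed by a de-identified study ID.
questionnaire = {
    "P001": {"completed": date(2012, 5, 1), "weekly_exercise_hours": 1.0},
    "P002": {"completed": date(2013, 9, 15), "weekly_exercise_hours": 6.5},
}

# Hypothetical hospital discharge abstracts from administrative data.
hospital_abstracts = [
    {"study_id": "P001", "admitted": date(2019, 3, 14), "diagnosis": "heart attack"},
    {"study_id": "P001", "admitted": date(2010, 1, 2), "diagnosis": "heart attack"},
    {"study_id": "P002", "admitted": date(2018, 7, 30), "diagnosis": "hip fracture"},
    {"study_id": "P003", "admitted": date(2020, 2, 2), "diagnosis": "heart attack"},
]


def link_subsequent_events(questionnaire, abstracts, diagnosis):
    """Pair each questionnaire with later hospitalizations for one diagnosis."""
    linked = []
    for row in abstracts:
        survey = questionnaire.get(row["study_id"])
        if survey is None:
            continue  # no questionnaire on file for this ID
        if row["diagnosis"] != diagnosis:
            continue
        if row["admitted"] <= survey["completed"]:
            continue  # event happened before the exposure was reported
        linked.append({
            "study_id": row["study_id"],
            "weekly_exercise_hours": survey["weekly_exercise_hours"],
            "admitted": row["admitted"],
        })
    return linked


events = link_subsequent_events(questionnaire, hospital_abstracts, "heart attack")
```

Keeping only events that occur after the questionnaire date is what preserves the prospective design: the reported habits come first, and the recorded outcome follows.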
HDRN Canada’s Data Access Support Hub (DASH) acts as a single access portal for researchers, simplifying the complex process of requesting and accessing these linked, multi-regional data. This streamlined approach is vital for efficient pan-Canadian research. The Ontario Health Study, a regional cohort within CanPath, has a data-sharing agreement with ICES (formerly the Institute for Clinical Evaluative Sciences), which further facilitates the linkage of de-identified OHS data with other data holdings. This is a prime example of Federated Data Analysis in action, where data remains in its secure environment but can be analyzed across different sources.
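As a rough illustration of the federated idea, the Python sketch below computes a pooled mean from per-site summaries, so that individual-level values are never sent to the coordinator. It is a toy example under stated assumptions and does not describe how ICES, HDRN Canada, or Lifebit implement federation; the class name, function names, and blood pressure values are hypothetical, and a real environment would add access control and disclosure checks such as minimum cell sizes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate statistics a site is willing to release."""
    n: int
    total: float


def local_summary(values: Sequence[float]) -> SiteSummary:
    """Runs inside each site's secure environment on its own records."""
    return SiteSummary(n=len(values), total=float(sum(values)))


def pooled_mean(summaries: Sequence[SiteSummary]) -> float:
    """Runs at the coordinator, which only ever sees the summaries."""
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no observations across sites")
    return sum(s.total for s in summaries) / n


# Hypothetical systolic blood pressure readings held at two separate sites.
site_a = [120.0, 130.0]
site_b = [110.0]

overall = pooled_mean([local_summary(site_a), local_summary(site_b)])
```

Summing counts and totals gives exactly the same mean as pooling the raw values, which is why simple statistics like this federate cleanly; more complex models generally need iterative or approximate methods.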
Beyond national borders, CanPath is a proud member of the International HundredK+ Cohorts Consortium (IHCC). This global research platform unites over 103 large population health studies from 43 countries, involving nearly 50 million participants. This international collaboration means that CanPath data can be combined with other global cohorts, enriching the platform, improving the competitiveness of Canadian research, and fostering made-in-Canada discoveries with global relevance. Our federated approach at Lifebit is designed to facilitate precisely this kind of secure, large-scale, international data analysis without moving sensitive data.
Unlocking Insights: Accessing the CanPath Data Platform
The immense value of CanPath lies in its accessibility to the broader research community. We believe that by sharing this rich resource, we can accelerate discoveries and foster innovation in health research worldwide. Data access for researchers is carefully managed to ensure both security and ethical compliance. An independent Access Committee oversees all requests, ensuring that data use aligns with ethical guidelines and participant consent. Researchers can navigate this process through an intuitive online portal, designed to streamline applications and facilitate data discovery.
How Researchers Can Access CanPath Data
For researchers eager to tap into this unparalleled resource, the process is designed to be as clear and efficient as possible:
- Access Criteria: Researchers need to demonstrate that their proposed study is ethically sound, scientifically rigorous, and aligns with CanPath’s objectives of understanding chronic disease and cancer.
- Application Process: Applications are submitted electronically via the online portal. This includes a detailed research proposal, ethical approvals, and a data request.
- Data Dictionaries: Before applying, researchers can browse CanPath data dictionaries through the online portal. This allows them to understand the available variables and plan their studies effectively.
- Secure Data Analysis: Once an application is approved, researchers gain access to the de-identified data within a secure environment. This often involves working within a Federated Trusted Research Environment (TRE), which ensures that sensitive data never leaves its secure location, while approved researchers can still run their analyses. This approach is critical for maintaining participant privacy and data integrity, a principle we champion at Lifebit.
Researchers do not need to be formally affiliated with CanPath investigators to access these resources. This open access policy ensures that the data can benefit the widest possible scientific community.
The CanPath Student Dataset: Training the Next Generation
We’re not just looking to empower today’s researchers; we’re also dedicated to nurturing the talent of tomorrow. That’s why CanPath has developed a Student Dataset, a brilliant initiative designed to give students hands-on experience working with population health data.
This isn’t real participant data, but a carefully crafted synthetic dataset. It’s designed to mimic the complexity and structure of CanPath’s nationally harmonized data, providing a realistic training ground without any privacy concerns. The Student Dataset includes over 40,000 observations and 403 categorical variables from the CanPath Baseline and Additional Diseases Questionnaires. It covers a wide range of information, including socio-demographics, lifestyle and behavior (like tobacco use, alcohol use, and nutrition), health perception, self-reported diseases, and even environmental variables from the Canadian Urban Environmental Health Research Consortium (CANUE).
This invaluable resource is available at no cost to instructors at Canadian universities or colleges for use in academic courses. It allows students to:
- Gain practical experience in data analysis using large-scale, realistic health data.
- Explore complex health questions, such as the relationship between work schedule and binge drinking, or green space and obesity.
- Develop essential research skills in a safe, controlled environment.
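As an example of the kind of exercise a student might start with, the Python sketch below cross-tabulates two categorical variables and compares proportions between groups. The variable names and the handful of rows are made up for illustration and are not taken from the Student Dataset; a real analysis would also need to handle missing values and confounding.

```python
from collections import Counter

# Made-up rows for illustration only; not from the CanPath Student Dataset.
rows = [
    {"work_schedule": "day", "binge_drinking": "no"},
    {"work_schedule": "day", "binge_drinking": "no"},
    {"work_schedule": "day", "binge_drinking": "no"},
    {"work_schedule": "day", "binge_drinking": "yes"},
    {"work_schedule": "shift", "binge_drinking": "no"},
    {"work_schedule": "shift", "binge_drinking": "yes"},
]


def crosstab(rows, row_var, col_var):
    """Count how often each (row_var, col_var) pair of categories occurs."""
    return Counter((r[row_var], r[col_var]) for r in rows)


def row_proportion(table, row_value, col_value):
    """Share of one row category that falls in the given column category."""
    row_total = sum(count for (rv, _), count in table.items() if rv == row_value)
    if row_total == 0:
        raise ValueError(f"no rows with category {row_value!r}")
    return table[(row_value, col_value)] / row_total


table = crosstab(rows, "work_schedule", "binge_drinking")
day_rate = row_proportion(table, "day", "yes")
shift_rate = row_proportion(table, "shift", "yes")
```

Comparing `day_rate` with `shift_rate` is only a descriptive first step; students would typically follow it with a formal test or a regression model that adjusts for other variables.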
While the Student Dataset is for training purposes only and cannot be used for publication, it serves as an excellent stepping stone. Students who demonstrate proficiency and have compelling research questions can then apply through the regular CanPath Access Process for real data, often at a reduced fee for students and trainees, to publish their findings. This program is instrumental in preparing the next generation of public health leaders and researchers, ensuring a continuous flow of talent into this critical field.
Frequently Asked Questions about CanPath
We understand that a study of this magnitude can raise many questions. Here, we address some of the most common inquiries to provide a clearer understanding of CanPath’s unique position and operational principles.
How does CanPath differ from other large studies like the Framingham Heart Study?
CanPath is often compared to other seminal longitudinal studies, but it has distinct characteristics that make it uniquely Canadian and globally significant.
- Scope: While the Framingham Heart Study, originating in the USA, focused on a single town (Framingham, Massachusetts) before expanding, CanPath is pan-Canadian, recruiting participants from across all ten provinces. This broad geographical representation allows for insights into the diverse health landscape of a vast country like Canada.
- Size: The initial Framingham study had around 5,209 participants. CanPath, in contrast, has recruited over 330,000 Canadians, making it vastly larger. This larger sample size is critical for studying rarer illnesses, such as specific types of cancer, which would be difficult to analyze in smaller cohorts.
- Focus: While the Framingham study was primarily designed to study cardiovascular disease risk factors, CanPath has a broader mandate. We focus on understanding the causes and progression of a wide array of chronic diseases and cancers, reflecting the most pressing health challenges of our time. This comprehensive approach allows us to explore multiple health outcomes simultaneously.
CanPath builds on the legacy of these earlier studies but scales up the ambition, aiming for a more diverse and expansive understanding of health determinants across a national population.
Can participants see their individual results?
This is a common and very understandable question. Generally, participants will not receive individual results or be notified when their information is used by researchers. This approach is fundamental to protecting participant privacy and ensuring the integrity of the de-identified dataset used for research. No one—including family members, friends, employers, or insurance companies—will be able to access any personal health information collected for CanPath.
However, there is an important exception. If researchers find something unexpected that could significantly affect a participant’s health (known as an incidental research finding) or reveal a serious condition that could be treated or prevented, CanPath staff will work in partnership with the Research Ethics Board to determine how this information should be communicated to the participant. This ensures that while privacy is paramount, any findings with direct clinical relevance can be responsibly shared, prioritizing the well-being of our volunteers.
How does CanPath ensure the inclusion of Indigenous peoples?
Ensuring equitable and respectful inclusion of Indigenous peoples is a critical priority for CanPath. We are proud to have over 7,000 self-identified Indigenous participants within the national cohort. However, recognizing the historical context and the importance of Indigenous data sovereignty, CanPath operates under strict principles regarding this sensitive data.
At present, to honor the First Nations Principles of OCAP (Ownership, Control, Access, and Possession) and other standards that guide Indigenous health research, CanPath does not release data on these participants to general researchers. Instead, we are actively collaborating with Indigenous health leaders and communities to develop proper, culturally appropriate data access procedures for their data. This commitment ensures that Indigenous data is managed in a way that respects their inherent rights and supports research that is beneficial and relevant to their communities, as determined by them. It’s a testament to our dedication to ethical and inclusive research practices.
The Future of Health is Collaborative
As we look to the horizon, the future goals and strategic plans for CanPath are ambitious and deeply rooted in collaboration and innovation. Our 2023-2027 Strategic Plan outlines a clear path forward, emphasizing the continued long-term follow-up of participants—for 30 to 50 years—to capture the full arc of health and disease progression. This commitment to longevity will only increase the value of the dataset, providing insights that are simply impossible to glean from shorter-term studies. We were thrilled to see CanPath recently awarded $3m to study crises in a changing world, highlighting its evolving role in addressing contemporary global health challenges.
The vision for CanPath aligns perfectly with our mission at Lifebit: to enable secure, large-scale research that transcends geographical and institutional boundaries. The sheer volume and sensitivity of CanPath’s data necessitate cutting-edge solutions for data governance and analysis. This is where our federated technology truly shines.
Our platforms, built on the principles of a Trusted Research Environment, allow researchers to analyze vast, sensitive datasets like CanPath’s without the data ever leaving its secure, sovereign location. This “code to data” approach ensures maximum privacy and compliance, while still empowering researchers with advanced AI/ML analytics capabilities. We believe this model is not just the future of health research but a present necessity, enabling collaborative science on an unprecedented scale.
By powering population-scale genomics and advancing public health initiatives through secure data federation, we contribute to CanPath’s vision of a healthier future for Canadians and the global community. Our work with governments and public health agencies, as highlighted in our public sector initiatives, demonstrates how federated platforms can unlock real-time insights and support evidence-based policy making. Together with CanPath, we are not just collecting data; we are building a legacy of health discovery, ensuring that the generosity of today’s participants translates into better health outcomes for generations to come. The journey to unlock the full potential of this incredible Canadian resource is just beginning, and we are excited to be part of it.