Yun Zhu, Emily Williams, Yuan Tian, Carol Munroe, John Bucci, Yutao Fu, Fiona Hyland, and Corina Shtir, Clinical Next-Gen Seq Division, Thermo Fisher Scientific Inc., 5781 Van Allen Way, Carlsbad, CA, U.S.A, 92008.
Table 1. Disease annotation for the 28 identified gene clusters.
ABSTRACT
We developed Information Genetic Content (IGC), a comprehensive
knowledgebase and discovery tool for research on human genes and genetic
disorders. IGC comprises three components: the Disease-Association
Database (DAD), the Gene Scoring Algorithm (GSA), and the Virtual Panel
Library (VPL). The DAD module contains over 400,000 associations
between over 17,000 genes and 15,000 Mendelian and complex diseases
from both expert-curated and text-mined data. The DAD module also
features a hierarchical organization of human diseases using a UMLS-
controlled vocabulary, permitting queries at any level of the disease
ontology hierarchy. The GSA module aims to prioritize genes for a specific
disease of interest. This gene scoring algorithm is distinctive in the way it
combines the strength of association and the number of associated
diseases to provide an unbiased score for each gene. In conjunction with
the DAD module, the GSA module is able to produce a list of ranked genes
for one or more diseases at any level of the disease hierarchy. The VPL
module generates optimal gene grouping by disease classification using
hierarchical-clustering-based network analysis. Genes that are involved in
the same pathological pathways are grouped into the same cluster.
INTRODUCTION
The identification of disease-associated genes is an important step towards
understanding disease mechanisms and improving diagnosis and therapy.
However, current scientific knowledge is spread across several overlapping
databases maintained by independent groups, which makes it unclear how to
rank gene-disease research associations. To fill this gap, we developed
Information Genetic Content (IGC), a comprehensive knowledgebase and
discovery tool for research on human genes and genetic disorders. IGC is
unique in two aspects. First, it integrates data from multiple databases into
one system. Second, it provides an unbiased scoring algorithm to rank
gene-disease research associations at any level of the disease ontology
hierarchy.
METHODS
CONCLUSIONS
We created a comprehensive, efficient, and informative engine, the IGC, to optimize
gene selection for diseases at any level of the disease ontology hierarchy:
• The DAD organizes diseases into an effective hierarchical structure for
lookup, and associates diseases with genes.
• The GSA ranks genes by clinical relevance, and summarizes the scores for
diseases at any level of the hierarchy.
• The VPL efficiently groups genes into pools by disease classifications, and
further ranks the genes within clusters by their relative importance to
diseases.
REFERENCES
1. Piñero J, Queralt-Rosinach N, Bravo A, et al. (2015) DisGeNET: a discovery platform for the dynamical exploration of
human diseases and their genes. Database 2015:bav028.
2. Bodenreider O (2004) The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic
Acids Res. 32(Database issue):D267-70.
3. Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC
Bioinformatics 9:559.
For Research Use Only. Not for use in diagnostic procedures.
© 2016 Thermo Fisher Scientific Inc. All rights reserved. All trademarks are the property of Thermo Fisher Scientific and
its subsidiaries unless otherwise specified.
Information Genetic Content (IGC): a comprehensive discovery platform for disease-gene research association
Thermo Fisher Scientific • 5781 Van Allen Way • Carlsbad, CA 92008 • thermofisher.com
Figure 2. Disease-Association Database (DAD) maps genes to diseases
• DAD contains over 400,000 associations between over 17,000 genes and 15,000 Mendelian
and complex diseases from both expert-curated and text-mined data.
• DAD establishes gene-disease relationships based on DisGeNET [1], which scores
gene-disease associations according to expert-curated sources (e.g. CTD, CLINVAR, and
ORPHANET), predicted data from mouse models, and text mining of publications.
Figure legend: blue circles are two neurological diseases, schizophrenia and bipolar disorder;
green circles are genes associated with these two diseases.
• The disease association database (DAD) organizes diseases into an effective hierarchical
structure for lookup, using disease parent-child relationships established in NIH Unified
Medical Language System (UMLS).
• For any disease in the hierarchical tree, the GSA computes the rank-weighted sum score
(RWSS) to summarize the strength of the gene’s association with all of its child diseases (see
below).
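The hierarchical lookup described in the bullets above can be sketched as follows. This is a minimal illustration, not the IGC implementation: the parent-child edges, gene names (GENE_A and so on), scores, and the function names `descendants` and `genes_for` are all hypothetical, and a real system would load the hierarchy from UMLS and the associations from DisGeNET.

```python
from collections import defaultdict

# Toy parent -> child edges (hypothetical; the poster derives these from UMLS).
CHILDREN = {
    "root": ["neurological", "cardiovascular"],
    "neurological": ["schizophrenia", "bipolar disorder"],
    "cardiovascular": ["cardiomyopathy"],
}

# Toy (gene, disease) -> association score (made-up values, placeholder genes).
ASSOCIATIONS = {
    ("GENE_A", "schizophrenia"): 0.9,
    ("GENE_A", "bipolar disorder"): 0.6,
    ("GENE_B", "bipolar disorder"): 0.4,
    ("GENE_C", "cardiomyopathy"): 0.8,
}


def descendants(disease):
    """Return the disease itself plus every disease below it in the tree."""
    found, stack = set(), [disease]
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(CHILDREN.get(node, []))
    return found


def genes_for(disease):
    """Collect, per gene, the scores of all associations under `disease`."""
    subtree = descendants(disease)
    hits = defaultdict(dict)
    for (gene, dis), score in ASSOCIATIONS.items():
        if dis in subtree:
            hits[gene][dis] = score
    return dict(hits)
```

Querying a parent node such as "neurological" returns every gene linked to any of its child diseases, which is the input a per-gene summary score would then be computed from.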
Figure 3. Gene Scoring Algorithm (GSA)
Figure 5. Gene clustering identified 28 VPLs that can be well defined by
disease classifications.
Disease Key
MeSH Category   Description
C04             Neoplasms
C05             Musculoskeletal Diseases
C06             Digestive System Diseases
C07             Stomatognathic Diseases
C08             Respiratory Tract Diseases
C09             Otorhinolaryngologic Diseases
C10             Nervous System Diseases
C11             Eye Diseases
C12             Male Urogenital Diseases
C13             Female Urogenital Diseases and Pregnancy Complications
C14             Cardiovascular Diseases
C15             Hemic and Lymphatic Diseases
C16             Congenital, Hereditary, and Neonatal Diseases and Abnormalities
C17             Skin and Connective Tissue Diseases
C18             Nutritional and Metabolic Diseases
C19             Endocrine System Diseases
C20             Immune System Diseases
[Figure labels: Cluster Groups; Disease of interest; DisGeNET Database]
Rank-Weighted Sum Score (RWSS)
RWSS is an unbiased gene scoring method
that accounts for both the strength and number
of gene-disease pairs.
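The poster does not give the RWSS formula, so the sketch below uses one plausible reading and should be treated as an assumption: a gene's association scores across the child diseases are sorted in descending order and the k-th score is weighted by 1/k before summing. Under that assumption a strong association dominates the score, while additional associations still add a diminishing amount. The function names `rwss` and `rank_genes` and the example values are hypothetical.

```python
def rwss(scores):
    """Assumed rank-weighted sum: sort descending, weight the k-th score by 1/k.

    This weighting is an illustrative guess, not the formula used by IGC.
    """
    ranked = sorted(scores, reverse=True)
    return sum(score / rank for rank, score in enumerate(ranked, start=1))


def rank_genes(gene_scores):
    """Rank genes by the assumed RWSS; `gene_scores` maps gene -> list of scores."""
    totals = {gene: rwss(scores) for gene, scores in gene_scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for placeholder genes under one parent disease.
example = {
    "GENE_A": [0.9, 0.6],       # one strong and one moderate association
    "GENE_B": [0.4],            # a single weak association
    "GENE_C": [0.5, 0.5, 0.5],  # several moderate associations
}
ranking = rank_genes(example)
```

With these toy values GENE_A ranks first, GENE_C second, and GENE_B last, showing how both the strength and the number of gene-disease pairs contribute to the score.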
From the top 5,000 genes ranked as clinically relevant by the GSA, 28 gene clusters were identified
using the WGCNA algorithm [3]. A) Hierarchical clustering of genes according to their association
patterns with 16 high-level MeSH categories relevant to inherited diseases. B) Gene cluster
association scores with the 16 MeSH disease categories are shown with p-values.
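The clustering step in panel A can be illustrated with a small stand-in. The poster uses WGCNA, an R package; the sketch below instead uses plain average-linkage agglomerative clustering on 1 minus the Pearson correlation between gene profiles, which is a simpler technique chosen only to show the idea. The gene names, profile values, `max_dist` threshold, and the `dominant_category` helper (a crude substitute for the p-value-based annotation in panel B) are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length score profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def cluster_genes(profiles, max_dist=0.5):
    """Average-linkage agglomerative clustering on 1 - Pearson correlation."""
    genes = list(profiles)
    dist = {frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
            for a, b in combinations(genes, 2)}
    clusters = [[g] for g in genes]

    def linkage(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters; stop once it is too far apart.
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > max_dist:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


def dominant_category(cluster, profiles, categories):
    """Label a cluster with the category that has the highest mean score."""
    means = [sum(profiles[g][k] for g in cluster) / len(cluster)
             for k in range(len(categories))]
    return categories[means.index(max(means))]


# Made-up gene-by-category scores; columns follow CATEGORIES.
CATEGORIES = ["C10", "C14", "C18"]
PROFILES = {
    "GENE_A": [0.9, 0.1, 0.2],
    "GENE_B": [0.8, 0.0, 0.1],
    "GENE_C": [0.1, 0.9, 0.2],
    "GENE_D": [0.0, 0.8, 0.3],
}
clusters = cluster_genes(PROFILES)
labels = [dominant_category(c, PROFILES, CATEGORIES) for c in clusters]
```

On this toy input the genes split into two groups, one dominated by C10 (Nervous System Diseases) and one by C14 (Cardiovascular Diseases), mirroring how each cluster in Table 1 carries a disease annotation.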
RESULTS
Figure 1. Overview of IGC framework
Figure 4. Gene Scoring in multiple disease hierarchies (panel labels: Level 1, Level 2, Level 3, Level 4)
• The GSA module uses the RWSS method to prioritize genes for a specific disease of interest.
• In conjunction with the DAD module, the GSA module is able to produce a list of ranked
genes for one or more diseases at any level of the disease hierarchy.
Module #  Module Color   Gene Count  Disease Annotation
1         turquoise      530         Nervous System Diseases
2         blue           321         Nutritional and Metabolic Diseases
3         brown          307         Cardiovascular Diseases
4         yellow         280         Digestive System Diseases
5         green          253         Eye Diseases
6         red            250         Skin and Connective Tissue Diseases
7         black          229         Male and Female Urogenital Diseases
8         pink           205         Musculoskeletal Diseases
9         magenta        164         Nervous System Diseases; Nutritional and Metabolic Diseases
10        purple         150         Hemic and Lymphatic Diseases
11        greenyellow    140         Musculoskeletal Diseases; Nervous System Diseases
12        tan            137         Neoplasms
13        salmon         129         Respiratory Tract Diseases
14        cyan           111         Otorhinolaryngologic Diseases; Nervous System Diseases
15        midnightblue   90          Male Urogenital Diseases
16        lightcyan      87          Immune System Diseases; Male Urogenital Diseases; Female Urogenital Diseases and Pregnancy Complications
17        grey60         76          Stomatognathic Diseases
18        lightgreen     69          Hemic and Lymphatic Diseases; Immune System Diseases
19        lightyellow    67          Female Urogenital Diseases and Pregnancy Complications; Endocrine System Diseases
20        royalblue      63          Female Urogenital Diseases and Pregnancy Complications
21        darkred        61          Musculoskeletal Diseases; Skin and Connective Tissue Diseases
22        darkgreen      60          Musculoskeletal Diseases; Stomatognathic Diseases
23        darkgrey       55          Female and Male Urogenital Diseases; Nutritional and Metabolic Diseases
24        darkturquoise  55          Nutritional and Metabolic Diseases; Endocrine System Diseases
25        darkorange     36          Musculoskeletal Diseases; Cardiovascular Diseases
26        orange         36          Immune System Diseases
27        white          35          Endocrine System Diseases
28        skyblue        34          Immune System Diseases; Skin and Connective Tissue Diseases

FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATIONSTS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 

Information Genetic Content (IGC): a comprehensive discovery platform for disease-gene research association

Yun Zhu, Emily Williams, Yuan Tian, Carol Munroe, John Bucci, Yutao Fu, Fiona Hyland, and Corina Shtir, Clinical Next-Gen Seq Division, Thermo Fisher Scientific Inc., 5781 Van Allen Way, Carlsbad, CA, U.S.A, 92008.

ABSTRACT
We developed Information Genetic Content (IGC), a comprehensive knowledgebase and discovery tool for research use on human genes and genetic disorders. IGC comprises three components: the Disease-Association Database (DAD), the Gene Scoring Algorithm (GSA), and the Virtual Panel Library (VPL). The DAD module contains over 400,000 associations between over 17,000 genes and 15,000 Mendelian and complex diseases from both expert-curated and text-mined data. The DAD module also features a hierarchical organization of human diseases using a UMLS-controlled vocabulary, permitting queries at any level of the disease ontology hierarchy. The GSA module prioritizes genes for a specific disease of interest. This gene scoring algorithm is distinctive in the way it combines the strength of association and the number of associated diseases to provide an unbiased score for each gene. In conjunction with the DAD module, the GSA module can produce a list of ranked genes for one or more diseases at any level of the disease hierarchy. The VPL module generates optimal gene groupings by disease classification using hierarchical-clustering-based network analysis. Genes that are involved in the same pathological pathways are grouped into the same cluster.

INTRODUCTION
The identification of disease-associated genes is an important step towards understanding disease mechanisms and improving future diagnosis and therapy. However, due to the complex and distributed nature of the problem, current scientific knowledge is spread out over several overlapping databases maintained by independent groups. Because this knowledge is dispersed, it is unclear how to rank gene-disease research associations. To fill this gap, we developed Information Genetic Content (IGC), a comprehensive knowledgebase and discovery tool for research use on human genes and genetic disorders. IGC is unique in two aspects. First, it integrates data from multiple databases into one system. Second, it provides an unbiased scoring algorithm to rank gene-disease research associations at any level of the disease ontology hierarchy.

METHODS
Figure 1. Overview of IGC framework
[Figure not reproduced. Labels: Disease of interest; DisGeNET Database; Cluster Groups.]

Figure 2. Disease-Association Database (DAD) maps genes to diseases
[Figure not reproduced. Blue circles: two neurological diseases (schizophrenia and bipolar disorder). Green circles: genes associated with these two diseases.]
• DAD contains over 400,000 associations between over 17,000 genes and 15,000 Mendelian and complex diseases from both expert-curated and text-mined data.
• DAD establishes gene-disease relationships based on DisGeNET [1], which scores gene-disease associations according to expert-curated sources (e.g. CTD, CLINVAR, and ORPHANET), predicted data using mouse models, and text-mining of publications.
• DAD organizes diseases into an effective hierarchical structure for lookup, using disease parent-child relationships established in the NIH Unified Medical Language System (UMLS).

Figure 3. Gene Scoring Algorithm (GSA)
[Figure not reproduced.]
• For any disease in the hierarchical tree, the GSA computes the rank-weighted sum score (RWSS) to summarize the strength of the gene's association with all of its child diseases.
• RWSS is an unbiased gene scoring method that accounts for both the strength and the number of gene-disease pairs.

RESULTS
Figure 4. Gene Scoring in multiple disease hierarchies
[Figure not reproduced. Panels: Level 1, Level 2, Level 3, Level 4.]
• The GSA module uses the RWSS method to prioritize genes for a specific disease of interest.
• In conjunction with the DAD module, the GSA module can produce a list of ranked genes for one or more diseases at any level of the disease hierarchy.

Figure 5. Gene clustering identified 28 VPLs that can be well defined by disease classifications.
[Figure not reproduced.]
Among the top 5,000 genes ranked as clinically relevant by the GSA, 28 gene clusters were identified using the WGCNA algorithm [3]. A) Hierarchical clustering of genes according to their association patterns with 16 high-level MeSH categories relevant to inherited diseases. B) Gene cluster association scores with the 16 MeSH disease categories are shown with p-values.

Disease Key   MeSH Category Description
C04           Neoplasms
C05           Musculoskeletal Diseases
C06           Digestive System Diseases
C07           Stomatognathic Diseases
C08           Respiratory Tract Diseases
C09           Otorhinolaryngologic Diseases
C10           Nervous System Diseases
C11           Eye Diseases
C12           Male Urogenital Diseases
C13           Female Urogenital Diseases and Pregnancy Complications
C14           Cardiovascular Diseases
C15           Hemic and Lymphatic Diseases
C16           Congenital, Hereditary, and Neonatal Diseases and Abnormalities
C17           Skin and Connective Tissue Diseases
C18           Nutritional and Metabolic Diseases
C19           Endocrine System Diseases
C20           Immune System Diseases

Table 1. Disease annotation for the 28 identified gene clusters.

Module #  Module Color   Gene Count  Disease Annotation
1         turquoise      530         Nervous System Diseases
2         blue           321         Nutritional and Metabolic Diseases
3         brown          307         Cardiovascular Diseases
4         yellow         280         Digestive System Diseases
5         green          253         Eye Diseases
6         red            250         Skin and Connective Tissue Diseases
7         black          229         Male and Female Urogenital Diseases
8         pink           205         Musculoskeletal Diseases
9         magenta        164         Nervous System Diseases; Nutritional and Metabolic Diseases
10        purple         150         Hemic and Lymphatic Diseases
11        greenyellow    140         Musculoskeletal Diseases; Nervous System Diseases
12        tan            137         Neoplasms
13        salmon         129         Respiratory Tract Diseases
14        cyan           111         Otorhinolaryngologic Diseases; Nervous System Diseases
15        midnightblue   90          Male Urogenital Diseases
16        lightcyan      87          Immune System Diseases; Male Urogenital Diseases; Female Urogenital Diseases and Pregnancy Complications
17        grey60         76          Stomatognathic Diseases
18        lightgreen     69          Hemic and Lymphatic Diseases; Immune System Diseases
19        lightyellow    67          Female Urogenital Diseases and Pregnancy Complications; Endocrine System Diseases
20        royalblue      63          Female Urogenital Diseases and Pregnancy Complications
21        darkred        61          Musculoskeletal Diseases; Skin and Connective Tissue Diseases
22        darkgreen      60          Musculoskeletal Diseases; Stomatognathic Diseases
23        darkgrey       55          Female and Male Urogenital Diseases; Nutritional and Metabolic Diseases
24        darkturquoise  55          Nutritional and Metabolic Diseases; Endocrine System Diseases
25        darkorange     36          Musculoskeletal Diseases; Cardiovascular Diseases
26        orange         36          Immune System Diseases
27        white          35          Endocrine System Diseases
28        skyblue        34          Immune System Diseases; Skin and Connective Tissue Diseases

CONCLUSIONS
We created a comprehensive, efficient, and informative engine, the IGC, to optimize gene selection given diseases at any level of the disease ontology hierarchy:
• The DAD organizes diseases into an effective hierarchical structure for lookup, and associates diseases with genes.
• The GSA ranks genes by clinical relevance, and summarizes the scores for a disease at any level of the hierarchy.
• The VPL efficiently groups genes into pools by disease classification, and further ranks the genes within clusters by their relative importance to diseases.

REFERENCES
1. Pinero J, Queralt-Rosinach N, Bravo A et al (2015) DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database 2015:bav028.
2. Bodenreider O (2004) The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res 32(Database issue):D267-70.
3. Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9:559.

For Research Use Only. Not for use in diagnostic procedures. © 2016 Thermo Fisher Scientific Inc. All rights reserved. All trademarks are the property of Thermo Fisher Scientific and its subsidiaries unless otherwise specified.
Thermo Fisher Scientific • 5781 Van Allen Way • Carlsbad, CA 92008 • thermofisher.com
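
The poster describes the rank-weighted sum score (RWSS) only qualitatively: it summarizes a gene's associations with a disease and all of its child diseases, accounting for both the strength and the number of gene-disease pairs. The exact formula is not given. The following is a minimal sketch under an assumed weighting, in which a gene's association scores within the selected subtree are sorted in descending order and the score at rank i is weighted by 1/i. The disease identifiers, gene names, scores, and function names are all hypothetical and are not taken from IGC or DisGeNET.

```python
from collections import defaultdict

# Hypothetical disease hierarchy (parent -> children), standing in for
# the UMLS parent-child relationships used by the DAD module.
CHILDREN = {"D0": ["D1", "D2"], "D1": ["D3"]}

# Hypothetical (gene, disease, association score) triples.
ASSOCIATIONS = [
    ("GENE_A", "D1", 0.8),
    ("GENE_A", "D3", 0.4),
    ("GENE_B", "D2", 0.9),
    ("GENE_C", "D3", 0.5),
    ("GENE_B", "D9", 1.0),  # D9 lies outside the D0 subtree
]


def descendants(disease, children):
    """Return the disease itself plus every disease below it."""
    seen, stack = set(), [disease]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


def rwss(scores):
    """ASSUMED weighting (not from the poster): the i-th largest score
    is divided by its rank i, then all weighted scores are summed."""
    ordered = sorted(scores, reverse=True)
    return sum(s / rank for rank, s in enumerate(ordered, start=1))


def rank_genes(disease, children, associations):
    """Rank genes for a disease at any level of the hierarchy."""
    scope = descendants(disease, children)
    per_gene = defaultdict(list)
    for gene, dis, score in associations:
        if dis in scope:
            per_gene[gene].append(score)
    scored = {gene: rwss(vals) for gene, vals in per_gene.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for gene, score in rank_genes("D0", CHILDREN, ASSOCIATIONS):
        print(gene, round(score, 3))
```

Under this assumed weighting, a gene with many weak associations gains less than a gene with one strong association, while additional associations still raise the score. Any other monotone rank weighting would fit the poster's description equally well.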