
seminarsonly · 21-09-2010, 01:11 PM

Outline
Introduction

Methods

Results

Conclusion

INTRODUCTION
Chronic diseases are caused by the interactions of various genes.
These genes and their interactions are also subject to environmental factors

Mutations

These entities and their interactions can be
represented well using graph networks
Contd..
Genomics and Phenomics

Integrated genomics–phenomics knowledge

Approach:
Generate Semantic Web based network data structures
Perform centrality analyses to rank genes
Contd..
Traditional methodologies :
Positional cloning
Modern methodology:
Gene expression profiling

Gene prioritization is the identification of the right set of genes, from candidate gene lists, for the disease under study
Contd..
Importance:
Integration of knowledge and computational methodologies has accelerated the discovery of disease-causing genes

Advantages:
A more comprehensive description of functional gene networks.
Data integration limits false positives and increases sensitivity
Contd..
Classification of gene prioritization
methods:
Methods that use a training set of data
Methods that use no training data

Resource Description Framework (RDF)
Contd..
Our method has two phases:
Find the biologically functional important genes from the test set by integrating multiple genomic knowledge sets.

We apply specific disease context to the genomic network by adding phenotypic or clinical features relevant to the disease under study
Contd..
Resulting RDF is a Directed Acyclic Graph
Centrality analysis is applied to score the elements in the network
Degree Centrality Analysis
Ranking done using SPARQL
In this study, we have focused on cardiovascular diseases (CVD)
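
As a rough illustration of degree centrality, the sketch below counts how many edges touch each node in a small directed graph and ranks nodes by that count. The node names and edges are hypothetical placeholders, and this is only the simplest form of centrality, not necessarily the exact scoring used in the study.

```python
from collections import Counter

# Hypothetical directed edges (source, target) standing in for RDF statements.
edges = [
    ("GENE_A", "FEATURE_X"),
    ("GENE_A", "PATHWAY_P"),
    ("GENE_B", "FEATURE_X"),
    ("GENE_C", "PATHWAY_P"),
    ("GENE_A", "FEATURE_Y"),
]

def degree_centrality(edges):
    """Return node -> total degree (in-degree plus out-degree)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree

scores = degree_centrality(edges)
# Sort by descending degree, breaking ties alphabetically for a stable order.
ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
for node, degree in ranked:
    print(node, degree)
```

Nodes that take part in many statements rise to the top of the list, which is the intuition behind ranking by centrality.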
Methods
Collection of datasets
Mapping clinical features to find UMLS concepts
Mapping clinical features to genes
Generating RDF
Ranking on Semantic Web

Collection of datasets
Genomic knowledge sources:
Gene Ontology
Gene-pathway annotations

Phenomic knowledge sources:
Mammalian Phenotype (MP) ontology, Mouse gene phenotype annotations and the corresponding human genes
Online Mendelian Inheritance in Man (OMIM)
datasets
Syndrome DB.
Mapping clinical features to find UMLS concepts
This is done to remove inconsistencies

There is inconsistency in the use of clinical
feature-terms in different datasets

Therefore, the terms are mapped to Unified
Medical Language System (UMLS)
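
The idea of semantic normalization can be sketched as a lookup from variant terms to a shared concept identifier. The table and the concept IDs below are hypothetical placeholders; in practice the identifiers would come from UMLS through a tool such as MetaMap, not from a hand-written dictionary.

```python
# Hypothetical lookup from free-text terms to placeholder concept IDs.
# Real UMLS identifiers would come from MetaMap output, not from this table.
TERM_TO_CONCEPT = {
    "enlarged heart": "CONCEPT_001",
    "cardiomegaly": "CONCEPT_001",
    "heart enlargement": "CONCEPT_001",
    "high blood pressure": "CONCEPT_002",
    "hypertension": "CONCEPT_002",
}

def normalize_term(term):
    """Lowercase, collapse whitespace, and look up the concept ID (None if unmapped)."""
    key = " ".join(term.lower().split())
    return TERM_TO_CONCEPT.get(key)

def normalize_features(features):
    """Map raw feature terms to a sorted list of distinct concept IDs."""
    concepts = {normalize_term(feature) for feature in features}
    concepts.discard(None)  # drop terms that could not be mapped
    return sorted(concepts)

raw = ["Enlarged  heart", "cardiomegaly", "Hypertension", "unknown term"]
print(normalize_features(raw))
```

Two datasets that describe the same feature with different words end up sharing one identifier, which removes the inconsistency.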
Contd..
MetaMap- Natural Language Processing tool.
Input: free text from biomedical domain
Output: matching concepts from UMLS

A Java script was written to parse the MetaMap results.
Contd..
GATE- General Architecture for Text Engineering

Large sections of OMIM text require exceptionally long processing times in MetaMap.

Mapping clinical features to genes
Associations between genes and their corresponding clinical features form the phenome network.

After semantic normalization, for further association between genes and clinical features, the mim2gene dataset is used.
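
A minimal sketch of this association step, assuming a mim2gene-style mapping from disease entries to genes and a separate mapping from disease entries to normalized features. All identifiers below are hypothetical placeholders, not real OMIM numbers or gene symbols.

```python
# Hypothetical records; the IDs are placeholders, not real identifiers.
# Disease entry -> normalized clinical feature concepts.
disease_features = {
    "MIM_1": ["CONCEPT_001", "CONCEPT_002"],
    "MIM_2": ["CONCEPT_002"],
}
# mim2gene-style mapping: disease entry -> associated genes.
disease_genes = {
    "MIM_1": ["GENE_A"],
    "MIM_2": ["GENE_A", "GENE_B"],
}

def gene_feature_edges(disease_features, disease_genes):
    """Join the two mappings on the disease entry to get (gene, feature) pairs."""
    edges = set()
    for disease, features in disease_features.items():
        for gene in disease_genes.get(disease, []):
            for feature in features:
                edges.add((gene, feature))
    return sorted(edges)

phenome_edges = gene_feature_edges(disease_features, disease_genes)
for gene, feature in phenome_edges:
    print(gene, feature)
```

Each resulting pair is one edge of the phenome network.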
Generating RDF
Each RDF statement is referred to as a triple. A triple describes a resource, the resource's properties and the values of those properties.

They can be represented as a graph of nodes (resources) connected by edges (properties) to values.
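
To make the triple-to-graph view concrete, the sketch below stores a few triples as tuples and groups them into an adjacency map. The resource and property names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical triples (subject, property, value); the names are illustrative only.
triples = [
    ("gene:GENE_A", "hasPhenotype", "feature:FEATURE_X"),
    ("gene:GENE_A", "participatesIn", "pathway:PATHWAY_P"),
    ("gene:GENE_B", "hasPhenotype", "feature:FEATURE_X"),
]

def build_graph(triples):
    """Group triples into an adjacency map: subject -> list of (property, value)."""
    graph = defaultdict(list)
    for subject, prop, value in triples:
        graph[subject].append((prop, value))
    return dict(graph)

graph = build_graph(triples)
for node, edges in graph.items():
    for prop, value in edges:
        print(f"{node} --{prop}--> {value}")
```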

Contd..

Jena is used to generate the required triples for RDF

The data is retrieved from local relational databases to create BioRDF instantly for the specific disease and gene set under study.
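
Jena is a Java library, so the following is not Jena code. It is a small Python sketch of the same idea: read rows from a relational table and emit one triple per row in N-Triples form. The table name, column names, property name and the example.org namespace are all assumptions made for illustration.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace

def rows_to_ntriples(rows):
    """Format (subject, predicate, object) rows as N-Triples lines."""
    return [f"<{BASE}{s}> <{BASE}{p}> <{BASE}{o}> ." for s, p, o in rows]

# In-memory database standing in for the local relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene_feature (gene TEXT, feature TEXT)")
conn.executemany(
    "INSERT INTO gene_feature VALUES (?, ?)",
    [("GENE_A", "CONCEPT_001"), ("GENE_B", "CONCEPT_002")],
)
# Each row becomes one statement with a fixed, hypothetical property name.
rows = conn.execute(
    "SELECT gene, 'hasFeature', feature FROM gene_feature ORDER BY gene"
).fetchall()
conn.close()

ntriples = rows_to_ntriples(rows)
for line in ntriples:
    print(line)
```

Restricting the SELECT to the genes and disease under study is what would let the RDF be generated on demand for a specific gene set.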
Ranking on Semantic Web
Two metrics:
Subjectivity Score (SS) and
Objectivity Score (OS)

Each node has a score.
Score indicates the relevance of the node in the
network

The algorithm is recursive:
The score of each node is passed to the
adjacent node in the next iteration.

Contd..
Kleinberg's algorithm:
1. Let R be the set of resources (nodes) and E be the set of properties (edges) in the RDF graph

2. For every resource r in R, let S[r] be its
subjectivity score and O[r] be its objectivity score.

3. Initialize S[r] and O[r] to 1 for all r in R.
Contd..
4. While the vectors S and O have not converged:
(a) For all r in R,
O[r] = Σ_{e = (r1, r) ∈ E} S[r1] * objWt(e)
(b) For all r in R,
S[r] = Σ_{e = (r, r1) ∈ E} O[r1] * subWt(e)

Contd..
Convergence:
when the subjectivity and objectivity scores for all resources become stable after finite iterations.

Finally, the importance of each resource
I[r] = S[r] + O[r].
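
The steps above can be sketched in Python as follows. Two points are assumptions that the slides do not state: the score vectors are L2-normalized after each update (as in Kleinberg's HITS) so that they stay bounded, and step 4(b) uses the freshly updated objectivity scores. The per-edge weights are taken as given inputs, and the toy graph is hypothetical.

```python
import math

def rank_resources(edges, max_iter=100, tol=1e-9):
    """Iterate subjectivity (S) and objectivity (O) scores over weighted edges.

    edges: list of (subject, obj, sub_wt, obj_wt) tuples.
    Returns a dict mapping each resource r to its importance S[r] + O[r].
    """
    nodes = sorted({n for s, o, _, _ in edges for n in (s, o)})
    if not nodes:
        return {}
    S = {n: 1.0 for n in nodes}  # step 3: initialize to 1
    O = {n: 1.0 for n in nodes}

    def normalize(vec):
        # Assumption: L2 normalization, as in HITS, keeps scores bounded.
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {n: v / norm for n, v in vec.items()}

    for _ in range(max_iter):
        new_O = {n: 0.0 for n in nodes}
        for s, o, _, obj_wt in edges:
            new_O[o] += S[s] * obj_wt      # step 4(a)
        new_O = normalize(new_O)
        new_S = {n: 0.0 for n in nodes}
        for s, o, sub_wt, _ in edges:
            new_S[s] += new_O[o] * sub_wt  # step 4(b)
        new_S = normalize(new_S)
        change = max(
            max(abs(new_S[n] - S[n]) for n in nodes),
            max(abs(new_O[n] - O[n]) for n in nodes),
        )
        S, O = new_S, new_O
        if change < tol:  # scores are stable: converged
            break
    return {n: S[n] + O[n] for n in nodes}

# Hypothetical toy graph with all weights set to 1.0.
toy_edges = [
    ("GENE_A", "FEATURE_X", 1.0, 1.0),
    ("GENE_A", "FEATURE_Y", 1.0, 1.0),
    ("GENE_B", "FEATURE_X", 1.0, 1.0),
]
importance = rank_resources(toy_edges)
for node, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

In this toy graph GENE_A points to more features than GENE_B, so it ends up with the higher importance.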
Ranking the retrieved results
SPARQL-SELECT clause:
The resulting nodes are identified by the SELECT clause and sorted according to their pre-calculated relevance scores

SPARQL returns a set of variable bindings matching the query parameters
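
The ranking step might look roughly like the sketch below. The query text is a hypothetical example (the ex: prefix and ex:importance property are placeholders) and is not executed here; the Python function only emulates what ORDER BY DESC would return, using made-up scores.

```python
# Hypothetical query text; the ex: prefix and ex:importance property are placeholders.
# It is shown for orientation only and is not executed by this sketch.
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?gene ?score
WHERE { ?gene ex:importance ?score . }
ORDER BY DESC(?score)
"""

# Hypothetical pre-calculated importance scores.
importance = {"GENE_A": 2.5, "GENE_B": 4.0, "GENE_C": 1.25}

def ranked_bindings(scores, limit=None):
    """Return variable bindings sorted by descending score, like ORDER BY DESC."""
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    bindings = [{"gene": gene, "score": score} for gene, score in ordered]
    return bindings if limit is None else bindings[:limit]

top = ranked_bindings(importance, limit=2)
for binding in top:
    print(binding["gene"], binding["score"])
```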

Results
We randomly selected 60 diseases from a total of 423 CVDs in the OMIM database

Results were quite promising:
In 44 out of 60 cases (73%) the related gene is ranked in the top 10
In 33 cases (55%) it is ranked in the top 5.
Application
Prioritizing candidate genes from cardiovascular
disease implicated genomic regions:
Hypertrophic cardiomyopathy
Dilated cardiomyopathy
Hypertrophic cardiomyopathy
This is a disease of the myocardium (the muscle
of the heart) in which a portion of the
myocardium is thickened
We ranked the 110 genes in the chromosome locus 7p12.1–7q21.

GTF2IRD1 was the top-ranked gene.

Rank Gene symbol Score
1 GTF2IRD1 173.334
2 GTF2I 120.1975
3 ELN 93.53132
4 SBDS 77.47414
5 EGFR 42.30714
6 LIMK1 40.77264
7 YWHAG 38.65037
8 BAZ1B 20.52658
9 ZNF117 9.506009
10 ZNF273 8.496438
Dilated cardiomyopathy
Dilated cardiomyopathy is a
group of heart muscle disorders in which
the ventricles enlarge but are not able to
pump enough blood for the body's needs,
resulting in heart failure.

We prioritized 68 genes in the chromosome
10q25–26 region.
FGFR2 was the top-ranked gene.

Rank Gene symbol Score
1 FGFR2 144.9478
2 GRK5 138.7307
3 ADRB1 122.455
4 TIAL1 100.2871
5 EMX2 97.13126
6 GFRA1 82.81983
7 BUB3 47.78852
8 DMBT1 46.70792
9 SLC18A2 20.62583
10 PRLHR 19.57693

Conclusion
A systematic gene prioritization approach.
It works without a focused training set by utilizing both mouse phenotypes and human disease clinical features.

These methods will accelerate the disease gene discovery process by gathering and sifting through all knowledge of each candidate gene.

This approach can be applied to any group of genes or diseases.
