Pathway Analysis for CYP2C19 Gene

December 7, 2015 by meghashahdesani

Living systems are complex, so it is difficult to predict how they function or behave by studying only one of their parts. Systems biology therefore studies systems of biological components, which may be molecules, cells, organisms or entire species (Systems Biology). It combines several disciplines, including biology, computer science, engineering, bioinformatics and physics. This integrative approach helps scientists design predictive models of complex systems, which in turn help explain how these systems respond to changes over time and in their environment. Based on these predictions, targeted drugs and precision treatment approaches can be designed, new disease biomarkers can be discovered, and patients can be stratified by their genetic profiles. At the biological level, our bodies work on the principle of a ‘Network of Networks’: a living body is made up of many networks that communicate with each other in response to internal and external changes so that they can carry out their biological functions. These networks exist at the cellular, molecular and genomic levels, among others. Systems biology looks at these networks across scales to integrate behaviors at different levels, to formulate hypotheses about biological function and to provide spatial and temporal insights into dynamic biological changes (https://www.systemsbiology.org/about/what-is-systems-biology/).

Network of Networks theory in Systems Biology (Source: Institute for Systems Biology)

Biological Pathway:

The cells in the various organs of our body constantly interact with each other and receive chemical cues from the external and internal environment, such as injury, infection or the need for food. To respond to these cues, produce substances such as fats and proteins, and perform the cell functions that keep the body developing and healthy, cells send and receive signals through biological pathways. A biological pathway is therefore a series of actions among molecules in a cell that leads to a certain product or a change in the cell. Such a pathway can trigger the assembly of new molecules, such as a fat or protein. Pathways can also turn genes on and off, or spur a cell to move (Biological Pathway).

Several resources were used to study the molecular interactions of the CYP2C19 gene and its role in biological systems.

The first database we searched was Geneontology.org (GO). This is an open-access database, but a search for ontology information on the CYP2C19 and CYP2C9 genes did not return enough registered information. To give readers an idea of how this resource is useful for pathway analysis, the CYP2D6 gene was used as the search query instead, which gave the following result:

[Screenshot: Gene Ontology search results for CYP2D6]

Navigating further through the P450 inhibitor option gave us its term information, including the accession ID CHEBI:50183 (a ChEBI identifier rather than a GO accession). Clicking on Graph view then gives the ontology of CYP2D6, as shown in the image below:

[Screenshot: Graph view of the CYP2D6 ontology]

KEGG (Kyoto Encyclopedia of Genes and Genomes):

KEGG is a freely accessible online database for understanding the functions, interactions and pathways of biological systems, from cells and molecules to organisms and ecosystems. Entering our gene name in the KEGG Pathway search returns around six different pathways involving the CYP2C19 gene. I selected the most relevant one, Drug Metabolism - Cytochrome P450, which is in line with what we discussed in previous blog posts: the major role of the CYP2C19 gene is in the metabolism of drugs such as mephenytoin, proguanil and warfarin.

KEGG Pathway Text Search for the CYP2C19 Gene: Drug Metabolism – Cytochrome P450 Pathway (Entry map00982)

The pathway highlighted in the figure above, in which CYP2C19 is involved, is the metabolism of the drug citalopram, shown below:

[Screenshot: KEGG drug metabolism pathway for citalopram]
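The gene-to-pathway lookup done above through the KEGG web search can also be scripted. This is a minimal sketch, assuming KEGG's public REST endpoint `https://rest.kegg.jp/link/pathway/<gene>` and that CYP2C19 corresponds to the KEGG entry `hsa:1557` (its NCBI Gene ID); the parser is checked offline on one illustrative output line, while the live query needs network access.

```python
"""Sketch: list KEGG pathways linked to a human gene via KEGG's REST API.

Assumptions: the endpoint https://rest.kegg.jp/link/pathway/<gene> and the
KEGG gene entry hsa:1557 for CYP2C19 (NCBI Gene ID 1557).
"""
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"


def parse_link_output(text: str) -> list[str]:
    """Turn 'gene<TAB>path:<id>' lines into a list of pathway IDs."""
    pathways = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _gene, target = line.split("\t")
        pathways.append(target.split(":", 1)[1])  # drop the 'path:' prefix
    return pathways


def fetch_pathways(gene: str = "hsa:1557") -> list[str]:
    """Query KEGG (network access required) for pathways linked to `gene`."""
    with urlopen(f"{KEGG_REST}/link/pathway/{gene}") as response:
        return parse_link_output(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline check on one line in the REST output format; hsa00982 is the
    # human version of map00982 (Drug metabolism - cytochrome P450).
    print(parse_link_output("hsa:1557\tpath:hsa00982\n"))  # ['hsa00982']
```

With network access, `fetch_pathways()` would return every pathway ID KEGG links to the gene, which should include the roughly six pathways found through the web search.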

Ingenuity Pathway Analysis (IPA) is a highly curated resource that is not open access. It is a web-based software application that uses powerful algorithms to identify regulators, relationships, mechanisms, functions and pathways relevant to changes observed in an analyzed dataset.

An initial search for the CYP2C19 gene in the Genes and Chemicals tool of IPA yields the following result, which shows no drugs associated with this gene (highlighted with a red box in the figure below):

[Screenshot: IPA Genes and Chemicals search results for CYP2C19]

Selecting the CYP2C19 gene from this entry, adding it to a new pathway and expanding it with the Grow tool, with the number of molecules limited to 25, gives the following pathway in Organic view:

[Screenshot: CYP2C19 network built with the Grow tool, Organic view]

The pathway in the figure above includes all molecular interactions for human, rat and mouse. In the next step, some filtering options were applied to get a more precise view of the interactions of the CYP2C19 gene with other enzymes, molecules, chemicals and drugs: MicroRNA-mRNA interactions were deselected under Data Sources, the Moderate (Predicted) confidence level was excluded, the species were narrowed to human and mouse, and microRNA was excluded from the Molecule Types. After applying all these filters, we obtained the following view:

[Screenshot: CYP2C19 network after filtering]

Hovering over individual molecules shows the family to which they belong, as highlighted by the orange boxes in the diagram. For example, mephenytoin is a chemical drug, whereas FOXA3 is a transcription regulator. Clicking on the solid or dotted relationship line between the CYP2C19 gene and a molecule (e.g. the blue line between the chemical drug proguanil and CYP2C19) gives a summary of the relationship between the gene and the molecule, as below:

[Screenshot: Relationship summary for proguanil and CYP2C19]
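The filtering step described earlier (data source, confidence level, species and molecule type) can be pictured as a set of AND-ed conditions applied to every interaction in the network. The sketch below is hypothetical: IPA is proprietary and this does not use its API, and the `Interaction` fields, label strings and example records are invented to mirror the filter options named in the text.

```python
"""Hypothetical sketch of IPA-style network filtering (not IPA's API).

The Interaction fields and label strings are invented to mirror the filter
options named in the text; the example records are not real IPA output.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    source: str
    target: str
    species: str        # e.g. "Human", "Mouse", "Rat"
    confidence: str     # e.g. "Experimentally Observed", "Moderate (Predicted)"
    data_source: str    # illustrative labels, e.g. "Literature", "miRNA-mRNA"
    molecule_type: str  # type of the partner molecule, e.g. "chemical drug"


def apply_filters(interactions, species, excluded_confidence,
                  excluded_sources, excluded_types):
    """Keep only interactions that pass every filter (filters are AND-ed)."""
    return [
        i for i in interactions
        if i.species in species
        and i.confidence not in excluded_confidence
        and i.data_source not in excluded_sources
        and i.molecule_type not in excluded_types
    ]


if __name__ == "__main__":
    network = [
        Interaction("CYP2C19", "proguanil", "Human", "Experimentally Observed",
                    "Literature", "chemical drug"),
        Interaction("CYP2C19", "miR-example", "Human", "Moderate (Predicted)",
                    "miRNA-mRNA", "microRNA"),
        Interaction("CYP2C19", "enzyme-example", "Rat", "Experimentally Observed",
                    "Literature", "enzyme"),
    ]
    kept = apply_filters(
        network,
        species={"Human", "Mouse"},
        excluded_confidence={"Moderate (Predicted)"},
        excluded_sources={"miRNA-mRNA"},
        excluded_types={"microRNA"},
    )
    print([i.target for i in kept])  # ['proguanil']
```

Because every filter must pass, each extra exclusion can only shrink the network, which matches the more compact view obtained after filtering.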

Another IPA feature is the Diseases and Functions tool, in which one can enter a disease related to a particular gene and then add it to the existing pathway of molecules to see the relationship summary, as below:

[Screenshot: Relationship summary from the Diseases and Functions tool]

From my previous blog posts, we know that carrying certain CYP2C19 alleles leads to poor metabolism of drugs such as warfarin, mephenytoin, omeprazole and proguanil. So I selected the "responsiveness of blood platelets" entry from the list of diseases generated with CYP2C19 as the input and added it to my pathway. Clicking on the relationship arrow generated the relationship summary seen in the figure above, which is consistent with what we already know about the poor-metabolizer effect in carriers of mutant human CYP2C19 protein.

Next, the Trim tool lets us restrict the pathway view, for example to only the direct interactions between the gene and the molecules. We can also specify which species, relationship types, molecule types, tissues and cell lines, diseases, biomarkers, etc. to include. For the CYP2C19 gene, the pathway was trimmed to show only direct interactions (the species were not specified and thus included human, mouse, rat and uncategorized species):

[Screenshot: CYP2C19 network trimmed to direct interactions]

Of the approximately 25 interactions seen in the previous images, trimming to direct interactions leaves only about 7 to 8, as shown in the diagram above.
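The difference between the full network and the trimmed view can be expressed in graph terms: a direct partner shares an edge with CYP2C19, while an indirect partner is reachable only through an intermediate molecule. The sketch below assumes an undirected edge list with placeholder molecule names (not IPA results) and uses breadth-first search to find everything connected to the gene.

```python
"""Sketch: direct versus indirect partners of a gene in an interaction graph.

Assumes an undirected edge list; molecule names are placeholders, not IPA
results.
"""
from collections import deque


def direct_partners(edges, gene):
    """Molecules that share an edge with `gene`."""
    return {b if a == gene else a for a, b in edges if gene in (a, b)}


def reachable(edges, gene):
    """Every molecule connected to `gene` by some path (breadth-first search)."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {gene}, deque([gene])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(gene)
    return seen


if __name__ == "__main__":
    edges = [("CYP2C19", "mol_A"), ("mol_A", "mol_B"), ("CYP2C19", "mol_C")]
    direct = direct_partners(edges, "CYP2C19")
    indirect = reachable(edges, "CYP2C19") - direct
    print(sorted(direct), sorted(indirect))  # ['mol_A', 'mol_C'] ['mol_B']
```

Trimming to direct interactions corresponds to keeping only `direct_partners`, which is why the count drops so sharply compared with the full connected network.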

“Canonical pathways” are idealized or generalized pathways that represent common properties of a particular signaling module or pathway. To view the involvement of the CYP2C19 gene in particular signaling pathways, the Overlay button in the Path Editor window was used and then the Canonical Pathway option was selected from the tools. This gave a list of signaling pathways of which the CYP2C19 gene is a part. The PXR/RXR signaling pathway was selected for analysis, as represented below:

[Screenshot: PXR/RXR signaling canonical pathway]

The position highlighted in red shows where CYP2C19 acts in the metabolism of various drugs. The Pathway and Tox lists feature of IPA also generates a report for the selected canonical pathway. According to this report, the pregnane X receptor (PXR) is a nuclear receptor expressed mainly in the liver and intestine; together with the retinoid X receptor (RXR), it plays a role in the metabolism of various drugs by activating the cytochrome P450 family of enzymes. PXR/RXR activation also induces drug-conjugation enzymes and drug efflux pumps. The PXR/RXR canonical pathway is therefore very important for drug metabolism and transport, and it is also involved in bile acid synthesis and lipid metabolism. It also regulates the expression of gene products required for xenobiotic and endobiotic metabolism. This summary agrees with the information in my previous Gene Expression Analysis blog post as well as with OMIM, which states that the CYP2C19 gene belongs to the cytochrome P450 family, is expressed mainly in the liver and has a major role in the metabolism of drugs such as mephenytoin and omeprazole. It is also now known that individuals who carry mutant CYP2C19 alleles are poor metabolizers of these drugs, and this signaling pathway, which places CYP2C19 at the drug-metabolism step, is consistent with that observation.

CYP2C9 Gene Expression

November 30, 2015 by meghashahdesani

A gene is a segment of DNA that carries coded genetic information. Each gene contains the information required to produce gene products, mainly proteins, which cells need to function and organisms need to survive. Just as in daily life we buy or use things as we need them, genes make their products in the body as and when they are required. Switching genes on and off in this way is the regulation of gene expression, where gene expression is the process by which the information encoded in a gene is used to direct the assembly of a protein molecule (Gene expression).

Regulation of Gene Expression:

Genes are expressed as either RNA or proteins. However, as mentioned earlier, not all gene products are required at all times in a cell, and the amount of protein required can vary with the demands of a particular organ. So, based on internal and external requirements and other environmental factors, cells need to regulate how much each gene is expressed.

The amounts and types of mRNA molecules in a cell reflect the function of that cell. The primary control point for gene expression is therefore usually at the very beginning of the protein production process: the initiation of transcription. Transcription makes an efficient control point because many proteins can be made from a single mRNA molecule, so controlling how much mRNA is made controls protein output at its source. Eukaryotic transcripts are also more complex than prokaryotic transcripts. Different cell types have different gene expression profiles because they contain distinct transcription regulators. A gene usually has a promoter sequence to which RNA polymerase can bind to start transcription. In addition to promoters, DNA contains enhancer sequences, which provide binding sites for regulatory proteins that affect RNA polymerase activity. These regulatory proteins can either increase or decrease transcription, thereby influencing the expression of a gene in a cell (http://www.nature.com/scitable/topicpage/gene-expression-14121669), (http://study.com/academy/lesson/what-is-gene-expression-regulation-analysis-definition.html)

Epigenetics and DNA Methylation:

An important aspect of gene expression is that it depends not only on the gene’s DNA sequence but also on epigenetic and environmental factors. Epigenetics is the study of heritable changes in gene expression that are caused by factors other than changes in the gene’s DNA sequence. Epigenetic changes can switch genes on or off and so determine which proteins are made. They are also involved in many cellular processes; this is how all our cells can share the same DNA yet differentiate into different cell types such as neurons, liver cells and pancreatic cells (http://www.nature.com/scitable/topicpage/epigenetic-influences-and-disease-895)

[Figure: Epigenetics. Source: journal.frontiersin.org]

DNA methylation is an epigenetic mechanism in which a methyl (CH₃) group is added to DNA. It usually happens where a cytosine nucleotide is followed by a guanine nucleotide, linked by a phosphate; this is called a CpG site. In the bulk of genomic DNA, most CpG sites are heavily methylated, whereas CpG islands (clusters of CpG sites) located near gene promoters in germ-line tissues and normal somatic cells remain unmethylated, allowing those genes to be expressed. When a CpG island in the promoter region of a gene is methylated, expression of the gene is repressed (it is turned off). (http://www.whatisepigenetics.com/dna-methylation/)

[Figure: DNA methylation. Source: www.bloodjournal.org]
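CpG sites and CpG islands, as described above, can be located computationally. This minimal sketch uses the classic Gardiner-Garden and Frommer (1987) thresholds (a window of at least 200 bp, GC content above 50% and an observed/expected CpG ratio above 0.6); other tools use stricter variants, and the toy sequence is artificial rather than taken from any real gene.

```python
"""Sketch: locate CpG sites and test a window against CpG-island criteria.

Thresholds follow Gardiner-Garden & Frommer (1987): length >= 200 bp,
GC content > 50%, observed/expected CpG ratio > 0.6.
"""


def cpg_sites(seq: str) -> list[int]:
    """0-based positions of the C in every 'CG' dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def island_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a window."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    gc_fraction = (c + g) / n if n else 0.0
    # Observed CpG count relative to the count expected from C and G content
    obs_exp = (len(cpg_sites(seq)) * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq: str, min_len: int = 200, min_gc: float = 0.5,
                  min_obs_exp: float = 0.6) -> bool:
    gc_fraction, obs_exp = island_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    toy = "CG" * 100  # 200 bp, artificially CpG-rich
    print(len(cpg_sites(toy)), is_cpg_island(toy))  # 100 True
```

A real scan would slide this test along a genomic sequence and merge overlapping passing windows; the single-window check keeps the criteria easy to read.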

In previous blog posts we mainly discussed the CYP2C19 gene, but in this post we analyze gene expression for the CYP2C9 gene. The Tissue-specific Gene Expression and Regulation (TiGER) database, curated by Johns Hopkins University, shows the tissues in which CYP2C9 is mainly expressed:

[Screenshot: TiGER EST profile for CYP2C9, part 1]

[Screenshot: TiGER EST profile for CYP2C9, part 2]

CYP2C9 Gene expression – TiGER database

Both Expressed Sequence Tag (EST) profiles above show that the CYP2C9 gene is mainly expressed in the liver, with small amounts expressed in the uterus, muscle, colon, kidney and eye (http://bioinfo.wilmer.jhu.edu/tiger/db_gene/CYP2C9-index.html). This is consistent with the genotype and phenotype details of the CYP2C9 gene given in OMIM.org. According to OMIM, CYP2C9 belongs to the cytochrome P450 enzyme family, which is mainly responsible for the metabolism of drugs such as the anticoagulant warfarin, the antidiabetic drugs tolbutamide and glipizide, and the anticonvulsant phenytoin. These drugs are mostly metabolized in the liver, which agrees with TiGER showing CYP2C9 to be specifically expressed in the liver.

The Human Protein Atlas database shows similar results:

[Screenshot: Human Protein Atlas expression data for CYP2C9]

RNA expression and protein localization for CYP2C9 are highest in the liver, followed by the duodenum, small intestine, colon, kidney and appendix.

To study the epigenetic effect of DNA methylation on the CYP2C9 gene, the NCBI Epigenomics "Browse Experiments" tool was used, yielding the following results:

[Screenshot: NCBI Epigenomics DNA methylation track for CYP2C9, part 1]

[Screenshot: NCBI Epigenomics DNA methylation track for CYP2C9, part 2]

Epigenomic Analysis of DNA methylation of Kidney and Liver tissue for CYP2C9 gene

Comparing DNA methylation of the CYP2C9 gene on chromosome 10 in adult kidney and adult liver tissue shows that there are no CpG islands within the gene (highlighted by the orange arrows). The kidney tissue also shows almost no methylation, whereas the liver tissue shows very high methylation across the gene.
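Statements such as "almost no methylation" versus "very high methylation" ultimately rest on per-CpG methylation levels. Assuming the tracks come from bisulfite sequencing (an assumption, since NCBI Epigenomics also hosts other methylation assays), the level at a CpG is the fraction of reads that still read C after bisulfite conversion. The read counts below are invented for illustration and are not taken from the CYP2C9 tracks.

```python
"""Sketch: per-CpG methylation level from bisulfite-sequencing read counts.

Assumes bisulfite data: at each CpG, reads showing C are methylated and reads
showing T are unmethylated. The example counts are invented.
"""


def methylation_level(c_reads: int, t_reads: int) -> float:
    """Fraction of reads at one CpG that indicate methylation (NaN if no reads)."""
    total = c_reads + t_reads
    return c_reads / total if total else float("nan")


def mean_methylation(counts: list[tuple[int, int]]) -> float:
    """Average level over the CpGs that have read coverage."""
    levels = [methylation_level(c, t) for c, t in counts if c + t > 0]
    return sum(levels) / len(levels) if levels else float("nan")


if __name__ == "__main__":
    liver_like = [(18, 2), (9, 1), (20, 0)]     # hypothetical, mostly methylated
    kidney_like = [(1, 19), (0, 10), (2, 18)]   # hypothetical, mostly unmethylated
    print(round(mean_methylation(liver_like), 2),
          round(mean_methylation(kidney_like), 2))  # 0.93 0.05
```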

Lastly, to compare RNA-Seq and microarray technologies for evaluating differential expression of the CYP2C9 gene, I used the large neuroblastoma dataset and narrowed my search to the CYP2C9 gene in the gene data bank. I found only one entry for the gene, shown below:

[Screenshot: CYP2C9 entry in the neuroblastoma dataset]

As the figure shows, no differential score is reported for tumor stage 1 or tumor stage 4. For tumor stage 4S, however, the differential score was 50.4 with RNA-Seq but only 38 with the microarray. This suggests that RNA-Seq detected a stronger differential signal for CYP2C9 than the microarray did, although a single gene in a single dataset is not enough to prove that RNA-Seq is the better technique overall.

Comparing Sequences of the CYP2C19 Gene Across Different Species

November 16, 2015 by meghashahdesani

A few years ago, owing to limited technology and lab facilities, protein sequencing was one of the last methods used to obtain information about the functional proteins encoded by a gene. Today the scenario has changed: with the decreasing cost and wide use of techniques such as whole-genome and whole-exome sequencing, and the development of rapid sequence comparison methods such as heuristic algorithms and parallel computing, comparing protein sequences across species has become the primary way to learn about the biological function of a gene. It is the most powerful tool for analyzing protein sequences because of the enormous amount of information preserved through evolution. Proteins that share a common ancestor are called homologous proteins; they always share a common three-dimensional fold and often share common active sites or binding domains, which can be useful in pharmacogenomics for drug design and for discovering drug targets. An important goal of sequence comparison is to find biological properties of proteins that have been conserved over time in different species (http://people.virginia.edu/~wrp/papers/ismb2000.pdf). A conserved sequence is a sequence of amino acids in a polypeptide, or of nucleotides in DNA or RNA, that is similar across multiple species. A known set of conserved sequences is represented by a consensus sequence (http://ghr.nlm.nih.gov/glossary=conservedsequence). Sequence comparison can be used to select functionally significant sites in a sequence as well as to predict a protein's functional class (http://www.ncbi.nlm.nih.gov/pubmed/18763738).

CYP2C19 Gene – Sequence Comparison

NCBI BLAST (Basic Local Alignment Search Tool) is a very useful resource for comparing sequences across species. The program compares protein sequences against sequence databases and calculates the statistical significance of the matches. It finds local similarity between sequences and can also generate a distance tree of the results, which helps to visualize the evolutionary distance between the species of interest. The NCBI protein ID for human CYP2C19, NP_000760.1, was searched in BLAST, limiting the results to RefSeq proteins and mammals. The query returns a table of sequences for Homo sapiens and other matching species. Each sequence has an accession ID whose prefix shows whether the sequence for that species is actual (NP, a curated RefSeq record) or predicted (XP). The total score indicates how similar the compared sequence is to the reference sequence, and the Ident value (percent identity) gives a rough estimate of how far each species has diverged from the human reference.
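BLAST's core idea, finding the best-scoring local alignment between two sequences, can be illustrated with the exact Smith-Waterman algorithm. This is a simplified sketch, not what BLAST actually runs: BLAST uses word-seeding heuristics, substitution matrices such as BLOSUM62 and affine gap costs, whereas the scores here (match +2, mismatch -1, gap -2) are assumptions chosen for readability.

```python
"""Sketch: Smith-Waterman local alignment score with a linear gap penalty.

Scoring (match +2, mismatch -1, gap -2) is an illustrative assumption; BLAST
itself uses heuristics, substitution matrices and affine gap costs.
"""


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in b
            left = h[i][j - 1] + gap    # gap in a
            # Local alignment: a cell never drops below zero, so an alignment
            # can start fresh anywhere instead of carrying a negative prefix.
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    # The shared segment PLMR is found despite unrelated flanking residues.
    print(smith_waterman("AAAPLMRAAA", "GGPLMRGG"))  # 8
```

The zero floor is what makes the alignment local: unrelated flanking regions contribute nothing, which is why BLAST can report strong hits between proteins that are similar only over part of their length.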

For the CYP2C19 gene, most of the sequences producing significant alignments in BLAST were predicted (XP) rather than actual (NP) sequences. So the NCBI Gene and NCBI Protein databases were searched for actual protein sequences of species other than human. After exploring all three search options, the following species were selected for sequence comparison:

Species Selected for Sequence Comparison using BLAST, NCBI GENE and NCBI PROTEIN sources

Of these six selected sequences, the human, cow and sheep protein sequences are actual, whereas the sequences for small-eared galago and common marmoset are predicted. The horse protein sequence is provisional and has not yet undergone final NCBI review (http://www.ncbi.nlm.nih.gov/protein/603843768?report=genbank&log$=protalign&blast_rank=42&RID=4CTZ2RVX016). The BLAST results show a total score of 1011 for human versus 863 for small-eared galago, 859 for common marmoset and 822 for horse. In terms of similarity, small-eared galago therefore scores 148 points below human, and common marmoset and horse score 152 and 189 points below, respectively. Expressed as an evolutionary distance estimate derived from the Ident (percent identity) values, the divergence from human is 14% for both small-eared galago and common marmoset and 18% for horse (http://blast.ncbi.nlm.nih.gov/Blast.cgi).
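The score arithmetic in the paragraph above can be reproduced directly from the quoted BLAST total scores. The helper names `score_gaps` and `relative_scores` are illustrative; raw score gaps also depend on alignment length and scoring parameters, so dividing by the human self-hit score is shown as one simple way to put them on a common scale.

```python
"""Reproduce the BLAST total-score comparison quoted in the text.

SCORES holds the total scores reported above; the human entry is the
query's hit against itself.
"""

SCORES = {"human": 1011, "small-eared galago": 863,
          "common marmoset": 859, "horse": 822}


def score_gaps(scores: dict[str, int], reference: str = "human") -> dict[str, int]:
    """Points by which each species scores below the reference self-hit."""
    ref = scores[reference]
    return {sp: ref - s for sp, s in scores.items() if sp != reference}


def relative_scores(scores: dict[str, int], reference: str = "human") -> dict[str, float]:
    """Each score as a fraction of the reference self-hit score."""
    ref = scores[reference]
    return {sp: s / ref for sp, s in scores.items() if sp != reference}


if __name__ == "__main__":
    gaps, rel = score_gaps(SCORES), relative_scores(SCORES)
    for species in gaps:
        print(f"{species}: {gaps[species]} points lower, {rel[species]:.1%} of human")
```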

The MegAlign Pro feature of DNASTAR was used to compare the sequences of all six species. The sequences, in FASTA format, were realigned using the Clustal Omega algorithm. The output consisted of the aligned protein sequences in consecutive rows, a distance table and an evolutionary tree, which is shown in the figure below:

Evolutionary Distance Tree

In this tree, the sequences were renamed with the common species names for ease of reading. The tree gives an overall idea of which species are close to each other in sequence similarity without going through the actual distance scores. In our analysis, the human CYP2C19 protein sequence shows some similarity with the common marmoset protein but is quite different from the remaining species.

Distance Table for Sequence Comparison

The distance table shows that common marmoset and small-eared galago are the closest to human in evolutionary distance (0.14), whereas sheep is the farthest (0.33). Sheep and cow have the smallest evolutionary distance between them (0.06), whereas cow and common marmoset have the largest (0.35).
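A distance table like this one can be computed from an alignment, after which the closest and farthest pairs are simply its minimum and maximum entries. This sketch assumes the simplest measure, the p-distance (mismatched positions divided by positions compared, skipping gaps); MegAlign Pro may use a different distance model, and the three short sequences are invented toy fragments, not the real CYP2C19 proteins.

```python
"""Sketch: pairwise p-distances from aligned sequences and the extreme pairs.

p-distance is an assumed, simple measure; the toy sequences are invented.
"""
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of compared positions that differ; gap columns are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    mismatches = sum(x != y for x, y in pairs)
    return mismatches / len(pairs) if pairs else 0.0


def distance_table(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of sequences."""
    return {(s, t): p_distance(aligned[s], aligned[t])
            for s, t in combinations(aligned, 2)}


if __name__ == "__main__":
    toy = {"human": "ACDEFGHIKL", "cow": "ACDEYGHIKM", "sheep": "ACDEYGHVKM"}
    table = distance_table(toy)
    closest = min(table, key=table.get)
    farthest = max(table, key=table.get)
    print(closest, table[closest])    # ('cow', 'sheep') 0.1
    print(farthest, table[farthest])  # ('human', 'sheep') 0.3
```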

The image below highlights the R433W variant site of CYP2C19 in red and compares it with the sequences from the other species in a consecutive sequence comparison view.

Sequence Comparison-Highlighting Variant Site R433W for CYP2C19 gene

We can observe that the arginine (R) at variant site R433W in human CYP2C19 is shared by all five non-human species as well as by the consensus, suggesting that this residue has remained conserved throughout evolution.

MegAlign Pro also allows us to select a reference sequence and compare the remaining sequences against it using its Comparison feature. For this comparison, the human CYP2C19 protein sequence was selected as the reference and compared against the other five species over a 21-residue window (positions 423 to 443, read in the N- to C-terminal direction) that includes the variant site.

Color only matches to Reference Human CYP2C19 Protein Sequence

The variant site R433W is highlighted in blue, and the Comparison feature's Color Only Matches to Reference option is selected. On both the N- and C-terminal sides of the variant site, most amino acids have been conserved through evolution in all the species. At position 423, however, human and common marmoset have asparagine (N), whereas the remaining species have aspartic acid (D), indicating evolutionary changes inherited in different lineages. Similarly, at position 434 cow and sheep (which also have the smallest evolutionary distance between them) have valine (V), whereas the remaining mammals have isoleucine (I). At position 436, only small-eared galago has a different amino acid, alanine (A), whereas all others have valine, so this position may be of interest to researchers from an evolutionary point of view.

Color Only Differences from Reference View

The Color Only Differences from Reference option also shows that most amino acids in positions 423 to 443 are conserved across all six species. The major difference is at position 423, where cow, horse, small-eared galago and sheep, as well as the consensus sequence, have aspartic acid (D) instead of asparagine (N). The remaining differences occur in individual species, and otherwise the consensus sequence resembles the human reference. The close evolutionary distance between cow and sheep is clear in this figure, as both species share the same amino acid replacements at positions 423, 424, 427 and 434 compared with the human sequence. The common marmoset protein has two unique amino acid differences, at positions 426 and 430: at position 426 all other species have methionine (M) whereas common marmoset has isoleucine (I), and at position 430 it has threonine (T) whereas the others have alanine (A).

Show Only Differences from Human Reference Sequence

The Show Only Differences from Reference view confirms that the cow and sheep sequences differ most from the human sequence, and that most amino acids at these 21 positions have remained conserved in all six species throughout evolution.
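The comparison views above boil down to two column-wise operations on an alignment: building a majority-rule consensus and listing the positions where each species differs from the reference. This sketch assumes equal-length, pre-aligned sequences and uses invented four-residue fragments; ties in the consensus (none occur in the example) would go to the first residue encountered, and the `start` parameter only relabels positions.

```python
"""Sketch: majority consensus and 'show only differences from reference'.

Assumes equal-length, pre-aligned sequences; the toy fragments are invented.
"""
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Majority residue per column (ties go to the first residue encountered)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def differences_from_reference(aligned: dict[str, str], reference: str,
                               start: int = 1) -> dict[str, list[tuple[int, str, str]]]:
    """For each non-reference sequence: (position, reference residue, own residue)."""
    ref = aligned[reference]
    return {
        name: [(start + i, r, s) for i, (r, s) in enumerate(zip(ref, seq)) if r != s]
        for name, seq in aligned.items()
        if name != reference
    }


if __name__ == "__main__":
    toy = {"human": "NRIV", "marmoset": "NRIV", "cow": "DRVV",
           "sheep": "DRVV", "horse": "DRIV"}
    print(consensus(list(toy.values())))                     # DRIV
    print(differences_from_reference(toy, "human")["cow"])   # [(1, 'N', 'D'), (3, 'I', 'V')]
```

As in the real alignment, a consensus can differ from the reference at a column where the reference belongs to the minority, which is exactly what the Color Only Differences view makes visible.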

Diagnostic Genome Analysis- Discovering the Undiscovered Secrets of Genes and Genomes

November 9, 2015 by meghashahdesani

A four-year-old boy named Nicholas Volker, who like other kids loved Batman and gun fights, arrived at Children's Hospital of Wisconsin in 2007 with a mysterious bowel disease that left his doctors baffled about diagnosis and treatment. Food, a basic necessity for any human being, became a dream for Nicholas because his rare and extreme inflammatory bowel disease caused holes in his intestine through which fecal matter leaked into his abdomen. At the young age of four he had already survived 100 surgeries and had his colon removed, leaving him malnourished and with little hope of leading a normal life. His doctors had tried almost every possible diagnostic test and treatment, without success. After exhausting nearly all medical options, they decided to sequence Nicholas's DNA (specifically his exome, the protein-coding part of his genome) to unravel the mystery of his rare condition. In November 2009 they finally tracked down the genetic variation responsible for his suffering: a mutation in the XIAP gene on the X chromosome. This G-to-A mutation caused an amino acid substitution and thus an altered protein, which led his own immune system to attack the healthy cells of his intestine.

The Mysterious Bowel Disease Story of Nicholas Volker

The treatment was bone marrow transplantation, and today this blue-eyed boy is in remission. This case is an excellent example of how sequencing techniques have opened an entirely new era in medicine and genetics (One in a Billion-Nicholas Volker).

Initially, sequencing techniques were used only for identifying rare hereditary disorders, but with decreasing costs and the availability of high-end, cutting-edge technologies, it is now possible to sequence an entire human genome in comparatively little time and at lower cost. An entire human genome can now be sequenced in less than a week for anywhere between $5,000 and $10,000 (The Promise and Challenges of Next-Generation Genome Sequencing for Clinical Care). Whole genome sequencing (WGS) analyzes the entire DNA sequence of an organism's genome at once. In this technique, a DNA sample is collected from an individual and then the roughly 3 billion nucleotides that make up the human genome are compared against a reference sequence to identify mutations in both coding and non-coding regions, which can help guide a precision treatment approach for the patient. The other technique, whole exome sequencing (WES), sequences only the exome (the protein-coding region of the human genome). It is less time-consuming and more cost-effective, since the exome makes up less than 2% of the genetic code but contains approximately 85% of known disease-causing variants (Exome Sequencing). WGS and WES are now used more and more frequently, especially in pharmacogenomics, to develop personalized treatment approaches for patients.

[Figure: Whole genome sequencing. Source: Knowgenetics.com]

[Figure: Exome sequencing workflow, parts 1a and 1b]

Next-generation sequencing methods are useful for screening neonatal blood for early diagnosis of thalassemia and for identifying other novel mutations associated with it (Pediatr Res. 1992 Mar;31(3):217-21). They also help to diagnose and treat medical conditions that are difficult to diagnose using routine clinical and laboratory criteria. WGS can also identify mutations in M. tuberculosis genes that are associated with antibiotic resistance and are likely responsible for phenotypic resistance (J Clin Microbiol. 2015 May;53(5):1473-83. doi: 10.1128/JCM.02993-14. Epub 2015 Feb 11). This can help design effective, cost- and time-saving treatment approaches that benefit not only the patient but also the health care system.

Advantages of WGS and WES include their use in extensive research to discover novel mutations, in population screening, in neonatal screening and in pre-symptomatic testing. WGS is superior to WES in sequence coverage, and it requires a lower average read depth than WES. The main challenge with WGS is the enormous amount of data it generates, which requires large storage and data analysis capacity. This is much less of an issue with WES, which targets only protein-coding regions and thus reduces storage and analysis costs, making larger population-based comparisons feasible (http://www.nature.com/ejhg/journal/v21/n1s/full/ejhg201346a.html#bib6)
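The read-depth and data-volume trade-off described above can be made concrete with the Lander-Waterman relation (mean depth = number of reads x read length / target size). The numbers are assumptions for illustration: a 3-billion-base genome as stated earlier, an exome taken as 2% of it (the upper bound quoted above), 150-base reads, and typical target depths of 30x for WGS and 100x for WES; off-target reads and PCR duplicates are ignored.

```python
"""Sketch: reads and bases needed for WGS versus WES at typical depths.

All constants are illustrative assumptions, not values from the text
except the ~3 billion base genome size and the <2% exome fraction.
"""


def reads_needed(target_bp: int, depth: int, read_len: int) -> int:
    """Lander-Waterman: depth = reads * read_len / target_bp, solved for reads."""
    return -(-depth * target_bp // read_len)  # integer ceiling division


GENOME_BP = 3_000_000_000         # ~3 billion nucleotides
EXOME_BP = GENOME_BP * 2 // 100   # assumed upper bound: 2% of the genome
READ_LEN = 150                    # assumed short-read length

if __name__ == "__main__":
    for label, target, depth in [("WGS", GENOME_BP, 30), ("WES", EXOME_BP, 100)]:
        reads = reads_needed(target, depth, READ_LEN)
        print(f"{label}: {reads:,} reads, {reads * READ_LEN / 1e9:.0f} Gb sequenced")
    # WGS: 600,000,000 reads, 90 Gb sequenced
    # WES: 40,000,000 reads, 6 Gb sequenced
```

Under these assumptions WGS produces about 15 times as many sequenced bases as WES even at a lower depth, which is the storage and analysis burden the paragraph describes.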

With the advancement of sequencing techniques came the concept of direct-to-consumer (DTC) genetic testing. Companies such as 23andMe, Full Genomes Corporation (http://www.isogg.org/wiki/List_of_DNA_testing_companies) and others provide genetic test results directly to the consumer without involving a health care provider. These services can be beneficial when a person wants to learn about his or her ancestry, wants to improve his or her lifestyle by learning about the risk of genetic mutations and associated phenotypes, and for public education and awareness. However, the disadvantages of such testing outweigh the advantages. Since no health care provider is involved in delivering the results, the chances of misinterpreting them increase. Moreover, although the Genetic Information Nondiscrimination Act of 2008 (GINA) protects consumers against some discrimination based on genetic testing, it has several loopholes. It does not cover life, disability or long-term care insurance, so providers of these kinds of insurance can still use DTC genetic test results when deciding the coverage offered to an individual. Genetic privacy is also becoming a major concern, as there are currently no strong laws to prevent an individual's genetic information from becoming public. The emotional impact of learning that one carries a genetic disease can be huge and can lead to anxiety, depression and suicidal tendencies (Direct-to-consumer genetic testing).

With this background, let us analyze a situation in which a hypothetical patient, Mary, is found to be homozygous for the poor-metabolizer mutation Arg433Trp (R433W) of the CYP2C19 gene. Because of this mutation, Mary has a low capacity to metabolize drugs such as the anticonvulsant mephenytoin and tolbutamide. For the past year, Mary has been on a diagnostic odyssey due to her poor response to drugs and has developed multiple medical conditions that show no signs of improvement, because the drugs given to her are poorly metabolized. Having exhausted all the traditional diagnostic options, Mary and her family are considering the following options:

Participate in a clinical trial offering full exome analysis for Mary and her parents at no personal cost.
Seek full genome analysis and work with their insurance provider to seek coverage, a 4-6 month negotiation.
Pay out of pocket for the full genome analysis ($5-10k).
Use direct-to-consumer services and perform independent analysis of the raw results.

Since most of the allelic variants of the CYP2C19 gene are polymorphic in nature, in my opinion Mary and her parents should first participate in the clinical trial offering full exome analysis at no personal cost. This would confirm the presence of the mutation in her gene and could also identify any novel variant responsible for the loss of CYP2C19 function. Because the ARG433TRP change lies in exon 9 of CYP2C19, it falls within the coding regions that exome sequencing is designed to capture. It has also been observed that CYP2C19 alleles alone do not fully account for the loss of metabolizing function, and other genes might be involved. To investigate this, Mary and her family might consider whole genome sequencing (WGS) after negotiating with their insurance provider. However, after weighing the benefits and costs of WGS versus whole exome sequencing (WES), WES would be the better choice in Mary's case. Most alleles of the CYP2C19 gene occur at a low frequency in the population, and not all allelic variants are associated with a phenotypic effect. WGS would therefore not only cost Mary and her family more but also take more time and generate a large amount of data that might not prove very useful for her (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3748366/). DTC testing should not be considered in this case, because the genetic findings need to be used for precise dose adjustment of drugs, which is only possible through a health care provider or primary physician. Thus, given the low frequency of CYP2C19 alleles and their limited disease-causing potential, WES is the best choice for Mary and her family.

Incidental findings: While conducting WGS or WES, a large amount of clinically relevant data is generated, which might include secondary or incidental findings unrelated to the reason Mary underwent testing. In a clinical trial, informed consent should be obtained from Mary and her family regarding the return of incidental findings, in line with IRB requirements and the American College of Medical Genetics and Genomics (ACMG) guidelines (https://www.acmg.net/docs/IF_Statement_Final_7.24.13.pdf). If the researchers and analysts come across a novel genetic mutation that might play a major role in Mary's phenotype apart from the common allelic variant, and if the benefits of disclosing this information outweigh the potential risks, then these secondary findings should be returned to Mary and her family after following all the necessary procedures.

Mary and her family decided to go for WES and later received a USB drive with a VCF file containing the results of their exome analysis. The raw data in this VCF file can be checked against NCBI resources to verify whether sequencing confirmed the presence of the disease-causing variant, as follows:

Using http://omim.org/, the rsID for the ARG433TRP allelic variant of the CYP2C19 gene was identified as rs56337013.

rsID for ARG433TRP variant of CYP2C19 Gene

Searching the NCBI SNP database (dbSNP) for this rsID gives the following result:

SNP file

We can then go to the VarView link at the bottom of the page and zoom in to view the sequence around our variant site.

VarView of SNP Database highlighting the rsID and position of variant site

Hovering the cursor over the rsID gives the position of the variant site on chromosome 10, where the gene is located. The position of this variant was 94852738.

Position Number of Variant Site

Clicking on the exon view shows the start and end positions of the exon containing the variant site:

Exon range including variant site

As seen in the figure, the exon containing the variant site begins at position 94852733 and ends at position 94852914.
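The two screenshots can be cross-checked with simple arithmetic. The minimal sketch below assumes the reported numbers are 1-based coordinates on the same chromosome 10 reference assembly (the figures do not state this); it confirms that position 94852738 lies inside the exon and computes the exon length and the variant's offset within it. The helper name `exon_offset` is hypothetical.

```python
from typing import Optional

# Coordinates as reported above (assumed 1-based, same assembly).
VARIANT_POS = 94852738
EXON_START = 94852733
EXON_END = 94852914


def exon_offset(pos: int, start: int, end: int) -> Optional[int]:
    """Return the 0-based offset of pos within the inclusive range [start, end], or None."""
    if start <= pos <= end:
        return pos - start
    return None


exon_length = EXON_END - EXON_START + 1  # inclusive range, so add 1
offset = exon_offset(VARIANT_POS, EXON_START, EXON_END)
print(f"exon length: {exon_length} bp")     # 182 bp
print(f"variant offset in exon: {offset}")  # 5
```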

A single VCF data line for a patient homozygous for the ARG433TRP allelic variant of the CYP2C19 gene is shown below:

VCF single line formatted data
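The original VCF line is shown only as an image, so the sketch below builds an illustrative line instead of reproducing it. It assumes the C-to-T change is reported on the forward strand (REF `C`, ALT `T`), uses `10` as the chromosome name (some files use `chr10`), leaves QUAL, FILTER and INFO as `.` because no values are given here, and encodes homozygosity as the genotype `1/1`. The helper functions are hypothetical and are not part of any NCBI tool.

```python
# Hypothetical VCF data line for a sample homozygous for rs56337013.
# REF/ALT assume the C-to-T change is on the forward strand; QUAL, FILTER and
# INFO are "." because no values are available.
fields = ["10", "94852738", "rs56337013", "C", "T", ".", ".", ".", "GT", "1/1"]
vcf_line = "\t".join(fields)

COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "SAMPLE"]


def parse_vcf_line(line: str) -> dict:
    """Split one tab-delimited VCF data line into named columns plus sample fields."""
    record = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
    # FORMAT keys (e.g. "GT") pair up with the colon-separated sample values.
    record["sample_fields"] = dict(
        zip(record["FORMAT"].split(":"), record["SAMPLE"].split(":"))
    )
    return record


def is_homozygous_alt(record: dict) -> bool:
    """True if the diploid GT lists the first ALT allele twice (1/1 or 1|1)."""
    gt = record["sample_fields"].get("GT", "")
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and all(a == "1" for a in alleles)


rec = parse_vcf_line(vcf_line)
print(rec["ID"], rec["POS"], rec["sample_fields"]["GT"], is_homozygous_alt(rec))
```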

3D Structure view of Human CYP2C19 Gene

November 2, 2015meghashahdesani Leave a comment

The technological advances of this tech-savvy age are gradually converting the globe into a virtual 3D world, where people prefer to view everything in three dimensions, from playing 3D video games to 3D paintings to designing 3D protein structures. In the medical field, 3D printing has many applications, such as designing customized prosthetics and tissue and organ fabrication (Medical 3D printing).

Visualizing the 3D structure of a protein can be very useful from a bioinformatics perspective. It can help with rational drug design, with viewing sites associated with disease-causing variants, and with seeing the proximity of amino acids that are distant in the primary sequence. As discussed in my previous post, the CYP2C19 gene encodes a monomeric member of the cytochrome P450 family that is responsible for the metabolism of drugs such as the anticonvulsant mephenytoin, the anti-ulcer drug omeprazole and the antimalarial drug proguanil. One OMIM allelic variant of CYP2C19 is .0003 (p.TRP212TER [dbSNP:rs4986893]). A G-to-A mutation converts the codon for tryptophan (W) at position 212 of the protein sequence into a termination codon (X). This mutation leads to poor metabolism of proguanil and other drugs in individuals who carry the variant. Using the Protean 3D component of DNA Star software, we can visualize the 3D structure of the human CYP2C19 protein as follows:

Human CYP2C19 gene – PDB ID 4GQS

In the figure above we can see a mixed backbone view of the protein that includes both ribbon and sphere designs. From the style menu we can select different styles for the structure, such as tubes, balls and spheres, or sticks, to suit our visualization needs. Below the 3D structure, the translated protein sequence is shown along with its KSD secondary structure. Highlighting a portion of the sequence also highlights it in the tertiary structure, which gives insight into the positioning of the amino acids, i.e., whether they are located on the surface, in a turn region or inside a helix.

When we highlight the variant location for allele .0003, i.e., the tryptophan at position 212, it is also highlighted in the 3D structure, as shown in the figure below:

Highlighted position 212 for amino acid Tryptophan in allele .0003 of human CYP2C19 gene

From this view, the variant position appears to lie in an interior region of the protein, in a beta-sheet segment rather than in a helix. We can also view the variant location in different styles and colors using the style menu:

3D Human CYP2C19 gene with highlighted variant position as blue spheres

Protean 3D lets us be creative and build different 3D views using the various color and design options available:

Sphere view of Human CYP2C19 gene with position 212 highlighted in green

Tube structure view of Human CYP2C19 gene

Tube temperature structure view of Human CYP2C19 gene with highlighted allele position in green. The spheres seen in the above image represent the ligands and water in the structure.

Tube occupancy view of Human CYP2C19 gene with variant site highlighted in green

To check how well a sequence-based prediction agrees with the experimentally determined structure, we compared the tertiary structure of the human CYP2C19 protein with the Chou-Fasman secondary structure prediction in the analysis view of Protean 3D:

Chou Fasman Secondary Structure prediction

The figure above shows that the Chou-Fasman prediction agrees with the 3D structure: the variant site, highlighted in light blue, falls in the yellow beta region.

The Kyte-Doolittle and Hopp-Woods hydrophilicity plots were also consistent with the 3D structure:

Kyte-doolittle and Hopp Woods Hydrophilicity plots

Both hydrophilicity plots show that the highlighted variant site falls in the orange region, which represents hydrophobicity. Thus the variant position lies in a hydrophobic region; such regions are usually buried in the interior of the protein or associated with the lipid bilayer.

The Emini algorithm predicts the surface probability of a highlighted region. For position 212, the variant site of OMIM allele .0003 of the human CYP2C19 protein, the plot shows a line rather than a surface block, as shown in the image below:

Emini Surface Probability Algorithm

The violet blocks represent regions predicted to lie on the surface, whereas the lines represent interior regions such as the buried beta-sheet segments. The light blue highlighted variant site falls in a line region, consistent with the results of the other predictive algorithms.

Thus, Protean 3D not only lets us view the tertiary structure of a human protein in different styles, but through its analysis features it also lets us compare the results of different predictive algorithms to cross-check our observations.

Click on the following link for more information on types of protein structure and their clinical implications:

http://www.ncbi.nlm.nih.gov/books/NBK6824/

First Annual Midwest Bioinformatics Conference, 2015: Panel Speaker - Dr. John Spertus, MD, MPH, FACC

October 20, 2015meghashahdesani Leave a comment

The First Annual Midwest Bioinformatics Conference was held on Thursday, October 15th and Friday, October 16th at UMKC, Volker Campus. It was a knowledge-packed conference for budding bioinformaticians, with speakers from different fields of bioinformatics sharing their valuable experiences with the audience. On the second day of the conference, Dr. John Spertus from Saint Luke's Mid America Heart Institute brought up a much-discussed subject in bioinformatics: precision medicine.

Dr. Spertus focused on 'Applying Precision Medicine' in today's world. He said that we can improve our treatment approach using evidence-based medicine. A risk stratification model can be designed that looks at treatment from the genetics, pharmacogenomics, proteomics and biomarker points of view, and a specific personalized treatment or medicine can then be tailored for a patient. He pointed out that risk stratification models are already available but are not frequently used to improve precision medicine. Using such tools to add value to precision medicine and to plan the treatment of the next patient would help cut down health care costs, and would save both the patient's and the physician's valuable time by making treatment faster and more targeted. The main outcome of designing these risk stratification models would be decision support tools derived from them. The current informed consent forms used for patients in hospitals could be replaced with such decision support tools, collecting as much information as possible from a patient with his or her consent. This would help a physician decide on the treatment approach required for the patient. It also improves consistency of care, helps ensure that high-risk patients are treated with the utmost attention, and ultimately reduces the burden of cost. Dr. Spertus and his colleagues have also published a paper in the British Medical Journal highlighting the importance of precision medicine in reducing bleeding in patients undergoing percutaneous coronary intervention. The detailed research paper can be found at the following link:

Precision medicine to improve use of bleeding avoidance strategies and reduce bleeding in patients undergoing percutaneous coronary intervention: prospective cohort study before and after implementation of personalized bleeding risks

Future goals are to use these risk models in all areas of medicine and to extend them to include novel -omics data as well.

Overall, precision medicine is an exciting field of research in bioinformatics that is still relatively unexplored, and I am thankful to Dr. John Spertus for enlightening us on this subject through his talk. I hope precision medicine helps us deliver better health care services and bring more smiles to the faces of the patients we care so much about and are here to serve.

Cool Badge from QIAGEN Bioinformatics..Proud to be a Bioinformatician!! 🙂

Molecular Diagnostics for a Gene Variant Analysis

October 12, 2015meghashahdesani Leave a comment

Mary, our hypothetical patient, is suspected to carry a disease-associated variant of the CYP2C19 gene. This gene encodes an enzyme responsible for the metabolism of drugs such as the anticonvulsant mephenytoin, the antimalarial proguanil, the anti-ulcer proton-pump inhibitor omeprazole and the antiplatelet drug clopidogrel. Variants such as .0001, .0002, .0003 and .0004 listed in OMIM can lead to poor metabolism of these drugs in people who carry them. Using ClinVar, a resource of the NCBI (National Center for Biotechnology Information), it was found that 14 variants of the CYP2C19 gene are listed; their review status categories include 'single submitter', with 3 entries, and 'at least one star', which also has 3 entries. In terms of clinical significance, 2 variants are listed as benign, 4 as pathogenic and 3 as of uncertain significance.

To confirm the presence of a CYP2C19 variant in Mary's DNA and its association with a disorder, her DNA sample can be sent for analysis to any of the 17 CLIA-certified diagnostic laboratories listed on the Genetic Testing Registry (GTR). A few of them are listed below:

Baylor Miraca Genetics Laboratories, Baylor College of Medicine, Houston, Texas, United States
Fulgent Clinical Diagnostics Lab, Fulgent Diagnostics, Temple City, California, United States
Genelex Corporation, Seattle, Washington, United States
Quest Diagnostics Nichols Institute San Juan Capistrano, Quest Diagnostics, San Juan Capistrano, California, United States
Outside the United States: CGC Genetics, Porto, Porto, Portugal

Diagnostic techniques such as PCR (polymerase chain reaction) and RFLP (restriction fragment length polymorphism) analysis are used as confirmatory tests for the presence or absence of a particular gene variant and to analyze its association with a disease or disorder.

Polymerase Chain Reaction (PCR):

PCR, also called 'molecular photocopying', was developed by Kary B. Mullis, who was awarded the 1993 Nobel Prize in Chemistry for this revolutionary discovery. PCR amplifies small fragments of DNA quickly and inexpensively, producing copies that are useful for molecular and genetic analyses, gene mapping, DNA fingerprinting, diagnosis of genetic disorders, detection of bacteria or viruses, etc. To make multiple copies of DNA within a couple of hours, the DNA sample is first heated to separate it into two single strands. The reaction also contains primers, free nucleotides and a heat-resistant enzyme, Taq polymerase; when the temperature is lowered, the primers bind to the single strands and Taq polymerase extends them, using each old strand as a template to synthesize a new strand. This cycle is repeated 30 to 40 times, producing more than a billion copies of the original DNA segment. A machine called a thermocycler controls the temperature at each step of the reaction.

Image Source: Genome.gov
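The "more than a billion copies" figure follows from the fact that each cycle can at most double the number of target molecules. The sketch below assumes a single starting template and, by default, perfect doubling; the `efficiency` parameter is an illustrative assumption, since real reactions fall short of 100% and eventually plateau.

```python
def pcr_copies(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Expected copies after `cycles` rounds; efficiency 1.0 means perfect doubling."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    # Each cycle multiplies the current number of copies by (1 + efficiency).
    return start_copies * (1.0 + efficiency) ** cycles


for n in (30, 35, 40):
    print(n, f"{pcr_copies(n):.3e}")
```

With perfect doubling, 30 cycles already give 2^30, which is just over one billion copies from one template.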

Restriction Fragment Length Polymorphisms (RFLPs):

RFLP analysis is used along with PCR to confirm the presence or absence of a particular variant of a gene. Restriction enzymes (restriction endonucleases) recognize a specific sequence of nucleotides, usually four to six bases long, on both strands of DNA and then make a double-stranded cut. One of their many scientific applications is to identify gene alleles and determine their association with a disease by observing changes in restriction sites caused by the variant. A variant can make a restriction site disappear (or create a new one), which changes the sizes of the fragments between the neighboring upstream and downstream restriction sites. These changes can be seen by agarose gel electrophoresis and Southern blotting, which show the number of bands produced by a particular restriction enzyme and their sizes. The RFLP summary also shows the change in the sizes of the restriction fragments between the original and the variant gene.
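The idea behind an RFLP can be illustrated in a few lines. The sketch below scans a sequence for the BamHI recognition site GGATCC (BamHI cuts between the first and second G) and reports the fragment lengths after digestion. The 30 bp sequence is a made-up toy example, not CYP2C19, and only the top strand is scanned, which is sufficient here because GGATCC is palindromic.

```python
import re

BAMHI_SITE = "GGATCC"  # BamHI recognition sequence
BAMHI_CUT_OFFSET = 1   # BamHI cuts G^GATCC, i.e. after the first base


def cut_positions(seq: str, site: str = BAMHI_SITE, offset: int = BAMHI_CUT_OFFSET) -> list[int]:
    """0-based positions p where the top strand is cut (between seq[p-1] and seq[p])."""
    # The zero-width lookahead also finds overlapping matches.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def digest(seq: str, cuts: list[int]) -> list[int]:
    """Fragment lengths produced by cutting a linear sequence at the given positions."""
    bounds = [0, *sorted(cuts), len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy sequence with two BamHI sites.
ref = "AAAA" + "GGATCC" + "A" * 10 + "GGATCC" + "AAAA"
# A G>A change inside the second site (GGATCC -> GAATCC) destroys it.
alt = ref[:21] + "A" + ref[22:]

print("reference fragments:", digest(ref, cut_positions(ref)))
print("variant fragments:  ", digest(alt, cut_positions(alt)))
```

Losing one site merges two fragments into one longer fragment, which is the band pattern change an RFLP gel detects.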

Using bioinformatics tools and components of the DNA Star software program, such as GeneQuest and SeqBuilder, we can obtain an overview of Mary's variant.

The CYP2C19 variant used in the last assignment was .0002 (p.ARG433TRP [dbSNP:rs56337013]). It is a C-to-T mutation at nucleotide 1297 in exon 9 of the CYP2C19 gene, resulting in an arg433-to-trp (R433W) substitution in the heme-binding region. When we checked the effect of this variant on restriction sites, we found that no restriction site was present at the point of variation, and the one nearby restriction site, EcoP15I, about 17 base pairs from the variant site, remained unaltered when the variant was introduced. This can be seen in the image below:

CDS Original | CDS Variant

Another allele listed in OMIM for the CYP2C19 gene is .0003 (p.TRP212TER [dbSNP:rs4986893]). A G-to-A mutation converts the codon for tryptophan (W) at position 212 of the protein sequence into a termination codon (X). Besides truncating the protein, this mutation also changes a restriction site, as seen below. The protein sequence is provided for reference.

CDS Original | CDS Variant

Restriction site BamHI disappears due to a G to A mutation
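Both nucleotide changes discussed above can be tied to the reported amino acid changes with codon arithmetic: coding nucleotide N falls in codon (N - 1) // 3 + 1. The sketch below applies this to nucleotide 1297 (.0002) and nucleotide 636 (.0003), assuming the numbering starts at the A of the ATG start codon. The reference codons are inferred rather than read from the sequence: tryptophan has a single codon, TGG, so a C-to-T change at the first base that yields tryptophan must start from CGG (arginine), and a G-to-A change at the third base of TGG gives the stop codon TGA.

```python
# Standard genetic code entries needed for the two variants discussed above.
CODON_TABLE = {"CGG": "Arg", "TGG": "Trp", "TGA": "Stop"}


def codon_of(c_pos: int) -> tuple[int, int]:
    """Map a 1-based coding position to (codon number, base position 1-3 in that codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def mutate(codon: str, pos_in_codon: int, alt_base: str) -> str:
    """Return the codon with the base at pos_in_codon (1-3) replaced by alt_base."""
    i = pos_in_codon - 1
    return codon[:i] + alt_base + codon[i + 1:]


# name -> (coding position, inferred reference codon, alternate base)
variants = {
    "c.1297C>T": (1297, "CGG", "T"),  # OMIM .0002, CYP2C19*5
    "c.636G>A": (636, "TGG", "A"),    # OMIM .0003, CYP2C19*3
}

results = {}
for name, (c_pos, ref_codon, alt) in variants.items():
    codon_no, pos = codon_of(c_pos)
    alt_codon = mutate(ref_codon, pos, alt)
    results[name] = (codon_no, CODON_TABLE[ref_codon], CODON_TABLE[alt_codon])
    print(name, "codon", codon_no, "base", pos, ref_codon, "->", alt_codon,
          CODON_TABLE[ref_codon], "->", CODON_TABLE[alt_codon])
```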

PCR and RFLP analysis of CYP2C19 gene variant c.636G>A (p.Trp212Ter) with GeneQuest gives the following results:

CYP2C19 | CYP2C19 variant ( p.TRP212TER)

The PCR agarose gel simulation shows the difference in the number and positions of BamHI fragments between the original CYP2C19 gene and its variant form. The red highlighted bands are the BamHI fragments. In the original gene, bands appear in the 250 to 500 and 500 to 1000 regions, reflecting the two BamHI cut sites in the CDS of the CYP2C19 gene. The variant c.636G>A (p.Trp212Ter), which introduces a premature stop codon at codon 212, also abolishes one of the BamHI sites, as seen in the figure for the variant. As a result, the fragment pattern changes and a band now appears between the 1000 and 2500 markers.

The results obtained from RFLP summary for CYP2C19 gene and its variant are as follows:

CYP2C19 CYP2C19 variant ( p.Trp212Ter )

As seen in the RFLP summary, BamHI cuts the CYP2C19 sequence twice: at position 661, giving a fragment of 1139 bp, and at position 29, giving a fragment of 632 bp. The p.Trp212Ter variant has only one cut site, at position 29, giving a fragment of 1771 bp, which confirms that a BamHI restriction site disappears when the variant is introduced into the original CDS.
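The fragment sizes in the RFLP summary are consistent with one another, which the short check below makes explicit. It assumes each reported position is the base after which BamHI cuts and that the analysed sequence is 1800 bp long; that length is not reported but is inferred from 661 + 1139 and 29 + 1771, and GeneQuest's exact numbering convention may differ by a base.

```python
def fragments(cuts: list[int], seq_len: int) -> list[int]:
    """Fragment lengths for a linear sequence cut after each 1-based position in cuts."""
    bounds = [0, *sorted(cuts), seq_len]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Cut positions read from the GeneQuest RFLP summary above.
# SEQ_LEN is an inferred assumption, not a reported value.
SEQ_LEN = 1800
original = fragments([29, 661], SEQ_LEN)
variant = fragments([29], SEQ_LEN)
print("original:", original)
print("variant:", variant)
```

Losing the cut at 661 merges the 632 bp and 1139 bp fragments into the single 1771 bp fragment reported for the variant.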

Based on this observation, we might want to study this variant and restriction site further in future genetic and molecular analyses. Using the SeqBuilder feature of DNA Star, we can design primers for the fragment of the variant gene that includes the restriction site; this fragment can then be amplified by PCR to make multiple copies for studying the variation. Selecting a 600 bp region with the BamHI site approximately at its centre, the following primers were designed, where the top-strand primer is GATCCGGCGTTTCTCCCTCAT.

Top strand of Primer for 600 base pairs of CYP2C19 gene having restriction site BamHI approximately at centre

Bottom strand of Primer for 600 base pairs of CYP2C19 gene having restriction site BamHI approximately at centre.
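Two quick properties of the reported top-strand primer can be checked by hand or in code: its GC content and a rough melting temperature. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, which is only a crude estimate intended for short oligonucleotides and is not necessarily the method SeqBuilder uses; nearest-neighbor thermodynamic models are more accurate for a 21-mer.

```python
PRIMER = "GATCCGGCGTTTCTCCCTCAT"  # top-strand primer reported above


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Rough Tm in deg C by the Wallace rule: 2*(A+T) + 4*(G+C); best for short oligos."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


print(len(PRIMER), f"{gc_fraction(PRIMER):.1%}", wallace_tm(PRIMER))
```

The primer has 12 G/C bases out of 21 (about 57% GC), giving a Wallace estimate of 66 °C.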

Lastly, technology has changed our lives in many ways, and bioinformatics tools definitely help improve our approach to the diagnosis, treatment and prevention of many serious diseases.

Studying Allelic Effects of CYP2C19 Gene with DNA STAR Software

October 5, 2015meghashahdesani Leave a comment

The CYP2C19 gene plays an important role in the metabolism of drugs such as the anticonvulsant mephenytoin, the anti-ulcer drug omeprazole and the antimalarial drug proguanil. Mutations in this gene can lead to poor metabolism of these drugs.

CYP2C19*5 is an allele of the CYP2C19 gene caused by a mutation affecting amino acid position 433: the arginine (R) at position 433 of the CYP2C19 protein is replaced with tryptophan (W) (ARG433TRP, 124020.0002).

The studies of Wrighton et al. and Goldstein et al. highlighted a correlation between levels of CYP2C19 protein and S-mephenytoin 4-prime-hydroxylase activity in human liver. In 1997, Xiao et al. identified a new allele of the CYP2C19 gene in a study comparing two Chinese ethnic groups, Han and Bai. During the study, one Chinese individual of Bai ethnicity showed poor metabolism of mephenytoin and was later found to be heterozygous for the CYP2C19m1 allele. The poor metabolism was attributed to a novel C-to-T mutation at nucleotide 1297 in exon 9 of the CYP2C19 gene, resulting in the replacement of arginine (R) with tryptophan (W) at position 433 in the heme-binding region. This mutation produces an inactive protein and decreases the activity of the recombinant enzyme towards S-mephenytoin and tolbutamide (a hypoglycemic drug used to treat type 2 diabetes), resulting in poor metabolism of these drugs. The CYP2C19*5 allele is rare; its frequency is estimated to be low in Chinese and Caucasian populations (Ibeanu et al., 1998).

DNA Star is a software suite that helps analyze protein sequences, study the structure and function of genes, and create and visualize various macromolecular structures. Its Protean and Protean 3D components help us view graphical representations of protein sequences, learn about their physicochemical properties and evaluate secondary structural characteristics such as hydropathy and charge density.

Helical wheel structure of CYP2C19 gene using Protean of DNA Star Lasergene software

Comparing the CYP2C19 gene with its CYP2C19*5 allele in terms of the following parameters using DNA Star:

1) Composition: Using Protean, we can observe a slight change in the isoelectric point (7.05 for CYP2C19*5 vs. 7.29 for CYP2C19) as well as in the charge at pH 7 (0.23 for CYP2C19*5 vs. 1.23 for CYP2C19), as shown in the image below.

CYP2C19*5 | CYP2C19

Biophysical Properties: As seen below, arginine and tryptophan differ in net charge and average hydropathy. Arginine has a net charge (pH 7) of 0.99, compared with -0.00 for tryptophan. The average hydropathy of arginine is -4.50, compared with -0.90 for tryptophan. Since tryptophan has the higher hydropathy score, the CYP2C19*5 protein is more hydrophobic at position 433 than the wild-type CYP2C19 protein. The side chains of the two amino acids also differ in structure: tryptophan has an aromatic indole ring, whereas arginine has a positively charged guanidinium group.
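The net-charge difference can be reproduced approximately with the Henderson-Hasselbalch equation. The sketch below uses approximate textbook side-chain pKa values (an assumption; Protean's own parameters may differ, which would explain why it reports 0.99 rather than 1.00 for arginine). Arginine's side chain is essentially fully protonated at pH 7, while tryptophan's side chain is not ionizable, so the R433W substitution removes about one positive charge, matching the drop in the protein's charge at pH 7 from 1.23 to 0.23.

```python
# Approximate side-chain pKa values (textbook values, treated as assumptions):
# residue -> (pKa, charge sign when ionized)
SIDE_CHAIN_PKA = {
    "D": (3.65, -1), "E": (4.25, -1), "C": (8.18, -1), "Y": (10.07, -1),
    "H": (6.00, +1), "K": (10.53, +1), "R": (12.48, +1),
}


def side_chain_charge(residue: str, ph: float = 7.0) -> float:
    """Fractional side-chain charge from the Henderson-Hasselbalch equation."""
    if residue not in SIDE_CHAIN_PKA:
        return 0.0  # e.g. W: no ionizable side chain
    pka, sign = SIDE_CHAIN_PKA[residue]
    if sign > 0:
        # Basic group: charged when protonated.
        return 1.0 / (1.0 + 10 ** (ph - pka))
    # Acidic group: charged when deprotonated.
    return -1.0 / (1.0 + 10 ** (pka - ph))


arg, trp = side_chain_charge("R"), side_chain_charge("W")
print(f"R: {arg:+.2f}  W: {trp:+.2f}  change on R433W: {trp - arg:+.2f}")
```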

Average Hydropathy: Hydrophilicity plots measure the hydrophilicity and hydrophobicity of the amino acids in a protein. The more hydrophobic an amino acid is, the more likely it is to be buried in the protein interior or to interact with the lipid bilayer. The more hydrophilic an amino acid is, the more likely it is to interact with water and lie on the outer surface of the protein. The Kyte-Doolittle and Hopp-Woods scales are used to measure these properties. Using Protean 3D, we can observe that the Hopp-Woods scale shows a more hydrophobic region at position 433 in the CYP2C19*5 allele than in the CYP2C19 gene. In the picture below, the light blue marking highlights position 433, where the R-to-W substitution has taken place. The upper bar represents the Kyte-Doolittle scale and the lower bar the Hopp-Woods scale. The orange region indicates hydrophobicity and the blue region hydrophilicity. Since tryptophan is more hydrophobic than arginine, CYP2C19*5 shows a change from hydrophilicity to hydrophobicity on the Hopp-Woods scale, as well as an enlarged hydrophobic region on the Kyte-Doolittle scale, compared with the CYP2C19 gene:

CYP2C19 | CYP2C19*5
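A hydropathy plot is a sliding-window average of a per-residue scale. The sketch below uses the Kyte-Doolittle values (arginine -4.5 and tryptophan -0.9, as quoted above) on a made-up nine-residue peptide, not the actual CYP2C19 sequence, with a window of 9, which is an assumption about the plot settings. Swapping the central R for W raises that window's average by (-0.9 - (-4.5)) / 9 = 0.4, the kind of local shift toward hydrophobicity seen in the plots.

```python
# Kyte-Doolittle (1982) hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Mean Kyte-Doolittle score of each full window (entry i is centred on residue i + window // 2)."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


# Hypothetical peptide with R at the centre, and the same peptide carrying R-to-W.
wild = "LAGSRVDLK"
mutant = wild[:4] + "W" + wild[5:]
print(hydropathy_profile(wild), hydropathy_profile(mutant))
```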

Charge Density: All amino acids have the generic structure as shown below:

General Structure of Amino Acid

Amino acids differ according to the nature of their R group (side chain). Amino acids with nonpolar R groups are hydrophobic, and those with polar R groups are hydrophilic. The remaining amino acids carry either a positive or a negative charge at neutral pH in aqueous solution and therefore tend to be hydrophilic. Since charged amino acids are found more often on the outer surface, the Lehninger charge density component of Protean 3D helps us study the surface characteristics of a protein. Comparing CYP2C19*5 with CYP2C19, tryptophan at position 433 has a nonpolar R group, so it is more hydrophobic and has a net charge of -0.00, whereas arginine at the same position in CYP2C19 has a positively charged R group, making it strongly hydrophilic with a net charge of 0.99. This can be observed in the image below:

CYP2C19 | CYP2C19*5

Summary:

Substitution of tryptophan for arginine at position 433 of the CYP2C19 protein defines the CYP2C19*5 allele. This mutation is rare, and individuals carrying this variant have been found to be poor metabolizers of S-mephenytoin and tolbutamide. The changes in hydrophobicity and net charge of tryptophan compared with arginine at this position in the heme-binding region may contribute to the poor metabolizing activity of the allele.

Using DNA Star Software for visualizing the folded RNA structure for a gene

September 28, 2015meghashahdesani Leave a comment

DNA Star software is very useful from a genomics point of view. Its components, such as SeqBuilder, GeneQuest and EditSeq, let you analyze a gene as well as create and visualize its folded RNA structure. The following folded RNA structure was created for the CYP2C19 gene using this software:

For more information on CYP2C19 gene please visit the following sites:

a) omim.org

b) genome.ucsc.edu

c) www.ncbi.nlm.nih.gov

Visualizing the pharmaceutical world through bioinformatics-genomics window

August 31, 2015meghashahdesani Leave a comment

PET GENE: CYP2C19 (CYTOCHROME P450, SUBFAMILY IIC, POLYPEPTIDE 19)

Alternative title: Mephenytoin 4-Prime-Hydroxylase

MIM Number: *124020 (AUTOSOMAL LOCI OR PHENOTYPE – entry created before May 15, 1994)

Cytogenetic Location: 10q23.33

Figure represents CYP2C19 gene obtained from Protein Data Bank based on 1r9o

Description:

CYP2C19 encodes a very important liver enzyme, mainly responsible for the metabolism of drugs including the anticonvulsant mephenytoin, anti-ulcer drugs such as omeprazole, some antidepressants and the antimalarial drug proguanil.

Cloning and Expression:

The IIC subfamily contains constitutively expressed genes, phenobarbital-inducible genes and some genes with sex-specific expression.
CYP2C19 is a constitutively expressed gene (i.e., it is transcribed continually); Northern blot hybridization of RNA isolated from human livers showed a 10-fold interindividual variation in its expression, suggesting that it is not greatly influenced by environmental factors.

Drug Metabolism and Allelic Variants:

Since the CYP2C19 gene is responsible for drug metabolism in the liver, mutations in this gene can give rise to allelic variants with defective metabolism of drugs such as mephenytoin and proguanil.
Wrighton et al. (1993) and Goldstein et al. (1994) showed that there is a correlation between levels of CYP2C19 protein and microsomal S-mephenytoin 4-prime-hydroxylase activity in human liver.
CYP2C19*2 – An allelic variant caused by a G-to-A mutation at nucleotide 681 in exon 5 that creates an aberrant splice site in the CYP2C19 gene. It is the major defect responsible for the S-mephenytoin poor metabolizer (PM) phenotype (609535).
This genetic polymorphism in the metabolism of the anticonvulsant drug mephenytoin shows racial heterogeneity, with the poor metabolizer phenotype representing 13 to 23% of Oriental populations but only 2 to 5% of Caucasian populations.
It has also been seen that individuals with the S-mephenytoin poor metabolizer (PM) phenotype also metabolize proguanil, a drug used for chemoprophylaxis of malaria, poorly. CYP2C19*2 is responsible for poor metabolism of proguanil and is seen mostly in Asian populations, which explains the higher doses of the drug required in these populations in clinical trials compared with Caucasians.
CYP2C19*2 also leads to poor metabolism of clopidogrel, an antiplatelet prodrug that CYP2C19 helps convert to its active form. The mutation leads to increased residual platelet activity, thereby decreasing the efficacy of the drug.
CYP2C19*3 – An allelic variant caused by a G-to-A mutation at nucleotide 636 in exon 4 of the CYP2C19 gene. Like the CYP2C19*2 allele, it is responsible for poor metabolism of drugs such as mephenytoin, proguanil and clopidogrel, and it is found especially in Japanese and Chinese populations.
CYP2C19*4 – An A-to-G mutation in the initiation codon of CYP2C19, resulting in a met1-to-val substitution, found especially in Caucasian poor metabolizers (609535).
CYP2C19*5 – A novel C-to-T mutation at nucleotide 1297 in exon 9 of the CYP2C19 gene, resulting in an arg433-to-trp (R433W) substitution in the heme-binding region. It decreases the activity of the enzyme towards S-mephenytoin and tolbutamide and is found at low frequency in Chinese and Caucasian populations.

Resource: http://www.medscape.org/viewarticle/416471_2 -Innovative Epilepsy Therapies for the 21st Century – Part 3: Will We/Can We Prescribe by Genotype?

OMIM Phenotype entry for CYP2C19 gene: #609535

Metabolism of the anticonvulsant drug mephenytoin is used as a measure to detect CYP2C19 phenotype in individuals.
Mephenytoin is administered as a mixture of two enantiomers, S-mephenytoin and R-mephenytoin. Hydroxylation by CYP2C19 results in fast elimination of S-mephenytoin from the body, while R-mephenytoin, which is required for clinical action, is sustained.
In individuals carrying defective alleles of CYP2C19, mephenytoin is poorly metabolized, so S-mephenytoin is not eliminated from the body quickly and instead accumulates along with the R-enantiomer, which can cause severe adverse effects.
Thus, in individuals with the poor metabolizer phenotype, the dose of the drug needs to be reduced to avoid side effects from drug accumulation in the body.
Several studies (Andersson et al. (1992), Kaneko et al. (1997), Giusti et al. (2007)) have shown that carriers of these CYP2C19 alleles also metabolize drugs such as omeprazole, proguanil and clopidogrel poorly.
The poor metabolizer phenotype is also more common among Asian populations, especially Japanese and Chinese, than among Caucasians.
Inheritance: Studies have confirmed that poor metabolism of mephenytoin, i.e., the CYP2C19 poor metabolizer trait, is inherited in an autosomal recessive manner (Inaba et al. (1986), Kupfer and Preisig (1984)).
The figure shows an example of the differing plasma concentrations of a drug in CYP2C19 extensive and poor metabolizers, which explains the need to reduce drug doses in poor metabolizers:
Example of phenotype poor metabolizer of Omeprazole
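Autosomal recessive inheritance of the poor metabolizer trait can be illustrated with a Punnett square. The sketch below is a generic, hypothetical single-locus example in which `A` stands for a functional CYP2C19 allele and `a` for a loss-of-function allele (a homozygous patient like Mary must have received one `a` from each parent); it ignores the fact that several different loss-of-function alleles exist. Two unaffected carrier parents have a 1 in 4 chance of a poor metabolizer child with each pregnancy.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Punnett-square genotype probabilities for one autosomal locus.

    Genotypes are two-letter strings: uppercase = functional allele,
    lowercase = loss-of-function allele.
    """
    # Each parent passes one allele; sorting makes "aA" and "Aa" the same genotype.
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


probs = offspring_genotypes("Aa", "Aa")  # two unaffected carrier parents
poor_metabolizer = probs.get("aa", Fraction(0))  # recessive: needs two "a" alleles
print(probs, poor_metabolizer)
```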

For more information on the CYP2C19 gene and its alleles, as well as dosing information for carriers of CYP2C19 alleles, please visit: https://www.pharmgkb.org/gene/PA124?tabType=tabVip#tabview=tab0&subtab=31