413 State Hall, Department of Computer Science, Wayne State University, Detroit MI 48202
Phone: (313) 577-5070 Fax: (313) 577-6868


Onto-Tools FAQ

General FAQ

1. Which articles should I cite in my paper when I use Onto-Tools?

Answer: Please cite appropriate publications from the following list:

  1. Tarca, Adi Laurentiu, Draghici, Sorin, Khatri, Purvesh, Hassan, Sonia S, Mittal, Pooja, Kim, Jung-Sun, Kim, Chong Jai, Kusanovic, Juan Pedro, and Romero, Roberto: A novel signaling pathway impact analysis., Bioinformatics 25(1), W75-W82, January 2009.

  2. Draghici, Sorin, Tarca, Adi L, Yu, Longfei, Ethier, Stephen, and Romero, Roberto: KUTE-BASE: storing, downloading and exporting MIAME-compliant microarray experiments in minutes rather than hours., Bioinformatics 24(5), W738-W740, March 2008.

  3. Draghici, Sorin, Khatri, Purvesh, Tarca, Adi Laurentiu, Amin, Kashyap, Done, Arina, Voichita, Calin, Georgescu, Constantin, and Romero, Roberto: A systems biology approach for pathway level analysis., Genome Res 17(10), W1537-W1545, October 2007.

  4. Purvesh Khatri, Calin Voichita, Khalid Kattan, Nadeem Ansari, Avani Khatri, Constantin Georgescu, Adi L. Tarca, Sorin Draghici. Onto-Tools: New Additions and Improvements in 2006. Nucleic Acids Research, 35, W206-W211, July 2007.

  5. Sorin Draghici, Sivakumar Sellamuthu, Purvesh Khatri. Babel's tower revisited: a universal resource for cross-referencing across annotation databases. Bioinformatics, 22(23):2934-2939, December 2006.

  6. Purvesh Khatri, Valmik Desai, Adi L. Tarca, Sivakumar Sellamuthu, Derek E Wildman, Roberto Romero, Sorin Draghici. New Onto-Tools: Promoter-Express, nsSNPCounter and Onto-Translate. Nucleic Acids Research, 34, W626-W631, July 2006.

  7. Purvesh Khatri, Sivakumar Sellamuthu, Pooja Malhotra, Kashyap Amin, Arina Done and Sorin Draghici. Recent additions and improvements to the Onto-Tools. Nucleic Acids Research, 33(Web Server issue):W762-W765, July 2005.

  8. Purvesh Khatri, Pratik Bhavsar, Gagandeep Bawa and Sorin Draghici. Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments. Nucleic Acids Research, 32: W449-W456, July 2004.

  9. Sorin Draghici, Purvesh Khatri, Pratik Bhavsar, Abhik Shah, Stephen Krawetz and Michael A. Tainsky. Onto-Tools, The toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Nucleic Acids Research, 31(13): 3775-81, July 2003.

  10. Sorin Draghici, Purvesh Khatri, Abhik Shah and Michael Tainsky. Assessing the functional bias of commercial microarrays using the Onto-Compare database. BioTechniques, Microarrays and Cancer: Research and Applications, Suppl:55-61, March 2003.

  11. Sorin Draghici, Purvesh Khatri, Rui P. Martins, G. Charles Ostermeier and Stephen A. Krawetz. Global functional profiling of gene expression. Genomics 81(2):98-104, February 2003.

  12. Purvesh Khatri, Sorin Draghici, G. Charles Ostermeier, Stephen A. Krawetz. Profiling Gene Expression Using Onto-Express. Genomics, 79(2):266-270, February 2002.

2. How do I cite Onto-Tools in my paper?

Answer: Please give a detailed description of how a particular Onto-Tool (Onto-Express, Onto-Compare, Onto-Design, Onto-Translate or Onto-Miner) was useful in your research.

The following example can serve as a good guideline. It is taken from Matthew A. Ginos, Grier P. Page, Bryan S. Michalowicz, Ketan J. Patel, Sonja E. Volker, Stefan E. Pambuccian, Frank G. Ondrey, George L. Adams, and Patrick M. Gaffney, Identification of a Gene Expression Signature Associated with Recurrent Disease in Squamous Cell Carcinoma of the Head and Neck, Cancer Res 2004 64: 55-63.

"To more thoroughly characterize sets of functionally related genes differentially expressed between HNSCC and NOM, we used Onto- Express (44, 45) to classify genes according to the following Gene-Ontology (GO) (46) categories: biological process; cellular role; and molecular function. The numbers of genes corresponding to each GO category among the 2890 differentially expressed genes was tallied and compared with the number of genes expected for each GO category based on their representation on the Affymetrix U133A array. Significant differences from the expected were calculated with a two-sided binomial distribution. False discovery rates (47) and Bonferroni adjustments were also calculated based upon the number of GO categories having at least 1 gene in the list of 2890 differentially expressed genes. Table 2 shows all GO functional classes with a Bonferroni-corrected significance of P 0.05, the significance of each class, the false discovery rate for that class, as well as the number of genes corresponding to each GO functional class identified in our differentially expressed gene list. The functional gene groups demonstrating the most significant representation in our set of differentially expressed genes appear under the biological process ontology and map to the inflammatory response, immune response, and epidermal differentiation categories. Genes involved in inflammatory and immune response are highly expressed in subsets of HNSCC and correlate with the presence of tumor infiltrating immune cells (Fig. 1, A and B). In contrast, genes corresponding to the epidermal differentiation category are highly expressed in NOM compared with HNSCC, reflecting the loss of normal epithelial architecture associated with malignant transformation (Fig. 1A). 
Functional categories significantly represented under the cellular component and molecular function ontologies include genes involved in extracellular matrix functions, integrin complexes, RNA binding, chemokine signaling, and cell adhesion."

(44) Khatri, P., Draghici, S., Ostermeier, G. C., and Krawetz, S. A. Profiling gene expression using Onto-express. Genomics, 79: 266–270, 2002.

(45) Draghici, S., Khatri, P., Martins, R. P., Ostermeier, G. C., and Krawetz, S. A. Global functional profiling of gene expression. Genomics, 81: 98–104, 2003.

3. How do I get access to Onto-Tools?

Answer: You can register with us here. Your username and password will be sent to the email address you specified during registration.

4. How can I get help for using Onto-Tools?

Answer: You can access the online help pages for Onto-Tools. These help pages give a detailed description of the importance, use, features and results of each tool.

5. If an error occurs, how can I find out what is going wrong?

Answer: We have online troubleshooting options for Onto-Tools. These can be accessed here.

6. Whom can I contact to learn more about Onto-Tools?

Answer: You can contact Dr. Sorin Draghici or Dr. Purvesh Khatri to learn more about Onto-Tools.

7. Whom can I contact to learn more about TAQ?

Answer: You can contact Dr. Sorin Draghici to learn more about TAQ.

8. Why does my analysis seem to take forever to complete?

Answer: To work correctly, our tools require an appropriate amount of memory to be allocated to the Java Runtime Environment through the Java Plug-in settings; follow this link to set it up.

Technical FAQ

In the Technical FAQ section we share some of the questions that users of Onto-Tools and TAQ have asked us, which we believe will help other users too.

1. Why do I not see anything on my computer after I login? (Only Onto-Tools help page with "do not close this page" is shown)

Answer: The Onto-Tools run as a Java applet on the client machine. They require the Sun Java Runtime Environment (JRE), version 1.6 or higher, to be installed on the client machine. The Onto-Tools applet is NOT compatible with the Microsoft Java virtual machine. Please make sure that the Sun JRE is installed on your computer and is set as the default JRE. You can check whether the Sun JRE is installed by going to Start->Control Panel->Java. If you do not see Java in your Control Panel, you can download it from http://java.sun.com.

If you see Java in the Control Panel and the Onto-Tools still do not work, check that the Java version is 1.6 or higher. You can do this by double clicking the Java icon in Control Panel and, in the "General" tab click the "About..." button. If the Java version is below 6 (build below 1.6), you need to update it: in Control Panel, double click the Java icon and select the "Update" tab in the window that appears. Click the "Update Now" button and follow the instructions to install the updated version of Java.

2. What type of IDs can I use as input?

Answer: The Onto-Tools support a number of different types of IDs as input, including GenBank accession numbers, UniGene cluster IDs, Entrez Gene IDs, gene symbols, Affymetrix probe IDs, etc. The following table shows an example of different types of IDs.

GenBank Accession Number   UniGene cluster ID   Entrez Gene ID   Gene Symbol   Affymetrix Probe ID
A00127.1                   Hs.325116            4241             MFI2          229086_at
A08695.2                   Hs.654611            1124             CHN2          242766_at
AA001791.6                 481186               1182             CLCN3         205443_at
AA004795                   651939               9223             MAGI1         200873_s_at
AA005018                   Hs.144513            23671            TMEFF2        229802_at
AA009569.3                 Hs.592136            7150             TOP1          220948_s_at
AA010078.3                 Hs.43697             2119             ETV5          211936_at
AA013087                   43697                2119             ETV5          213330_s_at
AA015605.5                 Hs.4                 125              ADH1B         201231_s_at

Onto-Express FAQ

1. Why do I see some genes annotated more than once in the Tree View output of Onto-Express?

Answer: A gene can be associated with more than one GO term (node), and some of these GO terms can be parents of others. In the Tree View output of Onto-Express, if one of these parent nodes is expanded, each category is analyzed independently and such genes are taken into consideration in both computations. If the parent node is collapsed, these genes would be counted twice (once as associated with the parent and once as associated with the child). However, OE removes any such duplicates and considers each gene only once. (Example)

2. Why are there some differences between the results generated from MATLAB using "hygepdf" function and the results generated by Onto-Express?

Answer: There are two likely causes. First, the correct MATLAB function for calculating the p-value is hygecdf(X,M,K,N), not hygepdf. Second, the parameters passed to the function may be incorrect. MATLAB defines hygecdf as p = hygecdf(X,M,K,N), where p is the probability of drawing up to X of a possible K items in N drawings without replacement from a group of M objects. Please refer to the following example. (Example)
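To illustrate the difference between a point probability (what hygepdf returns) and a cumulative probability (what hygecdf returns), here is a minimal Python sketch. The function names hypergeom_pmf and hypergeom_cdf are hypothetical helpers written for this illustration; they follow the (X, M, K, N) argument order described above but are not part of Onto-Express or MATLAB, and the numbers are made up.

```python
from math import comb

def hypergeom_pmf(x, M, K, N):
    """Probability of drawing exactly x of the K marked items
    in N drawings without replacement from M objects."""
    # Outside the feasible range the probability is zero.
    if x < max(0, N - (M - K)) or x > min(K, N):
        return 0.0
    return comb(K, x) * comb(M - K, N - x) / comb(M, N)

def hypergeom_cdf(x, M, K, N):
    """Probability of drawing up to x of the K marked items
    (the sum of the point probabilities for 0..x)."""
    return sum(hypergeom_pmf(i, M, K, N) for i in range(0, x + 1))

# Illustrative numbers only: 10 objects, 3 marked, 4 drawn.
point = hypergeom_pmf(1, 10, 3, 4)       # exactly one marked item
cumulative = hypergeom_cdf(1, 10, 3, 4)  # zero or one marked item
print(point, cumulative)
```

The two values differ whenever more than one outcome falls at or below X, which is one reason a result computed with the point probability may not match a cumulative one.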

3. Why does Onto-Express generate a GIF image of size 0 bytes when I try to save the results as an image?

Answer: This means that your Java Plug-in is running out of memory while generating the GIF image. You will have to increase the Java Runtime Environment (JRE) memory for your Java Plug-in and restart your browser. Find out how to allocate more memory to the JRE here.

4. What does expand/collapse mean in the Onto-Express output?

Answer: The terms "expand" and "collapse" refer to the state of a GO term in the tree view of Onto-Express. They are related to the DAG nature of GO. In the "expand" mode, OE only considers the genes specifically annotated with the given term, and ignores the genes annotated with its child terms. In the "collapse" mode, for a given term, OE considers the genes annotated with its child terms as also being annotated with the given term.

For instance, let us assume that the term "apoptosis" is a parent of the term "induction of apoptosis". Further, let us assume that a gene A is annotated with the term apoptosis and a gene B is annotated with the term induction of apoptosis. In the expand mode, the total number of genes for each of the two terms is 1. However, if a term is collapsed, all genes annotated with its child terms are now considered to be annotated with the collapsed term (i.e., the parent term). Hence, if the term apoptosis is collapsed, the gene B, which is annotated with induction of apoptosis (child term), is now considered to be annotated with apoptosis. Therefore, the total number of genes for apoptosis in collapsed mode will be 2.
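The apoptosis example can be written out as a small Python sketch. The data structures and the function name genes_for_term are hypothetical and only mirror the description above; this is not the Onto-Express implementation.

```python
# Toy GO fragment from the example: one parent term with one child term.
children = {
    "apoptosis": ["induction of apoptosis"],
    "induction of apoptosis": [],
}

# Genes annotated directly with each term.
direct_annotations = {
    "apoptosis": {"A"},
    "induction of apoptosis": {"B"},
}

def genes_for_term(term, collapsed):
    """Return the set of genes counted for a term.

    Expanded: only genes annotated directly with the term.
    Collapsed: also genes annotated with any descendant term.
    Using a set means a gene is counted only once.
    """
    genes = set(direct_annotations.get(term, set()))
    if collapsed:
        for child in children.get(term, []):
            genes |= genes_for_term(child, collapsed=True)
    return genes

print(len(genes_for_term("apoptosis", collapsed=False)))
print(len(genes_for_term("induction of apoptosis", collapsed=False)))
print(len(genes_for_term("apoptosis", collapsed=True)))
```

Because the result is a set, a gene annotated with both a parent and its child would still be counted once when the parent is collapsed.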

Pathway-Express FAQ

1. What is the format of the input files for Pathway-Express?

Answer: The format of an input file for Pathway-Express is the same as that for Onto-Express. The input file must contain only one ID per line. Each line can also contain the fold change for that ID, separated from the ID by a tab character. The input IDs can be Affymetrix probe IDs, GenBank accession numbers, gene symbols or Entrez Gene IDs. Some example input files for Pathway-Express are available here: Using GenBank accession number, Using gene symbols, Using Affymetrix probe IDs
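As a rough illustration of this format, the Python sketch below reads text with one ID per line and an optional tab-separated fold change. The function name parse_input and the fold change values are hypothetical, and skipping blank lines is an assumption of this sketch rather than documented Pathway-Express behavior.

```python
def parse_input(text):
    """Parse 'ID' or 'ID<TAB>fold change' lines into a dict.

    IDs without a fold change map to None.
    """
    entries = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        gene_id, _, fold_change = line.partition("\t")
        fold_change = fold_change.strip()
        entries[gene_id.strip()] = float(fold_change) if fold_change else None
    return entries

# Probe IDs taken from the ID table; the fold changes are invented.
sample = "229086_at\t2.5\n242766_at\t-1.8\n205443_at\n"
print(parse_input(sample))
```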

2. Can I submit a list of genes with their p-values from a microarray experiment instead of fold changes?

Answer: You cannot use p-values instead of fold changes. The idea is to capture the amount of change that is happening on the pathway. The p-values have to do with the ratio between the expression level and the variance with which it is measured. In essence, the p-values are influenced by the technology (Affymetrix, Illumina, etc.), the number of replicate measurements, etc. (this paper discusses how various technologies provide very different results: http://vortex.cs.wayne.edu/papers/Tigs_article.pdf). The results of the pathway analysis should not be influenced by these factors but only by the amount of change measured for each gene (estimated as accurately as possible, whichever way you think is best). The analysis method also propagates the perturbation of the genes throughout the pathway, following the topology that describes how the genes interact. If a gene A represses a gene B immediately downstream, and A goes up 2-fold, B should feel this as an inhibitory influence proportional to the amount of change, not proportional to the significance of that change. It makes no sense to propagate p-values.

In essence, you should select a set of DE genes using your best statistical approach. You should do this based on p-values or a similar measure, not on fold change (see http://vortex.cs.wayne.edu/papers/Statistical_Intelligence_reprint.pdf or Data analysis tools for DNA microarrays for why fold change is not a good selection criterion). Then you should take that subset of DE genes, with their fold changes, and analyze it with Pathway-Express.
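This two-step workflow (select DE genes by significance, then submit their fold changes) might look like the following Python sketch. The function names, the 0.05 cutoff and the sample measurements are hypothetical choices made for this illustration; use whichever statistical selection method you consider best.

```python
def select_de_genes(measurements, alpha=0.05):
    """Keep (gene_id, fold_change) for genes whose p-value is below alpha.

    measurements: iterable of (gene_id, fold_change, p_value) tuples.
    The p-value is used only for selection and is then discarded.
    """
    return [(gene_id, fold_change)
            for gene_id, fold_change, p_value in measurements
            if p_value < alpha]

def to_input_lines(de_genes):
    """Format the selected genes as 'ID<TAB>fold change', one per line."""
    return "\n".join(f"{gene_id}\t{fold_change}"
                     for gene_id, fold_change in de_genes)

# Invented measurements for three gene symbols from the ID table.
measurements = [
    ("MFI2", 2.4, 0.001),
    ("CHN2", -1.1, 0.40),
    ("TOP1", -3.0, 0.02),
]
print(to_input_lines(select_de_genes(measurements)))
```

Note that the output contains fold changes only; the p-values never reach the pathway analysis.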

3. What is the maximum number of genes I can submit as input to Pathway-Express?

Answer: It is possible to submit a very long list of genes, including all genes on the array. However, this may not be the best thing to do. The idea is to test how significantly impacted each pathway is. Our model looks at two factors:

  - the probability of having that many differentially expressed (DE) genes on the given pathway just by chance;
  - where the DE genes are placed on the pathway and how they interact; implicitly, this looks at whether the genes form a coherent group, downstream from each other and interacting with each other, or whether they are distributed randomly across the pathway.

By submitting all genes on the array as DE, you are not helping the testing at all. In essence, each pathway will have all of its genes as DE, which provides no information from a probabilistic perspective. Also, since most or all genes will have measured fold changes (even though they are small), you remove the possibility of detecting coherent groups of DE genes that may be localized in a given region of the pathway.

4. What does the gamma p-value represent? How is it different from the p-value for each pathway?

Answer: Pathway-Express provides two types of p-values for each pathway: i) a p-value obtained using classical statistics (referred to as the classical p-value) and ii) a p-value obtained using the impact analysis (referred to as the gamma p-value).

The classical p-value appears in the first term of Equation 1 (see paper). It can be obtained using the hypergeometric, binomial or any other appropriate statistical distribution. The corrected p-value is the classical p-value corrected for multiple comparisons.

The gamma p-value is the p-value provided by the impact analysis as described in the paper. This is the p-value that one should look at. However, note that there is a tendency to include some false positives if the list of input genes is very small, as discussed briefly in the supplementary materials.

Pathway-Express provides both p-values in order to allow comparison of the results of the classical approach with the results of the impact analysis.

5. How can I save the colored pathway diagrams?

Answer: The easiest way to save the colored pathway diagram is to take a screenshot. However, if the screenshot quality is poor, there is another way to save the pathway diagrams.

6. How can I save the results?

Answer: The results shown in the Pathway-Express output window can be saved as a tab-delimited text file by following these instructions.

7. How can I see the GML file saved from the Pathway-Express output window?

Answer: We recommend using one of the following solutions:

These are the instructions to obtain an appropriate graph view with yEd:

Dr. Sorin Draghici
Calin Voichita
Michele Donato
