Supplementary Material for manuscript submitted to BioEssays

From Genome to Phenome:
A back to the future of gene expression analysis

Antonio Reverter, Wes Barris, Sean McWilliam, Greg Harper and Brian Dalrymple

Bioinformatics Group, CSIRO Livestock Industries
306 Carmody Rd., St. Lucia, QLD 4067, Australia



  1. Abstract.

  2. Table: List of tissues, number of SAGE libraries and genes used in the study (MS Word).

  3. Directory of Extreme Genes: Directory of extreme (i.e. tissue-specific) genes for each of the 41 tissues included in the analyses. Each file is an ASCII text file with between 0 and 239 lines, and each line contains 4 fields: 1. Gene ID; 2. Average expression across all tissues (log transcripts per 200,000); 3. Standard deviation (SD) of expression across all tissues; and 4. Average expression in the tissue in question. NB: These genes (1,423 in total) were excluded from subsequent analyses.

  4. Table: REML Solutions: Results of the restricted maximum likelihood (REML) estimation of Model [1], performed using the VCE software (ASCII text file, 104 lines).

  5. Table: List of the 16,348 genes included in the analyses, sorted by differential expression (DE) from most underexpressed to most overexpressed in cancer. For each condition (cancer and normal), four values are provided: number of tissues (T), number of libraries (L), average tags per 200,000 (tp2), and the t-statistic computed from the BLUP difference between cancer and normal for the gene-by-condition random interaction (Microsoft Excel file).

  6. Figure: Density and Clusters: Empirical density of the measures of differential expression, together with the posterior probability of each value belonging to each cluster. The density has no ordinate scale, but the total area under the curve corresponds to probability 1 and individual densities are drawn proportionally. The lines for clusters 1, 2 and 3 represent posterior probabilities for three classes of DE genes: cluster 1, extreme DE genes; cluster 2, intermediate DE genes; cluster 3, genes with no differential expression between the cancerous and the normal state (PNG file).

  7. Table: Gene expression correlations among the 41 tissues (Microsoft Excel file).

  8. Figure: Heat map of the tissue-to-tissue correlation matrix. Thick lines separate cancerous from normal tissues. The spectrum runs from blue (correlation <= -0.45) through white (-0.05 < correlation <= 0.05) to red (correlation > 0.45).

  9. Modules Figure: Modules of Co-Expression: Of the top 100 genes, 73 had functional annotation in the Cancer Module Map (Segal et al. 2004; Nat. Genet. 36, 1090-1098), appearing in a total of 210 modules, 87 of which contained at least 3 genes. This figure depicts a circular representation of these 87 modules: the radius of each circle indicates the number of genes in the module, and the vertical distance above or below the horizontal zero axis indicates the average expression (up- or down-regulation in cancer tissue). The number inside each circle is the module number, and the thickness of the lines connecting circles indicates the number of genes the modules have in common.

  10. Modules Table: Modules of Co-Expression: List of the 87 modules from the Cancer Module Map represented among our differentially expressed (DE) genes. The list includes the number of DE genes, the rank of the module in terms of up- or down-regulation due to cancer, and the parent and child modules (Microsoft Excel file).

  11. Gene-to-Gene Correlation Matrix: Correlation matrix for a subset of the top 100 cancer genes. Thick lines delimit blocks: A, extracellular matrix; B, nucleus and cell progression; C, actin cytoskeleton; D, fatty acid metabolism; and E, glutamine/glutathione/oxidative stress (PNG file).

  12. Visible Human Distance Table: Table of distances between organs, obtained from the relevant anatomical images of The Visible Human Project (Word document).

  13. Visible Human Distance Figure: Comparison of Visual vs Normal vs Cancer: Locations, in two-dimensional coordinates, of 12 organs (brain, retina, spinal cord, heart, thyroid, prostate, liver, lung, stomach, kidney, pancreas and colon).

  14. Permutation Source Code: Fortran 90 code to perform a permutation test on the coordinates derived from the distances between organs, obtained from the relevant anatomical images of The Visible Human Project. Note: the code contains the three-dimensional coordinates for each organ resulting from multidimensional scaling (ASCII text file, 244 lines).

  15. A Human Through Alien Eyes: Drawing by Ana Fonollosa Gonzalez of a human being as seen by an alien (JPG image).
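
The per-tissue files of extreme genes (item 3) list four fields per line, but the delimiter is not stated. The sketch below assumes whitespace-separated columns; the names `ExtremeGene` and `parse_extreme_genes` are hypothetical, and the example lines are invented. It also collects the gene IDs so they can be excluded from later analyses, as item 3 describes.

```python
from typing import NamedTuple


class ExtremeGene(NamedTuple):
    """One line of a per-tissue extreme-gene file (hypothetical structure)."""
    gene_id: str
    mean_all: float     # average expression across all tissues (log tags per 200,000)
    sd_all: float       # SD of expression across all tissues
    mean_tissue: float  # average expression in the tissue the file refers to


def parse_extreme_genes(lines):
    """Parse 4-field lines, assuming whitespace separation; blank lines are skipped."""
    genes = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # files may be empty (0 lines) or end with a blank line
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        gene_id, mean_all, sd_all, mean_tissue = fields
        genes.append(ExtremeGene(gene_id, float(mean_all), float(sd_all), float(mean_tissue)))
    return genes


# Invented example lines; IDs and values do not come from the study.
example = ["12345 1.20 0.85 4.90\n", "67890 0.40 0.30 2.10\n"]
genes = parse_extreme_genes(example)
excluded_ids = {g.gene_id for g in genes}  # genes to drop before further analysis
print(len(genes), sorted(excluded_ids))
```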
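
Item 5 reports a t-statistic computed from the BLUP difference between cancer and normal. A statistic of this kind is a difference divided by its standard error; the sketch assumes that standard error comes from the prediction error variances (and covariance) of the two BLUPs, which item 5 does not state. The function name, inputs and numbers are hypothetical. Sorting by the statistic in ascending order gives the ordering described, from most underexpressed to most overexpressed in cancer.

```python
import math


def de_t_statistic(blup_cancer, blup_normal, pev_cancer, pev_normal, cov=0.0):
    """t = (BLUP_cancer - BLUP_normal) / SE(difference).

    Assumption: SE is the square root of the prediction error variance of the
    difference, Var(c) + Var(n) - 2 Cov(c, n), taken from the mixed-model fit.
    """
    diff = blup_cancer - blup_normal
    var_diff = pev_cancer + pev_normal - 2.0 * cov
    return diff / math.sqrt(var_diff)


# Hypothetical rows: (gene, BLUP cancer, BLUP normal, PEV cancer, PEV normal).
rows = [
    ("geneA", 0.30, -0.10, 0.04, 0.05),
    ("geneB", -0.50, 0.20, 0.02, 0.03),
    ("geneC", 0.05, 0.04, 0.06, 0.06),
]
# Ascending t: most underexpressed in cancer first, most overexpressed last.
ranked = sorted(rows, key=lambda row: de_t_statistic(*row[1:]))
print([row[0] for row in ranked])
```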
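
The cluster lines in item 6 are posterior probabilities of membership in a three-component mixture. For a value x, the posterior for component k is its weighted density divided by the mixture density, pi_k f_k(x) / sum_j pi_j f_j(x). The sketch uses normal components with made-up parameters arranged as a centred scale mixture (wide, medium and narrow spread for extreme, intermediate and no DE); this is an illustrative assumption, not the fitted model.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution N(mean, sd**2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def cluster_posteriors(x, weights, means, sds):
    """Posterior probability that value x came from each mixture component."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)  # mixture density at x
    return [j / total for j in joint]


# Hypothetical parameters, NOT the values fitted in the study:
# cluster 1 = extreme DE (wide), cluster 2 = intermediate, cluster 3 = no DE (narrow).
WEIGHTS = [0.05, 0.25, 0.70]
MEANS = [0.0, 0.0, 0.0]
SDS = [3.0, 1.5, 0.5]

for x in (0.0, 2.0, 5.0):
    post = cluster_posteriors(x, WEIGHTS, MEANS, SDS)
    best = max(range(3), key=lambda k: post[k]) + 1  # most probable cluster (1-based)
    print(f"x={x:+.1f} posteriors={[round(p, 3) for p in post]} cluster={best}")
```

Because the components here share a mean, large absolute values in either tail fall into cluster 1, which is one way to let "extreme DE" cover both over- and underexpression.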
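
Items 7 and 8 describe a tissue-to-tissue correlation matrix and its heat map. The correlation method is not stated; the sketch assumes Pearson correlation between tissue expression profiles over a common, ordered set of genes. The colour mapping uses only the three anchor ranges given in item 8 and returns a placeholder label for values in between, since the intermediate shades are not specified. Tissue names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(profiles):
    """profiles: tissue name -> expression values over the same ordered genes."""
    names = list(profiles)
    matrix = [[pearson(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix


def colour_band(r):
    """Anchor colours from the heat-map legend; in-between values get a placeholder."""
    if r <= -0.45:
        return "blue"
    if -0.05 < r <= 0.05:
        return "white"
    if r > 0.45:
        return "red"
    return "intermediate (blue side)" if r < 0 else "intermediate (red side)"


# Hypothetical expression profiles for three tissues over four genes.
profiles = {
    "liver_normal": [1.0, 2.0, 3.0, 4.0],
    "liver_cancer": [2.0, 4.0, 6.0, 8.0],
    "brain_normal": [4.0, 3.0, 2.0, 1.0],
}
names, matrix = correlation_matrix(profiles)
for name, row in zip(names, matrix):
    print(name, [colour_band(r) for r in row])
```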
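
In the module figure of item 9, line thickness reflects the number of genes two modules share. Counting shared genes is a set intersection over every pair of modules. The sketch below, with invented module numbers and gene IDs and hypothetical function names, also keeps only modules with at least 3 genes, mirroring the filter that reduced 210 modules to 87.

```python
from itertools import combinations


def modules_with_min_genes(modules, min_genes=3):
    """Keep modules that contain at least `min_genes` of the genes of interest."""
    return {m: genes for m, genes in modules.items() if len(genes) >= min_genes}


def shared_gene_counts(modules):
    """Number of genes in common for every pair of modules sharing at least one gene."""
    counts = {}
    for a, b in combinations(sorted(modules), 2):
        n = len(modules[a] & modules[b])
        if n:
            counts[(a, b)] = n
    return counts


# Hypothetical modules (module number -> gene IDs); not taken from the Cancer Module Map.
modules = {
    12: {"g1", "g2", "g3"},
    40: {"g2", "g3", "g4"},
    77: {"g5", "g6", "g7"},
    91: {"g1", "g8"},  # only 2 genes, so the filter drops it
}
kept = modules_with_min_genes(modules)
print(sorted(kept), shared_gene_counts(kept))
```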
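
Items 13 and 14 refer to organ coordinates obtained by multidimensional scaling of the distance table in item 12. The variant used is not stated; the sketch assumes classical (Torgerson) MDS, which double-centres the squared distances and keeps the leading eigenvectors scaled by the square roots of their eigenvalues. It requires NumPy, and the toy distances come from three hypothetical points, so a two-dimensional solution should reproduce them.

```python
import numpy as np


def classical_mds(d, k=2):
    """Classical (Torgerson) MDS: k-dimensional coordinates whose Euclidean
    distances approximate the symmetric distance matrix d."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    b = -0.5 * j @ (d ** 2) @ j            # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)         # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]       # indices of the k largest
    lam = np.clip(vals[top], 0.0, None)    # drop small negative eigenvalues
    return vecs[:, top] * np.sqrt(lam)


def pairwise_distances(x):
    """Euclidean distance matrix between the rows of x."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


# Hypothetical 3-4-5 triangle standing in for an organ distance table.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
d = pairwise_distances(points)
coords = classical_mds(d, k=2)
print(np.round(pairwise_distances(coords), 6))
```

MDS coordinates are only defined up to rotation, reflection and translation, so configurations from different analyses (for example anatomical versus expression-based) need to be compared through their distances or after alignment.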
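
Item 14 describes a permutation test on organ coordinates, implemented in Fortran 90. The exact test statistic is not given, so the sketch swaps in a standard Mantel test: it correlates the off-diagonal entries of two organ-by-organ distance matrices (for example, anatomical versus expression-based) and builds a null distribution by shuffling organ labels. This is not the authors' code; function names and coordinates are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(d):
    """Off-diagonal entries d[i][j], i < j, of a square distance matrix."""
    n = len(d)
    return [d[i][j] for i in range(n) for j in range(i + 1, n)]


def mantel_test(d1, d2, n_perm=999, seed=1):
    """One-sided Mantel permutation test of positive association between d1 and d2.
    Organ labels of d2 are shuffled and the correlation recomputed each time."""
    rng = random.Random(seed)
    n = len(d1)
    x = upper_triangle(d1)
    observed = pearson(x, upper_triangle(d2))
    idx = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        permuted = [[d2[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= observed:
            hits += 1
    # +1 counts the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical 2-D organ coordinates; d_expr is simply d_anat scaled by 2.
coords = [(0, 0), (1, 0), (0, 2), (3, 1), (2, 3), (4, 4)]
d_anat = [[math.dist(p, q) for q in coords] for p in coords]
d_expr = [[2.0 * v for v in row] for row in d_anat]
r, p = mantel_test(d_anat, d_expr)
print(f"r = {r:.3f}, permutation p = {p:.3f}")
```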


    NOTE: The following files are large.

  16. Data File: Original (complete) data file containing 1,782,189 records (rows) and 5 columns: 1. Gene ID; 2. Tissue; 3. SAGE library; 4. Library size in transcripts; and 5. Gene expression in transcripts per 200,000 (gzip-compressed ASCII text file, 9.7 MB).

  17. Mixtures Table: Results of the mixture-of-distributions analyses (model-based clustering) performed using the EMMIX software to identify DE genes between cancer and normal tissue (ASCII text file, 55,991 lines).
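
The data file in item 16 has five columns per record, but the delimiter is not stated; the sketch assumes tab-separated fields and streams the gzip file without decompressing it to disk. It averages the expression column per gene and tissue, and shows how a value in tags per 200,000 maps back to a raw tag count given the library size in column 4. The function names and in-memory example rows are invented; with the real file one would pass `gzip.open(path, "rt")` for a hypothetical `path`.

```python
import csv
import gzip
import io
from collections import defaultdict


def tissue_means(stream):
    """Average tags per 200,000 for each (gene, tissue) from 5-column rows:
    gene ID, tissue, SAGE library, library size (transcripts), expression."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.reader(stream, delimiter="\t"):  # tab delimiter is an assumption
        gene, tissue, _library, _size, tp2 = row
        sums[(gene, tissue)] += float(tp2)
        counts[(gene, tissue)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def raw_tag_count(tp2, library_size):
    """Undo the per-200,000 scaling: tags observed in a library of the given size."""
    return tp2 * library_size / 200_000


# Invented records for one gene in two liver libraries, gzip-compressed in memory.
blob = gzip.compress(b"g1\tliver\tL1\t50000\t8\ng1\tliver\tL2\t100000\t4\n")
with gzip.open(io.BytesIO(blob), "rt") as fh:
    means = tissue_means(fh)
print(means, raw_tag_count(8, 50_000))
```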
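
Item 17 comes from model-based clustering with EMMIX, which fits mixture models by the EM algorithm. The sketch below is a minimal univariate Gaussian EM, not EMMIX itself and not the settings used in the study: the E-step computes each value's posterior membership (in log space to avoid underflow) and the M-step re-estimates proportions, means and standard deviations. The function name and the two-component simulated data are for illustration only; the same function accepts three starting components, as in item 6.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd**2) at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def em_mixture(data, weights, means, sds, n_iter=100):
    """Fit a univariate Gaussian mixture by EM, starting from the given values."""
    weights, means, sds = list(weights), list(means), list(sds)
    k_range = range(len(weights))
    for _ in range(n_iter):
        # E-step: responsibilities, normalised with the log-sum-exp trick.
        resp = []
        for x in data:
            logs = [math.log(weights[k]) + log_normal_pdf(x, means[k], sds[k]) for k in k_range]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: weighted proportions, means and standard deviations.
        for k in k_range:
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = math.sqrt(max(var, 1e-6))  # floor keeps a component from collapsing
    return weights, means, sds


# Simulated data: two well-separated groups (illustration only).
rng = random.Random(0)
data = [rng.gauss(-4.0, 1.0) for _ in range(300)] + [rng.gauss(4.0, 1.0) for _ in range(300)]
w, m, s = em_mixture(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
print([round(v, 2) for v in w], [round(v, 2) for v in m], [round(v, 2) for v in s])
```

EM only finds a local maximum of the likelihood, so in practice several starting values are tried and the best fit kept; a component that receives no responsibility at all would also need handling before the M-step divides by its total.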