Data Analysis Tools for DNA Microarrays |
Publisher: Chapman & Hall/CRC Press |
Table of Contents |
|
0.0.1 |
Audience and prerequisites |
viii |
|
|
0.0.2 |
Aims and contents |
viii |
|
|
|
|
|
|
|
1 |
Introduction |
3 |
|
|
1.1 |
Bioinformatics - an emerging discipline |
3 |
|
|
1.2 |
The building blocks of genomic information |
5 |
|
|
1.3 |
Expression of genetic information |
9 |
|
|
1.4 |
The need for microarrays |
13 |
|
|
1.5 |
Summary |
14 |
|
|
|
|
|
|
|
2 |
Microarrays |
15 |
|
|
2.1 |
Microarrays - tools for gene expression analysis |
15 |
|
|
2.2 |
Fabrication of microarrays |
16 |
|
|
2.2.1 |
cDNA microarrays |
17 |
|
|
2.2.2 |
In situ synthesis |
17 |
|
|
2.2.3 |
A brief comparison of cDNA and oligonucleotide technologies |
22 |
|
|
2.3 |
Applications of microarrays |
22 |
|
|
2.4 |
Challenges in using microarrays in gene expression studies |
23 |
|
|
2.5 |
Sources of variability |
28 |
|
|
2.6 |
Summary |
32 |
|
|
|
|
|
|
| 3 | Image processing | 33 | |
|
3.1 |
Introduction |
33 | |
|
3.2 |
Basic elements of digital imaging |
33 | |
|
3.3 |
Microarray image processing |
38 | |
|
3.4 |
Image processing of cDNA microarrays |
42 | |
|
3.4.1 |
Spot finding |
42 | |
|
3.4.2 |
Image segmentation |
43 | |
|
3.4.3 |
Quantification |
50 | |
|
3.4.4 |
Spot quality assessment |
53 | |
|
3.5 |
Image processing of Affymetrix arrays |
55 | |
|
3.6 |
Summary |
58 | |
|
|
|
|
|
| 4 | Elements of statistics |
61 |
|
|
4.1 |
Introduction |
61 |
|
|
4.2 |
Some basic terms |
62 |
|
|
4.3 |
Elementary statistics |
64 |
|
|
4.3.1 |
Measures of central tendency: mean, mode and median | 64 | |
|
4.3.2 |
Measures of variability | 68 | |
|
4.3.3 |
Some interesting data manipulations | 70 | |
|
4.3.4 |
Covariance and correlation | 71 | |
|
4.4 |
Probabilities | 77 | |
|
4.4.1 |
Computing with probabilities | 80 | |
|
4.5 |
Bayes' theorem | 84 | |
|
4.6 |
Probability distributions |
86 | |
|
4.6.1 |
Discrete random variables |
87 | |
|
4.6.2 |
Binomial distribution |
89 | |
|
4.6.3 |
Continuous random variables |
94 | |
|
4.6.4 |
The normal distribution | 96 | |
|
4.6.5 |
Using a distribution |
99 | |
|
4.7 |
Central limit theorem |
102 | |
|
4.8 |
Are replicates useful? | 104 | |
|
4.9 |
Summary |
106 | |
|
4.10 |
Solved problems |
106 | |
|
4.11 |
Exercises |
107 | |
5 |
Statistical hypothesis testing |
109 |
|
|
5.1 |
Introduction |
109 | |
|
5.2 |
The framework | 109 | |
|
5.3 |
Hypothesis testing and significance |
112 | |
|
5.3.1 |
One-tail testing |
113 | |
|
5.3.2 |
Two-tail testing |
118 | |
|
5.4 |
I do not believe God does not exist | 120 | |
|
5.5 |
An algorithm for hypothesis testing | 121 | |
|
5.6 |
Errors in hypothesis testing |
122 | |
|
5.7 |
Summary |
126 | |
|
5.8 |
Solved problems |
126 | |
|
6 |
Classical approaches to data analysis |
129 |
|
|
6.1 |
Introduction |
129 | |
|
6.2 |
Tests involving a single sample |
130 | |
|
6.2.1 |
Tests involving the mean. The t distribution |
130 | |
|
6.2.2 |
Choosing the number of replicates |
134 | |
|
6.2.3 |
Tests involving the variance (σ²). The chi-square distribution |
136 | |
|
6.2.4 |
Confidence intervals for standard deviation |
139 | |
|
6.3 |
Tests involving two samples |
140 | |
|
6.3.1 |
Comparing variances. The F distribution |
140 | |
|
6.3.2 |
Comparing means |
144 | |
|
6.3.3 |
Confidence intervals for the difference of means µ₁ - µ₂ |
149 | |
|
6.4 |
Summary |
150 | |
|
6.5 |
Exercises |
153 | |
7 |
Analysis of Variance - ANOVA |
155 | |
|
7.1 |
Introduction |
155 | |
|
7.1.1 |
Problem definition and model assumptions |
155 | |
|
7.1.2 |
The "dot" notation | 158 | |
|
7.2 |
One-way ANOVA |
159 | |
|
7.2.1 |
One-way Model I ANOVA |
159 | |
|
7.2.2 |
One-way Model II ANOVA |
166 | |
|
7.3 |
Two-way ANOVA |
169 | |
|
7.3.1 |
Randomized complete block design ANOVA |
170 | |
|
7.3.2 |
Comparison between one-way ANOVA and randomized block design ANOVA |
172 | |
|
7.3.3 |
Some examples | 174 | |
|
7.3.4 |
Factorial design two-way ANOVA |
178 | |
|
7.3.5 |
Data analysis plan for factorial design ANOVA | 182 | |
|
7.3.6 |
Reference formulae for factorial design ANOVA | 183 | |
|
7.4 |
Quality control |
183 | |
|
7.5 |
Summary |
186 | |
|
7.6 |
Exercises |
187 | |
|
8 |
Experiment design |
189 |
|
|
8.1 |
The concept of experiment design | 189 | |
|
8.2 |
Comparing varieties |
190 | |
|
8.3 |
Improving the production process |
192 | |
|
8.4 |
Principles of experimental design |
193 | |
|
8.4.1 |
Replication |
194 | |
|
8.4.2 |
Randomization |
196 | |
|
8.4.3 |
Blocking |
197 | |
|
8.5 |
Guidelines for experimental design |
198 | |
|
8.6 |
A short synthesis of statistical experiment designs | 200 | |
|
8.6.1 |
The fixed effect design | 200 | |
|
8.6.2 |
Randomized block design |
201 | |
|
8.6.3 |
Balanced incomplete block design |
201 | |
|
8.6.4 |
Latin square design |
202 | |
|
8.6.5 |
Factorial design |
203 | |
|
8.6.6 |
Confounding in the factorial design |
204 | |
|
8.7 |
Some microarray specific experiment designs | 205 | |
|
8.7.1 |
The Jackson Lab approach | 206 | |
|
8.7.2 |
Ratios and flip-dye experiments |
208 | |
|
8.7.3 |
Reference design vs. loop design |
210 | |
|
8.8 |
Summary |
213 | |
9 |
Multiple comparisons |
215 |
|
|
9.1 |
Introduction |
215 | |
|
9.2 |
The problem of multiple comparisons |
215 | |
|
9.3 |
A more precise argument |
220 | |
|
9.4 |
Corrections for multiple comparisons |
222 | |
|
9.4.1 |
The Sidak correction | 222 | |
|
9.4.2 |
The Bonferroni correction | 223 | |
|
9.4.3 |
Holm's step-wise correction |
224 | |
|
9.4.4 |
The false discovery rate (FDR) | 225 | |
|
9.4.5 |
Permutation correction |
225 | |
|
9.4.6 |
Significance analysis of microarrays (SAM) |
227 | |
|
9.4.7 |
On permutations based methods | 228 | |
|
9.5 |
Summary |
229 | |
|
10 |
Analysis and visualization tools |
231 |
|
|
10.1 |
Introduction |
231 | |
|
10.2 |
Box plots | 231 | |
|
10.3 |
Gene pies | 232 | |
|
10.4 |
Scatter plots |
233 | |
|
10.4.1 |
Scatter plot limitations |
237 | |
|
10.4.2 |
Scatter plot summary |
238 | |
|
10.5 |
Histograms |
239 | |
|
10.5.1 |
Histograms summary |
244 | |
|
10.6 |
Time series | 245 | |
|
10.7 |
Principal component analysis (PCA) |
246 | |
|
10.7.1 |
PCA limitations | 257 | |
|
10.7.2 |
PCA summary | 257 | |
|
10.8 |
Independent component analysis (ICA) |
259 | |
|
10.9 |
Summary |
260 | |
|
11 |
Cluster analysis |
263 |
|
|
11.1 |
Introduction |
263 | |
|
11.2 |
Distance metric |
264 | |
|
11.2.1 |
Euclidean distance |
265 | |
|
11.2.2 |
Manhattan distance |
266 | |
|
11.2.3 |
Chebychev distance |
268 | |
|
11.2.4 |
Angle between vectors |
268 | |
|
11.2.5 |
Correlation distance |
269 | |
|
11.2.6 |
Squared Euclidean distance |
270 | |
|
11.2.7 |
Standardized Euclidean distance |
270 | |
|
11.2.8 |
Mahalanobis distance |
272 | |
|
11.2.9 |
Minkowski distance |
273 | |
|
11.2.10 |
When to use what distance | 273 | |
|
11.2.11 |
A comparison of various distances | 275 | |
|
11.3 |
Clustering algorithms |
276 | |
|
11.3.1 |
k-means clustering |
281 | |
|
11.3.2 |
Hierarchical clustering |
288 | |
|
11.3.3 |
Kohonen maps or self-organizing feature maps (SOFM) |
297 | |
|
11.4 |
Summary |
305 | |
12 |
Data pre-processing and normalization |
309 |
|
|
12.1 |
Introduction |
309 | |
|
12.2 |
General pre-processing techniques |
309 | |
|
12.2.1 |
The log transform | 309 | |
|
12.2.2 |
Combining replicates and eliminating outliers |
311 | |
|
12.2.3 |
Array normalization |
313 | |
|
12.3 |
Normalization issues specific to cDNA data |
318 | |
|
12.3.1 |
Background correction |
318 | |
|
12.3.2 |
Other spot level pre-processing |
320 | |
|
12.3.3 |
Color normalization |
320 | |
|
12.4 |
Normalization issues specific to Affymetrix data |
329 | |
|
12.4.1 |
Background correction |
329 | |
|
12.4.2 |
Signal calculation |
330 | |
|
12.4.3 |
Detection calls |
334 | |
|
12.4.4 |
Relative expression values |
335 | |
|
12.5 |
Other approaches to the normalization of Affymetrix data |
336 | |
|
12.6 |
Useful pre-processing and normalization sequences |
336 | |
|
12.7 |
Summary |
338 | |
|
12.8 |
Appendix |
339 | |
|
12.8.1 |
A short primer on logarithms | 339 | |
|
13 |
Methods for selecting differentially regulated genes |
341 |
|
|
13.1 |
Introduction |
341 | |
|
13.2 |
Criteria |
342 | |
|
13.3 |
Fold change | 343 | |
|
13.3.1 |
Description |
343 | |
|
13.3.2 |
Characteristics |
345 | |
|
13.4 |
Unusual ratio |
347 | |
|
13.4.1 |
Description |
347 | |
|
13.4.2 |
Characteristics |
348 | |
|
13.5 |
Hypothesis testing, corrections for multiple comparisons and resampling |
349 | |
|
13.5.1 |
Description |
349 | |
|
13.5.2 |
Characteristics |
350 | |
|
13.6 |
ANOVA |
351 | |
|
13.6.1 |
Description |
351 | |
|
13.6.2 |
Characteristics |
351 | |
|
13.7 |
Noise sampling |
352 | |
|
13.7.1 |
Description |
352 | |
|
13.7.2 |
Characteristics |
353 | |
|
13.8 |
Model based maximum likelihood estimation methods |
354 | |
|
13.8.1 |
Description |
354 | |
|
13.8.2 |
Characteristics |
357 | |
|
13.9 |
Affymetrix comparison calls |
358 | |
|
13.10 |
Other methods |
359 | |
|
13.11 |
Summary |
360 | |
|
13.12 |
Appendix |
361 | |
|
13.12.1 |
A comparison of the noise sampling method with the full blown ANOVA approach |
361 | |
|
14 |
Functional analysis and biological interpretation of microarray data |
363 |
|
|
14.1 |
Introduction |
363 | |
|
14.2 |
The Gene Ontology |
364 | |
|
14.2.1 |
The need for an ontology | 364 | |
|
14.2.2 |
What is the Gene Ontology (GO)? | 364 | |
|
14.2.3 |
What does GO contain? | 365 | |
|
14.2.4 |
Access to GO |
366 | |
|
14.3 |
Other related resources |
367 | |
|
14.4 |
Translating lists of differentially regulated genes into biological knowledge |
367 | |
|
14.4.1 |
Statistical approaches |
369 | |
|
14.5 |
Onto-Express |
372 | |
|
14.5.1 |
Implementation |
372 | |
|
14.5.2 |
Graphical input interface description |
373 | |
|
14.5.3 |
Some real data analyses | 376 | |
|
14.5.4 |
Interpretation of the functional analysis results |
381 | |
|
14.6 |
Summary |
382 | |
|
15 |
Focused microarrays - comparison and selection |
383 |
|
|
15.1 |
Introduction |
383 | |
|
15.2 |
Criteria for array selection |
385 | |
|
15.3 |
Onto-Compare |
385 | |
|
15.4 |
Some comparisons | 387 | |
|
15.5 |
Summary |
391 | |
|
16 |
Commercial applications |
393 |
|
|
16.1 |
Introduction |
393 | |
|
16.2 |
Significance testing among groups using GeneSight |
395 | |
|
16.2.1 |
Problem description |
395 | |
|
16.2.2 |
Experiment design |
396 | |
|
16.2.3 |
Data analysis |
396 | |
|
16.2.4 |
Conclusion |
407 | |
|
16.3 |
Statistical analysis of microarray data using S-PLUS and |
409 | |
|
16.3.1 |
Experiment design |
410 | |
|
16.3.2 |
Data preparation and exploratory data analysis |
410 | |
|
16.3.3 |
Differential expression analysis |
410 | |
|
16.3.4 |
Clustering and prediction |
411 | |
|
16.3.5 |
Analysis summaries, visualization and annotation of results |
411 | |
|
16.3.6 |
S+ArrayAnalyzer example: Swirl Zebrafish experiment |
412 | |
|
16.3.7 |
Summary |
415 | |
|
16.4 |
SAS software for genomics |
416 | |
|
16.4.1 |
SAS research data management |
416 | |
|
16.4.2 |
SAS microarray solution |
418 | |
|
16.5 |
Spotfire's DecisionSite |
421 | |
|
16.5.1 |
Introduction | 421 | |
|
16.5.2 |
Experiment description |
421 | |
|
16.5.3 |
Microarray data access |
422 | |
|
16.5.4 |
Data transformation |
423 | |
|
16.5.5 |
Filtering and visualizing gene expression data |
424 | |
|
16.5.6 |
Finding gene expression patterns |
427 | |
|
16.5.7 |
Using clustering and data reduction techniques to isolate |
428 | |
|
16.5.8 |
Comparing sample groups |
431 | |
|
16.5.9 |
Using Portfolio Lists to isolate significant genes |
432 | |
|
16.5.10 |
Summary |
434 | |
|
16.6 |
Summary | 436 | |
|
17 |
The road ahead |
437 |
|
|
17.1 |
What next? | 437 | |
|
17.2 |
Molecular diagnosis |
437 | |
|
17.3 |
Gene regulatory networks | 439 | |
|
17.4 |
Conclusions |
441 | |
|
References |
443 |