Contents - Data Analysis Tools for DNA Microarrays



Data Analysis Tools for DNA Microarrays

Sorin Draghici

Publisher: Chapman & Hall/CRC Press
ISBN: 1584883154

Table of Contents

0.0.1 Audience and prerequisites viii
0.0.2 Aims and contents viii

1 Introduction 3
1.1 Bioinformatics - an emerging discipline 3
1.2 The building blocks of genomic information 5
1.3 Expression of genetic information 9
1.4 The need for microarrays 13
1.5 Summary 14

2 Microarrays 15
2.1 Microarrays - tools for gene expression analysis 15
2.2 Fabrication of microarrays 16
2.2.1 cDNA microarrays 17
2.2.2 In situ synthesis 17
2.2.3 A brief comparison of cDNA and oligonucleotide technologies 22
2.3 Applications of microarrays 22
2.4 Challenges in using microarrays in gene expression studies 23
2.5 Sources of variability 28
2.6 Summary 32

3 Image processing 33
3.1 Introduction 33
3.2 Basic elements of digital imaging 33
3.3 Microarray image processing 38
3.4 Image processing of cDNA microarrays 42
3.4.1 Spot finding 42
3.4.2 Image segmentation 43
3.4.3 Quantification 50
3.4.4 Spot quality assessment 53
3.5 Image processing of Affymetrix arrays 55
3.6 Summary 58

4 Elements of statistics 61
4.1 Introduction 61
4.2 Some basic terms 62
4.3 Elementary statistics 64
4.3.1 Measures of central tendency: mean, mode and median 64
4.3.2 Measures of variability 68
4.3.3 Some interesting data manipulations 70
4.3.4 Covariance and correlation 71
4.4 Probabilities 77
4.4.1 Computing with probabilities 80
4.5 Bayes' theorem 84
4.6 Probability distributions 86
4.6.1 Discrete random variables 87
4.6.2 Binomial distribution 89
4.6.3 Continuous random variables 94
4.6.4 The normal distribution 96
4.6.5 Using a distribution 99
4.7 Central limit theorem 102
4.8 Are replicates useful? 104
4.9 Summary 106
4.10 Solved problems 106
4.11 Exercises 107

5 Statistical hypothesis testing 109
5.1 Introduction 109
5.2 The framework 109
5.3 Hypothesis testing and significance 112
5.3.1 One-tail testing 113
5.3.2 Two-tail testing 118
5.4 I do not believe God does not exist 120
5.5 An algorithm for hypothesis testing 121
5.6 Errors in hypothesis testing 122
5.7 Summary 126
5.8 Solved problems 126

6 Classical approaches to data analysis 129
6.1 Introduction 129
6.2 Tests involving a single sample 130
6.2.1 Tests involving the mean. The t distribution 130
6.2.2 Choosing the number of replicates 134
6.2.3 Tests involving the variance (σ²). The chi-square distribution 136
6.2.4 Confidence intervals for standard deviation 139
6.3 Tests involving two samples 140
6.3.1 Comparing variances. The F distribution 140
6.3.2 Comparing means 144
6.3.3 Confidence intervals for the difference of means µ₁ - µ₂ 149
6.4 Summary 150
6.5 Exercises 153

7 Analysis of Variance - ANOVA 155
7.1 Introduction 155
7.1.1 Problem definition and model assumptions 155
7.1.2 The "dot" notation 158
7.2 One-way ANOVA 159
7.2.1 One-way Model I ANOVA 159
7.2.2 One-way Model II ANOVA 166
7.3 Two-way ANOVA 169
7.3.1 Randomized complete block design ANOVA 170
7.3.2 Comparison between one-way ANOVA and randomized block design ANOVA 172
7.3.3 Some examples 174
7.3.4 Factorial design two-way ANOVA 178
7.3.5 Data analysis plan for factorial design ANOVA 182
7.3.6 Reference formulae for factorial design ANOVA 183
7.4 Quality control 183
7.5 Summary 186
7.6 Exercises 187

8 Experiment design 189
8.1 The concept of experiment design 189
8.2 Comparing varieties 190
8.3 Improving the production process 192
8.4 Principles of experimental design 193
8.4.1 Replication 194
8.4.2 Randomization 196
8.4.3 Blocking 197
8.5 Guidelines for experimental design 198
8.6 A short synthesis of statistical experiment designs 200
8.6.1 The fixed effect design 200
8.6.2 Randomized block design 201
8.6.3 Balanced incomplete block design 201
8.6.4 Latin square design 202
8.6.5 Factorial design 203
8.6.6 Confounding in the factorial design 204
8.7 Some microarray specific experiment designs 205
8.7.1 The Jackson Lab approach 206
8.7.2 Ratios and flip-dye experiments 208
8.7.3 Reference design vs. loop design 210
8.8 Summary 213

9 Multiple comparisons 215
9.1 Introduction 215
9.2 The problem of multiple comparisons 215
9.3 A more precise argument 220
9.4 Corrections for multiple comparisons 222
9.4.1 The Sidak correction 222
9.4.2 The Bonferroni correction 223
9.4.3 Holm's step-wise correction 224
9.4.4 The false discovery rate (FDR) 225
9.4.5 Permutation correction 225
9.4.6 Significance analysis of microarrays (SAM) 227
9.4.7 On permutations based methods 228
9.5 Summary 229

10 Analysis and visualization tools 231
10.1 Introduction 231
10.2 Box plots 231
10.3 Gene pies 232
10.4 Scatter plots 233
10.4.1 Scatter plot limitations 237
10.4.2 Scatter plot summary 238
10.5 Histograms 239
10.5.1 Histograms summary 244
10.6 Time series 245
10.7 Principal component analysis (PCA) 246
10.7.1 PCA limitations 257
10.7.2 PCA summary 257
10.8 Independent component analysis (ICA) 259
10.9 Summary 260

11 Cluster analysis 263
11.1 Introduction 263
11.2 Distance metric 264
11.2.1 Euclidean distance 265
11.2.2 Manhattan distance 266
11.2.3 Chebychev distance 268
11.2.4 Angle between vectors 268
11.2.5 Correlation distance 269
11.2.6 Squared Euclidean distance 270
11.2.7 Standardized Euclidean distance 270
11.2.8 Mahalanobis distance 272
11.2.9 Minkowski distance 273
11.2.10 When to use what distance 273
11.2.11 A comparison of various distances 275
11.3 Clustering algorithms 276
11.3.1 k-means clustering 281
11.3.2 Hierarchical clustering 288
11.3.3 Kohonen maps or self-organizing feature maps (SOFM) 297
11.4 Summary 305

12 Data pre-processing and normalization 309
12.1 Introduction 309
12.2 General pre-processing techniques 309
12.2.1 The log transform 309
12.2.2 Combining replicates and eliminating outliers 311
12.2.3 Array normalization 313
12.3 Normalization issues specific to cDNA data 318
12.3.1 Background correction 318
12.3.2 Other spot level pre-processing 320
12.3.3 Color normalization 320
12.4 Normalization issues specific to Affymetrix data 329
12.4.1 Background correction 329
12.4.2 Signal calculation 330
12.4.3 Detection calls 334
12.4.4 Relative expression values 335
12.5 Other approaches to the normalization of Affymetrix data 336
12.6 Useful pre-processing and normalization sequences 336
12.7 Summary 338
12.8 Appendix 339
12.8.1 A short primer on logarithms 339

13 Methods for selecting differentially regulated genes 341
13.1 Introduction 341
13.2 Criteria 342
13.3 Fold change 343
13.3.1 Description 343
13.3.2 Characteristics 345
13.4 Unusual ratio 347
13.4.1 Description 347
13.4.2 Characteristics 348
13.5 Hypothesis testing, corrections for multiple comparisons and resampling 349
13.5.1 Description 349
13.5.2 Characteristics 350
13.6 ANOVA 351
13.6.1 Description 351
13.6.2 Characteristics 351
13.7 Noise sampling 352
13.7.1 Description 352
13.7.2 Characteristics 353
13.8 Model based maximum likelihood estimation methods 354
13.8.1 Description 354
13.8.2 Characteristics 357
13.9 Affymetrix comparison calls 358
13.10 Other methods 359
13.11 Summary 360
13.12 Appendix 361
13.12.1 A comparison of the noise sampling method with the full blown ANOVA approach 361

14 Functional analysis and biological interpretation of microarray data 363
14.1 Introduction 363
14.2 The Gene Ontology 364
14.2.1 The need for an ontology 364
14.2.2 What is the Gene Ontology (GO)? 364
14.2.3 What does GO contain? 365
14.2.4 Access to GO 366
14.3 Other related resources 367
14.4 Translating lists of differentially regulated genes into biological knowledge 367
14.4.1 Statistical approaches 369
14.5 Onto-Express 372
14.5.1 Implementation 372
14.5.2 Graphical input interface description 373
14.5.3 Some real data analyses 376
14.5.4 Interpretation of the functional analysis results 381
14.6 Summary 382

15 Focused microarrays - comparison and selection 383
15.1 Introduction 383
15.2 Criteria for array selection 385
15.3 Onto-Compare 385
15.4 Some comparisons 387
15.5 Summary 391

16 Commercial applications 393
16.1 Introduction 393
16.2 Significance testing among groups using GeneSight 395
16.2.1 Problem description 395
16.2.2 Experiment design 396
16.2.3 Data analysis 396
16.2.4 Conclusion 407
16.3 Statistical analysis of microarray data using S-PLUS and Insightful ArrayAnalyzer 409
16.3.1 Experiment design 410
16.3.2 Data preparation and exploratory data analysis 410
16.3.3 Differential expression analysis 410
16.3.4 Clustering and prediction 411
16.3.5 Analysis summaries, visualization and annotation of results 411
16.3.6 S+ArrayAnalyzer example: Swirl Zebrafish experiment 412
16.3.7 Summary 415
16.4 SAS software for genomics 416
16.4.1 SAS research data management 416
16.4.2 SAS microarray solution 418
16.5 Spotfire's DecisionSite 421
16.5.1 Introduction 421
16.5.2 Experiment description 421
16.5.3 Microarray data access 422
16.5.4 Data transformation 423
16.5.5 Filtering and visualizing gene expression data 424
16.5.6 Finding gene expression patterns 427
16.5.7 Using clustering and data reduction techniques to isolate group of genes 428
16.5.8 Comparing sample groups 431
16.5.9 Using Portfolio Lists to isolate significant genes 432
16.5.10 Summary 434
16.6 Summary 436

17 The road ahead 437
17.1 What next? 437
17.2 Molecular diagnosis 437
17.3 Gene regulatory networks 439
17.4 Conclusions 441

References 443

 


Companion CD Contents
(available with the book)

1. GeneSight (BioDiscovery) - software suite for microarray data analysis
2. ImaGene (BioDiscovery) - software for image processing of microarray slides
3. S-Plus (Insightful) - software suite for statistical analysis
4. ArrayAnalyzer (Insightful) - software suite for microarray data analysis
