Stat 233: Statistical Methods in Biomedical Imaging
Wavelet-Based Statistical Analysis of fMRI Data
Ivo Dinov, UCLA (http://www.stat.ucla.edu/~dinov)

[fMRI simulation movie]
Activation Imaging using BOLD
• Resting state versus active state, e.g., finger tapping, word recognition.
• The whole brain is scanned in ~3 seconds using a high-speed imaging technique (EPI).
• Perform an analysis to detect regions that show a signal increase in response to the stimulus.

Problems with BOLD
• How localized is the BOLD response to the site of neuronal activity? Is the signal from draining veins rather than the tissue itself?
• How is the signal change coupled to neuronal activity?
• Do changes in the timing of BOLD responses reliably tell us about changes in the timing of neural activity?
fMRI Data Analysis Tools
• AFNI: http://afni.nimh.nih.gov/afni/
• SPM: http://www.fil.ion.ucl.ac.uk/spm/
• Brain Voyager: http://www.brainvoyager.de/
• CARET: http://brainmap.wustl.edu/resources/caretnew.html
• FSL: http://www.fmrib.ox.ac.uk/fsl/
• SCIrun: http://software.sci.utah.edu/scirun.html
• McGill BIC fMRI toolbox: www.bic.mni.mcgill.ca/usr/local/matlab5/toolbox/fmri/
• LONI Pipeline: http://www.loni.ucla.edu

Hypotheses vs. Data-Driven Approaches

Hypothesis-driven
• Examples: t-tests, correlations, the general linear model (GLM).
• An a priori model of activation is suggested, and the data are checked to see how closely they match components of the model.
• The most commonly used approach (e.g., SPM).

Data-driven: Independent Component Analysis (ICA)
• No prior hypotheses are necessary.
• Multivariate techniques determine the patterns in the data that account for the most variance across all voxels.
• Can be used to validate a model (see whether the math comes up with the components you would have predicted).
• Can be inspected to see whether things are happening in your data that you did not predict.
• Can be used to identify confounds (e.g., head motion).
• Requires a way to organize the many possible components.
• New and upcoming.
Two Approaches: Global vs. Regional

A. Region-of-interest (ROI) approach
1. Use one or more localizer runs to find a region (e.g., show moving rings to find the motion-sensitive middle temporal area, MT).
2. Extract time-course information from that region in separate, independent runs.
3. See whether the trends in that region are statistically significant.
The runs used to define the area are independent of those used to test the hypothesis.
Example study (Tootell et al., 1995, motion aftereffect): localize "motion area" MT in a run comparing moving vs. stationary rings, then extract time courses from MT in subsequent runs while subjects see illusory motion (the motion aftereffect).

B. Whole-volume statistical approach
1. Make predictions about what differences you should see if your hypotheses are correct.
2. Decide on statistical measures to test for the predicted differences (e.g., t-tests, correlations, GLMs).
3. Determine an appropriate statistical threshold.
4. See whether the statistical estimates are significant.

Whole-Brain Stats
Statistics available:
1. T-test
2. Correlation
3. Frequency-based (Fourier/wavelet/fractal) modeling
4. General linear model: an overarching statistical model that lets you perform many types of statistical analyses (including correlation/regression and ANOVA)
Comparing the two approaches

Region-of-interest (ROI) analysis
• More statistical power, because you do not have to correct for the number of comparisons.
• Hypothesis-driven.
• The ROI is not smeared by intersubject averaging.
• Easy to analyze and interpret.
• Neglects other areas that may play a fundamental role.
• Popular in North America.

Whole-brain analysis
• Requires no prior hypotheses about the areas involved.
• Includes the entire brain.
• Can lose spatial resolution with intersubject averaging.
• Can produce meaningless lists of areas that are difficult to interpret.
• Depends strongly on the statistics and threshold selected.
• Popular in Europe.

NOTE: Although different experimenters tend to prefer one method over the other, they are NOT mutually exclusive: you can check the ROIs you predicted and then check the data for other areas.

Why do we need statistics?
• MR signal intensities are random: there is variation caused by the scanner (magnet, coil, time) and variation in neurophysiology (area, task, subject).
• Rest-state scans are used against activation tasks in a subtraction-paradigm setting to correct for some of these effects within the same run.
• Statistical analyses help us confirm whether "eyeball" tests of significance, based on visual thresholding, are real/correct.
• Because we make so many comparisons (~10^6), we need a way to compensate for the increase in Type I error.
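To make the last point concrete, here is a minimal sketch (plain Python; the voxel count and alpha levels are illustrative, not prescriptions from the slides) of the expected number of false-positive voxels under an uncorrected threshold and the corresponding Bonferroni-corrected per-voxel alpha:

    # Expected false positives under an uncorrected voxel-wise threshold,
    # and a Bonferroni-corrected per-voxel alpha (example numbers only).
    n_voxels = 64 * 64 * 16          # e.g., a 64 x 64 x 16 acquisition grid
    alpha_uncorrected = 0.001        # per-voxel threshold

    expected_false_positives = n_voxels * alpha_uncorrected   # ~66 voxels by chance
    alpha_bonferroni = 0.05 / n_voxels                        # corrected per-voxel threshold

    print(f"~{expected_false_positives:.0f} voxels expected by chance at p < {alpha_uncorrected}")
    print(f"Bonferroni-corrected per-voxel alpha: {alpha_bonferroni:.2e}")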
Paradigm: Event-Related Design (Buckner et al., J. Cog. Neurosci., 2000)
• An event-related design to assess between-population differences in the amplitude and variance of the hemodynamic response to a visual stimulus (14 young, 14 nondemented old, and 13 AD subjects).
• 4 runs per subject; 128 volumes per run; 2 (randomly chosen) conditions per run (always 7 + 8 trials); 1 trial = 21.44 sec (8 volumes); 60 trials per subject in total.
• 1-trial condition: a 1.5-second, 8 Hz black/white checkerboard flicker.
• 2-trial condition: two visual stimuli presented 5.36 sec apart.
• 15 randomly placed trials per run.
[Figure: trial-onset timelines for Runs 1-4, 128 volumes each, tick marks every 8 volumes.]

Voxel-wise setup: look for differences between 1-Trial and 2-Trial activation. For a given voxel, measure the average MR signal and SD over the volumes in which 1-Trial stimuli were presented (7-8 epochs x 8 volumes/epoch = 56-64 volumes), and do the same for the volumes in which 2-Trial stimuli were presented (56-64 volumes). To look for 1-Trial minus 2-Trial activation, look at the negative tail of the comparison.
fMRI of Young, Nondemented and Demented Adults
• fMRI signal acquisition: 16 coronal slices, 64 x 64 in-plane voxels, 3 x 3 x 5 mm3 voxels, volume time = 2 sec.
• The stimulus required a right-hand index-finger click.
[Figure: example 1-Trial and 2-Trial fMRI signal time courses. Source: Tootell et al.]

One Approach: Voxel-Based T-Tests
• Statistically significant mean difference? Calculate the t0 value and look up the p value for that number of degrees of freedom (df >= 56 x 2 = 112). For example, with ~112 df, t0 > 1.98 gives p < 0.05 and t0 > 3.39 gives p < 0.001.
• Repeat this process 65,536 more times (64 x 64 x 16), once for each voxel.
• If p = 0.001 and there are 65,536 voxels, then 65,536 x 0.001 is about 66 voxels that could be significant purely by chance.
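To make the voxel-wise procedure concrete, here is a minimal sketch (not the study's actual code; the array shapes, simulated data, and variable names are illustrative) that runs an independent-samples t-test at every voxel of a 64 x 64 x 16 volume and counts how many voxels pass an uncorrected p < 0.001 threshold:

    import numpy as np
    from scipy import stats

    # Illustrative data: one value per voxel for each 1-Trial and 2-Trial volume
    # (shape: n_volumes x 64 x 64 x 16); here the "signal" is pure noise.
    rng = np.random.default_rng(0)
    one_trial = rng.normal(size=(56, 64, 64, 16))
    two_trial = rng.normal(size=(56, 64, 64, 16))

    # Independent-samples t-test at every voxel (vectorized over the spatial axes).
    t_map, p_map = stats.ttest_ind(one_trial, two_trial, axis=0)

    # Uncorrected thresholding: with 65,536 voxels and p < 0.001 we expect
    # roughly 66 voxels to pass purely by chance.
    n_sig = int(np.sum(p_map < 0.001))
    print(f"{n_sig} voxels below p < 0.001 (about 66 expected under the null)")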
T-Test Maps
• For each voxel in the brain, we can now color-code a map based on the computed t and p values.
• We can do this for the positive tail (1-Trial minus 2-Trial): orange = low significance, yellow = high significance. (Schmutz or ESP voxels?)
• And we can also do this for the negative tail (2-Trial minus 1-Trial): blue = low significance, green = high significance.

Another Statistical Analysis: Basic Correlation
• Find voxels whose time course is correlated with a reference function.
• The hemodynamic response function (HRF) can be incorporated to predict the time course more accurately.
• For each voxel, plot the 56 data points (value of the fMRI signal) against the 56 values of the reference model (value of the predictor) and calculate r.
• Remember that r^2 is the proportion of variance accounted for by the predictor; e.g., if r = 0.7 then r^2 = 0.49, i.e., 49% of the variance.
Correlation Analysis
For each voxel:
• Find the correlation between the predictor and the MR signal.
• Extract the correlation (r value) and find the corresponding p value (Pearson/Spearman).
• Determine whether it is statistically significant.
This is similar in spirit to a t-test.

Predictor: The Hemodynamic Response Function (HRF)
• Time to rise: the signal begins to rise soon after the stimulus begins.
• Time to peak: the signal peaks 4-6 sec after the stimulus begins.
• Post-stimulus undershoot: the signal is suppressed after stimulation ends.
• Initial dip: more focal and potentially a better measure, but somewhat elusive so far; not everyone can find it.
• Percent signal change = (point - baseline)/baseline, usually 0.5-3%.
[Figure: percent change from baseline vs. time from stimulus onset (sec), with the stimulus-on period marked.]

To find a 1-Trial-responsive area, we can correlate the convolved predictor with each voxel's time course: plot the 56 model points (m*g)(x) against the 56 data points f(x) and compute r, where m(x) is the stimulus model, g(x) is the HRF kernel, and f(x) is the data.
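A minimal sketch of the per-voxel correlation test described above (illustrative data and names; it assumes scipy is available for the Pearson correlation and its p value):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # Illustrative boxcar predictor: 0 at rest, 1 while the stimulus is on
    # (56 time points, matching the 56 data points per condition above).
    predictor = np.zeros(56)
    predictor[8:16] = 1.0

    # Illustrative voxel time course: a small stimulus-locked response plus noise.
    voxel_ts = 0.02 * predictor + rng.normal(scale=0.01, size=56)

    # Pearson correlation and its p value for this voxel.
    r, p = stats.pearsonr(predictor, voxel_ts)
    print(f"r = {r:.2f}, r^2 = {r**2:.2f} (variance explained), p = {p:.3g}")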
Correlations: Negative Tail
• Look for anything that shows a reduced response during the 1-Trial condition (compared with both the 2-Trial condition and fixation/rest).

Correlation: Incorporating the HRF
• We can model the expected curve of the data by convolving our predictor m(x) with the hemodynamic response function kernel g(x): the convolved model (m*g)(x) is then correlated with the data f(x) (value of the fMRI signal, in percent change from baseline).
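A sketch of this convolution step (the double-gamma HRF below is a common approximation used here for illustration, not necessarily the exact function behind these slides; names and parameters are assumptions):

    import numpy as np
    from scipy.stats import gamma

    TR = 2.0                              # seconds per volume, as in the example acquisition
    t = np.arange(0, 32, TR)              # HRF support in seconds

    # Double-gamma HRF: a peak around 5-6 s minus a small, late undershoot,
    # approximating the response shape described above.
    hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 12)
    hrf /= hrf.sum()

    # Boxcar predictor m(x): 1 while the stimulus is on, 0 otherwise.
    m = np.zeros(128)
    m[8:16] = 1.0
    m[72:80] = 1.0

    # Expected-signal model (m * g)(x): convolve the predictor with the HRF kernel.
    model = np.convolve(m, hrf)[: len(m)]

    # This model can then be correlated with each voxel's time course f(x), as above.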
Two Main fMRI Designs

Block design
• Compare long periods (e.g., 16 sec) of one condition with long periods of another.
• The traditional approach.
• The most statistically powerful approach.
• Less dependent on how well you model the HRF.
• Habituation effects?!

Event-related design
• Compare brief trials (~1 sec) of one condition with brief trials of another.
• A newer approach (1997+).
• Less statistically powerful, but has some advantages.
• Trials can be well spaced to allow the MR signal to return to baseline between trials (e.g., 21+ sec apart) or closely spaced (e.g., every 2 sec).
Problems with T-Tests and Correlation Analyses
1) How do we evaluate runs with different orders? We could average our 4 runs (60 trials) by 1-Trial or 2-Trial stimuli and then do statistics comparing the functional differences between the two paradigms, but there is no way to collapse between orders.
2) If we test more subjects, how can we evaluate all subjects together? As with the single-subject runs, we could average all the subjects together (after morphing them into an atlas space), but that still means we have to run all of them in the same order.
3) We can get nice hemodynamic predictors for 1-Trial or 2-Trial stimuli, but how can we compare them accurately? If a single predictor is significant, we won't know whether it is because 1-Trial > 2-Trial or because 1-Trial > Fixation.

A solution: use general linear modeling.

General Linear Model for fMRI
• Parse out the variance in each voxel's time course into the contributions of six predictors plus residual noise (what the predictors cannot account for):
  fMRI signal = β1 x X1 + β2 x X2 + β3 x X3 + β4 x X4 + β5 x X5 + β6 x X6 + residuals,
  where the design-matrix columns X1, ..., X6 encode the predictors, for example: Stimulus, Trial, Subject, Run, Group (AD/Young), and ROI.
(Design-matrix figure adapted from Brain Voyager course slides.)
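A minimal sketch of fitting such a voxel-wise GLM by ordinary least squares (the design matrix and data here are illustrative stand-ins, not the actual study's predictors):

    import numpy as np

    rng = np.random.default_rng(2)
    n_timepoints = 128

    # Illustrative design matrix X: an intercept plus stand-ins for five more
    # predictor columns (e.g., convolved 1-Trial/2-Trial regressors, run, group, ...).
    X = np.column_stack([
        np.ones(n_timepoints),
        rng.normal(size=(n_timepoints, 5)),
    ])

    # Illustrative voxel time course y generated from known betas plus noise.
    true_betas = np.array([100.0, 0.5, 0.0, 0.2, 0.0, 0.0])
    y = X @ true_betas + rng.normal(scale=0.3, size=n_timepoints)

    # Ordinary least squares for the model  y = X beta + residuals.
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ betas
    print("estimated betas:", np.round(betas, 2))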
Completeness and Efficiency of Signal Representation: Wavelet Functions in 1D

Features of Natural Images
• Contain both man-made and natural features.
• Mostly no global multi-scale self-similarity.
• Mostly no simple and universal underlying physical or biological process that generates the patterns in a general image.
(Ref.: www.math.umn.edu/~jhshen)

SOCR Wavelet/Fourier Demo: http://socr.stat.ucla.edu/
Efficiency of representation
• David Field (Cornell University, vision psychologist): "To discriminate between objects, an effective transform (representation) encodes each image using the smallest possible number of neurons, chosen from a large pool."

Fourier Representation
• Claim: harmonic waves make bad vision neurons, because a typical Fourier neuron is φ(x) = exp(iax). To process even a simple bright spot δ(x) in the visual field, all such neurons have to respond to it (!), since for every frequency a
  ⟨δ, φ⟩ = ∫ δ(x) e^(-iax) dx = 1.
Asking our own "headtop"…
• Psychologists have shown that visual neurons are spatially organized, and each behaves like a small sensor (receptor) that responds strongly to spatial changes such as the edge contours of objects (Field, 1990).

Marr's Edge Detection
• Detection of edge contours is a critical ability of human vision (Marr, 1982). Marr and Hildreth (1980) proposed a model for human detection of edges at all scales: Marr's theory of zero-crossings.
• Gaussian kernel:
  Gσ(x, y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)).
• The Laplacian of the Gaussian is
  ∆Gσ = ∂²Gσ/∂x² + ∂²Gσ/∂y² = −(1/(πσ⁴)) (1 − (x² + y²)/(2σ²)) exp(−(x² + y²)/(2σ²)).
• An edge occurs in an image I where (∆Gσ ∗ I) = 0. Why?
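As a quick check of the Laplacian-of-Gaussian formula above, here is a small symbolic computation (a sketch using sympy; purely illustrative):

    import sympy as sp

    x, y, sigma = sp.symbols('x y sigma', positive=True)

    # 2D Gaussian kernel G_sigma(x, y).
    G = sp.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * sp.pi * sigma**2)

    # Laplacian: the sum of the second partial derivatives.
    laplacian = sp.diff(G, x, 2) + sp.diff(G, y, 2)

    # Closed form quoted on the slide.
    closed_form = -(1 - (x**2 + y**2) / (2 * sigma**2)) * \
        sp.exp(-(x**2 + y**2) / (2 * sigma**2)) / (sp.pi * sigma**4)

    print(sp.simplify(laplacian - closed_form))   # prints 0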
Marr's Edge Detection: the Laplacian of an (Isotropic) Gaussian Kernel
• Uses the second spatial derivatives of an image (with Gσ and ∆Gσ = ∂²Gσ/∂x² + ∂²Gσ/∂y² as defined above).
• The operator response is zero in areas where the image has constant intensity (i.e., where the intensity gradient is zero).
• Near a change in image intensity, the operator response is positive on the darker side and negative on the lighter side.
• For a sharp edge between two regions of different uniform intensities, the Laplacian-of-Gaussian response is zero away from the edge, positive on one side of the edge, negative on the other side, and zero on the edge in between.
• Example: the response of a 1D Laplacian of Gaussian (σ = 3 pixels) to a 1D step edge.
(See http://www.dai.ed.ac.uk/HIPR2/flatjavasrc/)

Laplacian of a Gaussian: Why an Edge at a Zero-Crossing?
• A zero-crossing detector looks for places where the Laplacian changes sign; such points often occur around edges in images.
• But they also occur at places that are not as easy to associate with edges; zero-crossings always lie on closed contours.
• The starting point for the zero-crossing detector is an image that has been convolved with a Laplacian-of-Gaussian filter.
• The resulting zero-crossings are strongly influenced by the size of the (isotropic) Gaussian; try changing the filter size in the applet demo.
• As the smoothing is increased, fewer and fewer zero-crossing contours are found, and those that remain correspond to features of larger and larger scale in the image.
• Demos: LaplacianOfGaussianDemo.html, LaplacianDemo.html (local applets under C:\Ivo.dir\UCLA_Classes\Applets.dir\ImageProcessing\LaplacianOfGaussian).
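A minimal sketch of this zero-crossing edge detector (using scipy's Gaussian-Laplace filter on a synthetic image; the sign-change test below is one simple way to mark zero-crossings, not the only one):

    import numpy as np
    from scipy import ndimage

    # Synthetic image with a sharp vertical edge between two uniform regions.
    img = np.zeros((64, 64))
    img[:, 32:] = 1.0

    # Convolve with the Laplacian of a Gaussian (sigma in pixels).
    log_response = ndimage.gaussian_laplace(img, sigma=3)

    # Mark zero-crossings: sign changes between horizontally adjacent pixels
    # whose response magnitude is non-negligible (to ignore flat regions).
    sign = np.sign(log_response)
    magnitude = np.abs(log_response)
    h_cross = (sign[:, :-1] * sign[:, 1:] < 0) & \
              (np.maximum(magnitude[:, :-1], magnitude[:, 1:]) > 1e-6)

    print("edge detected near column(s):", np.unique(np.nonzero(h_cross)[1]))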
A good representation should respect edges
• Edges are important features of images.
• An edge is a local feature; local operators, like differentiation, must be incorporated into the representation, as in the coding by the Haar basis.
• A good image representation (or basis) should be capable of providing the edge information easily.

Marr vs. Haar's edge encoding
• Marr's edge detector uses the second derivative to locate the maxima of the first derivative (through which the edge contours pass).
• The Haar basis (1909) encodes the edges into the image representation via the first-derivative operator (i.e., moving averages and differences):
  (…, x_{2n}, x_{2n+1}, …)  ←Haar→  (…, a_n = (x_{2n} + x_{2n+1})/2, d_n = (x_{2n} − x_{2n+1})/2, …).
• Wavelets improve the Haar encoding while respecting the above edge-representation principle.
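A minimal sketch of one level of this Haar average/difference encoding (plain numpy; the signal values are illustrative):

    import numpy as np

    # Illustrative 1D signal with a sharp edge inside the second pair of samples.
    x = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0, 5.0])

    # One level of the Haar encoding: pairwise moving averages and differences.
    a = (x[0::2] + x[1::2]) / 2     # a_n = (x_{2n} + x_{2n+1}) / 2
    d = (x[0::2] - x[1::2]) / 2     # d_n = (x_{2n} - x_{2n+1}) / 2

    print("averages :", a)          # smooth trend
    print("details  :", d)          # nonzero only where the signal jumps within a pair
    print("recovered:", np.ravel(np.column_stack([a + d, a - d])))  # exact reconstruction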
What is a good image representation?
• Mathematically rigorous: a clean and stable analysis-and-synthesis program exists (cf. the Fourier transform and its inverse).
• Has an independent fast digital formulation (e.g., FFT, FWT, …).
• Captures the characteristics of the input signals, so that many existing processing operations (e.g., image indexing, image searching, …) can act directly on the representation.
• Improves statistical inference power.

Mathematics of wavelet image representation
• Let Ω denote the collection of all images. What is the mathematical structure of Ω? Suppose that f ∈ Ω is an observed image. Then Ω should be invariant under Euclidean motion of the scanner/imager:
  f(x) → f(Qx + a),  Q ∈ O(2), a ∈ R².
• Brightness/flashing:
  f(x) → μ f(x), μ ∈ R⁺, or, more generally, a morphological transform f(x) → h(f(x)), h: R → R, h′ > 0.
• Scaling:
  f(x) → f(λx), λ ∈ R⁺.

E.g., zooming in 2-D (demo figure).
What is zooming?
• Zooming (aiming) center: a; zooming scale: h.
• Zoom into the h-neighborhood of a in a given image I:
  I_{a,h}(x) = I(a + h·x), x ∈ Ω (the visual field);
  equivalently, I_{a,h} = I(y) · 1_Ω((y − a)/h) (the aperture).
• Zooming is one of the most fundamental and characteristic operations for image analysis and visual communication; it reflects the multi-scale nature of images and vision.
Good signal representation should be adaptable
• A good representation should react strongly to abrupt changes, and weakly to stable intensities.
• That means that for a stable image, e.g., a constant image I = c, the responses I_{a,h} should be approximately zero:
  I_{a,h} ≈ 0, since ∫_R ψ(x) dx = 0.
• This is the "differentiating" property of the encoding (cf. d/dx).

The zooming/scaling and wavelets
• The base function: ψ(x).
• Aiming (a) and zooming (h), in or out:
  ψ_{a,h}(x) = (1/√h) ψ((x − a)/h).
• Generating the response (to centering/scaling of signals):
  I_{a,h} = ⟨I, ψ_{a,h}⟩ = ∫ I(x) ψ_{a,h}(x) dx.
• The zooming space: (a, h) ∈ R × R⁺.
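Here is a small numerical sketch of these responses (the Mexican-hat ψ is the Marr wavelet mentioned below; the grids and test signals are illustrative). It checks that a constant signal gives I_{a,h} ≈ 0 while a step edge produces a strong response near the jump:

    import numpy as np

    def mexican_hat(x):
        # Marr (Mexican-hat) wavelet: proportional to the second derivative of a Gaussian.
        return (1.0 - x**2) * np.exp(-x**2 / 2.0)

    def response(signal, x, a, h):
        # I_{a,h} = integral of I(x) * psi_{a,h}(x) dx, with psi_{a,h}(x) = psi((x - a)/h) / sqrt(h).
        psi_ah = mexican_hat((x - a) / h) / np.sqrt(h)
        return np.sum(signal * psi_ah) * (x[1] - x[0])   # simple Riemann sum

    x = np.linspace(-10.0, 10.0, 4001)
    constant = np.ones_like(x)              # stable intensity
    step = (x > 0).astype(float)            # abrupt change at x = 0

    a_grid = np.linspace(-5.0, 5.0, 101)
    resp_const = [abs(response(constant, x, a, h=1.0)) for a in a_grid]
    resp_step = [abs(response(step, x, a, h=1.0)) for a in a_grid]

    print(f"max response for the constant image: {max(resp_const):.2e}")   # ~ 0
    print(f"max response near the step edge    : {max(resp_step):.2f}")    # clearly nonzero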
The continuous wavelet representation
• Definition: a differentiating zooming function ψ(x) is said to be a (continuous) wavelet. Representing a given image I(x) by all of its scale/shift responses
  I_{a,h} = ⟨I, ψ_{a,h}⟩
  is the corresponding wavelet representation.
• Questions: Is there a best wavelet ψ(x)? Does such a wavelet representation allow perfect image reconstruction?

Synthesizing a wavelet representation
• Goal: to recover perfectly an image signal I from its wavelet representation I(a, h).
• (Continuous) wavelet synthesis (where ^ denotes the Fourier transform):
  I(a, h) = ⟨I, ψ_{a,h}⟩ = ⟨I^, ψ^_{a,h}⟩ = ⟨I^(ξ) ψ^(hξ), e^(−iaξ)⟩.
• Let J(ξ, h) = I^(ξ) ψ^(hξ); the expression above is in the form of an inverse Fourier transform, so J can be recovered via the a-Fourier transform of I(a, h).
• Then I can be perfectly recovered from J via
  I^(ξ) = ∫_[0,∞) J(ξ, h) ψ^(hξ) dh/h,  as  ∫_[0,∞) |ψ^(h)|² dh/h = 1.

The admissibility condition & differentiation
• The admissibility condition of a continuous wavelet: ∫_[0,∞) |ψ^(h)|²/h dh < ∞.
• A differentiating zooming operator satisfies the admissibility condition, since ψ^(0) = ∫ ψ(x) dx = 0 and ψ^(h) = ch + o(h).
• Examples:
  - The Shannon wavelet: ψ(x) = 2 sinc(2x) − sinc(x), where sinc(t) = sin(πt)/(πt).
  - The Marr wavelet (Mexican hat; the 1D Laplacian of a Gaussian): the second derivative of the Gaussian.

The discrete set of zooming scales
• Make a log-linear discretization of the scale parameter h: h_j = 2^(−j), j = −log₂ h_j, j = 0, ±1, ±2, ….
• Make a scale-adaptive discretization of the zooming centers: at scale h_j = 2^(−j), k → a_k = k·h_j = k/2^j, k = 0, ±1, ±2, ….
• The discrete set of zooming operators:
  ψ_{j,k}(x) = (1/√h_j) ψ((x − k·h_j)/h_j) = 2^(j/2) ψ(2^j x − k).
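A small sketch of this dyadic discretization (again with the Mexican-hat ψ as an illustrative mother wavelet; the coefficients d_{j,k} = ⟨I, ψ_{j,k}⟩ defined on the next slide are approximated here by simple numerical integration):

    import numpy as np

    def mexican_hat(x):
        # Illustrative mother wavelet (Marr / Mexican hat).
        return (1.0 - x**2) * np.exp(-x**2 / 2.0)

    def psi_jk(x, j, k):
        # psi_{j,k}(x) = 2^(j/2) * psi(2^j x - k)
        return 2.0 ** (j / 2.0) * mexican_hat(2.0**j * x - k)

    def d_jk(signal, x, j, k):
        # d_{j,k} = <I, psi_{j,k}>, approximated by a Riemann sum.
        return np.sum(signal(x) * psi_jk(x, j, k)) * (x[1] - x[0])

    x = np.linspace(-8.0, 8.0, 8001)
    I = lambda t: (t > 0).astype(float)     # a step edge at t = 0

    # Coefficients at scale j = 2 (h_j = 1/4), for shifts k near the edge.
    for k in range(-3, 4):
        print(f"d(j=2, k={k:+d}) = {d_jk(I, x, 2, k):+.3f}")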
The discrete wavelet representation
• The wavelet coefficients:
  d_{j,k} = ⟨I, ψ_{j,k}⟩ = 2^(j/2) ∫ I(x) ψ(2^j x − k) dx.

Example 1: the Haar wavelet
• The Haar "aperture" function is
  ψ^(Haar)(x) = 1_{0 ≤ x < 1/2}(x) − 1_{1/2 ≤ x < 1}(x)