Stat 233: Statistical Methods in Biomedical Imaging
Wavelet-Based Statistical Analysis of fMRI Data
Ivo Dinov, UCLA (http://www.stat.ucla.edu/~dinov)

[fMRI simulation movie]
Activation Imaging using BOLD
• Resting state versus active state, e.g., finger tapping, word recognition.
• The whole brain is scanned in ~3 seconds using a high-speed imaging technique (EPI).
• Perform an analysis to detect regions that show a signal increase in response to the stimulus.

Problems with BOLD
• How localized is the BOLD response to the site of neuronal activity? Is the signal from draining veins rather than the tissue itself?
• How is the signal change coupled to neuronal activity?
• Do changes in the timing of BOLD responses reliably tell us about changes in the timing of neural activity?
fMRI Data Analysis Tools
• AFNI: http://afni.nimh.nih.gov/afni/
• SPM: http://www.fil.ion.ucl.ac.uk/spm/
• Brain Voyager: http://www.brainvoyager.de/
• CARET: http://brainmap.wustl.edu/resources/caretnew.html
• FSL: http://www.fmrib.ox.ac.uk/fsl/
• SCIrun: http://software.sci.utah.edu/scirun.html
• McGill BIC fMRI toolbox: www.bic.mni.mcgill.ca/usr/local/matlab5/toolbox/fmri/
• LONI Pipeline: http://www.loni.ucla.edu

Hypotheses vs. Data-Driven Approaches

Hypothesis-driven
• Examples: t-tests, correlations, the general linear model (GLM).
• An a priori model of activation is suggested, and the data are checked to see how closely they match components of the model.
• The most commonly used approach (e.g., SPM).

Data-driven: Independent Component Analysis (ICA)
• No prior hypotheses are necessary.
• Multivariate techniques determine the patterns in the data that account for the most variance across all voxels.
• Can be used to validate a model (see whether the math comes up with the components you would have predicted).
• Can be inspected to see whether things are happening in your data that you did not predict.
• Can be used to identify confounds (e.g., head motion).
• Requires a way to organize the many possible components.
• New and upcoming.
Two Approaches: Global vs. Regional

A. Region-of-interest (ROI) approach
1. Use one or more localizer runs to find a region (e.g., show moving rings to find the motion-sensitive middle temporal area, MT).
2. Extract time-course information from that region in separate, independent runs.
3. See whether the trends in that region are statistically significant.
The runs used to define the area are independent of those used to test the hypothesis.
Example study (Tootell et al., 1995, motion aftereffect): localize "motion area" MT in a run comparing moving vs. stationary rings, then extract time courses from MT in subsequent runs while subjects see illusory motion (the motion aftereffect).

B. Whole-volume statistical approach
1. Make predictions about what differences you should see if your hypotheses are correct.
2. Decide on statistical measures to test for the predicted differences (e.g., t-tests, correlations, GLMs).
3. Determine an appropriate statistical threshold.
4. See whether the statistical estimates are significant.

Whole-Brain Stats
Statistics available:
1. T-test
2. Correlation
3. Frequency-based (Fourier/wavelet/fractal) modeling
4. General linear model: an overarching statistical model that lets you perform many types of statistical analyses (including correlation/regression and ANOVA)
Comparing the two approaches

Region-of-interest (ROI) analysis
• More statistical power, because you do not have to correct for the number of comparisons.
• Hypothesis-driven.
• The ROI is not smeared by intersubject averaging.
• Easy to analyze and interpret.
• Neglects other areas that may play a fundamental role.
• Popular in North America.

Whole-brain analysis
• Requires no prior hypotheses about the areas involved.
• Includes the entire brain.
• Can lose spatial resolution with intersubject averaging.
• Can produce meaningless lists of areas that are difficult to interpret.
• Depends strongly on the statistics and threshold selected.
• Popular in Europe.

NOTE: Although different experimenters tend to prefer one method over the other, they are NOT mutually exclusive: you can check the ROIs you predicted and then check the data for other areas.

Why do we need statistics?
• MR signal intensities are random: there is variation caused by the scanner (magnet, coil, time) and variation in neurophysiology (area, task, subject).
• Rest-state scans are used against activation tasks in a subtraction-paradigm setting to correct for some of these effects within the same run.
• Statistical analyses help us confirm whether "eyeball" tests of significance, based on visual thresholding, are real/correct.
• Because we make so many comparisons (~10^6), we need a way to compensate for the increase in Type I error.
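To make the last point concrete, here is a minimal sketch (plain Python; the voxel count and alpha levels are illustrative, not prescriptions from the slides) of the expected number of false-positive voxels under an uncorrected threshold and the corresponding Bonferroni-corrected per-voxel alpha:

    # Expected false positives under an uncorrected voxel-wise threshold,
    # and a Bonferroni-corrected per-voxel alpha (example numbers only).
    n_voxels = 64 * 64 * 16          # e.g., a 64 x 64 x 16 acquisition grid
    alpha_uncorrected = 0.001        # per-voxel threshold

    expected_false_positives = n_voxels * alpha_uncorrected   # ~66 voxels by chance
    alpha_bonferroni = 0.05 / n_voxels                        # corrected per-voxel threshold

    print(f"~{expected_false_positives:.0f} voxels expected by chance at p < {alpha_uncorrected}")
    print(f"Bonferroni-corrected per-voxel alpha: {alpha_bonferroni:.2e}")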
Paradigm: Event-Related Design (Buckner et al., J. Cog. Neurosci., 2000)
• An event-related design to assess between-population differences in the amplitude and variance of the hemodynamic response to a visual stimulus (14 young, 14 nondemented old, and 13 AD subjects).
• 4 runs per subject; 128 volumes per run; 2 (randomly chosen) conditions per run (always 7 + 8 trials); 1 trial = 21.44 sec (8 volumes); 60 trials per subject in total.
• 1-trial condition: a 1.5-second, 8 Hz black/white checkerboard flicker.
• 2-trial condition: two visual stimuli presented 5.36 sec apart.
• 15 randomly placed trials per run.
[Figure: trial-onset timelines for Runs 1-4, 128 volumes each, tick marks every 8 volumes.]

Voxel-wise setup: look for differences between 1-Trial and 2-Trial activation. For a given voxel, measure the average MR signal and SD over the volumes in which 1-Trial stimuli were presented (7-8 epochs x 8 volumes/epoch = 56-64 volumes), and do the same for the volumes in which 2-Trial stimuli were presented (56-64 volumes). To look for 1-Trial minus 2-Trial activation, look at the negative tail of the comparison.
fMRI of Young, Nondemented and Demented Adults
• fMRI signal acquisition: 16 coronal slices, 64 x 64 in-plane voxels, 3 x 3 x 5 mm3 voxels, volume time = 2 sec.
• The stimulus required a right-hand index-finger click.
[Figure: example 1-Trial and 2-Trial fMRI signal time courses. Source: Tootell et al.]

One Approach: Voxel-Based T-Tests
• Statistically significant mean difference? Calculate the t0 value and look up the p value for that number of degrees of freedom (df >= 56 x 2 = 112). For example, with ~112 df, t0 > 1.98 gives p < 0.05 and t0 > 3.39 gives p < 0.001.
• Repeat this process 65,536 more times (64 x 64 x 16), once for each voxel.
• If p = 0.001 and there are 65,536 voxels, then 65,536 x 0.001 is about 66 voxels that could be significant purely by chance.
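To make the voxel-wise procedure concrete, here is a minimal sketch (not the study's actual code; the array shapes, simulated data, and variable names are illustrative) that runs an independent-samples t-test at every voxel of a 64 x 64 x 16 volume and counts how many voxels pass an uncorrected p < 0.001 threshold:

    import numpy as np
    from scipy import stats

    # Illustrative data: one value per voxel for each 1-Trial and 2-Trial volume
    # (shape: n_volumes x 64 x 64 x 16); here the "signal" is pure noise.
    rng = np.random.default_rng(0)
    one_trial = rng.normal(size=(56, 64, 64, 16))
    two_trial = rng.normal(size=(56, 64, 64, 16))

    # Independent-samples t-test at every voxel (vectorized over the spatial axes).
    t_map, p_map = stats.ttest_ind(one_trial, two_trial, axis=0)

    # Uncorrected thresholding: with 65,536 voxels and p < 0.001 we expect
    # roughly 66 voxels to pass purely by chance.
    n_sig = int(np.sum(p_map < 0.001))
    print(f"{n_sig} voxels below p < 0.001 (about 66 expected under the null)")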
T-Test Maps
• For each voxel in the brain, we can now color-code a map based on the computed t and p values.
• We can do this for the positive tail (1-Trial minus 2-Trial): orange = low significance, yellow = high significance. (Schmutz or ESP voxels?)
• And we can also do this for the negative tail (2-Trial minus 1-Trial): blue = low significance, green = high significance.

Another Statistical Analysis: Basic Correlation
• Find voxels whose time course is correlated with a reference function.
• The hemodynamic response function (HRF) can be incorporated to predict the time course more accurately.
• For each voxel, plot the 56 data points (value of the fMRI signal) against the 56 values of the reference model (value of the predictor) and calculate r.
• Remember that r^2 is the proportion of variance accounted for by the predictor; e.g., if r = 0.7 then r^2 = 0.49, i.e., 49% of the variance.
Correlation Analysis
For each voxel:
• Find the correlation between the predictor and the MR signal.
• Extract the correlation (r value) and find the corresponding p value (Pearson/Spearman).
• Determine whether it is statistically significant.
This is similar in spirit to a t-test.

Predictor: The Hemodynamic Response Function (HRF)
• Time to rise: the signal begins to rise soon after the stimulus begins.
• Time to peak: the signal peaks 4-6 sec after the stimulus begins.
• Post-stimulus undershoot: the signal is suppressed after stimulation ends.
• Initial dip: more focal and potentially a better measure, but somewhat elusive so far; not everyone can find it.
• Percent signal change = (point - baseline)/baseline, usually 0.5-3%.
[Figure: percent change from baseline vs. time from stimulus onset (sec), with the stimulus-on period marked.]

To find a 1-Trial-responsive area, we can correlate the convolved predictor with each voxel's time course: plot the 56 model points (m*g)(x) against the 56 data points f(x) and compute r, where m(x) is the stimulus model, g(x) is the HRF kernel, and f(x) is the data.
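A minimal sketch of the per-voxel correlation test described above (illustrative data and names; it assumes scipy is available for the Pearson correlation and its p value):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # Illustrative boxcar predictor: 0 at rest, 1 while the stimulus is on
    # (56 time points, matching the 56 data points per condition above).
    predictor = np.zeros(56)
    predictor[8:16] = 1.0

    # Illustrative voxel time course: a small stimulus-locked response plus noise.
    voxel_ts = 0.02 * predictor + rng.normal(scale=0.01, size=56)

    # Pearson correlation and its p value for this voxel.
    r, p = stats.pearsonr(predictor, voxel_ts)
    print(f"r = {r:.2f}, r^2 = {r**2:.2f} (variance explained), p = {p:.3g}")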
Correlations: Negative Tail
• Look for anything that shows a reduced response during the 1-Trial condition (compared with both the 2-Trial condition and fixation/rest).

Correlation: Incorporating the HRF
• We can model the expected curve of the data by convolving our predictor m(x) with the hemodynamic response function kernel g(x): the convolved model (m*g)(x) is then correlated with the data f(x) (value of the fMRI signal, in percent change from baseline).
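A sketch of this convolution step (the double-gamma HRF below is a common approximation used here for illustration, not necessarily the exact function behind these slides; names and parameters are assumptions):

    import numpy as np
    from scipy.stats import gamma

    TR = 2.0                              # seconds per volume, as in the example acquisition
    t = np.arange(0, 32, TR)              # HRF support in seconds

    # Double-gamma HRF: a peak around 5-6 s minus a small, late undershoot,
    # approximating the response shape described above.
    hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 12)
    hrf /= hrf.sum()

    # Boxcar predictor m(x): 1 while the stimulus is on, 0 otherwise.
    m = np.zeros(128)
    m[8:16] = 1.0
    m[72:80] = 1.0

    # Expected-signal model (m * g)(x): convolve the predictor with the HRF kernel.
    model = np.convolve(m, hrf)[: len(m)]

    # This model can then be correlated with each voxel's time course f(x), as above.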
Two Main fMRI Designs

Block design
• Compare long periods (e.g., 16 sec) of one condition with long periods of another.
• The traditional approach.
• The most statistically powerful approach.
• Less dependent on how well you model the HRF.
• Habituation effects?!

Event-related design
• Compare brief trials (~1 sec) of one condition with brief trials of another.
• A newer approach (1997+).
• Less statistically powerful, but has some advantages.
• Trials can be well spaced to allow the MR signal to return to baseline between trials (e.g., 21+ sec apart) or closely spaced (e.g., every 2 sec).
Problems with T-Tests and Correlation Analyses
1) How do we evaluate runs with different orders? We could average our 4 runs (60 trials) by 1-Trial or 2-Trial stimuli and then do statistics comparing the functional differences between the two paradigms, but there is no way to collapse between orders.
2) If we test more subjects, how can we evaluate all subjects together? As with the single-subject runs, we could average all the subjects together (after morphing them into an atlas space), but that still means we have to run all of them in the same order.
3) We can get nice hemodynamic predictors for 1-Trial or 2-Trial stimuli, but how can we compare them accurately? If a single predictor is significant, we won't know whether it is because 1-Trial > 2-Trial or because 1-Trial > Fixation.

A solution: use general linear modeling.

General Linear Model for fMRI
• Parse out the variance in each voxel's time course into the contributions of six predictors plus residual noise (what the predictors cannot account for):
  fMRI signal = β1 x X1 + β2 x X2 + β3 x X3 + β4 x X4 + β5 x X5 + β6 x X6 + residuals,
  where the design-matrix columns X1, ..., X6 encode the predictors, for example: Stimulus, Trial, Subject, Run, Group (AD/Young), and ROI.
(Design-matrix figure adapted from Brain Voyager course slides.)
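A minimal sketch of fitting such a voxel-wise GLM by ordinary least squares (the design matrix and data here are illustrative stand-ins, not the actual study's predictors):

    import numpy as np

    rng = np.random.default_rng(2)
    n_timepoints = 128

    # Illustrative design matrix X: an intercept plus stand-ins for five more
    # predictor columns (e.g., convolved 1-Trial/2-Trial regressors, run, group, ...).
    X = np.column_stack([
        np.ones(n_timepoints),
        rng.normal(size=(n_timepoints, 5)),
    ])

    # Illustrative voxel time course y generated from known betas plus noise.
    true_betas = np.array([100.0, 0.5, 0.0, 0.2, 0.0, 0.0])
    y = X @ true_betas + rng.normal(scale=0.3, size=n_timepoints)

    # Ordinary least squares for the model  y = X beta + residuals.
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ betas
    print("estimated betas:", np.round(betas, 2))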
Completeness and Efficiency of Signal Representation: Wavelet Functions in 1D

Features of Natural Images
• Contain both man-made and natural features.
• Mostly no global multi-scale self-similarity.
• Mostly no simple and universal underlying physical or biological process that generates the patterns in a general image.
(Ref.: www.math.umn.edu/~jhshen)

SOCR Wavelet/Fourier Demo: http://socr.stat.ucla.edu/
Efficiency of representation
• David Field (Cornell University, vision psychologist): "To discriminate between objects, an effective transform (representation) encodes each image using the smallest possible number of neurons, chosen from a large pool."

Fourier Representation
• Claim: harmonic waves make bad vision neurons, because a typical Fourier neuron is φ(x) = exp(iax). To process even a simple bright spot δ(x) in the visual field, all such neurons have to respond to it (!), since for every frequency a
  ⟨δ, φ⟩ = ∫ δ(x) e^(-iax) dx = 1.
Asking our own "headtop"…
• Psychologists have shown that visual neurons are spatially organized, and each behaves like a small sensor (receptor) that responds strongly to spatial changes such as the edge contours of objects (Field, 1990).

Marr's Edge Detection
• Detection of edge contours is a critical ability of human vision (Marr, 1982). Marr and Hildreth (1980) proposed a model for human detection of edges at all scales: Marr's theory of zero-crossings.
• Gaussian kernel:
  Gσ(x, y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)).
• The Laplacian of the Gaussian is
  ∆Gσ = ∂²Gσ/∂x² + ∂²Gσ/∂y² = −(1/(πσ⁴)) (1 − (x² + y²)/(2σ²)) exp(−(x² + y²)/(2σ²)).
• An edge occurs in an image I where (∆Gσ ∗ I) = 0. Why?
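As a quick check of the Laplacian-of-Gaussian formula above, here is a small symbolic computation (a sketch using sympy; purely illustrative):

    import sympy as sp

    x, y, sigma = sp.symbols('x y sigma', positive=True)

    # 2D Gaussian kernel G_sigma(x, y).
    G = sp.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * sp.pi * sigma**2)

    # Laplacian: the sum of the second partial derivatives.
    laplacian = sp.diff(G, x, 2) + sp.diff(G, y, 2)

    # Closed form quoted on the slide.
    closed_form = -(1 - (x**2 + y**2) / (2 * sigma**2)) * \
        sp.exp(-(x**2 + y**2) / (2 * sigma**2)) / (sp.pi * sigma**4)

    print(sp.simplify(laplacian - closed_form))   # prints 0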
Marr's Edge Detection: the Laplacian of an (Isotropic) Gaussian Kernel
• Uses the second spatial derivatives of an image (with Gσ and ∆Gσ = ∂²Gσ/∂x² + ∂²Gσ/∂y² as defined above).
• The operator response is zero in areas where the image has constant intensity (i.e., where the intensity gradient is zero).
• Near a change in image intensity, the operator response is positive on the darker side and negative on the lighter side.
• For a sharp edge between two regions of different uniform intensities, the Laplacian-of-Gaussian response is zero away from the edge, positive on one side of the edge, negative on the other side, and zero on the edge in between.
• Example: the response of a 1D Laplacian of Gaussian (σ = 3 pixels) to a 1D step edge.
(See http://www.dai.ed.ac.uk/HIPR2/flatjavasrc/)

Laplacian of a Gaussian: Why an Edge at a Zero-Crossing?
• A zero-crossing detector looks for places where the Laplacian changes sign; such points often occur around edges in images.
• But they also occur at places that are not as easy to associate with edges; zero-crossings always lie on closed contours.
• The starting point for the zero-crossing detector is an image that has been convolved with a Laplacian-of-Gaussian filter.
• The resulting zero-crossings are strongly influenced by the size of the (isotropic) Gaussian; try changing the filter size in the applet demo.
• As the smoothing is increased, fewer and fewer zero-crossing contours are found, and those that remain correspond to features of larger and larger scale in the image.
• Demos: LaplacianOfGaussianDemo.html, LaplacianDemo.html (local applets under C:\Ivo.dir\UCLA_Classes\Applets.dir\ImageProcessing\LaplacianOfGaussian).
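A minimal sketch of this zero-crossing edge detector (using scipy's Gaussian-Laplace filter on a synthetic image; the sign-change test below is one simple way to mark zero-crossings, not the only one):

    import numpy as np
    from scipy import ndimage

    # Synthetic image with a sharp vertical edge between two uniform regions.
    img = np.zeros((64, 64))
    img[:, 32:] = 1.0

    # Convolve with the Laplacian of a Gaussian (sigma in pixels).
    log_response = ndimage.gaussian_laplace(img, sigma=3)

    # Mark zero-crossings: sign changes between horizontally adjacent pixels
    # whose response magnitude is non-negligible (to ignore flat regions).
    sign = np.sign(log_response)
    magnitude = np.abs(log_response)
    h_cross = (sign[:, :-1] * sign[:, 1:] < 0) & \
              (np.maximum(magnitude[:, :-1], magnitude[:, 1:]) > 1e-6)

    print("edge detected near column(s):", np.unique(np.nonzero(h_cross)[1]))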
A good representation should respect edges
• Edges are important features of images.
• An edge is a local feature; local operators, like differentiation, must be incorporated into the representation, as in the coding by the Haar basis.
• A good image representation (or basis) should be capable of providing the edge information easily.

Marr vs. Haar's edge encoding
• Marr's edge detector uses the second derivative to locate the maxima of the first derivative (through which the edge contours pass).
• The Haar basis (1909) encodes the edges into the image representation via the first-derivative operator (i.e., moving averages and differences):
  (…, x_{2n}, x_{2n+1}, …)  ←Haar→  (…, a_n = (x_{2n} + x_{2n+1})/2, d_n = (x_{2n} − x_{2n+1})/2, …).
• Wavelets improve the Haar encoding while respecting the above edge-representation principle.
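A minimal sketch of one level of this Haar average/difference encoding (plain numpy; the signal values are illustrative):

    import numpy as np

    # Illustrative 1D signal with a sharp edge inside the second pair of samples.
    x = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0, 5.0])

    # One level of the Haar encoding: pairwise moving averages and differences.
    a = (x[0::2] + x[1::2]) / 2     # a_n = (x_{2n} + x_{2n+1}) / 2
    d = (x[0::2] - x[1::2]) / 2     # d_n = (x_{2n} - x_{2n+1}) / 2

    print("averages :", a)          # smooth trend
    print("details  :", d)          # nonzero only where the signal jumps within a pair
    print("recovered:", np.ravel(np.column_stack([a + d, a - d])))  # exact reconstruction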
What is a good image representation?
• Mathematically rigorous: a clean and stable analysis-and-synthesis program exists (cf. the Fourier transform and its inverse).
• Has an independent fast digital formulation (e.g., FFT, FWT, …).
• Captures the characteristics of the input signals, so that many existing processing operations (e.g., image indexing, image searching, …) can act directly on the representation.
• Improves statistical inference power.

Mathematics of wavelet image representation
• Let Ω denote the collection of all images. What is the mathematical structure of Ω? Suppose that f ∈ Ω is an observed image. Then Ω should be invariant under Euclidean motion of the scanner/imager:
  f(x) → f(Qx + a),  Q ∈ O(2), a ∈ R².
• Brightness/flashing:
  f(x) → μ f(x), μ ∈ R⁺, or, more generally, a morphological transform f(x) → h(f(x)), h: R → R, h′ > 0.
• Scaling:
  f(x) → f(λx), λ ∈ R⁺.

E.g., zooming in 2-D (demo figure).
What is zooming?
• Zooming (aiming) center: a; zooming scale: h.
• Zoom into the h-neighborhood of a in a given image I:
  I_{a,h}(x) = I(a + h·x), x ∈ Ω (the visual field);
  equivalently, I_{a,h} = I(y) · 1_Ω((y − a)/h) (the aperture).
• Zooming is one of the most fundamental and characteristic operations for image analysis and visual communication; it reflects the multi-scale nature of images and vision.
Good signal representation should be adaptable
• A good representation should react strongly to abrupt changes, and weakly to stable intensities.
• That means that for a stable image, e.g., a constant image I = c, the responses I_{a,h} should be approximately zero:
  I_{a,h} ≈ 0, since ∫_R ψ(x) dx = 0.
• This is the "differentiating" property of the encoding (cf. d/dx).

The zooming/scaling and wavelets
• The base function: ψ(x).
• Aiming (a) and zooming (h), in or out:
  ψ_{a,h}(x) = (1/√h) ψ((x − a)/h).
• Generating the response (to centering/scaling of signals):
  I_{a,h} = ⟨I, ψ_{a,h}⟩ = ∫ I(x) ψ_{a,h}(x) dx.
• The zooming space: (a, h) ∈ R × R⁺.
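Here is a small numerical sketch of these responses (the Mexican-hat ψ is the Marr wavelet mentioned below; the grids and test signals are illustrative). It checks that a constant signal gives I_{a,h} ≈ 0 while a step edge produces a strong response near the jump:

    import numpy as np

    def mexican_hat(x):
        # Marr (Mexican-hat) wavelet: proportional to the second derivative of a Gaussian.
        return (1.0 - x**2) * np.exp(-x**2 / 2.0)

    def response(signal, x, a, h):
        # I_{a,h} = integral of I(x) * psi_{a,h}(x) dx, with psi_{a,h}(x) = psi((x - a)/h) / sqrt(h).
        psi_ah = mexican_hat((x - a) / h) / np.sqrt(h)
        return np.sum(signal * psi_ah) * (x[1] - x[0])   # simple Riemann sum

    x = np.linspace(-10.0, 10.0, 4001)
    constant = np.ones_like(x)              # stable intensity
    step = (x > 0).astype(float)            # abrupt change at x = 0

    a_grid = np.linspace(-5.0, 5.0, 101)
    resp_const = [abs(response(constant, x, a, h=1.0)) for a in a_grid]
    resp_step = [abs(response(step, x, a, h=1.0)) for a in a_grid]

    print(f"max response for the constant image: {max(resp_const):.2e}")   # ~ 0
    print(f"max response near the step edge    : {max(resp_step):.2f}")    # clearly nonzero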
The continuous wavelet representation
• Definition: a differentiating zooming function ψ(x) is said to be a (continuous) wavelet. Representing a given image I(x) by all of its scale/shift responses
  I_{a,h} = ⟨I, ψ_{a,h}⟩
  is the corresponding wavelet representation.
• Questions: Is there a best wavelet ψ(x)? Does such a wavelet representation allow perfect image reconstruction?

Synthesizing a wavelet representation
• Goal: to recover perfectly an image signal I from its wavelet representation I(a, h).
• (Continuous) wavelet synthesis (where ^ denotes the Fourier transform):
  I(a, h) = ⟨I, ψ_{a,h}⟩ = ⟨I^, ψ^_{a,h}⟩ = ⟨I^(ξ) ψ^(hξ), e^(−iaξ)⟩.
• Let J(ξ, h) = I^(ξ) ψ^(hξ); the expression above is in the form of an inverse Fourier transform, so J can be recovered via the a-Fourier transform of I(a, h).
• Then I can be perfectly recovered from J via
  I^(ξ) = ∫_[0,∞) J(ξ, h) ψ^(hξ) dh/h,  as  ∫_[0,∞) |ψ^(h)|² dh/h = 1.

The admissibility condition & differentiation
• The admissibility condition of a continuous wavelet: ∫_[0,∞) |ψ^(h)|²/h dh < ∞.
• A differentiating zooming operator satisfies the admissibility condition, since ψ^(0) = ∫ ψ(x) dx = 0 and ψ^(h) = ch + o(h).
• Examples:
  - The Shannon wavelet: ψ(x) = 2 sinc(2x) − sinc(x), where sinc(t) = sin(πt)/(πt).
  - The Marr wavelet (Mexican hat; the 1D Laplacian of a Gaussian): the second derivative of the Gaussian.

The discrete set of zooming scales
• Make a log-linear discretization of the scale parameter h: h_j = 2^(−j), j = −log₂ h_j, j = 0, ±1, ±2, ….
• Make a scale-adaptive discretization of the zooming centers: at scale h_j = 2^(−j), k → a_k = k·h_j = k/2^j, k = 0, ±1, ±2, ….
• The discrete set of zooming operators:
  ψ_{j,k}(x) = (1/√h_j) ψ((x − k·h_j)/h_j) = 2^(j/2) ψ(2^j x − k).
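A small sketch of this dyadic discretization (again with the Mexican-hat ψ as an illustrative mother wavelet; the coefficients d_{j,k} = ⟨I, ψ_{j,k}⟩ defined on the next slide are approximated here by simple numerical integration):

    import numpy as np

    def mexican_hat(x):
        # Illustrative mother wavelet (Marr / Mexican hat).
        return (1.0 - x**2) * np.exp(-x**2 / 2.0)

    def psi_jk(x, j, k):
        # psi_{j,k}(x) = 2^(j/2) * psi(2^j x - k)
        return 2.0 ** (j / 2.0) * mexican_hat(2.0**j * x - k)

    def d_jk(signal, x, j, k):
        # d_{j,k} = <I, psi_{j,k}>, approximated by a Riemann sum.
        return np.sum(signal(x) * psi_jk(x, j, k)) * (x[1] - x[0])

    x = np.linspace(-8.0, 8.0, 8001)
    I = lambda t: (t > 0).astype(float)     # a step edge at t = 0

    # Coefficients at scale j = 2 (h_j = 1/4), for shifts k near the edge.
    for k in range(-3, 4):
        print(f"d(j=2, k={k:+d}) = {d_jk(I, x, 2, k):+.3f}")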
The discrete wavelet representation
• The wavelet coefficients:
  d_{j,k} = ⟨I, ψ_{j,k}⟩ = 2^(j/2) ∫ I(x) ψ(2^j x − k) dx.

Example 1: the Haar wavelet
• The Haar "aperture" function is
  ψ^(Haar)(x) = 1_{0 ≤ x < 1/2}(x) − 1_{1/2 ≤ x < 1}(x)