Even the eggheads at government think tanks occasionally need outside assistance. Large agencies are especially prone to needing external expertise because some of their functions get so niche. For example, the National Health Service (NHS), which plays a role loosely comparable to that of the Department of Health and Human Services (DHHS) in the U.S., has Foundation Trusts, some of which are tasked with conducting studies related to public health.
James Morgan, the Mental Health Act Manager at the Leeds and York Partnership NHS Foundation Trust, recently asked for help by posting the following statistics question in the “Psychology Students Network” LinkedIn group (the post has since been removed, but its URL is preserved via hyperlink for posterity):
Nov. 25, 2013
I have been asked to propose an experiment related to prenatal alcohol [consumption] and working memory. The experiment I’ve designed has three groups: a control group; a group containing children whose mother drank within the recommended limits whilst pregnant; and a group containing children diagnosed with FAS [fetal alcohol syndrome].
I planned on giving each participant four tests: two testing the phonological loop capabilities and two testing the visual-spatial sketchpad. From there, I want to firstly compare the scores from tests measuring the same component to determine whether there is a statistical link – I presume this is testing for validity?
Then I want to combine each participant’s scores into two – one score representing phonological capabilities and one visual-spatial – and then compare these results across groups to determine if there’s a statistical difference in results.
My hypothesis is that as alcohol consumption increases, there will be a significant reduction in the power of working memory components. Could anybody possible (sic) explain the statistical tests I would need to do to achieve this?
Thank you! :)
I sent the following message in response to James Morgan’s inquiry. It appeared I had cornered the market on answering his question, since no one else had responded. Then three things happened:
1) The postback stalled when I attempted to answer in the discussion thread;
2) I re-submitted my answer only for the error message, “This discussion is no longer available,” to appear; and
3) I re-sent my response via the private message feature.
Nov. 25, 2013
Joseph Ohler, Jr.
You’ll use a series of six one-way ANOVAs, one per score below, to test for a difference in mean scores among your three groups in any direction, namely whether the variance between the groups significantly exceeds the chance variance within them:
1) Phonological Test A
2) Phonological Test B
3) Visual-Spatial Test A
4) Visual-Spatial Test B
5) Composite Phonological Score
6) Composite Visual-Spatial Score
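For readers who want to see what each of those six ANOVAs actually computes, here is a minimal, standard-library-only Python sketch of the one-way F ratio for a single score. The group names and scores are hypothetical; in practice a statistics package (or scipy.stats.f_oneway, which also returns the p-value) would do this for you.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = fmean(all_scores)
    k, n_total = len(groups), len(all_scores)
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: the chance spread of scores around each group's own mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores on one measure (say, Phonological Test A).
control = [7, 8, 9]
within_limits = [4, 5, 6]
fas = [1, 2, 3]
print(one_way_anova([control, within_limits, fas]))  # (27.0, 2, 6)
```

Run the same function once per score (the four tests plus the two composites) to get the six F ratios; each p-value comes from comparing F against the F distribution with (df_between, df_within) degrees of freedom.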
The ANOVAs on the composite scores serve as a reliability check on the other four, because each composite is built from the individual test scores analyzed in those earlier ANOVAs. Equally important is to select the Tukey HSD post-hoc option on each ANOVA so that it runs pairwise comparisons between the groups.
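A composite score can be formed in several ways. One common approach, assumed here rather than taken from the exchange above, is to standardize each test across all participants and average the two z-scores, so that tests on different scales count equally. The sketch also computes a Pearson correlation between the two tests, which is one way to check the “statistical link” the original question asked about. All data are hypothetical.

```python
from statistics import fmean, stdev


def z_scores(scores):
    """Standardize one test to mean 0, SD 1 across ALL participants.

    Pooling every group here matters: standardizing within each group would
    erase the very group differences the ANOVA is meant to detect.
    """
    m, s = fmean(scores), stdev(scores)
    return [(x - m) / s for x in scores]


def pearson_r(xs, ys):
    """Pearson correlation between two tests of the same component."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


def composite(test_a, test_b):
    """One composite per participant: the mean of the two z-scores."""
    return [(a + b) / 2 for a, b in zip(z_scores(test_a), z_scores(test_b))]


# Hypothetical scores for five participants on the two phonological tests.
phon_a = [12, 15, 9, 20, 14]
phon_b = [30, 36, 25, 44, 33]
print(round(pearson_r(phon_a, phon_b), 3))
print([round(c, 2) for c in composite(phon_a, phon_b)])
```

If both tests already share a scale, simply averaging the raw scores is a reasonable alternative.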
Remember that an ANOVA tests for a main effect across all levels of a factor (here, your three groups); the Tukey values indicate which individual pairwise comparisons are significant and which are not. If the ANOVA is like a line of best fit, then the Tukey test is the close-up of the local peaks and valleys along that line.
The Tukey values also show the direction of each pairwise difference. If the group means fall in a consistent order (for example, control above within-limits above FAS), the pattern supports your hypothesis that working memory declines as alcohol exposure increases. If the differences point in opposite directions (for example, the within-limits group scores lowest), the ANOVA can still be significant, but the result does not describe a steady decline with increasing exposure.
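The pairwise logic can be sketched in Python as well. This version assumes equal group sizes (unequal sizes need the Tukey–Kramer adjustment) and takes the studentized range critical value `q_crit` as an input you look up for k groups and N − k degrees of freedom; the value used below is a placeholder, not a table lookup. Recent SciPy releases also offer scipy.stats.tukey_hsd as a full implementation.

```python
from itertools import combinations
from statistics import fmean


def tukey_hsd(groups, names, q_crit):
    """Pairwise Tukey HSD comparisons, assuming equal-sized groups.

    q_crit is the studentized range critical value for k groups and
    N - k within-group degrees of freedom, supplied by the caller.
    """
    k, n = len(groups), len(groups[0])
    df_within = k * n - k
    ms_within = sum((x - fmean(g)) ** 2 for g in groups for x in g) / df_within
    # Honestly significant difference: the smallest gap between two group
    # means that counts as significant at the chosen level.
    hsd = q_crit * (ms_within / n) ** 0.5
    results = []
    for (i, a), (j, b) in combinations(enumerate(groups), 2):
        diff = fmean(a) - fmean(b)  # the sign gives the direction of the difference
        results.append((names[i], names[j], diff, abs(diff) > hsd))
    return hsd, results


# Hypothetical data; q_crit is a placeholder, not a value read from a table.
groups = [[7, 8, 9], [4, 5, 6], [1, 2, 3]]
hsd, pairs = tukey_hsd(groups, ["control", "within limits", "FAS"], q_crit=4.0)
for first, second, diff, significant in pairs:
    print(f"{first} - {second}: {diff:+.2f} (significant: {significant})")
```

Reading the signs of the differences alongside the significance flags is what tells you whether the groups line up in the order your hypothesis predicts.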
This means that even if your ANOVA’s F ratio is not significant, you should still look at the Tukey values: Tukey’s HSD controls the familywise error rate on its own, so it does not strictly require a significant omnibus F, and it can occasionally flag a pairwise difference that the F test misses. Also keep an open mind that your observations might come from a non-normal (for example, skewed or multimodal) distribution, because a summary such as a group mean or a nearly flat trend line can hide that structure.
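A quick numeric screen can flag departures before you lean on the ANOVA. The sketch below uses rules of thumb (assumptions, not formal tests): it reports each group’s skewness, where values well beyond about ±1 deserve a closer look, and the ratio of the largest to the smallest group standard deviation, where a ratio under about 2 is commonly treated as tolerable. Skewness will not reveal multimodality, so plotting each group’s scores is still worthwhile. The data are hypothetical.

```python
from statistics import fmean, stdev


def skewness(scores):
    """Moment-based sample skewness (no small-sample correction)."""
    m, n = fmean(scores), len(scores)
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5


def assumption_report(groups, names):
    """Return the largest-to-smallest SD ratio and each group's skewness."""
    sds = [stdev(g) for g in groups]
    # Rule of thumb (an assumption, not a formal test): a ratio under about 2
    # suggests the equal-variance assumption is tolerable.
    sd_ratio = max(sds) / min(sds)
    skews = {name: round(skewness(g), 2) for name, g in zip(names, groups)}
    return sd_ratio, skews


# Hypothetical scores; the FAS group has one unusually high score.
ratio, skews = assumption_report(
    [[7, 8, 9, 8], [4, 5, 6, 5], [1, 1, 1, 10]],
    ["control", "within limits", "FAS"],
)
print(round(ratio, 2), skews)
```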
All things being equal, use a one-way ANOVA with the Tukey option selected. Also select the “Descriptives” option to get a confidence interval for each group’s mean score. A 95% interval (matching a 5% significance threshold) gives a range of plausible values for that group’s population mean: if you repeated the study many times, about 95% of intervals built this way would contain the true mean.
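To make the interval concrete, here is a minimal sketch of a per-group 95% confidence interval, mean ± t × SE. The critical value `t_crit` is an input (2.262 is the two-sided 95% value for 9 degrees of freedom, that is, a group of 10), and the scores are hypothetical.

```python
from statistics import fmean, stdev


def mean_ci(scores, t_crit):
    """Confidence interval for one group's mean: mean +/- t_crit * standard error."""
    m = fmean(scores)
    se = stdev(scores) / len(scores) ** 0.5
    return m - t_crit * se, m + t_crit * se


# Hypothetical group of 10 scores; 2.262 is the two-sided 95% t value for 9 df.
control = [14, 17, 12, 15, 16, 13, 18, 15, 14, 16]
low, high = mean_ci(control, t_crit=2.262)
print(f"95% CI for the control mean: {low:.2f} to {high:.2f}")
```

Each group gets its own interval from its own standard deviation and size, so groups of different sizes will have intervals of different widths.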
Although James did not reply, he removed the discussion thread almost exactly when I first attempted to send the message. Interpret that as you will. Perhaps Morgan was embarrassed by having to ask in the first place and wanted to cover his tracks.
To that, I say: There are no “stupid” questions about statistics, because most people feel out of their depth when they must judge whether a difference among groups is statistically significant or merely within the margin of error. Experts from many fields frequently hire a subcontractor to produce statistical analysis that will withstand scrutiny by university professors, journalists, and government inspectors alike.
There’s really no reason for James or anyone else to feel ashamed of asking a question like his. If James Morgan chose another method for evaluating his results, he’s welcome to comment, provided his email address corresponds to his position in the National Health Service. (Email syntax may vary among government agencies, but I’m familiar enough with it to tell who’s probably faking and who has a 99% chance of being the actual person.)
December 23, 2013 Update: Mr. Morgan has informed me via LinkedIn that the ANOVA with Tukey post-hoc tests was just the solution he needed, and I thank him for the follow-up!
That’s what I love about the Internet: a blogger with a day job can interact, and even collaborate, with people of high authority in Western countries. (I’d advise steering clear of less thoroughly monitored states, though; you don’t know what characters you’ll meet, and verifying ID is sketchier.)