
Author Topic: AES Paper: A Meta-Analysis of High Resolution Audio Perceptual Evaluation  (Read 1340 times)


Offline aaronji

^ First of all, welcome to taperssection and thanks for coming in and discussing this with us.  To be honest, I was initially very surprised to see you post in this little backwater of the web, catering to the practitioners of a pretty uncommon hobby, but, on further reflection, I am fairly certain I know how you arrived here.  At any rate, I would like to respond to a couple of your comments on my more major criticisms (the presence or absence of the appendix, for example, is immaterial in the end).

- I’m aware of the importance of homogeneity, and the heterogeneity issues here are more serious than those that would typically be found in medical research, and a world apart from formal clinical trials. However, meta-analysis has been successfully applied to social and behavioural science research with far more heterogeneity problems than those seen here. Anyway, this is a judgement call. So the approach I took was to use all possible studies (for which I could do inverse variance analysis), and then do sensitivity or subgroup analysis on more homogeneous subsets of the data.

With respect to the bolded part, what does "successfully" mean?  Obtained a P-value?  Published a paper?  Generated a useful result that led to downstream hypotheses that were also tested successfully?  Settled an open debate?  Whatever that definition, though, do you think your work should fall into the category of "squishy" science (like a lot of social and behavioural science)?  I always thought of engineering as "hard" science, with experiments conducted rigorously and in the most methodologically proper way possible.  I am sorry, but "others did it worse!" is not a valid rebuttal of this criticism, which, in my mind, completely undermines the entire paper.  You are right that, in the end, it is a series of judgement calls, but others can freely interpret the merit of the work based on their assessment of the quality of those judgements.

- I agree that the work would have been improved by using an approach specific to binomial distributions. However, for much of the analysis, the normal approximation is justified. As for independence in the binomial test, under the null hypothesis every randomised trial would be uncorrelated, regardless of whether they involved the same participant or same study (think guessing a truly random coin toss). I also agree that the aggregate binomial test is not appropriate for meta-analysis. It was included only for completeness along with the binomial values for the individual studies in Section 2, and not used as part of the meta-analysis in Section 3.

The normal approximation may be justified, particularly for large numbers, but I think you need to show that.  It is kind of beside the point, though.  Why make those additional, potentially spurious, assumptions when it is easy to implement the correct analysis, modelled on the correct distribution, in freely available software?  With respect to the aggregate binomial analysis being included for "completeness", wouldn't it have been more complete to actually put the correct estimate in there?  The intra-individual trials are not like coin flips, in my opinion; there is a discrete set of perceptual apparatus that is unique to each individual that causes correlation between that individual's observations.  If such correlations did not exist, nobody would ever score higher (or lower) than 50% in a sufficiently large number of trials.
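To illustrate the "easy to implement the correct analysis" point: here is a minimal sketch (Python standard library only, with hypothetical trial counts) showing that the exact binomial tail takes no more code than the normal approximation it would replace:

```python
from math import comb, erfc, sqrt

def exact_binom_p(k, n, p=0.5):
    """One-sided P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def normal_approx_p(k, n, p=0.5):
    """Same tail probability via the normal approximation (continuity-corrected)."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    z = (k - 0.5 - mu) / sigma
    return 0.5 * erfc(z / sqrt(2))  # standard normal survival function

# Hypothetical single study: 26 correct out of 40 forced-choice trials
print(exact_binom_p(26, 40))    # exact tail probability, ~0.040
print(normal_approx_p(26, 40))  # normal approximation, ~0.041
```

At these sizes the two agree closely, which supports the "justified for large numbers" claim; the point is that nothing is gained by approximating when the exact version is this cheap.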
 
With respect to publication bias, I never said you didn't consider it, only that you never mention, specifically, the implication about the type of study that is not reported based on that funnel plot.  In any event, that is a lesser concern for me than the above.  I certainly appreciate your comments here, and I hope you understand I am not trying to be a dick in any way (this, after all, is the nature of scientific discourse), but your rebuttal doesn't much impact my previous assessment...

While you are here, on a somewhat related topic, can you comment on the Journal's review policy?  The website says there is a "review board".  Who comprises that board?  How large is it?  Do all reviewers come from this board or are outside experts brought in?

Thanks, I understood a good part of that. ;)

Sorry about that!  I'll try to make it a little more obtuse next time; maybe toss in some formulas...   :D

Offline joshr

Apologies in advance if I don't continue the discussion much. I've just got a long 'to do' list to catch up on.

“what does ‘successfully’ mean? ” – I meant something loosely along the lines of ‘Generated a useful result that led to downstream hypotheses that were also tested successfully.’
How about https://www2.ed.gov/rschstat/eval/tech/evidence-based-practices/finalreport.pdf? This was a massive, well-cited study that has led to a better understanding of the potential benefits & drawbacks of online learning. And it tested hypotheses that were generated from previous meta-studies in the field. But the data had serious heterogeneity issues.
Note that I didn't follow the approach from that paper though. I kept mainly to guidelines in the Cochrane Handbook. I'm just using it as an example.
I fully agree about best effort and rigour in research, and did not mean to imply an ‘others did it worse’ justification. But nor do I think the heterogeneity issues are insurmountable here. The studies were all looking at discrimination between high resolution and standard resolution audio. Almost all looked at it directly, and a couple of others (King 2012 and Repp 2006) had data that could be transformed into that form. All had multiple participants, each performing multiple dichotomous trials. And all yielded (single outcome measure) results where, if differences could always be perceived then one expects 100% discrimination, and if differences could never be perceived then one expects 50% correct discrimination. And almost all tests were forced choice, either same/different or an ABX variant (these two approaches were also treated to subgroup analysis). I’ll also note that a random effects model was used, and it can be easily seen from the main forest plot and associated statistics that heterogeneity is not readily apparent from the results with the training subgroup.
Anyway, this is going back to the ‘apples and oranges’ analogy. Meta-analysis is comparing apples and oranges (two studies using different dependent and independent variables), but that is ok if you are trying to learn about the nature of fruit (both studies looking at the same research question).
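For readers unfamiliar with the random effects model mentioned above, here is a rough sketch of inverse-variance pooling with the DerSimonian-Laird between-study variance estimator (the standard random-effects approach in the Cochrane Handbook). The per-study numbers are made up purely for illustration, not taken from the paper:

```python
from math import sqrt

def dersimonian_laird(effects, variances):
    """Random-effects inverse-variance pooling (DerSimonian-Laird).
    effects:   per-study effect estimates (e.g. proportions correct)
    variances: their within-study sampling variances."""
    w = [1.0 / v for v in variances]                    # fixed-effect weights
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                       # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = 1.0 / sqrt(sum(w_star))
    return pooled, se, tau2

# Hypothetical per-study proportions correct and variances (illustration only)
effects = [0.52, 0.55, 0.49, 0.60]
variances = [0.0012, 0.0008, 0.0015, 0.0010]
pooled, se, tau2 = dersimonian_laird(effects, variances)
print(pooled, se, tau2)
```

When tau2 comes out near zero, the random-effects result collapses toward the fixed-effect one, which is essentially what "heterogeneity is not readily apparent" means in terms of the forest-plot statistics.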
Regarding normal approximation, binomial analysis, etc. First, I wasn’t aware of the full functionality of the ‘meta’ package in R, and so didn’t use it. But I don’t think that use of the normal approximation invalidates any results. Also, the null hypothesis in this case results in exactly what you said, ‘nobody would ever score higher (or lower) than 50% in a sufficiently large number of trials.’ To clarify, suppose the correct answer is randomly A for half the trials and randomly B for the other half, but that there is no way anyone can distinguish between them. Then it doesn’t matter how someone answers, it still converges on 50% correct. And given that, we can give a probability for at least 6736 ‘correct’ results out of 12645 trials.
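Sticking with that null hypothesis: the tail probability for at least 6736 'correct' results out of 12645 trials can be checked in a few lines. This sketch (Python standard library only) sums the exact binomial tail via log-gamma to avoid overflow:

```python
from math import lgamma, log, exp

def log_binom_pmf(k, n, p=0.5):
    """Log of the Binomial(n, p) probability mass at k."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

# P(X >= 6736) for X ~ Binomial(12645, 0.5): the chance of at least
# 6736 correct answers out of 12645 trials if every answer were a guess.
n, k = 12645, 6736
p_tail = sum(exp(log_binom_pmf(i, n)) for i in range(k, n + 1))
print(p_tail)  # vanishingly small, roughly 1e-13
```

The result is around seven standard deviations above the chance mean of 6322.5, so the aggregate count is wildly improbable under pure guessing; the dispute above is about whether pooling trials this way is appropriate, not about the arithmetic.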
But these are minor details. I agree that binomial distribution is preferred, that the aggregate binomial analysis is not the right approach, and that if there is any perceptual difference at all then individuals’ scores are highly correlated (I make note of that in the paper when discussing Meyer 2007). The disagreement is only over the severity and importance of these things. I don’t think the analysis or conclusions are in any sense invalidated, and I still strongly encourage others to revisit the data.

Offline joshr

can you comment on the Journal's review policy?  The website says there is a "review board".  Who comprises that board?  How large is it?  Do all reviewers come from this board or are outside experts brought in?


The editorial staff of the journal are listed at http://www.aes.org/journal/masthead.cfm . They have a much larger pool of reviewers that they pick from, and also use outside experts. I think they aim for a minimum of three reviews per paper. That said, it's always a struggle (as is the case for many journals) to maintain a talented and diverse pool of reviewers, and it's hard to find just the right outside experts. I'm sure that they would welcome more potential reviewers.

Offline aaronji

^^  I think we'll just have to agree to disagree, as we have a fundamental philosophical divide with respect to the tautness of the hypothesis and its relationship to the meaningfulness and interpretability of the results...

Also, the null hypothesis in this case results in exactly what you said, ‘nobody would ever score higher (or lower) than 50% in a sufficiently large number of trials.’ To clarify, suppose the correct answer is randomly A for half the trials and randomly B for the other half, but that there is no way anyone can distinguish between them. Then it doesn’t matter how someone answers, it still converges on 50% correct.


I didn't state it very well for the case of the null being true, but, even when the null holds, the intra-individual results are not truly independent (i.e. not coin flips).  Even in simple tests like this, there are a wide range of subtle individual biases and others introduced by the experimental design.  So that non-independence is a factor in analyses like these and should be accounted for; this is generally difficult to do at the meta-analysis level (although if you had the individual results for all of the studies, you could do it no problem with a linear mixed effects model), but it does complicate the interpretation of the results.
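A toy simulation makes the overdispersion point concrete. Assume (hypothetically) that each subject's hit probability drifts slightly off 0.5 because of individual biases interacting with the design, even with no real perceptual ability. Per-subject scores then vary more than a single binomial would predict, which is exactly the correlation structure a mixed effects model with a per-subject random intercept would absorb:

```python
import random
random.seed(1)

def simulate(n_subjects=200, n_trials=100, bias_spread=0.1):
    """Per-subject proportions correct when each subject has a personal
    bias p_i near 0.5 (chance), so trials within a subject are correlated."""
    rates = []
    for _ in range(n_subjects):
        p_i = 0.5 + random.uniform(-bias_spread, bias_spread)  # subject-specific bias
        correct = sum(random.random() < p_i for _ in range(n_trials))
        rates.append(correct / n_trials)
    return rates

rates = simulate()
mean = sum(rates) / len(rates)
var = sum((r - mean) ** 2 for r in rates) / (len(rates) - 1)
print(mean)              # stays near 0.5 on average ...
print(var)               # ... but exceeds the binomial variance
print(0.5 * 0.5 / 100)   # binomial prediction p(1-p)/n = 0.0025
```

The bias_spread parameter is invented for illustration; the qualitative point is just that correlated within-subject trials inflate the variance of the aggregate, so a test that assumes 12645 independent coin flips understates the uncertainty.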

The editorial staff of the journal are listed at http://www.aes.org/journal/masthead.cfm . They have a much larger pool of reviewers that they pick from, and also use outside experts. I think they aim for a minimum of three reviews per paper. That said, it's always a struggle (as is the case for many journals) to maintain a talented and diverse pool of reviewers, and it's hard to find just the right outside experts. I'm sure that they would welcome more potential reviewers.
 

Interesting.  Thanks.  At least in theory, it works a little differently in my field, in that anyone is a potential reviewer based on relevant experience.  In reality, editors usually have some "go to" people, though.

 
