Friday, October 24, 2014

The Delphi technique as a forecasting tool: issues and analysis

This 1999 meta-analysis in the International Journal of Forecasting gathered all the English-language peer-reviewed journal articles and book chapters that experimentally evaluated the Delphi technique as a structured forecasting method against specific control conditions. The search yielded 27 studies in all, and the researchers produced tables summarizing the methods and findings. Additionally, the researchers contacted the authors of the evaluative studies to comment on the coding and interpretation of each author's own paper. The meta-analysis found that Delphi groups outperform statistical models and unstructured interacting groups. However, no conclusive evidence was found that Delphi outperforms other structured group procedures such as the Nominal Group Technique (NGT). Two studies found that NGT groups make more accurate judgments than Delphi groups, three studies found no notable differences in accuracy between them, and one study showed Delphi superiority. One study found that the Problem Centered Leadership (PCL) approach, which involves instructing group leaders in appropriate group-directing skills, outperforms Delphi. An unintended finding was that generalizing about Delphi from the meta-analysis is difficult because of confounding variables in studies that did not sufficiently control for group, task, and technique characteristics such as panelist expertise and the nature of the feedback used. The researchers therefore conclude that a Delphi conducted according to "ideal" specifications might perform better than its laboratory interpretations suggest, and that future research requires a shift of focus from the final estimative output to analyzing the process of judgment change within groups.

Delphi was explicitly designed for use with experts in cases where a variety of relevant factors (economic, technical, etc.) ensure that individual panelists have limited knowledge and could reasonably benefit from communicating with others possessing different information.  

The majority of Delphi evaluative studies use artificial, easily validated tasks, suffer from sampling problems, and employ simple feedback in place of the meaningful, coherent tasks, genuine experts, and complex feedback for which the technique was designed.

The researchers offer two primary prescriptions for making future evaluative studies more effective: follow a precise definition of Delphi, so the technique is not misrepresented in the laboratory; and develop a much greater understanding of the factors that influence Delphi effectiveness, so the technique's potential utility is not underestimated through inaccurate representations in badly designed scenarios.

The solution the researchers propose is to focus research on how the estimates of an expert panel in round 1 are transformed through the Delphi process into a final-round estimate. This transformation should be measured through changes in judgments over rounds, changes in the individual accuracy of panelists, judgmental intercorrelations, and characteristics such as attrition rate and group size. Through this process, researchers can identify which factors are most important in explaining how and why an individual changes a judgment, and which are related to change in the direction of increased accuracy. Only then can research explaining the differences between structured procedures commence.
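As an illustrative sketch of the process-level measurements the researchers call for — judgment change over rounds, change in individual accuracy, and attrition — one might compute something like the following. All names and numbers here are invented for illustration, not taken from the paper:

```python
# Hypothetical sketch: process-level metrics for a Delphi panel across rounds.
# Rows are rounds, columns are panelists; None marks a panelist who dropped out.
rounds = [
    [10.0, 14.0, 9.0, 22.0],   # round 1 estimates
    [11.0, 13.0, 10.0, None],  # round 2: panelist 4 has dropped out
    [12.0, 12.5, 11.0, None],  # round 3
]
true_value = 12.0  # a resolvable quantity, used to score accuracy

def present(estimates):
    return [e for e in estimates if e is not None]

def mae(estimates):
    """Mean absolute error of the active panelists against the true value."""
    xs = present(estimates)
    return sum(abs(e - true_value) for e in xs) / len(xs)

metrics = []
for prev, curr in zip(rounds, rounds[1:]):
    # Mean absolute revision among panelists present in both rounds.
    pairs = [(p, c) for p, c in zip(prev, curr) if p is not None and c is not None]
    revision = sum(abs(c - p) for p, c in pairs) / len(pairs)
    # Attrition between the two rounds.
    dropouts = len(present(prev)) - len(present(curr))
    metrics.append((revision, mae(prev), mae(curr), dropouts))

for i, (rev, before, after, drop) in enumerate(metrics, start=1):
    print(f"round {i}->{i+1}: revision={rev:.2f}, "
          f"MAE {before:.2f}->{after:.2f}, dropouts={drop}")
```

Tracking revision size, accuracy change, and dropouts per round in this way is what would let a researcher ask which changes actually move the panel toward the truth.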

The four necessary defining attributes of Delphi are anonymity, iteration, controlled feedback, and the statistical aggregation of group response; however, there are numerous ways in which they may be applied. One of the goals of Delphi is achieving greater consensus, usually measured by the variance in the responses of panelists over the course of the Delphi process. While falling variance is a typical trend, a study measuring "post-group consensus" found that Delphi produces little increased agreement and that panelists simply alter their estimates to conform to the group without actually changing their opinions. Another study found that experts with extreme views are more likely to drop out of a Delphi procedure, suggesting that consensus may be due to attrition. Further research is required to determine the extent to which consensus reflects "true consensus" versus conformity pressures.
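The usual variance-based measure of consensus can be sketched as below. The numbers are invented; the point is that shrinking variance is what studies typically report as "consensus," even though, per the article, the same shrinkage can come from conformity or from the dropout of extreme panelists rather than genuine agreement:

```python
# Hypothetical sketch: "consensus" as shrinking variance across Delphi rounds.
def variance(xs):
    """Population variance of a list of estimates."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

round1 = [5.0, 9.0, 12.0, 30.0]   # illustrative first-round estimates
round3 = [9.0, 10.0, 11.0, 12.0]  # tighter spread after feedback

# Variance falls across rounds -- but this alone cannot distinguish true
# opinion change from conformity or attrition of extreme panelists.
print(variance(round1), variance(round3))
```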

The researchers state that the first round of a classical Delphi procedure is unstructured, allowing individual experts to identify and elaborate on the issues they see as most important to solving a problem. The monitor team then produces a structured questionnaire from which the judgments of the Delphi panelists may be elicited in a quantitative manner in subsequent rounds. After each round, the monitor team aggregates the responses and sends them back to the panelists for further consideration. From the third round onwards, panelists have the opportunity to alter prior estimates on the basis of the provided feedback, and if a judgment falls outside the upper or lower quartiles they may be asked to give reasons why they believe their selections are correct against the majority opinion. The procedure continues until the judgments of the panelists attain a designated threshold.
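One feedback step of this classical procedure — aggregate the panel's estimates, then flag the panelists whose judgments fall outside the interquartile range so they can be asked for their reasons — could be sketched as follows. The panelist labels, estimates, and the simple median-of-halves quartile rule are illustrative assumptions, not specifics from the paper:

```python
# Hypothetical sketch of one feedback step in a classical Delphi round.
def quartiles(xs):
    """Lower quartile, median, upper quartile via a simple median-of-halves rule."""
    s = sorted(xs)
    n = len(s)

    def median(seq):
        m = len(seq) // 2
        return seq[m] if len(seq) % 2 else (seq[m - 1] + seq[m]) / 2

    return median(s[: n // 2]), median(s), median(s[(n + 1) // 2 :])

# Illustrative quantitative estimates from five anonymous panelists.
estimates = {"A": 8.0, "B": 10.0, "C": 11.0, "D": 12.0, "E": 25.0}

q1, med, q3 = quartiles(list(estimates.values()))
# Statistical feedback returned to the whole panel for the next round.
feedback = {"median": med, "lower_quartile": q1, "upper_quartile": q3}
# Panelists outside the interquartile range are asked to justify their estimates.
asked_for_reasons = [p for p, e in estimates.items() if e < q1 or e > q3]
print(feedback, asked_for_reasons)
```

In a real Delphi the monitor team would repeat this step each round, feeding the aggregate (and any collected reasons) back to the anonymous panel until the designated stopping threshold is reached.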

The majority of the studies in the meta-analysis used structured first rounds in which event statements devised by the researchers were presented to panelists for assessment, with no unstructured round in which panelists could indicate the issues they believed to be of the greatest importance on the topic. Other studies used almanac questions, such as estimating the diameter of planets in the solar system or the tonnage of a certain material shipped from New York in a certain year, which are inappropriate for Delphi.

Studies in the article showed that panels composed of experts tend to benefit from a Delphi procedure to a greater extent than groups of novices, and that laboratory studies sampling from homogeneous groups underestimate the value of Delphi. Technique-comparison studies ask whether Delphi works, yet use techniques that differ from one study to the next and that deviate from the intended purpose of Delphi.

Pertaining to the role of feedback in improving forecasting accuracy, one study found that different types of feedback have different effects. A study comparing iteration alone, statistical feedback (means and medians), and 'reasons' feedback (with no averages) found that the greatest improvement in accuracy over the course of Delphi occurred in the 'reasons' condition. Although participants were less likely to change their forecasts under the 'reasons' condition, when they did change their forecasts they became more accurate, which was not the case for the iteration and statistical treatment groups. Another study found that feedback combining reasons, the median, and the range of estimates produced more accurate forecasts than providing only a median and range.

Previous research used many technique formats to represent Delphi, varying on every aspect such as the type of feedback used, selection of panelists, and types of questions. Therefore, using Delphi as originally intended may lead to greater enhancement of accuracy than is reflected in the articles in the meta-analysis. Whenever an experiment changes an analytic step in Delphi procedures shown to influence the performance of the technique, the experiment is essentially studying a different technique. 
This is a comprehensive meta-analysis. Of the 27 experimental designs included in the meta-analysis, none followed the procedures of a classical Delphi group as described in the article. The researchers detail the methodological features of Delphi in each experimental study in four tables spanning multiple pages, and sought feedback from the original authors to ensure that the meta-analysis coded and interpreted the procedures and findings of each study appropriately. Despite the problems of representing Delphi under experimental conditions, the existing literature finds that the formats used to represent Delphi produce more accurate forecasts than unstructured group interaction. Nevertheless, no definitive empirical comparisons to other structured techniques can be made until an evaluative study sufficiently controls for group, task, and technique characteristics such as panelist expertise, the nature of the feedback used, the structure of a designated number of rounds, and the type of resolvable questions asked of the group using Delphi.

The Delphi technique as a forecasting tool: issues and analysis. Gene Rowe, George Wright (1999)