Monday, October 27, 2014

Summary of Findings: Delphi Technique (4 out of 5 stars)

Note: This post represents the synthesis of the thoughts, procedures and experiences of others as represented in the 5 articles read in advance (see previous posts) and the discussion among the students and instructor during the Advanced Analytic Techniques class at Mercyhurst University in October 2014 regarding the Delphi Technique specifically. This technique was evaluated based on its overall validity, simplicity, flexibility and its ability to effectively use unstructured data.

The Delphi technique is a method that relies on expert and group knowledge to make more accurate forecasts from incomplete information. Individual forecasts are collected over a series of rounds: after each round, the panelists' responses are anonymized and dispersed to the rest of the group for consideration, and new individual forecasts are given.

The RAND Corporation created the Delphi technique to support accurate decision making in the face of incomplete information. There is a substantial body of research on the validity of the Delphi technique dating back to its creation in the 1950s, but the methodologies scholars have used to test Delphi's effectiveness have varied in almost every study.

Strengths:
1. Is conducted in writing or electronically and does not require face-to-face meetings
2. Helps generate consensus or identify divergence of opinions among group members
3. Participants are relatively free of social pressure, influence, and dominance from other group members
4. Anonymous responses allow respondents to keep their opinions until they are comfortable changing an estimate
5. Is inexpensive

Weaknesses:
1. Adequate time may not be given to the problem, and consensus may not be obtained
2. Participants may ignore feedback
3. The group may lack clearly defined experts
4. Requires adequate time and participant commitment
5. More time consuming than other group methods
6. Broad guidelines: there are at least 27 different ways to conduct the method

Step by Step:  
  1. Use a group of 5-20 heterogeneous experts or people with appropriate knowledge of the subject.
  2. The process must be systematic, particularly in its use of anonymous feedback and a controlled method of dispersing responses and feedback.
  3. A minimum of three iterations should be conducted, with polling continuing until responses stabilize.
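The iterative core of these steps can be sketched in code. This is a minimal illustration only, not any particular Delphi software: the function names, panel estimates, and stability rule below are assumptions for demonstration.

```python
import statistics

def delphi_round_feedback(estimates):
    """Aggregate one round of anonymous estimates into controlled feedback."""
    q1, median, q3 = statistics.quantiles(estimates, n=4)
    return {"median": median, "lower_quartile": q1, "upper_quartile": q3}

def responses_stable(previous, current, tolerance=0.05):
    """Declare stability when the group median moves less than `tolerance`
    (as a fraction of the previous median) between rounds."""
    prev = statistics.median(previous)
    curr = statistics.median(current)
    return abs(curr - prev) <= tolerance * abs(prev)

# Illustrative estimates from a five-person panel over three rounds.
rounds = [
    [10, 40, 25, 60, 15],  # round 1: wide spread
    [20, 35, 25, 45, 22],  # round 2: after seeing feedback
    [24, 30, 26, 33, 25],  # round 3: near consensus
]
for i, estimates in enumerate(rounds, start=1):
    fb = delphi_round_feedback(estimates)
    print(f"Round {i}: median={fb['median']}, "
          f"IQR=({fb['lower_quartile']}, {fb['upper_quartile']})")
```

In practice the stability check would be applied only after the minimum three rounds, and the feedback would be accompanied by panelists' written reasons, not just the statistics.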

We used the Delphi Decision Aid online software to conduct three 5-minute rounds of Delphi to forecast how many second-year Applied Intelligence graduate students will have at least one full-time job offer in an intelligence-related field by graduation, and how many thesis pages second-year Applied Intelligence students will have completed, on average, by October 29. The first round also contained a ranking question asking panelists to rank their expertise on various topics to inform the Delphi questions in subsequent rounds. Subsequent rounds asked the two original questions in addition to predicting the outcome of the National Football League AFC division this season, how many selfies Kim Kardashian will have in her book scheduled for publication in April 2015, and what the S&P 500 index will be in early November. After each round, the panel had a few minutes to review the round's feedback: a statistical aggregation of responses and written comments explaining why panelists made the estimates they did.

What did we learn from the Delphi Exercise?
  1. Delphi works well with broad questions where the expertise of one person is not sufficient to encompass the entire scope of the question.
  2. Literature suggests that panelists tend to perform poorly on questions asking them to rank various items from best to worst and that self-reported expertise is not a best practice for panel selection.
  3. Delphi is designed to collect expert estimates in cases where a variety of relevant factors (economic, technical, etc.) ensure that individual panelists have limited knowledge and could reasonably benefit from communicating with other experts possessing different information.
  4. Unlike in prediction markets, estimates from panelists do not have to be quantitative.


Friday, October 24, 2014

The Delphi technique as a forecasting tool: issues and analysis

This 1999 meta-analysis in the International Journal of Forecasting gathered all the English-language peer-reviewed journal articles and book chapters that experimentally evaluated the Delphi technique as a structured forecasting method under specific control conditions. The search yielded 27 studies in all, and the researchers produced tables summarizing the methods and findings. Additionally, the researchers contacted the authors of the evaluative studies to comment upon the coding and interpretation of each author's own paper. The meta-analysis found that Delphi groups outperform statistical models and unstructured interacting groups. However, no conclusive evidence was found that Delphi outperforms other structured group procedures such as the Nominal Group Technique (NGT). Two studies found that NGT groups make more accurate judgments than Delphi groups, three studies found no notable differences in accuracy between them, and one study showed Delphi superiority. One study found that the Problem Centered Leadership (PCL) approach, which involves instructing group leaders in appropriate group-directing skills, outperforms Delphi. An unintended finding was that generalizing about Delphi from the meta-analysis is difficult because of confounding variables in studies that did not sufficiently control for group, task, and technique characteristics such as panelist expertise and the nature of feedback used. Therefore, the researchers conclude that a Delphi conducted according to "ideal" specifications might perform better than laboratory interpretations suggest. They also conclude that future research requires a shift of focus from the final estimative output to analyzing the process of judgment change within groups.

Delphi was explicitly designed for use with experts in cases where a variety of relevant factors (economic, technical, etc.) ensure that individual panelists have limited knowledge and could reasonably benefit from communicating with others possessing different information.  

The majority of Delphi evaluative studies tend to use easily validated artificial tasks, suffer from sampling problems, and employ simple feedback in place of meaningful and coherent tasks, genuine experts, and complex feedback.

The researchers offer two primary suggestions for making future evaluative studies more effective: follow a precise definition of Delphi, so the technique is not misrepresented in the laboratory, and develop a much greater understanding of the factors that influence Delphi's effectiveness, so the technique's potential utility is not underestimated through inaccurate representations in badly designed scenarios.

The solution the researchers propose is focusing research on the way in which the estimates from an expert panel in round 1 are transformed through the Delphi process into a final round estimate. The transformation must be measured through changes over rounds in judgments, changes in the individual accuracy of panelists, judgmental intercorrelations, and characteristics such as attrition rate and group size. Through this process, researchers will identify which factors are the most important in explaining how and why an individual changes a judgment and which are related to change in the direction of increased accuracy. Only after that can research explaining the differences between structured procedures commence. 

The four necessary defining attributes of Delphi are anonymity, iteration, controlled feedback, and the statistical aggregation of group response; however, there are numerous ways in which they may be applied. One of the goals of Delphi is achieving greater consensus, usually measured by the variance in panelists' responses over the course of the Delphi process. While this is a typical trend, a study measuring "post-group consensus" found that Delphi produces little increased agreement and that panelists simply alter their estimates to conform to the group without actually changing their opinions. Another study found that experts with extreme views are more likely to drop out of a Delphi procedure, suggesting that consensus may be due to attrition. Further research is required to determine the extent to which consensus reflects "true consensus" versus conformity pressures.
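The variance-based measure of consensus is easy to compute directly. A minimal sketch, using purely illustrative estimate values:

```python
import statistics

# Consensus is commonly operationalized as shrinking variance in the
# panel's estimates across rounds (these estimate values are illustrative).
rounds = [
    [10, 40, 25, 60, 15],  # round 1
    [20, 35, 25, 45, 22],  # round 2
    [24, 30, 26, 33, 25],  # round 3
]
variances = [statistics.variance(r) for r in rounds]
print(variances)  # falling variance looks like consensus, but as the
                  # studies above note, it may only reflect conformity
                  # pressure or attrition rather than changed opinions
```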

The researchers state that the first round of a classical Delphi procedure is unstructured to allow individual experts to identify and elaborate on the issues most important to solving a problem. The monitor team then produces a structured questionnaire from which the judgments of the Delphi panelists may be elicited in a quantitative manner in subsequent rounds. After each round, the monitor team aggregates the responses and sends them back to the panelists for further consideration. From the third round onwards, panelists have the opportunity to alter prior estimates on the basis of the provided feedback, and, if a judgment falls outside the upper or lower quartiles, they may be asked to give reasons why they believe their selections are correct against the majority opinion. The procedure continues until the judgments of the panelists attain a designated threshold.
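The quartile-based challenge step described above can be sketched as follows; the function name and panel estimates are illustrative, not from the article.

```python
import statistics

def panelists_to_challenge(estimates):
    """Return (panelist_index, estimate) pairs falling outside the
    interquartile range; in a classical Delphi these panelists would be
    asked to justify holding a position against the majority."""
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    return [(i, e) for i, e in enumerate(estimates) if e < q1 or e > q3]

print(panelists_to_challenge([10, 40, 25, 60, 15]))  # → [(0, 10), (3, 60)]
```

Anonymity is preserved because the index identifies a response slot to the monitor team only; the reasons collected are fed back to the panel without attribution.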

The majority of the studies in the meta-analysis contained structured first rounds in which event statements devised by researchers were presented to panelists for assessment, with no opportunity for panelists to indicate, via an unstructured round, the issues they believe to be of the greatest importance on the topic. Other studies used almanac questions involving estimating the diameter of planets in the solar system or the tonnage of a certain material shipped from New York in a certain year, which are inappropriate for Delphi.

Studies in the article showed that panels composed of experts tend to benefit from a Delphi procedure to a greater extent than groups of novices, and that laboratory studies sampling from homogeneous groups underestimate the value of Delphi. Technique-comparison studies ask whether Delphi works, yet use techniques that differ from one study to the next and that deviate from the intended purpose of Delphi.

Pertaining to the role of feedback in improving forecasting accuracy, one study found that different types of feedback have different effects. A study comparing iteration, statistical feedback (means and medians), and 'reasons' feedback (with no averages) found that the greatest degree of improvement in accuracy over the course of Delphi occurred in the 'reasons' condition. Although participants were less likely to change their forecasts under the 'reasons' condition, when they did change their forecasts they became more accurate, which was not the case for the iteration and statistical treatment groups. Another study found that feedback combining reasons, the median, and the range of estimates produced more accurate results than providing only a median and range.

Previous research used many technique formats to represent Delphi, varying in every aspect, such as the type of feedback used, the selection of panelists, and the types of questions. Therefore, using Delphi as originally intended may lead to greater enhancement of accuracy than is reflected in the articles in the meta-analysis. Whenever an experiment changes an analytic step in Delphi procedures shown to influence the performance of the technique, the experiment is essentially studying a different technique.
This is a comprehensive meta-analysis. Of the 27 experimental designs factored into the meta-analysis, none followed the procedures of a classical Delphi group as referenced in the article. The researchers describe the methodological features of Delphi in each experimental study in four tables across multiple pages and sought feedback from the original authors to ensure that the meta-analysis coded and interpreted the procedures and findings of each study appropriately. Despite problems in representing Delphi under experimental conditions, the existing literature finds that the formats used to represent Delphi produce more accurate forecasts than unstructured group interaction. Nevertheless, no definitive empirical comparisons to other structured techniques can be made until an evaluative study sufficiently controls for group, task, and technique characteristics such as panelist expertise, the nature of feedback used, the structure of a designated number of rounds, and the type of resolvable questions asked of the group using Delphi.

Rowe, Gene, and George Wright (1999). The Delphi technique as a forecasting tool: issues and analysis. International Journal of Forecasting.