Intercollegiate specialty board oral examinations in surgery: purpose, content and marking criteria

The traditional viva has been much criticised in the literature for lack of reliability and validity, but its use has continued in some postgraduate exams in the UK and Commonwealth countries. This study was designed to identify what examiners in nine sub-specialties of one discipline (surgery) thought they were testing in a viva and to introduce standardisation into the existing five-point marking scale used to grade candidates’ performance in the viva. Three hundred and one surgical examiners took part in one of eight similar workshops that used a plenary method to identify what the examiners thought they were assessing in the viva and a small group approach to identify descriptors for each point on the five-point marking scale. The examiners identified what they were assessing, and their answers were then categorised into three meta-competencies: professional ability/patient care; knowledge and judgement; and quality of response. The examiners used their experience of examining to identify descriptors for each point on the existing five-point rating scale. The descriptors were grouped under one of the three meta-competencies to create marking criteria for the new scenario-based oral exam that will be part of a larger clinical exam. The meta-competencies are similar to those identified for other postgraduate oral exams. The study is reported because both its methodology and the meta-competencies being assessed may have wider application.


Introduction
The oral examination (or viva voce) has been a traditional part of the examiner's toolkit for many years. Examiners, sometimes working in pairs, ask the candidate random questions. Both the selection of questions and the scoring of the candidate's answers are at the discretion of the examiners involved in the viva. The viva is usually unstructured and does not have pre-validated questions and answers. The topics assessed, the level of difficulty of the questions asked, and the amount of help or prompting the candidate receives may vary widely (Davis & Karunathilake, 2005). The exam may thus be unfair to individual candidates.
There is evidence from studies of viva-type oral exams that, in addition to knowledge, the format tends to assess candidate traits, such as personality (Bull, 1959; Holloway et al., 1967; Holloway et al., 1968); verbal style and dress (Rowland-Morin et al., 1991; Burchard et al., 1995); ethnicity (Roberts et al., 2000); and social background and gender (Esmail & May, 2000). This raises the question of what is being tested in the viva voce. Other concerns include cost effectiveness (McGuire, 1966), and examinee stress and acceptability (Jolly & Grant, 1997; Schiff, 2001). Norman (2000) reported that the American exam boards discontinued the viva about 30 years ago because of these problems. The UK and other Commonwealth countries, however, continued with the viva for some exams. The Joint Committee on Intercollegiate Examinations (JCIE) represents the Royal Colleges of Surgery and is the body responsible for the postgraduate surgical sub-specialty examinations in the United Kingdom and Ireland: the Intercollegiate Specialty Board (ISB) exams. To identify why the ISB continued to use vivas, we asked the ISB examiners what they valued about the viva and what they assessed in the exam. Using the answers to these two questions, we developed meta-competencies for the new ISB oral exams that replace the viva. A small group approach was used to develop marking descriptors for each point on a five-point scale for each meta-competency. The ISB refined the descriptors to create marking criteria. The new ISB oral exams take place within a structured, standardised clinical exam, part of which is scenario-based (oral exams) and part of which is based on real, standardised or simulated patients. The development of the structured, standardised clinical exam will be reported at a later date.

Methods
Between August 2004 and February 2005, eight similar workshops were held for examiners in the ISB exams. In all, 301 ISB examiners attended a workshop. Each examiner was from one of the nine sub-specialties of the JCIE (i.e. general surgery, plastic surgery, cardio-thoracic surgery, urology, paediatric surgery, oral and maxillofacial surgery, otolaryngology, neurosurgery, and trauma and orthopaedic surgery). The number of examiners per workshop varied from 16 to 57 (mean = 38). Examiners worked in plenary session and in small groups during the workshops. Some small groups comprised examiners from a single sub-specialty, while others were made up of examiners from more than one sub-specialty.
At plenary brainstorming sessions, the examiners were asked: what do you assess at the oral exam? The answers were written on a flip chart during the sessions. Later, the answers were refined by removing duplicates and categorised to identify the meta-competencies that the examiners assessed at the oral exam.
Small groups of approximately eight examiners were asked to develop a scenario and questions for a viva lasting approximately five minutes. Two groups then joined to form a group of 16. This group of 16 role-played two vivas, one developed by each group of eight. Two examiners from the group that developed the scenario questioned a simulated candidate, who was a volunteer from the second group. Fifteen examiners scored each viva. The simulated candidate for one of the two vivas was asked to role-play a borderline candidate. All 15 examiners were asked to reveal their scores. In no case was the scoring unanimous. In every case where a borderline candidate was simulated, some of the examiners awarded a passing score and some a failing score. The sole purpose of this exercise was to convince the examiners that the scoring system needed marking descriptors to improve inter-rater reliability. The examiners, working in groups of 16, were then asked to use their experience to provide descriptors for each point on the five-point (4 to 8) rating scale currently used in the ISB exams.
After all the workshops had been held, the descriptors generated by the different examiner groups were collated to form a single set of descriptors for each rating point. Duplicates were removed. The rating descriptors were then categorised under the meta-competencies that the examiners had identified, to formulate scoring criteria. These criteria were further refined by the JCIE for ease of use and were agreed by the JCIE for use in the ISB exams.
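The check applied when the examiners revealed their scores can be sketched in a few lines of Python. This is an illustration only, not part of the workshop method: the function name `summarise_scores`, the example scores and the pass mark of 6 are all assumptions. The pass mark is inferred from the pass-fail borderline lying between rating points 5 and 6 on the 4 to 8 scale.

```python
from collections import Counter

# Assumption: 6 is the lowest passing score, inferred from the pass-fail
# borderline lying between rating points 5 and 6 on the 4 to 8 scale.
PASS_MARK = 6


def summarise_scores(scores, pass_mark=PASS_MARK):
    """Summarise the scores awarded by a panel of examiners for one viva."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(s not in range(4, 9) for s in scores):
        raise ValueError("scores must lie on the 4 to 8 scale")
    passes = sum(1 for s in scores if s >= pass_mark)
    return {
        # How many examiners awarded each rating point.
        "distribution": dict(sorted(Counter(scores).items())),
        # True only if every examiner awarded the same score.
        "unanimous": len(set(scores)) == 1,
        "passes": passes,
        "fails": len(scores) - passes,
        # True if the panel disagreed on the pass-fail decision itself.
        "split_decision": 0 < passes < len(scores),
    }


# Hypothetical scores from 15 examiners for a simulated borderline candidate;
# these are invented for illustration and are not data from the workshops.
example_scores = [5, 6, 5, 6, 6, 5, 7, 5, 6, 5, 6, 6, 5, 5, 6]
summary = summarise_scores(example_scores)
print(summary)
```

A panel whose summary shows `split_decision` as true has disagreed about whether the candidate passed, which is the situation the role play was designed to expose.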

The meta-competencies
Table 1 shows a summary of what the examiners said they were assessing in the oral exam, categorised into meta-competencies.

The marking descriptors
All the descriptors that the examiners identified for each rating point are available from the first named author on request. Duplicate descriptors were removed. The remaining descriptors for each point on the rating scale were categorised under one of the three meta-competencies.
Table 2 shows the descriptors for each rating point, linked to the three meta-competencies.

Refinement for ease of use
The JCIE and the authors refined the marking descriptors and their wording to develop the final marking criteria, including the 'questions - answers - prompting' subcategories under 'quality of response', shown in Table 3.

Discussion
Much of the published work on postgraduate oral exam scoring rubrics is related to general practice. Wakeford et al. (1995) report that the general practice oral examiners identified "the candidates' approach to practice, their decision making skills and their justification for their decisions" as the areas assessed by their oral examination. Of the three meta-competencies in Table 1 (i.e. overall professional capability/patient care; knowledge and judgement; and quality of response), the first two encapsulate what Wakeford et al. (1995) identified. As reported by Ryding & Murphy (1999), Reinhart's (1995) review of oral examinations identifies 'assessing higher order thinking' as one of the key areas. Ryding & Murphy (1999) further state that Libert et al. (1993) used the oral examination at Harvard Dental School to assess the ability to think independently; to synthesise interdisciplinary information rapidly; and to exercise sound clinical judgement. All of these findings agree with what the surgical examiners identified in the present study.
As for the third meta-competency in Table 1 (i.e. quality of response), Wakeford et al. (1995) lay down specific guidelines for the examiners on how the questions should be selected and phrased, the answers to be expected from the candidates, and prompting. 'Questions, answers and prompting' in Table 3 are the three areas that the surgical examiners identified in relation to 'quality of response'.
The general practice examiners use a nine-point rating scale, whereas the surgical examiners use a five-point rating scale. We discussed the potential to change to a nine-point scale, but the examiners said they would find the change difficult as they were already calibrated on the five-point scale. The descriptors in the five-point surgical exam scale are more elaborate than the single-phrase descriptors described by Yaphe & Street (2003). More elaborate descriptors will especially help examiner decisions around the pass-fail borderline, i.e. rating points 5 and 6 in Table 3. Scales with too many rating points may also become unwieldy and may produce poor inter-rater reliability (Gray, 1996). There are strong similarities between the meta-competencies assessed and the scoring rubrics used in the oral examinations in surgical sub-specialties and in general practice. These meta-competencies may have wider application in postgraduate exams in other disciplines. If such application in other disciplines is confirmed, it may be possible to develop a generic model for medical postgraduate oral exams that will, in part, resuscitate the reputation of the oral exam.

Practice points
• Viva voces are notoriously unreliable forms of assessment.
• Identification of what is being assessed in an oral exam may improve reliability.
• There is similarity between what general practitioners and surgeons say they assess in oral exams.
• The viva role play methodology demonstrated to the examiners the need for marking criteria.
• Identification of descriptors for rating scales may improve inter-rater reliability.