Testing is an interesting and challenging area for teachers. Constructing tests is not a task to be undertaken lightly. It is easy to create tests that lack face or content validity, do not test what they claim to test, or are unreliable.
Below you will find an example of an oral assessment prepared for a very specific context. While this test might not suit your context, it might give you some ideas for your own test writing.
There is also a document for an Oral Placement Interview, which I used in Tallinn with the police and border guards and at IH Riga.
The Test Writing Brief
To write an end-of-year oral proficiency test [and new assessment scale] for teenagers at a large private language school in Riga.
The previous oral test was simply a choice of topics written on slips of paper. Each pair of students selected a slip of paper and then talked about the topic for about five minutes. The teacher listened, perhaps asked questions, and then evaluated the students on an assessment scale.
The class size is approximately 10. Lessons last 90 minutes (2 x 45 mins). The spoken test is part of the final year proficiency test, which also includes writing, use of English, reading and listening papers. The test should be administered in class time [1 lesson of 90 mins] together with the listening paper, which takes approximately 15 mins to administer. With about five pairs at roughly 10 minutes each, the spoken test takes around 50 minutes, which leaves enough time in the lesson for the listening paper.
The test should:
- be delivered by one teacher acting as interlocutor and assessor
- take approximately 10 minutes to administer
- be taken by pairs of students
- be multilevel
- be ‘light’ in terms of materials
The test covers Intermediate to Advanced levels. It rests on the principle that the complexity of the question does not determine the level of the test; rather, the complexity of the response determines the level of the candidate. Simple questions can elicit complex responses. In a multilevel test, all candidates should be able to understand the questions.
The examiner can extend the script to develop the interaction between candidates and ‘push’ higher level candidates to extend their responses.
The materials consist of one page of examiner script and one page of candidate materials.
The test has three parts, followed by the close of the interview.
- Introductory Interview: The examiner asks the candidates two simple questions to settle them.
- Pair task: Candidates are given a prompt card and asked to talk about the topic. The candidates have 30 seconds to gather their thoughts before they start talking. The examiner listens and times the interaction.
- Extension interview: The examiner takes back the topic card and then asks the candidates further questions on the topic. The examiner can ask unscripted questions based on what the candidates say, both to develop the interaction and to encourage a particular candidate to make more extended contributions, so that a sufficient sample of language is elicited.
- End of interview: The examiner closes the interview by thanking the candidates, and they leave the room. The examiner then rates the candidates according to the rating scale.
The Rating Scale
There are a number of options for rating scales. One is to design a rating scale for all levels [like the 0-9 IELTS rating scale], another is to design a rating scale for a particular level [like the Cambridge CAE rating scale] and a third is to design a rating scale tied to the examiner/teacher expectations of an average candidate performance at a particular level.
This last approach is the one taken here: each candidate is given a score out of 25 across five criteria. This is mainly because the test result is used in student certificates, and lower level candidates are looking for results that reflect their ability at their own level, not compared with higher level candidates. A pre-intermediate candidate and an advanced candidate could both score 20/25. Results are meant to be comparable within a level, not between levels. The rating scales are explicitly tied to the average performance expected at that level, and this uses [and values] the professional expertise of the teacher.
A standardisation session in which teachers rate sample tests is, of course, also important to make sure that the teachers of an institution have roughly the same expectations of what an average performance at the specified level means. To achieve this, sample tests should be recorded with pairs of students from different levels, and the teachers should agree on ratings for them. The first such session serves as the benchmarking exercise and provides the agreed scores to be used in subsequent years. Publicly available videos from examining boards like Cambridge ESOL, now Cambridge English, could also be used to adjust expectations and provide a link to the CEF and ALTE levels.
This criterion evaluates how well the candidate carries out the tasks of the test: answering questions and discussing a topic. The appropriacy of the length of turns and their coherence are also evaluated here.
This criterion evaluates how well the candidate interacts with the other candidate and the examiner. Higher level candidates will use more sophisticated means to develop and maintain the interaction.
Grammar: Accuracy and Range
This criterion evaluates the candidate’s use of a range of structures and their accuracy, related to their level.
Vocabulary: Accuracy and Range
This criterion evaluates the candidate’s range of vocabulary and its accuracy, related to their level.
Fluency and Pronunciation
This criterion is perhaps the most problematic. Fluency is usually rated separately from pronunciation, but fluency is not just speed of delivery. It is the complex outcome of a mix of factors: correctly chunked tone units, which make up utterances, are the key to fluency, and further factors are the number of hesitations [and where they occur] and the length of turns.
False fluency is characterised by fast speech which can mask ungrammatical and badly chunked tone units. Fluency is a function of the correct linking and assimilation, stress and intonation within the tone unit and correct pauses between them.
Pronunciation is to some extent independent of level. A beginner could have very good pronunciation of learned and memorised phrases, while a higher level candidate may have patches of disfluency because of the demands of the task.