Testing Summer Camp Effectiveness

They want me to do this using a recorded mock TOEIC oral test format, before and after the summer camp. 80 students, so AFAICT about 3 hours of recordings per tester, assuming they get 4 testers.

Given that you have to assess the recordings, that seems to translate into a pretty solid day’s work for 4 people. I wonder why they think it’s worth it. Maybe window-dressing for the MOE?
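
For what it’s worth, here’s the back-of-envelope arithmetic (the minutes of audio per student is my assumption, reverse-engineered from the 3-hour figure):

```python
# Rough workload estimate. The 9 minutes of usable audio per student is
# an assumption chosen to match the ~3 hours per tester quoted above.
students = 80
testers = 4
minutes_per_student = 9            # assumed
administrations = 2                # pre-camp and post-camp

per_tester = students // testers                      # 20 students each
audio_hours = per_tester * minutes_per_student / 60   # per administration
print(f"Audio per tester, per round: {audio_hours:.1f} h")    # 3.0 h
print(f"Both rounds: {audio_hours * administrations:.1f} h")  # 6.0 h

# Scoring against multiple criteria runs well over real time, so a
# solid day's work per tester looks about right.
```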

I’ve never done TOEIC, but from a quick look it seems quite an elaborate and onerous protocol.

toeic.com.tw/sw/file/TOEIC_S … ndbook.pdf

I dunno if it’s intended to run the test “blind”, but I’d guess it’d be guaranteed to have no validity otherwise.

Seems optimistic to expect a summer camp to have much detectable impact on basic speaking ability, but I suppose if it reduces shyness, the effect on willingness to speak might be detectable.

Not been involved with anything like this before, and not sure I want to be now.

Any thoughts?

I work at a 12-instructional-day summer camp (teaching Chinese, not English) every summer. We are focusing on tests of structure (grammar) in output. We will be pre-testing spontaneous output of 40 or so patterns that use extremely simple vocabulary (because we’re interested in breakdowns due to grammar, not lack of vocabulary) and then post-testing after our camp. It’s easier to score, but doesn’t have the cachet of being a “real” test.

We do give oral post-tests to our zero-Chinese beginners (no pre-test needed, they know nothing), requiring a brief spontaneous speech sample, a reading test, and a writing (composition) test. The more advanced kids are tested on writing as well, but not on speaking or reading, because of the difficulty of giving them a meaningful pre-test. This is the first year we’re going to expand our structure testing to the more advanced group, to see if there is any meaningful effect.

The number one thing that most administrations miss out when they schedule testing is the question of what they really want to know, as you’re pointing out. They should be testing the thing their instruction is designed to improve, because the whole reason for testing is to see whether it really did anything or not. It’s scary to think of several days of work for several people, all for – what?

[quote=“ironlady”]

The number one thing that most administrations miss out when they schedule testing is the question of what they really want to know, as you’re pointing out. They should be testing the thing their instruction is designed to improve, because the whole reason for testing is to see whether it really did anything or not. It’s scary to think of several days of work for several people, all for – what?[/quote]

Thanks. You seem to share some of my reservations.

I’ve been making myself unpopular by asking the “What’s this for?” question, and haven’t had much of a reply beyond the repetition of “Key Performance Indicator” as a sort of mantra.

I’ve also suggested that if they’re going to do it, they need to use blind testing. No response to that so far. They might not have understood, or they might be concerned that it’ll stop them getting the result they want, if MOE window-dressing is indeed the objective.
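
For concreteness, blind rating here would just mean raters score anonymised files without knowing whether a recording is pre- or post-camp. A minimal sketch of how the files could be prepared (the folder layout and filenames are hypothetical):

```python
# Hypothetical sketch: strip pre/post labels from recordings and give
# raters a shuffled, opaque file list; keep the key separate so scores
# can be matched back after rating.
import csv
import random
import shutil
from pathlib import Path

src = Path("recordings")   # assumed layout: recordings/pre/*.mp3, recordings/post/*.mp3
dst = Path("blinded")
dst.mkdir(exist_ok=True)

files = sorted(src.glob("*/*.mp3"))
random.shuffle(files)

with open("key.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["blind_id", "original_path"])
    for i, path in enumerate(files):
        blind_id = f"{i:04d}.mp3"
        shutil.copy(path, dst / blind_id)   # raters only ever see blinded/
        writer.writerow([blind_id, str(path)])
```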

The details of the proposed testing aren’t clear, which is understandable given we’re in the end-of-term grading marathon, but the outline of the tasks and criteria proposed for the four testers is as follows (the cumulative rating load is sketched in code after the list):

1. Read a text aloud: pronunciation, intonation, and stress
2. Describe a picture: all of the above, plus grammar, vocabulary, and cohesion
3. Respond to questions: all of the above, plus relevance of content and completeness of content
4. Respond to questions using information provided: all of the above
5. Propose a solution: all of the above
6. Express an opinion: all of the above
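
Each task inherits the previous task’s criteria, so the rating load snowballs. A toy sketch of what that “all of the above, plus …” structure means per task (structure assumed from the outline, not from the TOEIC handbook):

```python
# Toy sketch of the cumulative rubric implied by "all of the above, plus ...".
tasks = [
    ("Read a text aloud", ["pronunciation", "intonation and stress"]),
    ("Describe a picture", ["grammar", "vocabulary", "cohesion"]),
    ("Respond to questions", ["relevance of content", "completeness of content"]),
    ("Respond to questions using information provided", []),
    ("Propose a solution", []),
    ("Express an opinion", []),
]

criteria = []
for name, added in tasks:
    criteria += added   # each task adds to the running set
    print(f"{name}: {len(criteria)} criteria -> {criteria}")
```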

Seems likely to be non-trivial to me. Also seems unlikely that grammar, vocabulary and cohesion, for example, are going to be significantly improved by a 7-day “cultural exchange” programme.

At the risk of sounding like I’m trying to re-invent the Heisenberg Principle: IF you got a (statistically? t-test?) significant improvement on the second test, how do you know it’s not due to the practice effect of the successive testing?

The test, after all, is inherently more relevant to itself than it is to the summer school activities.
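
To make the statistical point concrete: a paired t-test on pre/post scores only tells you the scores moved, not why. A sketch with invented numbers; attributing any gain to the camp rather than to practice would need a control group that sits both tests without attending:

```python
# Illustrative only: invented scores. A significant paired difference
# shows improvement between the two sittings, not that the camp caused it.
from scipy.stats import ttest_rel

pre  = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]   # hypothetical pre-camp scores
post = [14, 16, 12, 15, 15, 17, 13, 16, 15, 14]   # hypothetical post-camp scores

t, p = ttest_rel(post, pre)
print(f"t = {t:.2f}, p = {p:.4f}")

# Without a control group (same two tests, no camp), any gain here is
# confounded with the practice effect of sitting the test twice.
```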

If you don’t get a statistically significant improvement, so what?

Is that evidence that the Summer School is a waste of time?

If it is, are the “clients” going to have a use for that unpalatable conclusion?

There’s no point in asking a question if you can’t interpret or use the answer.

Worst fears almost fully realised. That’ll teach me NOT to give them the benefit of any tiny amount of doubt.

I had to score two questions (out of a total of 11) for pronunciation, intonation and stress, grammar, vocabulary, cohesion, relevance, and completeness. Each question generated 136 one-minute audio-response files. For one question they had to describe a picture; for the other they had to justify a choice between fast-food and sit-down (?) restaurants.

I find this kind of thing extremely difficult to do consistently in a single pass, but I only did one pass (plus some checking), since it was a complete waste of time anyway. Even so, it took me a full day and some change.
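
The raw listening time alone accounts for most of that day (the re-listening multiplier is my guess):

```python
# Back-of-envelope: why two questions ate a full day.
files_per_question = 136     # per the count above
questions = 2
minutes_per_file = 1

listening_h = files_per_question * questions * minutes_per_file / 60
print(f"Raw listening: {listening_h:.1f} h")    # ~4.5 h

# Scoring seven criteria per file means pausing and re-listening;
# a 1.5-2x multiplier (assumed) pushes it to a full day plus change.
print(f"With re-listening at 1.75x: {listening_h * 1.75:.1f} h")
```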

It was a complete waste of time because, although they did grudgingly follow my recommendation to randomise the samples (at least for my questions), they were asking the same questions pre and post summer camp.

In that context, attempting to interpret any improvement as a consequence of the other 5-day summer camp activities, which must be the intention, is an insult to the intelligence.
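
If they ever run this again, the standard fix is counterbalanced test forms, so that practice on the exact items can’t masquerade as improvement. A sketch of the assignment (form names hypothetical):

```python
# Hypothetical counterbalancing: half the students get form A pre-camp
# and form B post-camp; the other half get the reverse. Any pure
# practice-on-the-items effect then washes out across the two groups.
import random

students = [f"S{i:02d}" for i in range(1, 81)]
random.shuffle(students)

half = len(students) // 2
assignment = {s: ("A", "B") for s in students[:half]}
assignment.update({s: ("B", "A") for s in students[half:]})

for s in students[:3]:   # show a few assignments
    pre_form, post_form = assignment[s]
    print(f"{s}: pre = form {pre_form}, post = form {post_form}")
```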

Never again. (I keep saying that, but the context keeps shifting.)