Training Teachers for Large-Scale Fitness Testing: The Georgia Experience

Mike Metzler
Shannon Williams
Georgia State University

NASPE PETE Conference
Las Vegas, NV
October 5, 2012



Overview

History of required fitness assessments in Georgia

2010-2011 Pilot Evaluation Project

2011-2012 First Year Implementation Evaluation Project (EP)

What we've learned in Georgia

Q&A as time permits

History of Required Fitness Assessments in Georgia Public Schools

Spring 2010, HB 229 passed and signed:
• Established the SHAPE Partnership
• Required fitness assessments in all grades
• Data to be reported by school, district, and state
• First annual report due to the Governor in October 2012

Spring-Summer 2010, GA DOE Fitness Assessment Committee:
• 5 components of FITNESSGRAM would be used statewide
• FITNESSGRAM 9 on-line software to be used for data entry and reporting
• Determined rules for student exemptions from testing
• FITNESSGRAM Parent Reports to be sent home for all tested students
• "Test familiarity" in grades 1-3
• BMI measured in 1st-3rd grade but not reported
• All other grades to be tested and reported

History of Required Fitness Assessments in Georgia Public Schools

2010-2011 school year pilot training and evaluation:
• 4 participating districts
• 250 teachers trained by state/national FG trainers
• ~25,000 students tested, with data entered
• GSU research team contracted by SHAPE to conduct an evaluation of the training and testing program

2011-2012 statewide implementation of required fitness assessments:
• All 182 districts participated
• ~3,000 teachers trained by state/national FG trainers
• ~1.2 million students tested, with entered data on some/all FG tests
• GSU research team contracted by SHAPE to conduct an evaluation of the training and testing program

Teacher Training for 2011-2012 Statewide Implementation

~3,000 teachers trained by 6 state/national FG trainers

1-day, 8-hour training

Training manual developed by HealthMPowers

Training mostly on testing, some on data entry

Data entry training done with on-line webinars

Subject Selection for 2011-2012 Statewide Implementation Evaluation

Requests sent to 177 school district superintendents

27 districts agreed to allow their teachers to be asked to participate in the EP

Initial pool of 1,050 teachers, taken from training sign-in logs

Random sample of 371 teachers, distributed by location, district enrollment, school type (ES, MS, HS), gender, and experience

Final pool of 351 teachers with usable email contact addresses

N of teacher participants varied by evaluation component

Scope of the 2011-2012 Evaluation Project

1. Fitness testing training
2. Fitness test administration
3. Preparing and distributing fitness assessment reports
4. Teacher, student, and parent/guardian perceptions of fitness testing
5. Recommendations for future assessments

Evaluation of Fitness Testing Training

Four data sources were used to evaluate the fitness testing training:

1. "Ticket Out the Door" comment slips immediately after each training session (n = 331)
2. Teacher knowledge test before and immediately following training (n = 331)
3. Teacher responses on surveys after completion of testing (n = 71)
4. Teacher comments on surveys after completion of testing (n = 157)

"Ticket Out the Door"

"Actually seeing the test and getting a chance to test on each other" (multiple comments)

"Every aspect!" or "Everything" (by many teachers)

"Hands-on activities, pre-written letters for parents, score sheets."

"Helpful hints on how to administer the test"

"Seeing the test and the protocol [trainer] modeled useful teaching techniques"

"[trainer's name] knowledge of the material"

"The training manual and the PowerPoint"

Teacher Knowledge Test After Training

Teachers were given a short test of their knowledge of the fitness testing procedures, applicable policies, and testing requirements before and immediately after each training session.

Pre-training mean score: 58.0% correct (50% for pilot-year teachers)

Post-training mean score: 77.0% correct

Evaluation of Fitness Testing

Five data sources were used to evaluate the fitness testing:

1. Observations of teachers' adherence to test protocols
2. Observations for test reliability
3. Time analysis for testing, data entry, and report generation and distribution
4. Teacher comments on surveys after completion of testing
5. Focus group interviews with teachers and students

Teachers' Adherence to Test Protocols

Test component   N of students observed   N of items on checklist   Mean % included and correct   Low-High %
Push Ups         271                      7                         92.8                          42.9-100
PACER            454                      6                         84.0                          44.4-100
Sit and Reach    557                      9                         78.9                          57.1-100
Curl Ups         426                      11                        94.5                          36.4-100
Height           130                      6                         87.6                          83.3-100
Weight           184                      4                         82.9                          50.0-100

Observations for Test Reliability

"Expert" observer training:
• 8 GSU graduate students (PETE and Exercise Science)
• 2 GSU faculty trainers
• On-campus training (similar to the 8-hour teacher training)
• On-campus training for observation/data collection
• School-based (high school) observation of all test components
• Video-based form-break training (curl-ups, push-ups, and sit and reach)
• All observers had >80% IOA before collecting data in schools

Observations for Test Reliability

On-site observations by expert observers:
• Prior arrangements with the teacher and signed consent
• Conducted a checklist for adherence to testing protocols
• Observed student(s) being tested
• Recorded their "expert" score and the student's "official" score (what was used by the teacher in that student's report)
• Recorded the number of students being tested at one time
• Recorded what non-tested students were doing
• Recorded the duration of each test
• Recorded who the "counter" was (self, peer, PE teacher, paraprofessional, volunteer)
• Did not report student age and gender

Observations for Test Reliability

Agreement definitions:

Test component   Unit of measurement          Reliability agreement range
Push ups         Number completed correctly   +/- 1
Curl ups         Number completed correctly   +/- 1
Sit and reach    0.50 inch                    +/- 0.50 inch
PACER            Laps completed               +/- 1
Height           0.25 in.                     +/- 0.25 in.
Weight           0.25 lbs.                    +/- 0.25 lbs.
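The agreement definitions above translate directly into a tolerance check: an "official" score counts as agreeing with the expert observer's score when the two fall within the test's range. A minimal Python sketch (tolerances from the slide; the function name and the score data are illustrative, not from the study):

```python
# Agreement tolerances per test component, as defined on the slide.
TOLERANCE = {
    "push_ups": 1.0,        # reps
    "curl_ups": 1.0,        # reps
    "pacer": 1.0,           # laps
    "sit_and_reach": 0.50,  # inches
    "height": 0.25,         # inches
    "weight": 0.25,         # pounds
}

def percent_agreement(expert, official, test):
    """Percent of paired scores falling within the test's tolerance."""
    tol = TOLERANCE[test]
    hits = sum(1 for e, o in zip(expert, official) if abs(e - o) <= tol)
    return 100.0 * hits / len(expert)

# Illustrative: 3 of 4 hypothetical curl-up pairs agree within +/- 1 rep.
print(percent_agreement([20, 15, 30, 12], [21, 15, 45, 12], "curl_ups"))  # 75.0
```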

Observations for Test Reliability

Test component   Scorer/Reporter (n scores)   Agreement with expert   Mean variation from
                                              observers' scores       expert observers' scores
Height           PE Teacher (130)             98.1%                   -1.8%
Weight           PE Teacher (184)             100%                     0.0%
Sit and Reach    PE Teacher (557)             71.8%                   +0.05%
PACER            All (454)                    72.5%                   +15.1%
                 PE Teacher (186)             70.4%                   +29.2%
                 Other Teacher (18)           88.9%                   -1.1%
                 Peer Student (200)           66.0%                   +6.8%
                 Self-reported (50)           60.0%                   +10.4%
Push Ups         All (271)                    45.7%                   +90.3%
                 PE Teacher (146)             65.8%                   +32.8%
                 Paraprofessional (15)        33.3%                   +226.5%
                 Peer Student (110)           20.9%                   +198.4%
Curl Ups         All (426)                    56.9%                   +45.7%
                 PE Teacher (208)             69.7%                   +13.6%
                 Paraprofessional (19)        63.2%                   +89.0%
                 Peer Student (194)           42.3%                   +72.0%
                 Self-reported (5)            0.0%                    +550.0%
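The "mean variation" column in the table above can be read as the average signed deviation of official scores from expert scores, expressed as a percentage of the expert score (positive values indicate overcounting). The study's exact formula is not shown on the slide, so this is a hedged reconstruction with invented numbers:

```python
def mean_percent_variation(expert, official):
    """Average signed deviation of official scores from expert scores,
    as a percentage of the expert score (positive = overestimate)."""
    devs = [100.0 * (o - e) / e for e, o in zip(expert, official) if e != 0]
    return sum(devs) / len(devs)

# Two hypothetical overcounts and one exact match average to +15%.
print(round(mean_percent_variation([10, 20, 40], [12, 20, 50]), 1))  # 15.0
```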

Observations for Test Reliability

1. On tests where the PE teacher scored and recorded the performance of one student at a time (height, weight), the data are extremely reliable.

2. On sit and reach, even with one student tested at a time by the teacher, the data are not reliable (a problem with multiple scales on the box).

3. On tests where the PE teachers shared the responsibility for scoring/reporting and multiple students were tested at once, the reliability of the data is unacceptable (PACER, 72.5%; Push ups, 45.7%; Curl ups, 56.9%).

4. Student peer and self-reported scores were extremely unreliable on PACER, Push ups, and Curl ups.

5. With the exception of "other teachers" on PACER, the mean scores of scorers/recorders on the PACER, Push up, and Curl up tests overestimated actual performance, often by significant amounts.

Compliance with Testing Guidelines

Test component   Recommended number to test at one time   Compliance with recommendation
Height           1                                        100%
Sit and reach    1                                        100%
Weight           1                                        100%
Curl ups         No more than 4                           64.5%
Push ups         No more than 4                           54.6%
PACER            No more than 6                           41.4%

Involvement by Non-Tested Students

Test                 Sitting and/or waiting   Counting other students   Engaged in physical activity or lesson content
Sit and Reach        52.9%                    0.0%                      47.1%
Height               57.1%                    0.0%                      42.9%
Weight               63.6%                    0.0%                      36.4%
Curl Ups             32.5%                    48.4%                     19.4%
PACER                48.3%                    35.5%                     16.1%
Push Ups             46.2%                    46.2%                     7.6%
All tests combined   46.3%                    30.9%                     22.8%

Time Analysis for Testing, Data Entry, and Report Generation and Distribution*

Indicator                           Statistic   Elem.          MS             HS
Class size                          Mean        41.4           32.5           22.6
PE class time                       Mean        44.5 mins      66.6 mins      74.3 mins
PE classes/week                     Mode        2              5              5
PE class days to complete testing   Mode        10 (5 weeks)   10 (2 weeks)   3 (< 1 week)
Percent of PE instructional time    Mean        14% (annual)   11% (9 weeks)  3.5% (9 weeks)
Data entry                          Mean        157 mins       62.7 mins      64.5 mins
Report prep and distribution        Mean        105 mins       62.7 mins      110.5 mins

*Based on 1 intact class of tested students, identified by each teacher

Teacher comments on surveys after completion of testing

On their training:

“The training was excellent.”

“Practicing with the students helped work out the kinks. The training also helped because we got to see exactly what was expected of the students during the test.”

“I have had experience with conducting the FITNESSGRAM in 1994-95. HOWEVER, the training was a refresher course and was helpful towards collecting data from students.”


Teacher comments on surveys after completion of testing

On conducting testing in their schools:

“There were no problems.” (many same or similar responses on all tests)

“It is hard to watch for all the form errors that might occur when testing more than 2 students per trained adult.” (Curl Ups)

“It is just time-consuming. To test with fidelity, one student per adult was the most that could be tested at a time.” (Push Ups)

“Keeping non testing students engaged and supervised while testing. It was difficult to count and supervise at the same time.”

“It was difficult to motivate some students to do their best. A few students walked a good bit even after being encouraged to run at a steady pace and walk only if they needed to.”


Teacher comments on surveys after completion of testing

On data entry:

“The data entry was the worst part, it was not organized like it has been in the past. I like it better when only my students are on a list. It is too much to have to sort through 20 different classes. I hated it [data entry] this year.”

“I had to use pencil and paper to record my scores, then take them back to the office to input data, so it took twice as long. It would have better if I had a device (like an Ipad) to enter scores directly into the program.”

“Instead of only my class appearing to enter data, all physical education teachers’ students were on the list. I accidentally deleted my coworkers' scores because I thought only my class was listed on my log in.”

"The entire process is time consuming. More training is definitely needed. We were actually taught HOW to administer the test, but data entry was where the training REALLY needed to be."

Teacher Comments on Surveys After Completion of Testing

On report preparation and distribution:

"While generating reports, I had several student reports that were not printed. I discovered that this was due to the fact that these students had a report in another class in which they were tested. Even though the testing data was shared between teachers, it was a problem printing the reports. It would only print with one teacher's name and not both."

“Generating reports--very slow and labor-intensive. I finally printed them out from home on a Sunday evening. This was much quicker than trying to print them from the FITNESSGRAM (sic) website during my 30-minute planning period. The software would often time out before the info to be printed was loaded.”

“Additionally, I couldn't sort it because the name of the teacher (my identifier) is not listed at the top of the report when printed out. So unless you have an awesome memory once the reports print all mixed up there is no way to determine what students were in what class. I also had some issues with how the report came out based upon Spanish or not. The school I teach in is majority Hispanic. I got it figured out but it was through trial and error.”


Teacher Confidence in Accurate Scoring

Level of confidence       After training / before testing (N = 331)   After testing (N = 71)
1. Not at all confident   0.3%                                        0.0%
2.                        0.3%                                        1.7%
3. Somewhat confident     3.0%                                        20.3%
4.                        34.0%                                       52.5%
5. Extremely confident    62.3%                                       25.4%
Mean confidence score     4.58                                        4.02**

** p < .000
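The mean confidence scores can be recovered from the response distributions as a percentage-weighted average of the 1-5 scale values; a quick check using the figures from the table (this reproduces the reported means to within rounding):

```python
def mean_from_distribution(pcts):
    """Weighted mean of a 1-5 scale given percent responses per level."""
    total = sum(pcts.values())
    return sum(level * pct for level, pct in pcts.items()) / total

before_testing = {1: 0.3, 2: 0.3, 3: 3.0, 4: 34.0, 5: 62.3}
after_testing  = {1: 0.0, 2: 1.7, 3: 20.3, 4: 52.5, 5: 25.4}
print(round(mean_from_distribution(before_testing), 2))  # 4.58
print(round(mean_from_distribution(after_testing), 2))   # 4.02
```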

Focus Group Interviews with Teachers and Students

From 6 different districts around the state

57 PE teachers

56 students (upper elementary and middle school)

Semi-structured interviews, transcribed and analyzed with NVivo Qualitative Software


Focus Group Interviews with Students

Overall Experience.

Focus groups facilitators asked students to tell them about the best and worst parts of the testing. A majority of children reported that fitness testing made them feel good about themselves. Focus groups showed that children seemed to enjoy testing more as they saw themselves progress in their physical abilities. One student echoed a response heard from many, “…we set a goal or we had our own personal goal, but if we exceeded that, we felt really good…I like the personal boost you get from doing well.” An elementary student reported, “I like the test, because if you do the test you’ll have a healthy life and you can do more stuff than you used to be able to do.”


Focus Group Interviews with Students

Communication to them about the Fitness Assessments

Students had a clear understanding of the purpose of the testing (as communicated by teachers), and received a clear explanation of the reason for fitness testing. Many had a sense of the value of fitness in their lives. Students expressed an understanding that the “state is trying to get a good assessment of the general health of the school’s population…” or “trying to find out how healthy is the state of Georgia.” Students expressed that they felt informed and prepared before testing began. One student commented “they didn’t just throw it at us. We knew long before that we were going to have to do this.”


Focus Group Interviews with Students

Time for Conducting Testing

Students' perceptions of the time needed to complete fitness testing varied greatly among districts. Some perceived testing to have only "taken three days so it wasn't that bad…" and "didn't mind doing it at all…" Others felt that the time taken for testing detracted from regular PE and was less enjoyable for that reason. One student commented, "Games are the best part…but we only got to play [games] half the time." For a fair number of students this was a reason they did not like testing, whereas others did not mind the time taken to test. Students recommended cutting down on the time spent testing, but did not offer recommendations for how to do so.


Focus Group Interviews with Students

Accuracy of Data

It was widely reported among students that peer testing generated inaccurate results. One student remarked, "…if it was your friend spotting you, they'd let you slide… and we're all kind of friends, so the numbers might not be terribly accurate. I think with like a peer review, it's not very accurate because your friends cheat for you all the time." Students recommended not using peer testers, because they tend to report inaccurate scores for their friends. Students also expressed frustration when they realized scores were not being recorded accurately, or when obvious cheating was occurring; cheating was particularly obvious when classes were testing in large groups. Students recommended testing in smaller groups to avoid it.

Focus Group Interviews with Students

Make Fitness Testing More Enjoyable

Students had general recommendations for how to make fitness testing more enjoyable. One student commented that "girls don't like that everyone is watching them, so they don't go as far as they can." This same student, among others, suggested that separating testing for boys and girls would improve girls' testing experiences. Students also recommended that fitness testing could be done another way: "…ya'll could make a game out of this fitness test. That would be cool. So we don't actually know we're doing the fitness test."


Focus Group Interviews with Teachers

Overall Experience:

The teachers who participated in the focus groups were generally supportive of FITNESSGRAM® testing. Overall, it was viewed as important and as an improvement upon the Presidential Fitness Test. Teachers overwhelmingly agreed that the strength of the program was that tests were based on individual goals, and that this contributed to children developing self-esteem, confidence, and in some cases highlighted the accomplishments of students who may not have realized what they could achieve. Teachers expressed general satisfaction with training, preparedness, and administrative support provided to them to implement testing. They also discussed some of the challenges, and had suggestions for improvements.


Focus Group Interviews with Teachers

Primary Challenge 1: Time Required for Testing

Teachers in elementary and middle grades unanimously expressed concern about the large amount of time required to complete fitness testing. Teachers noted that field days, CRCTs, tutoring, and other kinds of disruptions frequently challenged their ability to test. One teacher complained, "We don't have enough time to get this done. That is the biggest challenge." Time challenges varied by grade level. Middle school teachers faced a particular challenge because they received new classes every nine weeks and had to test every new student entering at that time. Elementary teachers were challenged because "[young students] need time; they need help and it really goes slow." High school classes, on the other hand, seemed able to "roll really quickly." A number of teachers commented that it would be valuable to test at both the beginning and end of the year, but added that this would be difficult due to time constraints.


Focus Group Interviews with Teachers

Primary Challenge 2: Software Issues

A very clear theme of the focus groups was that teachers faced challenges with the FITNESSGRAM® software for entering, maintaining, and printing data. A number of teachers described the software as "aggravating." School district technology staff support was available but often slow to respond because staff were dealing with many complaints and issues. Some teachers commented that data entry, "once it got rolling, was a piece of cake." A few teachers described their use of the iPad for data entry, explaining that the software did not work well with it. Most significantly, teachers overwhelmingly reported wanting control over their class rosters, to be able to add and remove students as needed. Teachers described their frustration when students whose names appeared on their roster were "not even enrolled in our school."


Focus Group Interviews with Teachers

Primary Challenge 3: Achieving Accurate Scores

Many teachers were concerned about the accuracy of the data. It was difficult to ensure standardized readings on the push-ups and curl-ups. Teachers also explained that, because this was the first year of testing, students did not know how to properly do curl-ups and push-ups. According to one teacher, "it was a matter of just getting it done, the data will reflect that." Teachers also reported that they did not have confidence in peer testers: "there [were] definitely some inconsistencies with that." Another teacher commented that human error would always be a factor in testing: "…one person is going to implement and test them in a certain way, and then another teacher may do it another way. And so the data that you get for one class may be totally differently skewed from what you would get from another teacher."

What Have We Learned in Georgia?

• It is possible to train almost every physical education teacher in a state as large as Georgia
• It takes more than good training to get reliable data
• We know that some teachers did a really good job, but other teachers didn't
• To get consistently reliable data, teachers need to comply with the recommended testing group sizes
• Kids can't be trusted to count for peers or themselves
• There is a tradeoff between accuracy and time
• There needs to be a high degree of coordination between FG software and district technology coordinators
• Teachers need a voice in how fitness testing (and the reporting of results) will be administered