Program Goal VIII: Assessment

As James Cooper (1999) has noted, assessment data is only useful if it contributes to sound educational decision-making. In order to use this data effectively, beginning teachers must have knowledge and skills in the areas of measurement fundamentals, standardized tests and their interpretation, validity and reliability, constructing and using the results of formal and informal assessments, and utilizing assessment information for the purposes of grading. Additionally, educators in Minnesota must be knowledgeable about the Minnesota Academic Standards and how to measure student progress toward meeting them. The paragraphs that follow describe the knowledge and skills we believe our students must obtain in order to make effective assessment decisions.

Measurement Fundamentals: Educators must understand the basic terminology and concepts related to assessment, evaluation, and measurement. The following concepts and definitions are of central importance to this understanding:

Assessment vs. Evaluation: Though the terms assessment and evaluation are often used interchangeably (Cooper, 1999), the Minnesota Department of Education differentiates between the two. It defines assessment as gathering information or evidence, while evaluation involves using that information or evidence to make judgments (Aune, 1999).

Teacher-Made vs. Standardized Assessments: In the broadest sense, assessments may be classified into two categories: teacher-made assessments and standardized assessments. Teacher-made assessments are constructed by an individual teacher or a group of teachers in order to measure the outcome of classroom instruction. Standardized assessments, on the other hand, are commercially prepared and have uniform procedures for administration and scoring. They are meant for gathering information on large groups of students in multiple settings (Karmel and Karmel, 1978).

Criterion-Referenced vs. Norm-Referenced Assessment: Standardized assessments may be norm-referenced or criterion-referenced. Norm-referenced assessments compare individual students’ scores to those of a norm-reference group, generally students of the same grade or age. They are designed to demonstrate "differences between and among students to produce a dependable rank order" (Bond, 1996, p. 1) and are often used to classify students for ability grouping or to help identify them for placement in special programs. They are also used to provide information to report to parents. Criterion-referenced tests, on the other hand, determine the specific knowledge and skills possessed by a student. Thus, this form of testing "uses as its interpretive frame of reference a specified content domain, rather than a specified population of persons" (Anastasi, 1976, p. 97). Competency tests, such as the Minnesota Basic Skills Tests, represent a specific sub-category of criterion-referenced assessment and are used to ensure that students possess minimal basic skills (Biehler and Snowman, 1997).

Formative vs. Summative Evaluation: Formative evaluation involves "collecting, synthesizing, and interpreting data for the purpose of improving learning or teaching" (Airasian, 1997, p. 402). Thus, formative assessment is used to provide feedback and not for grading. It typically occurs while instruction is ongoing. Summative evaluation, on the other hand, involves "collecting, synthesizing, and interpreting information for the purpose of determining pupil learning and assigning grades" (Airasian, 1997, p. 404). It typically occurs at the end of instruction.

Types of Standardized Tests: A description of all forms of standardized tests is far beyond the scope of this document. Therefore, only those standardized tests most central to educational decision-making, individual and group tests designed to measure intelligence and academic achievement, will be included in this brief summary.

Intelligence Tests: Intelligence tests are often classified into two categories: individual and group. Individual intelligence tests, such as the Stanford-Binet and Wechsler Scales, are given in a one-to-one setting by a trained examiner, usually a psychologist. They are most frequently given as part of an overall psychological evaluation, often to determine if a student is eligible for special education. Though group intelligence tests, such as the Otis-Lennon Mental Ability Tests, are "no longer administered to all students everywhere" (Lefrancois, 1999), some school districts still include them in their annual testing programs (Biehler and Snowman, 1997). Results of intelligence tests, whether group or individual, are often reported as intelligence quotients or IQ scores. Though IQ scores used to be determined by calculating a ratio (mental age divided by chronological age, multiplied by 100, thus the term intelligence "quotient"), they are now typically calculated as standard scores with a mean of 100 and a standard deviation of about 15. Though the results of individual intelligence tests tend to be more valid and reliable than those of group tests, teachers must be aware of the following limitations of intelligence tests noted by Biehler and Snowman (1997): 1) They generally only sample abilities that relate to classroom achievement rather than overall intellectual functioning. Therefore, many educators prefer the term scholastic aptitude test; 2) Intelligence test results provide only an estimate of a child’s abilities to deal with "certain kinds of problems at a particular point in time" (p. 180). Because of this, their results can vary over multiple administrations; 3) The tests may not provide a valid estimate of the abilities of minority and low-income children. Therefore, teachers must exercise caution when making educational decisions based on the results of these tests.
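The two ways of computing an IQ score described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers are hypothetical, and the standard deviation of 15 is the approximate value mentioned in the text, not a property of any particular test.

```python
def ratio_iq(mental_age, chronological_age):
    """Historical ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100


def deviation_iq(raw_score, norm_mean, norm_sd, iq_mean=100, iq_sd=15):
    """Deviation IQ: a standard score rescaled to a mean of 100 and an SD of 15.

    norm_mean and norm_sd describe the raw scores of the norm-reference
    group (hypothetical values in this sketch).
    """
    z = (raw_score - norm_mean) / norm_sd  # distance from the mean in SD units
    return iq_mean + iq_sd * z


# A child with a mental age of 10 and a chronological age of 8
print(ratio_iq(10, 8))           # 125.0

# A raw score one standard deviation above the norm-group mean
print(deviation_iq(60, 50, 10))  # 115.0
```

The deviation form makes it explicit that a modern IQ score states how far a student falls from the norm-group mean, rather than a ratio of ages.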

Traditional Tests of Academic Achievement: As is the case for intelligence tests, individual achievement tests, such as the Wide Range Achievement Test and Peabody Individual Achievement Test, are most commonly administered to students who have been referred for possible placement in special education or remedial programs. Group achievement tests, such as the California Test of Basic Skills, on the other hand, are administered either annually or at planned intervals as part of the district-wide testing program in order to certify students’ achievement and provide information to parents (Lefrancois, 1999).

Assessment Data Interpretation: To understand and appropriately use the results of standardized assessments, educators must have a working knowledge of descriptive statistics, including measures of central tendency, measures of dispersion, norms, and standard scores.

Measures of central tendency include the mean, median, and mode. The mean is the arithmetic average and is obtained by adding all scores and dividing by the total number of scores. It is especially important because it is a necessary statistic for the calculation of standard scores. The median is the middle score of a distribution. In cases where there are extreme scores (or outliers), the median may be a better measure of central tendency than the mean. The mode is the most frequent score in a distribution. In large, normally-distributed populations, the mode does not differ greatly from the mean and median. However, in distributions with small numbers, it is typically the least useful of these three statistics (Glass and Stanley, 1970).
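As a small illustration of these three statistics, the following Python sketch uses the standard library's statistics module on a hypothetical set of test scores. The scores are invented, with one deliberately extreme value to show why the median can be the better summary when outliers are present.

```python
import statistics

# Hypothetical class scores; 150 is an intentional outlier
scores = [70, 75, 80, 80, 85, 90, 150]

mean_score = statistics.mean(scores)      # arithmetic average: sum / count
median_score = statistics.median(scores)  # middle score of the sorted list
mode_score = statistics.mode(scores)      # most frequent score

print(mean_score, median_score, mode_score)
```

Here the single outlier pulls the mean up to 90, while the median and mode both stay at 80, which is closer to what most students in this invented class actually scored.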

Measures of dispersion include the range and standard deviation. The range is the spread of scores in a distribution (Vogt, 1993). It may be calculated by subtracting the lowest score from the highest and adding one. It is, at best, a rather crude statistic because it is based on only the two most extreme scores in the distribution. The standard deviation is a more precise and useful measure of dispersion. It is calculated by taking the square root of the average squared distance of scores from the mean. Calculating the standard deviation is essential for determining standard scores such as those described below (Anastasi, 1988).
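The sketch below computes both measures of dispersion for a hypothetical set of scores. It assumes the conventional definition of the standard deviation (the square root of the mean squared deviation from the mean, treating the scores as a whole population), and it also shows the mean absolute deviation, a related but different statistic, for comparison. All function names are illustrative.

```python
import math


def inclusive_range(values):
    """Range as described in the text: highest minus lowest, plus one."""
    return max(values) - min(values) + 1


def population_sd(values):
    """Square root of the average squared deviation from the mean."""
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance)


def mean_absolute_deviation(values):
    """Average absolute distance from the mean (not the standard deviation)."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical quiz scores, mean = 5

print(inclusive_range(scores))          # 8
print(population_sd(scores))            # 2.0
print(mean_absolute_deviation(scores))  # 1.5
```

Because deviations are squared before averaging, the standard deviation gives more weight to scores far from the mean than the mean absolute deviation does.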

Norms: The results of standardized tests are often reported as norms, which are statistics that allow one to interpret the score of an individual student in comparison to others of the same age or grade level. They include percentiles (the percentage of scores in the norm-reference group that fall at or below that of a particular student), grade equivalents (which describe a student’s performance in terms of school grade levels), and standard scores such as Z-scores, T-scores, stanines, and I.Q. scores (all of which indicate how far a student’s score varies from the mean in standard deviation units). It is important for teachers to understand the meaning of norms so that they are able to interpret the results of standardized tests to parents and use those results to plan instruction effectively.
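The norms described above can be illustrated with a short Python sketch. The norm-group scores, mean, and standard deviation are hypothetical. The T-score scale (mean 50, SD 10) and the stanine scale (mean 5, SD 2, whole numbers from 1 to 9) are the conventional definitions; the stanine function uses a common linear approximation rather than any publisher's conversion table.

```python
import math


def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores at or below the given score."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)


def z_score(score, mean, sd):
    """Distance from the mean in standard deviation units."""
    return (score - mean) / sd


def t_score(z):
    """Standard score rescaled to a mean of 50 and an SD of 10."""
    return 50 + 10 * z


def stanine(z):
    """Approximate stanine: mean 5, SD 2, rounded and limited to 1 through 9."""
    return max(1, min(9, math.floor(2 * z + 5.5)))


# Hypothetical norm-reference group and assumed norm statistics
norm_group = [40, 45, 50, 50, 55, 60, 65, 70, 75, 90]
norm_mean, norm_sd = 50, 10

student = 65
z = z_score(student, norm_mean, norm_sd)

print(percentile_rank(student, norm_group))  # 70.0
print(z)                                     # 1.5
print(t_score(z))                            # 65.0
print(stanine(z))                            # 8
```

All three standard scores carry the same information as the Z-score; they differ only in the scale used to report it.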

Validity and Reliability: In order to make effective decisions based on assessment data, the instruments used to collect that data must be valid and reliable.

Validity is defined as "the extent to which a test measures what it intends to measure" (Lefrancois, 1999, p.560). The validity of a test may be determined in different ways including carefully examining the test’s content in regard to the curriculum (content validity), determining the degree to which test results reflect a theory or construct (construct validity), and examining the extent to which assessment results can be used to predict student performance (predictive validity). Of the various types of validity, content validity is by far the most important for classroom assessments, as it is essential that the content of teacher-made assessments accurately reflects the instructional outcomes being assessed.

Reliability refers to the consistency or stability of scores yielded by a test (Airasian, 1997). In other words, a reliable test yields consistent scores over repeated administrations (repeated-measures reliability). A concrete estimate of a test’s reliability is provided by the standard error of measurement, which is an index of the amount of error or unreliability in the scores yielded by a test. Though reliability and standard error of measurement are important considerations in choosing standardized tests, it is often difficult to determine the reliability and standard error of a teacher-made test. However, teachers should be knowledgeable regarding ways to increase the reliability of classroom assessments such as those described by Airasian (1997).
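The text does not give a formula for the standard error of measurement. A minimal sketch, assuming the formula commonly used in classical test theory (the standard deviation of the test multiplied by the square root of one minus the reliability coefficient), might look like the following. The function names and the sample values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory estimate: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def score_band(observed_score, sem, n_sem=1):
    """Band of n_sem standard errors around an observed score."""
    return (observed_score - n_sem * sem, observed_score + n_sem * sem)


# Hypothetical test with an SD of 15 and a reliability coefficient of 0.91
sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = score_band(observed_score=100, sem=sem)

print(round(sem, 2))                  # about 4.5
print(round(low, 1), round(high, 1))  # roughly 95.5 to 104.5
```

Reporting a band rather than a single number is one way to remind parents and colleagues that any observed score contains some measurement error.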

Constructing Formal Assessments: Formal, teacher-made assessments can be classified as traditional or alternative. Traditional assessments are typically paper-and-pencil tests and are often categorized as objective or essay. Alternative assessments are most often performance-based. The type of assessment one chooses is dependent on many factors including grade level, content, and time constraints. Regardless of the assessment format, good assessments have three features in common: "1) the assessment exercises and questions are related to the teacher’s objectives and instruction; 2) the exercises and questions cover a representative sample of what students were taught; and 3) the items, directions, and scoring procedures are clear and appropriate" (Airasian, 1997, p. 149). Additionally, as noted by Aune (2000), effective teachers typically employ multiple assessments, including both traditional and alternative.

Traditional Assessments: Objective assessments are traditional assessments on which students are expected to provide the one, correct answer. Typical objective assessment formats include multiple choice, true-false, matching, completion, and short answer. All of these assessments have the advantage of sampling students’ knowledge of a wide range of content (and, therefore, have the potential for good content validity) in a minimal amount of time. A major disadvantage of objective test items is that they typically measure only student learning at the knowledge and comprehension levels of Bloom’s Taxonomy. However, multiple choice items can be constructed to sample higher cognitive levels (Airasian, 1997). In order to construct effective objective assessments, teachers must be familiar with and apply guidelines for effective test construction such as those described by Nitko (1996), Oosterhof (1999), and Airasian (1997).

The Essay Test is another form of traditional, paper-and-pencil assessment. It is an excellent format for assessing students’ abilities to communicate ideas in writing. Other advantages of essay tests include measuring higher cognitive levels and directly measuring behaviors specified by performance objectives (Oosterhof, 1999). However, essay tests have disadvantages as well in that they sample less content than objective tests and, therefore, may have poorer content validity. Additionally, essay tests are time-consuming to score, and their scoring tends to be less reliable (Oosterhof, 1999). Because of these potential disadvantages, it is especially important for teachers to follow appropriate guidelines for constructing and scoring essay tests such as those described by Nitko (1996), Oosterhof (1999), and Airasian (1997).

Alternative Assessments: The most common form of alternative assessment is performance assessment, which can be defined as an assessment activity requiring students to use their knowledge and skills to perform complex tasks or solve problems (Biehler and Snowman, 1997). Though the term authentic assessment is often used interchangeably with performance assessment, Oosterhof (1999) differentiates between the two, defining authentic assessments as tasks that require "a real application of a life skill beyond the instructional context" (p. 151). Therefore, according to Oosterhof, "All authentic assessments are performance assessments, but the inverse is not true" (p. 151). A major advantage of performance assessments is that they allow evaluation of skills that cannot be easily measured by paper-and-pencil tests. Additionally, they allow for evaluation of the process as well as the product. They are, however, time-consuming to administer and, therefore, typically do not allow one to sample a wide range of outcomes. Consistency of scoring (reliability) is also problematic (Oosterhof, 1999). Like essay tests, performance assessments can be scored analytically or holistically. Analytical scoring involves breaking the desired response into its component parts and using checklists or rating scales to calculate a score (Oosterhof, 1999). Holistic scoring is typically done using a scoring rubric which allows for comparisons of students’ work to "descriptions of performances that range from higher to lower" (Oosterhof, 1999, p. 160). As Wiggins (1993) has noted, "scoring rubrics must be based on a careful analysis of existing performances of varying quality" (p. 238).

Portfolios: According to Arter (1995), "a portfolio is a purposeful collection of student work that tells the story of student achievement or growth" (p. 1). Though portfolios are often considered to be a form of authentic or performance assessment, as Oosterhof (1999) noted, they typically include materials that are not authentic or performance-based. Portfolios are particularly useful for tracking change or growth in student performance over time and are most effective when they "are characterized by a clear vision of the student skills to be addressed, student involvement in selecting what goes into the portfolio, use of criteria to define quality performance and provide a basis for communication, and self-reflection through which students share what they think about their work, their learning environment, and themselves" (Arter, 1995, p. 4).

Informal Assessment: As noted by Oosterhof (1999), the majority of classroom assessments are informal in nature. They typically take place during instruction and allow the teacher to monitor student learning and make any necessary adjustments. These informal assessments most often take the form of observations and questions. Though these techniques are efficient and adaptable, their technical quality "tends to be inferior to techniques associated with formal assessments" (pp. 148 and 149). However, teachers can improve their use of informal questions by basing them on instructional goals, allowing sufficient wait-time, and recognizing the importance of teacher reactions to student answers (Oosterhof, 1999). The effectiveness of observations can be improved by keeping anecdotal records, using informal checklists (Oosterhof, 1999), and recognizing one’s own bias (Good and Brophy, 1984).

Reporting Grades: Grades serve the important purpose of communicating to students and their parents the extent to which learners have achieved classroom goals. Tombari and Borich (1999) suggest eight steps for teachers to follow when constructing a grading system: 1) Identify district policy; 2) Determine the meaning of each grading symbol; 3) Distinguish between reporting and grading factors; 4) Identify grade components; 5) Decide on component weights; 6) Determine how components will be combined; 7) Choose a method for calculating grades; and 8) Decide how to deal with borderline grades. Methods for calculating grades include norm-referenced (or relative) grading, criterion-referenced (or absolute) grading, and individual-referenced grading:

Norm-referenced (or relative) grading involves comparing "a pupil’s performance to that of other pupils in the class" (Airasian, 1997, p. 301). Grading "on the curve" is an example of norm-referenced grading.

Criterion-referenced (or absolute) grading involves comparing a "pupil’s performance to a predetermined standard of mastery" (Airasian, 1997, p. 301). Calculating grades based on fixed ranges of cumulative scores is an example of criterion-referenced grading (Tombari and Borich, 1999).

Individual-referenced grading involves comparing pupils’ performance to their perceived abilities. Though this form of grading is sometimes used for students with disabilities, Airasian (1997) recommends that other grading approaches such as contract grading, IEP-based grading, or narrative grading be used for determining these students’ grades.

Of the three approaches for assigning letter grades, the use of criterion-referenced grading is most consistent with our departmental philosophy.
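As an illustration of criterion-referenced grading, the Python sketch below combines weighted grade components into a cumulative percentage and then compares it to fixed cutoffs. The component names, weights, and cutoff values are hypothetical and would be set by each teacher in line with district policy; they are not taken from Tombari and Borich or from Airasian.

```python
def weighted_percent(component_scores, weights):
    """Combine component percentages using the chosen component weights."""
    total_weight = sum(weights.values())
    weighted_sum = sum(component_scores[name] * w for name, w in weights.items())
    return weighted_sum / total_weight


def letter_grade(percent, cutoffs):
    """Criterion-referenced grade: compare the score to fixed standards.

    cutoffs is a list of (minimum percent, letter) pairs, highest first.
    """
    for minimum, letter in cutoffs:
        if percent >= minimum:
            return letter
    return "F"


# Hypothetical weights and fixed cutoffs
weights = {"tests": 0.5, "projects": 0.3, "homework": 0.2}
cutoffs = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]

student = {"tests": 84, "projects": 92, "homework": 75}
percent = weighted_percent(student, weights)

print(round(percent, 1), letter_grade(percent, cutoffs))  # 84.6 B
```

Because the cutoffs are fixed in advance, a student's grade in this scheme depends only on his or her own performance and not on how classmates performed, which is what distinguishes it from grading on the curve.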

No Child Left Behind: The Federal No Child Left Behind Act (NCLB), which reauthorized the Elementary and Secondary Education Act (ESEA) of 1965, was signed into law by President Bush in January 2002. It has become an important factor in specifying the timing and nature of school-wide assessments.

The goals of NCLB include increasing accountability for states, school districts, and schools; providing greater choice for parents and students, particularly those attending low-performing or failing schools; giving local education agencies (LEAs) more flexibility in using Federal education money (mostly Title I funds); and placing a stronger emphasis on reading, especially for young children. Every state must comply with NCLB or lose its Title I and other Federal education funds (U.S. Department of Education, 2004).

Under the new law, all states must now hold schools and districts accountable for student proficiency gains by subgroups in math, reading, and eventually, science. To determine if students are proficient, all who are enrolled in grades 3-8 must be tested annually in reading and math by 2005-06. By 2007, all states must administer an annual assessment in science in at least one grade level: 3-5, 6-9, or 10-12. These tests are not directly related to the Minnesota graduation requirements. However, their content will be based on the new Minnesota Academic Standards in reading, math, and science.

Students will be categorized into the following subgroups for testing: low-income, racial groups (Caucasian, Hispanic, African-American, Native American, and Asian Pacific), those with disabilities, and those with limited English skills. Each of these sub-groups, as well as the total student body, must be proficient. By 2014, 100 percent of students must be proficient in reading and math. Between 2005 and 2014, 100 percent of students in all subgroups must meet the standards for adequate yearly progress (AYP) toward proficiency.

What is proficiency? Each state will set its own criteria for proficiency. In Minnesota, students are proficient when they have “grade-level knowledge and skills” (AFT, 2003).

Additionally, by 2005-2006 all states must ensure that every classroom teacher is “highly qualified.” To meet this criterion, teachers must be certified or licensed in their teaching areas, hold a bachelor’s degree, and have demonstrated competencies in their teaching areas.

Other aspects of the law focus on graduation rates and attendance. High schools must graduate at least 80% of their students or show sufficient progress toward this goal, and elementary/middle schools must have 90% attendance or be making progress toward this attendance level. Consequences for schools not meeting the goals of NCLB are described below:

After two consecutive years of not meeting one or more of the above goals, a school will be put on the “needs improvement” list, and the district must allow any student to transfer to another school in the district that is making AYP. The district must set aside 5-10% of Title I funds to pay transportation costs for students who transfer. Priority must be given to the lowest-achieving students from low-income families.

After three consecutive years, the district must continue to provide choice AND provide low-achieving students with supplemental services (tutoring) paid for by the district. The district must set aside an additional 5-10% of Title I funds to pay for the tutoring. These services must occur outside the school day, and the district must ensure students have at least one private provider available when they seek supplemental services. 

After four consecutive years, the district must continue to offer school choice and supplemental instruction AND implement a “corrective action plan,” such as replacing staff and replacing curricula. 

After five consecutive years, the district must continue to offer school choice and supplemental services AND implement alternative school governance, such as converting to a charter school, hiring a private management firm, or having the state take over the school.

Whether the goals of NCLB are attainable remains to be seen. One thing is certain, however: NCLB has added another layer of high-stakes testing that students must complete.

Assessment of Performance Toward Meeting the Minnesota Standards: Teachers in Minnesota must be knowledgeable about the Minnesota Standards so that they can facilitate and assess student progress toward their attainment. The Minnesota statewide graduation requirements consist of two parts: the Basic Standards and the Minnesota Academic Standards.

Basic Standards: The Basic Standards specify the minimum skills students must possess in order to graduate from a public high school in Minnesota. These skills are assessed by the Basic Standards Tests, which are competency tests designed to assure that students have basic skills in mathematics, reading, and composition. Students take the math and reading tests during 8th grade and the writing test during 10th grade. They must pass these tests in order to obtain a high school diploma. Those who do not pass a test on their first attempt may retake it annually until they achieve a passing score (Minnesota Department of Education, 2004).

The Minnesota Academic Standards define a new core of five academic content standards areas: language arts, mathematics, science, social studies, and the arts. Standards for Mathematics, Language Arts, and Arts were adopted in a 2003 law. The 2004 Legislature adopted science and social studies standards. Each of the academic standards will be supplemented by grade-level benchmarks. These benchmarks will specify the academic knowledge and skills that students must achieve to complete a state standard and will determine the content of the Minnesota Comprehensive Assessments, which are used to assess compliance with the federal No Child Left Behind Act.

In addition to the core academic standards areas, there are several elective subject areas. School districts must create local elective standards and must offer elective courses covering health and physical education, vocational and technical education, and world languages. The law requires students to complete a specified number of course credits covering both core and elective subject areas (Minnesota Legislative Reference Library, 2003).
 

References:

AFT (2003). State-by-State Resources: Minnesota. http://www.aft.org/esea/states/MN.html

Arter, J. (1995). Portfolios for assessment and instruction. Greensboro, NC: ERIC Clearinghouse on Guidance. (http://www.ed.gov/databases/ERIC_Digests/ed388890.html).

Airasian, P. (1997). Classroom assessment (3rd Edition). New York: Macmillan.

Anastasi, A. (1988). Psychological testing (6th Edition). New York: Collier Macmillan Publishers.

Aune, B. (1999). Minnesota graduation standards: Teacher education program characteristics and teacher competencies. St. Paul: Minnesota Department of Children, Families, and Learning.

Aune, B. (2000). Minnesota graduation standards. St. Paul: Minnesota Department of Children, Families, and Learning.

Aune, B. (2000). Links between research and reform. St. Paul: Minnesota Department of Children, Families, and Learning.

Biehler, R. and Snowman, J. (1997). Psychology applied to teaching (8th Edition). Boston: Houghton-Mifflin.

Bond, L. (1996). Norm- and criterion-referenced testing. Washington, D.C.: ERIC/AE Digest.

Frisbee, D. and Waltman, K. (1992).

Cooper, J. M. (1999). The teacher as a decision maker. In J. M. Cooper (Ed.), Classroom teaching skills (6th Edition), pp. 1-19. Boston: Houghton-Mifflin.

Glass, G. and Stanley, J. (1970). Statistical methods in education and psychology. Englewood Cliffs, NJ: Prentice-Hall, Inc.

Good, T. L. and Brophy, J. E. (1984). Looking in classrooms. New York: Harper & Row.

Karmel, L. and Karmel, M. (1978). Measurement and evaluation in the schools (2nd Edition). New York: Macmillan Publishers.

Lefrancois, G. (1999). Psychology applied to teaching (10th Edition). Belmont, CA: Wadsworth.

Minnesota Department of Education. (2002). The No Child Left Behind Act of 2001. http://education.state.mn.us/content/010993.pdf

Minnesota Department of Education (2004). Graduation Requirements. http://education.state.mn.us/html/intro_acad_prof_grad.htm

Minnesota Legislative Reference Library (2003). Resources on Minnesota Issues: Academic Standards, September 2003. (7/8/2004) Available at http://www.leg.state.mn.us/lrl/issues/grad2.asp

Nitko, A. (1996). Educational assessment of students (2nd. Ed.). Columbus, Ohio: Merrill Prentice Hall.

Oosterhof, A. (1999). Developing and using classroom assessments (2nd Edition). Upper Saddle River, NJ: Merrill Prentice Hall.

Tombari, M. and Borich, G. (1999). Authentic assessment in the classroom. Upper Saddle River, NJ: Prentice-Hall, Inc.

U.S. Department of Education (2004). No Child Left Behind

U.S. Department of Education (2004). Introduction: No Child Left Behind. http://www.ed.gov/nclb/overview/intro/index.html

Vogt, W. P. (1993). Dictionary of statistics and methodology. Newbury Park, CA: Sage.

Wiggins, G. (1993). Assessing student performance. San Francisco, CA: Jossey-Bass Publishers.