(P.S.: Review this and all STEPPE/Hedbring reference
lists often; they are in constant flux
as new literature is uncovered, recommended, or encountered. We
are all in this together, so please be quick and eager to point
out errors or additional literature you find compelling. Thank
you.)
Abelson, R.P. (1995). Statistics as principled argument.
Hillsdale, NJ: Erlbaum.
Airasian, P., & Madaus, G. (1983). Linking testing and
instruction: Policy issues. Journal of Educational Measurement,
20, 103-118.
Algozzine, B. (1978). A statistical interpretation primer for
the practitioner (test of means). Exceptional Children, 45, 94-98.
American Psychiatric Association. (1987). Diagnostic and
statistical manual of mental disorders (3rd ed., rev.)
(DSM-III-R). Washington, DC: Author.
American Psychiatric Association. (1994). Diagnostic and
statistical manual of mental disorders (4th ed.). Washington, DC:
Author.
Anderson, N.H. (1961). Scales and statistics: Parametric and
nonparametric. Psychological Bulletin, 58, 305-316.
Angoff, W.H. (1988). Proposals for theoretical and applied
development in measurement. Applied Measurement in Education, 1,
215-222.
Barrett, B.H., Johnston, J.M., & Pennypacker, H.S. (1986).
Behavior: Its units, dimensions, and measurement. In R.O. Nelson
& S.C. Hayes (Eds.), Conceptual foundations of behavioral
assessment (pp. 156-200). New York: Guilford.
Bennett, R.E., Rock, D.A., & Wang, M. (1991). Equivalence
of free-response and multiple-choice items. Journal of Educational
Measurement, 28, 77-92.
Berk, R.A. (1979). Generalizability of behavioral observations:
A clarification of interobserver agreement and interobserver
reliability. American Journal of Mental Deficiency, 83, 460-472.
Berk, R.A. (1980). Criterion-referenced measurement: The state
of the art. Baltimore: The Johns Hopkins University Press.
Beyer, W.H. (1966). Handbook of tables for probability and
statistics. Cleveland, OH: The Chemical Rubber Company.
Bickel, P.J., & Lehmann, E.L. (1975). Descriptive
statistics for nonparametric models. II. Location. Annals of
Statistics, 3, 1045-1069.
Bickel, P.J., & Lehmann, E.L. (1976). Descriptive
statistics for nonparametric models. III. Dispersion. Annals of
Statistics, 4, 1139-1158.
Bock, R.D. (1975). Multivariate statistical methods in
behavioral research. New York: McGraw-Hill.
Bolton, B. (Ed.). (1987). Handbook of measurement and
evaluation in rehabilitation. Baltimore: Brookes.
Boring, E.G. (1920). The logic of the normal law of error in
mental measurement. American Journal of Psychology, 31, 1-33.
Bracht, G.H., & Glass, G.V. (1968). The external validity
of experiments. American Educational Research Journal, 5, 437-474.
Bradley, J.V. (1977). A common situation conducive to bizarre
distribution shapes. The American Statistician, 31, 147-150.
Brant, R. (1990). Comparing classical and resistant outlier
rules. Journal of the American Statistical Association, 85,
1083-1090.
Brinberg, D., & McGrath, J.E. (1985). Validity and the
research process. Newbury Park, CA: Sage.
Buros, O.K. (1968). The story behind the Mental Measurements
Yearbooks. Measurement and Evaluation in Guidance, 1, 86-95.
Buros, O.K. (1978). The eighth mental measurements yearbook.
Highland Park, NJ: Gryphon.
Campbell, D., & Campbell, M. (1995). The student's guide to
doing research on the internet. Reading, MA: Addison-Wesley.
Campbell, D.T. (1957). Factors relevant to the validity of
experiments in social settings. Psychological Bulletin, 54,
297-312.
Campbell, D.T. (1960). Blind variation and selective retention
in creative thought as in other knowledge processes. Psychological
Review, 67, 380-400.
Campbell, D.T. (1963). From description to experimentation:
Interpreting trends as quasi-experiments. In C.W. Harris (Ed.),
Problems in measuring change (pp. 212-242). Madison, WI:
University of Wisconsin Press.
Campbell, D.T. (1969). Reforms as experiments. American
Psychologist, 24, 409-429.
Campbell, D.T. (1970). Considering the case against
experimental evaluations of social innovations. Administrative
Science Quarterly, 15, 110-113.
Campbell, D.T. (1975). "Degrees of freedom" and the
case study. Comparative Political Studies, 8, 178-193.
Campbell, D.T. (1988). Methodology and epistemology for social
science: Selected papers. Chicago: University of Chicago Press.
Campbell, D.T., & Stanley, J.C. (1963). Experimental and
quasi-experimental designs for research on teaching. In N.L. Gage
(Ed.), Handbook of research on teaching (pp. 171-246). Chicago:
Rand McNally. (Also published in 1966 as Experimental and
quasi-experimental designs for research. Chicago: Rand McNally.)
Carver, R.P. (1978). The case against statistical significance
testing. Harvard Educational Review, 48, 378-399.
Cheng, B., & Titterington, D.M. (1994). Neural networks: A
review from a statistical perspective. Statistical Science, 9(1),
2-54.
Cherkassky, V., Friedman, J.H., & Wechsler, H. (Eds.).
(1994). From statistics to neural networks: Theory and pattern
recognition applications. New York: Springer-Verlag.
Cliff, N. (1993). Dominance statistics: Ordinal analyses to
answer ordinal questions. Psychological Bulletin, 114, 494-509.
Cohen, J. (1960). A coefficient of agreement for nominal
scales. Educational and Psychological Measurement, 20, 37-46.
Cohen, J. (1962). The statistical power of abnormal-social
psychological research: A review. Journal of Abnormal and Social
Psychology, 65, 145-153.
Cohen, J. (1969). Statistical power analysis for the behavioral
sciences. New York: Academic Press.
Cohen, J. (1977). Statistical power analysis for the behavioral
sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral
sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Conoley, J., & Kramer, J. (Eds.). (1989). The tenth mental
measurements yearbook. Lincoln, NE: The University of Nebraska
Press.
Cooper, J.O. (1974). Measurement and analysis of behavioral
techniques. Columbus, OH: Charles E. Merrill.
Cooper, J.O., & Johnson, J. (1979). Guideline for direct
and continuous measurement of academic behavior. The Directive
Teacher, 1(3), 10-12.
Cronbach, L.J. (1957). The two disciplines of scientific
psychology. American Psychologist, 12, 671-684.
Cronbach, L.J. (1975). Beyond the two disciplines of scientific
psychology. American Psychologist, 30, 116-136.
Cronbach, L.J. (1975). Five decades of public controversy over
mental testing. American Psychologist, 30, 1-14.
Cronbach, L.J. (1982). Designing evaluations of educational and
social programs. San Francisco: Jossey-Bass.
Cronbach, L.J. (1988). Five perspectives on the validity
argument. In H. Wainer & H.I. Braun (Eds.), Test validity (pp.
3-17). Hillsdale, NJ: Erlbaum.
Cronbach, L.J. (1990). Essentials of psychological testing (5th
ed.). New York: Harper & Row.
Cronbach, L.J., & Furby, L. (1970). How we should measure
"change"--or should we? Psychological Bulletin, 74,
68-80.
Cronbach, L.J., & Meehl, P.E. (1955). Construct validity in
psychological tests. Psychological Bulletin, 52, 281-302.
Cronbach, L.J., & Shapiro, K. (1982). The limited reach
of internal validity. In L.J. Cronbach & K. Shapiro (Eds.),
Designing evaluations of educational and social programs (pp.
112-157). San Francisco: Jossey-Bass.
Cronbach, L.J., & Snow, R.E. (1977). Aptitudes and
instructional methods: A handbook for research on interactions.
New York: Irvington Press.
Cronbach, L.J., & Snow, R.E. (1977). Interaction of
abilities with variations in instructional programming. In L.J.
Cronbach & R.E. Snow, Aptitudes and instructional methods: A
handbook for research on interactions (pp. 175-215). New York:
Irvington Press.
Cronbach, L.J., & Snow, R.E. (1977). Interaction of
abilities with variations in curriculum and instruction. In L.J.
Cronbach & R.E. Snow, Aptitudes and instructional methods: A
handbook for research on interactions (pp. 295-340). New York:
Irvington Press.
Cronbach, L.J., & Snow, R.E. (1977). Cognitive skills,
structures, and styles. In L.J. Cronbach & R.E. Snow,
Aptitudes and instructional methods: A handbook for research on
interactions (pp. 341-391). New York: Irvington Press.
Cronbach, L.J., & Snow, R.E. (1977). What do we know about
ATI? What should we learn? In L.J. Cronbach & R.E. Snow,
Aptitudes and instructional methods: A handbook for research on
interactions (pp. 492-522). New York: Irvington Press.
Davies, L., & Gather, U. (1993). The identification of
multiple outliers. Journal of the American Statistical
Association, 88, 782-792.
Dixon, W.J., Brown, M.B., Engelman, L., & Jennrich, R.I.
(Eds.). (1990). BMDP statistical software manual (Vol. 1).
Berkeley, CA: University of California Press.
Draper, D. (1995). Inference and hierarchical modeling in the
social sciences. Journal of Educational and Behavioral Statistics,
20, 115-147.
Ebel, R.L. (1972). Essentials of educational measurement.
Englewood Cliffs, NJ: Prentice-Hall.
Eisenhart, M., & Howe, K. (1992). Validity in educational
research. In M.D. LeCompte, W.L. Millroy, & J. Preissle
(Eds.), The handbook of qualitative research in education (pp.
643-680). New York: Academic Press.
Eisner, E.W. (1981). On the differences between scientific and
artistic approaches to qualitative research. Educational
Researcher, 10, 5-9.
Eisner, E.W. (1984). Can educational research inform
educational practice? Phi Delta Kappan, 65, 447-452.
Frick, T., & Semmel, M.I. (1978). Observer agreement and
reliabilities of classroom observational measures. Review of
Educational Research, 48, 157-184.
Fuchs, L.S., & Deno, S.L. (1992). Paradigmatic distinctions
between instructionally relevant measurement models. Exceptional
Children, 57, 488-501.
Gentry, D., & Haring, N. (1976). Essentials of performance
measurement. In N. Haring & L. Brown (Eds.), Teaching the
severely handicapped (Vol. 1, pp. 209-236). New York: Grune &
Stratton.
Gitomer, D.H. (1993). Performance assessment and educational
measurement. In R. E. Bennett & W.C. Ward (Eds.), Construction
versus choice in cognitive measurement (pp. 241-263). Hillsdale,
NJ: Erlbaum.
Glaser, R. (1963). Instructional technology and the measurement
of learning outcomes: Some questions. American Psychologist, 18,
519-521.
Glaser, R., Lesgold, A., & Lajoie, S. (1985). Toward a
cognitive theory for the measurement of achievement. In R.R.
Ronning, J. Glover, J.C. Conoley, & J.C. Witt (Eds.), The
influence of cognitive psychology on testing and measurement (pp.
41-85). Hillsdale, NJ: Erlbaum.
Glass, G.V. (1978). Standards and criteria. Journal of
Educational Measurement, 15, 237-261.
Glass, G.V., & Hopkins, K.D. (1984). Statistical methods in
education and psychology (2nd ed.). Englewood Cliffs, NJ:
Prentice-Hall.
Green, B.F. (1988). Critical problems in computer-based
psychological measurement. Applied Measurement in Education, 1,
223-231.
Green, B.F. (1987). Construct validity of computer-based tests.
In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 77-86).
Hillsdale, NJ: Erlbaum.
Gronlund, N.E. (1985). Measurement and evaluation in teaching
(5th ed.). New York: Macmillan.
Haertel, E. (1985). Construct validity and criterion-referenced
testing. Review of Educational Research, 55, 23-46.
Hambleton, R.K., Swaminathan, H., Algina, J., & Coulson,
D.B. (1978). Criterion-referenced testing and measurement: A
review of technical issues and developments. Review of Educational
Research, 48, 1-47.
Hartmann, D.P. (1977). Considerations in the choice of
interobserver reliability estimates. Journal of Applied Behavior
Analysis, 10, 103-116.
Hedges, L.V., & Olkin, I. (1985). Statistical methods for
meta-analysis. New York: Academic Press.
Hinkle, D.E., Wiersma, W., & Jurs, S.G. (1979). Applied
statistics for the behavioral sciences. Chicago: Rand McNally.
Hoge, R.D. (1985). The validity of direct observation measures
of pupil classroom behavior. Review of Educational Research, 55,
469-483.
Hopkins, B.L., & Hermann, J.A. (1977). Evaluating
interobserver reliability of interval data. Journal of Applied
Behavior Analysis, 10, 121-126.
Hopkins, K.D. (1982). The unit of analysis: Group means versus
individual observations. American Educational Research Journal,
19(1), 5-18.
Hopkins, K.D., Stanley, J.C., & Hopkins, B.R. (1990).
Educational and psychological measurement and evaluation (7th
ed.). Englewood Cliffs, NJ: Prentice-Hall.
House, A., House, B., & Campbell, M. (1981). Measures of
interobserver agreement: Calculation formulas and distribution
effects. Journal of Behavioral Assessment, 3, 37-58.
Huber, P.J. (1981). Robust statistics. New York: Wiley.
Huck, S.W., Cormier, W.H., & Bounds, W.G. (1974). Reading
statistics and research. New York: Harper & Row.
Huck, S.W., Cormier, W.H., & Bounds, W.G. (1974).
Pseudo-experimental designs. In S.W. Huck, W.H. Cormier, & W.G.
Bounds, Reading statistics and research (pp. 226-240). New York:
Harper & Row.
Huck, S.W., Cormier, W.H., & Bounds, W.G. (1974). True
experimental designs. In S.W. Huck, W.H. Cormier, & W.G.
Bounds, Reading statistics and research (pp. 243-269). New York:
Harper & Row.
Jacobson, N.S., & Revenstorf, D. (1988). Statistics for
assessing the clinical significance of psychotherapy techniques:
Issues, problems, and new developments. Behavioral Assessment, 10,
133-145.
Jacobson, N.S., & Truax, P. (1991). Clinical significance:
A statistical approach to defining meaningful change in
psychotherapy research. Journal of Consulting and Clinical
Psychology, 59, 12-19.
Jones, L.V. (1988). Educational assessment as a promising area
for psychometric research. Applied Measurement in Education, 1,
233-241.
Jones, R.R., Weinrott, M., & Vaught, R.S. (1978). Effects
of serial dependency on the agreement between visual and
statistical inference. Journal of Applied Behavior Analysis, 11,
277-283.
Karweit, N., & Slavin, R.E. (1981). Measurement and
modeling choices in studies of time and learning. American
Educational Research Journal, 18, 157-171.
Kazdin, A.E. (1976). Statistical analyses for single-case
experimental designs. In M. Hersen & D.H. Barlow (Eds.),
Single case experimental designs: Strategies for studying human
behavior (pp. 265-316). New York: Pergamon Press.
Kazdin, A.E. (1977). Artifact, bias, and complexity of
assessment: The ABCs of reliability. Journal of Applied Behavior
Analysis, 10, 141-150.
Kazdin, A.E. (1979). Unobtrusive measures in behavioral
assessment. Journal of Applied Behavior Analysis, 12, 713-724.
Kempthorne, O. (1955). The randomization theory of statistical
inference. Journal of the American Statistical Association, 50,
946-967.
Kennedy, C.H. (1992). Trends in the measurement of social
validity. The Behavior Analyst, 15, 147-156.
Kent, R.N., Kanowitz, J., O'Leary, K.D., & Cheiken, M.
(1977). Observer reliability as a function of circumstances of
assessment. Journal of Applied Behavior Analysis, 10, 317-324.
Kiess, H.O. (1996). Statistical concepts for the behavioral
sciences (2nd ed.). Needham Heights, MA: Allyn & Bacon.
Kim, J.O., & Mueller, C.W. (1986). Factor analysis:
Statistical methods and practical issues. London: Sage.
Kirk, J., & Miller, M. (1986). Reliability and validity in
qualitative research. Newbury Park, CA: Sage.
Kirk, R. (1982). Experimental design: Procedures for the
behavioral sciences (2nd ed.). Monterey, CA: Brooks/Cole.
Lindsley, O.R. (1964). Direct measurement and prosthesis of
retarded behavior. Journal of Education, 147, 62-81.
Linn, R.L. (1983). Testing and instruction: Links and
distinctions. Journal of Educational Measurement, 20, 179-189.
Linn, R.L. (Ed.). (1989). Educational measurement (3rd ed.).
New York: Macmillan.
Lipsey, M.W. (1990). Design sensitivity: Statistical power for
experimental research. Newbury Park, CA: Sage.
Little, R.J., & Rubin, D.B. (1987). Statistical analysis
with missing data. New York: Wiley.
Lubin, A. (1961). The interpretation of significant
interaction. Educational and Psychological Measurement, 21, 807-817.
Lykken, D.T. (1968). Statistical significance in psychological
research. Psychological Bulletin, 70, 151-159.
Marascuilo, L.A., & Levin, J.R. (1983). Multivariate
statistics for the social sciences. Monterey, CA: Brooks/Cole.
Marascuilo, L.A., & Serlin, R.C. (1988). Statistical
methods for the social and behavioral sciences. New York: W.H.
Freeman.
Mather, N., & Kirk, S. (1985). The type III error and other
concerns in learning disability research. Learning Disabilities
Research, 1, 56-64.
Maxwell, J.A. (1992). Understanding and validity in qualitative
research. Harvard Educational Review, 62, 279-300.
McNemar, Q. (1962). Psychological statistics (3rd ed.). New
York: Wiley.
Mehrens, W.A., & Lehmann, I.J. (1984). Measurement and
evaluation in education and psychology (3rd ed.). New York: Holt,
Rinehart and Winston.
Messick, S. (1980). Test validity and the ethics of assessment.
American Psychologist, 35, 1012-1027.
Messick, S. (1984). The psychology of educational measurement.
Journal of Educational Measurement, 21, 215-237.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational
measurement (3rd ed., pp. 13-104). New York: Macmillan.
Messick, S. (1995). Validity of psychological assessment.
American Psychologist, 50, 741-749.
Metfessel, N.S., & Michael, W.B. (1967). A paradigm
involving multiple criterion measures for the evaluation of the
effectiveness of school programs. Educational and Psychological
Measurement, 27, 931-944.
Michael, J. (1974). Statistical inference for individual
organism research: Some reactions to a suggestion by Gentile,
Roden, and Klein. Journal of Applied Behavior Analysis, 7,
627-628.
Michael, J. (1974). Statistical inference for individual
organism research: Mixed blessing or curse? Journal of Applied
Behavior Analysis, 7, 647-653.
Miller, D.C. (1983). Handbook of research design and social
measurement (4th ed.). New York: Longman.
Mitchell, J.V. (Ed.). (1985). The ninth mental measurements
yearbook. Lincoln, NE: University of Nebraska Press.
Mitchell, J.V. (1988). Applied measurement in the Oscar Buros
tradition: Current implications. Applied Measurement in Education,
1, 5-16.
Mosteller, F., & Tukey, J.W. (1977). Data analysis and
regression: A second course in statistics. Reading, MA:
Addison-Wesley.
Murphy, G., & Goodall, E. (1980). Measurement error in
direct observations: A comparison of common recording methods.
Behaviour Research & Therapy, 18, 147-150.
Narens, L., & Luce, R.D. (1986). Measurement: The theory of
numerical assignments. Psychological Bulletin, 99, 166-180.
Nettelbeck, T., & Lally, M. (1976). Inspection time and
measured intelligence. British Journal of Psychology, 67, 17-22.
Newman, D.L., Kundert, D.K., Lande, D.S., & Bull, K.S.
(1988). Effect of varying item order on multiple-choice test
scores: Importance of statistical and cognitive difficulty.
Applied Measurement in Education, 1, 89-97.
Nie, N., Hull, C., Jenkins, J., Steinbrenner, K., & Bent,
D. (1975). SPSS: Statistical package for the social sciences (2nd
ed.). New York: McGraw-Hill.
Nunnally, J. (1960). The place of statistics in psychology.
Educational and Psychological Measurement, 20, 641-650.
Osgood, C.E. (1952). The nature and measurement of meaning.
Psychological Bulletin, 49, 197-237.
Osgood, C.E., Suci, G.J., & Tannenbaum, P.H. (1957). The
measurement of meaning. Urbana, IL: University of Illinois Press.
Payne, R.W., & Jones, H.G. (1957). Statistics for the
investigation of individual cases. Journal of Clinical Psychology,
13, 115-121.
Pedhazur, E.J., & Schmelkin, L.P. (1991). Measurement,
design, and analysis: An integrated approach. Hillsdale, NJ:
Erlbaum.
Popham, W.J. (1978). Criterion-referenced measurement.
Englewood Cliffs, NJ: Prentice-Hall.
Popham, W.J. (1987). The merits of measurement-driven
instruction. Phi Delta Kappan, 68, 679-682.
Powell, J., Martindale, B., & Kulp, S. (1975). An
evaluation of time-sample measures of behavior. Journal of Applied
Behavior Analysis, 8, 463-470.
Powell, J., Martindale, B., Kulp, S., Martindale, A., &
Bauman, R. (1977). Taking a closer look: Time sampling and
measurement error. Journal of Applied Behavior Analysis, 10,
325-332.
Rowley, G.L. (1976). The reliability of observational measures.
American Educational Research Journal, 13, 51-59.
Ryff, C.D., & Singer, B. (1996). Psychological well-being:
Meaning, measurement, and implications for psychotherapy research.
Psychotherapy and Psychosomatics, 65, 14-23.
Saffran, J.R., Aslin, R.N., & Newport, E.L. (1996).
Statistical learning by 8-month-old infants. Science, 274,
1926-1928.
Schmeiser, C.B. (1982). Use of experimental design in
statistical item bias studies. In R.A. Berk (Ed.), Handbook of
methods for detecting test bias (pp. 64-95). Baltimore, MD: The
Johns Hopkins University Press.
Schmidt, W. (1983). Content biases in achievement tests.
Journal of Educational Measurement, 20, 165-178.
Sechrest, L., & Yeaton, W.H. (1981). Meaningful measures of
effect. Journal of Consulting and Clinical Psychology, 49,
766-767.
Sethi, I., & Jain, A.K. (Eds.). (1991). Artificial neural
networks and statistical pattern recognition: Old and new
connections. New York: Elsevier Science.
Siegel, S. (1956). Nonparametric statistics for the behavioral
sciences. New York: McGraw-Hill.
Siegel, S., & Castellan, N.J. (1988). Nonparametric
statistics for the behavioral sciences (2nd ed.). New York:
McGraw-Hill.
Smith, B., & Sechrest, L. (1991). Treatment of aptitude X
treatment interactions. Journal of Consulting and Clinical
Psychology, 59, 233-244.
Snow, R.E. (1969). Unfinished Pygmalion. Contemporary
Psychology, 14, 197-199.
Snow, R.E. (1974). Representative and quasi-representative
designs for research on teaching. Review of Educational Research,
44, 265-291.
Snow, R.E. (1991). Aptitude-treatment interaction as a
framework for research on individual differences in psychotherapy.
Journal of Consulting and Clinical Psychology, 59, 205-216.
Snow, R.E. (1993). Construct validity and constructed-response
tests. In R.E. Bennett & W.C. Ward (Eds.), Construction versus
choice in cognitive measurement (pp. 45-60). Hillsdale, NJ:
Erlbaum.
Snow, R.E., & Farr, M.J. (1984). Aptitude, learning, and
instruction, Vol. 3: Cognitive and affective process analysis.
Hillsdale, NJ: Erlbaum.
Stanley, J.C. (1957). Controlled experimentation in the
classroom. Journal of Experimental Education, 25, 195-201.
Stanley, J.C. (1957). Research methods: Experimental design.
Review of Educational Research, 27, 449-459.
Stanley, J.C. (1965). Quasi-experimentation. School Review, 73,
197-205.
Stanley, J.C. (1966). A common class of pseudo-experiments.
American Educational Research Journal, 3, 79-87.
Stanley, J.C., George, W.C., & Solano, C.H. (Eds.). (1977).
The gifted and creative: A fifty-year perspective. Baltimore: The
Johns Hopkins University Press.
Sternberg, R.J., & Detterman, D.K. (Eds.). (1979). Human
intelligence: Perspectives on its theory and measurement. Norwood,
NJ: Ablex.
Stine, W.W. (1989). Meaningful inference: The role of
measurement in statistics. Psychological Bulletin, 105, 147-155.
Tabachnick, B.G., & Fidell, L.S. (1989). Using multivariate
statistics (2nd ed.). New York: Harper & Row.
Terman, L.M. (1960). A measurement of intelligence. Boston:
Houghton Mifflin.
Thorndike, R.L., & Hagen, E. (1969). Measurement and
evaluation in psychology and education (3rd ed.). New York: Wiley.
Toothaker, L.E., & Miller, L. (1996). Introductory
statistics for the behavioral sciences (2nd ed.). Pacific Grove,
CA: Brooks/Cole.
Wainer, H., & Braun, H.I. (Eds.). (1987). Test validity.
Hillsdale, NJ: Erlbaum.
Webb, E.J., Campbell, D.T., Schwartz, R.D., & Sechrest, L.
(1966). Unobtrusive measures: Nonreactive research in the social
sciences. Chicago: Rand McNally.
Wiederholt, J.L., Cronin, M.E., & Stubbs, V. (1982).
Measurement of functional competencies and the handicapped:
Constructs, assessments, and recommendations. In J.T. Neisworth
(Ed.), Assessment in special education (pp. 101-115). Rockville,
MD: Aspen.
Wesson, C., King, R., & Deno, S. (1984). Direct and frequent
measurement of student performance: If it's so good for us why
don't we do it? Learning Disability Quarterly, 7, 45-58.
White, O.R., & Liberty, K.A. (1976). Behavioral assessment
and precise educational measurement. In N.G. Haring & R.L.
Schiefelbusch (Eds.), Teaching special children (pp. 31-71). New
York: McGraw-Hill.
Williams, R., & Zimmerman, D. (1984). On the virtues and
the vices of standard error of measurement. The Journal of
Experimental Education, 52, 231-233.
Wilson, V.L. (1986). Analysis of interactions in
aptitude-treatment interaction research. In C.R. Reynolds &
V.L. Wilson (Eds.), Methodological and statistical advances in the
study of individual differences. New York: Plenum.
Winer, B.J. (1971). Statistical principles in experimental
design (2nd ed.). New York: McGraw-Hill.
Wolcott, H.F. (1990). On seeking--and rejecting--validity in
qualitative research. In E.W. Eisner & A. Peshkin (Eds.),
Qualitative inquiry in education: The continuing debate (pp.
121-152). New York: Teachers College Press.
Wolcott, H.F. (1990). Writing up qualitative research. Newbury
Park, CA: Sage.
Wolf, M.M. (1978). Social validity: The case for subjective
measurement, or how applied behavior analysis is finding its
heart. Journal of Applied Behavior Analysis, 11, 203-214.