The National Institute of Standards and Technology works to ensure the
computational accuracy of statistical software for conducting
descriptive, multiple regression, ANOVA and nonlinear regression
analyses by providing a library of statistical reference datasets.
There are innumerable university, departmental and faculty sites, in
the US and internationally, that provide data, including:
Manchester Metropolitan University provides examples of behavioral,
biological, medical and weather data, suitable for principal components
analysis, cluster analysis, multiple regression analysis, discriminant
analysis, etc., as ASCII, Excel and SPSS system files.
Germán Rodríguez of Princeton University provides about 20
well-documented datasets (largely frequency data) on issues like births,
deaths, salaries of professors, time-to-doctorate, contraceptive use,
ship damage, etc. for his WWS509 course on generalized linear models.
UCLA Academic Technology Services provides many "textbook" examples
from over 30 applied statistics books, including many of the standards
(ours too!). Each example contains the datasets and the program code
(HLM, MLwiN, S+, SAS, SPSS, Stata) for generating the book exhibits.
The National Center for Education Statistics provides data from the
major educational surveys in the USA (and overseas), including
"standards" like ECLS, HS&B, NLS72, NELS88/2000, SASS, etc. All datasets
are free and are distributed by mail on CD-ROM (some are also available
by online download).