/*
Number of Deaths by Horsekicks in the Prussian Army from 1875-1894 for 14 Corps

Data from von Bortkiewicz (1898), given by Andrews & Herzberg (1985),
on number of deaths by horse or mule kicks in 10 (of 14 reported) corps
of the Prussian army. 4 corps were not considered by Fisher (1925) as
they had a different organization.

http://www.maths.lth.se/help/R/.R/library/vcd/html/HorseKicks.html

who in turn give as the Source

  Michael Friendly (2000), Visualizing Categorical Data, p. 18.

and the References

  D. F. Andrews & A. M. Herzberg (1985), Data: A Collection of Problems
  from Many Fields for the Student and Research Worker.
  Springer-Verlag, New York, NY.

  R. A. Fisher (1925), Statistical Methods for Research Workers.
  Oliver & Boyd, London.

  L. von Bortkiewicz (1898), Das Gesetz der kleinen Zahlen.
  Teubner, Leipzig.
  more on him at
  http://www-gap.dcs.st-and.ac.uk/~history/Mathematicians/Bortkiewicz.html

  M. Friendly (2000), Visualizing Categorical Data.
  SAS Institute, Cary, NC.
*/

* --- for SAS ---- ;

options ls=85 ps=50;
run;

data a;
  input year corps1-corps14 total;
lines;
1875 0 0 0 0 0 0 0 1 1 0 0 0 1 0 3
1876 2 0 0 0 1 0 0 0 0 0 0 0 1 1 5
1877 2 0 0 0 0 0 1 1 0 0 1 0 2 0 7
1878 1 2 2 1 1 0 0 0 0 0 1 0 1 0 9
1879 0 0 0 1 1 2 2 0 1 0 0 2 1 0 10
1880 0 3 2 1 1 1 0 0 0 2 1 4 3 0 18
1881 1 0 0 2 1 0 0 1 0 1 0 0 0 0 6
1882 1 2 0 0 0 0 1 0 1 1 2 1 4 1 14
1883 0 0 1 2 0 1 2 1 0 1 0 3 0 0 11
1884 3 0 1 0 0 0 0 1 0 0 2 0 1 1 9
1885 0 0 0 0 0 0 1 0 0 2 0 1 0 1 5
1886 2 1 0 0 1 1 1 0 0 1 0 1 3 0 11
1887 1 1 2 1 0 0 3 2 1 1 0 1 2 0 15
1888 0 1 1 0 0 1 1 0 0 0 0 1 1 0 6
1889 0 0 1 1 0 1 1 0 0 1 2 2 0 2 11
1890 1 2 0 2 0 1 1 2 0 2 1 1 2 2 17
1891 0 0 0 1 1 1 0 1 1 0 3 3 1 0 12
1892 1 3 2 0 1 1 3 0 1 1 0 1 1 0 15
1893 0 1 0 0 0 1 0 2 0 0 1 3 0 0 8
1894 1 0 0 0 0 0 0 0 1 0 1 1 0 0 4
;
run;

* a new file where each count is on a different observation
  indexed by year and corps ;

data b;
  keep year corps deaths;
  array c(14) corps1-corps14;
  set a;
  do corps = 1 to 14;
    deaths = c(corps);
    output;
  end;
run;

proc means data=b;
  var deaths;

proc univariate data=b;
  var deaths;

proc means data=b;
  class year;
  var deaths;

proc means data=b;
  class corps;
  var deaths;

proc genmod data=b;
  model deaths = / distribution = poisson link = identity;

proc genmod data=b;
  model deaths = / distribution = normal link = identity;

proc genmod data=b;
  model deaths = / distribution = poisson link = log;
run;

* --- for Stata ---- ;

clear
input year deaths1-deaths14 total
1875 0 0 0 0 0 0 0 1 1 0 0 0 1 0 3
1876 2 0 0 0 1 0 0 0 0 0 0 0 1 1 5
1877 2 0 0 0 0 0 1 1 0 0 1 0 2 0 7
1878 1 2 2 1 1 0 0 0 0 0 1 0 1 0 9
1879 0 0 0 1 1 2 2 0 1 0 0 2 1 0 10
1880 0 3 2 1 1 1 0 0 0 2 1 4 3 0 18
1881 1 0 0 2 1 0 0 1 0 1 0 0 0 0 6
1882 1 2 0 0 0 0 1 0 1 1 2 1 4 1 14
1883 0 0 1 2 0 1 2 1 0 1 0 3 0 0 11
1884 3 0 1 0 0 0 0 1 0 0 2 0 1 1 9
1885 0 0 0 0 0 0 1 0 0 2 0 1 0 1 5
1886 2 1 0 0 1 1 1 0 0 1 0 1 3 0 11
1887 1 1 2 1 0 0 3 2 1 1 0 1 2 0 15
1888 0 1 1 0 0 1 1 0 0 0 0 1 1 0 6
1889 0 0 1 1 0 1 1 0 0 1 2 2 0 2 11
1890 1 2 0 2 0 1 1 2 0 2 1 1 2 2 17
1891 0 0 0 1 1 1 0 1 1 0 3 3 1 0 12
1892 1 3 2 0 1 1 3 0 1 1 0 1 1 0 15
1893 0 1 0 0 0 1 0 2 0 0 1 3 0 0 8
1894 1 0 0 0 0 0 0 0 1 0 1 1 0 0 4
end

* reshape dataset from wide to long ...
reshape long deaths, i(year) j(corps)
drop total

* descriptively...
summ deaths, detail
inspect deaths

* fit (calculate) mean as the intercept in null regression
glm deaths, family(poisson) link(identity)
glm deaths, family(normal) link(identity)
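
* --- additional Stata steps (not part of the original file) ---
* A minimal sketch, in the spirit of the SAS steps that have no Stata
* counterpart in this file: summaries by year and by corps, and the
* Poisson fit with a log link. It assumes the long-format data produced
* by the reshape command is still in memory. The choice of statistics
* (n, mean, variance) is an assumption, not the default PROC MEANS output.

* summaries by year and by corps
tabstat deaths, by(year) statistics(n mean variance)
tabstat deaths, by(corps) statistics(n mean variance)

* with a log link the intercept estimates log(mean),
* so exponentiate it to get back to the mean count
glm deaths, family(poisson) link(log)
display exp(_b[_cons])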