/* Source ------ USA TODAY, August 1995 Sidebar to a photograph concerning a tropical storm "Hurricanes in this century The number of hurricanes with winds geater than 110 mph has declined since the 1950s. Data list: Major hurricanes to hit the USA: (no. & decade) " Data setup below ---------------- USA_yrs 10 for the full 10 years in the decades 1900 to 1980 5 for the 1990's decade (1990-94 *) * Presuming the data given in Aug 95 exclude the 1995 'season', the time span included in the 1990s decade covers the 5 seasons 1990, 1991, 1992, 1993 and 1994. count: No. major hurricanes that hit the USA in indicated no. of 'USA-years' */ * --- for SAS --- (see belwoe for Stata ); data a; input decade USA_yrs count; rate = count/USA_yrs ; * compute observed rate each decade ; LINES; 1900 10 6 1910 10 8 1920 10 5 1930 10 8 1940 10 8 1950 10 9 1960 10 6 1970 10 4 1980 10 6 1990 5 2 ; RUN; title naive average (ignores fact that last obsn is for 5 years) ; proc means data = a mean clm; var count ; RUN; title weighted average od rates, weights proportional to length of observation ; proc means data = a mean clm; var rate ; weight USA_yrs; RUN; title regression model.. Expected no. of events = rate x Denominator ; * E[ count | Denominator ] = B1 x X ; * note that when Denominator = 0, expect 0 events.. ie intercept ("B0") = 0 ; * so ask that the 'constant' or 'intercept' be omitted.. ; proc genmod data=a ; model count = USA_yrs / dist=poisson link=identity noint waldci; RUN; title constant (rather than Poisson) variation around the expected value ; * use constant (rather than Poisson) variation around the expected value ; * ( Gaussian variation is independent of mean [ homoskedastic ] ) ; proc genmod data=a ; model count = USA_yrs / dist=normal link=identity noint waldci; RUN; * XXXXXXXXXXXXXXXX cut here XXXXXXXXXXXXXXX ; * ---- for Stata clear input decade USA_yrs count 1900 10 6 1910 10 8 1920 10 5 1930 10 8 1940 10 8 1950 10 9 1960 10 6 1970 10 4 1980 10 6 1990 5 2 end * naive average (ignores fact that last obsn is for 5 years) ci count * use country-years as the denominato... ci count, poisson exposure(USA_yrs) * rate each decade... gen rate = count/USA_yrs * weighted average, weight proportional to length of observation ci rate [aweight=USA_yrs] * regression model.. Expected no. of events = rate x Denominator * * E[ count | Denominator ] = B1 x X * * note that when Denominator = 0, expect 0 events.. ie intercept ("B0") = 0 * * so ask that the 'constant' or 'intercept' be omitted.. * glm count USA_yrs, family(poisson) link(identity) noconstant * * use constant (rather than Poisson) variation around the expected value * ( Gaussian variation is independent of mean [ homoskedastic ] ) * glm count USA_yrs, family(normal) link(identity) noconstant cii 95 62, poisson cii 1 62, poisson