Section 1: Introduction – The Problem of Binned Data
Suppose you are given data like that in Table 1 below and asked to find the mean:
Group | Frequency
0 to 25 | 114
25 to 50 | 76
50 to 75 | 58
75 to 100 | 51
100 to 250 | 140
250 to 500 | 107
500 to 1000 | 77
1000 to 5000 | 124
5000 or more | 42
Table 1: Example Binned Data.
Border cases go to the lower bin.
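To make the border rule concrete, here is a minimal Python sketch of Table 1 as a data structure. The names `BINS`, `UPPER_EDGES`, `bin_index`, and `total_count` are illustrative and not part of the original text, and the sketch assumes values are non-negative and that a value of exactly 0 belongs to the first bin.

```python
from bisect import bisect_left

# Table 1 as (lower, upper, frequency); upper=None marks the open-ended top bin.
BINS = [
    (0, 25, 114),
    (25, 50, 76),
    (50, 75, 58),
    (75, 100, 51),
    (100, 250, 140),
    (250, 500, 107),
    (500, 1000, 77),
    (1000, 5000, 124),
    (5000, None, 42),
]

# Finite upper edges in ascending order: 25, 50, ..., 5000.
UPPER_EDGES = [upper for _, upper, _ in BINS if upper is not None]


def bin_index(value):
    """Return the index in BINS of the bin that holds value.

    A value exactly on a border goes to the lower bin, so 25 lands in
    "0 to 25". Assumption: values are non-negative and 0 is in the first bin.
    """
    if value < 0:
        raise ValueError("value is below the lowest bin")
    # bisect_left counts the upper edges strictly less than value, which is
    # the index of the first bin whose upper edge is >= value.
    return bisect_left(UPPER_EDGES, value)


# Number of observations in the table.
total_count = sum(freq for _, _, freq in BINS)
```

Under this rule each finite bin is closed on the right, so `bin_index(25)` points at the "0 to 25" row and any value just above 25 points at the "25 to 50" row. Only the counts per bin survive; the exact values inside each bin do not.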
The immediate problem is that the mean (and the variance, and many other statistics) is computed from exact values, but we have only ranges of values. There are a few things, similar to getting the mean, that could be done: