Median of Grouped Data

As you have studied in Class IX, the median is a measure of central tendency which gives the value of the middle-most observation in the data. Recall that for finding the median of ungrouped data, we first arrange the data values of the observations in ascending order. Then, if n is odd, the median is the $\left(\frac{n+1}{2}\right)$th observation. And, if n is even, then the median will be the average of the $\left(\frac{n}{2}\right)$th and the $\left(\frac{n}{2}+1\right)$th observations.
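As a minimal sketch of the rule just recalled, the Python function below sorts the data and then applies the odd and even cases. The function name `median_ungrouped` and the sample values are our own, chosen only for illustration. Note the subtraction of 1 when converting the 1-based positions used in the text to Python's 0-based indices.

```python
def median_ungrouped(values):
    """Median of ungrouped data using the (n + 1)/2 and n/2, n/2 + 1 rules."""
    data = sorted(values)          # arrange in ascending order
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # odd n: the ((n + 1)/2)th observation, shifted to a 0-based index
        return data[(n + 1) // 2 - 1]
    # even n: average of the (n/2)th and (n/2 + 1)th observations
    return (data[n // 2 - 1] + data[n // 2]) / 2

# Hypothetical sample values
print(median_ungrouped([7, 3, 9, 1, 5]))   # 5
print(median_ungrouped([7, 3, 9, 1]))      # 5.0
```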

Suppose we have to find the median of the following data, which gives the marks, out of 50, obtained by 100 students in a test:

Marks obtained	20	29	28	33	42	38	43	25
Number of students	6	28	24	15	2	4	1	20

First, we arrange the marks in ascending order and prepare a frequency table as follows :

Marks obtained	Number of students
20	6
25	20
28	24
29	28
33	15
38	4
42	2
43	1
Total	100

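Here n = 100, which is even. The median will be the average of the $\left(\frac{n}{2}\right)$th and the $\left(\frac{n}{2}+1\right)$th observations, i.e., the 50th and 51st observations. To find these observations, we count how many students scored up to each mark: 6 students scored up to 20, 6 + 20 = 26 students scored up to 25, 26 + 24 = 50 students scored up to 28, and so on.

We add a column depicting these running totals to the frequency table above and call it the cumulative frequency column.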

Marks obtained	Number of students	Cumulative frequency
20	6	6
up to 25	20	6 + 20 = 26
up to 28	24	26 + 24 = 50
up to 29	28	50 + 28 = 78
up to 33	15	78 + 15 = 93
up to 38	4	93 + 4 = 97
up to 42	2	97 + 2 = 99
up to 43	1	99 + 1 = 100
Total	100

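From the table above, we see that:

50th observation is 28 (the cumulative frequency first reaches 50 at 28 marks)

51st observation is 29 (observations 51 to 78 all have 29 marks)

Median = $\frac{28 + 29}{2} = 28.5$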
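The same lookup can be sketched in Python, assuming the marks and frequencies from the data at the start of this section. The helper `kth_observation` is a hypothetical name; it walks the cumulative frequencies (built with `itertools.accumulate`) and returns the first mark whose cumulative frequency reaches k.

```python
from itertools import accumulate

marks = [20, 25, 28, 29, 33, 38, 42, 43]
students = [6, 20, 24, 28, 15, 4, 2, 1]

cumulative = list(accumulate(students))   # [6, 26, 50, 78, 93, 97, 99, 100]

def kth_observation(k):
    """Return the k-th observation (1-based) when the data is in ascending order."""
    for value, cf in zip(marks, cumulative):
        if k <= cf:        # first mark whose cumulative frequency reaches k
            return value
    raise IndexError("k exceeds the number of observations")

n = cumulative[-1]         # 100 students, so n is even here
median = (kth_observation(n // 2) + kth_observation(n // 2 + 1)) / 2
print(kth_observation(50), kth_observation(51), median)   # 28 29 28.5
```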

Remark: The part of Table 13.11 consisting of Column 1 and Column 3 is known as the cumulative frequency table. The median mark of 28.5 conveys the information that about 50% of the students obtained marks less than 28.5 and the other 50% obtained marks more than 28.5.

Now, let us see how to obtain the median of grouped data, through the following situation.

Consider a grouped frequency distribution of marks obtained, out of 100, by 53 students, in a certain examination, as follows:

Marks	Number of students
0 - 10	5
10 - 20	3
20 - 30	4
30 - 40	3
40 - 50	3
50 - 60	4
60 - 70	7
70 - 80	9
80 - 90	7
90 - 100	8

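From the table above, 5 students scored less than 10 marks, and 5 + 3 = 8 students scored less than 20 marks; so the cumulative frequency of the class 10 - 20 is 8. Similarly, we can compute the cumulative frequencies of the other classes, i.e., the number of students with marks less than 30, less than 40, ..., less than 100. We give them in the table below: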

Marks obtained	Number of students (Cumulative frequency)
Less than 10	5
Less than 20	5 + 3 = 8
Less than 30	8 + 4 = 12
Less than 40	12 + 3 = 15
Less than 50	15 + 3 = 18
Less than 60	18 + 4 = 22
Less than 70	22 + 7 = 29
Less than 80	29 + 9 = 38
Less than 90	38 + 7 = 45
Less than 100	45 + 8 = 53
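A short sketch of how the less than type column is built: each entry is the running total of the frequencies so far, which `itertools.accumulate` computes directly. The variable names are illustrative only.

```python
from itertools import accumulate

upper_limits = list(range(10, 101, 10))   # 10, 20, ..., 100
frequencies = [5, 3, 4, 3, 3, 4, 7, 9, 7, 8]

# Running totals: number of students with marks less than each upper limit
less_than = list(accumulate(frequencies))
for upper, cf in zip(upper_limits, less_than):
    print(f"Less than {upper}: {cf}")
```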


The distribution given above is called the cumulative frequency distribution of the less than type. Here 10, 20, 30, ..., 100 are the upper limits of the respective class intervals.

We can similarly make a table of the number of students with scores more than or equal to 0, more than or equal to 10, more than or equal to 20, and so on.

From the frequency table, we observe that all 53 students have scored marks more than or equal to 0. Since 5 students scored marks in the interval 0 - 10, there are 53 – 5 = 48 students with marks more than or equal to 10.

Continuing in the same manner, we get the number of students scoring 20 or above as 48 – 3 = 45, 30 or above as 45 – 4 = 41, and so on, as shown in the table below.

Marks obtained	Number of students (Cumulative frequency)
More than or equal to 0	53
More than or equal to 10	53 – 5 = 48
More than or equal to 20	48 – 3 = 45
More than or equal to 30	45 – 4 = 41
More than or equal to 40	41 – 3 = 38
More than or equal to 50	38 – 3 = 35
More than or equal to 60	35 – 4 = 31
More than or equal to 70	31 – 7 = 24
More than or equal to 80	24 – 9 = 15
More than or equal to 90	15 – 7 = 8
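The more than or equal to type column can be sketched the same way, following the subtraction used in the text: start from the total n = 53 and subtract each class frequency in turn. The names below are illustrative assumptions.

```python
frequencies = [5, 3, 4, 3, 3, 4, 7, 9, 7, 8]
lower_limits = list(range(0, 100, 10))   # 0, 10, ..., 90

more_or_equal = []
remaining = sum(frequencies)             # all 53 students score >= 0
for f in frequencies:
    more_or_equal.append(remaining)
    remaining -= f                       # drop the class just passed
for lower, cf in zip(lower_limits, more_or_equal):
    print(f"More than or equal to {lower}: {cf}")
```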


The table above is called a cumulative frequency distribution of the more than type. Here 0, 10, 20, . . ., 90 give the lower limits of the respective class intervals.

Now, to find the median of grouped data, we can make use of any of these cumulative frequency distributions.

Let us combine the frequency table with the less than type cumulative frequencies to get the table below:

Marks	Number of students(f)	Cumulative frequency (cf)
0 - 10	5	5
10 - 20	3	8
20 - 30	4	12
30 - 40	3	15
40 - 50	3	18
50 - 60	4	22
60 - 70	7	29
70 - 80	9	38
80 - 90	7	45
90 - 100	8	53

Now, in grouped data, we may not be able to find the middle observation by looking at the cumulative frequencies, as the middle observation will be some value in a class interval.

It is, therefore, necessary to find the value inside a class that divides the whole distribution into two halves. But which class should this be?

To find this class, we find the cumulative frequencies of all the classes and $\frac{n}{2}$.

We now locate the class whose cumulative frequency is greater than (and nearest to) $\frac{n}{2}$. This is called the median class. In the distribution above, n = 53. So, $\frac{n}{2} = 26.5$.

Now 60 – 70 is the class whose cumulative frequency 29 is greater than (and nearest to) $\frac{n}{2}$, i.e., 26.5.

Therefore, 60 – 70 is the median class.
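The rule for picking the median class can be sketched as follows. Because cumulative frequencies never decrease, the first class whose cumulative frequency exceeds n/2 is also the one nearest to n/2 from above. The function name `median_class_index` is hypothetical.

```python
from itertools import accumulate

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
frequencies = [5, 3, 4, 3, 3, 4, 7, 9, 7, 8]

def median_class_index(freqs):
    """Index of the first class whose cumulative frequency exceeds n/2."""
    cumulative = list(accumulate(freqs))
    half = cumulative[-1] / 2          # n/2 = 26.5 for this data
    for i, cf in enumerate(cumulative):
        if cf > half:
            return i
    # only reached when every frequency is zero
    raise ValueError("frequencies must contain at least one positive value")

i = median_class_index(frequencies)
print(classes[i])   # (60, 70)
```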

After finding the median class, we use the following formula for calculating the median.

Median = $l + \left(\frac{\frac{n}{2} - cf}{f}\right) \times h$

where l = lower limit of median class,

n = number of observations,

cf = cumulative frequency of class preceding the median class,

f = frequency of median class,

h = class size (assuming class size to be equal).

Substituting the values $\frac{n}{2} = 26.5$, l = 60, cf = 22, f = 7, h = 10 in the formula above, we get

Median = $60 + \left(\frac{26.5 - 22}{7}\right) \times 10$

= $60 + \frac{45}{7} \approx 66.4$

So, about half the students have scored marks less than 66.4, and the other half have scored marks more than 66.4.
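Putting the steps together, here is a minimal sketch of the formula for equal class sizes, assuming the class lower limits, the frequencies and the class size h are supplied separately. `grouped_median` is an illustrative name, not a standard library function. If the median class happens to be the first class, the preceding cumulative frequency cf is taken as 0.

```python
from itertools import accumulate

def grouped_median(lower_limits, frequencies, h):
    """Median = l + ((n/2 - cf) / f) * h, assuming equal class size h."""
    cumulative = list(accumulate(frequencies))
    half = cumulative[-1] / 2
    # median class: first class whose cumulative frequency exceeds n/2
    i = next(k for k, c in enumerate(cumulative) if c > half)
    l = lower_limits[i]
    f = frequencies[i]
    cf = cumulative[i - 1] if i > 0 else 0   # cf of the class preceding the median class
    return l + (half - cf) / f * h

median = grouped_median(list(range(0, 100, 10)), [5, 3, 4, 3, 3, 4, 7, 9, 7, 8], 10)
print(round(median, 2))   # 66.43
```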

Example 7

The median of the following data is 525. Find the values of x and y, if the total frequency is 100.

Class intervals	0-100	100-200	200-300	300-400	400-500	500-600	600-700	700-800	800-900	900-1000
Frequency	2	5	x	12	17	20	y	9	7	4

Class intervals	Frequency	Cumulative frequency
0 - 100	2	2
100 - 200	5	7
200 - 300	x	7 + x
300 - 400	12	19 + x
400 - 500	17	36 + x
500 - 600	20	56 + x
600 - 700	y	56 + x + y
700 - 800	9	65 + x + y
800 - 900	7	72 + x + y
900 - 1000	4	76 + x + y

It is given that n = 100.
So, 76 + x + y = 100, i.e., x + y = 24 ... (1)
The median is 525, which lies in the class 500 – 600. So,
l = 500, f = 20, cf = 36 + x, h = 100
Now, using the formula Median = $l + \left(\frac{\frac{n}{2} - cf}{f}\right) \times h$ and substituting the values, we get
$525 = 500 + \left(\frac{50 - 36 - x}{20}\right) \times 100$
Subtracting 500 from both sides gives $25 = (14 - x) \times 5$, i.e., $25 = 70 - 5x$
Therefore, 5x = 70 – 25 = 45
So, x = 9
Therefore, from (1), we get 9 + y = 24
Therefore, y = 15
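The algebra of Example 7 can be checked with a small sketch. It rearranges the median formula to solve for x, then uses n = 100 to get y; `fractions.Fraction` keeps the arithmetic exact. The variable names are our own.

```python
from fractions import Fraction

n = 100
median = 525
l, f, h = 500, 20, 100             # median class 500-600, its frequency, class size
known_before = 2 + 5 + 12 + 17     # frequencies before the median class, excluding x
known_total = 2 + 5 + 12 + 17 + 20 + 9 + 7 + 4   # all frequencies except x and y

# Median = l + ((n/2 - cf)/f) * h with cf = known_before + x, rearranged for x
x = Fraction(n, 2) - known_before - Fraction(median - l) * f / h
y = n - known_total - x
print(x, y)   # 9 15
```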

Now that you have studied all three measures of central tendency, let us discuss which measure would be best suited for a particular requirement. The mean is the most frequently used measure of central tendency because it takes into account all the observations, and lies between the extremes, i.e., the largest and the smallest observations of the entire data. It also enables us to compare two or more distributions. For example, by comparing the average (mean) results of students of different schools in a particular examination, we can conclude which school has a better performance.

However, extreme values in the data affect the mean. For example, the mean of classes having frequencies more or less the same is a good representative of the data. But, if one class has frequency, say 2, and the five others have frequency 20, 25, 20, 21, 18, then the mean will certainly not reflect the way the data behaves. So, in such cases, the mean is not a good representative of the data.

In problems where individual observations are not important, and we wish to find out a ‘typical’ observation, the median is more appropriate, e.g., finding the typical productivity rate of workers, average wage in a country, etc. These are situations where extreme values may be there. So, rather than the mean, we take the median as a better measure of central tendency.
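To see the effect of an extreme value, here is a sketch with hypothetical daily wages (invented for illustration, not data from this section), using Python's `statistics.mean` and `statistics.median`.

```python
from statistics import mean, median

# Hypothetical daily wages; the last value is an extreme one
wages = [400, 420, 450, 470, 500, 9000]

print(mean(wages))     # about 1873.33, pulled far up by the single extreme wage
print(median(wages))   # 460.0, the average of the 3rd and 4th sorted values
```

Removing the extreme wage would change the mean drastically but move the median only slightly, which is why the median is preferred for a "typical" value here.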

Remarks:

There is an empirical relationship between the three measures of central tendency:

3 Median = Mode + 2 Mean
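Rearranged, the empirical relationship gives an approximate value for one measure when the other two are known. A minimal sketch, with a hypothetical function name and made-up sample values; since the relationship is only empirical, the result is an estimate, not an exact mode.

```python
def estimate_mode(mean, median):
    """Approximate mode from 3 Median = Mode + 2 Mean (empirical, not exact)."""
    return 3 * median - 2 * mean

# Hypothetical values, for illustration only
print(estimate_mode(mean=50, median=52))   # 56
```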

The median of grouped data with unequal class sizes can also be calculated. However, we shall not discuss it here.
