Organisation of Grouped Data
In statistics, a small amount of data can easily be organized and analyzed with tally marks. With a large dataset, however, this becomes impractical. Data grouping solves this problem: it organizes a large dataset into meaningful categories, or intervals.
Consider the following example:
Imagine a college admissions office collected total exam scores from 200 applicants. The scores are scattered across numbers like 1120, 1145, 1167, 1172, 1189, 1205, 1230, and so on up to 1550. Instead of listing all 200 individual scores, which would be cumbersome, we can group them into intervals:
1100-1200: All scores from 1100 up to (but not including) 1200
1200-1300: All scores from 1200 up to (but not including) 1300
1300-1400: All scores from 1300 up to (but not including) 1400
And so on...
Each of these ranges is called a class interval. In each interval, the starting number (such as 1100) is called the lower limit, and the ending number (such as 1200) is called the upper limit.
This grouping makes the data easier to understand, simpler to analyze and to represent visually, and more manageable for statistical calculations.
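Placing a score in its class interval is simple arithmetic. The sketch below assumes exclusive intervals of width 100 starting at 1100, as in the admissions example; the helper name `class_interval` and its parameters are hypothetical, chosen only for illustration.

```python
def class_interval(score: int, start: int = 1100, width: int = 100) -> tuple[int, int]:
    """Return (lower limit, upper limit) of the exclusive interval holding score.

    The lower limit is included; the upper limit belongs to the next interval.
    """
    if score < start:
        raise ValueError(f"score {score} is below the first interval")
    # Count how many whole widths the score lies above the start.
    lower = start + ((score - start) // width) * width
    return lower, lower + width


# Scores named in the example above.
scores = [1120, 1145, 1167, 1172, 1189, 1205, 1230]
for s in scores:
    low, high = class_interval(s)
    print(f"{s} -> {low}-{high}")
```

A score of exactly 1200 lands in 1200-1300, matching the "up to (but not including)" rule.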
Limits and Boundaries
When working with inclusive class intervals such as 1-10, 11-20, and so on, there is a gap between the upper limit of one class and the lower limit of the next (between 10 and 11, for instance), so a value such as 10.3 does not fit into either class. To handle such values precisely, we use what are called "true class limits" or "class boundaries".
Boundaries Between Classes:
Take the upper limit of one class and the lower limit of the next class
Calculate their average to find the boundary
For example, between 1-10 and 11-20, the average of 10 and 11 is (10 + 11)/2 = 10.5, which becomes the boundary.
Lower Boundary of the First Class:
For the first interval (1-10)
Consider the upper limit the previous class would have had (here 0, one less than the lower limit 1, since the classes are one unit apart)
Lower Boundary = (0 + 1)/2 = 0.5
Upper Boundary of the Last Class:
For the last interval (31-40)
Consider the lower limit the next class would have had (41)
Upper Boundary = (40 + 41)/2 = 40.5
Thus we get:
| Class interval | 1-10 | 11-20 | 21-30 | 31-40 |
|---|---|---|---|---|
| Class boundaries | 0.5-10.5 | 10.5-20.5 | 20.5-30.5 | 30.5-40.5 |
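The averaging procedure can be sketched in Python. This is a minimal illustration, assuming sorted inclusive classes with the same gap between each upper limit and the next lower limit; the function name `class_boundaries` is hypothetical.

```python
def class_boundaries(intervals: list[tuple[int, int]]) -> list[tuple[float, float]]:
    """Return (lower boundary, upper boundary) for each inclusive class interval.

    Assumes at least two sorted classes with a constant gap between
    one class's upper limit and the next class's lower limit.
    """
    if len(intervals) < 2:
        raise ValueError("need at least two classes to infer the gap")
    gap = intervals[1][0] - intervals[0][1]  # e.g. 11 - 10 = 1
    # Upper limits, preceded by the imaginary previous class's upper limit (0).
    uppers = [intervals[0][0] - gap] + [high for _, high in intervals]
    # Lower limits, followed by the imaginary next class's lower limit (41).
    lowers = [low for low, _ in intervals] + [intervals[-1][1] + gap]
    # Each boundary is the average of an upper limit and the next lower limit.
    cuts = [(upper + lower) / 2 for upper, lower in zip(uppers, lowers)]
    return [(cuts[i], cuts[i + 1]) for i in range(len(intervals))]


classes = [(1, 10), (11, 20), (21, 30), (31, 40)]
for (low, high), (b_low, b_high) in zip(classes, class_boundaries(classes)):
    print(f"{low}-{high}: {b_low}-{b_high}")
```

For the four classes in the table this prints `1-10: 0.5-10.5` through `31-40: 30.5-40.5`.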
This system closes the gaps between consecutive classes, so values that lie between the original class limits (such as 10.3) can still be placed in a class consistently, making data classification more precise.
Constructing a Grouped Frequency Distribution Table
Method for creating a grouped frequency distribution table:
Step 1: Find the Range
First, identify the maximum and minimum values in the dataset
Range = Maximum value - Minimum value
Step 2: Determine Class Intervals
Decide how many class intervals you want (typically 5-8)
Calculate the length (width) of each interval = Range ÷ Number of intervals, rounded up to a convenient whole number so that the intervals cover the maximum value
Step 3: Create Class Intervals
Start from the minimum value
Create intervals of equal width
Step 4: Tally the Data
- Go through each value in the dataset and record it with a tally mark in the interval it falls into.
Step 5: Count Frequencies
- Count the total tally marks in each interval and record these as frequencies
This organized presentation makes it easier to see how the data are distributed across different ranges and to identify patterns.
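The five steps can be combined into one short Python sketch. It assumes integer data and exclusive intervals, and it uses the eight scores named in the admissions example (the seven listed plus the maximum, 1550) as input. The sketch rounds the width up to a whole number (integer division plus one) so that the maximum value lands inside the last interval, since the upper limit of an exclusive interval is not included. The function name `grouped_frequency_table` is hypothetical.

```python
from collections import Counter


def grouped_frequency_table(data: list[int], num_classes: int) -> list[tuple[int, int, int]]:
    """Build (lower limit, upper limit, frequency) rows using exclusive intervals."""
    # Step 1: range = maximum value - minimum value.
    low, high = min(data), max(data)
    data_range = high - low
    # Step 2: width = range / number of classes, rounded up to a whole number.
    # Integer division plus one also keeps the maximum inside the last
    # interval when the range divides evenly.
    width = data_range // num_classes + 1
    # Steps 3-4: equal-width intervals starting at the minimum value;
    # "tally" each value by the index of the interval it falls into.
    tally = Counter((x - low) // width for x in data)
    # Step 5: count the tallies in each interval (empty intervals get 0).
    return [
        (low + i * width, low + (i + 1) * width, tally[i])
        for i in range(num_classes)
    ]


scores = [1120, 1145, 1167, 1172, 1189, 1205, 1230, 1550]
for lower, upper, freq in grouped_frequency_table(scores, 5):
    print(f"{lower}-{upper}: {'|' * freq} ({freq})")
```

Here the range is 430, so the width becomes 87 and the intervals run from 1120-1207 up to 1468-1555.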
Characteristics of a Grouped Frequency Distribution
Large data sets are divided into smaller groups known as class intervals for easy analysis.
Each class interval has a lower limit (smallest value) and an upper limit (largest value).
In inclusive intervals (e.g., 1-10, 11-20, 21-30), both the lower and upper limits are part of the interval.
In exclusive intervals (e.g., 0-10, 10-20, 20-30), only the lower limit is included, but the upper limit belongs to the next interval.
The upper boundary of one class, which is also the lower boundary of the next class, is found by averaging the upper limit of that class and the lower limit of the next.
In exclusive class intervals, class limits and boundaries are equal.
In inclusive class intervals, limits and boundaries are different.
The difference between the upper and lower boundary of a class is called the class length or class width.
Since individual values cannot be recovered from grouped data, the class mark (midpoint) is used to represent a class. It is calculated as:
Class Mark = (Lower limit + Upper limit) / 2
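The class mark and class width definitions can be checked with a few lines of Python. The helper names `class_mark` and `class_width` are illustrative only; the classes used are the inclusive class 11-20 (boundaries 10.5-20.5) and the exclusive class 1100-1200 from earlier in this section.

```python
def class_mark(lower: float, upper: float) -> float:
    """Class mark (midpoint) = (lower limit + upper limit) / 2."""
    return (lower + upper) / 2


def class_width(lower_boundary: float, upper_boundary: float) -> float:
    """Class width = upper boundary - lower boundary."""
    return upper_boundary - lower_boundary


# Inclusive class 11-20: limits and boundaries differ.
print(class_mark(11, 20))        # 15.5
print(class_mark(10.5, 20.5))    # 15.5 (same midpoint from the boundaries)
print(class_width(10.5, 20.5))   # 10.0
# Using the limits 11 and 20 instead would give 9, which undercounts the width.

# Exclusive class 1100-1200: limits and boundaries coincide.
print(class_mark(1100, 1200))    # 1150.0
print(class_width(1100, 1200))   # 100
```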