Mapping the Real World to AI
Let's now see, step by step, how we can take a real-world example and map it to an AI problem. What exactly do we mean by an AI problem?
Consider a simple example: predicting whether a person is overweight based on their height and weight. We can represent this as a table:
| Height (cm) | Weight (kg) | Overweight |
|---|---|---|
| 160 | 50 | No |
| 170 | 78 | Yes |
| 180 | 90 | Yes |
| 190 | 100 | Yes |
| 150 | 40 | No |
| 160 | 60 | No |
Let us say this is our training data. Now suppose we get a new point: height = 168 and weight = 70. Can you guess whether this person is overweight?
From the data given above, can we find some patterns? What can we say about the relationship between height, weight, and the label "Overweight"? Can we learn from this data how to predict whether a person is overweight?
Let's try to visualize this data. We have created some more data and colored the overweight points red and the normal-weight points blue.
A new purple point is added. Can you guess whether it is overweight? How would you guess?
We can only guess based on the data available to us. Because this is a coordinate system, we can look at which points are near the purple point. If most of the points near the purple point are red, we can guess that the purple point is also red; if most of them are blue, we can guess that the purple point is blue. But how many points should we look at?
One point is too few. Let's start with 3 points. If 2 of the 3 nearest points are red, we guess that the purple point is red; if 2 of the 3 are blue, we guess that it is blue.
Let's try to find the 3 nearest neighbors of the purple point.
So, looking at the nearest points, we can classify the new point by a majority vote among its neighbors.
This is called the k-nearest neighbors algorithm. In this case, k = 3.
It is a simple algorithm that classifies a data point based on the data points nearest to it. It turns out our minds do something similar: we observe things, extract features, map them into a space in our mind, and the next time something similar happens, we find the closest match and predict based on our past observations.
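The idea above can be sketched in a few lines of Python. This is a minimal illustration using only the six rows from the table earlier; the lesson's visualization uses additional generated data, so its answer for the purple point may differ from what this sketch predicts.

```python
from math import dist
from collections import Counter

# Training data from the table above: (height_cm, weight_kg) -> label
training = [
    ((160, 50), "No"),
    ((170, 78), "Yes"),
    ((180, 90), "Yes"),
    ((190, 100), "Yes"),
    ((150, 40), "No"),
    ((160, 60), "No"),
]

def knn_predict(point, data, k=3):
    # Sort the training points by Euclidean distance to the query point
    neighbors = sorted(data, key=lambda item: dist(point, item[0]))[:k]
    # Majority vote among the k nearest labels
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

print(knn_predict((168, 70), training))
```

With only these six rows, two of the three nearest neighbors of (168, 70) are labeled "No", so the majority vote returns "No".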
Features and Labels
In the example above, we had two features (height and weight) and one label (Overweight).
The features are the input to our model, and the label is the output that we want to predict.
The features also define the dimensionality of our data. In the example above, we had 2 features, so our data was 2-dimensional. In general, we can have n features, so our data is n-dimensional.
As long as the data is 2-D, we can easily visualize it. But what if we have more than 2 features? How do we visualize it? Luckily for us, the same distance-based approach works in any number of dimensions, even when we can no longer draw the points.
Understanding Features and Labels in Machine Learning
Let us do some practice to understand features and labels.
What are Features?
Features are the characteristics or properties we use to make predictions. Think of them as the input data we give to our model.
Let's understand this with a simple example:
If we want to predict if a fruit is an apple or an orange, what features might we use?
Select all relevant features:
Understanding Labels
Feature Identification Practice
Dimensionality Practice
For each scenario, determine the number of dimensions (features):
Scenario 1: Movie Recommendation System
- User age
- Watch history
- Genre preference
- Average rating given
- Time spent watching
What's the dimensionality?
Scenario 2: Weather Prediction
- Temperature
- Humidity
- Wind speed
- Atmospheric pressure
- Cloud cover
- Precipitation
- UV index
What's the dimensionality?
Complex Dimensionality Practice
Car Price Prediction System:
- Year of manufacture
- Mileage
- Engine size
- Number of previous owners
Real-world Scenarios Practice
Match the correct dimensionality with each scenario:
Advanced Feature-Label Practice
Comprehensive Knowledge Check
Practical Application Exercise
Identify which scenarios can be modeled with 2D visualization:
Final Challenge
Understanding Distances in k-Nearest Neighbors (kNN)
Let's learn how to find the nearest neighbors by calculating distances between points! We'll start simple and build up to more complex examples.
Starting with 2D Points
Imagine we have two points on a graph:
- Point A: (3, 4)
- Point B: (0, 0)
To find how "near" these points are to each other, we need to calculate their distance. In kNN, we typically use the [Euclidean distance(gloss:euclideanDistance)] formula:
distance = √((x₂-x₁)² + (y₂-y₁)²)
Let's break this down step by step...
First, what is (x₂-x₁)²?
- x₂ = 0 (from Point B)
- x₁ = 3 (from Point A)
- So, (0-3)² = 9
Completing the 2D Distance
Great! Now let's finish the calculation:
- We found (x₂-x₁)² = 9
- For (y₂-y₁)²:
- y₂ = 0 (Point B)
- y₁ = 4 (Point A)
- (0-4)² = 16
Now we can plug these into our formula:
- distance = √(9 + 16)
- distance = √25
- distance = 5
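The same calculation can be checked in Python with the standard library:

```python
from math import sqrt

# Point A: (3, 4), Point B: (0, 0)
x1, y1 = 3, 4
x2, y2 = 0, 0

# Square each coordinate difference, add them, take the square root
distance = sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
print(distance)  # 5.0
```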
Let's practice with a sorting exercise! Sort these calculations into the right steps:
Moving to 3D Points
In 3D space, we just add one more term to our formula:
distance = √((x₂-x₁)² + (y₂-y₁)² + (z₂-z₁)²)
Let's solve a problem! Find the distance between:
Point A: (1, 2, 2)
Point B: (4, 6, 5)
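If you want to check your answer, here is a short Python sketch of the same 3D calculation:

```python
from math import sqrt

a = (1, 2, 2)  # Point A
b = (4, 6, 5)  # Point B

# Differences are 3, 4, and 3; squares sum to 9 + 16 + 9 = 34
distance = sqrt(sum((q - p) ** 2 for p, q in zip(a, b)))
print(round(distance, 2))  # 5.83
```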
Testing Your Understanding
Let's check if you can identify which calculations are correct. Drag each statement to True or False:
Final Challenge
Given these points in a 3D space:
- Point P: (2, 3, 1)
- Point Q: (5, 1, 4)
- Point R: (3, 4, 2)
Pick the correct distance between points P and Q!
Congratulations! You now understand how to calculate distances in kNN! Remember:
- Start with the differences between coordinates
- Square each difference
- Add all squared differences
- Take the square root of the sum
The number of dimensions just means more terms to add under the square root. This same process works whether you have 2, 3, or even more dimensions!
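The four steps above translate directly into a single function that works for any number of dimensions:

```python
from math import sqrt

def euclidean(p, q):
    # Take the difference of each coordinate pair, square it,
    # sum the squares, and take the square root of the sum.
    return sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))

print(euclidean((3, 4), (0, 0)))        # 2D example: 5.0
print(euclidean((2, 3, 1), (5, 1, 4)))  # 3D example: √22 ≈ 4.69
```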