
Mapping Real World to AI

Let's now see, step by step, how to take a real-world example and map it to an AI problem. What exactly do we mean by an AI problem?

Consider a simple example, predicting if a person is overweight based on their height and weight. We can represent this as a table:

| Height (cm) | Weight (kg) | Overweight |
|-------------|-------------|------------|
| 160         | 50          | No         |
| 170         | 78          | Yes        |
| 180         | 90          | Yes        |
| 190         | 100         | Yes        |
| 150         | 40          | No         |
| 160         | 60          | No         |

Let's say this is our training data. Now suppose we get a new point: height = 168 and weight = 70. Can you guess whether this person is overweight?

Can we find some patterns in the data above? What can we say about the relationship between height, weight, and the label "Overweight"? Can we learn from this data how to predict whether a person is overweight?

Let's try to visualize this data. We have created some more data and colored the overweight points red and the normal-weight points blue.

A new purple point is added. Can you guess if it is overweight or not? How will you guess?

We can only guess based on the data available to us. Since the points live in a coordinate system, we can look at which points are near the purple point. If most of the nearby points are red, we can guess that the purple point is also red; if most are blue, we can guess it is blue. But how many points should we look at?

One point is too few, so let's start with 3 points. If 2 of the 3 points are red, we guess that the purple point is red; if 2 of the 3 points are blue, we guess that the purple point is blue.

Let's try to find the 3 nearest neighbors of the purple point.

So, looking at the three nearest points, we can predict the label of the new point by majority vote.

This is called the k-nearest neighbors (kNN) algorithm. In this case, k = 3.

It is a simple algorithm that classifies a data point based on the points nearest to it. It turns out our minds work similarly: we observe things, extract features, map them into a space in our mind, and the next time something similar happens, we find the closest match and predict based on our past observations.
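Here is a minimal sketch of this idea in Python, using only the six training rows from the table above (the function and variable names are illustrative, not from any particular library):

```python
import math
from collections import Counter

# The six rows from the table above: ((height_cm, weight_kg), label).
training_data = [
    ((160, 50), "No"),
    ((170, 78), "Yes"),
    ((180, 90), "Yes"),
    ((190, 100), "Yes"),
    ((150, 40), "No"),
    ((160, 60), "No"),
]

def knn_predict(query, data, k=3):
    # Sort the training points by Euclidean distance to the query,
    # then take a majority vote among the k closest labels.
    nearest = sorted(data, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((168, 70), training_data))
```

With only these six rows, the three nearest neighbors of (168, 70) are (170, 78, Yes), (160, 60, No), and (160, 50, No), so the vote here comes out "No"; the richer dataset used in the visualization may vote differently.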

Features and Labels

In the example above, we had two features - height and weight, and a label - overweight. In general, we can say that we have a set of features and a label. The features are the input to our model, and the label is the output that we want to predict.


The features also define the dimensionality of our data. In the example above, we had 2 features, so our data was 2-dimensional. In general, we can have n features, so our data is n-dimensional.

As long as the data is 2-D, we can easily visualize it. But what if we have more than 2 features? How do we visualize it? Luckily for us, we can use the same kNN algorithm to classify data in higher dimensions as well. Most of the rules that apply in 2-D carry over to higher dimensions.
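One way to see this, sketched with Python's standard library: `math.dist` computes the Euclidean distance between points of any matching length, so a neighbor search written this way works unchanged in higher dimensions.

```python
import math

# Euclidean distance works the same way in any number of dimensions.
print(math.dist((0, 0), (3, 4)))        # 2-D -> 5.0
print(math.dist((1, 2, 2), (4, 6, 5)))  # 3-D -> about 5.83
print(math.dist((0,) * 5, (1,) * 5))    # 5-D -> about 2.24
```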

Understanding Features and Labels in Machine Learning

Let us do some practice to understand features and labels.

What are Features?

Features are the characteristics or properties we use to make predictions. Think of them as the input data we give to our model.

Let's understand this with a simple example:

If we want to predict if a fruit is an apple or an orange, what features might we use?

Select all relevant features:

Understanding Labels

Labels are what we want to predict - they're the output of our model.

Decide whether each of the following is a feature (input) or a label (output):

  • Whether a student passed or failed
  • Hours studied by the student
  • If an email is spam or not
  • Number of words in the email
  • House price prediction
  • Square footage of the house
  • Customer churn prediction
  • Customer age and purchase history

Feature Identification Practice

Sort each of these items into Features or Labels:

  • Age of a person
  • Number of bedrooms
  • Cancer diagnosis (positive/negative)
  • Time spent on website
  • Distance from city center
  • Number of previous purchases

Dimensionality Practice

For each scenario, determine the number of dimensions (features):

Scenario 1: Movie Recommendation System

  • User age
  • Watch history
  • Genre preference
  • Average rating given
  • Time spent watching

What's the dimensionality?

Scenario 2: Weather Prediction

  • Temperature
  • Humidity
  • Wind speed
  • Atmospheric pressure
  • Cloud cover
  • Precipitation
  • UV index

What's the dimensionality?

Complex Dimensionality Practice

Car Price Prediction System:

  • Year of manufacture
  • Mileage
  • Engine size
  • Number of previous owners

What's the dimensionality?

4
3
5
2

Real-world Scenarios Practice

Match the correct dimensionality with each scenario:

Height and weight for BMI prediction
Age, income, and education level for loan approval
Price and square footage for house value
Temperature, humidity, and wind speed for weather prediction

Advanced Feature-Label Practice

Match each feature set with its prediction label:

  • Credit score, income, employment history, and current debt → Loan approval status
  • Patient symptoms, age, medical history, and test results → Disease diagnosis
  • Student attendance, homework completion, and test scores → Final grade prediction

Comprehensive Knowledge Check

Practical Application Exercise

Identify which scenarios can be modeled with 2D visualization:

Height vs Weight correlation
Customer behavior analysis using 5 different metrics
Price vs Distance from city center
Medical diagnosis using 8 different symptoms
Age vs Income correlation
Weather prediction using 6 different measurements

Final Challenge

Match each scenario with its dimensionality (2D data, 4D data, or high-dimensional data):

  • A social media platform analyzing user engagement using: time spent, number of posts, number of friends, and activity score
  • A manufacturing plant predicting machine failure using: temperature, vibration, noise level, and pressure readings
  • A 2D video game using only player position (x, y coordinates) to predict collision
  • A text analysis system using every word in a document as a separate feature

Understanding Distances in k-Nearest Neighbors (kNN)

Let's learn how to find the nearest neighbors by calculating distances between points! We'll start simple and build up to more complex examples.

Starting with 2D Points

Imagine we have two points on a graph:

  • Point A: (3, 4)
  • Point B: (0, 0)

To find how "near" these points are to each other, we need to calculate their distance. In kNN, we typically use the Euclidean distance formula:

distance = √[(x₂-x₁)² + (y₂-y₁)²]

Let's break this down step by step...

First, what is (x₂-x₁)²?

  • x₂ = 0 (from Point B)
  • x₁ = 3 (from Point A)
  • So, (0-3)² = 9

Completing the 2D Distance

Great! Now let's finish the calculation:

  1. We found (x₂-x₁)² = 9
  2. For (y₂-y₁)²:
    • y₂ = 0 (Point B)
    • y₁ = 4 (Point A)
    • (0-4)² = 16

Now we can plug these into our formula:

  • distance = √(9 + 16)
  • distance = √25
  • distance = 5

Let's practice with a sorting exercise! Sort these calculations into the right steps:

Calculate the difference in x coordinates: (0-3)
Take the square root: √25 = 5
Square the differences: (-3)² = 9
Add the squared differences: 9 + 16 = 25
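The four steps above can be checked in a few lines of Python (variable names are illustrative):

```python
import math

x1, y1 = 3, 4  # Point A
x2, y2 = 0, 0  # Point B

dx_squared = (x2 - x1) ** 2      # difference in x, squared: (0-3)**2 = 9
dy_squared = (y2 - y1) ** 2      # difference in y, squared: (0-4)**2 = 16
total = dx_squared + dy_squared  # add the squared differences: 9 + 16 = 25
distance = math.sqrt(total)      # take the square root: sqrt(25) = 5.0
print(distance)  # -> 5.0
```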

Moving to 3D Points

In 3D space, we just add one more term to our formula:

distance = √[(x₂-x₁)² + (y₂-y₁)² + (z₂-z₁)²]

Let's solve a problem! Find the distance between:

Point A: (1, 2, 2)

Point B: (4, 6, 5)
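If you want to check your answer, here is the same formula as a short Python sketch (`zip` pairs up the coordinates so each dimension contributes one squared term):

```python
import math

point_a = (1, 2, 2)
point_b = (4, 6, 5)

# (4-1)**2 + (6-2)**2 + (5-2)**2 = 9 + 16 + 9 = 34
squared_sum = sum((b - a) ** 2 for a, b in zip(point_a, point_b))
distance = math.sqrt(squared_sum)
print(round(distance, 2))  # sqrt(34) is about 5.83
```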

Testing Your Understanding

Let's check if you can identify which calculations are correct. Drag each statement to True or False:

The distance between (0,0) and (3,4) is 5 units
The distance between (1,1) and (4,5) is 3 units
Adding a third dimension only adds one term under the square root
The distance between (0,0,0) and (1,1,1) is 2 units
The distance formula is also called Euclidean distance

Final Challenge

Given these points in a 3D space:

  • Point P: (2, 3, 1)
  • Point Q: (5, 1, 4)
  • Point R: (3, 4, 2)

Pick the correct distance between points P and Q!

4.69
5
3.74
6.1

Congratulations! You now understand how to calculate distances in kNN! Remember:

  1. Start with the differences between coordinates
  2. Square each difference
  3. Add all squared differences
  4. Take the square root of the sum

The number of dimensions just means more terms to add under the square root. This same process works whether you have 2, 3, or even more dimensions!
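That recipe translates directly into a single helper that handles any number of dimensions (a sketch; the function name is illustrative):

```python
import math

def euclidean_distance(p, q):
    """Distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Difference -> square -> sum -> square root, exactly as in the steps above.
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))        # 2-D -> 5.0
print(euclidean_distance((0, 0, 0), (1, 1, 1)))  # 3-D -> about 1.73
```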