Mapping the Real World to AI
Let's now see, step by step, how we can take a real-world example and map it to an AI problem. What exactly do we mean by an AI problem?
Consider a simple example: predicting whether a person is overweight based on their height and weight. We can represent this as a table:
| Height (cm) | Weight (kg) | Overweight |
|---|---|---|
| 160 | 50 | No |
| 170 | 78 | Yes |
| 180 | 90 | Yes |
| 190 | 100 | Yes |
| 150 | 40 | No |
| 160 | 60 | No |
Let us say this is our training data. Now suppose we get a new point: height = 168 and weight = 70. Can you guess whether this person is overweight?
From the data given above, can we find some patterns? What can we say about the relationship between height, weight, and the label "Overweight"? Can we learn from this data how to predict whether a person is overweight?
Let's try to visualize this data. We have created some more data and colored the overweight points red and the normal-weight points blue.
A new purple point is added. Can you guess whether it is overweight? How would you guess?
We can only guess based on the data available to us. Because this is a coordinate system, we can look at which points are near the purple point. If most of the points near the purple point are red, we can guess that the purple point is also red; if most of them are blue, we can guess that the purple point is blue. But how many points should we look at?
One point is too few. Let's start with 3 points. If 2 of the 3 nearest points are red, we guess that the purple point is red; if 2 of the 3 are blue, we guess that it is blue.
Let's try to find the 3 nearest neighbors of the purple point.
So, looking at the nearest points, we can classify the new point by a majority vote among its neighbors.
This is called the k-nearest neighbors algorithm. In this case, k = 3.
It is a simple algorithm that classifies a data point based on the data points nearest to it. It turns out our minds do something similar: we observe things, extract features, map them into a space in our mind, and the next time something similar happens, we find the closest match and predict based on our past observations.
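The idea above can be sketched in a few lines of Python. This is a minimal illustration using only the six rows from the table earlier; the lesson's visualization uses additional generated data, so its answer for the purple point may differ from what this sketch predicts.

```python
from math import dist
from collections import Counter

# Training data from the table above: (height_cm, weight_kg) -> label
training = [
    ((160, 50), "No"),
    ((170, 78), "Yes"),
    ((180, 90), "Yes"),
    ((190, 100), "Yes"),
    ((150, 40), "No"),
    ((160, 60), "No"),
]

def knn_predict(point, data, k=3):
    # Sort the training points by Euclidean distance to the query point
    neighbors = sorted(data, key=lambda item: dist(point, item[0]))[:k]
    # Majority vote among the k nearest labels
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

print(knn_predict((168, 70), training))
```

With only these six rows, two of the three nearest neighbors of (168, 70) are labeled "No", so the majority vote returns "No".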
Features and Labels
In the example above, we had two features (height and weight) and one label (Overweight).
The features are the input to our model, and the label is the output that we want to predict.
The features also define the dimensionality of our data. In the example above, we had 2 features, so our data was 2-dimensional. In general, we can have n features, so our data is n-dimensional.
As long as the data is 2-D, we can easily visualize it. But what if we have more than 2 features? How do we visualize it? Luckily for us, the same distance-based approach works in any number of dimensions, even when we can no longer draw the points.
Understanding Features and Labels in Machine Learning
Let us do some practice to understand features and labels.
What are Features?
Features are the characteristics or properties we use to make predictions. Think of them as the input data we give to our model.
Let's understand this with a simple example:
If we want to predict if a fruit is an apple or an orange, what features might we use?
Select all relevant features:
Understanding Labels
Feature Identification Practice
Dimensionality Practice
For each scenario, determine the number of dimensions (features):
Scenario 1: Movie Recommendation System
- User age
- Watch history
- Genre preference
- Average rating given
- Time spent watching
What's the dimensionality?
Scenario 2: Weather Prediction
- Temperature
- Humidity
- Wind speed
- Atmospheric pressure
- Cloud cover
- Precipitation
- UV index
What's the dimensionality?
Complex Dimensionality Practice
Car Price Prediction System:
- Year of manufacture
- Mileage
- Engine size
- Number of previous owners
Real-world Scenarios Practice
Match the correct dimensionality with each scenario:
Advanced Feature-Label Practice
Comprehensive Knowledge Check
Practical Application Exercise
Identify which scenarios can be modeled with 2D visualization:
Final Challenge
Understanding Distances in k-Nearest Neighbors (kNN)
Let's learn how to find the nearest neighbors by calculating distances between points! We'll start simple and build up to more complex examples.
Starting with 2D Points
Imagine we have two points on a graph:
- Point A: (3, 4)
- Point B: (0, 0)
To find how "near" these points are to each other, we need to calculate their distance. In kNN, we typically use the [Euclidean distance(gloss:euclideanDistance)] formula:
distance = √((x₂-x₁)² + (y₂-y₁)²)
Let's break this down step by step...
First, what is (x₂-x₁)²?
- x₂ = 0 (from Point B)
- x₁ = 3 (from Point A)
- So, (0-3)² = 9
Completing the 2D Distance
Great! Now let's finish the calculation:
- We found (x₂-x₁)² = 9
- For (y₂-y₁)²:
- y₂ = 0 (Point B)
- y₁ = 4 (Point A)
- (0-4)² = 16
Now we can plug these into our formula:
- distance = √(9 + 16)
- distance = √25
- distance = 5
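The same calculation can be checked in Python with the standard library:

```python
from math import sqrt

# Point A: (3, 4), Point B: (0, 0)
x1, y1 = 3, 4
x2, y2 = 0, 0

# Square each coordinate difference, add them, take the square root
distance = sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
print(distance)  # 5.0
```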
Let's practice with a sorting exercise! Sort these calculations into the right steps:
Moving to 3D Points
In 3D space, we just add one more term to our formula:
distance = √((x₂-x₁)² + (y₂-y₁)² + (z₂-z₁)²)
Let's solve a problem! Find the distance between:
Point A: (1, 2, 2)
Point B: (4, 6, 5)
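If you want to check your answer, here is a short Python sketch of the same 3D calculation:

```python
from math import sqrt

a = (1, 2, 2)  # Point A
b = (4, 6, 5)  # Point B

# Differences are 3, 4, and 3; squares sum to 9 + 16 + 9 = 34
distance = sqrt(sum((q - p) ** 2 for p, q in zip(a, b)))
print(round(distance, 2))  # 5.83
```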
Testing Your Understanding
Let's check if you can identify which calculations are correct. Drag each statement to True or False:
Final Challenge
Given these points in a 3D space:
- Point P: (2, 3, 1)
- Point Q: (5, 1, 4)
- Point R: (3, 4, 2)
Pick the correct distance between points P and Q!
Congratulations! You now understand how to calculate distances in kNN! Remember:
- Start with the differences between coordinates
- Square each difference
- Add all squared differences
- Take the square root of the sum
The number of dimensions just means more terms to add under the square root. This same process works whether you have 2, 3, or even more dimensions!
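The four steps above translate directly into a single function that works for any number of dimensions:

```python
from math import sqrt

def euclidean(p, q):
    # Take the difference of each coordinate pair, square it,
    # sum the squares, and take the square root of the sum.
    return sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))

print(euclidean((3, 4), (0, 0)))        # 2D example: 5.0
print(euclidean((2, 3, 1), (5, 1, 4)))  # 3D example: √22 ≈ 4.69
```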