
# Model Training and Evaluation

# K-Fold Cross Validation

# Example of K-Fold in Python

# admissions.csv

| StudentID | GRE_Score | GPA | College_Rating | Admission |
|-----------|-----------|-----|----------------|-----------|
| 1 | 320 | 3.5 | 4 | 1 |
| 2 | 310 | 3.0 | 3 | 0 |
| 3 | 325 | 3.8 | 5 | 1 |
| 4 | 300 | 2.8 | 2 | 0 |
| 5 | 315 | 3.4 | 3 | 1 |
| 6 | 305 | 3.2 | 2 | 0 |

# Example Code

```python
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Load the dataset
df = pd.read_csv('admissions.csv')

# Features and target
X = df[['GRE_Score', 'GPA', 'College_Rating']]
y = df['Admission']

# Define the model
model = RandomForestClassifier()

# Perform K-Fold Cross-Validation
scores = cross_val_score(model, X, y, cv=3, scoring='accuracy')  # 3 folds

# Print results
print("K-Fold Cross-Validation Results")
print(f"Fold Accuracies: {scores}")
print(f"Mean Accuracy: {scores.mean():.2f}")
```

# Learning and Validation Curves
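
The curves themselves are not shown here, so below is a minimal sketch of how both can be computed with scikit-learn's `learning_curve` and `validation_curve`, assuming the same `admissions.csv` columns as the example above. The `train_sizes`, `max_depth` range, and `random_state` values are illustrative assumptions, and with only six rows the scores are not meaningful; in practice you would run this on a larger dataset and plot the results (for example with matplotlib).

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve, validation_curve

df = pd.read_csv('admissions.csv')
X = df[['GRE_Score', 'GPA', 'College_Rating']]
y = df['Admission']

# Learning curve: accuracy as the training set grows
train_sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(random_state=42), X, y,
    cv=3, scoring='accuracy', train_sizes=[0.5, 0.75, 1.0]
)
print("Training sizes:          ", train_sizes)
print("Mean training accuracy:  ", train_scores.mean(axis=1))
print("Mean validation accuracy:", val_scores.mean(axis=1))

# Validation curve: accuracy as one hyperparameter (here max_depth) varies
train_scores, val_scores = validation_curve(
    RandomForestClassifier(random_state=42), X, y,
    param_name='max_depth', param_range=[1, 2, 3],
    cv=3, scoring='accuracy'
)
print("Mean training accuracy per max_depth:  ", train_scores.mean(axis=1))
print("Mean validation accuracy per max_depth:", val_scores.mean(axis=1))
```

Reading the output: training accuracy far above validation accuracy points to overfitting, while low accuracy on both points to underfitting, which the next sections cover.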

# Handling Overfitting / Underfitting

Overfitting: High training accuracy, low test accuracy.

Underfitting: Low accuracy on both training and test data.

# Overfitting

Overfitting happens when your model performs well on training data but poorly on unseen data because it “memorizes” the training details rather than learning general patterns.

Possible Solutions to Consider:

- Simplify the model or constrain its complexity (for example, limit tree depth).
- Apply regularization.
- Collect more training data.
- Remove noisy or redundant features.
- Use cross-validation to catch the problem early.
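
As a minimal sketch of the first two ideas, the snippet below constrains a random forest so it cannot fit the training data arbitrarily closely; the specific values (`max_depth=3`, `min_samples_leaf=2`, `random_state=42`) are illustrative assumptions, not values prescribed by the course.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv('admissions.csv')
X = df[['GRE_Score', 'GPA', 'College_Rating']]
y = df['Admission']

# A deliberately constrained forest: shallow trees and a minimum leaf size
# make it harder to memorize individual training rows (illustrative values).
constrained_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=3,          # cap tree depth
    min_samples_leaf=2,   # each leaf must cover at least 2 samples
    random_state=42
)

scores = cross_val_score(constrained_model, X, y, cv=3, scoring='accuracy')
print(f"Mean accuracy with a constrained model: {scores.mean():.2f}")
```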

# Underfitting

Underfitting happens when your model performs poorly on both training and unseen data because it’s too simple to capture the underlying patterns.

Possible Solutions to Consider:

- Use a more expressive model or increase its capacity.
- Add more informative features (feature engineering).
- Reduce the amount of regularization.
- Tune hyperparameters toward more complexity.
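
As a minimal sketch of the feature-engineering idea, the snippet below adds a hypothetical interaction column, `GRE_x_GPA`, to the `admissions.csv` data; this column is an illustration only and is not part of the original dataset.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv('admissions.csv')

# Hypothetical engineered feature: interaction between GRE score and GPA
df['GRE_x_GPA'] = df['GRE_Score'] * df['GPA']

X_richer = df[['GRE_Score', 'GPA', 'College_Rating', 'GRE_x_GPA']]
y = df['Admission']

scores = cross_val_score(RandomForestClassifier(random_state=42),
                         X_richer, y, cv=3, scoring='accuracy')
print(f"Mean accuracy with the extra feature: {scores.mean():.2f}")
```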

The goal is to find the sweet spot where the model is neither overfitting nor underfitting by carefully tuning complexity, hyperparameters, and features.
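
One common way to look for that sweet spot is a small cross-validated grid search over complexity-related hyperparameters. The sketch below uses scikit-learn's `GridSearchCV` with an illustrative grid, again assuming the `admissions.csv` columns from the earlier example.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

df = pd.read_csv('admissions.csv')
X = df[['GRE_Score', 'GPA', 'College_Rating']]
y = df['Admission']

# Illustrative grid over complexity-related hyperparameters
param_grid = {
    'max_depth': [2, 4, 8, None],
    'n_estimators': [50, 100, 200],
}

# Evaluate every combination with 3-fold cross-validation
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring='accuracy',
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.2f}")
```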
