# Model Training and Evaluation
# K-Fold Cross Validation
K-Fold Cross Validation splits the dataset into K equal-sized folds, trains the model on K-1 folds, and validates it on the remaining fold, repeating the process K times so that every fold serves once as the validation set. Averaging the K scores gives a more reliable estimate of performance on unseen data than a single train/test split.
# Example of K-Fold in Python
# admissions.csv
| StudentID | GRE_Score | GPA | College_Rating | Admission |
|---|---|---|---|---|
| 1 | 320 | 3.5 | 4 | 1 |
| 2 | 310 | 3.0 | 3 | 0 |
| 3 | 325 | 3.8 | 5 | 1 |
| 4 | 300 | 2.8 | 2 | 0 |
| 5 | 315 | 3.4 | 3 | 1 |
| 6 | 305 | 3.2 | 2 | 0 |
| … | … | … | … | … |
- StudentID: A unique identifier for each student
- GRE_Score: The student’s GRE test score (out of 340)
- GPA: Undergraduate Grade Point Average (on a 4.0 scale)
- College_Rating: College prestige rating (1 = lowest, 5 = highest)
- Admission: Target variable (1 = Admitted, 0 = Not Admitted)
# Example Code
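A minimal sketch of a K-Fold run on this dataset, assuming pandas and scikit-learn are installed; the choice of logistic regression as the model and K = 5 are illustrative assumptions.

```python
# Sketch: 5-fold cross-validation on the admissions data.
# Model choice (logistic regression) and k=5 are illustrative assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Load the dataset described above
df = pd.read_csv("admissions.csv")
X = df[["GRE_Score", "GPA", "College_Rating"]]
y = df["Admission"]

# Split the data into 5 folds; each fold serves once as the validation set
kf = KFold(n_splits=5, shuffle=True, random_state=42)
model = LogisticRegression(max_iter=1000)

# Train and evaluate once per fold, then average
scores = cross_val_score(model, X, y, cv=kf, scoring="accuracy")
print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```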
# Learning and Validation Curves
Learning curves plot model performance as a function of training set size, while validation curves plot performance as a function of a single hyperparameter's value. Comparing the training and validation scores on these curves is a practical way to diagnose whether a model is overfitting or underfitting.
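A hedged sketch of both curves using scikit-learn's learning_curve and validation_curve helpers on the admissions data; the decision-tree model and the max_depth range are illustrative assumptions.

```python
# Sketch: learning and validation curves with scikit-learn.
# The decision tree and the max_depth range are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.model_selection import learning_curve, validation_curve
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("admissions.csv")
X = df[["GRE_Score", "GPA", "College_Rating"]]
y = df["Admission"]

# Learning curve: accuracy as a function of training set size
train_sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=42), X, y,
    train_sizes=np.linspace(0.2, 1.0, 5), cv=5, scoring="accuracy")
print("Mean train accuracy:", train_scores.mean(axis=1))
print("Mean validation accuracy:", val_scores.mean(axis=1))

# Validation curve: accuracy as a function of one hyperparameter (tree depth)
depths = [1, 2, 3, 5, 8]
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=42), X, y,
    param_name="max_depth", param_range=depths, cv=5, scoring="accuracy")
for d, t, v in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"max_depth={d}: train={t:.2f}, validation={v:.2f}")
```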
# Handling Overfitting / Underfitting
- Overfitting: high training accuracy, low test accuracy.
- Underfitting: low accuracy on both training and test data.
# Overfitting
Overfitting happens when your model performs well on training data but poorly on unseen data because it “memorizes” the training details rather than learning general patterns.
Possible Solutions to Consider:
- Simplify the Model: Use a less complex algorithm (e.g., reduce the depth of decision trees or use fewer layers in neural networks); see the sketch after this list.
- Regularization: Apply L1 (Lasso) or L2 (Ridge) regularization to penalize overly complex models.
- Reduce Features: Remove irrelevant or redundant features via feature selection.
- Increase Training Data: Gather more varied data to help the model generalize better.
- Tune Hyperparameters with Cross-Validation: Use techniques like K-Fold Cross-Validation to find the model settings that generalize best.
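To make the first bullet concrete, the sketch below compares an unconstrained decision tree against a depth-limited one on a held-out test set; the models, the depth limit, and the 70/30 split are illustrative assumptions.

```python
# Sketch: taming overfitting by simplifying the model (limiting tree depth).
# Model choices and parameter values are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("admissions.csv")
X = df[["GRE_Score", "GPA", "College_Rating"]]
y = df["Admission"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# An unconstrained tree can memorize the training data (overfitting)
deep = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
# Limiting depth simplifies the model, which usually narrows the train/test gap
shallow = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

for name, m in [("deep tree", deep), ("depth<=3 tree", shallow)]:
    print(f"{name}: train={m.score(X_train, y_train):.2f}, "
          f"test={m.score(X_test, y_test):.2f}")
```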
# Underfitting
Underfitting happens when your model performs poorly on both training and unseen data because it’s too simple to capture the underlying patterns.
Possible Solutions to Consider:
- Increase Model Complexity: Use an algorithm capable of modeling more complex relationships in the data, such as moving from linear regression to random forests or from random forests to neural networks, depending on the problem type; see the sketch after this list.
- Add Features: Include more informative features that help the model learn better patterns.
- Train Longer: Increase the number of training epochs for models like neural networks to better fit the data.
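A hedged sketch of the first bullet: comparing a simple linear baseline against a higher-capacity random forest under cross-validation; both model choices are illustrative assumptions.

```python
# Sketch: addressing underfitting by increasing model capacity.
# The logistic-regression baseline and random forest are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("admissions.csv")
X = df[["GRE_Score", "GPA", "College_Rating"]]
y = df["Admission"]

# A simple linear model may underfit if the decision boundary is non-linear
simple = LogisticRegression(max_iter=1000)
# A random forest can capture more complex interactions between features
forest = RandomForestClassifier(n_estimators=200, random_state=42)

for name, model in [("logistic regression", simple), ("random forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean CV accuracy = {scores.mean():.2f}")
```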
The goal is to find the sweet spot where the model is neither overfitting nor underfitting by carefully tuning complexity, hyperparameters, and features.