- Tuning the model parameters—ML algorithms are configured with parameters specific to the underlying algorithm, and the optimal values of these parameters often depend on the type and structure of the data. Each parameter's value, alone or in combination with the others, can affect the performance of the model. We introduce various ways to find and select the best parameter values, and show how this can help in determining the best algorithm for the dataset in question.
- Selecting a subset of features—Many ML problems include a large number of features, and the combined noise from those features can sometimes make it hard for the algorithm to find the real signal in the data, even though each of them might still be informative on its own. For many ML problems, having a lot of data is a good thing, but a large number of features can sometimes be a curse. And because you don't know beforehand when this will affect your model's performance, you have to carefully select the features that make up the most general and accurate model.
- Preprocessing the data—If you search the internet for machine-learning datasets, you'll find easy-to-use datasets to which many ML algorithms can be applied quickly. Most real-world datasets, however, aren't in such a clean state, and you'll have to clean and process them, a process widely referred to as data munging or data wrangling. A dataset may include names that are spelled differently but refer to the same entity, or it may contain missing or incorrect values, and these problems can hurt the performance of the model. These may sound like edge cases, but you'd be surprised how often they occur, even in sophisticated, data-driven organizations.
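To make the parameter-tuning item more concrete, here is a minimal sketch of one common approach, a grid search scored by k-fold cross-validation. It assumes a k-nearest-neighbors classifier whose only parameter is the number of neighbors `k`, and a small synthetic two-cluster dataset; the dataset and all function names (`make_data`, `knn_predict`, `cv_accuracy`, `grid_search`) are hypothetical and only for illustration. In practice you would likely use a library implementation rather than this hand-rolled version.

```python
import random
from collections import Counter


def make_data(n=120, seed=0):
    """Hypothetical toy dataset: two noisy 2-D clusters labelled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 2.0
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(data, k, folds=5):
    """Mean accuracy of kNN with parameter k over `folds` contiguous folds.

    Any leftover rows (when len(data) is not divisible by folds) always stay in training.
    """
    size = len(data) // folds
    scores = []
    for f in range(folds):
        test = data[f * size:(f + 1) * size]
        train = data[:f * size] + data[(f + 1) * size:]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def grid_search(data, grid):
    """Score every candidate parameter value and return (best_value, all_scores)."""
    scores = {k: cv_accuracy(data, k) for k in grid}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    data = make_data()
    best_k, scores = grid_search(data, [1, 3, 5, 9, 15])
    for k, s in scores.items():
        print(f"k={k:2d}  cv accuracy={s:.3f}")
    print("best k:", best_k)
```

The same loop extends to several parameters by iterating over their combinations, and comparing the best cross-validated score of different algorithms is one way to choose between them. Because the winning value is picked on the same folds that produced its score, that score is somewhat optimistic; a separate held-out test set gives a fairer final estimate.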
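For the feature-selection item, one widely used strategy is greedy forward selection: start with no features, repeatedly add the single feature that most improves a validation score, and stop when no addition helps. The sketch below assumes a synthetic dataset in which only feature 0 carries signal and the others are pure noise, and it scores subsets with a simple nearest-centroid classifier on a fixed holdout split. The data generator, the classifier choice, and the names `make_data`, `holdout_accuracy`, and `forward_select` are all hypothetical illustrations, not the method this post goes on to describe.

```python
import random


def make_data(n=200, n_noise=4, seed=1):
    """Hypothetical data: feature 0 carries the signal, the remaining n_noise features are noise."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        signal = rng.gauss(2.0 * label, 1.0)
        noise = [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
        rows.append(([signal] + noise, label))
    rng.shuffle(rows)
    return rows


def centroid(rows, feats):
    """Mean of each selected feature over the given rows."""
    return [sum(x[f] for x, _ in rows) / len(rows) for f in feats]


def holdout_accuracy(rows, feats, split=0.7):
    """Fit a nearest-centroid classifier on the first part of rows, score it on the rest,
    using only the feature indices in `feats`."""
    cut = int(len(rows) * split)
    train, test = rows[:cut], rows[cut:]
    cents = {c: centroid([r for r in train if r[1] == c], feats) for c in (0, 1)}

    def predict(x):
        return min(cents, key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(feats, cents[c])))

    return sum(predict(x) == y for x, y in test) / len(test)


def forward_select(rows, n_features):
    """Greedily add the feature that most improves holdout accuracy; stop when nothing helps."""
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        score, feat = max((holdout_accuracy(rows, selected + [f]), f) for f in remaining)
        if score <= best_score:
            break  # no remaining feature improves the score
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, best_score


if __name__ == "__main__":
    rows = make_data()
    features, score = forward_select(rows, 5)
    print("selected features:", features, "holdout accuracy:", round(score, 3))
```

Noise features with a large spread tend to drag the nearest-centroid distances around, which is exactly the effect described above: adding them can lower accuracy even though the dataset "has more information". As with parameter tuning, the score of the chosen subset is measured on the split used to choose it, so it should be confirmed on untouched data.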
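The preprocessing item mentions differently spelled names for the same entity and missing or incorrect values. A minimal cleaning sketch for those three problems is shown below. It assumes a hypothetical list of records with a `city` and an `age` field, a hand-written alias table (`ALIASES`) that maps normalized spellings to one canonical name, a plausible age range of 0 to 120 (values outside it are treated as missing), and median imputation for missing values. All of these are illustrative choices, not fixed rules.

```python
import statistics

# Hypothetical raw records: inconsistent spellings, a missing age, and an impossible age.
RAW = [
    {"city": "New York", "age": "34"},
    {"city": "new york ", "age": ""},
    {"city": "NYC", "age": "29"},
    {"city": "Boston", "age": "-5"},
    {"city": "boston", "age": "41"},
]

# Hypothetical alias table: normalized spelling -> canonical entity name.
ALIASES = {"new york": "New York", "nyc": "New York", "boston": "Boston"}


def canonical_city(name):
    """Lower-case and collapse whitespace, then map known aliases to one canonical name."""
    key = " ".join(name.strip().lower().split())
    return ALIASES.get(key, name.strip())


def parse_age(value):
    """Return the age as an int, or None if it is missing, unparsable, or implausible."""
    try:
        age = int(value)
    except ValueError:
        return None
    return age if 0 <= age <= 120 else None


def clean(records):
    """Normalize names, mark bad ages as missing, then fill missing ages with the median."""
    rows = [{"city": canonical_city(r["city"]), "age": parse_age(r["age"])} for r in records]
    known = [r["age"] for r in rows if r["age"] is not None]
    fill = statistics.median(known)
    for r in rows:
        if r["age"] is None:
            r["age"] = fill
    return rows


if __name__ == "__main__":
    for row in clean(RAW):
        print(row)
```

On these records the three New York spellings collapse to one entity, and both the empty and the negative age are replaced by the median of the valid ages (34). Whether to impute, drop, or flag such rows depends on the dataset; when the cleaned data is later split for validation, computing the fill value on the training part only avoids leaking information from the validation rows.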
Saturday, June 27, 2020
Optimizing model performance