Friday, October 30, 2015
How to Handle Missing Values
- Ignore the tuple: This is usually done when the class label is missing (assuming the mining task involves classification). This method is not very effective, unless the tuple contains several attributes with missing values. It is especially poor when the percentage of missing values per attribute varies considerably.
- Fill in the missing value manually: In general, this approach is time-consuming and may not be feasible given a large data set with many missing values.
- Use a global constant to fill in the missing value: Replace all missing attribute values by the same constant, such as a label like “Unknown” or −∞. If missing values are replaced by, say, “Unknown,” then the mining program may mistakenly think that they form an interesting concept, since they all have a value in common—that of “Unknown.” Hence, although this method is simple, it is not foolproof.
- Use the attribute mean to fill in the missing value: For example, suppose that the average income of AllElectronics customers is $56,000. Use this value to replace the missing value for income.
- Use the attribute mean for all samples belonging to the same class as the given tuple: For example, if classifying customers according to credit risk, replace the missing value with the average income value for customers in the same credit risk category as that of the given tuple.
- Use the most probable value to fill in the missing value: This may be determined with regression, inference-based tools using a Bayesian formalism, or decision tree induction. For example, using the other customer attributes in your data set, you may construct a decision tree to predict the missing values for income.
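Several of the simpler strategies above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `customers` records, the `risk` and `income` field names, and the helper function names are all hypothetical, and `None` is assumed to mark a missing value. The most-probable-value approach (regression, Bayesian inference, decision trees) is not covered here because it needs a trained model.

```python
from statistics import mean

# Hypothetical customer records; None marks a missing income value.
customers = [
    {"risk": "low", "income": 60000},
    {"risk": "low", "income": None},
    {"risk": "low", "income": 80000},
    {"risk": "high", "income": 30000},
    {"risk": "high", "income": None},
    {"risk": "high", "income": 40000},
]


def drop_missing(rows, attr):
    """Ignore the tuple: keep only rows where attr is present."""
    return [r for r in rows if r[attr] is not None]


def fill_constant(rows, attr, constant):
    """Replace every missing attr with the same global constant."""
    return [{**r, attr: constant if r[attr] is None else r[attr]} for r in rows]


def fill_mean(rows, attr):
    """Replace every missing attr with the mean of the observed values."""
    overall = mean(r[attr] for r in rows if r[attr] is not None)
    return fill_constant(rows, attr, overall)


def fill_class_mean(rows, attr, class_attr):
    """Replace a missing attr with the mean of the row's own class.

    Assumes every class has at least one observed value for attr.
    """
    observed = {}
    for r in rows:
        if r[attr] is not None:
            observed.setdefault(r[class_attr], []).append(r[attr])
    class_means = {label: mean(values) for label, values in observed.items()}
    return [
        {**r, attr: class_means[r[class_attr]] if r[attr] is None else r[attr]}
        for r in rows
    ]


if __name__ == "__main__":
    print(len(drop_missing(customers, "income")))
    print(fill_constant(customers, "income", "Unknown")[1])
    print(fill_mean(customers, "income")[1])
    print(fill_class_mean(customers, "income", "risk")[1])
```

Each helper returns a new list and leaves the original records untouched, so the strategies can be compared side by side on the same data. Note how the class-conditional mean gives the low-risk and high-risk customers different fill values, whereas the overall mean assigns both the same number.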
Wednesday, October 28, 2015
Quality decisions must be based on quality data
Data preprocessing is an important step in the knowledge discovery process, because quality decisions must be based on quality data. Detecting data anomalies, rectifying them early, and reducing the data to be analyzed can lead to huge payoffs for decision making.