Data Preparation for Machine Learning: Data Cleaning, Feature Selection, and Data Transforms in Python
Machine Learning Mastery, Jun 30, 2020 - 398 pages
Data preparation involves transforming raw data into a form that can be modeled using machine learning algorithms. Cut through the equations, Greek letters, and confusion, and discover the specialized data preparation techniques that you need to know to get the most out of your data on your next project. Using clear explanations, standard Python libraries, and step-by-step tutorial lessons, you will discover how to confidently and effectively prepare your data for predictive modeling with machine learning.
Contents
III Data Cleaning, page 37
IV Feature Selection, page 110
V Data Transforms, page 212
VI Advanced Transforms, page 316
VII Dimensionality Reduction, page 337
VIII Appendix, page 369
IX Conclusions, page 380
Common terms and phrases
average performance, Box-Cox transform, categorical, categorical variables, classification accuracy, columns, compare the average, complete example, Consider running, cross-validation, cv=cv, data leakage, data preparation, data transforms, DataFrame, diabetes dataset, differences in numerical, dimensionality reduction, discretization transform, evaluate the model, evaluating a model, evaluation procedure, example is listed, Example of evaluating, Example output, Feature Engineering, feature importance scores, feature selection, Gaussian, given the stochastic, header=None, histograms, horse colic, imputation, input and output, input data, input features, load the dataset, LogisticRegression, machine learning algorithms, matplotlib, matplotlib import pyplot, MinMaxScaler, missing values, model performance, mutual information, n_repeats=3, number of features, numerical precision, ordinal encoding, outliers, pandas import read_csv, plot, probability distribution, pyplot.show, quantile, random forest, random_state=1, regression, results may vary, rows, Running the example, scale, scikit-learn, selected features, sklearn.datasets import, sonar dataset, specific results, statistical, stochastic nature, target variable, techniques, train and test, train_test_split(X, training dataset, tutorial, vary given, X_test, X_test_fs, X_train_fs, y_train