Bad Data Handbook: Cleaning Up The Data So You Can Get Back To Work
O'Reilly Media, Inc., Nov 7, 2012 - 264 pages
What is bad data? Some people consider it a technical phenomenon, like missing values or malformed records, but bad data includes a lot more. In this handbook, data expert Q. Ethan McCallum has gathered 19 colleagues from every corner of the data arena to reveal how they’ve recovered from nasty data problems. From cranky storage to poor representation to misguided policy, there are many paths to bad data. Bottom line? Bad data is data that gets in the way. This book explains effective ways to get around it. Among the many topics covered, you’ll discover how to:
Contents
Chapter 3 Data Intended for Human Consumption, Not Machine Consumption
Chapter 4 Bad Data Lurking in Plain Text
Chapter 5 (Re)Organizing the Web's Data
Chapter 6 Detecting Liars and the Confused in Contradictory Online Reviews
Chapter 7 Will the Bad Data Please Stand Up?
Chapter 8 Blood, Sweat, and Urine
Chapter 9 When Data and Reality Don't Match
Chapter 10 Subtle Sources of Bias and Error
Is Bad Data Really Bad?
A Guide for When to Stick to Files
Chapter 13 Crouching Table, Hidden Network
Chapter 14 Myths of Cloud Computing
Chapter 15 The Dark Side of Data Science
Chapter 16 How to Feed and Care for Your Machine-Learning Experts
Chapter 17 Data Traceability
Erasable Ink?
Knowing When Your Data Is Good Enough
About the Author
Common terms and phrases
administrative data algorithm allocation analysis application ASCII asset bad data bias chapter character encoding characters classifier cloud computing Code Page 1252 Code Page 858 code points column cost center crawl create data quality data science data scientists data values database dataset decode delimiter distribution earnings end user error example expect extract Facebook field Figure format function gender="Male Google graph Hadoop histogram iconv imputed infrastructure inputs JSON Kmart look machine-learning MacRoman male students MapReduce myths negative nodes non-ASCII perl plain text problem Python query question recommendations records resyndication rows sample Schwabish script server simple social media services spreadsheet statistics there’s things topcode Unicode URL encoding validation variables web crawler web scraping write YAML