The 13 best data science books to read, according to experts

Data science – the amalgamation of mathematics, statistics, computer science, machine learning, cluster analysis, data mining, and visualization – is no longer just the purview of data scientists. It is no wonder, therefore, that it has become a popular topic among business leaders, economists, anthropologists, and others. That said, even those who are well versed in the ins and outs of data science face an avalanche of new titles on the market. Here is a small compendium of some of the best.

General data science books

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are
By Seth Stephens-Davidowitz

This book is aimed at those seeking a greater understanding of how we express ourselves in the digital age. The author’s discoveries show how our digital actions – how and where we search the internet, for example – belie the image we have of ourselves. Every day, Stephens-Davidowitz notes, humans searching the internet amass 8 trillion gigabytes of data. These data reveal our fears, desires, and behaviors, as well as conscious and unconscious biases. For example, what percentage of white voters did not vote for Barack Obama because he is black? They also offer insights into a range of things, from the economy to sports to sex. For example, does where you go to school influence your success in life? Do parents secretly favor boys over girls?

Stephens-Davidowitz received a bachelor’s degree in philosophy from Stanford, where he graduated Phi Beta Kappa, and a Ph.D. in economics from Harvard. The book is available on Amazon, on the author’s website, at Barnes & Noble, and elsewhere.

Naked Statistics: Stripping the Dread from the Data
By Charles Wheelan

The second of three books in Wheelan’s Naked series, Naked Statistics brings the arcane and harrowing world of statistics to life through tongue-in-cheek observations and unexpected real-world applications. Readers explore, for example, how Netflix recommends movies to watch and why “Let’s Make a Deal” contestants make the choices they do. Along the way, Wheelan clarifies key concepts like inference, correlation, and regression analysis. And, perhaps most importantly in this day and age of controversial polling practices, Wheelan explains how bias or carelessness can manipulate or misrepresent data.

Wheelan is a professor at the University of Chicago’s Harris School of Public Policy and a former correspondent for The Economist. Naked Statistics is available from Amazon, Apple Books, Barnes & Noble, and other places.

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
By Cathy O’Neil

O’Neil, a former Wall Street quant (an expert in quantitative data analysis and management), wrote this exceptionally important book after working as a data scientist, where she built models that predicted people’s purchases and clicks. Following her early and well-credentialed career – she holds a Ph.D. in mathematics from Harvard – she launched the Lede Program in Data Journalism at Columbia in 2014 and founded ORCAA, an algorithmic auditing firm. One of ORCAA’s tenets – that AI ethics cannot be automated – explains the rationale she lays out in Weapons of Math Destruction: “[T]here is no excuse for an algorithm to be racist, sexist, discriminatory based on age or any other reason.”

Weapons of Math Destruction can be purchased from Amazon and Barnes & Noble. More information about the title can be found on the author’s blog; more information about ORCAA can be found on the company’s website.

Algorithms of Oppression: How Search Engines Reinforce Racism
By Safiya U. Noble

Dr. Safiya U. Noble, associate professor in the departments of Information Studies and African American Studies at UCLA, argues that the combination of private interests in promoting certain sites, along with the monopoly status of a relatively small number of internet search engines, leads to a set of biased search algorithms that privilege whiteness and discriminate against people of color, specifically women of color. Noble draws her conclusions – that there is a culture of racism and sexism in the way online discoverability is created – after analyzing textual and media searches and investigating paid advertising online. Algorithms of Oppression was listed in the New York Public Library’s Best Books for Adults (Nonfiction) of 2018 and recognized by Bustle magazine as one of “10 Books About Race To Read Instead Of Asking A Person Of Color To Explain Things To You.”

Noble holds appointments in African American Studies and Gender Studies and is a research associate at the Oxford Internet Institute at the University of Oxford. Currently, she is co-director of the UCLA Center for Critical Internet Research. Algorithms of Oppression can be purchased from Amazon, Barnes & Noble, and other sites.

Data science books for beginners

Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python
By Peter Bruce, Andrew Bruce and Peter Gedeck

Although we have included this book in the beginners category, remember that everything is relative. Unlike titles listed in the general data science category, Practical Statistics for Data Scientists assumes some knowledge of the R programming language and some exposure to statistics. The authors, Peter and Andrew Bruce, try to find common ground between data scientists, many of whom they say have never received formal training in statistics, and statisticians, who often lack a data science perspective. Practical Statistics for Data Scientists explains how to apply various statistical methods to data science and how to avoid their misuse. This title is available for purchase on Amazon.

Data Science from Scratch: First Principles with Python, 2nd Edition
By Joel Grus

Joel Grus, a principal engineer at Capital Group and a former software engineer at the Allen Institute for AI and Google, says that to really learn data science, you have to understand the principles behind it. His idea is to show you how data science libraries, frameworks, modules and toolsets work by implementing them from scratch. Grus promises that if you have an aptitude for math and some programming knowledge, he can help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist.
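
To give a flavor of what a from-scratch approach can look like, here is a minimal sketch that computes a mean, a sample variance, and a Pearson correlation in plain Python with no third-party libraries. The function names and the sample data are illustrative assumptions, not code taken from the book.

```python
import math
from typing import List


def mean(xs: List[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def variance(xs: List[float]) -> float:
    """Sample variance (divides by n - 1); needs at least two values."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs: List[float], ys: List[float]) -> float:
    """Sample covariance of two equal-length lists."""
    assert len(xs) == len(ys), "inputs must have the same length"
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has no variation."""
    sx, sy = math.sqrt(variance(xs)), math.sqrt(variance(ys))
    if sx == 0 or sy == 0:
        return 0.0
    return covariance(xs, ys) / (sx * sy)


if __name__ == "__main__":
    # Hypothetical data: hours studied vs. test score.
    hours = [1.0, 2.0, 3.0, 4.0]
    scores = [2.0, 4.0, 6.0, 8.0]
    print(mean(hours), variance(hours), correlation(hours, scores))
```

Writing these helpers by hand is slower than calling a library, but it makes explicit what the library computes, which is the point of learning from first principles.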

You can find more information about the author on his blog. Data Science from Scratch can be purchased from Amazon. The code and examples for the book (which requires at least Python 3.6) can be found on GitHub.

Python for Data Science: The Ultimate Beginners’ Guide to Learning Python Data Science Step by Step
By Ethan Williams

Part of a voluminous series by Ethan Williams, this book is for absolute beginners who want to learn Python programming and its application to data science. Some Python libraries are introduced, such as NumPy, Pandas, Matplotlib, and Seaborn for data analysis and visualization. Practical examples and applications of each lesson are given, and the reader is also encouraged to practice the techniques through exercises. In addition, references to relevant reading and practice materials are provided.

Python for Data Science is available for purchase on Amazon, as an audiobook on Apple Books, from The Book Stall, and elsewhere.

An Introduction to Statistical Learning: With Applications in R
By Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

In its seventh printing, An Introduction to Statistical Learning follows the authors’ 2009 bestseller, The Elements of Statistical Learning (Hastie, Tibshirani, and Friedman, 2nd edition, 2009). An Introduction to Statistical Learning, which is aimed at statisticians and non-statisticians alike, assumes only a previous course in linear regression and no knowledge of matrix algebra. This book provides an accessible overview of the field of statistical learning, used to sort through the vast data sets that have emerged in biology, finance, marketing, astrophysics, and other fields in the last 20 years. Topics covered include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and others. Color graphics and real-world examples are used to illustrate the methods presented. Each chapter contains a tutorial on implementing the analyses and methods presented in R.

An Introduction to Statistical Learning is available for purchase on Amazon, Apple Books, and other sites.

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
By Aurélien Géron

Aurélien Géron, an artificial intelligence engineer and former product manager at Google, aims to help programmers with little knowledge of machine learning use simple and efficient tools to implement programs capable of learning from data. Using concrete examples, a minimum of theory, and two production-ready Python frameworks, Scikit-learn and TensorFlow, Géron offers an intuitive understanding of the concepts and tools for building intelligent systems. Readers will start with simple linear regression and progress to deep neural networks. Exercises are offered in each chapter to help the reader apply what they have learned. Hands-On Machine Learning explores neural networks and various training models, such as support vector machines, decision trees, random forests, and ensemble methods.

Hands-On Machine Learning is available for purchase from Amazon, Apple Books, Barnes & Noble, and elsewhere.

Python Crash Course for Data Analysis: A Complete Beginner Guide for Python Coding, NumPy, Pandas and Data Visualization
By AI Publishing

The book is for those who are new to Python and data science. It focuses primarily on hands-on learning. With an extensive library of AI titles, AI Publishing offers readers the ability to shorten the learning curve by using practical tools (including sample Python code, references, and exercises) available on the publisher’s website at no additional cost. The topics covered are the following:

– Introduction to data analysis

– Python for data analysis: basic and advanced

– IPython and Jupyter notebooks

– NumPy for numerical data processing

– Pandas for data manipulation

– Data visualization

Python Crash Course for Data Analysis is available from Amazon, Book Depository, and other places.

Advanced data science books

Pattern Recognition and Machine Learning
By Christopher M. Bishop

Christopher Michael Bishop – director of the Microsoft Research Laboratory in Cambridge, professor of computer science at the University of Edinburgh, and fellow of Darwin College, Cambridge – presents the first textbook on pattern recognition to expound the Bayesian point of view. The book presents approximate inference algorithms that allow quick, approximate answers to be obtained in situations where exact answers are not feasible. It also uses graphical models to describe probability distributions, something no other machine learning textbook did when it was published. No prior knowledge of pattern recognition or machine learning concepts is assumed, although familiarity with multivariate calculus and basic linear algebra is required.

Pattern Recognition and Machine Learning is available for purchase from Amazon and Springer.

Data Science with Python and Dask
By Jesse Daniel

Data Science with Python and Dask teaches you how to build scalable projects that can handle massive data sets. Dask provides dynamic task scheduling and parallel collections that extend the functionality of NumPy, Pandas, and Scikit-learn, allowing users to scale their code from a single laptop to a cluster of hundreds of machines with ease. The book works through a hands-on example: analyzing the NYC parking ticket database. You streamline the process by using DataFrames and build machine learning models with Dask-ML. Then, using AWS and Docker, you’ll create interactive visualizations and build clusters.

Data Science with Python and Dask is available from Amazon, Manning, and elsewhere. More information about Dask is available on the project’s website.

The Hundred-Page Machine Learning Book
By Andriy Burkov

Available in 11 languages, The Hundred-Page Machine Learning Book is the latest book from Andriy Burkov, who has a Ph.D. in AI and is the leader of a machine learning team at Gartner. This AI book is packed with best practices and design patterns for building reliable and scalable machine learning solutions. It is based on Burkov’s 15 years of experience in solving problems with AI and the published experience of industry leaders. In the book’s foreword, Cassie Kozyrkov, chief decision scientist at Google, describes The Hundred-Page Machine Learning Book as “one of the few true applied machine learning books out there.”

