From the course: Learning Data Science: Understanding the Basics

Sift through big garbage

- Unstructured data brings a whole new set of challenges. One of the first questions you run into is whether you ever want to delete some of your data. Remember that a data science team applies the scientific method to its data, so you want to be able to ask interesting questions. That means you need to decide whether there's any limit to the questions you'll ever want to ask. There are good arguments both for keeping your data and for throwing parts of it away. Some data analysts argue that you can never know every question you might want to ask. It's also relatively cheap to keep massive amounts of data, usually only a few cents per gigabyte. You may as well keep it rather than make hard decisions about what to throw away. It might be cheaper to buy new hard drives than to spend time in long retention meetings. On the other hand, some analysts argue that you should throw data away. There's a lot of garbage in those big data clusters. The more garbage you have, the more difficult it is to find…
