Algorithm selection
We need to iterate on the complex problem of creating the algorithm. This entails exploring the data to gain a deep understanding of the underlying variables. Once we have an idea of the kind of algorithm we want to apply, we'll need to prepare the data further, possibly combining it with other data sources (for example, census data). In our example, this could mean creating a song similarity matrix. Once the data is ready, we can train a model to make predictions and test it against holdout data to see how it performs. Several considerations make this process complex:
- How the data is encoded (for example, how the song matrix is constructed)
- What algorithm is used (for example, collaborative filtering or content-based filtering)
- What parameter values the model takes (for example, values for smoothing constants or prior distributions)
Our goal in this book is to make this step easier for you by walking through the iterations a data scientist goes through when building a successful model, using real-world applications as examples.
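To make one pass through this loop concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration: the `plays` matrix is invented toy data, a zero is treated as "no data" rather than "zero plays" (an encoding choice in itself), cosine similarity stands in for whatever similarity measure you might prefer, and the helper names `song_similarity` and `predict` are hypothetical. It builds a song similarity matrix, hides a few known values as holdout data, and measures how well a simple item-based collaborative filter recovers them.

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are songs, values are play counts.
# Assumption: 0 means "no data for this user and song".
plays = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
    [0, 0, 4, 5],
], dtype=float)


def song_similarity(matrix):
    """Return the cosine similarity between every pair of song columns."""
    norms = np.linalg.norm(matrix, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for songs nobody played
    normalized = matrix / norms
    return normalized.T @ normalized


def predict(matrix, similarity, user, song):
    """Estimate a play count as a similarity-weighted average of the
    user's other known play counts."""
    known = matrix[user] > 0
    known[song] = False  # never use the target entry itself
    weights = similarity[song, known]
    if weights.sum() == 0:
        return 0.0  # no similar songs to draw on
    return float(weights @ matrix[user, known] / weights.sum())


# Hold out a few known (user, song) entries so the model never sees them.
holdout = [(0, 1), (2, 3), (4, 2)]
train = plays.copy()
for user, song in holdout:
    train[user, song] = 0.0

# "Train" on the remaining data, then test against the held-out values.
sim = song_similarity(train)
errors = [abs(predict(train, sim, u, s) - plays[u, s]) for u, s in holdout]
mae = float(np.mean(errors))
print("Mean absolute error on holdout:", mae)
```

Each consideration in the list above corresponds to something you could change here and rerun: the encoding of `plays`, the choice of item-based filtering, or the way similarities are weighted. Comparing the holdout error across such changes is the iteration this section describes.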