
Define what is going to be predicted
When working in predictive analytics, the stakeholder will sometimes have a general idea of what they want to predict: for example, churners, buyers, house prices, fraudulent transactions, and so on. In other cases, as we discussed before, you will get vague statements such as, "How can we know which product to offer to a customer who wants to leave the company?" In either case, in the Problem definition and understanding stage, it is your job to clarify the requirements and make them explicit in terms of the outputs of the model: what do the outputs look like? In other words, what is being predicted, and what is the target that will solve the business problem?
For example, suppose we are asked to build a model for predicting the churn of clients in a telecommunications company. The target, in this case, could be a categorical variable with two categories: "churners" (clients who will leave the company) versus "non-churners" (clients who will stay). However, based on your domain knowledge, you know that there are in fact two types of churners: "voluntary churners" and "involuntary churners." So, which target is better? The first one with two categories (churners versus non-churners) or the second one with three (voluntary churners, involuntary churners, and non-churners)? The answer, of course, depends on the business goals for the model, and it will be your task to recommend or decide which target is better.
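To make the two candidate targets concrete, here is a minimal Python sketch that derives both from the same customer records. The records, the field names (`left`, `initiated_by`), and the rule that a churn is "voluntary" when the customer ends the contract and "involuntary" when the company does are all assumptions made for illustration; in a real project these definitions would be agreed with the stakeholder.

```python
from collections import Counter

# Hypothetical customer records (illustrative data, not from a real company).
# `left` says whether the client left; `initiated_by` says who ended the
# contract and is None when the client stayed.
customers = [
    {"id": 1, "left": False, "initiated_by": None},
    {"id": 2, "left": True, "initiated_by": "customer"},
    {"id": 3, "left": True, "initiated_by": "company"},
    {"id": 4, "left": False, "initiated_by": None},
    {"id": 5, "left": True, "initiated_by": "customer"},
]


def binary_target(record):
    """Two-category target: churner versus non-churner."""
    return "churner" if record["left"] else "non-churner"


def three_class_target(record):
    """Three-category target that splits churners by who ended the contract.

    Assumption: customer-initiated means voluntary; anything else
    (here, company-initiated) is treated as involuntary.
    """
    if not record["left"]:
        return "non-churner"
    if record["initiated_by"] == "customer":
        return "voluntary churner"
    return "involuntary churner"


# Count how many records fall into each category under each definition.
binary_counts = Counter(binary_target(c) for c in customers)
three_class_counts = Counter(three_class_target(c) for c in customers)

print(binary_counts)
print(three_class_counts)
```

Both targets are built from exactly the same data; only the definition of the output changes. Comparing the category counts side by side is a quick way to show a stakeholder what each option would ask the model to predict, and how few examples the smaller categories might contain.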