Understanding the revised approach
In this section, we will look at the key concepts and approaches for alignment and smoothing. Implementing the Logistic Regression algorithm is not that difficult because we will be using the scikit-learn API. So, we will start by understanding the concepts and approaches behind the implementation.
Understanding concepts and approaches
Here, we will discuss how alignment and smoothing work. Once we understand the technicalities behind alignment and smoothing, we will focus on the Logistic Regression-based approach.
Alignment-based approach
Using this approach, we will increase the predicted prices by a constant value so that the predicted prices and the actual prices in the testing dataset are aligned. Suppose we take 10 days into consideration. We generate the average value of the actual prices for those days. After that, we generate the average value of the prices predicted by the first ML model. Once we have both average values, we subtract them, and the result is the alignment value for those 10 days.
Let's take an intuitive working example that will make this clear. Consider the 10 days from January 2, 2015, to January 11, 2015. For these records, you take the average value of the actual prices. Suppose this comes to 17,676, and the average of the predicted price values is 13,175. In this case, you get a difference of 4,501, which is the alignment value. We will add this value to the predicted prices for the testing dataset so that the testing price values and predicted price values are aligned. You will find the code implementation in the Implement revised approach section.
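The following is a minimal sketch of this alignment calculation using NumPy. The array names and values are purely illustrative placeholders, not the book's actual dataset:

```python
import numpy as np

# Hypothetical 10-day window of actual and predicted prices (illustrative values only)
actual_prices = np.array([17500, 17600, 17800, 17650, 17700,
                          17750, 17620, 17680, 17710, 17750], dtype=float)
predicted_prices = np.array([13100, 13150, 13300, 13120, 13200,
                             13250, 13130, 13180, 13210, 13110], dtype=float)

# Alignment value = average of actual prices - average of predicted prices
alignment_value = actual_prices.mean() - predicted_prices.mean()

# Shift the predictions by this constant so their average matches the actual average
aligned_predictions = predicted_prices + alignment_value

print("Alignment value:", alignment_value)
print("Aligned prediction mean:", aligned_predictions.mean())
print("Actual price mean:", actual_prices.mean())
```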
Smoothing-based approach
In this approach, we will be using EWMA. EWMA stands for Exponentially Weighted Moving Average. The smoothing approach is based on the weighted average concept. In general, a weighted moving average is calculated by the following equation:

\[ y_t = \frac{\sum_{i=0}^{t} w_i \, x_{t-i}}{\sum_{i=0}^{t} w_i} \]

Here, x_t is the input and y_t is the output. The weights are calculated using the following equation:

\[ w_i = (1 - \alpha)^i \]

Here, α is the smoothing constant. If the value of the smoothing constant is high, the smoothed values stay close to the actual values; if the smoothing constant is low, the result is smoother but not as close to the actual values. Typically, in statistics, the smoothing constant ranges between 0.1 and 0.3. Using the smoothing constant, we can therefore generate the smoothed values.
Let's take a working example. Take a smoothing constant of 0.3; if the actual value is 100 and the predicted value is 110, then the smoothed value can be obtained using this equation: (smoothing constant * actual value) + (1 - smoothing constant) * predicted value. The value that we obtain is (0.3 * 100) + (1 - 0.3) * 110 = 107. For more information, you can refer to http://pandas.pydata.org/pandas-docs/stable/computation.html#exponentially-weighted-windows.
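As a quick sanity check, this is the same arithmetic expressed in a few lines of Python; the variable names are just for illustration:

```python
alpha = 0.3        # smoothing constant
actual = 100.0     # actual value from the worked example
predicted = 110.0  # predicted value from the worked example

# smoothed value = alpha * actual + (1 - alpha) * predicted
smoothed = alpha * actual + (1 - alpha) * predicted
print(smoothed)    # approximately 107
```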
We will see the actual code-level implementation in the Implement revised approach section. pandas already has an API, so we can easily implement EWMA.
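As a rough sketch of how the smoothing could be applied with the pandas ewm API (the series values below are made up for illustration; the book's own implementation may differ):

```python
import pandas as pd

# Hypothetical series of predicted prices (illustrative values only)
predicted = pd.Series([110.0, 108.0, 112.0, 115.0, 111.0])

# Exponentially weighted moving average with smoothing constant alpha = 0.3;
# adjust=False uses the recursive form y_t = alpha * x_t + (1 - alpha) * y_{t-1}
smoothed = predicted.ewm(alpha=0.3, adjust=False).mean()
print(smoothed)
```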
Logistic Regression-based approach
Implementing the Logistic Regression algorithm is a simple task because we just need to use the scikit-learn API. For the testing dataset, we will apply alignment and smoothing. After evaluating the accuracy, we will decide whether we need to change the ML algorithm. We started with our intuition and have gradually improved our approaches. I don't need to explain the Logistic Regression algorithm itself here, but we will discuss the important points during the implementation.
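The following is a minimal, self-contained sketch of this step with scikit-learn. The feature matrix and target here are randomly generated stand-ins, not the book's actual stock-market dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical feature matrix X and binary target y (for example, price up/down);
# in the real implementation these would come from the prepared dataset.
rng = np.random.RandomState(42)
X = rng.rand(200, 5)
y = (X[:, 0] + 0.1 * rng.randn(200) > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit the Logistic Regression classifier using scikit-learn defaults
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Evaluate accuracy on the held-out testing data
predictions = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```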
Now, it is time to move on to the implementation part of our revised approach. So, let's take a look at the next section.