Preprocessing of the corpora