Modern Big Data Processing with Hadoop

Text/CSV file

Text and CSV files are very common in Hadoop data processing. Each line in the file is treated as a new record, and each line typically ends with the \n (newline) character. These files have no built-in support for column headers, so when a file does contain a heading row, an extra line of code is always required to skip it during processing. CSV files are typically compressed using the GZIP codec; because GZIP does not support block-level (splittable) compression, a compressed file must be processed as a whole, which adds processing cost. They also do not support schema evolution.
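The header-skipping step mentioned above can be sketched as follows. This is a minimal illustration using Python's standard `csv` module on hypothetical in-memory data; in a real Hadoop job the same skip would be done in the mapper or record reader.

```python
import csv
import io

# Hypothetical sample CSV with a heading row. Line-oriented Hadoop
# processing would treat the heading as just another record, so we
# must skip it explicitly.
sample = "id,name\n1,alice\n2,bob\n"

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # the extra line of code: discard column headings
records = [row for row in reader]

print(header)   # ['id', 'name']
print(records)  # [['1', 'alice'], ['2', 'bob']]
```

The same pattern appears in MapReduce and Spark jobs, where the first record of the first split is checked against the known heading before processing begins.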