In MapReduce, an InputSplit’s RecordReader will start and end at a record boundary.
In SequenceFiles, a 20-byte sync mark is written between records roughly every 2 KB.
These sync marks allow the RecordReader to seek to the start of the InputSplit
(which identifies a file, an offset, and a length) and then scan forward to the
first sync mark after that offset. The RecordReader continues processing records until it
reaches the first sync mark after the end of the split. Text files are handled
similarly, using newlines instead of sync marks.
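
The text-file case can be illustrated with a small simulation. This is only a sketch in plain Python, not Hadoop's actual reader code: the function name `read_split` and the in-memory `data` buffer are hypothetical, and it assumes records are terminated by a single `\n` byte. A split that does not start at offset 0 discards everything up to and including the first newline, and every split keeps reading as long as a record begins at or before the split's end, so each record is read exactly once.

```python
from typing import List


def read_split(data: bytes, start: int, length: int) -> List[bytes]:
    """Return the records owned by the split covering bytes [start, start + length)."""
    end = start + length
    pos = start
    if start != 0:
        # Skip ahead to the first record boundary after `start`.
        # The skipped line is read by the previous split.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    # A record belongs to this split if it begins at or before `end`,
    # so the last record may run past the end of the split.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        records.append(data[pos:stop])
        pos = stop + 1
    return records


data = b"alpha\nbravo\ncharlie\ndelta\n"
print(read_split(data, 0, 10))   # [b'alpha', b'bravo']
print(read_split(data, 10, 10))  # [b'charlie', b'delta']
print(read_split(data, 20, 6))   # []
```

In this example the second split starts in the middle of "bravo", skips to the next newline, and reads through "delta" because that record begins exactly at the split's end. The third split therefore has nothing left to read.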