Abstract: E-newspapers use complex multi-article page layouts. As a result, each article is divided into multiple blocks that do not appear in reading order. This paper proposes an approach to reconstructing articles, which includes aggregating the blocks of an article and arranging them in English reading order. An interpolation model combines a part-of-speech-based and a word-based n-gram language model to predict words in a sentence. This sequence probability model identifies the correct order of an article's blocks in an English e-newspaper. The sequence probabilities are computed from a given corpus.
Keywords: HMM, N-Gram, Newspaper, NLP, POS Tagging, Word Prediction.
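
The interpolation idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy corpus, the bigram order, the add-one smoothing, the fixed weight `lam`, the choice of P(word | previous tag) as the part-of-speech-based model, and all function names are assumptions made for illustration. The sketch scores each candidate ordering of text blocks by the interpolated probability of the resulting word sequence and keeps the most probable ordering.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

# Toy POS-tagged corpus (hypothetical data, for illustration only).
CORPUS = [
    "the/DET minister/NOUN opened/VERB the/DET new/ADJ bridge/NOUN",
    "the/DET new/ADJ bridge/NOUN connects/VERB two/NUM towns/NOUN",
    "two/NUM towns/NOUN welcomed/VERB the/DET minister/NOUN",
]

word_bigram = Counter()    # counts of (previous word, word)
word_context = Counter()   # counts of the previous word as a bigram context
pos_bigram = Counter()     # counts of (previous tag, word)
pos_context = Counter()    # counts of the previous tag as a context
word_tags = defaultdict(Counter)

for line in CORPUS:
    pairs = [token.split("/") for token in line.split()]
    for word, tag in pairs:
        word_tags[word][tag] += 1
    for (prev_word, prev_tag), (word, _) in zip(pairs, pairs[1:]):
        word_bigram[prev_word, word] += 1
        word_context[prev_word] += 1
        pos_bigram[prev_tag, word] += 1
        pos_context[prev_tag] += 1

VOCAB_SIZE = len(word_tags)
# Assumption: each word takes its most frequent corpus tag; unseen words get "UNK".
TAG_OF = {w: tags.most_common(1)[0][0] for w, tags in word_tags.items()}


def interpolated_prob(prev_word, word, lam=0.5):
    """lam * P_word(word | prev_word) + (1 - lam) * P_pos(word | tag(prev_word)).

    Both component models use add-one smoothing (an assumption of this sketch).
    """
    p_word = (word_bigram[prev_word, word] + 1) / (word_context[prev_word] + VOCAB_SIZE)
    prev_tag = TAG_OF.get(prev_word, "UNK")
    p_pos = (pos_bigram[prev_tag, word] + 1) / (pos_context[prev_tag] + VOCAB_SIZE)
    return lam * p_word + (1 - lam) * p_pos


def sequence_log_prob(words, lam=0.5):
    """Sum of log interpolated probabilities over consecutive word pairs."""
    return sum(math.log(interpolated_prob(p, w, lam)) for p, w in zip(words, words[1:]))


def best_block_order(blocks, lam=0.5):
    """Try every permutation of the blocks and return the most probable one."""
    def score(order):
        words = [w for block in order for w in block.split()]
        return sequence_log_prob(words, lam)
    return list(max(permutations(blocks), key=score))


if __name__ == "__main__":
    shuffled = ["two towns", "the new bridge connects", "the minister opened"]
    print(best_block_order(shuffled))
```

Only the word pairs at block boundaries differ between orderings, so they alone decide the ranking. The exhaustive search over permutations grows factorially with the number of blocks, so it is practical only for short articles.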