Zipf's law plot (word frequency as a function of frequency rank) for the first four books (the Gospels) of the New Testament in the Vulgate Latin Bible, translated from Greek by St. Jerome around 400 CE. The text was converted to lowercase.
The books and the respective word frequency files are:
Book 1 - Gospel of Matthew. Sample: liber generationis iesu christi filii david filii abraham abraham genuit [...] ecce ego vobiscum sum omnibus diebus usque ad consummationem saeculi. File latn/nwt/mat.1/gud.wfr (16431 words, N = 3911 distinct).
Book 2 - Gospel of Mark. Sample: initium evangelii iesu christi filii dei sicut scriptum est in esaia [...] confirmante sequentibus signis. File latn/nwt/mrk.1/gud.wfr (10280 words, N = 2913 distinct).
Book 3 - Gospel of Luke. Sample: quoniam quidem multi conati sunt ordinare narrationem quae in nobis [...] erant semper in templo laudantes et benedicentes deum amen. File latn/nwt/luk.1/gud.wfr (18004 words, N = 4406 distinct).
Book 4 - Gospel of John. Sample: in principio erat verbum et verbum erat apud deum et deus erat verbum [...] eos qui scribendi sunt libros amen. File latn/nwt/joh.1/gud.wfr (14026 words, N = 2523 distinct).
The word frequency files '*/*/*/gud.wfr' are available at the UNICAMP website. The original annotated full texts are in the companion files */*/org/main.src. The extracted texts -- one word per line, without punctuation -- are in */*/*/gud.tlw.
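The rank and frequency data behind such a plot can be rebuilt from a one-word-per-line text like the gud.tlw files. The following is a minimal sketch, not the method used to produce the files above: the function names `read_words` and `rank_frequency` are hypothetical, and because the layout of the gud.wfr files is not described here, the sketch counts words directly instead of parsing them. The sample is the opening of the Gospel of Matthew quoted above.

```python
from collections import Counter


def read_words(lines):
    """Yield lowercase words from an iterable of lines (assumed one word per line)."""
    for line in lines:
        word = line.strip().lower()
        if word:  # skip blank lines
            yield word


def rank_frequency(words):
    """Return (rank, word, count) tuples, most frequent word first.

    Ties in count are broken alphabetically so the ranking is reproducible.
    """
    counts = Counter(words)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


# Opening words of Matthew, as quoted in the sample above.
sample = ("liber generationis iesu christi filii david "
          "filii abraham abraham genuit").split()

table = rank_frequency(read_words(sample))
total_words = sum(count for _, _, count in table)  # running word count
distinct_words = len(table)                        # N, the number of distinct words

for rank, word, count in table:
    print(rank, word, count)
```

To process a whole book, pass an open file object to `read_words` in place of the sample list; the path and the text encoding to use are assumptions that depend on the downloaded files. Plotting the counts against the ranks on logarithmic axes gives the Zipf plot.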
You are free:
to share – to copy, distribute and transmit the work
to remix – to adapt the work
Under the following conditions:
attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.