Last updated: 16/10/10
The Clementine Vulgate: plain text
Introduction
This page is for those who want to convert the text of the Clementine Vulgate into a format of their choosing. You can also view the text online, search it and print it at this site. This document describes the mark-up used, and provides copies of the source files together with some sample conversion scripts.
Download
You most probably want the latest source files and sample scripts:
Subversion repository
However, you can also recreate the text as it stood at any time since its release. The text is now stored in a Subversion repository, and you can browse the history of the text online.
If you want to download old versions of the text, it will be more convenient to install a (free) Subversion client on your machine. For example, if you install the official client and you want to get the text as it stood at the start of 2006, you can simply do:
svn co -r {"2006-01-01 12:00"} https://svn.sourceforge.net/svnroot/vulsearch/clemtext NewYearText
Checksums for each revision of the text are available, and this page also matches Subversion revision numbers to actual dates.
The source package contains the simple scripts I use to make HTML and PDF files from the source files. All of the scripts are released under the GPL, and can be run using free software (see the Tools section below for download links); they are described in detail below.
Description of the markup
The text is plain text, codepage 1252, with DOS-style line endings.
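For anyone writing a converter in a language other than sed or perl, the first step is to decode the files correctly. The following Python sketch shows one way to do that. The function names are illustrative and are not part of the source package.

```python
from pathlib import Path


def decode_source(raw: bytes) -> list[str]:
    """Decode the bytes of one source file and return its lines."""
    text = raw.decode("cp1252")  # codepage 1252, as stated above
    lines = text.split("\r\n")   # DOS-style line endings (CR LF)
    if lines and lines[-1] == "":
        lines.pop()              # drop the empty piece after a final CR LF
    return lines


def read_source_file(path: str) -> list[str]:
    """Read one source file from disk (hypothetical helper)."""
    return decode_source(Path(path).read_bytes())
```

Splitting on CR LF explicitly, rather than relying on a generic line splitter, keeps any other control characters inside the line they belong to.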
Commas and periods have no space before, and a single space after
(unless they end a line—there is never a space at the end of a
verse), whereas : ; ? ! each have a single space before, and a single
space after (unless they end a line). In general, the first word of a
verse is not capitalized, nor the first word of a line of poetry, but
the first word of a sentence, as well as the first word of direct speech
or quotation, is capitalized.
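If the target format needs conventional English spacing, the space before : ; ? ! can be removed mechanically. The sketch below is only an illustration of that idea in Python; the function name and the optional `before` parameter are assumptions, not something the sample scripts provide.

```python
import re

# The four marks that the source writes with a space before them.
SPACED_MARKS = re.compile(r" ([:;?!])")


def tighten_punctuation(text: str, before: str = "") -> str:
    """Replace the space before : ; ? ! with `before` (nothing by default)."""
    return SPACED_MARKS.sub(lambda m: before + m.group(1), text)


sample = "Dixitque Deus : Fiat lux. Et facta est lux."
print(tighten_punctuation(sample))
# prints: Dixitque Deus: Fiat lux. Et facta est lux.
```

Passing `before="\u00a0"` instead keeps the gap but turns it into a non-breaking space, so that a mark can never start a new line by itself.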
The text really has two structures: the traditional division into books, chapters and verses, and a 'natural' structure as sentences and paragraphs. This latter structure is not an intrinsic part of the text, and has been imposed differently by each editor of the Vulgate through the centuries; for my part I have tried to use punctuation both to make the meaning transparent, and to reflect the natural cadences in the text.
- Paragraph divisions are indicated by a backslash \, though this is omitted at the very start or end of a chapter. The backslash is followed by a space if it occurs in the middle of a verse.
- When text is set as verse, the start and end of a section of verse are indicated by brackets [ (preceded by a space) and ] (followed by a space unless it ends the verse) respectively. Line breaks within the verse are indicated by a slash / (followed by a space unless it ends the verse).
- When different speakers are indicated (e.g. in the Lamentations), the speaker's name is placed between angle brackets <…>, with no space after the closing bracket.
- Lamentations and Ecclesiasticus have prologues (which I believe are non-canonical?). In the source, the prologue appears at the start of 1:1, though logically it belongs before the start of ch. 1. The prologue is preceded by <Prologus> and in both books the text of verse 1 begins at the first bracket [.
- Information on the creators and proof-readers of each book can be found in source/data.txt; a description of the format of this file appears at its head.
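The markers listed above can be separated from the running text with a small tokenizer, which is a convenient starting point for emitting tags in any target format. What follows is a minimal Python sketch, not the approach taken by the sample scripts. The token names are invented for illustration, and the code assumes that the characters \ [ ] / < > never occur in the text other than as markers.

```python
import re

# Either a name in angle brackets, or one of the single-character markers.
MARKER = re.compile(r"<([^>]*)>|([\\\[\]/])")

# Hypothetical token names for the single-character markers.
KINDS = {
    "\\": "paragraph",
    "[": "verse_start",
    "]": "verse_end",
    "/": "line_break",
}


def tokenize(verse: str) -> list[tuple[str, str]]:
    """Split the text of one verse into (kind, value) tokens."""
    tokens = []
    pos = 0
    for m in MARKER.finditer(verse):
        text = verse[pos:m.start()].strip()
        if text:
            tokens.append(("text", text))
        if m.group(1) is not None:
            # <...>: a speaker's name, or the Prologus marker
            tokens.append(("label", m.group(1)))
        else:
            tokens.append((KINDS[m.group(2)], ""))
        pos = m.end()
    tail = verse[pos:].strip()
    if tail:
        tokens.append(("text", tail))
    return tokens


# A made-up line in the style of the markup, not a quotation from the source.
for token in tokenize("<Prologus>Et factum est. [Quomodo sedet sola / civitas plena populo !]"):
    print(token)
```

A converter can then walk the token list and write, for example, a paragraph break for each "paragraph" token and a line break for each "line_break" token. Note that the spaces around markers are stripped here, so a caller has to reinsert whatever spacing the target format requires.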
The example scripts
Included with the source files are some example scripts, which I hope might be helpful models for anyone who wants to convert the text into another format—these are the scripts used to produce the HTML and PDF versions of the text available on this site. I hope the scripts themselves are well enough commented: below is a description.
makehtml
This consists of two scripts:
- A sed script that does the meat of the work: it converts the source files to the body of a valid XHTML document. It should be an easy matter to modify this script to produce XML with whatever tags you want.
- A perl wrapper round the sed script that adds a header and footer and so on, in order to make a valid complete XHTML document.
To generate the HTML, ensure that perl and sed are in your path, open
a shell or command window at the clemtext directory, and
type perl makehtml.pl (the output is placed in the
html directory).
makelatex
Once again, a sed script does all the work, and a perl wrapper cleans
things up. There are also two batch files (which are also valid shell
scripts) to generate two different PDF files from the LaTeX source: one
("vulgate") on A4 paper with wide margins, the other ("twocolumn") on letter
paper (8.5 x 11") in two columns. To make everything, ensure that perl,
sed, and pdflatex are in your path, open a shell or command window at
the clemtext directory and type
perl makelatex.pl, then
makepdf-vulgate.bat, then
makepdf-twocolumn.bat (the output is placed in the
latex directory).
Tools
The tools needed to run the scripts described here are a standard part of any Linux distribution. Here is a list of sites where Windows users can download them freely.

