Clementine text project

VulSearch & the Clementine Vulgate project

Last updated: 21/4/08

The Clementine Vulgate: plain text

Introduction

This page is for those who want to convert the text of the Clementine Vulgate into a format of their choosing. You can also view the text online, search it and print it at this site. This document describes the mark-up used, and provides copies of the source files together with some sample conversion scripts.

Download

You most probably want the latest source files and sample scripts:

Subversion repository

However, you can also recreate the text as it stood at any time since its release. The text is now stored in a Subversion repository, and you can browse the history of the text online.

If you want to download old versions of the text, it will be more convenient to install a (free) Subversion client on your machine. For example, if you install the official client and you want to get the text as it stood at the start of 2006, you can simply do:

svn co -r {"2006-01-01 12:00"} https://svn.sourceforge.net/svnroot/vulsearch/clemtext NewYearText
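For anyone who wants to script checkouts for several dates, here is a minimal Python sketch that builds the same command for an arbitrary date. The function name and constant are assumptions made for this example and are not part of the source package; the repository URL is simply the one quoted above.

```python
import subprocess
from datetime import datetime
from typing import List

# Repository URL as given in the command above.
REPO_URL = "https://svn.sourceforge.net/svnroot/vulsearch/clemtext"


def checkout_command(when: datetime, target_dir: str) -> List[str]:
    """Build the argument list for checking out the text as of `when`."""
    # Subversion accepts a date in curly braces wherever a revision is expected.
    revision = "{" + when.strftime("%Y-%m-%d %H:%M") + "}"
    return ["svn", "co", "-r", revision, REPO_URL, target_dir]


if __name__ == "__main__":
    command = checkout_command(datetime(2006, 1, 1, 12, 0), "NewYearText")
    print(" ".join(command))
    # Uncomment to run the checkout (needs the svn client and network access):
    # subprocess.run(command, check=True)
```

Passing the arguments as a list avoids any shell quoting of the braces and the space inside the date.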

Checksums for each revision of the text are available, and this page also matches Subversion revision numbers to actual dates.

The source package contains the simple scripts I use to make HTML and PDF files from the source files. All of the scripts are released under the GPL, and can be run using free software (see the Tools section below for download links); they are described in detail below.

Description of the markup

The text is plain text, codepage 1252, with DOS-style line endings. Commas and periods have no space before, and a single space after (unless they end a line—there is never a space at the end of a verse), whereas : ; ? ! each have a single space before, and a single space after (unless they end a line). In general, the first word of a verse is not capitalized, nor the first word of a line of poetry, but the first word of a sentence, as well as the first word of direct speech or quotation, is capitalized.
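As a rough illustration of these conventions, the following Python sketch decodes a source file as codepage 1252, splits it on DOS line endings, and removes the space before : ; ? ! for anyone who prefers that spacing in their output. The function names and the sample bytes are assumptions made for the example; neither is taken from the source package.

```python
import re
from typing import List

# Punctuation that the source text sets off with a single space before it.
SPACED_PUNCTUATION = re.compile(r" ([:;?!])")


def read_lines(raw: bytes) -> List[str]:
    """Decode codepage-1252 bytes and split on DOS (CRLF) line endings."""
    text = raw.decode("cp1252")
    lines = text.split("\r\n")
    # A file that ends with CRLF yields one empty trailing element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def close_up_punctuation(line: str) -> str:
    """Remove the single space that precedes : ; ? ! in the source text."""
    return SPACED_PUNCTUATION.sub(r"\1", line)


if __name__ == "__main__":
    # Hypothetical sample bytes, not a quotation from the source files.
    sample = "Quid est hoc ? Dixit : Ecce.\r\nc\xe6lum\r\n".encode("cp1252")
    for line in read_lines(sample):
        print(close_up_punctuation(line))
```

Decoding explicitly as cp1252 matters because characters such as æ are single bytes in that codepage and would be misread if the file were opened as UTF-8.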

The text really has two structures: the traditional division into books, chapters and verses, and a 'natural' structure as sentences and paragraphs. This latter structure is not an intrinsic part of the text, and has been imposed differently by each editor of the Vulgate through the centuries; for my part I have tried to use punctuation both to make the meaning transparent, and to reflect the natural cadences in the text.

The example scripts

Included with the source files are some example scripts, which I hope might be helpful models for anyone who wants to convert the text into another format. These are the scripts used to produce the HTML and PDF versions of the text available on this site. I hope the scripts themselves are commented well enough; a description of each follows.

makehtml

This consists of two scripts: a sed script that does the work, and a perl wrapper, makehtml.pl, that cleans things up.

To generate the HTML, ensure that perl and sed are in your path, open a shell or command window at the clemtext directory, and type perl makehtml.pl (the output is placed in the html directory).

makelatex

Once again, a sed script does all the work, and a perl wrapper cleans things up. There are also two batch files (which are also valid shell scripts) to generate two different PDF files from the LaTeX source: one ("vulgate") on A4 paper with wide margins, the other ("twocolumn") on letter paper (8.5 x 11") in two columns. To make everything, ensure that perl, sed, and pdflatex are in your path, open a shell or command window at the clemtext directory, and run perl makelatex.pl, then makepdf-vulgate.bat, then makepdf-twocolumn.bat (the output is placed in the latex directory).
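Before running either conversion it can save time to confirm that the required programs really are in your path. The following Python sketch does only that; the function name is an assumption made for the example, and nothing like it ships with the source package.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # perl and sed are needed for both conversions; pdflatex only for the PDFs.
    missing = missing_tools(["perl", "sed", "pdflatex"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("perl, sed and pdflatex were all found on PATH.")
```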

Tools

The tools needed to run the scripts described here are a standard part of any Linux distribution. Here is a list of sites where Windows users can download them freely.