Friday, May 06, 2005
The Presentation of Electronic Texts
Thanks to Sauvage Noble for drawing my attention to a very interesting collection of Latin texts, under the name Bibliotheca Latinitatis Romana, which includes some off-beat things I have not seen elsewhere, such as Ilias Latina, Testamentum Porcelli, and Dares Phrygius.
Grateful as I am, I cannot resist a few quibbles about the presentation of these texts.
- You have to click through at least three links to get from the table of contents to an actual text.
- Many texts, even very short ones like Contra Haereticos, are split up into multiple pages, which is a pain when you want to download a complete text to your own computer, as I often do. I always like to see a single text on a single web page (although it's OK to split up something like Vergil's Aeneid into books). The presentation of the texts at Perseus is frustrating, for the same reason -- they're all chopped up into little pieces.
- Line numbers are missing, at least in the couple of plays of Plautus I spot-checked, which is a major hindrance if you're starting from a citation with a line number and want to find the passage where it occurs. Some texts at The Latin Library also lack line numbers, e.g. Ovid's Heroides.
Frequency - Word Form
1 habent
1 haec
2 homines
2 html
The word html isn't in any Latin lexicon I've ever seen!
I once wrote a quick and dirty computer program, for my own use, to extract word frequencies from texts. One useful form of output was a complete list of all words arranged by frequency, from the most frequent words to the hapax legomena. You can get statistics like this from Bibliotheca Latinitatis Romana, but they're split over several pages. It would also be a real boon if someone would make available a similar program that grouped together related Latin words regardless of their inflections (e.g. statistics for ferre and tulit under fero).
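For anyone who wants to try something similar, here is a minimal sketch in Python of a frequency counter of this kind. It is not the program mentioned above, only an illustration: the function names are made up, the tag stripping is deliberately crude, and the tiny LEMMAS table is a hypothetical stand-in for a real Latin morphological lexicon, which would have to be supplied separately.

```python
import re
from collections import Counter

# Hypothetical, hand-made lookup from inflected form to dictionary headword.
# A real table would have to come from a Latin morphological lexicon.
LEMMAS = {"ferre": "fero", "tulit": "fero", "fert": "fero"}

TAG = re.compile(r"<[^>]*>")   # crude match for HTML tags
WORD = re.compile(r"[a-z]+")   # runs of ASCII letters (text is lowercased first)


def word_frequencies(text, lemmas=None):
    """Count words in text, optionally folding inflected forms onto lemmas."""
    plain = TAG.sub(" ", text).lower()   # drop markup so tag names are not counted
    words = WORD.findall(plain)
    if lemmas:
        # Forms missing from the table are counted under their own spelling.
        words = [lemmas.get(w, w) for w in words]
    return Counter(words)


def frequency_list(counts):
    """Sort from the most frequent words down to the hapax legomena."""
    # Ties are broken alphabetically so the output is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample = "<p>Ferre potest; tulit et fert. Homines haec habent, homines.</p>"
for word, n in frequency_list(word_frequencies(sample, LEMMAS)):
    print(n, word)
```

Because everything goes into one sorted list, the whole table can be printed on a single page. The lemma lookup is the weak point: a plain dictionary cannot decide between homographs, so a serious version would need a proper morphological analyzer.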
Despite my quibbles, I cannot stress enough how grateful I am to have ancient texts freely available online. Scholars with university affiliations have access to corpora of ancient texts in digital form (such as TLG and PHI), but these collections are too expensive for independent scholars on limited budgets.