About the Project
The Corpus Liberatum Linguæ Graecæ (CLLG) is a collaborative open-science project aimed at building a freely accessible, high-quality corpus of ancient Greek texts. It provides a complete pipeline from scanned edition images to structured TEI XML, serving the needs of philology, digital humanities, and natural language processing.
Access to ancient Greek texts is currently dominated by the Thesaurus Linguae Graecae (TLG), a proprietary resource that limits reproducible research. While existing open initiatives such as Perseus, First1KGreek, and the Patristic Text Archive (PTA) have made significant contributions, their coverage remains partial — in particular for late antique (Christian and non-Christian) and Byzantine texts. The CLLG aims to fill this gap with an open, sustainable alternative.
What We Do
The project operates along three major axes:
- Technological development — OCR for polytonic Greek, layout analysis, and automatic TEI XML encoding.
- Corpus production — High-quality, interoperable corpora structured in TEI XML, covering prose texts with canonical references.
- Open distribution — All data and tools are released under free licenses and published via Nakala and GitLab.
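To make the idea of TEI XML with canonical references concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not the CLLG pipeline: the function name `build_tei`, the `type="textpart"` convention, and the sample title are assumptions made for this example, and the output is a skeleton rather than a schema-valid TEI document.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default namespace instead of a prefix.
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title, sections):
    """Build a skeletal TEI tree (hypothetical helper).

    `sections` is a list of (canonical_ref, text) pairs, e.g. ("1.1", "...").
    Each pair becomes a <div> whose @n attribute carries the reference.
    """
    root = ET.Element(tei("TEI"))

    # Minimal header: only a title; a real header needs more metadata.
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title

    # Body: one citable <div> per canonical reference.
    text = ET.SubElement(root, tei("text"))
    body = ET.SubElement(text, tei("body"))
    for ref, content in sections:
        div = ET.SubElement(body, tei("div"), {"type": "textpart", "n": ref})
        ET.SubElement(div, tei("p")).text = content
    return root


if __name__ == "__main__":
    doc = build_tei(
        "Sample text",  # placeholder title
        [("1.1", "ἐν ἀρχῇ ἦν ὁ λόγος")],
    )
    print(ET.tostring(doc, encoding="unicode"))
```

Carrying the reference in an attribute on each division is what lets a passage such as 1.1 be cited and retrieved independently of page layout in the scanned edition.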
Legal Context
The project operates within the framework established by French case law (TGI, Droz v. Classiques Garnier, 27 March 2014, confirmed on appeal on 9 June 2017), which holds that the scholarly transcription of ancient texts does not constitute an original creative work protected by copyright, because the transcriber’s choices are governed by scientific method rather than personal expression.
Funding
The Corpus Liberatum Linguae Graecae project is carried out within the PIQ project and is supported by the French National Research Agency (ANR, Agence Nationale de la Recherche) under France 2030, grant reference ANR-24-RRII-0002, operated by the Inria Quadrant Program.
The longer-term infrastructure goal is Biblissima Textes, a new component of Biblissima that will serve CLLG texts and other open corpora for ancient and medieval languages via the Distributed Text Services (DTS) API.