Mark-up and Annotation in the Corpus of Historical English Law Reports (CHELAR): Potential for Historical Genre Analysis

  • Paula Rodríguez-Puente, Universidad de Oviedo
  • Cristina Blanco-García, Universidade de Santiago de Compostela
  • Iván Tamaredo, Universidade de Vigo


Adding annotation and mark-up to linguistic corpora has become a standard practice in corpus building over the past few decades as a way to facilitate data extraction and at the same time guarantee that new corpora are compatible with existing and future tools. The purpose of this article is twofold. First, we provide an overview of the main forms of annotation and mark-up available to the research community and how they have been applied to the Corpus of Historical English Law Reports 1535-1999 (CHELAR), a specialized corpus consisting of law reports or records of judicial decisions. Second, we give an account of preliminary research based on the annotated versions of CHELAR, which so far has been primarily aimed at identifying the distinctive linguistic characteristics of law reports, as well as at investigating how the language of law reports has evolved over a time span of almost five centuries. Our article illustrates the multiple advantages of applying a simple annotation schema to a corpus and how this can enhance the potential of a corpus for historical genre analysis.

Keywords: corpus annotation; corpus mark-up; law reports; TEI-XML; legal English

Author Biographies

Paula Rodríguez-Puente, Universidad de Oviedo
Paula Rodríguez-Puente is Assistant Professor (tenure track) of English language and linguistics at the University of Oviedo. Her research interests include English historical linguistics and corpus linguistics. She has published widely in international peer-reviewed journals and edited volumes for John Benjamins, Peter Lang and Cambridge Scholars. Her monograph on the history of phrasal verbs was published by Cambridge University Press.
Cristina Blanco-García, Universidade de Santiago de Compostela
Cristina Blanco-García holds an MA in English Language and Literature from the University of Santiago de Compostela. She is currently working on her PhD on ephemeral adverbial subordinators in the history of English, with a special focus on concessives, conditionals and causals. Her research interests also include linguistic variation and change, grammaticalization processes and corpus linguistics.
Iván Tamaredo, Universidade de Vigo
Iván Tamaredo, formerly a predoctoral researcher at the University of Santiago de Compostela, is currently a Lecturer in English at the University of Vigo. His research interests include varieties of English and linguistic complexity. He has presented papers at international conferences and published articles in peer-reviewed journals such as English World-Wide, English Language and Linguistics and ICAME Journal.

