Welcome / Bienvenue / Benvinguts / Bienvenidos
For information about my translation services, please visit the main site.

Corpus analysis tools

For professional translations, visit timtranslates.com.

I was first introduced to corpus analysis when taking the Spanish-English module at the UAB as part of my degree. It proved very useful, especially for medical texts, since up to 500 texts can be batch downloaded from Medline in txt format. I don’t want to go into more detail right now, but if you want to see how corpus analysis works, you can read an article of mine on the subject.

At the time we used WordSmith Tools, and at the cheap price of £50 (approx. €75) I quickly bought myself a copy. My copy was version 4.0, while at university we were using version 3.0. The new version had the advantage of not being limited to DOS file names (8 characters only), and it generally had a better layout and a few interesting new functions, in particular the WebGetter. Unfortunately it also became less stable, and certain functions stopped working, such as searching for words in context and bilingual text alignment.

This summer, while attending Mediterranean Editors and Translators, I came across a poster for a similar tool called AntConc. Since this is freeware, I downloaded it and quickly tried it out. Now that I’ve been given a lecturing job, I’m particularly interested in AntConc, since students are much more likely to use a tool if it’s free. I have not looked into it much, but it does seem to be more stable than WordSmith Tools 4.0. However, it is possibly not quite so user-friendly, as it does not calculate everything (concordance, collocates, etc.) in one go; instead, you have to run a search separately for each tool. So for the moment I’m sticking with WordSmith, but I’ll definitely show my students AntConc as well, and encourage them to download it at home.

I have to say that this is an area that has not been exploited to its full potential. A program like this can’t be particularly hard to make, and if only someone could come up with a really excellent corpus analysis program, I’m sure it would be really successful. Or has someone already come up with one, and I’ve just not found it yet? Please let me know if you know a better tool.
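As a rough illustration of what the core of such a program does, here is a minimal Python sketch of the three functions mentioned above: a word list, a concordance (words in context) and collocates. It is not how WordSmith Tools or AntConc work internally; the function names, the simple `\w+` tokenizer and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: a "word" is any run of letters, digits or underscores.
TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())


def word_list(tokens):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def concordance(tokens, pattern, width=3):
    """Return (left context, keyword, right context) for each token
    that fully matches the regular expression `pattern`."""
    regex = re.compile(pattern)
    lines = []
    for i, tok in enumerate(tokens):
        if regex.fullmatch(tok):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, pattern, span=2):
    """Count the words found within `span` tokens of each match."""
    regex = re.compile(pattern)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if regex.fullmatch(tok):
            counts.update(tokens[max(0, i - span):i])
            counts.update(tokens[i + 1:i + 1 + span])
    return counts


# Hypothetical sample text, standing in for a downloaded corpus.
sample = ("The patient was treated with aspirin. The treatment was "
          "effective, and the patient recovered.")
tokens = tokenize(sample)

if __name__ == "__main__":
    print(word_list(tokens)[:3])
    for left, key, right in concordance(tokens, r"treat\w*", width=2):
        print(f"{left:>20} [{key}] {right}")
    print(collocates(tokens, r"patient", span=1))
```

Tokenizing once and passing the same token list to all three functions means the results could be produced in one go rather than by a separate search per tool. A real corpus tool would also need to handle file loading, encodings, sorting of concordance lines and much larger texts.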


5 thoughts on “Corpus analysis tools”

  1. I am developing an open-source version of WordSmith Tools 4. It is called Tenka Text. The program is still at an early stage of development, but its WordLister is already stable and performs 3x faster than WordSmith Tools 4. It also has a Concordancer, which supports both WordSmith Tools-like wildcards and the more advanced regular expressions.

    Please also check the development blog of Tenka Text at
    https://tenkatext.blogspot.com/

    I am looking forward to reading your comments.

    By the way, I had an introductory course to corpus linguistics and was also disappointed by the negative answer of my lecturer to my request that students get copies of the course material for free. I aim to render WordSmith Tools 4 and any possible upcoming versions unnecessary.

  2. Good to see someone just land on my blog, without me asking them to come. Sounds interesting. I’ll have to get round to trying it out.

  3. Hi Tim,

    I’m replying to this blog post very late, but I had never visited the blog before and have only done so now because of your comment about Renfe on the ATD.
    As for text analysis tools, you’ll find quite a few explained here:

    http://liceu.uab.es/~joaquim/language_resources/lang_res/Herram_TecnTex.html

    I don’t know which ones are freeware and which aren’t, but it shouldn’t be too hard to find out. The page by Llisterri, a colleague from the Department of Spanish Philology at the UAB, is a goldmine of useful links.

    You can also explore starting from here:
    (link is no longer accessible)

    Hope it helps!

    Laia
