Speech Recognition and Text Processing for Language Learning: Improved Pronunciation, Grammar, and Vocabulary

Research areas



Motivation: Good language skills, both oral and written, are important for communicating efficiently and avoiding misunderstandings. Improving these skills is particularly hard for non-native speakers, such as professionals working in their second or third language, and for children learning to write.

Approach: Electronic environments, such as online eLearning games and apps, equipped with speech recognition and language proofing can provide individualised feedback on, for example, how to spell “prospicience”, why “determine” is pronounced /dɪˈtəːmɪn/ but “mine” /mʌɪn/, how to translate the meaning of a literal saying such as “written using a font size as big as a cat” [kirjoittaa kissan kokoisin kirjaimin, FIN], or why we have a quick rather than a fast shower. When combined with active machine learning, they can assign the exercises that best serve each learner’s personal learning and encouragement needs. This makes the training experience more engaging and fun.


The project methodology draws on Artificial Intelligence (AI), Computational Linguistics, Computer Games, Edutainment, eLearning, Learning Apps, Machine Learning (ML), Natural Language Processing (NLP), Software Design, Speech Recognition, and User-Computer Interface, either by studying a small set of methods in depth or a larger set in combination. Research on state-of-the-art deep, transfer, and active learning methods is called for to minimise the amount of annotated data needed for method setup whilst maximising processing correctness and system adaptability. Multi-modal aspects of combining linguistic, musical, and visual content can also be considered. The project is truly interdisciplinary and tightly connected to authentic data, real-life applications, and business internships. Experimental and theoretical work, in other words applied and fundamental research, go hand in hand, with the emphasis depending on the student’s individual interests and expertise.


This project will appeal to students with excellent skills in experimentation, programming, and teamwork. Preference is given to students who have finished or are taking the units of Artificial Intelligence, Document Analysis, and/or Machine Learning at The ANU, or similar units elsewhere.

Background Literature

See, for example, the following recent paper: Suominen H, Zhou L, Hanlen L, Ferraro G. Benchmarking clinical speech recognition and information extraction: New data, methods and evaluations. JMIR Medical Informatics 2015; 3(2): e19. http://medinform.jmir.org/2015/2/e19/


This student project is part of the activities of the NLP Team within the ML Group at The Australian National University (ANU) and Data61 in Canberra, the capital of Australia. The OECD Regional Well-Being Report 2014 rated Canberra as the most livable city in the world.

In 2014, the ML Group was ranked among the top five in the world in ML, the others being Microsoft Research, Max Planck Institute Tübingen, University of California, Berkeley, and University of Cambridge. According to the QS World University Rankings for 2015-16, The ANU ranks within the top 20 universities globally, with an overall score of 91.0 out of 100.0 (19th), whilst the next best Australian university scored 83.1 (42nd). For the field of research (FOR) code of Artificial Intelligence and Image Processing, applicable to ML and NLP, under Information and Computer Sciences, The ANU obtained the top score of 5 out of 5 in the Excellence in Research for Australia (ERA) evaluations in both 2010 and 2012.

The NLP Team is experienced in developing powerful low-cost techniques to convert free-form text into structured representations. Our deep and transfer ML methods can use fewer than a hundred expert-annotated sentences to achieve performance comparable to state-of-the-art systems initialised with ten times more data. Similarly, our language processing methods have been among the best in the ALTA, CLEF, and TREC shared tasks on automated understanding, use, summarisation, and translation in difficult genres such as the “Doctors’ Latin” of electronic health records and the “Lawyers’ French” of patents.


Artificial Intelligence, Computational Linguistics, Computer Games, Edutainment, eLearning, Learning Apps, Machine Learning, Natural Language Processing, Software Design, Speech Recognition, User-Computer Interface


Updated:  1 June 2019/Responsible Officer:  Head of School/Page Contact:  CECS Marketing