Classical and modern Arabic corpora
Genre and language change
Our Artificial Intelligence research group at the University of Leeds has collected, analysed and annotated Classical
Arabic corpus resources: the Quranic Arabic Corpus with several layers of linguistic annotation; the
QurAna Quran pronoun anaphoric co-reference corpus; the QurSim Quran verse similarity corpus; the Qurany Quran corpus
annotated with English translations and verse topics; the Boundary-Annotated Quran Corpus; the
Quran Question and Answer Corpus; the Multilingual Hadith Corpus; the
King Saud University Corpus of Classical Arabic; and the Corpus for teaching about Islam. We have
also developed Modern Arabic corpus resources spanning several genres and language types: Arabic By
Computer; the Corpus of Contemporary Arabic; the Arabic Internet
Corpus; the World Wide Arabic Corpus; the Arabic Discourse Treebank; the
Arabic Learner Corpus; the Arabic Children’s Corpus; and the Arabic
Dialect Text Corpus. These corpus resources have informed research in Arabic corpus linguistics and Artificial
Intelligence, as well as the development of Arabic text analytics tools.
Article outline
- 1. Classical Arabic corpora for religious education and understanding
- 1.1 Quranic Arabic Corpus
- 1.2 QurAna: Quran pronoun anaphoric co-reference corpus
- 1.3 QurSim: Quran verse similarity corpus
- 1.4 Qurany: Classical Arabic Quran with English translations and verse topics
- 1.5 Boundary-Annotated Quran Corpus
- 1.6 Quran Question and Answer Corpus
- 1.7 Multilingual Hadith Corpus
- 1.8 KSUCCA: King Saud University Corpus of Classical Arabic
- 1.9 Corpus for teaching about Islam
- 2. Modern Arabic corpora for language teaching, lexicography, and text analytics
- 2.1 ABC: Arabic By Computer
- 2.2 CCA: Corpus of Contemporary Arabic
- 2.3 Arabic Internet Corpus
- 2.4 World Wide Arabic Corpus
- 2.5 Arabic Discourse Treebank
- 2.6 Arabic Learner Corpus
- 2.7 Arabic Children’s Corpus
- 2.8 Arabic Dialect Text Corpus
- 3. Machine learning from the Quran for Modern Arabic text analytics
- References
References
Abbas, Noorhan Hassan
2009 Quran Ssearch for a concept tool and website. MRes thesis, University of Leeds, UK.
Abbas, Noorhan Hassan & Atwell, Eric
2013 Annotating the Arabic Quran with a classical semantic ontology. Proceedings of
WACL2 Second Workshop on Arabic Corpus Linguistics
. Lancaster, UK.
Abbas, Noorhan Hassan, Aldhubayi, Luluh, Al-Khalifa, Hend, Alqassem, Zainab, Atwell, Eric, Dukes, Kais, Sawalha, Majdi & Sharaf, Muhammad
2013 Unifying linguistic annotations and ontologies for the Arabic Quran. Proceedings of
WACL2 Second Workshop on Arabic Corpus Linguistics
. Lancaster, UK.
Abdelhamid, Yasser, Mahmoud, Mostafa & El-Sakka, Tarek M.
2013 Using ontology for associating Web multimedia resources with the Holy Quran. Proceedings of
Advances in Information Technology for the Holy Quran and Its Sciences
. Medina, Saudi Arabia.

Abdul Razak, Zainur
2011 Modern media Arabic: A study of word frequency in world affairs and sports sections in Arabic
newspapers. PhD thesis, University of Birmingham, UK.
Abed, Qusay Abdullah
2015 Ontology-based Approach for Retrieving Knowledge in Al-Quran. PhD thesis, Universiti Utara Malaysia.
Abushariah, Mohammad, Ainon, Raja, Zainuddin, Roziati, Elshafei, Moustafa & Khalifa, Othman Omran
2012 Arabic speaker-independent continuous automatic speech recognition based on a phonetically rich and
balanced speech corpus.
International Arab Journal of Information Technology 9(1): 84–93.

Abu Shawar, Bayan & Atwell, Eric
2004 An Arabic chatbot giving answers from the Quran. Proceedings of
TALN’2004 Traitement Automatique des Langues Naturelles
. Fez, Morocco.
Abu Shawar, Bayan & Atwell, Eric
Abu Shawar, Bayan & Atwell, Eric
2005b A chatbot system as a tool to animate a corpus.
ICAME Journal: International Computer Archive of Modern and Medieval English Journal 29: 5–24.

Abu Shawar, Bayan & Atwell, Eric
2007 Chatbots: Sind Sie wirklich nützlich? [Chatbots: Are they really useful?].
Journal for Computational Linguistics and Language Technology 22: 31–50.

Abu Shawar, Bayan & Atwell, Eric
2009 Arabic question-answering via instance based learning from an FAQ corpus. Proceedings of
CL2009 Corpus Linguistics
. Liverpool, UK.
Abu Shawar, Bayan & Atwell, Eric
2015 ALICE chatbot: Trials and outputs.
Computacion y Sistemas 19(4): 625–632.

Abu Shawar, Bayan & Atwell, Eric
2016 Usefulness, localizability, humanness, and language-benefit: Additional evaluation criteria for
natural language dialogue systems.
International Journal of Speech Technology 19(2): 373–383.


Affeich, Andree
2011 La métaphore dans le discours technique d’Internet et son passage de l’anglais vers
l’arabe. Proceedings of
JéTou’2011 Journées d’études Toulousaines
. Toulouse, France.
Ahmad, Nor Diana, Bennett, Brandon & Atwell, Eric
2017 Retrieval performance for Malay Quran.
International Journal on Islamic Applications in Computer Science and Technology 5(2): 13–25.

Ahmed, Saad, Hina, Saman, Atwell, Eric & Ahmed, Farrakh
2017 Aspect based sentiment analysis framework using data from social media network.
International Journal of Computer Science and Network Security 17(7): 100–105

Alasmari, Jawharah, Watson, Janet C. E. & Atwell, Eric
2016 A comparative analysis of the Arabic and English verb systems using a Quranic Arabic
Corpus. Proceedings of
IMAN’2016 Islamic Applications in Computer Science and Technologies
. Khartoum, Sudan.
Alasmari, Jawharah, Watson, Janet C. E. & Atwell, Eric
2017 Using the Quranic Arabic Corpus for comparative analysis of the Arabic and English verb
systems.
International Journal on Islamic Applications in Computer Science And Technology 5(3): 1–8.

Alfaifi, Abdullah & Atwell, Eric
2012 Arabic Learner Corpora (ALC): A taxonomy of coding errors. Proceedings of
ICCA’2012 International Computing Conference in Arabic
. Cairo, Egypt.
Alfaifi, Abdullah & Atwell, Eric
2013a Arabic Learner Corpus v1: A new resource for Arabic language research. Proceedings of
WACL’2 Second Workshop on Arabic Corpus Linguistics
. Lancaster, UK.
Alfaifi, Abdullah & Atwell, Eric
2013b Arabic Learner Corpus: Texts transcription and files format. Proceedings of
CORPORA’2013 International Conference on Corpus Linguistics
. St Petersburg, Russia.
Alfaifi, Abdullah, Atwell, Eric & Abuhakema, Ghazi
2013 Error Annotation of the Arabic Learner Corpus: A new error tagset. Proceedings of
GSCL’2013 German Society for Computational Linguistics: Language Processing and Knowledge
in the Web
. Darmstadt, Germany.

Alfaifi, Abdullah & Atwell, Eric
2014a Tools for searching and analysing Arabic corpora: An evaluation study.
Proceedings BAAL-CUP’2014 British Association for Applied Linguistics and Cambridge
University Press Applied Linguistics Conference
. Leeds, UK.
Alfaifi, Abdullah & Atwell, Eric
2014b An evaluation of the Arabic error tagset v2. Proceedings of
AACL’2014 American Association for Corpus Linguistics
. Flagstaff, USA.
Alfaifi, Abdullah & Atwell, Eric
2015 Computer-aided error annotation: A new tool for annotating Arabic error. Proceedings of
UK Saudi Students Conference
. London, UK.
Alfaifi, Abdullah & Atwell, Eric
2016 Comparative evaluation of tools for Arabic corpora search and analysis.
International Journal of Speech Technology 19(2): 347–357.


Alghamdi, Ayman, Atwell, Eric & Brierley, Claire
2016 An empirical study of Arabic formulaic sequence extraction methods. Proceedings of
LREC’2016 Language Resources and Evaluation Conference
. Portorož, Slovenia.
Alghamdi, Ayman & Atwell, Eric
2017 Towards comprehensive computational representations of Arabic multi-word
expressions. Proceedings of
EUROPHRAS’2017 European Conference on Computational and Corpus-Based Phraseology
. London, UK.
Al-Haidari, Fahd, Gutub, Adnan, Al-Kahsah, Khalid & Hamodi, Jameel
2009 Improving security and capacity for Arabic text steganography using Kashida
extensions. Proceedings of
CSA’2009 Computer Systems and Applications
. Jeju, Korea.

Ali, Imran
2012 Application of a mining algorithm to finding frequent patterns in a text corpus: A case study of
Arabic.
International Journal of Software Engineering and Its Applications 6(3): 127–134.

Al-Khalifa, Hend, Al-Yahya, Maha, Bahanshal, Alia, Al-Odah, Iman & Al-Helwah, Nawal
2010 An approach to compare two ontological models for representing Quranic words. Proceedings of the
12th International Conference on Information Integration and Web-based Applications and
Services
. Paris, France.

Almaayah, Manal, Sawalha, Mohammad A. & Abushariah, Majdi
2014 A proposed model for Quranic Arabic WordNet. Proceedings of
LRE-REL2 2nd workshop on Language Resources and Evaluation for Religious
Texts. Reykjavik, Iceland.
Almaayah, Manal, Sawalha, Mohammad A. & Abushariah, Majdi
2016 Towards an automatic extraction of synonyms for Quranic Arabic WordNet.
International Journal of Speech Technology 19(2): 177–189.


Alosaimy, Abdulrahman & Atwell, Eric
2017a Joint alignment of segmentation and labelling for Arabic morphosyntactic taggers.
International Journal of Computational Linguistics 8(2): 45–58.

Alosaimy, Abdulrahman & Atwell, Eric
2017b Tagging classical Arabic text using available morphological analysers and part of speech
taggers.
Journal for Language Technology and Computational Linguistics 32(1): 1–26.

Alosaimy, Abdulrahman & Atwell, Eric
2018a Diacritization of a highly cited text: A classical Arabic book as a case. Proceedings of
ASAR’2018 Arabic Script Analysis and Recognition
. London, UK.
Alosaimy, Abdulrahman & Atwell, Eric
2018b Web-based annotation tool for inflectional language resources. Proceedings of
LREC 2018 Language Resources and Evaluation Conference
. Miyazaki, Japan.
Alqahtani, Mohammad & Atwell, Eric
2016 Arabic Quranic search tool based on ontology. Proceedings of
NLDB’2016 Natural Language and Information Systems
. Salford, UK.

Alqahtani, Mohammad & Atwell, Eric
2017 Evaluation criteria for computational Quran search.
International Journal on Islamic Applications in Computer Science and Technology 5(1): 12–22.

Alqahtani, Mohammad & Atwell, Eric
2018 Developing bilingual Arabic-English ontologies of Al-Quran. Proceedings of
ASAR’2018 Arabic Script Analysis and Recognition
. London, UK.

Alqassem, Zainab
2013 Unifying Quranic analyses into a single database. BSc Research Project Report, School of Computing, University of Leeds, UK.
Alqurneh, Ahmed, Mustapha, Aida, Murad, Masrah & Sharef, Nurfahdlina
2014 Stylometric model for detecting oath expressions: A case study for Quranic texts.
Literary and Linguistic Computing Journal 31(1): 1–20.

Alrabiah, Maha, Al-Salman, AbdulMalik & Atwell, Eric
2013 The design and construction of the 50 million words KSUCCA King Saud University Corpus of Classical
Arabic. Proceedings of
WACL’2 Second Workshop on Arabic Corpus Linguistics
. Lancaster, UK.
Alrabiah, Maha, Al-Salman, AbdulMalik, Atwell, Eric & Alhelewh, Nawal
2014a KSUCCA: A key to exploring Arabic historical linguistics.
International Journal of Computational Linguistics 5: 27–36.

Alrabiah, Maha, Alhelewh, Nawal, Al-Salman, AbdulMalik & Atwell, Eric
2014b An empirical study on the holy Quran based on a large classical Arabic corpus.
International Journal of Computational Linguistics 5: 1–13.

Alrehaili, Sameer & Atwell, Eric
2013 Linguistics features to confirm the chronological order of the Quran. Proceedings of
WACL’2 Second Workshop on Arabic Corpus Linguistics
. Lancaster, UK.
Alrehaili, Sameer & Atwell, Eric
2014 Computational ontologies for semantic tagging of the Quran. Proceedings of
LRE-Rel2 2nd Workshop on Language Resource and Evaluation for Religious Texts
. Reykjavik, Iceland.
Alrehaili, Sameer & Atwell, Eric
2016 A hybrid-based term extraction method on the Arabic text of the Quran. Proceedings of
IMAN’2016 Islamic Applications in Computer Science and Technologies
. Khartoum, Sudan.
Alrehaili, Sameer & Atwell, Eric
2017 Extraction of multi-word terms and complex terms from the classical Arabic text of the
Quran.
International Journal on Islamic Applications in Computer Science and Technology 5(3): 15–27.

Alrehaili, Sameer & Atwell, Eric
2018 Discovering Qur’anic knowledge through AQD: Arabic Qur’anic Database, a multiple resources
annotation-level search. Proceedings of
ASAR’2018 Arabic Script Analysis and Recognition
. London, UK.
Alrehaili, Sameer, Alqahtani, Mohamad & Atwell, Eric
2018 A hybrid method of aligning Arabic Qur’anic semantic resources. Proceedings of
ASAR’2018 Arabic Script Analysis and Recognition
. London, UK.

Alromima, Waseem, Elgohary, Rania, Moawad, Ibrahim F. & Aref, Mostafa
2015 Applying ontological engineering approach for Arabic Quran corpus: A comprehensive
survey. Proceedings of
ICICIS’2015 International Conference on Intelligent Computing and Information
Systems
. Cairo, Egypt.
Alruily, Meshrif
2012 Using text mining to identify crime patterns from Arabic Crime News Report Corpus. PhD dissertation, De Montford University, UK.
Al-Saif, Amal & Markert, Katja
2010 The Leeds Arabic Discourse Treebank: Annotating discourse connectives for Arabic. Proceedings of
LREC’2010: Language Resources and Evaluation Conference
. Valletta, Malta.
Al-Saleh, Asma Bader & Menai, Mohammad El Bachir
2016 Automatic Arabic text summarization: A survey.
Artificial Intelligence Review 45(2): 203–234.


Alshutayri, Areej, Atwell, Eric, Alosaimy, Abdulrahman, Dickins, James, Ingleby, Michael & Watson, Janet
2016 Arabic language WEKA-based dialect classifier for Arabic automatic speech recognition
transcripts. Proceedings of
VarDial’2016 Third Workshop on NLP for Similar Languages, Varieties and Dialects
. Osaka, Japan.
Alshutayri, Areej & Atwell, Eric
2017 Exploring twitter as a source of an Arabic dialect corpus.
International Journal of Computational Linguistics 8(2): 37–44.

Alshutayri, Areej & Atwell, Eric
2018a Creating an Arabic dialect text corpus by exploring Twitter, Facebook, and online
newspapers. Proceedings of
OSACT’2018 Open-Source Arabic Corpora and Processing Tools
. Miyazaki, Japan.
Alshutayri, Areej & Atwell, Eric
2018b A social media corpus of Arabic dialect text. In
Computer-Mediated Communication and Social Media Corpora,
Ciara R. Wigham &
Egon Stemle (eds). Clermont-Ferrand: Presses Universitaires Blaise Pascal.

Al-Sulaiti, Latifa & Atwell, Eric
2005 Extending the Corpus of Contemporary Arabic. Proceedings of
CL’2005 Corpus Linguistics
. Birmingham, UK.
Al-Sulaiti, Latifa, Roberts, Andrew & Atwell, Eric
2005 The use of corpora and concordance in the teaching of contemporary Arabic. Proceedings of
EuroCALL’2005 European conference on Computer Assisted Language Learning
. Krakow, Poland.
Al-Sulaiti, Latifa & Atwell, Eric
Al-Sulaiti, Latifa, Roberts, Andrew, Abu Shawar, Bayan & Atwell, Eric
2007 The use of corpus, concordancer and chatbot in the teaching of contemporary Arabic. Proceedings of
CL’2007 Corpus Linguistics
. Birmingham, UK.
Al-Sulaiti, Latifa, Abbas, Noorhan, Brierley, Claire, Atwell, Eric & Alghamdi, Ayman
2016 Compilation of an Arabic Children’s Corpus. Proceedings of
LREC’2016 Language Resources and Evaluation Conference
. Portorož, Slovenia.
Altoum, S. & Atwell, Eric
2016 Compilation of an Islamic Hadith Corpus (تجمع مدونة الحديث النبوي الشريف). Proceedings of
ICCA’2016 International Conference on Computing in Arabic
. Khartoum, Sudan.
Aly, Walid Mohamed & Kelleny, Hany Atef
2014 Adaptation of cuckoo search for documents clustering.
International Journal of Computer Applications 86(1): 4–10.


Attia, Mohammed, Pecina, Pavel, Tounsi, Lamia, Toral, Antonio & Van Genabith, Josef
2011 Lexical profiling for Arabic. Proceedings of
eLex’2011 Electronic Lexicography in the 21st Century
. Bled, Slovenia.
Atwell, Eric
1982 LOB Corpus Tagging Project: Manual Postedit Handbook. University of Lancaster, UK.

Atwell, Eric
1987a A parsing expert system which learns from corpus analysis. In
Corpus. Linguistics and Beyond: Proceedings of the ICAME 7th International Conference on English Language.
Research on Computerised Corpora,
Willem Meijs (ed.), 227–235. Amsterdam: Rodopi,

Atwell, Eric
1987b How to detect grammatical errors in a text without parsing it. Proceedings of
EACL’1987 Third Conference of the European Chapter of the Association for Computational
Linguistics
. Copenhagen, Denmark.

Atwell, Eric & Drakos, Nicos
1987 Pattern recognition applied to the acquisition of a classification system from unrestricted English
text. Proceedings of
EACL’1987 Third Conference of the European Chapter of the Association for Computational
Linguistics
. Copenhagen, Denmark.

Atwell, Eric
1993a The HEFC’s knowledge based systems initiative.
Artificial Intelligence and Simulation of Behaviour Quarterly 83: 29–34.

Atwell, Eric
(ed.) 1993b Knowledge at Work in Universities - Proceedings of the second annual conference of the Higher Education
Funding Council’s Knowledge Based Systems Initiative. Leeds: Leeds University Press.

Atwell, Eric
1996 Machine learning from corpus resources for speech and handwriting recognition. In
Using Corpora for Language Research: Studies in Honour of Geoffrey Leech,
Jenny Thomas &
Mick Short (eds), 151–166. London: Longman.

Atwell, Eric
1999 The Language Machine. London: The British Council.

Atwell, Eric, Howarth, Peter, Souter, Clive, Baldo, Patrizio, Bisiani, Roberto, Bonaventura, Patrizia, Menzel, Wolfgang, Herron, Daniel, Morton, Rachel & Wick, Juergen
2000 User-guided system development in Interactive Spoken Language Education.
Natural Language Engineering Journal 6(3-4): 229–241.


Atwell, Eric, Al-Sulaiti, Latifa, Al-Osaimi, Saleh & Abu Shawar, Bayan
2004 Un examen d’outils pour l’analyse de corpus arabes: A review of Arabic corpus analysis
tools. Proceedings of
TALN‘2004 Traitement Automatique des Langues Naturelles
. Fez, Morocco.
Atwell, Eric
2005 Sleeping with the enemy: Infiltrating AI into the broader curriculum. Proceedings of
1st UK Workshop on Artificial Intelligence in Education
. Cambridge, UK.
Atwell, Eric, Arshad, Junaid, Lai, Chien-Ming, Nim, Lan, Rezapour Asheghi, Noushin, Wang, Josiah & Washtell, Justin
2007 Which English dominates the World Wide Web, British or American? Proceedings of
CL’2007 Corpus Linguistics
. Birmingham, UK.
Atwell, Eric
2008 Development of tag sets for part-of-speech tagging. In
Corpus Linguistics: An International Handbook,
Anke Lüdeling &
Merja Kytö (eds), 501–526. Berlin: Mouton de Gruyter.

Atwell, Eric, Abbas, Noorhan, Abu Shawar, Bayan, Alsaif, Amal, Al-Sulaiti, Latifa, Roberts, Andrew & Sawalha, Majdi
2008 Mapping Middle Eastern and North African diasporas. Proceedings of
BRISMES’2008 British Society for Middle Eastern Studies
. Leeds, UK.
Atwell, Eric, Al-Sulaiti, Latifa & Sharoff, Serge
2009 Arabic and Arab English in the Arab world. Proceedings of
CL2009 Corpus Linguistics
. Liverpool, UK.
Atwell, Eric, Dukes, Kais, Sharaf, Abdul Baquee, Habash, Nizar, Louw, Bill, Abu Shawar, Bayan, McEnery, Tony, Zaghouani, Wajdi & El-Haj, Mahmoud
2010 Understanding the Quran: A new Grand Challenge for Computer Science and Artificial
Intelligence. Proceedings of
GCCR’2010 Grand Challenges in Computing Research
. Edinburgh, Scotland, UK.
Atwell, Eric
2011 Exploiting new technology and innovation for detecting terrorist activities.
Counter Terror Expo
. London, UK.
Atwell, Eric, Brierley, Claire, Dukes, Kais, Sawalha, Majdi & Sharaf, Abdul Baquee
2011 An artificial intelligence approach to Arabic and Islamic content on the Internet.
Proceedings of NITS’2011 National Information Technology Symposium
. Riyadh, Saudi Arabia.
Atwell, Eric & Alfaifi, Abdullah
2015 أبحاث جامعة ليدز في مجال لسانيات المدونات العربية (Arabic corpus linguistics research at the University of Leeds). In
Arabic Language and Computing,
Ysi Elarian (ed.). Riyadh: King Abdullah bin Abdulaziz International Center for Arabic Language Service.

Atwell, Eric
2018 Using the Web to model Modern and Quranic Arabic. In
Arabic Corpus Linguistics,
Tony McEnery,
Adrew Hardie &
Younis Nagwa Ibrahim Abdel-Fattah (eds). Edinburgh: Edinburgh University Press.

Bakari, Wided, Bellot, Patrice & Neji, Mahmoud
2015 Literature review of Arabic question-answering: Modeling, generation, experimentation and
performance analysis. Proceedings of
FQAS’2015 Flexible Query Answering Systems. Krakow, Poland.
Bannister, Andrew G.
2014 An Oral-Formulaic Study of the Quran. Lanham MD: Lexington Books.

Baqai, Sumayya, Basharat, Amna, Khalid, Hira, Hassan, Amna & Zafar, Shehneela
2009 Leveraging semantic web technologies for standardized knowledge modeling and retrieval from the
Holy Qur’an and religious texts. Proceedings of the
7th International Conference on Frontiers of Information Technology
. Abbottabad, Pakistan.

Baroni, Maco & Bernardini, Silvia
2004 BootCaT: Bootstrapping corpora and terms from the web. Proceedings of
LREC’2004 Language Resources and Evaluation Conference
. Lisbon, Portugal.
Basharat, Asma, Yasdansepas, D. & Rasheed, Khaled
2015 Comparative study of verse similarity for multi-lingual representations of the
Quran. Proceedings of
ICAI’2015 International Conference on Artificial Intelligence
. Las Vegas, USA.
Bentrcia, Rahima, Zidat, Samir & Marir, Farhi
2017 Extracting semantic relations from the Quranic Arabic based on Arabic conjunctive
patterns.
Journal of King Saud University Computer and Information Sciences.

Bijankhan, Mahmood, Sheykhzadegan, Javad, Bahrani, Mohammad & Ghayoomi, Masood
2011 Lessons from building a Persian written corpus: Peykare.
Language Resources and Evaluation Journal 45(2): 143–164.


Brierley, Claire, Sawalha, Majdi & Atwell, Eric
2012a Open-source boundary-annotated corpus for Arabic speech and language processing. Proceedings of
LREC’2012 Language Resources and Evaluation Conference
. Istanbul, Turkey.
Brierley, Claire, Sawalha, Majdi & Atwell, Eric
2012b Boundary Annotated Quran Corpus for Arabic phrase break prediction. Proceedings of
IVACS’2012 Inter-Varietal Applied Corpus Studies
. Cambridge, UK.
Brierley, Claire, Sawalha, Majdi & Atwell, Eric
2012c Visualisation of prosody in English and Arabic speech corpora. Proceedings of
AVML’2012 Advances in Visual Methods for Linguistics
. York, UK.
Brierley, Claire, Atwell, Eric, Rowland, Chris & Anderson, John
2013 Semantic pathways: A novel visualization of varieties of English.
ICAME Journal of the International Computer Archive of Modern and medieval English 37: 5–36.

Brierley, Claire, Sawalha, Majdi & Atwell, Eric
2014 Tools for Arabic Natural Language Processing: A case study in qalqalah prosody. Proceedings of
LREC’2014 Language Resources and Evaluation Conference
. Reykjavik, Iceland.
Brierley, Claire, Sawalha, Majdi, Heselwood, Barry & Atwell, Eric
2016 A verified Arabic-IPA mapping for Arabic transcription technology, informed by Quranic recitation,
traditional Arabic linguistics, and modern phonetics.
Journal of Semitic Studies 61(1): 157–186.


Brockett, Adrian, Atwell, Eric, Taylor, Owen & Page, Matthew
1989 An Arabic text database and glossary system for students. Proceedings of the
Seminar on Bilingual Computing in Arabic and English
. Cambridge, UK.
Chelli, Assem
2012 Advanced Search/Indexing in Holy Quran. Magister Thesis, National Higher School of Computer Science, Algeria.
Clivaz, Claire
2013 Digital religion out of the book: The loss of the illusion of the ‘original text’ and the notion of
a ‘religion of a book’.
Scripta Journal 25: 26–41.


Dukes, Kais & Habash, Nizar
2010 Morphological annotation of Quranic Arabic. Proceedings of
LREC’2010 Language Resources and Evaluation Conference
. Valletta, Malta.
Dukes, Kais & Buckwalter, Tim
2010 A dependency treebank of the Quran using traditional Arabic grammar. Proceedings of
INFOS’2010 7th Informatics and Systems conference
. Cairo, Egypt.
Dukes, Kais, Atwell, Eric & Sharaf, Abdul Baquee
2010 Syntactic annotation guidelines for the Quranic Arabic dependency treebank. Proceedings of
LREC’2010 Language Resources and Evaluation Conference
. Valletta, Malta.
Dukes, Kais & Atwell, Eric
2012 LAMP: A multimodal web platform for collaborative linguistic analysis. Proceedings of
LREC’2012 Language Resources and Evaluation Conference
. Istanbul, Turkey.
Dukes, K., Atwell, Eric & Habash, Nizar
2013 Supervised collaboration for syntactic annotation of Quranic Arabic.
Language Resources and Evaluation Journal 47: 33–62.


El-Beltagy, Samhaa & Ali, Ahmed
2013 Open issues in the sentiment analysis of Arabic social media: A case study. Proceedings of
IIT’2013 Innovations in Information Technology Conference
. Abu Dhabi, United Arab Emirates.
El-Haj, Mahmoud, Kruschwitz, Udo & Fox, Chris
2015 Creating language resources for under-resourced languages: Methodologies, and experiments with
Arabic.
Language Resources and Evaluation Journal 49(3): 549–580.


El Hadj, Yahja Ould Mohamed, Al-Sughayeir, Imad Abdulrahman & Al-Ansari, Abdullah Mahdi
2009 Arabic part-of-speech tagging using the sentence structure. Proceedings of the
Second International Conference on Arabic Language Resources and Tools
. Cairo, Egypt.
Erradi, Abdelkarim, Nahia, Sajeda, Almerekhi, Hind & Al-Kailani, Lubna
2012 ArabicTutor: A multimedia m-learning platform for learning Arabic spelling and
vocabulary. Proceedings of
ICMCS’2012 International Conference on Multimedia Computing and Systems
. Tangier, Morocco.
Friginal, Eric & Hardy, Jack A.
2014 Corpus-based Sociolinguistics: A Guide for Students. London: Routledge.

Froud, Hahane, Benslimane, R., Lachkar, Abdelmonaime & Ouatik, Said Alaoui
2010 Stemming and similarity measures for Arabic documents clustering.
Proceedings of ISVC’2010 5th International Symposium on
I/V Communications
. Rabat, Morocco.

Froud, Hahane, Lachkar, Abdelmonaime & Ouatik, Said Alaoui
2013 Arabic text summarization based on latent semantic analysis to enhance Arabic documents
clustering.
International Journal of Data Mining and Knowledge Management Process 3(1): 79–95.


Gehrels, Sjoerd
2016 Liquid hospitality: Wine as the metaphor. In
The Routledge Handbook of Hospitality Studies,
Conrad Lashley (ed.), 247–259. London: Routledge.

Haider, Ahmad S.
2016 A Corpus-assisted Critical Discourse Analysis of the Arab Uprisings: Evidence from the Libyan
Case. PhD dissertation, University of Canterbury, New Zealand.
Hakkoum, Aimad & Raghay, Said
2015a Ontological approach for semantic modeling and querying the Quran.
International Journal on Islamic Applications in Computer Science And Technology 4(1): 37–45.

Hakkoum, Aimad & Raghay, Said
2015b Advanced search in the Quran using semantic modeling. Proceedings of
AICCSA’2015 Arab International Conference on Computer Systems and Applications
. Marrakech, Morocco.
Hamdelsayed, Mohamed Adany & Atwell, Eric
2016a Islamic applications of automatic question-answering.
Journal of Engineering and Computer Science 17(2): 51–57.

Hamdelsayed, Mohamed Adany & Atwell, Eric
2016b Using Arabic numbers (singular, dual, and plurals) patterns to enhance question answering system
results. Proceedings of
IMAN’2016 Islamic Applications in Computer Science and Technologies
. Khartoum, Sudan.
Hamdelsayed, Mohamed Adany & Atwell, Eric
2017 Quran question answering system using Arabic number patterns (singular, dual,
plural).
International Journal on Islamic Applications in Computer Science and Technology 5(2): 1–12.

Hammo, Bassam, Yagi, Sane, Ismail, Omaima & Abushariah, Mohammad
2016 Exploring and exploiting a historical corpus for Arabic.
Language Resources and Evaluation Journal 50(4): 839–861.


Hamoud, Bothaina & Atwell, Eric
2016a Using an islamic question and answer knowledge base to answer questions about the Holy
Quran.
International Journal on Islamic Applications in Computer Science And Technology 4 (4): 20–29.

Hamoud, Bothaina & Atwell, Eric
2016b Quran question and answer corpus for data mining with WEKA. Proceedings of
IEEE Conference of Basic Sciences and Engineering Studies
. Khartoum, Sudan.

Hamoud, Bothaina & Atwell, Eric
2016c Compiling a Quran Question and Answer Corpus. تجميع مدونة اسئلة واجوبة للقرآن الكر. Proceedings of
ICCA’2016 International Conference on Computing in Arabic
. Khartoum, Sudan.
Hassan, Haslina, Daud, Nuraihan Mat & Atwell, Eric
2010 Connectives in the World Wide Arabic corpus. Proceedings of
IVACS’2010 Inter-Varietal Applied Corpus Studies
. Leeds, UK.
Hassan, Haslina, Daud, Nuraihan Mat & Atwell, Eric
2013 Connectives in the World Wide Web Arabic corpus.
World Applied Sciences Journal (Special Issue of Studies in Language Teaching and Learning) 21: 67–72.

Hassan, Samah & Atwell, Eric
2016a Concept search tool for multilingual Hadith corpus.
International Journal of Science and Research 5(4): 1326–1328.

Hassan, Samah & Atwell, Eric
2016b Design requirements for multilingual Hadith corpus.
International Journal of Science and Research 5(4): 494–498.

Hassan, Samah & Atwell, Eric
2016c Design and implementing of multilingual Hadith corpus.
International Journal of Recent Research in Social Sciences and Humanities 3(2): 100–104.

Herron, Daniel, Menzel, Wolfgang, Atwell, Eric, Bisiani, Roberto, Daneluzzi, Fabio, Morton, Rachel & Schmidt, Juergen A.
1999 Automatic localization and diagnosis of pronunciation errors for second-language learners of
English. Proceedings of
EUROSPEECH’1999 Sixth European Conference on Speech Communication and Technology
. Budapest, Hungary.
Hughes, John & Atwell, Eric
1994 The automated evaluation of inferred word classifications. Proceedings of
ECAI-1994 11th European Conference on Artificial Intelligence
. Amsterdam, The Netherlands.
Ibrahim, Eiman, Ataelfadiel, Mohammed & Atwell, Eric
2017 Provisions of Quran Tajweed ontology.
International Journal of Science and Research 6(8): 756–761.

Itani, Maher, Roast, Chris & Al-Khayatt, Samir
2017 Corpora for sentiment analysis of Arabic text in social media. Proceedings of
ICICS’2017 IEEE International Conference on Information and Communication
Systems
. Irbid, Jordan.

Jarrar, Mustafa, Habash, Nizar, Alrimawi, Faeq, Akra, Diyam & Zalmout, Nasser
2017 Curras: an annotated corpus for the Palestinian Arabic dialect.
Language Resources and Evaluation Journal 51(3): 745–775.


Jilani, Aisha
2013 Parallel Corpus Multi Stream Question Answering with Applications to the Quran. PhD dissertation, University of Huddersfield, UK.
Johansson, Stig, Atwell, Eric, Garside, Roger & Leech, Geoffrey
1986 The Tagged LOB Corpus - User Manual. Bergen: Norwegian Computing Centre for the Humanities.

Kadir, Rabiah A. & Yauri, Aliyu Rufai
2017 Automated semantic query formulation using machine learning approach.
Journal of Theoretical and Applied Information Technology 95(12): 2761–2775.

Khaliq, Bilal & Carroll, John
2013 Induction of root and pattern lexicon for unsupervised morphological analysis of
Arabic. Proceedings of
IJCNLP’2013 International Joint Conference on Natural Language Processing
. Nagoya, Japan.
Kilgarriff, Adam, Baisa, Vit, Bušta, Jan, Jakubíček, Miloš, Kovář, Vojtěch, Michelfeit, Jan, Rychlý, Pavel & Suchomel, Vit
2014a The Sketch Engine: Ten years on.
Lexicography Journal 1(1): 7–36.


Kilgarriff, Adam, Charalabopoulou, Frieda, Gavrilidou, Maria, Johannessen, Janne Bondi, Khalil, Saussan, Johansson, Sofie, Lew, Robert, Sharoff, Serge, Vadlapudi, Ravikiran & Volodina, Elena
2014b Corpus-based vocabulary lists for language learners for nine languages.
Language Resources and Evaluation Journal 48(1): 121–163.


Leech, Geoffrey, Garside, Roger & Atwell, Eric
1983a Recent developments in the use of computer corpora in English language research.
Transactions of the Philological Society 1983: 23–40.


Leech, Geoffrey, Garside, Roger & Atwell, Eric
1983b The automatic grammatical tagging of the LOB Corpus.
ICAME Journal: International Computer Archive of Modern and medieval English Journal 7: 13–33.

Mahmoud, Mostafa & Hassan, Iman
2013 Artificial intelligence techniques for extracting individual recitation of the Holy Quran from its combinations. Proceedings of Advances in Information Technology for the Holy Quran and Its Sciences. Medina, Saudi Arabia.

Makhambetov, Olzhas, Makazhanov, Aibek, Yessenbayev, Zhandos, Matkarimov, Bakhyt, Sabyrgaliyev, Islam & Sharafudinov, Anuar
2013 Assembling the Kazakh Language Corpus. Proceedings of EMNLP’2013 Empirical Methods in Natural Language Processing. Seattle, USA.
Malmasi, Shervin & Dras, Mark
2014 Arabic native language identification. Proceedings of EMNLP 2014 Empirical Methods in Natural Language Processing Workshop on Arabic Natural Language. Doha, Qatar.

Malmasi, Shervin & Dras, Mark
2015 Multilingual native language identification. Natural Language Engineering Journal 23(2): 163–215.


Merakchi, Khadidja & Rogers, Margaret
2013 The translation of culturally bound metaphors in the genre of popular science articles: A corpus-based case study from Scientific American translated into Arabic. Intercultural Pragmatics Journal 10(2): 341–372.

Menzel, Wolfgang, Atwell, Eric, Bonaventura, Patrizia, Herron, Daniel, Howarth, Peter, Morton, Rachel & Souter, Clive
2000 The ISLE corpus of non-native spoken English. Proceedings of LREC’2000 Language Resources and Evaluation Conference. Athens, Greece.
Mohammed, Mona Ali Mohammed & Omar, Nazlia
2011 Rule based shallow parser for Arabic language. Journal of Computer Science 7(10): 1505–1514.


Mohamed, Reham, Ragab, Maha, Abdelnasser, Heba, El-Makky, Nagwa & Torki, Marwan
2015 Al-Bayan: A knowledge-based system for Arabic answer selection. Proceedings of SemEval’2015 Workshop on Semantic Evaluation. Denver, USA.
Mohit, Behrang, Rozovskaya, Alla, Habash, Nizar, Zaghouani, Wajdi & Obeid, Ossama
2014 The first QALB shared task on automatic text correction for Arabic. Proceedings of the EMNLP 2014 Empirical Methods in Natural Language Processing Workshop on Arabic Natural Language. Doha, Qatar.

Muhammad, Abdul Baquee
2012 Annotation of conceptual co-reference and Text Mining the Quran. PhD dissertation, University of Leeds, UK.
Mukhtar, Tayyeba, Afzal, Hammad & Majeed, Awais
2012 Vocabulary of Quranic concepts: A semi-automatically created terminology of Holy Quran. Proceedings of INMIC’2012 International Multitopic Conference. Islamabad, Pakistan.
Onyenwe, Ikechukwu
2017 Developing methods and resources for automated processing of the African Language Igbo. PhD dissertation, University of Sheffield, UK.
Ouda, Karim
2015 QuranAnalysis: A semantic search and intelligence system for the Quran. MSc thesis, University of Leeds, UK.
Panju, Maysum H.
2014 Statistical Extraction and visualization of topics in the Quran Corpus. MMath thesis, University of Waterloo, Canada.
Rabiee, Hajder S.
2011 Adapting standard open-source resources to tagging a morphologically rich language: A case study with Arabic. Proceedings of RANLP’2011 Recent Advances in Natural Language Processing. Hissar, Bulgaria.
Roberts, Andrew, Al-Sulaiti, Latifa & Atwell, Eric
2005 aConCorde: towards a proper concordance of Arabic. Proceedings of CL’2005 Corpus Linguistics. Birmingham, UK.
Roberts, Andrew, Al-Sulaiti, Latifa & Atwell, Eric
2006 aConCorde: Towards an open-source, extendable concordancer for Arabic. Corpora Journal 1: 39–57.


Romli, Taj, Hassan, Abd Rauf & Mohamad, Hasnah
2016 Equivalent Malay-Arabic data corpus collection. European Journal of Language and Literature Studies 4(1): 65–73.


Saad, Motaz K. & Ashour, Wesam
2010 Arabic text classification using decision trees. Proceedings of CSIT’2010 12th international workshop on Computer Science and Information Technologies. Moscow and Saint-Petersburg, Russia.
Saad, Saidah, Salim, Naomie & Zainuddin, Suhaila
2011 An early stage of knowledge acquisition based on Quranic text. Proceedings of STAIR’2011 Semantic Technology and Information Retrieval. Putrajaya, Malaysia.

Saad, Saidah, Salim, Naomie & Zainal, Hakim
2013 Rules and natural language pattern in extracting Quranic knowledge. Proceedings of Advances in Information Technology for the Holy Quran and Its Sciences. Medina, Saudi Arabia.

Sawalha, Majdi & Atwell, Eric
2008 Comparative evaluation of Arabic language morphological analysers and stemmers. Proceedings of COLING’2008 Computational Linguistics. Manchester, UK.
Sawalha, Majdi & Atwell, Eric
2009 Linguistically informed and corpus informed morphological analysis of Arabic. Proceedings of CL’2009 Corpus Linguistics. Liverpool, UK.
Sawalha, Majdi & Atwell, Eric
2010a Fine-grain morphological analyzer and part-of-speech tagger for Arabic text. Proceedings of LREC’2010 Language Resources and Evaluation Conference. Valletta, Malta.
Sawalha, Majdi & Atwell, Eric
2010b Constructing and using broad-coverage lexical resource for enhancing morphological analysis of Arabic. Proceedings of LREC’2010 Language Resources and Evaluation Conference. Valletta, Malta.
Sawalha, Majdi & Atwell, Eric
2011 Morphological analysis of classical and modern standard Arabic. Proceedings of ICCA’2011 International Computing Conference in Arabic. Riyadh, Saudi Arabia.
Sawalha, Majdi & Atwell, Eric
2012 Visualization of Arabic morphology. Proceedings of AVML’2012 Advances in Visual Methods for Linguistics. York, UK.
Sawalha, Majdi, Brierley, Claire & Atwell, Eric
2012a Prosody prediction for Arabic via the open-source boundary-annotated Qur’an corpus. Journal of Speech Sciences 2: 175–191.

Sawalha, Majdi, Brierley, Claire & Atwell, Eric
2012b Automatic analysis of phrase-break prediction for Arabic التحليل الآلي للوقف والابتداء في نصوص اللغة العربية الحديثة والكلاسيكية. Proceedings of ICCA’2012 International Computing Conference in Arabic. Cairo, Egypt.
Sawalha, Majdi, Brierley, Claire & Atwell, Eric
2012c Predicting phrase breaks in classical and modern standard Arabic text. Proceedings of LREC’2012 Language Resources and Evaluation Conference. Istanbul, Turkey.
Sawalha, Majdi & Atwell, Eric
2013a Accelerating the processing of large corpora: using grid computing for lemmatizing the 176 million words Arabic Internet Corpus. Proceedings of WACL’2 2nd Workshop of Arabic Corpus Linguistics. Lancaster, UK.
Sawalha, Majdi & Atwell, Eric
2013b A standard tag set expounding traditional morphological features for Arabic language part-of-speech tagging. Word Structure Journal 6: 43–99.


Sawalha, Majdi & Atwell, Eric
2013c Comparing morphological tag-sets for Arabic and English. Proceedings of CL’2013 Corpus Linguistics. Lancaster, UK.
Sawalha, Majdi, Atwell, Eric & Abushariah, Mohammad
2013 SALMA: Standard Arabic Language Morphological Analysis. Proceedings of ICCSPA’2013 International Conference on Communications Signal Processing and Applications. Sharjah, United Arab Emirates.
Sawalha, Majdi, Brierley, Claire & Atwell, Eric
2014a Automatically generated phonemic Arabic-IPA pronunciation tiers for the boundary annotated Qur’an dataset for machine learning. Proceedings of LRE-Rel’2 2nd Workshop on Language Resource and Evaluation for Religious Text. Reykjavik, Iceland.
Sawalha, Majdi, Brierley, Claire, Atwell, Eric & Dickins, James
2014b Text analytics and transcription technology. Proceedings of IMAN’2014 Islamic Applications in Computer Science And Technology. Amman, Jordan.
Sawalha, Majdi, Brierley, Claire, Atwell, Eric & Dickins, James
2017 Text analytics and transcription technology for Quranic Arabic. International Journal on Islamic Applications in Computer Science and Technology 5(2): 45–51.

Seddik, Khadiga M., Farghaly, Ali & Fahmy, Aly Aly
2015 Arabic anaphora resolution: Corpus of the Holy Quran annotated with anaphoric information. International Journal of Computer Applications 124(15): 35–43.


Sharaf, Abdul Baquee & Atwell, Eric
2009 A corpus-based computational model for knowledge representation of the Quran. Proceedings of CL’2009 Corpus Linguistics. Liverpool, UK.
Sharaf, Abdul Baquee & Atwell, Eric
2012a QurAna: Corpus of the Quran annotated with pronominal anaphora. Proceedings of LREC’2012 Language Resources and Evaluation Conference. Istanbul, Turkey.
Sharaf, Abdul Baquee & Atwell, Eric
2012b QurSim: A corpus for evaluation of relatedness in short texts. Proceedings of LREC’2012 Language Resources and Evaluation Conference. Istanbul, Turkey.
Shmeisania, Hashem, Tartir, Samir, Al-Nassaan, Ammar & Najid, Moath
2014 Semantically answering questions from the Holy Quran. Proceedings of IMAN’2014 Islamic Applications in Computer Science and Technology. Amman, Jordan.
Tabrizi, Arash Amini & Mahmud, Rohana
2013 Issues of coherence analysis on English translations of Quran. Proceedings of ICCSPA’2013 International Conference on Communications Signal Processing and Applications. Sharjah, United Arab Emirates.

Wiechmann, Daniel & Fuhs, Stefan
2006 Concordance software. Corpus Linguistics and Linguistic Theory Journal 2: 109–130.

Wood, Paul
2016 The pen and the sword: Reporting ISIS. Discussion paper, Shorenstein Center on Media, Politics and Public Policy.
Yusof, Raja, Zainuddin, Roziati, Baba, Mohd & Yusoff, Zulkifi
2010 Quranic words stemming. Arabian Journal for Science and Engineering 35(2): 37–49.

Zaghouani, Wajdi, Zerrouki, Taha & Balla, Amar
2015 SAHSOH@ QALB shared task: A rule-based correction method of common Arabic native and non-native speakers’ errors. Proceedings of ANLP’2015 Arabic Natural Language Processing Workshop. Beijing, China.
Zeroual, Imad & Lakhouaja, Abdelhak
2016 A new Quranic corpus rich in morphosyntactical information. International Journal of Speech Technology 19(2): 339–346.


Zouaghi, Anis, Merhbene, Laroussi & Zrigui, Mounir
2011 Word sense disambiguation for Arabic language using the variants of the Lesk algorithm. Proceedings of WORLDCOMP’2011 World Congress in Computer Science, Computer Engineering, and Applied Computing. Las Vegas, USA.