E-book: Similar Languages, Varieties, and Dialects: A Computational Perspective

Edited by Marcos Zampieri (Rochester Institute of Technology, New York) and Preslav Nakov
  • Format - EPUB+DRM
  • Price: 83,97 €*
  • * The price is final, i.e. no additional discount will apply.
  • This ebook is for personal use only. E-Books are non-refundable.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means that you need to install free software in order to unlock and read it. To read this e-book you have to create an Adobe ID. The e-book can be read and downloaded on up to 6 devices (single user with the same Adobe ID).

    Required software
    To read this ebook on a mobile device (phone or tablet) you'll need to install this free app: PocketBook Reader (iOS / Android)

    To download and read this eBook on a PC or Mac you need Adobe Digital Editions (This is a free app specially developed for eBooks. It's not the same as Adobe Reader, which you probably already have on your computer.)

    You can't read this ebook with Amazon Kindle

Language resources and computational models are becoming increasingly important for the study of language variation. A main challenge of this interdisciplinary field is that linguistics researchers may not be familiar with these helpful computational tools and many NLP researchers are often not familiar with language variation phenomena. This essential reference introduces researchers to the necessary computational models for processing similar languages, varieties, and dialects. In this book, leading experts tackle the inherent challenges of the field by balancing a thorough discussion of the theoretical background with a meaningful overview of state-of-the-art language technology. The book can be used in a graduate course, or as a supplementary text for courses on language variation, dialectology, and sociolinguistics or on computational linguistics and NLP. Part 1 covers the linguistic fundamentals of the field such as the question of status and language variation. Part 2 discusses data collection and pre-processing methods. Finally, Part 3 presents NLP applications such as speech processing, machine translation, and language-specific issues in Arabic and Chinese.

Reviews

'Variation is a key aspect of human language, and yet it has been too often overlooked in computational linguistics. The book edited by Marcos Zampieri and Preslav Nakov is an important step towards filling this gap with top-level contributions that offer a new alliance between natural language processing and linguistic theory to understand this complex phenomenon and its impact on applications.' Alessandro Lenci, University of Pisa

More info

Introduces core topics in language variation and the computational methods applied to similar languages, varieties, and dialects.
List of Contributors
Foreword
Introduction
Part I Fundamentals
1 Language Variation
James A. Walker
1.1 Introduction: Defining Language Variation
1.2 Types of Linguistic Variables
1.3 Dimensions of Variation
1.4 Conclusion
2 Phonetic Variation in Dialects
Rachael Tatman
2.1 Introduction
2.2 Vowels
2.3 Consonants
2.4 Suprasegmentals
2.5 Conclusion
3 Similar Languages, Varieties, and Dialects: Status and Variation
Miriam Meyerhoff
Steffen Klaere
3.1 Introduction
3.2 Language: More Than Communication
3.3 Language and Dialect
3.4 Creoles as a Class of Natural Languages
3.5 Introducing the Corpus of Spoken Bequia Creole English
3.6 Transforming Spoken Word into Categorical Data
3.7 Feature Interrelationship
3.8 Graphical Models to Visualise Interactions
3.9 Feature Relations at a Marginalised Level
3.10 Feature Interrelationship within Communities
3.11 Community Distinction
3.12 Speaker-Specific Creole Frequency
3.13 Conclusions: Principled Methods for Exploring Systematic Dialects within Languages
4 Mutual Intelligibility
Charlotte Gooskens
Vincent J. Van Heuven
4.1 Introduction
4.2 How to Measure Intelligibility
4.3 Extra-linguistic and Para-linguistic Factors Influencing Intelligibility
4.4 Linguistic Determinants of Intelligibility
4.5 Relationship between Intelligibility and Language Trees
4.6 Conclusions, Discussion, and Desiderata for Future Research
5 Dialectology for Computational Linguists
John Nerbonne
Wilbert Heeringa
Jelena Prokić
Martijn Wieling
5.1 Introduction
5.2 Dialectology
5.3 Dialectometry
5.4 Edit Distance on Phonetic Transcriptions
5.5 Geography of Distributions
5.6 Validation
5.7 Emerging Opportunities and Issues
5.8 Conclusions
Part II Methods and Resources
6 Data Collection and Representation for Similar Languages, Varieties and Dialects
Tanja Samardžić
Nikola Ljubešić
6.1 Representing Language Variability in Corpora
6.2 Types of Micro-Variation and the Corresponding Data Collection Procedures
6.3 Privacy and Linguistic Micro-Variation
6.4 Conclusion
7 Adaptation of Morphosyntactic Taggers
Yves Scherrer
7.1 Introduction
7.2 Model Transfer Methods
7.3 Normalization and Other Data Transfer Methods
7.4 Tagger Adaptation and Multilingual Models
7.5 Conclusions
8 Sharing Dependency Parsers between Similar Languages
Željko Agić
8.1 Introduction
8.2 A New Hope? Notable Exceptions
8.3 To Conclude: A Glimpse of the Future
Part III Applications and Language Specific Issues
9 Dialect and Similar Language Identification
Marcos Zampieri
9.1 Introduction
9.2 A Supervised Text Classification Problem
9.3 Collecting Data
9.4 Competitions
9.5 DSL Shared Task 2015
9.6 Conclusion and Future Perspectives
10 Dialect Variation on Social Media
Dong Nguyen
10.1 Introduction
10.2 Social Media for Dialect Research
10.3 Processing Data
10.4 Patterns in Social Media
10.5 Future Outlook
11 Machine Translation between Similar Languages
Preslav Nakov
Jörg Tiedemann
11.1 Introduction
11.2 Models and Approaches
11.3 Character-Level Machine Translation
11.4 Closely Related Languages as MT Pivots
11.5 Bitext Combination
11.6 Language Adaptation
11.7 Other Approaches
11.8 Applications and Future Directions
11.9 Conclusions
12 Automatic Spoken Dialect Identification
Pedro A. Torres-Carrasquillo
Bengt J. Borgström
12.1 Introduction
12.2 Background
12.3 Resources for Dialect Identification
12.4 State of the Art
12.5 Standardized Evaluations and Recent Performance
12.6 Challenges and Future Outlook
13 Arabic Dialect Processing
Nizar Habash
13.1 Introduction
13.2 Arabic and Its Variants
13.3 Challenges of Arabic Dialect Processing
13.4 Arabic Dialect Resources
13.5 Arabic Dialect Processing Tools and Applications
13.6 Conclusion and Outlook
14 Computational Processing of Varieties of Chinese: Comparable Corpus-Driven Approaches to Light Verb Variation
Menghan Jiang
Hongzhi Xu
Jingxia Lin
Dingxu Shi
Chu-Ren Huang
14.1 Introduction
14.2 Computational Approaches to Language Variations in Chinese and Other Languages
14.3 Classification of Varieties of Mandarin Chinese
14.4 Conclusion
Dr. Marcos Zampieri is an assistant professor at the Rochester Institute of Technology, where he teaches courses in linguistics and natural language processing. He received his PhD from Saarland University in Germany with a thesis on computational models applied to pluricentric languages. Dr. Zampieri is one of the organizers of the well-established VarDial workshop series on NLP for Similar Languages, Varieties, and Dialects. His research deals with the application of computational models to large collections of texts. He has worked on a variety of topics including language acquisition and variation, (machine) translation and post-editing, and social media mining.

Dr. Preslav Nakov is Principal Scientist at Qatar Computing Research Institute at Hamad Bin Khalifa University. He leads the Tanbih mega-project, developed in collaboration with MIT. He co-authored a book on Semantic Relations between Nominals, two books on computer algorithms, and many research papers in top-tier conferences and journals. He received the Young Researcher Award at RANLP'2011. He was also the first to receive the Bulgarian President's John Atanasoff award, named after the inventor of the first automatic electronic digital computer. Dr. Nakov's research has been featured in over 100 news outlets, including Forbes, Boston Globe, and MIT Technology Review.