Eleni Koutsomitopoulou - Ελένη Κουτσομητοπούλου

SIL UND 2010 LING 507: Computational Syntax and Morphology Calendar

NB: All materials linked in the table below are stored in the LING 507 Computational Syntax and Morphology folders on the SIL-UND fileserver, so the links do not currently work. Sample lectures and quizzes are provided.

#

Date

Topic

Assignment

Quiz

1

06/08/10

Introduction to Computational Linguistics


Readings

  • Parsing vs. Text Processing in dictionary construction (Ahlswede & Evens 1988)

  • What is computational linguistics (Hans Uszkoreit, 2000, http://www.coli.uni-saarland.de/~hansu/what_is_cl.html)

  • What do computational linguists need to know about linguistics (Moore 2009)


Topics covered

What is Computational Linguistics

Introduction to Computational Syntax

FSMs

Chomsky hierarchy

Syntactic parsing with CFGs

Parsing vs. text processing


Material presented

Lecture: Introduction to Computational Syntax (presentation, pdf)



Assignment_June10.pdf


Writing assignments: Summaries of the articles in the readings list

Quiz_Fri_June_11

2

06/14/10

Unix text processing and regular expressions


Readings

  • Using Natural Language to Access Databases on the Web (Hongchi et al. 2001)

  • Text Mining with Information Extraction (Nahm and Mooney 2002)

  • Data-driven Dependency Parsing across Languages and Domains: Perspectives from the CoNLL 2007 Shared Task (Nivre 2007)

  • Extracting a verb lexicon for deep parsing from FrameNet (McConville and Dzikovska 2005)


Topics covered

Unix file management commands

Unix text processing techniques

Finite State morphology: FSAs and RegEx's


Material presented

Lecture: FSAs and Regexs (presentation, pdf)

Lecture: Regexs & FSAs part 2 (presentation, pdf)

Unix file management: Handout_1.pdf

Unix file management and text processing: Handout_2.pdf

Unix file management and text processing: Handout_3.pdf


Unix_Assignment_1.pdf

Unix_Assignment_2.pdf

Unix_Assignment_3.pdf



Writing assignments: Summaries of the articles in the readings list

Quiz_Fri_June_18

3

06/21/10

Information Extraction and text processing


Readings

  • ExPRESS – Extraction Pattern Recognition Engine and Specification Suite (Piskorski 2007)

  • Information Extraction (McCallum 2005)

  • TextAnalysisInternational-IDE-VisualText (Text Analysis International Inc, 2001)

  • Hierarchical Hidden Markov Models for Information Extraction (Skounakis, Craven and Ray 2003)

Topics covered

Information Extraction (IE):

– What is Information Extraction (IE)?

– What types of terms are usually extracted?

– What are two approaches to building extraction systems?

– How is the output usually evaluated?

– List some applications of IE

Regex metacharacters

More on grep processing

Relevance, Recall and Precision metrics

The concept of greediness in pattern matching
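
To make the greediness topic concrete, here is a minimal sketch. It is not taken from the week 3 handouts, which are not available here; Python's re module is used only because the course turns to it in week 7, and the sample text is hypothetical.

```python
import re

# Hypothetical sample text, chosen only to illustrate the behaviour.
text = '<b>Athens</b> and <b>Grand Forks</b>'

# Greedy: .* consumes as much as possible, so a single match
# runs from the first <b> to the last </b>.
greedy = re.findall(r'<b>.*</b>', text)

# Non-greedy: .*? stops at the first closing tag,
# so each tagged entity is matched separately.
lazy = re.findall(r'<b>(.*?)</b>', text)

print(greedy)  # ['<b>Athens</b> and <b>Grand Forks</b>']
print(lazy)    # ['Athens', 'Grand Forks']
```

Note that the non-greedy quantifier `*?` is a Perl-style extension; POSIX grep patterns do not support it, so with grep the usual workaround is a negated character class such as `[^<]*`.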


Material presented

Lecture: Introduction to IE (presentation, pdf)

Lecture: IE challenges (presentation, pdf)

Handout: Relevance, P, R.pdf

Handout: Regex metacharacters.pdf

Handout: Dots, greediness and white spaces.pdf

Unix Handout 5.pdf

Unix Handout 6.pdf


Project proposals due: drafts by end of week 3


Unix_Assignment_5.pdf

Unix_Assignment_6.pdf


Writing assignments: Summaries of the articles in the readings list

Quiz_Fri_June_25

4

06/28/10

Advanced text processing techniques (sed, regex's and automatic entity annotation and extraction)


Readings


  • Campbell et al. 2009. Practical Programming: An Introduction to Computer Science using Python. [ch. 1 & 2]

  • Automatic Alignment in Parallel Corpora (Papageorgiou, Cranias, Piperidis 1994)



Topics covered

Unix text processing commands:

– Removing duplicates

– Translating text

– Word frequencies: uniq -c

– A few more sed commands

VisualText:

– entity types vs. entity terms

– entity relationships

Open_CALAIS: demo

Notepad++: regexes and proper-name entity recognition and extraction


Material presented

Unix_Handout_7.pdf

Notepad++ Regex Exercise.pdf

Handout_6302010_advanced_text_processing.pdf

Handout: Points to remember_Append_Uniq.pdf


Project proposals due: drafts by end of week 4


Assignment_7.pdf

Assignment_8.pdf

Assignment_9.pdf


Writing assignments: Summaries of the articles in the readings list

Quiz_Fri_July_2

5

07/06/10

VisualText: natural language processing and hands-on rule writing for the automatic recognition, annotation and extraction of named entities in text


Readings

  • Brian Roark and Richard Sproat. 2007. Computational Approaches to Morphology and Syntax. Oxford University Press [pp.1-19 and pp.62-66]

  • Computing with Realizational Morphology (Karttunen 2003)

  • The Corpus-based Approach: A New Paradigm in Translation Studies (Laviosa 1998)

  • Automatic Construction of English/Chinese Parallel Corpora (Yang & Wing Li 2003)


Topics covered


VisualText:

– How to create a new analyzer

– How to select an initial template

– How to reload your new analyzer

– How to create an input file

– How to run the analyzer and view the parse tree

– How to generate built-in rules from sample data

– How to create a stub for the passes and pass files

– How to create concepts and concept folders for the generated rules

– How to populate the concepts with sample data

– How to generate built-in rules from the sample data


Extra Lab VisualText rule-writing:

– How to write your own rules for finding NPs, VPs, PPs and Possessive Phrases

– How to use simple NLP++ syntax for rule-writing


Realizational morphology


Material presented

Lecture: Introduction to Computational Morphology (presentation, pdf)

VisualText Tutorials: based on VisualText Help (part of the VisualText IDE)

Informal mid-term course evaluation (template)




Assignment_07072010.pdf

Assignment_VisualText_POS_tagging.pdf*


Writing assignments: Summaries of the articles in the readings list



*See class overview: week 5 class 2 Wed 07072010.pdf

Quiz_Fri_July_9

6

07/12/10

Corpus linguistics tools and techniques


Readings

  • English Corpus Linguistics: An Introduction (Meyer 2002, pp. 1-19)

  • A Computer-assisted Approach to the Analysis of Translation Shifts (Munday 1998)

  • Web As Parallel Corpus (Resnik and Smith 2003)

  • Corpus Analysis For Indexing (Oliveira et al. 2005)

  • Exploiting Parallel Texts for WSD (Ng et al. 2003)

  • Translation Prediction using Word Co-occurrence graphs (Apidianaki 2005)

Topics covered

Unix file and dir permissions

More advanced sed commands

Corpus linguistics:

– techniques, principles and best practices

– concordancing and concordancers

– lexical priming and semantic prosody

– denotation vs. collocations and colligations and semantic associations

– techniques for mining corpora (KWIC, comparison across different corpora, frequencies, etc.)

Tool: WordSmith How To


Material presented

Lecture: Introduction to Corpus Lx part1.pdf

Unix_Handout_7122010.pdf

Unix_Handout_7132010.pdf

Handout: CorpusBased Translation using WordSmith.pdf [includes corpora mining exercises on WordSmith]





Assignment_07122010.pdf

Assignment_07132010.pdf

Corpora hands on #1.pdf



Writing assignments: Summaries of the articles in the readings list

Quiz_Fri_July_16

7

07/19/10

Python Regex module, advanced string manipulation, morphological parsing using FLEx, interlinear texts


Readings

  • Campbell et al. 2009. Practical Programming: An Introduction to Computer Science using Python. [pp. 29-39 and 153-176]

  • A Formal Framework For Interlinear Text (Maeda and Bird 2000)

  • The SIL FieldWorks Language Explorer Approach to Morphological Parsing (Black and Simons, 2006)


Topics covered

Complex Unix strings and string manipulation

Toolbox: demo

FLEx: demo

Python IDLE and shell

Python: RE module

Python: string manipulation and regex's

Morphological parsing, interlinear text


Material presented

Handout: Anagram_Eric_Clapton.pdf

Handout_7192010.pdf: string manipulation

Handout_7202010_part2.pdf: string manipulation

Handout_7202010_part3.pdf: string manipulation

Python_IDLE_intro_part_1.pdf

Python_IDLE_intro_part_2_REs_string_man.pdf

Python_IDLE_intro_part_3_more_REs&strings.pdf


Assignment_07192010.pdf

Assignment_07202010.pdf

Assignment_07212010.pdf

Assignment_2_07212010.pdf

Assignment_07222010.pdf


Writing assignments: Summaries of the articles in the readings list

Quiz_Fri_July_23

8

07/26/10

Machine Translation: shallow transfer for closely-related languages (and Python cont.)


Readings

  • Campbell et al. 2009. Practical Programming: An Introduction to Computer Science using Python. [ch. 3, 4, 8]

  • Shallow-transfer rule-based machine translation for Swedish to Danish (Tyers and Nordfalk 2009)

  • Shallow Parsing for Portuguese–Spanish Machine Translation (Garrido-Alenda et al. 2004)


Topics covered

Machine translation:

– models, approaches, evaluation metrics and challenges

– shallow transfer of closely-related languages

Parsing street addresses using Python

Parsing phone numbers using Python

Parsing NL input, extracting and tagging NL elements using Python

How to write to files in Python
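
As an illustration of the phone-number topic, here is a minimal sketch. It is not the content of the Python_RE_case_study handouts, which are not available here; the pattern assumes North American numbers (three-digit area code, seven-digit number, optional extension), and the names PHONE and parse_phone are hypothetical.

```python
import re

# Assumption: a 3-digit area code, a 3-digit prefix, a 4-digit line number,
# and an optional extension, separated by any non-digit characters.
PHONE = re.compile(r'''
    (\d{3})   # area code
    \D*       # optional separator (any run of non-digits)
    (\d{3})   # prefix
    \D*       # optional separator
    (\d{4})   # line number
    \D*       # optional separator, e.g. " ext. "
    (\d*)     # optional extension
    $         # end of string
''', re.VERBOSE)


def parse_phone(text):
    """Return (area, prefix, line, extension), or None if nothing matches."""
    match = PHONE.search(text)
    return match.groups() if match else None


print(parse_phone('800-555-1212'))             # ('800', '555', '1212', '')
print(parse_phone('(800) 555 1212 ext. 1234'))  # ('800', '555', '1212', '1234')
print(parse_phone('555-1212'))                 # None
```

The re.VERBOSE flag lets the pattern carry its own comments, which is helpful when a regex grows beyond a few tokens.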


Material presented

Lecture: Overview of MT models, approaches, challenges.pdf

Handout: Python_RE_case_study_1.pdf

Handout: Python_RE_case_study_2.pdf

Handout: Python_RE_case_study_3_ExtractingFromSs.pdf

Handout: Python_RE_case_study_4.pdf

Handout: Python_NLTK.pdf

Machine translation freeware tools: demo



Dissemination of final project template: Monday 7/26


Students' final project write-ups due: Friday 7/30 by 4pm




Python_RE_case_study_1.pdf

Python_RE_case_study_2.pdf

Python_RE_case_study_3_ExtractingFromSs.pdf

Python_RE_case_study_4.pdf


Writing assignments: Summaries of the articles in the readings list




final quiz 7/30

9

08/02/10

Student in-class presentations and project evaluations


– handouts

– I/O sample

– demo of proposed implementation


Final course evaluations


No quiz












Copyright © Eleni Koutsomitopoulou 2010 All Rights Reserved. No part of this document may be reproduced or otherwise used without the written consent of the author.
