Statistical Natural Language Tagging
Description: Statistical Natural Language Tagging [CHARNIAK93] consists of assigning a lexical category to each word in a text. Usually, the first step in the parsing process is to identify which lexical category (noun, verb, etc.) each word in the sentence belongs to, i.e. to tag the sentence. The difficulty of the tagging process comes from the lexical ambiguity of the words: many words can belong to more than one lexical class. Moreover, the assignment of a tag to a word depends on the assignments given to the other words; for example, the word "man" can be tagged as a noun ("the old man") or as a verb ("they man the boat"), and only the surrounding tags decide which reading is correct.
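To make the ambiguity concrete, the following sketch (in Python, using a purely illustrative toy lexicon and tag set that are not taken from the description above) enumerates every candidate tagging of a short sentence; a statistical tagger must then choose among these candidates.

```python
from itertools import product

# Toy lexicon: each word maps to the lexical categories it can take.
# Ambiguity (e.g. "man" as noun or verb) is what makes tagging non-trivial.
LEXICON = {
    "the": ["DET"],
    "old": ["ADJ", "NOUN"],
    "man": ["NOUN", "VERB"],
    "boat": ["NOUN", "VERB"],
}

def candidate_taggings(sentence):
    """Enumerate every possible assignment of tags to the words."""
    tag_options = [LEXICON[w] for w in sentence]
    return [list(zip(sentence, tags)) for tags in product(*tag_options)]

if __name__ == "__main__":
    # "the old man the boat" yields 1 * 2 * 2 * 1 * 2 = 8 candidate taggings.
    for tagging in candidate_taggings(["the", "old", "man", "the", "boat"]):
        print(tagging)
```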
The most common statistical models for tagging are based on assigning a probability to a given tag according to its neighboring tags (context). The tagger then tries to maximize the total probability of the tagging of each sentence. The raw data needed to obtain this probability are extracted from a training text, that is, a hand-tagged text such as the one provided in [BROWN79]. The probability of a sentence tagging can then be defined as the sum of the probabilities of the context of each word,

P(w_1 ... w_n) = Σ_{i=1..n} P(T_i | LC_i, RC_i),

where P(T | LC, RC) is the probability that the tag of word w is T, given that its context is formed by the sequence of tags LC to the left and the sequence RC to the right. This probability is estimated from the training text as
P(T | LC, RC) = occ(LC, T, RC) / occ(LC, RC),

where occ(LC, T, RC) is the number of occurrences of the list of tags LC, T, RC in the training text and occ(LC, RC) is the number of occurrences of the context LC, RC with any tag in between, i.e. occ(LC, RC) = Σ_T' occ(LC, T', RC).
Instances and best known solutions for those instances: The best statistical models typically perform at a correctness level of about 96% [BRILL95, ARAUJO02].