9. Challenges on which we have made good progress so far:
Coreference resolution
Akbar told Asqar he shouldn't run again
Word sense disambiguation (WSD)
I need new batteries for my mouse (?)
11. Parsing
I can see cooler from the window
Machine translation (MT)
第 25 届上海国际电影节开幕
In 25th of the tir I have a NLP presentation
Information extraction (IE)
You're invited to our LUG session, doshanbe (Monday) tir 25 at 16:30
Event: LUG, Date: Tir 25th
12. And the challenges that are still very hard:
Question answering (QA)
Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?
Paraphrase
XYZ acquired ABC yesterday
ABC has been taken over by XYZ
Summarization
The Dow Jones is up
The S&P 500 jumped
Housing (Dollar) prices
Economy is good
14. Why is natural language processing hard? . . .
At last, a computer that understands you like your mother
1. It understands you as well as your mother understands you
2. It understands (that) you like your mother
3. It understands you as well as it understands your mother
Ambiguity . . . ambiguity . . .
15. Information extraction
Firm XYZ is a full service advertising agency specializing in direct and interactive marketing. Located in Bigtown CA, Firm XYZ is looking for an Assistant Account Manager to help manage and coordinate interactive marketing initiatives for a marquee automotive account. Experience in online marketing, automotive and/or the advertising field is a plus. Assistant Account Manager Responsibilities: Ensures smooth implementation of programs and initiatives. Helps manage the delivery of projects and key client deliverables ... Compensation: $50,000-$80,000 Hiring Organization: Firm XYZ
INDUSTRY: Advertising
POSITION: Assistant Account Manager
LOCATION: Bigtown, CA
COMPANY: Firm XYZ
SALARY: $50,000-$80,000
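A minimal sketch of how fields like those in the job posting could be pulled out with hand-written patterns. The pattern table `PATTERNS` and the function `extract_fields` are hypothetical names introduced here for illustration; real information extraction systems are usually learned rather than written as regular expressions, and INDUSTRY is left out because no simple pattern covers it.

```python
import re

# Shortened copy of the posting text from the slide.
posting = (
    "Firm XYZ is a full service advertising agency specializing in direct "
    "and interactive marketing. Located in Bigtown CA, Firm XYZ is looking "
    "for an Assistant Account Manager to help manage and coordinate "
    "interactive marketing initiatives. "
    "Compensation: $50,000-$80,000 Hiring Organization: Firm XYZ"
)

# Hypothetical hand-written patterns; each has exactly one capture group.
PATTERNS = {
    "COMPANY": r"Hiring Organization:\s*(.+?)\s*$",
    "SALARY": r"Compensation:\s*(\$[\d,]+-\$[\d,]+)",
    "LOCATION": r"Located in ([A-Z][a-z]+ [A-Z]{2})",
    "POSITION": r"looking for an? ([A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*)",
}

def extract_fields(text):
    """Return a dict of field name -> first matching text span."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1)
    return fields

fields = extract_fields(posting)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Patterns like these break as soon as the wording of the posting changes, which is one reason extraction is listed among the tasks that are only partly solved.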
16. At the semantic (meaning) level: word meaning
They put money in the bank (= buried in mud?)
24. Probabilistic Language Modeling
Goal: compute the probability of a sentence or a sequence of words:
$P(W) = P(w_1, w_2, w_3, w_4, w_5, \ldots, w_n)$
Related task: compute the probability of the next word in a sentence:
$P(w_5 \mid w_1, w_2, w_3, w_4)$
How to compute it (the chain rule):
$P(\text{its, water, is, so, transparent, that})$
$P(x_1, x_2, x_3, \ldots, x_n) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1, x_2) \cdots P(x_n \mid x_1, \ldots, x_{n-1})$
$P(w_1 w_2 \ldots w_n) = \prod_i P(w_i \mid w_1 w_2 \ldots w_{i-1})$
P("its water is so transparent") = P(its) × P(water|its) × P(is|its water) × P(so|its water is) × P(transparent|its water is so)
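The chain-rule expansion can be sketched in a few lines of Python. The function names and the constant toy model `toy_cond_prob` are assumptions made for illustration only; a real model would supply estimated conditional probabilities.

```python
from math import prod

def chain_rule_factors(words):
    """List the (word, history) pairs whose conditional probabilities are multiplied."""
    return [(word, tuple(words[:i])) for i, word in enumerate(words)]

def sentence_probability(words, cond_prob):
    """Multiply P(word | full history) over the sentence, as the chain rule prescribes."""
    return prod(cond_prob(word, history) for word, history in chain_rule_factors(words))

# Hypothetical stand-in model: every conditional probability is 0.5.
def toy_cond_prob(word, history):
    return 0.5

words = "its water is so transparent".split()
factors = chain_rule_factors(words)
probability = sentence_probability(words, toy_cond_prob)
```

Each factor conditions on the entire preceding history, so the histories grow with sentence length.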
25. Now, how do we estimate these probabilities?
P(the | its water is so transparent that) = Count(its water is so transparent that the) / Count(its water is so transparent that)
No! Too many possible sentences.
We'll never see enough data for estimating these.
Andrei Markov, Russian mathematician who worked on probability theory
Each word, so to speak, depends only on the words close to it, so there is no need to look at all the preceding words!
the torvalds laughs
P(the torvalds laughs) = P(the|start) * P(torvalds|start,the) * P(laughs|the,torvalds) * P(stop|torvalds,laughs)
Each factor is estimated from counts, e.g. the number of sentences that start with "the torvalds" divided by the number of sentences that start with "the".
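A minimal sketch of the count-based Markov estimate, using a trigram model trained on a made-up four-sentence corpus. The corpus, the function names, and the use of two `<s>` padding symbols (the slide writes a single "start") are assumptions for illustration.

```python
from collections import Counter

def pad(sentence):
    # Two start symbols so that every word has a two-word context.
    return ["<s>", "<s>"] + sentence.split() + ["</s>"]

def train_trigram_counts(sentences):
    trigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        tokens = pad(sentence)
        for i in range(2, len(tokens)):
            context = (tokens[i - 2], tokens[i - 1])
            trigrams[context + (tokens[i],)] += 1
            contexts[context] += 1
    return trigrams, contexts

def trigram_prob(word, context, trigrams, contexts):
    """Maximum-likelihood estimate: Count(context, word) / Count(context)."""
    if contexts[context] == 0:
        return 0.0
    return trigrams[context + (word,)] / contexts[context]

def sentence_prob(sentence, trigrams, contexts):
    tokens = pad(sentence)
    p = 1.0
    for i in range(2, len(tokens)):
        p *= trigram_prob(tokens[i], (tokens[i - 2], tokens[i - 1]), trigrams, contexts)
    return p

# Hypothetical toy corpus.
corpus = ["the torvalds laughs", "the torvalds codes", "the kernel boots", "a penguin laughs"]
trigrams, contexts = train_trigram_counts(corpus)
p_torvalds = trigram_prob("torvalds", ("<s>", "the"), trigrams, contexts)
p_sentence = sentence_prob("the torvalds laughs", trigrams, contexts)
```

Here `p_torvalds` is the number of sentences starting with "the torvalds" divided by the number starting with "the". Any trigram that never occurs in the corpus gets probability zero, which is why practical models add smoothing.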
29. Types of spelling errors
Non-word Errors
graffe => giraffe
Real-word Errors
Typographical errors
three => there
Cognitive Errors (homophones)
piece => peace
too => two
These errors are fixed by suggesting a correction or by offering a list of suggested words (suggestion lists).
30. Non-word spelling error detection
Any word not in a dictionary is an error
The larger the dictionary the better
Non-word spelling error correction
Generate candidates
real words that are similar to error
Choose the one which is best
Shortest weighted edit distance
Highest noisy channel probability
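The detect-then-correct pipeline above can be sketched as follows. This is a sketch under stated assumptions: the tiny `DICTIONARY` is hypothetical, and the distance is plain unweighted Levenshtein distance rather than the weighted edit distance the slide mentions.

```python
def edit_distance(a, b):
    """Unweighted Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

# Hypothetical toy dictionary; a real one would be far larger.
DICTIONARY = {"giraffe", "graft", "grail", "there", "three"}

def is_nonword_error(word):
    # Detection: any word not in the dictionary is treated as an error.
    return word.lower() not in DICTIONARY

def correct(word, max_distance=2):
    # Correction: generate similar real words, then pick the closest one.
    if not is_nonword_error(word):
        return word
    candidates = [(edit_distance(word.lower(), w), w) for w in DICTIONARY]
    candidates = [c for c in candidates if c[0] <= max_distance]
    return min(candidates)[1] if candidates else word

print(correct("graffe"))
```

Because detection relies only on dictionary membership, a real-word error such as "three" for "there" passes through unchanged.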
31. The noisy channel model is a framework used in spell checkers, question answering, speech recognition, and machine translation. In this model, the goal is to find the intended word given a word where the letters have been scrambled in some manner:
Insertion
Deletion
Substitution
Transposition
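In the usual formulation, the noisy channel model picks the candidate word w that maximizes P(observed | w) × P(w). The sketch below shows only that selection step; every probability in it is a made-up illustrative number, not a value estimated from any corpus, and `best_correction` is a hypothetical helper name.

```python
def best_correction(observed, candidates, prior, channel):
    """Return the candidate w maximizing P(observed | w) * P(w)."""
    return max(candidates, key=lambda w: channel[(observed, w)] * prior[w])

# Hypothetical language-model priors P(w).
prior = {"actress": 0.00002, "across": 0.0003, "acres": 0.00003}

# Hypothetical channel probabilities P(observed | w) for the typo "acress".
channel = {
    ("acress", "actress"): 0.0001,
    ("acress", "across"): 0.00001,
    ("acress", "acres"): 0.00003,
}

best = best_correction("acress", list(prior), prior, channel)
```

With these toy numbers a frequent word can win even though its channel probability is lower, which is the point of combining the two terms.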
32. Words within 1 of acress
About 80% of errors are one letter off (edit distance 1), and almost all of the rest are two letters off (edit distance 2).
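A minimal sketch of generating every string within edit distance 1 of "acress" and keeping the ones that are real words. The small `VOCAB` set is a hypothetical stand-in for a dictionary.

```python
import string

def edits1(word, alphabet=string.ascii_lowercase):
    """All strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

# Hypothetical vocabulary.
VOCAB = {"actress", "cress", "caress", "access", "across", "acres", "actor"}

candidates = sorted(edits1("acress") & VOCAB)
print(candidates)
```

Applying `edits1` to each of its own outputs would give the edit-distance-2 candidates, at the cost of a much larger candidate set.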
39. Parts-of-speech are often ambiguous
I have to go there (verb)
I had a go at it (noun)
If the previous word is "to", then it's a verb
If the previous word is "a", then it's a noun
If the next word is ...
Writing rules manually is impossible
Instead of looking at the grammatical position of a word in the sentence, we look at its role!
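The two hand-written rules from this slide can be sketched directly. The function name `tag_with_context_rules` is hypothetical, and the sketch covers only those two rules, which is exactly why the approach does not scale.

```python
def tag_with_context_rules(tokens, index):
    """Tag tokens[index] as 'verb' or 'noun' using only the previous word."""
    previous = tokens[index - 1].lower() if index > 0 else None
    if previous == "to":
        return "verb"
    if previous == "a":
        return "noun"
    return "unknown"  # no hand-written rule applies

s1 = "I have to go there".split()
s2 = "I had a go at it".split()
tag1 = tag_with_context_rules(s1, s1.index("go"))
tag2 = tag_with_context_rules(s2, s2.index("go"))
```

Every new context needs another rule, and rules start to conflict, so in practice taggers are learned from annotated data instead.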
43. Table lookup approach
Approach: build a database of the stems of all possible words, and find the stem of a word by looking it up in this database.
Problem: no such database exists for English or for other languages. The lookup and maintenance overhead is high, and it is not cost-effective!
the short prefix "be", which is the stem of
such words as "be", "been" and "being", would
not be considered as the stem of the word
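A table-lookup stemmer is essentially a dictionary access, as this sketch shows. The entries in `STEM_TABLE` are hypothetical; the point is that any word missing from the table cannot be stemmed at all.

```python
# Hypothetical hand-built table mapping word forms to stems.
STEM_TABLE = {
    "be": "be",
    "been": "be",
    "being": "be",
    "statistics": "statistic",
    "statistical": "statistic",
}

def lookup_stem(word):
    """Return the stored stem, or None when the word is not in the table."""
    return STEM_TABLE.get(word.lower())

print(lookup_stem("Being"))
print(lookup_stem("unseenword"))
```

The `None` result for an unseen word illustrates the coverage and maintenance problem: the table must list every form in advance.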
45. n-gram stemmers
statistics => st ta at ti is st ti ic cs
unique digrams = at cs ic is st ta ti
statistical => st ta at ti is st ti ic ca al
unique digrams = al at ca ic is st ta ti
Dice’s coefficient (similarity)
a|b|c|d
ab|bc|cd
abc|bcd|cde
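Dice's coefficient over unique digrams is commonly defined as 2C / (A + B), where A and B are the numbers of unique digrams in the two words and C is the number they share. A minimal sketch, with hypothetical function names:

```python
def unique_ngrams(word, n=2):
    """Set of distinct character n-grams in a word (digrams when n=2)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def dice_coefficient(a, b, n=2):
    """2 * |shared n-grams| / (|n-grams of a| + |n-grams of b|)."""
    grams_a, grams_b = unique_ngrams(a, n), unique_ngrams(b, n)
    total = len(grams_a) + len(grams_b)
    if total == 0:
        return 0.0  # both words shorter than n
    return 2 * len(grams_a & grams_b) / total

similarity = dice_coefficient("statistics", "statistical")
print(similarity)
```

For the example on this slide the two words have 7 and 8 unique digrams and share 6, giving 2 × 6 / (7 + 8) = 0.8. Words whose similarity exceeds a chosen threshold are grouped under one stem.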
46. Affix Removal Stemmers
Algorithms of this kind remove prefixes and suffixes according to the grammatical rules of the language.
As an example, the following rules are from Harman (1991):
If a word ends in "ies" but not "eies" or "aies" then "ies" -> "y"
If a word ends in "es" but not "aes", "ees" or "oes" then "es" -> "e"
If a word ends in "s" but not "us" or "ss" then "s" -> NULL
Although these algorithms perform well, there are two caveats:
I. They are language-specific.
II. They do not cover all cases. Example: agreed
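The three rules can be sketched as a small function. One assumption is made here: the rules are tried in order and the first one whose full condition holds is applied; the slide does not say what happens when a word hits an exception, so falling through to the next rule is a guess. The name `s_stem` is hypothetical.

```python
def s_stem(word):
    """Apply the first of the three plural-removal rules whose condition holds."""
    if word.endswith("ies") and not word.endswith(("eies", "aies")):
        return word[:-3] + "y"   # "ies" -> "y"
    if word.endswith("es") and not word.endswith(("aes", "ees", "oes")):
        return word[:-2] + "e"   # "es" -> "e"
    if word.endswith("s") and not word.endswith(("us", "ss")):
        return word[:-1]         # "s" -> NULL
    return word                  # no rule applies

for w in ["ponies", "horses", "cats", "glass", "agreed"]:
    print(w, "->", s_stem(w))
```

The word "agreed" comes back unchanged because no rule mentions the "ed" suffix, which matches the coverage caveat above.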
48. Lexical Semantics
Two alternative guesses of a speech recognizer:
For breakfast, she ate durian
For breakfast, she ate Dorian
Our corpus contains neither "ate durian" nor "ate Dorian"
But our corpus contains "ate orange" and "ate banana"
Dorian: the people of ancient central Greece
Durian: an oval tropical fruit whose flesh contains a cream-like pulp