SlideShare a Scribd company logo
1 of 22
Download to read offline
PORTUGUESE NLP FOR
CPDOC/FGV?
PARGRAM/PARSEM PROJECT
ANNOUNCEMENT

 Valeria de Paiva
 October 2010
Fundacao Getulio Vargas (FGV)


 “Fundação Getulio Vargas (FGV) is a
   Brazilian higher education institution
   founded in December 20, 1944. It
   offers regular courses of Economics,
   Business Administration, Law,
   Social Sciences and
   Information technology management.
   Its original goal was to train people
   for the country's public- and private-
   sector management.[…] It is
   considered by Foreign Policy
   magazine to be a top-5
   "policymaker think-tank" worldwide.”
CPDOC

Centro de Pesquisa e
 Documentação de
 História Contemporânea
 do Brasil (Contemporary
 Brazilian History Research
 and Documentation
 Center) is a private
 higher education
 institution founded in
 1973 linked to the FGV.
CPDOC

Originally:
To house personal
 archives of public
 figures
Develop historic research
 using privileged archive
Documentation &
 Research
CPDOC

Now:
Graduate program on
 History, Political Science
 and Cultural Artifacts
 (since 1974)
School of Social Sciences
 and History of the FGV
 (since 2005)
CPDOC

          two centers Rio/SP
          Research:
          Núcleo de Pesquisa Social Aplicada
          Centro de Relações Internacionais

            Archives:
               Programa    de Arquivos Pessoais (PAP)
                program of personal archives
               Programa de História Oral (PHO) program of
                oral history
FGV School of Applied Math

         In the process of being created…
         Experts on image processing, signal/

          sound processing
         Not much on textual processing

         Possibility of impact…
Portuguese in PARGRAM/PARSEM?

        Alexandre Rademaker, Ph D, PUC-Rio,
         Mar2010
        Work on Automating Description Logics

        A Bridge-like system for Portuguese?
Bridge Layered Architecture
            XLE/LFG Pars
                           ing
                                                                 Inference
Text                                KR Mapping                    Engine
                    F-structure           Transfer
                                        semantics
                                                        AKR
 Sources
                                                                      Assertions



Question                                                              Query




                                   Unified Lexicon
           LFG                     Term rewriting             ECD
           XLE                     KR mapping rules           Textual Inference
           MaxEnt models           Factives,Deverbals         logics


           Basic idea: canonicalization of meanings
Portuguese NLP?
           Lots of Homework to do…
           STIL (Simpósio de Tecnologia da Informação e Linguagem
            Humana) 8th ed
           PROPOR (bienally, 9th edition)
            Linguistica de corpus (7a edition)

           Um panorama do Núcleo Interinstitucional de Linguística
            Computacional às vésperas de sua maioridade Maria das
            Graças V. Nunes, Sandra M. Aluisio, Thiago A. S. Pardo NILC –
            ICMC – Universidade de São Paulo São Carlos – SP, Brasil
            Junho de 2010
           Pardo, T.A.S.; Gasperin, C.V.; Caseli, H.M.; Nunes. M.G.V.
            (2010). Computational Linguistics in Brazil: An Overview. In the
            Proceedings of the NAACL-HLT Young Investigators Workshop on
            Computational Approaches to Languages of the Americas, pp.
            1-7. June 1-6, Los Angeles, CA/USA. Pdf
           SBC special interest group in NLP, 2007
Challenges from STIL09 survey
  Lack of large and robust language resources
  Lack of formal models for linguistic description and
   analysis of Portuguese
  Difficulty in attracting students and researchers

  Lack of multidisciplinary collaboration

  CL/NLP marginalization in both Computer Science
   and Linguistics.
  Poor interaction between universities and industry

  Insufficient funding.
NILC’s view 1
    Recently more robust semantic tools like TeP 2.016,
     Wordnet.PT17, and MWN.PT18, as well as named
     entities recognizers, e.g., REMBRANDT19.
    Topics: text summarization, machine translation, text
     simplification, automatic discourse analysis, coreference
     and anaphora resolution, information retrieval, text
     mining, terminology/lexicon research, ontologies and
     semantic tagging, and corpus linguistics.
    The largest CL/NLP research group in Brazil is NILC
     (Interinstitutional Center for Research and Development
     in Computational Linguistics) USP, UFSCar and UNESP
NILC’s view 2?
    Portuguese has got state of the art tools (as POS
     taggers and syntactic parsers) and comprehensive
     corpora of contemporary written language BUT
    still needs resources for particular applications
    Portuguese syntactic parsers (which are considered
     basic NLP tools) and wordnet-like resources are still too
     limited.
    CHARLA 2008 ?
    contests/conferences such as TAC33, Senseval/
     SemEval34, and TREC35, among others, might make
     Portuguese/Spanish datasets available, as CLEF36 has
     done.
Some detail on CPDOC data

        Brazilian Biographic-Historic Dictionary
         (Dicionario Historico-Biografico
         Brasileiro DHBB)
        First edition (printed) 1984:
             4.400   entries / 4 volumes.
          Second edition (printed and cd-rom)
           2001:
             6.626   entries / 5 volumes or cd-rom.
          Third edition (online) 2010:
             7.553   entries / online
CPDOC Online Portal

    Since 2009, search tool to access CPDOC’s archives
    Documents of personal archives, photos, interviews
     from the Oral History Project and entries of the
     DHBB.

    http://www.fgv.br/cpdoc/busca
User Profile
   Undergrads                   53,68%
   Grad  studies (lato sensu)   13,18%
   Masters                      11,50%
   High school                  11,07%
   Doctoral cand                 5,72%
User Profile/Goal of research
   Artigo acadêmico                                17,97%
   Curiosidade histórica                           14,61%
   Monografia (graduação)                          11,74%
   Dissertação (mestrado)                          11,39%
   Pesquisa escolar                                 8,52%
   Tese (doutorado)                                 7,57%
   Filme/vídeo (3,13 %) aparecem em penúltimo lugar, e matéria de imprensa
   (2,29%) em último.
User Profile - media
    Kind of media accessed:

         audiovisual        38,21%
         textual            33,51%
         printed matter     16,33%
         all of the above   11,95%
Organizational chart
Thanks!
Revista Estudos Históricos

  Since 1988, half yearly
  Multidisciplinar scientific magazine

  45 editions so far 2010

  All articles available online

  http://virtualbib.fgv.br/ojs/index.php/reh/index
ThToanks!



            Thanks!

More Related Content

What's hot

Targeted Language Resources for the Digitisation of Historical Collections
Targeted Language Resources for the Digitisation of Historical CollectionsTargeted Language Resources for the Digitisation of Historical Collections
Targeted Language Resources for the Digitisation of Historical CollectionsEmma Huber
 
Ph.D. Defense Presentation of "Open-Domain Word-Level Interpretation of Norwe...
Ph.D. Defense Presentation of "Open-Domain Word-Level Interpretation of Norwe...Ph.D. Defense Presentation of "Open-Domain Word-Level Interpretation of Norwe...
Ph.D. Defense Presentation of "Open-Domain Word-Level Interpretation of Norwe...Martin Thorsen Ranang
 
Statistics-based Approaches to Lexical Semantics
Statistics-based Approaches to Lexical SemanticsStatistics-based Approaches to Lexical Semantics
Statistics-based Approaches to Lexical SemanticsMartin Thorsen Ranang
 
AMAZIGH PART-OF-SPEECH TAGGING USING MARKOV MODELS AND DECISION TREES
AMAZIGH PART-OF-SPEECH TAGGING USING MARKOV MODELS AND DECISION TREESAMAZIGH PART-OF-SPEECH TAGGING USING MARKOV MODELS AND DECISION TREES
AMAZIGH PART-OF-SPEECH TAGGING USING MARKOV MODELS AND DECISION TREESijcsit
 
To Infinite Possibilities and Beyond...
To Infinite Possibilities and Beyond...To Infinite Possibilities and Beyond...
To Infinite Possibilities and Beyond...Valeria de Paiva
 
Disntinguished Speaker - Corina Forascu
Disntinguished Speaker - Corina ForascuDisntinguished Speaker - Corina Forascu
Disntinguished Speaker - Corina Forascuoxwocs
 
Digital Humanities Research, Europeana and The Embedded Semantic Research L...
Digital Humanities Research, Europeana  and The Embedded Semantic Research  L...Digital Humanities Research, Europeana  and The Embedded Semantic Research  L...
Digital Humanities Research, Europeana and The Embedded Semantic Research L...Net7
 
11 terms in corpus linguistics1 (1)
11 terms in corpus linguistics1 (1)11 terms in corpus linguistics1 (1)
11 terms in corpus linguistics1 (1)ThennarasuSakkan
 
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...Christoph Lange
 

What's hot (13)

Targeted Language Resources for the Digitisation of Historical Collections
Targeted Language Resources for the Digitisation of Historical CollectionsTargeted Language Resources for the Digitisation of Historical Collections
Targeted Language Resources for the Digitisation of Historical Collections
 
On the Integration of Real-Time and Fault-Tolerance in P2P Middleware
On the Integration of Real-Time and Fault-Tolerance in P2P MiddlewareOn the Integration of Real-Time and Fault-Tolerance in P2P Middleware
On the Integration of Real-Time and Fault-Tolerance in P2P Middleware
 
Ph.D. Defense Presentation of "Open-Domain Word-Level Interpretation of Norwe...
Ph.D. Defense Presentation of "Open-Domain Word-Level Interpretation of Norwe...Ph.D. Defense Presentation of "Open-Domain Word-Level Interpretation of Norwe...
Ph.D. Defense Presentation of "Open-Domain Word-Level Interpretation of Norwe...
 
Statistics-based Approaches to Lexical Semantics
Statistics-based Approaches to Lexical SemanticsStatistics-based Approaches to Lexical Semantics
Statistics-based Approaches to Lexical Semantics
 
Parsing techniques
Parsing techniquesParsing techniques
Parsing techniques
 
AMAZIGH PART-OF-SPEECH TAGGING USING MARKOV MODELS AND DECISION TREES
AMAZIGH PART-OF-SPEECH TAGGING USING MARKOV MODELS AND DECISION TREESAMAZIGH PART-OF-SPEECH TAGGING USING MARKOV MODELS AND DECISION TREES
AMAZIGH PART-OF-SPEECH TAGGING USING MARKOV MODELS AND DECISION TREES
 
To Infinite Possibilities and Beyond...
To Infinite Possibilities and Beyond...To Infinite Possibilities and Beyond...
To Infinite Possibilities and Beyond...
 
Disntinguished Speaker - Corina Forascu
Disntinguished Speaker - Corina ForascuDisntinguished Speaker - Corina Forascu
Disntinguished Speaker - Corina Forascu
 
Digital Humanities Research, Europeana and The Embedded Semantic Research L...
Digital Humanities Research, Europeana  and The Embedded Semantic Research  L...Digital Humanities Research, Europeana  and The Embedded Semantic Research  L...
Digital Humanities Research, Europeana and The Embedded Semantic Research L...
 
11 terms in corpus linguistics1 (1)
11 terms in corpus linguistics1 (1)11 terms in corpus linguistics1 (1)
11 terms in corpus linguistics1 (1)
 
2012 12 12_adam_v_final
2012 12 12_adam_v_final2012 12 12_adam_v_final
2012 12 12_adam_v_final
 
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
 
CGW 2010 - NLPN
CGW 2010 - NLPNCGW 2010 - NLPN
CGW 2010 - NLPN
 

Similar to Portuguese NLP Tools and Resources for CPDOC/FGV

Estudo Surdos - Sign Write
Estudo Surdos -  Sign WriteEstudo Surdos -  Sign Write
Estudo Surdos - Sign Writeasustecnologia
 
OpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project ReportOpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project ReportAlexandre Rademaker
 
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...Facultad de Informática UCM
 
TUNING TRADITIONAL LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXT CLASSIFICATION
TUNING TRADITIONAL LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXT CLASSIFICATIONTUNING TRADITIONAL LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXT CLASSIFICATION
TUNING TRADITIONAL LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXT CLASSIFICATIONkevig
 
Tuning Traditional Language Processing Approaches for Pashto Text Classification
Tuning Traditional Language Processing Approaches for Pashto Text ClassificationTuning Traditional Language Processing Approaches for Pashto Text Classification
Tuning Traditional Language Processing Approaches for Pashto Text Classificationkevig
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguisticsAdnanBaloch15
 
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...pathsproject
 
Body-Part Nouns and Whole-Part Relations in Portuguese
Body-Part Nouns and Whole-Part Relations in PortugueseBody-Part Nouns and Whole-Part Relations in Portuguese
Body-Part Nouns and Whole-Part Relations in PortugueseJorge Baptista
 
Plurilingualism and intercomprehension teaching forum 2012
Plurilingualism and intercomprehension teaching forum 2012Plurilingualism and intercomprehension teaching forum 2012
Plurilingualism and intercomprehension teaching forum 2012Tita Beaven
 
Developing corpus-based resources for language learning: looking back in "hope"
Developing corpus-based resources for language learning: looking back in "hope"Developing corpus-based resources for language learning: looking back in "hope"
Developing corpus-based resources for language learning: looking back in "hope"Pascual Pérez-Paredes
 
1 computational linguistics an introduction
1 computational linguistics   an introduction1 computational linguistics   an introduction
1 computational linguistics an introductionThennarasuSakkan
 

Similar to Portuguese NLP Tools and Resources for CPDOC/FGV (20)

Estudo Surdos - Sign Write
Estudo Surdos -  Sign WriteEstudo Surdos -  Sign Write
Estudo Surdos - Sign Write
 
OpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project ReportOpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project Report
 
Corpus Linguistics
Corpus LinguisticsCorpus Linguistics
Corpus Linguistics
 
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
 
Using pedagogic corpora in ELT
Using pedagogic corpora in ELTUsing pedagogic corpora in ELT
Using pedagogic corpora in ELT
 
Sacodeyl Birmingham 2007
Sacodeyl Birmingham 2007Sacodeyl Birmingham 2007
Sacodeyl Birmingham 2007
 
Barbiers iclave-fr
Barbiers iclave-frBarbiers iclave-fr
Barbiers iclave-fr
 
LSDI.pptx
LSDI.pptxLSDI.pptx
LSDI.pptx
 
TUNING TRADITIONAL LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXT CLASSIFICATION
TUNING TRADITIONAL LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXT CLASSIFICATIONTUNING TRADITIONAL LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXT CLASSIFICATION
TUNING TRADITIONAL LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXT CLASSIFICATION
 
Tuning Traditional Language Processing Approaches for Pashto Text Classification
Tuning Traditional Language Processing Approaches for Pashto Text ClassificationTuning Traditional Language Processing Approaches for Pashto Text Classification
Tuning Traditional Language Processing Approaches for Pashto Text Classification
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
E-text in EFL - Four flavours
E-text in EFL - Four flavoursE-text in EFL - Four flavours
E-text in EFL - Four flavours
 
Applications of CL to FLT
Applications of CL to FLTApplications of CL to FLT
Applications of CL to FLT
 
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...
 
Body-Part Nouns and Whole-Part Relations in Portuguese
Body-Part Nouns and Whole-Part Relations in PortugueseBody-Part Nouns and Whole-Part Relations in Portuguese
Body-Part Nouns and Whole-Part Relations in Portuguese
 
The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...
 
Plurilingualism and intercomprehension teaching forum 2012
Plurilingualism and intercomprehension teaching forum 2012Plurilingualism and intercomprehension teaching forum 2012
Plurilingualism and intercomprehension teaching forum 2012
 
Developing corpus-based resources for language learning: looking back in "hope"
Developing corpus-based resources for language learning: looking back in "hope"Developing corpus-based resources for language learning: looking back in "hope"
Developing corpus-based resources for language learning: looking back in "hope"
 
Ivan Derganskyi
Ivan DerganskyiIvan Derganskyi
Ivan Derganskyi
 
1 computational linguistics an introduction
1 computational linguistics   an introduction1 computational linguistics   an introduction
1 computational linguistics an introduction
 

More from Valeria de Paiva

Dialectica Categorical Constructions
Dialectica Categorical ConstructionsDialectica Categorical Constructions
Dialectica Categorical ConstructionsValeria de Paiva
 
Logic & Representation 2021
Logic & Representation 2021Logic & Representation 2021
Logic & Representation 2021Valeria de Paiva
 
Constructive Modal and Linear Logics
Constructive Modal and Linear LogicsConstructive Modal and Linear Logics
Constructive Modal and Linear LogicsValeria de Paiva
 
Dialectica Categories Revisited
Dialectica Categories RevisitedDialectica Categories Revisited
Dialectica Categories RevisitedValeria de Paiva
 
Networked Mathematics: NLP tools for Better Science
Networked Mathematics: NLP tools for Better ScienceNetworked Mathematics: NLP tools for Better Science
Networked Mathematics: NLP tools for Better ScienceValeria de Paiva
 
Going Without: a modality and its role
Going Without: a modality and its roleGoing Without: a modality and its role
Going Without: a modality and its roleValeria de Paiva
 
Problemas de Kolmogorov-Veloso
Problemas de Kolmogorov-VelosoProblemas de Kolmogorov-Veloso
Problemas de Kolmogorov-VelosoValeria de Paiva
 
Natural Language Inference: for Humans and Machines
Natural Language Inference: for Humans and MachinesNatural Language Inference: for Humans and Machines
Natural Language Inference: for Humans and MachinesValeria de Paiva
 
The importance of Being Erneast: Open datasets in Portuguese
The importance of Being Erneast: Open datasets in PortugueseThe importance of Being Erneast: Open datasets in Portuguese
The importance of Being Erneast: Open datasets in PortugueseValeria de Paiva
 
Negation in the Ecumenical System
Negation in the Ecumenical SystemNegation in the Ecumenical System
Negation in the Ecumenical SystemValeria de Paiva
 
Constructive Modal and Linear Logics
Constructive Modal and Linear LogicsConstructive Modal and Linear Logics
Constructive Modal and Linear LogicsValeria de Paiva
 
Semantics and Reasoning for NLP, AI and ACT
Semantics and Reasoning for NLP, AI and ACTSemantics and Reasoning for NLP, AI and ACT
Semantics and Reasoning for NLP, AI and ACTValeria de Paiva
 
Categorical Explicit Substitutions
Categorical Explicit SubstitutionsCategorical Explicit Substitutions
Categorical Explicit SubstitutionsValeria de Paiva
 
Logic and Probabilistic Methods for Dialog
Logic and Probabilistic Methods for DialogLogic and Probabilistic Methods for Dialog
Logic and Probabilistic Methods for DialogValeria de Paiva
 
Intuitive Semantics for Full Intuitionistic Linear Logic (2014)
Intuitive Semantics for Full Intuitionistic Linear Logic (2014)Intuitive Semantics for Full Intuitionistic Linear Logic (2014)
Intuitive Semantics for Full Intuitionistic Linear Logic (2014)Valeria de Paiva
 

More from Valeria de Paiva (20)

Dialectica Comonoids
Dialectica ComonoidsDialectica Comonoids
Dialectica Comonoids
 
Dialectica Categorical Constructions
Dialectica Categorical ConstructionsDialectica Categorical Constructions
Dialectica Categorical Constructions
 
Logic & Representation 2021
Logic & Representation 2021Logic & Representation 2021
Logic & Representation 2021
 
Constructive Modal and Linear Logics
Constructive Modal and Linear LogicsConstructive Modal and Linear Logics
Constructive Modal and Linear Logics
 
Dialectica Categories Revisited
Dialectica Categories RevisitedDialectica Categories Revisited
Dialectica Categories Revisited
 
PLN para Tod@s
PLN para Tod@sPLN para Tod@s
PLN para Tod@s
 
Networked Mathematics: NLP tools for Better Science
Networked Mathematics: NLP tools for Better ScienceNetworked Mathematics: NLP tools for Better Science
Networked Mathematics: NLP tools for Better Science
 
Going Without: a modality and its role
Going Without: a modality and its roleGoing Without: a modality and its role
Going Without: a modality and its role
 
Problemas de Kolmogorov-Veloso
Problemas de Kolmogorov-VelosoProblemas de Kolmogorov-Veloso
Problemas de Kolmogorov-Veloso
 
Natural Language Inference: for Humans and Machines
Natural Language Inference: for Humans and MachinesNatural Language Inference: for Humans and Machines
Natural Language Inference: for Humans and Machines
 
Dialectica Petri Nets
Dialectica Petri NetsDialectica Petri Nets
Dialectica Petri Nets
 
The importance of Being Erneast: Open datasets in Portuguese
The importance of Being Erneast: Open datasets in PortugueseThe importance of Being Erneast: Open datasets in Portuguese
The importance of Being Erneast: Open datasets in Portuguese
 
Negation in the Ecumenical System
Negation in the Ecumenical SystemNegation in the Ecumenical System
Negation in the Ecumenical System
 
Constructive Modal and Linear Logics
Constructive Modal and Linear LogicsConstructive Modal and Linear Logics
Constructive Modal and Linear Logics
 
Semantics and Reasoning for NLP, AI and ACT
Semantics and Reasoning for NLP, AI and ACTSemantics and Reasoning for NLP, AI and ACT
Semantics and Reasoning for NLP, AI and ACT
 
NLCS 2013 opening slides
NLCS 2013 opening slidesNLCS 2013 opening slides
NLCS 2013 opening slides
 
Dialectica Comonads
Dialectica ComonadsDialectica Comonads
Dialectica Comonads
 
Categorical Explicit Substitutions
Categorical Explicit SubstitutionsCategorical Explicit Substitutions
Categorical Explicit Substitutions
 
Logic and Probabilistic Methods for Dialog
Logic and Probabilistic Methods for DialogLogic and Probabilistic Methods for Dialog
Logic and Probabilistic Methods for Dialog
 
Intuitive Semantics for Full Intuitionistic Linear Logic (2014)
Intuitive Semantics for Full Intuitionistic Linear Logic (2014)Intuitive Semantics for Full Intuitionistic Linear Logic (2014)
Intuitive Semantics for Full Intuitionistic Linear Logic (2014)
 

Recently uploaded

New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 

Recently uploaded (20)

New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 

Portuguese NLP Tools and Resources for CPDOC/FGV

  • 1. PORTUGUESE NLP FOR CPDOC/FGV? PARGRAM/PARSEM PROJECT ANNOUNCEMENT Valeria de Paiva October 2010
  • 2. Fundacao Getulio Vargas (FGV) “Fundação Getulio Vargas (FGV) is a Brazilian higher education institution founded in December 20, 1944. It offers regular courses of Economics, Business Administration, Law, Social Sciences and Information technology management. Its original goal was to train people for the country's public- and private- sector management.[…] It is considered by Foreign Policy magazine to be a top-5 "policymaker think-tank" worldwide.”
  • 3. CPDOC Centro de Pesquisa e Documentação de História Contemporânea do Brasil (Contemporary Brazilian History Research and Documentation Center) is a private higher education institution founded in 1973 linked to the FGV.
  • 4. CPDOC Originally: To house personal archives of public figures Develop historic research using privileged archive Documentation & Research
  • 5. CPDOC Now: Graduate program on History, Political Science and Cultural Artifacts (since 1974) School of Social Sciences and History of the FGV (since 2005)
  • 6. CPDOC   two centers Rio/SP   Research:   Núcleo de Pesquisa Social Aplicada   Centro de Relações Internacionais   Archives:   Programa de Arquivos Pessoais (PAP) program of personal archives   Programa de História Oral (PHO) program of oral history
  • 7. FGV School of Applied Math   In the process of being created…   Experts on image processing, signal/ sound processing   Not much on textual processing   Possibility of impact…
  • 8. Portuguese in PARGRAM/PARSEM?   Alexandre Rademaker, Ph D, PUC-Rio, Mar2010   Work on Automating Description Logics   A Bridge-like system for Portuguese?
  • 9. Bridge Layered Architecture XLE/LFG Pars ing Inference Text KR Mapping Engine F-structure Transfer semantics AKR Sources Assertions Question Query Unified Lexicon LFG Term rewriting ECD XLE KR mapping rules Textual Inference MaxEnt models Factives,Deverbals logics Basic idea: canonicalization of meanings
  • 10. Portuguese NLP?   Lots of Homework to do…   STIL (Simpósio de Tecnologia da Informação e Linguagem Humana) 8th ed   PROPOR (bienally, 9th edition)   Linguistica de corpus (7a edition)   Um panorama do Núcleo Interinstitucional de Linguística Computacional às vésperas de sua maioridade Maria das Graças V. Nunes, Sandra M. Aluisio, Thiago A. S. Pardo NILC – ICMC – Universidade de São Paulo São Carlos – SP, Brasil Junho de 2010   Pardo, T.A.S.; Gasperin, C.V.; Caseli, H.M.; Nunes. M.G.V. (2010). Computational Linguistics in Brazil: An Overview. In the Proceedings of the NAACL-HLT Young Investigators Workshop on Computational Approaches to Languages of the Americas, pp. 1-7. June 1-6, Los Angeles, CA/USA. Pdf   SBC special interest group in NLP, 2007
  • 11. Challenges from STIL09 survey   Lack of large and robust language resources   Lack of formal models for linguistic description and analysis of Portuguese   Difficulty in attracting students and researchers   Lack of multidisciplinary collaboration   CL/NLP marginalization in both Computer Science and Linguistics.   Poor interaction between universities and industry   Insufficient funding.
  • 12. NILC’s view 1   Recently more robust semantic tools like TeP 2.016, Wordnet.PT17, and MWN.PT18, as well as named entities recognizers, e.g., REMBRANDT19.   Topics: text summarization, machine translation, text simplification, automatic discourse analysis, coreference and anaphora resolution, information retrieval, text mining, terminology/lexicon research, ontologies and semantic tagging, and corpus linguistics.   The largest CL/NLP research group in Brazil is NILC (Interinstitutional Center for Research and Development in Computational Linguistics) USP, UFSCar and UNESP
  • 13. NILC’s view 2?   Portuguese has got state of the art tools (as POS taggers and syntactic parsers) and comprehensive corpora of contemporary written language BUT   still needs resources for particular applications   Portuguese syntactic parsers (which are considered basic NLP tools) and wordnet-like resources are still too limited.   CHARLA 2008 ?   contests/conferences such as TAC33, Senseval/ SemEval34, and TREC35, among others, might make Portuguese/Spanish datasets available, as CLEF36 has done.
  • 14. Some detail on CPDOC data   Brazilian Biographic-Historic Dictionary (Dicionario Historico-Biografico Brasileiro DHBB)   First edition (printed) 1984:   4.400 entries / 4 volumes.   Second edition (printed and cd-rom) 2001:   6.626 entries / 5 volumes or cd-rom.   Third edition (online) 2010:   7.553 entries / online
  • 15. CPDOC Online Portal   Since 2009, search tool to access CPDOC’s archives   Documents of personal archives, photos, interviews from the Oral History Project and entries of the DHBB.   http://www.fgv.br/cpdoc/busca
  • 16. User Profile   Undergrads 53,68%   Grad studies (lato sensu) 13,18%   Masters 11,50%   High school 11,07%   Doctoral cand 5,72%
  • 17. User Profile/Goal of research   Artigo acadêmico 17,97%   Curiosidade histórica 14,61%   Monografia (graduação) 11,74%   Dissertação (mestrado) 11,39%   Pesquisa escolar 8,52%   Tese (doutorado) 7,57% Filme/vídeo (3,13 %) aparecem em penúltimo lugar, e matéria de imprensa (2,29%) em último.
  • 18. User Profile - media   Kind of media accessed:   audiovisual 38,21%   textual 33,51%   printed matter 16,33%   all of the above 11,95%
  • 21. Revista Estudos Históricos   Since 1988, half yearly   Multidisciplinar scientific magazine   45 editions so far 2010   All articles available online   http://virtualbib.fgv.br/ojs/index.php/reh/index
  • 22. ThToanks! Thanks!