Project IDI
David I Widjaja
Steps
• Data Extraction
• Tagging
• Correlation
• Web Scraping
• Comparison
• Documentation
Data Extraction
• How to get the data?
  • Input from database
  • Input manually
• Data type:
  • Topics, stored as strings
Tagging
• Prerequisites:
  • Topic sentences (subject)
  • Dictionary (tags)
Dictionary
• How to create tags:
1. Collect all topic sentences and split them on whitespace
2. Convert all words to lowercase
3. Delete all numeric values and duplicates
4. Sort the words alphabetically
5. Delete unnecessary words (e.g. is, the, and)
6. Find synonyms and cluster them into a single tag
7. Translate words if necessary
8. Insert the tags into the main spreadsheet
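The tag-creation steps above can be sketched in Python. This is a minimal illustration, not the project's actual procedure (the slides describe a spreadsheet workflow). The stop-word list, the synonym map, the punctuation trimming, and the function name `build_tags` are all assumptions made for the example. Translation (step 7) and the spreadsheet insert (step 8) are left out.

```python
import re

# Hypothetical stop-word list; the slides only name "is", "the", "and" as examples.
STOP_WORDS = {"is", "the", "and", "a", "of", "to", "in"}

# Hypothetical synonym map: each variant points to the single tag it is clustered under.
SYNONYMS = {"cars": "car", "automobile": "car"}


def build_tags(topic_sentences, stop_words=STOP_WORDS, synonyms=SYNONYMS):
    """Return a sorted list of unique tags derived from the topic sentences."""
    tags = set()
    for sentence in topic_sentences:
        for word in sentence.split():              # step 1: split on whitespace
            word = word.lower()                    # step 2: lowercase
            word = word.strip(".,;:!?()\"'")       # assumption: also trim punctuation
            if not word:
                continue
            if re.fullmatch(r"\d+(?:[.,]\d+)*", word):
                continue                           # step 3: drop numeric values
            if word in stop_words:
                continue                           # step 5: drop unnecessary words
            tags.add(synonyms.get(word, word))     # step 6; the set drops duplicates (step 3)
    return sorted(tags)                            # step 4: alphabetical order
```

For example, `build_tags(["The Car is fast", "Cars and automobile prices 2013"])` yields the three tags `car`, `fast`, and `prices`.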
Correlation
• A weighted graph map is used:
  • The more words associated with a tag, the bigger its bubble.
  • Lines get thicker as the number of relationships between topics increases.
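One way to compute the two weights behind such a map is sketched below. The slides do not define them precisely, so the interpretation is an assumption: the bubble size of a tag is the number of word occurrences mapped to it, and the line thickness between two tags is the number of topic sentences in which both appear. The function name `graph_weights` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def graph_weights(tagged_topics):
    """Compute bubble and line weights.

    tagged_topics: a list of tag lists, one list per topic sentence.
    Returns (node_weights, edge_weights) as Counters.
    """
    node_weights = Counter()
    edge_weights = Counter()
    for tags in tagged_topics:
        # Every word occurrence counts toward the bubble of its tag.
        node_weights.update(tags)
        # Each unordered pair of distinct tags counts once per topic sentence.
        for pair in combinations(sorted(set(tags)), 2):
            edge_weights[pair] += 1
    return node_weights, edge_weights
```

The resulting counters can be passed to any graph-drawing tool, with `node_weights` scaling the bubbles and `edge_weights` scaling the line widths.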
Web Scraping
• Scrape other, similar websites
• Put the topic sentences in the subject column. Examples:
  • Article titles
  • Comments
  • Etc.
• Copy them into the previous spreadsheet (the one with the previous tags).
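Extracting article titles from a saved HTML page could look like the sketch below, which uses only the Python standard library. It assumes the titles sit inside `<h2>` elements. The real pattern depends on the site and has to be found by inspecting the page. The names `TitleExtractor` and `extract_titles` are hypothetical.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text inside every element with the given tag name."""

    def __init__(self, tag="h2"):
        super().__init__()
        self.tag = tag            # assumption: article titles live in <h2> elements
        self.titles = []
        self._inside = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._inside:
            # Join the text pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.titles.append(text)
            self._inside = False


def extract_titles(html, tag="h2"):
    """Return the list of title strings found in an HTML document."""
    parser = TitleExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.titles
```

Each returned string is one topic sentence that can go into the subject column of the spreadsheet.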
Correlation
• Repeat the same process as before to build a weighted graph map for the scraped data
Comparison
• Compare the two weighted graph maps
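The slides do not say how the two maps are compared. One possible approach, sketched here as an assumption, is to put the bubble weights of both maps side by side as shares of their totals, so that data sets of different sizes stay comparable. The function name `compare_weights` and the input shape (a mapping from tag to weight) are hypothetical.

```python
def compare_weights(own, scraped):
    """Return {tag: (own_share, scraped_share)} for every tag in either map.

    own, scraped: mappings from tag to bubble weight.
    """
    # Guard against empty maps so the division below never fails.
    own_total = sum(own.values()) or 1
    scraped_total = sum(scraped.values()) or 1
    return {
        tag: (own.get(tag, 0) / own_total, scraped.get(tag, 0) / scraped_total)
        for tag in sorted(set(own) | set(scraped))
    }
```

Tags whose two shares differ strongly, or that appear in only one map, are the ones worth a closer look.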
Word Cloud
• Generate a word cloud using Python or online tools.
Tools
• Microsoft Excel 2013 (spreadsheet)
• Mozilla Firefox (browser)
• Inspect Element (search patterns)
• DownThemAll (download HTML files)
• Total Commander (merge HTML files)
• Notepad++ (cleanse data)
