Acquiring, Cleaning, and
Storing Data
Digital Humanities, Lesson Two
Josef Šlerka, Studia nových médií (New Media Studies), 15 October 2012
ETL (light version)
Extracting data from outside sources
Transforming it to fit operational needs (which can
include quality levels)
Loading it into the end target (database, more
specifically, operational data store, data mart or data
warehouse)
(see Wikipedia)
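The three ETL steps can be sketched in a few lines of Python. This is only a minimal illustration, assuming a small CSV string as the outside source and an in-memory SQLite table as the end target; the sample data and the names `RAW_CSV`, `extract`, `transform`, `load`, and `authors` are hypothetical and not part of the lecture.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for an outside source.
RAW_CSV = """name,year
 Karel Čapek ,1890
Božena Němcová,1820
"""


def extract(text):
    """Extract: read rows from the outside source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim stray whitespace and convert the year to an integer."""
    return [(row["name"].strip(), int(row["year"])) for row in rows]


def load(records, connection):
    """Load: write the cleaned records into the end target (a SQLite table)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS authors (name TEXT, year INTEGER)"
    )
    connection.executemany("INSERT INTO authors VALUES (?, ?)", records)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
rows = connection.execute(
    "SELECT name, year FROM authors ORDER BY year"
).fetchall()
print(rows)
```

Keeping each step in its own function means a different source or target can be swapped in without touching the other two steps.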
Real life according to Wikipedia
1. Cycle initiation
2. Build reference data
3. Extract (from sources)
4. Validate
5. Transform (clean, apply business rules, check for
data integrity, create aggregates or disaggregates)
6. Stage (load into staging tables, if used)
Real life according to Wikipedia (continued)

7. Audit reports (for example, on compliance with
business rules. Also, in case of failure, helps to
diagnose/repair)
8. Publish (to target tables)
9. Archive
10. Clean up
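A few of these steps (validate, transform, stage, audit) can be sketched as follows. This is a simplified illustration under assumed rules, not a full ETL cycle: the rules "a title must be present" and "the year must be a number", the sample rows, and the names `validate` and `run_cycle` are all hypothetical.

```python
def validate(row):
    """Validate: return a list of problems found in one extracted row."""
    problems = []
    if not row.get("title", "").strip():
        problems.append("missing title")
    if not str(row.get("year", "")).isdigit():
        problems.append("year is not a number")
    return problems


def run_cycle(extracted):
    """Stage the valid rows and collect an audit report for the rest."""
    staged, audit = [], []
    for index, row in enumerate(extracted):
        problems = validate(row)
        if problems:
            # Audit: record which row failed and why, to help diagnose it later.
            audit.append((index, problems))
        else:
            # Transform and stage: clean the row and keep it for publishing.
            staged.append({"title": row["title"].strip(), "year": int(row["year"])})
    return staged, audit


# Hypothetical extracted rows.
extracted = [
    {"title": "R.U.R.", "year": "1920"},
    {"title": "", "year": "1921"},
    {"title": "Babička", "year": "c. 1855"},
]
staged, audit = run_cycle(extracted)
print(staged)
print(audit)
```

Only the staged rows would move on to the publish step; the audit list is what a person would read to repair the rejected rows.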
Extracting
what will come in handy...
Extract
structured vs. unstructured data
for DH, most often databases vs. the web
web API vs. scraping
you can get by with only a little knowledge
static vs. real-time data can be tricky, but it can be
dealt with
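The difference between a web API and scraping can be illustrated with two small functions that recover the same titles. This is a sketch only: the JSON response, the HTML page, and the `class="title"` markup are hypothetical samples embedded as strings, so nothing is actually fetched from the network.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response: the data already arrives structured.
API_RESPONSE = '{"items": [{"title": "First post"}, {"title": "Second post"}]}'

# Hypothetical web page: the same data is buried in presentation markup.
PAGE_HTML = """
<html><body>
  <h2 class="title">First post</h2>
  <p>Some text</p>
  <h2 class="title">Second post</h2>
</body></html>
"""


def titles_from_api(text):
    """Web API: parse the JSON and read the fields directly."""
    return [item["title"] for item in json.loads(text)["items"]]


class TitleScraper(HTMLParser):
    """Scraping: walk the HTML and collect text inside <h2 class="title">."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False


def titles_from_html(text):
    scraper = TitleScraper()
    scraper.feed(text)
    return scraper.titles


print(titles_from_api(API_RESPONSE))
print(titles_from_html(PAGE_HTML))
```

The scraper depends on the page's markup, so it is likely to break when the page layout changes, whereas the API version depends only on the documented field names.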
XPATH

XPath, the XML Path Language, is a query language
for selecting nodes from an XML document. In
addition, XPath may be used to compute values (e.g.,
strings, numbers, or Boolean values) from the content
of an XML document. XPath was defined by the World
Wide Web Consortium (W3C).
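Selecting nodes with XPath can be tried out with Python's standard library. Note that `xml.etree.ElementTree` supports only a limited subset of XPath (node selection with simple predicates); computing values with XPath functions needs a fuller implementation such as the third-party lxml library. The XML document below is a hypothetical sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
XML = """<library>
  <book lang="cs"><title>R.U.R.</title><year>1920</year></book>
  <book lang="en"><title>Ulysses</title><year>1922</year></book>
  <book lang="cs"><title>Babička</title><year>1855</year></book>
</library>"""

root = ET.fromstring(XML)

# Select the <title> of every <book> whose lang attribute equals "cs".
czech_titles = [t.text for t in root.findall(".//book[@lang='cs']/title")]

# Select the <year> of the second <book> child (XPath positions start at 1).
second_book_year = root.find("./book[2]/year").text

print(czech_titles)
print(second_book_year)
```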
Simple tools
Google Docs (mainly static data)
http://drive.google.com
YQL (mainly static data)
http://developer.yahoo.com/yql/console/
Yahoo Pipes (mainly dynamic data)
http://pipes.yahoo.com/pipes/
IFTTT (mainly dynamic data)
https://ifttt.com/
But powerful...
Twitter Archiving Google Spreadsheet TAGS v3
http://mashe.hawksey.info/2012/01/twitter-archive-tagsv3/
Transforming
Mainly about cleaning and unifying data...
Google Refine
http://code.google.com/p/google-refine/downloads/list?can=1
Google Refine is a standalone desktop application
provided by Google for data cleanup and
transformation to other formats. It is similar to
spreadsheet applications (and can work with
spreadsheet file formats), but it behaves more like a
database.
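The kind of cleanup and unification such a tool performs can be illustrated with a small key-based clustering sketch in Python. It is loosely inspired by the "fingerprint" keying idea used for clustering similar values, but it is a simplified assumption-laden illustration, not Refine's actual implementation; the function names and sample values are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key so that variant spellings collide."""
    # Split accented letters into base letter + combining mark, drop the marks.
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Trim, lowercase, and remove punctuation.
    cleaned = re.sub(r"[^\w\s]", "", unaccented.strip().lower())
    # Sort the unique tokens so word order no longer matters.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group the original values by their fingerprint key."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)


# Hypothetical messy values.
names = ["Karel Čapek", "Čapek, Karel", "karel capek ", "Božena Němcová"]
clusters = cluster(names)
print(clusters)
```

Each cluster with more than one member is a candidate for merging into a single canonical spelling; a person should still review the groups, because distinct values can occasionally share a key.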
Loading
where to put the data, if not into a traditional database...
Google Fusion Tables
a simple solution for ordinary users
http://www.google.com/fusiontables/Home/
Web service provided by Google for data
management. Data is stored in multiple tables that
Internet users can view and download. The Web
service provides means for visualizing data with pie
charts, bar charts, line plots, scatter plots, timelines as
well as geographical maps. Data is exported in a
comma-separated values file format.
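Since comma-separated values are the exchange format here, a table-like set of records can be prepared for such a service with Python's `csv` module. This is a minimal sketch: the records, the column names, and the `to_csv` helper are hypothetical, and the upload to the service itself is not shown.

```python
import csv
import io

# Hypothetical cleaned records ready to be loaded.
records = [
    {"place": "Praha", "lat": 50.0755, "lon": 14.4378},
    {"place": "Brno", "lat": 49.1951, "lon": 16.6068},
]


def to_csv(rows, fieldnames):
    """Serialize a list of dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(records, ["place", "lat", "lon"])
print(csv_text)

# Reading the text back shows that the table structure survives the round trip.
round_trip = list(csv.DictReader(io.StringIO(csv_text)))
```

Note that CSV carries no type information, so numbers come back as strings after the round trip and must be converted again on import.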
And now one more
demo...
