Quran and Text-Fabric
1. Data Analysis for Ancient Corpora
applied to the Quran
Dirk Roorda
and
Cornelis van Lit
Filosofie en Religiewetenschap, Utrecht, 2019-03-28
[Figure: bar chart "Parts of Speech after Atnach in ETCBC Phrase"; categories: conj, nmpr, subs, adjv, prep, art; axis from 0 to 250]
2. A. reasons
B. a solution
C. toy example of a TF datasource
D. ministudy: rings and sentiments
C'. an easter egg
B'. new ways
A'. new horizons
3.
4. • researchers in control of their own data
• researchers empowered to fully harness the data available to them
• researchers encouraged to DIY computing
5. A. reasons
B. a solution
C. toy example of a TF datasource
D. ministudy: rings and sentiments
C'. an easter egg
B'. new ways
A'. new horizons
6. Data model
• Graph model: words, phrases, etc. are “nodes,” relationships between them are “edges.”
• Graphs model complex data structures better than other methods (e.g. XML).
• All stored in easy-to-understand, plain-text files. No messy XML, SQL, etc.
• ... and we call him Text-Fabric (TF)
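As a rough illustration of the graph model (not Text-Fabric's own data structures; every name below is made up for the example), nodes can be plain integers, with features and edges kept as separate mappings:

```python
# Hypothetical miniature of a text graph: nodes are integers,
# a node feature maps nodes to values, an edge feature maps nodes to nodes.
node_type = {1: "word", 2: "word", 3: "word", 4: "phrase"}
letters = {1: "patterns", 2: "of", 3: "nothing"}  # node feature on word nodes
contains = {4: [1, 2, 3]}                         # edge feature: phrase -> words


def text_of(node):
    """Return the text of a word node, or of all words a larger node contains."""
    if node_type[node] == "word":
        return letters[node]
    return " ".join(letters[w] for w in contains[node])


print(text_of(4))  # patterns of nothing
```

Adding another annotation layer then means adding another mapping, without touching the existing ones.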
7. Data structure of TF - the IKEA spirit
node
order! order!
stacks of components
uniquely identified
words
phrases
chapters
verses
8. A. reasons
B. a solution
C. toy example of a TF datasource
D. ministudy: rings and sentiments
C'. an easter egg
B'. new ways
A'. new horizons
9. # Consider Phlebas
$ author=Iain M. Banks
## 1
Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of patterns of nothing;
that’s the bottom line, the final truth.
So where we find we have any control over those patterns,
why not make the most elegant ones, the most enjoyable and good ones,
in our own terms?
## 2
Besides,
it left the humans in the Culture free to take care of the things that really mattered in life,
such as [sports, games, romance,] studying dead languages,
barbarian societies and impossible problems,
and climbing high mountains without the aid of a safety harness.
10. @node
@compiler=Dirk Roorda
@description=the letters of a word
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
Everything
about
us
everything
around
us
everything
we
know
and
can
know
of
is
composed
ultimately
of
patterns
of
nothing
that’s
the
bottom
line
the
final
truth
So
letters
@node
@compiler=Dirk Roorda
@description=the punctuation after a word
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
3 ,
6 ,
20 ;
24 ,
27 .
38 ,
45 ,
51 ,
55 ?
,
75 ,
78 ,
,
,
83 ,
88 ,
99 .
punc
banks/tf/
author.tf
gap.tf
letters.tf
number.tf
oslots.tf
otext.tf
otype.tf
punc.tf
terminator.tf
title.tf
TF dataset
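The punc listing above mixes lines with and without a node number: a line without one belongs to the node right after the previous line's node (so the bare "," after "55 ?" is node 56, the comma after "Besides"). A minimal sketch of reading such a file, assuming tab-separated node and value and a blank line between header and data (both are assumptions, since the slide text lost its whitespace); this is not Text-Fabric's own loader, and `parse_node_feature` is a made-up name:

```python
def parse_node_feature(text):
    """Parse a node feature file into (metadata, {node: value}).

    Assumed layout: '@' header lines, one blank line, then data lines that
    are either 'node<TAB>value' or just 'value' (meaning: previous node + 1).
    """
    lines = text.split("\n")
    meta = {}
    i = 0
    while i < len(lines) and lines[i].startswith("@"):
        key, _, val = lines[i][1:].partition("=")
        meta[key] = val
        i += 1
    if i < len(lines) and lines[i] == "":
        i += 1  # skip the blank line that closes the header
    values = {}
    node = 0
    for line in lines[i:]:
        if line == "":
            continue  # sketch only: ignore empty lines such as the trailing one
        if "\t" in line:
            spec, value = line.split("\t", 1)
            node = int(spec)
        else:
            value = line
            node += 1
        values[node] = value
    return meta, values


punc_tf = (
    "@node\n@description=the punctuation after a word\n@valueType=str\n\n"
    "3\t,\n6\t,\n20\t;\n24\t,\n27\t.\n38\t,\n45\t,\n51\t,\n55\t?\n,\n"
    "75\t,\n78\t,\n,\n,\n83\t,\n88\t,\n99\t.\n"
)
meta, punc = parse_node_feature(punc_tf)
print(punc[55], punc[56], punc[79], punc[80])  # ? , , ,
```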
11. otype
@node
@compiler=Dirk Roorda
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
1-99 word
100 book
101-102 chapter
103-114 line
115-117 sentence
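The otype data assigns a type to whole ranges of nodes at once. A small sketch of expanding those ranges into a node-to-type table, assuming a tab between range and type; `expand_spec` is a hypothetical helper, not part of the Text-Fabric API:

```python
from collections import Counter


def expand_spec(spec):
    """Expand a node specification such as '101-102' or '100' into a list of ints."""
    nodes = []
    for part in spec.split(","):
        first, _, last = part.partition("-")
        nodes.extend(range(int(first), int(last or first) + 1))
    return nodes


otype_lines = [
    "1-99\tword",
    "100\tbook",
    "101-102\tchapter",
    "103-114\tline",
    "115-117\tsentence",
]
otype = {}
for line in otype_lines:
    spec, node_type = line.split("\t")
    for n in expand_spec(spec):
        otype[n] = node_type

counts = Counter(otype.values())
print(counts["word"], counts["line"], counts["sentence"])  # 99 12 3
```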
12. oslots
@edge
@compiler=Dirk Roorda
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
100 1-99
1-55
56-99
1-3
4-6
7-9,14-20
21-27
28-38
39-51
52-55
56
57-75
76-77,81-83
84-88
89-99
1-27
28-55
56-99
1-99 word
100 book
101-102 chapter
103-114 line
115-117 sentence
## 1
Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of patterns of nothing;
that’s the bottom line, the final truth.
So where we find we have any control over those patterns,
why not make the most elegant ones, the most enjoyable and good ones,
in our own terms?
## 2
Besides,
it left the humans in the Culture free to take care of the things that really mattered in life,
such as [sports, games, romance,] studying dead languages,
barbarian societies and impossible problems,
and climbing high mountains without the aid of a safety harness.
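Read together with otype, the oslots data says which words each larger node occupies. Node 105, the third line of chapter 1, has slots 7-9,14-20: the bracketed words 10-13 are skipped. A sketch of that lookup, assuming tab-separated fields and the same "no node number means next node" convention as in the other feature files; `slots_of` is a hypothetical helper:

```python
def slots_of(spec):
    """Expand a slot specification like '7-9,14-20' into a list of slot numbers."""
    slots = []
    for part in spec.split(","):
        first, _, last = part.partition("-")
        slots.extend(range(int(first), int(last or first) + 1))
    return slots


# Data lines of oslots as on the slide: only the first line names its node (100),
# each following line belongs to the next node (101, 102, ...).
oslots_lines = [
    "100\t1-99", "1-55", "56-99",
    "1-3", "4-6", "7-9,14-20", "21-27", "28-38", "39-51", "52-55",
    "56", "57-75", "76-77,81-83", "84-88", "89-99",
    "1-27", "28-55", "56-99",
]
oslots = {}
node = 0
for line in oslots_lines:
    if "\t" in line:
        spec, slot_spec = line.split("\t")
        node = int(spec)
    else:
        slot_spec = line
        node += 1
    oslots[node] = slots_of(slot_spec)

# The first 20 words of the text; word node n carries words[n - 1].
words = (
    "Everything about us everything around us everything we know "
    "and can know of is composed ultimately of patterns of nothing"
).split()

line_105 = " ".join(words[s - 1] for s in oslots[105])
print(line_105)  # everything we know is composed ultimately of patterns of nothing
```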
33. A. reasons
B. a solution
C. toy example of a TF datasource
D. ministudy: rings and sentiments
C'. an easter egg
B'. new ways
A'. new horizons
34. Sharing and re-using data
Text-Fabric was developed by a DANS employee.
As a consequence:
Data export is built in ✅
Provenance tracking is built in ✅
Redistribution of newly created data is built in ✅
35. sharing #1: GitHub & NBviewer
Work done in a Jupyter Notebook inside a GitHub repository is easy to share.
39. sharing #4: Create new features
https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/quran/share.ipynb
• etcbc/valence/tf : the results of the verbal valence work of Janet Dyk in the SYNVAR project;
• etcbc/lingo/heads/tf : head words for phrases, work done by Cody Kingham;
• ch-jensen/Semantic-mapping-of-participants/actor/tf : participant analysis in progress by Christian Høygaard-Jensen;
• cmerwich/bh-reference-system/tf: participant analysis in progress by Christiaan Erwich;
• nino-cunei/oldbabylonian/parallels/tf: similar lines by Dirk Roorda
• q-ran/quran/parallels/tf: similar lines by Dirk Roorda
• q-ran/exercises/mining/tf: sentiments (crude) by Dirk Roorda
• you/quran/sentiments/tf: sentiments (refined) by You
• cvlit/quran/semantics/tf: semantic fields by cvlit
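Each entry in the list above is a directory of feature files that sits next to the main corpus. Because a feature is plain text with its provenance in the header, a new one is cheap to produce and pass around. The sketch below only illustrates that idea: the header keys mirror the ones shown on the earlier slides, the scores are invented, the tab and blank-line layout is an assumption, and `feature_text` is a hypothetical helper. In practice Text-Fabric's own export would be used to write the file.

```python
from datetime import datetime, timezone


def feature_text(metadata, values):
    """Serialise a node feature: '@' header lines, a blank line,
    then one 'node<TAB>value' line per node, in node order."""
    header = ["@node"] + [f"@{key}={val}" for key, val in metadata.items()]
    data = [f"{node}\t{val}" for node, val in sorted(values.items())]
    return "\n".join(header + [""] + data) + "\n"


metadata = {
    "description": "sentiment score per word (made-up values)",
    "valueType": "int",
    "dateWritten": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
}
sentiment = {3: 1, 1: 0, 2: -1}  # hypothetical node -> score

text = feature_text(metadata, sentiment)
print(text)
```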
40. The Text-Fabric Ethos
• Open source tool for corpus annotation and analysis.
• Corpus data in a repository, with standard license, as free as possible
• Researchers: step out of your technological comfort zones and pave the way for the ones after you
• Find computational inspiration across disciplines