3. EXAMPLE OF STRUCTURED DATA
<bug>
<bug_id>45411</bug_id>
<creation_ts>2000-07-13 13:46:00 -0700</creation_ts>
<short_desc>Drag, hover over tab should open tab</short_desc>
<delta_ts>2009-12-04 13:03:48 -0800</delta_ts>
<reporter_accessible>1</reporter_accessible>
<cclist_accessible>1</cclist_accessible>
<classification_id>2</classification_id>
<classification>Client Software</classification>
<product>SeaMonkey</product>
<component>Tabbed Browser</component>
<version>Trunk</version>
<rep_platform>All</rep_platform>
<op_sys>All</op_sys>
<bug_status>RESOLVED</bug_status>
<resolution>WONTFIX</resolution>
<priority>--</priority>
<bug_severity>enhancement</bug_severity>
<target_milestone>---</target_milestone>
<blocked>121292</blocked>
...
</bug>
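Because every field in this record has an explicit tag, a few lines of code are enough to read it. The sketch below is illustrative only: it uses Python's standard `xml.etree.ElementTree` on a shortened copy of the record above, and the helper name `bug_fields` is hypothetical, not part of any bug-tracker API.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bug record shown above (the elided "..." part is left out).
BUG_XML = """<bug>
<bug_id>45411</bug_id>
<short_desc>Drag, hover over tab should open tab</short_desc>
<product>SeaMonkey</product>
<component>Tabbed Browser</component>
<bug_status>RESOLVED</bug_status>
<resolution>WONTFIX</resolution>
<bug_severity>enhancement</bug_severity>
</bug>"""


def bug_fields(xml_text):
    """Return the child elements of a <bug> record as a tag -> text dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


fields = bug_fields(BUG_XML)
print(fields["bug_id"], fields["bug_status"], fields["resolution"])
```

No comparable one-step lookup exists for the unstructured examples on the following slides.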
4. So What?
EXAMPLES OF UNSTRUCTURED DATA
web-sites, diagrams, requirements documents,
social media, documentation, help files, IRC chat,
source code, comments, bug reports, captchas,
commit logs, email, system logs
5. SE data without explicit format
COMPLEXITY, DIVERSITY, IMPERFECTION
6. Unstructured Data is
COMPLEX ...
natural language, rich semantics, no authoritative formats

S10000: The SQLite library shall translate high-level SQL statements
into low-level I/O calls to persistent storage.
The essential task of every SQL database engine is to translate
human-readable SQL statements into sequences of I/O operations.

Bonjour,
ces deux problèmes sont reliés. En effet, les paquets Ubuntu ne
comportent pas les dépendances (e.g. libpng, libjpeg, libglew, ...).
Si Tulip ne peut afficher les fichiers PNG, c'est sans doute car le
paquet libpng est manquant sur le système. Nous travaillons à ajouter
les dépendances sur les paquets, mais ceci n'arrivera probablement
pas avant Tulip 3.5.
Cordialement,
Charles.

[English: Hello, these two problems are related. The Ubuntu packages
do not include the dependencies (e.g. libpng, libjpeg, libglew, ...).
If Tulip cannot display PNG files, it is most likely because the
libpng package is missing from the system. We are working on adding
the dependencies to the packages, but this probably will not happen
before Tulip 3.5. Regards, Charles.]
7. ... AND DIVERSE
Eclipse #150222:
In this report, you have defined a parameter named blocksize,
which is given a value of "7|D|1|D". In open script of data set,
there are below lines code:
<script begin>
token=Packages.java.util.StringTokenizer(params["blocksize"],"|");
vec=new Packages.java.util.Vector();
while(token.hasMoreTokens()){
vec.addElement(token.nextToken());
}
params["DateRange"]=java.lang.Integer.parseInt(vec.elementAt(0));
</script end>
Since the value of params["blocksize"] is "7|D|1|D", vec.elementAt(0)
is "7", and then it can not be parsed to int value. In 1.0.1,
the value of params["blocksize"] might be 7|D|1|D, so it can be
parsed to int value of 7.
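The reasoning in this report is easier to follow when replayed in code. The sketch below is a Python stand-in for the quoted script, not the original code: `str.split` replaces `StringTokenizer`, `int` replaces `Integer.parseInt`, and the function name `first_block_size` is hypothetical. It assumes, as the report seems to imply, that in the failing case the double quotes are literally part of the parameter value.

```python
def first_block_size(blocksize):
    """Split the parameter on '|' and parse the first token as an integer."""
    first_token = blocksize.split("|")[0]
    return int(first_token)


# Value without surrounding quotes: the first token is 7 and parses cleanly.
print(first_block_size("7|D|1|D"))

# Value with literal quotes (assumed failing case): the first token is '"7'.
try:
    first_block_size('"7|D|1|D"')
except ValueError as err:
    print("cannot parse:", err)
```

A reader has to combine the prose and the embedded script to see this; neither part alone states where the parse fails.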
8. ... AND IMPERFECT
inconsistency, ambiguity, incorrect, informal language

From: john.doe@gmail.com
To: devlist@sourceforge.net
Subject: BSOD WTF!!??!!
Hi devs,
found a bug in JDBC-RPC
very badass lol. OMG can’t
believe you missed that. I
get a bsod after
pw, pls fix :'(
JD $$$
9. So What?
EXAMPLES OF UNSTRUCTURED DATA
web-sites, diagrams, requirements documents,
social media, documentation, help files, IRC chat,
source code, comments, bug reports, captchas,
commit logs, email, system logs