Slides for lecture 5 of the course Introduction to Legal Technology at the University of Turku Law School, presented Feb 10 2015.
This lecture is the first of three lectures on specific legal technology applications: information retrieval, knowledge management, and e-discovery.
Introduction to Legal Technology, lecture 5 (2015)
1. TLS0070 Introduction to
Legal Technology
Lecture 5 Applications I:
Information retrieval, knowledge
management, e-discovery
University of Turku Law School 2015-02-10
Anna Ronkainen @ronkaine
anna.ronkainen@onomatics.com
3. Google Flu Trends
- predicting the timing and strength of
influenza epidemics based on the relative
frequency of certain keywords in searches
- values for the model in black (dotted lines
95% confidence intervals for predicted
values), actual CDC influenza figures in red
7. Lessons worth learning (also for legal
applications)
- transparency and replicability
- use big data for understanding the unknown
- study the algorithm
- it’s not just about the size of the data
(from Lazer et al 2014)
9. Application lectures overview
Applications I (this week):
- information retrieval
- e-discovery (e-disclosure)
- knowledge management
Applications II (next week, 1st half):
- case management
- online dispute resolution
- access to justice solutions
Applications III (next week, 2nd half):
- decision support
- prediction
- automation
- self-service
10. Legal tech applications not covered
here
- general-purpose applications (like Office®/
office software)
- legislative drafting applications
- docket management (and other applications
for use within the judiciary)
- courtroom visualization (etc.) software
- ... and probably a ton of other things I don’t
even know existed
12. Information retrieval (IR)
- the granddaddy of legal tech applications
- the only form of legal tech available in all
(industrial) countries at least in some form
- making different types of static legal content
available for human consumption
- statute law (+ commentaries)
- case law
- doctrine: journal articles and books
13. Information retrieval users
- types of users:
- lawyers in general
- subgroups of lawyers (e.g. IP lawyers)
- legal/admin support staff (e.g. tax
administrators, paralegals, informaticians)
- other non-law professionals
- ordinary citizens
- different users have different needs in terms of
- type and quantity of content required
- terminology used
- user interface in general
14. First-generation information retrieval
- take whatever text you have (on paper) and
put it into a database
- full-text search (exact match or wildcards)
- structured search (in whatever fields are
available)
- Boolean search with AND, OR, NOT
- some metadata enhancements like keywords
(typically same as on paper)
16. Further developments
- hypertext (links)
- better search capabilities with language
technology (try searching for “back” as a
noun)
- relevancy ranking
- recommendations for further reading
- morebetter metadata
17. An example: WestlawNext
- natural-language and Boolean search
- relevancy ranking of sources of law, using
(among others) a network of links between
cases
- (commercial break, text version:
http://info.legalsolutions.thomsonreuters.com/pdf/wln2/L-355700_v2.pdf)
18. On the horizon
- natural-language query interfaces and
advanced text understanding (think Watson/
Siri)
- merging relevancy ranking with predictive
legal analytics (like a certain trademark
platform)
- even more polarization between biggest
markets (esp. US) and others (e.g. Finland,
let alone developing countries)
20. Knowledge management
- taking (and improving upon!) the knowledge
(explicit and tacit!) of an organization and
putting it into optimal use
- by no means just tech: creating and developing
processes within the organization is equally
important
- can take different forms:
- internal: e.g. making work product (memos,
contracts etc.) electronically searchable
- external: creating digital legal content for use
by law firm customers
21. Knowledge management advantages
- higher efficiency -> better service
- higher quality (better dissemination of
expertise)
- makes life easier for lawyers (increased
productivity, reduced stress)
- keeps knowledge in the firm even if individuals
leave
- helps with the training of new lawyers
- necessary for good risk management
(after Kay 2003)
22. One knowledge management example:
contract management
- the default solution that’s still used by many
(most?) companies: paper + binders
- low overhead; manageable with low volumes
- doesn’t scale (cope with large volumes) well,
e.g. finding information becomes difficult
- particularly kludgy when documents needed
externally (due diligence, anyone?)
- error-prone and fragile
- still need to manage templates somewhere
(lack of central storage leads to inconsistencies)
23. Low-tech electronic contract
management
- establish a central organization-wide repository for
signed contracts and official templates
- doesn’t need proprietary software, any LAN or
cloud based (private) file sharing solution works
- electronically searchable, at least if word processing
documents and scans are kept together
- works well (enough) if there are good processes
(e.g. regarding file naming and organization of files)
and they are (always!) consistently adhered to
- ...which this solution obviously cannot enforce
- no built-in workflow management
24. Dedicated contract lifecycle
management (CLM) solutions
- hundreds of providers, including two from Finland (that I
know of: M-Files and Sopima)
- functionalities of varying sophistication for different stages
in the contract lifecycle:
- contract and clause template libraries
- platform and history for internal review
- platform and history for negotiations and external
review
- electronic signing / import of scanned definitive paper
originals
- archiving, retrieval etc.
- workflow management, managing access privileges etc.
27. Electronic signing
- real electronic signing not widespread
(outside Estonia, anyway), to a great deal
due to a lack of standards internationally
(and esp. for identifying legal persons)
- pseudo-electronic signing (images manually
written signatures stored electronically) now
quite widespread, dedicated solutions and
support in CLM systems also available
- the latter raises some obvious questions
about probative value
32. Fondia’s Virtual Lawyer
- a collection of ~1700 short documents made
by Fondia staff describing the legal aspects
of particular situations
- for external use (self-help by Fondia clients
etc.), AFAIK also used internally in an
enhanced version
- not for total novices
- available at virtuallawyer.fi for free,
registration required, document template
library additionally available for a fee
34. Discovery in electronically stored
information (e-discovery)
- emerged out of nowhere a dozen years ago
- now a multi-billion-dollar industry (mostly US),
hundreds of providers
- roots in more general-purpose language tech
(outside the AI & law community)
- Enron corpus, Sedona Conference, TREC, DESI
- storage requirements for e-mail etc. introduced
(US) by amendments to Federal Rules of Civil
Procedure in 2006
35. ...and now* it’s already this much
widespread (in the US, anyway):
*: actually this book is from 2009
36. Zubulake v. UBS Warburg
- employment law case in District Court for
Southern NY, heard 2003–2005
- led to four groundbreaking rulings which set
the basic standards for e-discovery (before
2006 FRCP revisions), widely referred to as
Zubulake I, III, IV, V
37. Zubulake I and III
- what data is considered accessible ESI
- yes: online data/hard disks, optical disks, offline magnetic tapes
- no: backup tapes, damaged/deleted/... data
- no -> yes if considerable evidentiary value can be demonstrated, for
which a 7-factor test was introduced:
- The extent to which the request is specifically tailored to discover
relevant information;
- The availability of such information from other sources;
- The total cost of production, compared to the amount in
controversy;
- The total cost of production, compared to the resources available to
each party;
- The relative ability of each party to control costs and its incentive to
do so;
- The importance of the issues at stake in the litigation; and
- The relative benefits to the parties of obtaining the information.
38. Zubulake IV
- some backups no longer available
- relevant emails (created after the start of the
proceedings) had been deleted
- defendant had a duty to preserve evidence
(since relevant for ongoing/future litigation)
- plaintiff got access to the information
- however, plaintiff couldn’t show adverse
interference (at this stage) and was ordered
to pay the costs
39. Zubulake V
- upon the plaintiff’s motion, the court
concluded that the defendant (and defence
counsel) had failed to safeguard and produce
evidence in an adequate manner
- defendant sanctioned and ordered to pay
plaintiff’s costs for producing evidence
(witness re-examination etc.) necessary due
to plaintiff’s late (or non-)production of
relevant evidence
40. Outcome
- active interference (intentional destruction
or hiding of evidence) ruled by the judge
- jury found in favour of the plaintiff,
compensatory and punitive damages
- reimbursement of even more costs to the
plaintiff (generally a lot more unusual in US)
41. E-discovery workflow
- establish an ESI retention policy, stick to it when
creating and storing data
- identify relevant ESI, create authentic snapshot and
collect it for further processing
- process and filter ESI (e.g. removal of duplicates)
- review and analyze ESI for privileged information
- produce ESI after filtering out irrelevant, duplicated
or privileged materials
- possibly clawback if too much produced in error
- present at trial (if it ever goes that far)
42. First-generation e-discovery
- based on lists of specific search terms (or
phrases) proposed by the plaintiff and
approved or modified by the judge
- a bit sketchy, not even real consensus about
whether keywords cover all inflections?
- no longer considered acceptable by many of
the most influential US judges for this field
43. Predictive coding
- based on coding a (very) small subset of the
relevant document mass as responsive or not
(should/n’t be released)
- then using that as the teaching set for a
machine learning algorithm
- performance comparable to (or better than)
human reviewers at a fraction of the cost
44. E-discovery output
- native (original) formats (e.g.: .docx)
- usually better for the plaintiff: electronically
searchable
- native file formats for proprietary software not
necessarily openable without that software
- “petrified” formats (tiff, pdf)
- often better for the defendant: almost the same
as handing out the data on paper
- general-purpose tools enough for viewing
- easier to redact
45. What’s the status with e-discovery
- very widespread in the US (because it’s the
law!)
- gaining popularity in the rest of Anglophonia
(because common law; tech readily available for
English)
- some providers also support major European
and Asian languages (mostly for international
companies operating in the US)
- rest of the world: is there even a word for this?
(then again: discovery in the common-law sense
doesn’t exist in most civil-law countries (incl.
Finland) in general)
46. No concrete examples
- (because, frankly, I understand neither the field
nor the legal issue well enough)
- but e-discovery in itself is an interesting
example of legal tech for many reasons
- first real big data application for law
- came out of nowhere in the early 2000s
- now a multi-billion-dollar industry (US)
- many startups, some notable exits (e.g.
Cataphora’s e-discovery ops to EY)
- also continuously new funding rounds (even
$100M+) to more and more companies