IntelliSemantic - Second generation semantic technologies for patents
1. Second generation
semantic technologies for
patent analysis
Alberto Ciaramella - IntelliSemantic
Marco Ciaramella - IntelliSemantic
PATINFO 2015 - 10/6/2015 – Ilmenau
2. This presentation
Semantic technologies in patent solutions are
sometimes controversial.
The first part of this presentation provides a
framework and a fair overview of what exists now,
and anticipates some coming evolutions, belonging
to “second generation semantic technologies”.
The second part of this presentation provides
IntelliSemantic specific examples.
3. Content
IntelliSemantic: an introduction
Patent tasks, phases and technologies
Second generation semantic technologies
Semantic technologies demos
Embedded in MyIntelliPatent
TOPAS originated technologies
Conclusions and follow-up
4. Speakers
Alberto Ciaramella background:
IntelliSemantic CEO/founder in 2005.
Before that:
Researcher and Research Manager at
CSELT, the research branch of
Telecom Italia, for speech and
Natural Language Processing.
Competitive Intelligence Manager at
Loquendo SpA, the CSELT spin-off
for speech and language
processing.
Marco Ciaramella background:
IntelliSemantic product manager since
2009.
Before that:
Project officer at Engineering.
6. IntelliSemantic
Founded in 2005, in Torino
in the incubator of the Politecnico
di Torino.
Competences: natural language processing.
Research activities:
partner of the FP7 cofunded TOPAS
project (Tool Platform for Patent Analysis and
Summarization).
R&D internal activities for
MyIntelliPatent.
partner of some Piemonte or Veneto
region cofunded research projects for
8. The patent information challenge
On the information side, the number of worldwide
patents is continuously increasing, and so is the effort
required for any kind of patent-related task.
On the user side, the number of companies whose
business can be affected by patent information is
increasing and now also includes a significant
percentage of SMEs, which can be even tighter on
costs.
But if patent analyses are performed less frequently or
less deeply than required, a company can incur:
higher costs, if it fails to spot in due time a
competitor that can invalidate its research
efforts.
fewer benefits, if it does not have the time to
extract “hidden” suggestions from the patent
literature.
9. How to solve this challenge
A solution to this challenge is to deliver smarter tools
which allow professionals to concentrate on the
higher value-added part of their activity.
Smarter tools can include features such as:
Patent specific knowledge management, to:
learn, accumulate, and reuse the knowledge of the
company's professionals.
provide a structured approach for different
use cases.
Intelligent language technologies to automatically
extract the knowledge embedded in the text, such as the
most relevant entities and passages, and to
identify the patent document structure as well.
10. Patent tasks, phases and
technologies
Tasks
Phases
Technologies
Semantic technologies in more detail
11. Patent informatics solutions
Patent informatics solutions can be categorized according
to three different dimensions:
tasks.
interaction phases.
technologies used.
This framework is useful:
to compare solutions.
to identify the potential benefits of a new technology
on different tasks and interaction phases.
12. Tasks
monitoring:
new published applications, status
evolutions of already known documents.
searches:
Prior art, validity, freedom to operate.
analyses:
Technologies, competitors.
13. Interaction phases
query:
by metadata, by text, by reference.
patent set results analysis:
extracts distributions (e.g. by
applicant).
identifies correlations.
ranks documents to analyze in more
detail.
single patent analysis:
identifies main sections.
identifies main topics.
navigates through topics and sections.
14. Phases: some conclusions
the query and the patent set results
analysis are characterized by recall and
precision:
the recall is measured by (relevant
results found / total relevant results
in the database).
the precision is measured by (relevant
results found / total results found).
in practice, recall and precision tend to be
inversely related.
a safe strategy is to maximize the
recall of the query, then use precise
and efficient technologies to analyze
results.
the single patent analysis can become
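The two measures defined on this slide can be made concrete with a small worked example. This is only an illustrative sketch: the document identifiers and the function name `recall_precision` are hypothetical and do not come from any IntelliSemantic product.

```python
def recall_precision(found, relevant):
    """Return (recall, precision) for a result set.

    found    -- set of document ids returned by the query
    relevant -- set of all relevant document ids in the database
    """
    relevant_found = found & relevant
    recall = len(relevant_found) / len(relevant) if relevant else 0.0
    precision = len(relevant_found) / len(found) if found else 0.0
    return recall, precision


# Hypothetical ids: 4 relevant patents exist in the database.
relevant = {"P1", "P2", "P3", "P4"}

# A broad query finds all of them, plus as many irrelevant documents.
broad = {"P1", "P2", "P3", "P4", "X1", "X2", "X3", "X4"}
# A narrow query returns few documents, most of them relevant.
narrow = {"P1", "P2", "X1"}

broad_scores = recall_precision(broad, relevant)    # recall 1.0, precision 0.5
narrow_scores = recall_precision(narrow, relevant)  # recall 0.5, precision 2/3
```

In this toy case the broad query has the higher recall and the narrow query the higher precision, which is the trade-off that motivates the "maximize recall first, then analyze precisely" strategy.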
16. Technology generations (1)
based on metadata only:
e.g. querying by IPC and applicant.
text-based, the most popular of which are:
boolean, e.g. querying by «speaker
recognition» AND «hidden Markov
models». Results are included or
not.
vector based, i.e. by comparing the
word vector of the query with
the word vector of each result.
Results are ranked.
vector based with term
dependencies. A noteworthy example
is Latent Semantic Analysis.
Results can be clustered.
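The difference between "included or not" and "ranked" can be illustrated with a minimal sketch. The three-document collection is hypothetical, phrase matching is reduced to plain substring search, and the vector model uses raw term counts with cosine similarity, which is only one of several possible weightings.

```python
import math
from collections import Counter

# Hypothetical mini-collection: id -> text.
docs = {
    "D1": "speaker recognition using hidden markov models",
    "D2": "speaker recognition with neural networks",
    "D3": "hidden markov models for gene sequence analysis",
}


def boolean_and(phrases, docs):
    """Boolean retrieval: a document is included only if it contains every phrase."""
    return [doc_id for doc_id, text in docs.items()
            if all(phrase in text for phrase in phrases)]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_rank(query, docs):
    """Vector-based retrieval: every document gets a score and results are ranked."""
    q = Counter(query.split())
    scored = [(cosine(q, Counter(text.split())), doc_id)
              for doc_id, text in docs.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)]


included = boolean_and(["speaker recognition", "hidden markov models"], docs)
ranked = vector_rank("speaker recognition hidden markov models", docs)
```

Here the boolean query keeps only D1, while the vector query returns all three documents ordered by similarity (D1, then D3, then D2), so partially matching documents are not lost.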
17. Technology generations (2)
vector based technologies with term interdependencies
have been called «semantic technologies»
in patent informatics, since Latent
Semantic Analysis (LSA) is the most
popular in this class.
LSA clusters co-occurring terms, hence
simulates an «intelligent behaviour».
these technologies are more typically focused
on recall.
18. Technology generations (3)
second generation semantic solutions
can be defined as those having at least
one of these characteristics:
to be user controllable, e.g. by relying
on user defined lexicons.
to include patent specific algorithms,
e.g. patent segmentation.
these technologies are more typically
focused on precision.
19. Technologies: conclusions
We ordered technologies by time, without
implying that a technology is superior to
others simply because it is the most
recent or that an older technology is to be
deprecated.
Technologies of different generations can
coexist in the same applications:
for different tasks and phases
for different objectives, such as
increasing the recall or increasing
the precision.
Define your requirements first, then select
a technology, but:
A new technology can suggest new
requirements to you.
21. A high level functions taxonomy
entities extraction:
Generic entities or tags.
Qualified entities: i.e. only measurements,
or substances, or methods.
entities relationships identification:
short range: to relate entities in the same
sentence.
long range: to relate claims and
description.
patent structure identification:
the patent is a structured text.
the role of an entity is section specific,
hence different in prior art or in claims.
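As an illustration of the short range case, the sketch below pairs entities that occur in the same sentence. It is a deliberately naive assumption-laden example: the lexicon is hypothetical, sentences are split on punctuation only, and co-occurrence is taken as the only evidence of a relationship.

```python
import re
from itertools import combinations

# Hypothetical lexicon of known entities (lower case).
LEXICON = ["piston", "cylinder", "aluminium", "inductor"]


def short_range_relations(text, lexicon=LEXICON):
    """Pair up lexicon entities that occur in the same sentence."""
    relations = set()
    # Naive sentence split on '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
        present = sorted(
            entity for entity in lexicon
            if re.search(r"\b" + re.escape(entity) + r"\b", sentence))
        relations.update(combinations(present, 2))
    return relations


text = ("The piston moves inside the cylinder. "
        "The cylinder is made of aluminium. "
        "An inductor filters the signal.")
pairs = short_range_relations(text)
```

With this text, `pairs` relates cylinder with piston and aluminium with cylinder, while the inductor, alone in its sentence, takes part in no relation. Long range relations between claims and description would need document structure as well, which this sketch does not attempt.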
22. Technologies and application
The technologies mentioned in the previous slide
can be used very differently, since they can
be used:
for different phases.
stand alone or in combination.
for enhancing a manual or an
automatic process.
The most important issue for selecting these
technologies is:
to figure out their advantages on the
application side.
to select the most appropriate
combination of application and
technology.
23. Generic entity (or tag) extractor
a tag is a word (e.g. “inductor”) or a
sequence of words (e.g. “speaker
verification”) having a well defined
meaning.
from the implementation point of view we
have to distinguish two phases:
the off line annotation.
the real time user interactions with
annotated documents.
this also applies to other technologies
mentioned in the following.
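The split between the two phases can be sketched as follows: documents are scanned once off line to build a tag index, and user requests are then answered from that index without rescanning the texts. The vocabulary, the patent numbers and the function names are hypothetical, and this is not a description of how MyIntelliPatent is implemented.

```python
import re
from collections import defaultdict

# Hypothetical user-defined vocabulary; multi-word tags are allowed.
VOCABULARY = ["inductor", "speaker verification"]

# Hypothetical patent texts keyed by publication number.
PATENTS = {
    "EP0000001": "A speaker verification system with an inductor.",
    "EP0000002": "An inductor for power supplies.",
    "EP0000003": "A loudspeaker housing.",
}


def annotate_offline(patents, vocabulary):
    """Off-line phase: scan every document once and build tag -> document ids."""
    index = defaultdict(set)
    for doc_id, text in patents.items():
        lowered = text.lower()
        for tag in vocabulary:
            # Word boundaries avoid matching a tag inside a longer word.
            if re.search(r"\b" + re.escape(tag) + r"\b", lowered):
                index[tag].add(doc_id)
    return index


def lookup(index, tag):
    """Real-time phase: answer a user request from the prebuilt index."""
    return sorted(index.get(tag, set()))


INDEX = annotate_offline(PATENTS, VOCABULARY)
inductor_docs = lookup(INDEX, "inductor")
verification_docs = lookup(INDEX, "speaker verification")
```

The costly text scan happens only in `annotate_offline`; `lookup` is a dictionary access, which is what keeps the interactive phase fast.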
24. Examples of applications enabled
to build up topic specific vocabularies
from collections of topic-specific
patent sets.
for queries: to extract the most relevant
topics in a patent and suggest them to
the user in tasks like validity search and
prior art search.
for patent set analyses:
to identify patents citing the same
topics.
to score patents by topics richness.
to identify topics distribution (by
applicant, by year).
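Once patents carry tags, the patent set analyses listed above reduce to simple set and counting operations. The sketch below assumes a hypothetical annotated patent set, and it takes "topics richness" to mean the number of distinct tags, which is only one possible scoring.

```python
from collections import Counter

# Hypothetical annotated patent set: each record carries its extracted tags.
PATENT_SET = [
    {"id": "A1", "applicant": "Acme", "tags": {"inductor", "transformer"}},
    {"id": "A2", "applicant": "Acme", "tags": {"inductor"}},
    {"id": "B1", "applicant": "Borel",
     "tags": {"inductor", "transformer", "rectifier"}},
]


def patents_citing(tag, patents):
    """Identify the patents that cite a given topic."""
    return [p["id"] for p in patents if tag in p["tags"]]


def richness_ranking(patents):
    """Order patents by topic richness, here the number of distinct tags."""
    return sorted(patents, key=lambda p: len(p["tags"]), reverse=True)


def tag_distribution(tag, patents, field):
    """Count patents citing a tag, grouped by a metadata field."""
    return Counter(p[field] for p in patents if tag in p["tags"])


transformer_patents = patents_citing("transformer", PATENT_SET)
richest = richness_ranking(PATENT_SET)[0]["id"]
by_applicant = tag_distribution("inductor", PATENT_SET, "applicant")
```

The same `tag_distribution` call works for any metadata field present in the records, so a distribution by year only needs a `year` field in the data.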
25. Qualified entities
Measurements, which can include:
physical unit (e.g. Volt) and order of magnitude prefix (e.g. milli)
numbers (e.g. 10) and numerals (e.g. ten).
closed intervals (e.g. between 1 and 2 nm)
open intervals (e.g. up to 1 nm).
tolerance values.
Citations of patents and non patent literature.
Substances, such as «aluminium».
Processes, such as «redundancy control».
Technical qualities, such as «piston speed».
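A minimal sketch of measurement extraction, covering closed and open intervals, numbers and numerals, and a unit prefix. It relies on regular expressions and on tiny assumed inventories of prefixes, units and numerals; these are illustrative choices, not the method used in TOPAS or MyIntelliPatent, and tolerance values are not handled.

```python
import re

# Assumed, deliberately tiny inventories; a real system would need far more.
PREFIXES = {"m": 1e-3, "k": 1e3, "n": 1e-9, "": 1.0}
UNITS = {"V": "volt", "m": "metre", "A": "ampere"}
NUMERALS = {"one": 1, "two": 2, "ten": 10}

NUMBER = r"\d+(?:\.\d+)?|" + "|".join(NUMERALS)
UNIT = r"(?P<prefix>[mkn]?)(?P<unit>[VmA])\b"

CLOSED = re.compile(rf"between (?P<low>{NUMBER}) and (?P<high>{NUMBER}) {UNIT}")
OPEN = re.compile(rf"up to (?P<high>{NUMBER}) {UNIT}")


def to_number(token):
    """Convert a digit string or a known numeral word to a float."""
    return float(NUMERALS[token]) if token in NUMERALS else float(token)


def extract_intervals(text):
    """Return measurement intervals with values scaled to the base unit."""
    found = []
    for m in CLOSED.finditer(text):
        scale = PREFIXES[m["prefix"]]
        found.append({"kind": "closed", "unit": UNITS[m["unit"]],
                      "low": to_number(m["low"]) * scale,
                      "high": to_number(m["high"]) * scale})
    for m in OPEN.finditer(text):
        scale = PREFIXES[m["prefix"]]
        found.append({"kind": "open", "unit": UNITS[m["unit"]],
                      "low": None,
                      "high": to_number(m["high"]) * scale})
    return found


intervals = extract_intervals(
    "a thickness between 1 and 2 nm, a gap of up to ten mm")
```

Scaling every value to the base unit is what makes intervals written with different prefixes comparable, for example when looking for patents whose ranges overlap a given one.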
26. Examples of applications enabled
for patent set analyses:
to identify patents mentioning
specific measurements and
ranges.
to categorize patents more
related to substances, to methods,
or to a combination of both.
27. Structure identification functions
to identify the structure of the description:
first level: as technical field,
background art, summary of
invention, description of drawings,
preferred embodiment.
second level, as preferred
embodiments.
third level, such as objective,
advantages.
to identify the structure of claims:
interclaim, such as dependent and
independent claims.
intraclaim, such as preamble, transition,
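Two of these functions can be sketched in a few lines: first level segmentation of the description and interclaim structure. The sketch assumes that sections are introduced by one of a fixed list of headings on a line of their own, and that a dependent claim refers back with a phrase such as "according to claim N"; real patents are far less regular, so this is an illustration rather than a usable segmenter.

```python
import re

# Assumed first-level headings; real patents vary widely in wording.
HEADINGS = ["technical field", "background art", "summary of invention",
            "description of drawings", "preferred embodiment"]


def segment_description(text):
    """Split a description into first-level sections keyed by heading."""
    sections, current = {}, None
    for line in text.splitlines():
        key = line.strip().lower()
        if key in HEADINGS:
            current = key
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return {heading: " ".join(lines) for heading, lines in sections.items()}


# A claim is treated as dependent when it refers back to another claim.
CLAIM_REF = re.compile(
    r"\b(?:according to|as claimed in|of)\s+claim\s+(\d+)", re.IGNORECASE)


def classify_claims(claims):
    """Map claim number -> None (independent) or the claim it depends on."""
    result = {}
    for number, text in claims.items():
        match = CLAIM_REF.search(text)
        result[number] = int(match.group(1)) if match else None
    return result


description = """Technical Field
The invention relates to inductors.
Background Art
Known inductors are bulky.
"""
claims = {
    1: "An inductor comprising a core and a winding.",
    2: "The inductor according to claim 1, wherein the core is ferrite.",
}
sections = segment_description(description)
claim_tree = classify_claims(claims)
```

The resulting `sections` dictionary is the kind of structure that lets an analysis be restricted to, say, the background art only.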
28. Examples of applications enabled
patent segmentation only:
patent sets analyses: to select
specific patent sections, as
background art, and compare
them.
single patent analysis: to build a
patent document directory, which
can facilitate the patent
document navigation.
combined with entity extraction:
these technologies combine
naturally, since the meaning of
an entity can depend on the
30. MyIntelliPatent
A smart solution for patent intelligence tasks.
MyIntelliPatent includes the company specific knowledge,
since it is provided as a password-protected Software as a
Service and repository. A company can build and access
its specific vocabularies, patent sets and patent
annotations.
MyIntelliPatent supports structured interactions, as
detailed in the following.
MyIntelliPatent includes intelligent language technologies,
as detailed in the following.
31. Structured interaction
Queries, by metadata, by
a reference patent, a
reference text or even by a
patent list
A first level results
analysis through
QuickView.
A second level analysis
and statistics including
metadata through
Search/Statistics
A third level analysis and
statistics including tags
through Tag and
Search/Statistics
32. Second level analysis by Search/Statistics
The Search/Statistics page allows the user to identify the most relevant patents
(by family size, by citations) or the most interesting ones (by applicant), to extract different
kinds of results tables, to order these results by different criteria, and to extract
statistics. The examples shown here are based only on metadata. Tags allow
more refined analyses, as shown in the following slide.
33. Linguistic intelligence: Tags
A tag is a word (e.g. “inductor”) or a sequence of words
(e.g. “speaker verification”) having a well defined meaning.
Tags are a distinguishing feature in MyIntelliPatent.
MyIntelliPatent can:
suggest a topic specific vocabulary from a set of
topic specific patents.
allow the user to edit this suggested vocabulary.
apply the finally edited vocabulary to all
collections, in such a way that vocabulary tags in a
patent become new text-specific metadata.
different topic specific vocabularies can be present in
the same platform.
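The three steps (suggest, edit, apply) can be sketched as below. The suggestion criterion used here, terms that occur in at least two patents of the topic set, is an assumption chosen for brevity and is not necessarily the criterion used by MyIntelliPatent; the texts, the stopword list and the function names are hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "for", "with", "and", "is", "in", "to"}


def candidate_terms(text):
    """Yield single words and two-word sequences that contain no stopword."""
    words = re.findall(r"[a-z]+", text.lower())
    for i, word in enumerate(words):
        if word in STOPWORDS:
            continue
        yield word
        if i + 1 < len(words) and words[i + 1] not in STOPWORDS:
            yield word + " " + words[i + 1]


def suggest_vocabulary(topic_patents, min_docs=2):
    """Suggest terms that occur in at least `min_docs` of the topic patents."""
    doc_freq = Counter()
    for text in topic_patents:
        doc_freq.update(set(candidate_terms(text)))
    return {term for term, count in doc_freq.items() if count >= min_docs}


def apply_vocabulary(patents, vocabulary):
    """Attach the vocabulary tags found in each patent as extra metadata."""
    annotated = {}
    for doc_id, text in patents.items():
        lowered = text.lower()
        annotated[doc_id] = sorted(
            tag for tag in vocabulary
            if re.search(r"\b" + re.escape(tag) + r"\b", lowered))
    return annotated


topic_set = ["Speaker verification with a microphone.",
             "A method for speaker verification.",
             "Microphone array for speech."]
suggested = suggest_vocabulary(topic_set)
# The user edits the suggestion: drops two generic terms and adds a missing one.
vocabulary = (suggested - {"speaker", "verification"}) | {"speech"}
tags = apply_vocabulary({"EP1": "Speech input via a microphone."}, vocabulary)
```

The user edit in the middle is the point of the design: the automatic step only proposes terms, and the vocabulary that is finally applied to all collections is the one the user has validated.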
35. Extracting a tags vocabulary
The Edit & Tag page allows the user to extract the most relevant tags from a set of patents, to
analyze these suggested tags, to edit them, and to confirm the user-validated
vocabulary of tags. The user can also copy and paste his/her own vocabulary.
36. First level analysis by QuickView
This level of analysis provides a quick view of patent applicant, title,
summary and extracted tags, which is a good proxy for judging the
interest of the patent for the user. In case of doubt, he/she can directly access
the whole document from this page. This level of analysis can be enough
for some tasks, such as quick prior art searches.
37. Tags in third level analysis: an example
Tags allow the user to identify the most relevant concepts in a patent and to
extend the analysis based on metadata. This table summarizes the
number of patents by year using a specific tag, and makes it possible to identify the first
patents using a concept and the most popular concepts now.
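A table of this kind can be derived from annotated patents with a few lines of counting. The data below are hypothetical, and "most popular now" is interpreted, as an assumption, as the tag with the highest count in the latest year of the set.

```python
from collections import defaultdict

# Hypothetical annotated patents: (publication number, year, tags).
ANNOTATED = [
    ("P1", 2010, {"inductor"}),
    ("P2", 2012, {"inductor", "wireless charging"}),
    ("P3", 2014, {"wireless charging"}),
    ("P4", 2014, {"wireless charging", "inductor"}),
]


def tag_year_table(patents):
    """Build tag -> {year -> number of patents using that tag}."""
    table = defaultdict(lambda: defaultdict(int))
    for _, year, tags in patents:
        for tag in tags:
            table[tag][year] += 1
    return {tag: dict(years) for tag, years in table.items()}


def first_use(patents):
    """For each tag, find the earliest patent that uses it."""
    first = {}
    for number, year, tags in sorted(patents, key=lambda p: p[1]):
        for tag in tags:
            first.setdefault(tag, (year, number))
    return first


table = tag_year_table(ANNOTATED)
pioneers = first_use(ANNOTATED)
latest_year = max(year for _, year, _ in ANNOTATED)
popular_now = max(table, key=lambda tag: table[tag].get(latest_year, 0))
```

In this toy set the first patent using "wireless charging" is P2 in 2012, and it is also the most used tag in the latest year.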
38. TOPAS demo
The TOPAS project, participants and results
A demo with Patent description and Measurements
extraction
IntelliSemantic planned exploitation
39. TOPAS demo
This demo exemplifies some second
generation semantic technologies not yet
integrated in MyIntelliPatent.
This demo was developed by
IntelliSemantic for the FP7 research project
TOPAS (Tool Platform for Patent Analysis
and Summarization), which is
summarized in the following.
40. TOPAS research project
The EU cofunded research project TOPAS
(Tool Platform for Patent Analysis and
Summarization) studied, prototyped and tested
some second generation semantic
technologies for English, German and
French.
TOPAS was a 24-month FP7 Capacities
project, under grant agreement number
FP7-SME-2011 286639, from October 2011 to
September 2013.
41. TOPAS participants
The 5 TOPAS participants were Bruegman
Software, IALE, IntelliSemantic, University of
Stuttgart and University Pompeu Fabra.
Universities transferred all rights to the
technologies.
SMEs have full ownership of the TOPAS
technologies and are mutually independent
in their exploitation.
TOPAS had qualified advisors to provide
feedback on the application side; among them
we can mention EPO, Fraunhofer and some
companies and consultants.
42. TOPAS results
TOPAS prototyped and tested solutions for:
Qualified entities extraction.
Entities relationship
identification.
Patent segmentation.
Patent summarization (not
detailed here).
In English, German and French.
The overview of project results was
recently published in WPI magazine,
March 2015, in the paper “Towards
content oriented patent document
processing: intelligent patent analysis and
summarization”.
43. Patent description – first level
This screenshot exemplifies the first level patent segmentation used to
analyze a patent set, e.g. to focus the analysis of results on specific sections,
in this case the background art.
44. Measurement extraction
This screenshot exemplifies the measurement extraction used to analyze a
single patent, i.e. to retrieve patent sections citing measurements and to
extract the meanings of these measurements.
45. IntelliSemantic TOPAS exploitation
IntelliSemantic has further developed
some of the TOPAS technologies and is
ready to exploit them:
Integrated in new MyIntelliPatent
releases, e.g. to extend patent set
analyses and single patent analysis.
As technology engines to be integrated
into the customer platform, to
extend it with features like patent
segmentation and qualified entities
extraction:
this last solution is more suitable
for advanced users, such as patent
offices and big companies.
46. For more information
Visit us at stand 4 for more details about
MyIntelliPatent.
Other semantic technologies.
And/or:
Contact IntelliSemantic
e-mail info@intellisemantic.com
tel. +39 011 9550 380
for a Web Conference presentation.
47. Licence
This work is licenced under the Creative Commons
Attribution-NonCommercial-ShareAlike 3.0
Unported Licence.
To view a copy of this licence visit:
http://creativecommons.org/licenses/by-nc-sa/3.0/
Editor's Notes
OK
Status
Introduction (ok)
Patent tasks (almost ok, but still requires 2 slides to conclude)
Second generation semantic technologies (the most significant section: it has to be substantially rewritten and simplified)
MyIntelliPatent with semantics and demo (ok)
Other semantic technologies and demo (not difficult to do)
Conclusions (not difficult to do)
This slide has been produced initially for the PDG meeting, but it is now more precise
I think it is polite to introduce ourselves. It is not easy to summarize a company in a slide, but we will try. This is IntelliSemantic in a slide.
The obvious «difficult» questions are about the company size, the market adoption of IntelliSemantic solutions and so on: be ready to answer!
Since the number of patents to monitor is continuously increasing, the effort required for any kind of patent analysis is also increasing. Another factor which increases this effort is the growing variety of relevant languages, with an increasing number of patents available only in languages other than English. This is a “supplier side” factor.
On the “user side”, an increasing number of companies, even SMEs, need to perform these analyses.
The focus is clearly on company professionals. In any case, we have to be ready for the possible question: what about external consultants?
Detail
Some more details are provided in http://en.wikipedia.org/wiki/Information_retrieval, which actually cites 3*3 cases.
In any case, in our slide, we have included only the three most popular methods.
Fine. In any case, it is the main slide and it could be partially rewritten.
Rather than as a «personalized solution», it is better to present it as «a knowledge management system» enhanced with «NL features».
This slide is nice, but a little too complicated here. It is better to replace it with the table in the «quick user manual».
This is just a slide presenting a screenshot with a very interesting, although specific, feature, which emphasizes the importance of tagging, an important feature of MyIntelliPatent.
Other more general screenshots should be provided, such as the screenshot presenting the ordered list of results. In any case we prefer not to add other screenshots, since:
They can evolve with the evolution of the product, hence we could have an additional problem in the maintenance of this presentation.
This presentation is typically followed by the product presentation, in which these details are more appropriate.
Motivate why tagging adds intelligence
This slide will be rewritten merging this information with the information of the table in the quick introduction
This slide can be retained as a useful example.
Patent description – large grain is performant and efficient as well, and it has to be «pushed».
Status: ok
Comment.
This slide motivates the audience to visit the IntelliSemantic stand, since the solution is more feature rich than presented here, and at the same time it provides a German telephone number (no, not in this slide, since the German number presently costs too much).