Interval Hashing Based Ranking

London Information Retrieval Meetup
21 OCT 2019
Music Information Retrieval Take 2
Interval Hashing Based Ranking
Andrea Gazzarini, Software Engineer
21th October 2019

▪ Software Engineer (1999-)
▪ “Hermit” Software Engineer (2010-)
▪ Java & Information Retrieval Passionate
▪ Apache Qpid (past) Committer
▪ Husband & Father
▪ Bass Player
Andrea Gazzarini, “Gazza”

Sease
Search Services
● Open Source Enthusiasts
● Apache Lucene/Solr experts
! Community Contributors
● Active Researchers
● Hot Trends : Learning To Rank, Document Similarity,
Search Quality Evaluation, Relevancy Tuning

✓Music Information Retrieval (MIR)
➢ Query by Humming and Cover Song Detection
➢ Chroma Feature
➢ Music Language Essentials
➢ Ranked Interval Hashing
➢ Next steps
➢ Q&A
Agenda

MIR is concerned with the extraction, analysis and usage of information about any kind of music
entity (e.g. a song or a music artist) on any representation level (for example, audio signal, symbolic MIDI
representation of a piece of music, or name of a music artist).”
Schedl, M.: Automatically extracting, analyzing and visualizing information on music artists from the world wide web.
Dissertation, Johannes Kepler University, Wien (2003)
Music information retrieval (MIR) is the interdisciplinary science of retrieving information from
music. MIR is a small but growing ﬁeld of research with many real-world applications. Those involved in
MIR may have a background in in musicology, psychoacoustics, psychology, academic music study,
signal processing, informatics, machine learning, optical music recognition, computational intelligence or
some combination of these.
https://en.wikipedia.org/wiki/Music_information_retrieval
Music Information Retrieval (MIR)

AUDIO IDENTIFICATION
GENRE IDENTIFICATION
TRANSCRIPTION RECOMMENDATION
SYMBOLIC SIMILARITY
MOOD
SOURCE SEPARATION
INSTRUMENT RECOGNITION
TEMPO ESTIMATION
SCORE ALIGNMENT
BEAT TRACKING
QUERY BY HUMMINGAUDIO IDENTIFICATION
INSTRUMENT RECOGNITION
GENRE IDENTIFICATION
TRANSCRIPTION RECOMMENDATION
TEMPO ESTIMATION
SONG STRUCTURE SCORE ALIGNMENT
COVER SONG DETECTION
SYMBOLIC SIMILARITY
KEY DETECTION
BEAT TRACKING
MOOD
SOURCE SEPARATION
Music Information Retrieval (MIR)

➢ Music Information Retrieval (MIR)
✓Query By Humming / Cover Song Detection
➢ Chroma Feature
➢ Next steps
➢ Q&A
Agenda

Query by humming (QbH) is a music retrieval
system that branches off the original classification
systems of title, artist, composer, and genre.
It normally applies to songs or other music with a
distinct single theme or melody. The system involves
taking a user-hummed melody (input query) and
comparing it to an existing database.
(Wikipedia)
What is it?
Query by Humming/Playing
The main challenge of such query paradigm is the
difference, in terms of audio signal, between the
input query and the stored dataset.

A cover song is an alternative version/recording of
a previously recorded song.
A cover can differ from the original song in several
features: key, timbre, tempo, structure, arrangement,
language, even melody and lyrics.
The goal of a cover song detection search algorithm
is to identify, for a query audio track, recordings that
belong to the same conceptual work/composition.
The main challenges are:
• find a set of features that capture the common
aspects of the target compositions
• give a rank to each response items (generally not
so important)
What is it?
Cover Song Detection

QBH + CSD

✓Chroma Feature
➢ Next steps
➢ Q&A
Agenda

Chroma features are a powerful representation for
music audio in which the entire spectrum is
projected onto 12 bins representing the 12 distinct
semitones (or chroma) of the musical octave.
The base assumption is that in music, notes exactly
one octave apart are perceived as “similar”. As
consequence of that, working on the distribution of
chroma without the absolute frequency can give
useful musical information about the audio
It’s a kind of analysis which bridges between low-
level and middle-level features, moving the audio
signal representation towards something which is
more readable, from a functional perspective.
Chroma Feature
Chroma Features (1/5)
Source: Wikipedia

C
D
D# D
DD
D
D
DC
C
C E
F
F#
A
A#
B
F#
F#
BB
EF
D
C
D
F

A Chroma Matrix consists of a 12 * tn matrix where
each vector represents the twelve pitches
strengths for a given instant t.
Sometimes the matrix is inverted so the same
information described above is available at row-level.
Chroma features can be computed using different
tools available on the internet including:
• LabROSA audio analysis tools (partially portable
to Octave)
• opensource tools written in several languages
Chroma Matrix
Pitches strengths at instant t1 where you can see the top (i.e. most intense) 3 classes
are A#, B and A

Time
C
D
E
F
G
A
B
C#
D#
F#
G#
A#
A A A A A A C A F F F F F F FG C C C C C C D C B B B B B B C B
N
O
I
S
E

➢ Chroma Feature
✓Music Language Essentials
➢ Next steps
➢ Q&A
Agenda

A note is used for denoting a sound, its pitch and duration
A sound is the audio signal produced by a vibrating body
Notes are associated to graphical symbols (for indicating the pitch and the duration)
Two notes with the same fundamental frequency in a ratio of any integer power of two are perceived as similar. As
consequence of that, we say they belong to the same pitch class
A note is also used for denoting a pitch class. The Western Music Theory individuates 12 pitch classes
Notes and Pitch classes are associated to mnemonic codes (e.g. C,D,E,F,G,A,B or DO,RE,MI,FA,SOL,LA,SI)
C D E
F G A B
C
B A
G F E D
C
C#
D# F# G#
A#
Bb Ab Gb Eb Db
Music Language Essentials
The distance between two subsequent notes is called a semitone

Text Music
Letter Note
Word
Phrase
Sentence
Chord
Ghost Note
Phrase
Text vs Music

Time Signature
Key Signature
Clef
Tempo
Note
Reference Chord
Chord
Score music representation

➢ Chroma Feature
✓Ranked Interval Hashing
➢ Next steps
➢ Q&A
Agenda

We’ve already seen Chroma Features analysis
normalises the entire spectrogram in classes,
loosing the absolute frequency.
That introduces a normalisation which actually puts
on the same layer notes belonging to the same
class.
This strictly relates to a concept known in music
theory as “Inversion”
“Inversion, in music, is the rearrangement of the top-
to-bottom elements in an interval, a chord, a melody,
or a group of contrapuntal lines of music.”
(Source: Encyclopedia Britannica)
Inversions
Assumption #1: Inversions as “Synonyms”
=
= =
AND =
IF
AND
THEN
～～
=～ =～ =～

In Music Theory, an interval is defined as the
distance between two notes.
In Western Music Theory the minimum distance
between two notes is called semitone. As
consequence of that, an interval measures the
number of semitones that occurs between two
notes.
The intervals concept allows to translate any
sequence of notes.
If we have a melody starting from a given note (e.g.
C) we can translate the whole sequence using a
different starting point (e.g. A): the most important
thing is that intervals must remain the same.
Intervals
Assumption #2: Intervals instead of Notes!

A simple matrix which defines all the possible
intervals between two notes.
Each interval has a precise name in Western Music
Theory (e.g. 0 = unison), but for our purposes seeing
the number (of semitones) is more important
The interval between two arbitrary notes can be
easily computed by using the following formula:
(defn semitones [x y]
(if (>= y x)
(- y x)
(+ (- 11 (- x 1)) y)))
Intervals Table
Intervals Table
C C# D D# E F F# G G# A A# B
C 0 1 2 3 4 5 6 7 8 9 10 11
C# 11 0 1 2 3 4 5 6 7 8 9 10
D 10 11 0 1 2 3 4 5 6 7 8 9
D# 9 10 11 0 1 2 3 4 5 6 7 8
E 8 9 10 11 0 1 2 3 4 5 6 7
F 7 8 9 10 11 0 1 2 3 4 5 6
F# 6 7 8 9 10 11 0 1 2 3 4 5
G 5 6 7 8 9 10 11 0 1 2 3 4
G
#
4 5 6 7 8 9 10 11 0 1 2 3
A 3 4 5 6 7 8 9 10 11 0 1 2
A# 2 3 4 5 6 7 8 9 10 11 0 1
B 1 2 3 4 5 6 7 8 9 10 11 0

For each Chroma Vector we keep the top k
most intense pitch energies.
Each class is denoted by its ordinal position
(the examples on the left provides, for
simplicity, the corresponding mnemonic code
as well).
The underlying assumption is that in order
to represent and summarise a chroma
vector, the stronger a class energy is, the
better.
• INPUT => n Chroma vectors
• OUTPUT => n Rank vectors
Top K classes
Step #1: Ranks
0.28956729767718553 0.29918483818230696 0.29750046253718804 0.26970075757159021
0.26570971680808941 0.28762092331555628 0.27230350897098865 0.26479060237734442
0.24563184404410343 0.35945972816613164 0.20234795661509283 0.19691813882136551
0.21751928582300245 0.31682129361484279 0.21327678582822374 0.19345434437896755
0.19067933460430175 0.19899201366683592 0.20626346569405868 0.11096279563650845
0.18864723689615112 0.15732581183385463 0.16582685461399227 0.13454776152251802
0.17659990800974107 0.12781101466846442 0.12998904712393031 0.17361679104251077
0.15805875199726557 0.11070659008562342 0.12210923914829071 0.14658222225322309
0.14652990682934677 0.10462518086147332 0.11147004101461088 0.11962189455960689
0.13012578265891636 0.08802297436725277 0.08795354261988087 0.07726694847307212
0.10615064267867232 0.06285819867921958 0.06280150731135973 0.04137698779765421
0.07978125487239938 0.04066800213125138 0.03900988562456922 0.01849582605529331
0 - C 2 - D 0 - C 0 - C
1 - C# 3 - D# 1 - C# 1 - C#
2 - D 0 - C 3 - D# 2 - D
3 - D# 1 - C# 4 - E 3 - D#
4 - E 4 - E 2 - D 6 - F#

For each pair of Rank vectors, for each
pair of pitch classes sharing the same
index, we compute the corresponding
interval.
The underlying assumption is that we want
to have a relative model which works
regardless the input key/tonality.
• INPUT => n Rank Vectors
• OUTPUT => n - 1 Interval Vectors
Distances
Step #2: Intervals
0 - C 2 - D 0 - C 0 - C
1 - C# 3 - D# 1 - C# 1 - C#
2 - D 0 - C 3 - D# 2 - D
3 - D# 1 - C# 4 - E 3 - D#
4 - E 4 - E 2 - D 6 - F#
(defn semitones [x y]
(if (>= y x) (- y x) (+ (- 11 (- x 1)) y)))
2 10 0
2 10 0
10 3 11
10 3 11
0 10 4

Step #3: Hashing
Each Interval Vector is reduced to a
number (hash).
The hash uniquely represents a given
intervals distribution.
• INPUT => n-1 Interval Vectors
• OUTPUT => 1 Hash Vector
Summarisation
2 10 0 …
2 10 0 …
10 3 11 …
10 3 11 …
0 10 4 …
∑xi *12^i
18476 213106 103536 …

Step #4: Partitions
The hash vector is partitioned using a size and an
overlap parameters (see examples on the left).
An audio input is therefore represented as a list of
partitions derived from the hash vector where
• each partition is a term
• ordering is preserved:
• in the term metadata (e.g. position increment)
• implicitly in the term value
Bags of Bags of Words
39483 4149 1440 3213 33232 44256 6654 4432 4149 7843 …
Partition of 3 hashes without overlap
[39483, 4149, 1440] [3213, 33232, 44256] [6654,4432, 4149], …
Partition of 3 hashes with overlap = 1
[39483, 4149, 1440] [1440, 3213, 33232] [33232, 44256, 6654], …
Partition of 5 hashes without overlap
[39483,4149,1440,3213,33232] [44256,6654,4432,4149,7843] …
(Examples)

Step #5: Multiple choices for the output
{
"title": "Zombie",
"hashes": [“39483,4149,1440”, “3213,33232,44256”, “6654,4432, 4149”, … ]
}
Only one partition type
Only one partition type with different overlaps
{
"title": "Zombie",
“hashes_overlap_0”: [“39483,4149,1440”, “3213,33232,44256”, “6654,4432, 4149”, … ],
“hashes_overlap_1”: [“39483,4149,1440”, “1440,3213,33232”, “33232,44256,6654”, … ],
…
}
Multiple partition types
{
"title": "Zombie",
“hashes_3”: [“39483,4149,1440”, “3213,33232,44256”, “6654,4432, 4149”, … ],
“hashes_5”: [“39483,4149,1440,3213,33232”, “33232,44256,6654,4432,4149”, … ],
…
}

Sample test query
Sample Data
Sample test data
Money, Pink Floyd
https://www.youtube.com/watch?v=-0kcet4aPpQ
{Q1 - Main Bass Riff (original tonality)
Q2 - Main Bass Riff (different tonality)
Q3 - Descendent pentatonic scale (different tonality)
Zombie, Cranberries
https://www.youtube.com/watch?v=6Ejga4kJUts
{
Q1 - Main Bass Riff (original tonality)
Q2 - Main Melody
Q3 - Chorus
Stand By Me, John Lennon
https://www.youtube.com/watch?v=YqB8Dm65X18
{Q1 - Main Bass Riff (different tonality)
Q2 - Main Melody

V1.5
Sample Config
V1.0
<requestHandler name="standard">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="qf">hash</str>
<str name="mm">10%</str>
<str name="fl">*,score</str>
</lst>
</requestHandler><lst name="defaults">
<str name="defType">edismax</str>
<str name="qf">hash_3</str>
<str name="fl">*,score</str>
<str name="pf2">hash_2 hash_3^10 hash_4^100 hash_5^1000</str>
<str name="ps2">3</str>
<str name="pf3">hash_2^100 hash_3^1000 hash_4^10000 hash_5^100000</str>
<str name="ps3">2</str>
<str name="boost">product(query({!lucene df=hash_2 v=$q}), 10)</str>
<str name="tie">0.1</str>

Nice shot #1
Q2 - Money (Pink Floyd): main bass riff (different tonality)
Q1 - Money (Pink Floyd): main bass riff (original tonality)
Same melody (bass riff) in different tonalities yield similar results!
P@1 P@2 P@10 NDCG

Nice shot #2
Q2 - Zombie (Cranberries): main melody (different tonality)
Q1 - Zombie (Cranberries): main bass riff (original tonality)
Position matters!
P@1 P@2 P@10 NDCG

Nice shot #3
Good similarity even if the query pattern is not in result.
P@1

To be improved
Same query as before, red items have a very different “shape”
P@1

To be improved
If we were looking for “Money” these would look like great results…
…but unfortunately the input query here is the “Stand by me” bass line

➢ Chroma Feature
✓Next Steps
➢ Q&A
Agenda

Further areas of investigation
• Different types of Chroma Features
• “Stop” chroma vectors / intervals / energies
• Input data understanding & pre-processing
• Complementary approaches
• Additional audio features
• Query time
• Machine Learning

Chroma Matrix Clean Up
Original Threshold = 50% Threshold = 70% Threshold = 80% Top frequency

➢ Chroma Feature
➢ Next Steps
✓Q&A
Agenda

Links
Chroma Feature Analysis and Synthesis
https://labrosa.ee.columbia.edu/matlab/chroma-ansyn/
Chroma (Java)
https://github.com/adiyoss/Chroma
CHROMA TOOLBOX: MATLAB IMPLEMENTATIONS FOR
EXTRACTING VARIANTS OF CHROMA-BASED AUDIO FEATURES
https://www.academia.edu/30760846/
CHROMA_TOOLBOX_MATLAB_IMPLEMENTATIONS_FOR_EXTRACTING_VARIANTS_OF_CH
ROMA-BASED_AUDIO_FEATURES
FALCON: FAst Lucene-based Cover sOng identification
https://www.researchgate.net/publication/221571301_FALCON_FAst_Lucene-based_Cover_sOng_identification
Music Theory
https://en.wikipedia.org/wiki/Music_theory
Inversions
https://en.wikipedia.org/wiki/Inversion_(music)
Intervals
https://en.wikipedia.org/wiki/Interval_(music)
Chroma Feature
https://en.wikipedia.org/wiki/Chroma_feature

21 OCT 2019
Thank you!
Music Information Retrieval Take 2
Interval Hashing Based Ranking
Andrea Gazzarini, Software Engineer
21th October 2019

Interval Hashing Based Ranking

Recommended

Recommended

More Related Content

Similar to Interval Hashing Based Ranking

Similar to Interval Hashing Based Ranking (20)

Recently uploaded

Recently uploaded (20)

Interval Hashing Based Ranking