Searching Images by Color 
Chris Becker 
Search Engineering @ Shutterstock
What is Shutterstock? 
• Shutterstock sells stock images, videos & music. 
• Crowdsourced from artists around the world 
• Shutterstock reviews and indexes them for search 
• Customers buy a subscription and download them
Why search by color?
Stock photography on the internet… 
images from www.shutterstock.com
Color is one of many visual 
attributes that you can use 
to create an engaging 
image search experience
Shutterstock Labs 
Spectrum 
Palette
Diving into Color Data
Color Spaces 
• RGB 
• HSL 
• Lab 
• LCH 
images from www.wikipedia.org
Calculating Distances Between Colors 
• Euclidean distance works reasonably well in any color space 
distRGB = sqrt((r_2 - r_1)^2 + (g_2 - g_1)^2 + (b_2 - b_1)^2)
distHSL = sqrt((h_2 - h_1)^2 + (s_2 - s_1)^2 + (l_2 - l_1)^2)
distLCH = sqrt((L_2 - L_1)^2 + (C_2 - C_1)^2 + (H_2 - H_1)^2)
distLAB = sqrt((L_2 - L_1)^2 + (a_2 - a_1)^2 + (b_2 - b_1)^2)
• More sophisticated equations that better account for human 
perception can be found at 
http://en.wikipedia.org/wiki/Color_difference
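As a minimal sketch of the Euclidean approach, the snippet below computes the distance in RGB and in HSL using only the Python standard library. The function names are hypothetical, and treating hue as a circular value (so that hues near 0 and near 1 count as close) is an assumption added here, not something the slide states.

```python
import colorsys
import math


def rgb_distance(c1, c2):
    """Euclidean distance between two (r, g, b) tuples with 0-255 channels."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def hue_delta(h1, h2):
    """Shortest distance between two hues on a 0-1 circle (assumption: hue wraps)."""
    d = abs(h1 - h2) % 1.0
    return min(d, 1.0 - d)


def hsl_distance(c1, c2):
    """Euclidean distance in HSL, computed from 0-255 RGB inputs."""
    # colorsys returns (hue, lightness, saturation), each in the range 0-1
    h1, l1, s1 = colorsys.rgb_to_hls(*(v / 255.0 for v in c1))
    h2, l2, s2 = colorsys.rgb_to_hls(*(v / 255.0 for v in c2))
    return math.sqrt(hue_delta(h1, h2) ** 2 + (s1 - s2) ** 2 + (l1 - l2) ** 2)
```

Note that the two functions return values on different scales (0-255 channels versus 0-1 channels), so distances are only comparable within one color space.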
Images are just numbers 
[ 
[[054,087,058], [054,116,206], [017,226,194], [234,203,215], [188,205,000], [229,156,182]], 
[[214,238,109], [064,190,104], [191,024,161], [104,071,036], [222,081,005], [204,012,113]], 
[[197,100,189], [159,204,024], [228,214,054], [250,098,125], [050,144,093], [021,122,101]], 
[[255,146,010], [115,156,002], [174,023,137], [161,141,077], [154,189,005], [242,170,074]], 
[[113,146,064], [196,057,200], [123,203,160], [066,090,234], [200,186,103], [099,074,037]], 
[[194,022,018], [226,045,008], [123,023,087], [171,029,021], [040,001,143], [255,083,194]], 
[[115,186,246], [025,064,109], [029,071,001], [140,031,002], [248,170,244], [134,112,252]], 
[[116,179,059], [217,205,159], [157,060,251], [151,205,058], [036,214,075], [107,103,130]], 
[[052,003,227], [184,037,078], [161,155,181], [051,070,186], [082,235,108], [129,233,211]], 
[[047,212,209], [250,236,085], [038,128,148], [115,171,113], [186,092,227], [198,130,024]], 
[[225,210,064], [123,049,199], [173,207,164], [161,069,220], [002,228,184], [170,248,075]], 
[[234,157,201], [168,027,113], [117,080,236], [168,131,247], [028,177,060], [187,147,084]], 
[[184,166,096], [107,117,037], [154,208,093], [237,090,188], [007,076,086], [224,239,210]], 
[[105,230,058], [002,122,240], [036,151,107], [101,023,149], [048,010,225], [109,102,195]], 
[[050,019,169], [219,235,027], [061,064,133], [218,221,113], [009,032,125], [109,151,137]], 
[[010,037,189], [216,010,101], [000,037,084], [166,225,127], [203,067,214], [110,020,245]], 
[[180,147,130], [045,251,177], [127,175,215], [237,161,084], [208,027,218], [244,194,034]], 
[[089,235,226], [106,219,220], [010,040,006], [094,138,058], [148,081,166], [249,216,177]], 
[[121,110,034], [007,232,255], [214,052,035], [086,100,020], [191,064,105], [129,254,207]], 
]
Any operation you can do on a set of 
numbers, you can do on an image 
• getting histograms 
• computing median values 
• standard deviations / variance 
• other statistics
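To illustrate, here is a small sketch that treats a list of pixels as plain numbers and computes the median and standard deviation of their lightness. The function name `lightness_stats` is hypothetical, and the choice of HSL lightness on a 0-1 scale is an assumption made for the example.

```python
import colorsys
import statistics


def lightness_stats(pixels):
    """Median and population standard deviation of HSL lightness (0-1 scale)."""
    values = [
        colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)[1]  # index 1 = lightness
        for r, g, b in pixels
    ]
    return {"median": statistics.median(values), "stdev": statistics.pstdev(values)}


# first row of the pixel array from the "Images are just numbers" slide
row = [(54, 87, 58), (54, 116, 206), (17, 226, 194),
       (234, 203, 215), (188, 205, 0), (229, 156, 182)]
print(lightness_stats(row))
```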
Extracting Color Data
Tools & Libraries 
• ImageMagick 
• Python Imaging Library (PIL)
• ImageJ
# Python example: get a color histogram from an image
from pprint import pprint

from PIL import Image

# convert to RGB so every color is an (r, g, b) tuple
image = Image.open('./samplephoto.jpg').convert('RGB')
width, height = image.size
# maxcolors = pixel count, so getcolors() never returns None
colors = image.getcolors(width * height)
hist = {}
for count, (r, g, b) in colors:
    # key: hex color string, value: number of pixels with that color
    hist['%02x%02x%02x' % (r, g, b)] = count
pprint(hist)
Indexing & Searching 
in Solr
Indexing color histograms 
• index colors just like you would index text 
• amount of color = frequency of the term 
color_txt = "cfebc2 
cfebc2 cfebc2 cfebc2 
cfebc2 cfebc2 cfebc2 
cfebc2 cfebc2 cfebc2 
95bf40 95bf40 95bf40 
95bf40 95bf40 95bf40 
2e6b2e 2e6b2e 2e6b2e 
ff0000 …"
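One possible way to produce such a field value from a histogram is sketched below: each color is repeated in proportion to its share of the pixels. The function name and the `total_terms` parameter (how many tokens the whole image is scaled to) are assumptions for illustration, not part of the original talk.

```python
def histogram_to_text(hist, total_terms=100):
    """Turn {hex_color: pixel_count} into a whitespace-separated term string.

    Each color is repeated in proportion to its share of the pixels, so the
    term frequency in the index reflects how much of the image it covers.
    """
    total_pixels = sum(hist.values())
    if total_pixels == 0:
        return ""
    terms = []
    # most frequent colors first, only to keep the output easy to read
    for color, count in sorted(hist.items(), key=lambda kv: kv[1], reverse=True):
        repeats = round(count / total_pixels * total_terms)
        terms.extend([color] * repeats)
    return " ".join(terms)


color_txt = histogram_to_text({"cfebc2": 500, "95bf40": 300, "2e6b2e": 150, "ff0000": 50},
                              total_terms=20)
print(color_txt)
```

Colors whose share rounds down to zero terms are dropped from the field, which also keeps the index small.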
Solr Schema & Queries 
<field name="color" type="text_ws" …> 
• Can use Solr’s default ranking effectively 
/solr/select?q=ff0000 e2c2d2&qf=color&defType=edismax… 
• or use term frequencies directly for specific sort functions: 
sort=product(tf(color,"ff0000"),tf(color,"e2c2d2")) desc
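The request above can be assembled programmatically. The sketch below builds the URL with the standard library; the host and port are hypothetical, while the `/solr/select` path, the `color` field, and the `tf`/`product` sort come from the slide.

```python
from urllib.parse import urlencode

# hypothetical host and port; the /solr/select path is taken from the slide
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_color_query(colors, field="color"):
    """Build a select URL that matches the given hex colors and sorts by
    the product of their term frequencies."""
    tf_terms = ",".join('tf(%s,"%s")' % (field, c) for c in colors)
    # a single color needs no product(); sort on its term frequency directly
    sort = tf_terms if len(colors) == 1 else "product(%s)" % tf_terms
    params = {
        "q": " ".join(colors),
        "qf": field,
        "defType": "edismax",
        "sort": sort + " desc",
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = build_color_query(["ff0000", "e2c2d2"])
print(url)
```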
Indexing color statistics 
Represent aggregate statistics of each image 
lightness: 
median: 2 
standard dev: 1 
largest bin: 0 
largest bin size: 50 
saturation 
median: 0 
standard dev: 0 
largest bin: 0 
largest bin size: 100 
…
Solr Fields & Queries 
<field name="hue_median" type="int" …> 
• Sort by the distance between input param 
and median value for each image 
/solr/select?q=*&sort=abs(sub($query,hue_median)) asc
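For intuition, the same ordering can be reproduced outside Solr. This sketch mirrors `abs(sub($query,hue_median))` on a list of dictionaries; the document ids and hue values are made up for the example.

```python
def rank_by_hue(docs, query_hue):
    """Order documents by the absolute difference between the query hue and
    each document's hue_median, closest first."""
    return sorted(docs, key=lambda d: abs(query_hue - d["hue_median"]))


# made-up documents, for illustration only
docs = [
    {"id": "a", "hue_median": 200},
    {"id": "b", "hue_median": 35},
    {"id": "c", "hue_median": 120},
]
print([d["id"] for d in rank_by_hue(docs, query_hue=110)])
```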
Ranking & Relevance
How much of the image has the color ? 
image from www.shutterstock.com
is this relevant if I search for ? 
image from www.shutterstock.com
which image is more relevant if I search for ? 
image from www.shutterstock.com
is this relevant if I search for ? 
image from www.shutterstock.com
How do we account for these factors?
How much of the image contains the 
selected color? 
• Score each color by the number of pixels 
sort=tf(color,"cfebc2") desc
Balance Precision and Recall 
• Reduce your colorspace enough 
to balance: 
• color accuracy 
• index size 
• query complexity 
• result counts 
• only need 100-200 colors for a good UX 
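One simple way to reduce the color space is uniform quantization: snap each RGB channel to a small number of levels. The sketch below is an assumption about how this could be done, not the method used in the talk; with 5 levels per channel it yields 5 × 5 × 5 = 125 colors, inside the 100-200 range mentioned above.

```python
def quantize(rgb, levels=5):
    """Snap an (r, g, b) tuple to the center of its bucket and return a hex string."""
    step = 256 / levels
    snapped = []
    for channel in rgb:
        bucket = min(int(channel // step), levels - 1)
        # represent every value in the bucket by the bucket's center
        snapped.append(int((bucket + 0.5) * step))
    return "%02x%02x%02x" % tuple(snapped)


print(quantize((255, 0, 0)))
```

Applying `quantize` to every pixel before building the histogram means that queries and documents share the same reduced vocabulary of color terms.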
Weighing Multiple Colors Together 
• If you search for 2 or more colors, the top result should have 
the most even distribution of those colors 
• simple option: 
sort=product(tf(color,"ff9900"),tf(color,"2280e2")) desc 
• more complex: compute the standard deviation or variance 
of the term frequencies of matching color values for each 
image, and sort the results with the lowest variance first.
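The variance-based option could look roughly like the sketch below, which ranks documents client-side from their per-color term frequencies. The input layout, the rule of skipping documents that lack any requested color, and the tie-break on total frequency are all assumptions added for the example.

```python
import statistics


def rank_by_evenness(docs, colors):
    """Rank document ids by how evenly the requested colors are distributed.

    docs maps a document id to {hex_color: term_frequency}.
    """
    scored = []
    for doc_id, tf in docs.items():
        freqs = [tf.get(c, 0) for c in colors]
        if min(freqs) == 0:
            continue  # assumption: a result must contain every requested color
        # lowest variance first; ties go to the document with more matching pixels
        scored.append((statistics.pvariance(freqs), -sum(freqs), doc_id))
    return [doc_id for _, _, doc_id in sorted(scored)]


# made-up term frequencies, for illustration only
docs = {
    "a": {"ff9900": 50, "2280e2": 50},
    "b": {"ff9900": 90, "2280e2": 10},
    "c": {"ff9900": 5, "2280e2": 5},
    "d": {"ff9900": 70},
}
print(rank_by_evenness(docs, ["ff9900", "2280e2"]))
```

Raw variance ignores how much of the image the colors cover, which is why the sketch adds the total frequency as a secondary sort key.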
Weighing Similar & Different Colors 
• The score for one color should reflect all the colors in the image. 
• At indexing time, increase the score based on similar colors; 
decrease it based on differing colors.
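The slide gives only the idea, so the following is one hypothetical way to realize it: each color's score starts from its own pixel count, gains from nearby colors, and loses from distant ones. The linear weighting, the `radius` parameter, and the clamping are all assumptions, not the formula used in the talk.

```python
import math


def adjusted_scores(hist, radius=100.0):
    """Adjust each color's score using every other color in the image.

    hist maps an (r, g, b) tuple to a pixel count. Colors closer than
    `radius` (Euclidean RGB distance) add to the score, colors farther
    away subtract from it.
    """
    scores = {}
    for color in hist:
        total = 0.0
        for other, count in hist.items():
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(color, other)))
            # weight is 1 for the color itself, 0 at `radius`, and at least -1
            weight = max(-1.0, 1.0 - dist / radius)
            total += count * weight
        scores[color] = max(0.0, total)  # keep scores non-negative
    return scores


print(adjusted_scores({(255, 0, 0): 10, (250, 0, 0): 10, (0, 0, 255): 10}))
```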
Conclusion
• Steps for building color search in Solr: 
• Extract colors using a tool like the Python Imaging Library 
• Score colors based on the number of pixels 
• Adjust scores based on similar / different colors 
• Index colors into Solr as a text document 
• In your query, sort by the term frequency values for each color
One more demo…
