"Socializing 'Big Data':
             Collaborative Opportunities in
Computer Science, the Social Sciences, and the Humanities"




                Richard Marciano
                      UNC Chapel Hill
                 richard_marciano@unc.edu


                     http://salt.unc.edu
             http://digitalinnovation.unc.edu
Current Research Areas

•records in the cloud,
•big cultural data,
•access to big heterogeneous data,
•federated grid/cloud storage,
•visual interfaces to large collections,
•policy-based frameworks to automate content management,
•distributed cyberinfrastructure to enable data sharing.
Records in the Cloud




Kickoff meeting on Feb. 5, 2013
•UBC iSchool, Faculty of Law, School of Bus.
•UW iSchool
•Mid-Sweden Info. Tech and Media,
 Delegating to cloud providers the responsibility for security,
 accessibility, disposition and preservation.
1998
•   Grids in Context
      •   Larry Smarr
•   Computational Grids
      •   Ian Foster and Carl Kesselman
•   Distributed Supercomputing Applications
      •   Paul Messina
•   Realtime Widely Distributed Instrumentation
      •   William E. Johnston
•   Data-Intensive Computing
      •   Reagan Moore, … Richard Marciano, …
•   Teleimmersion
      •   Tom DeFanti and Rick Stevens
•   Application-Specific Tools
      •   Henri Casanova, Jack Dongarra, …
•   Compilers, Languages, and Libraries
      •   Ken Kennedy
•   Object-Based Approaches
      •   Dennis Gannon, Andrew Grimshaw
•   High-Performance Commodity Computing
      •   Geoffrey Fox, Wojtek Furmanski
•   The Globus Toolkit
      •   Ian Foster, Carl Kesselman
•   High-Performance Schedulers
      •   Francine Berman
•   High-Throughput Resource Management
      •   Miron Livny, Rajesh Raman
•   Instrumentation and Measurement
      •   Jeffrey Hollingsworth, Bart Miller
•   Performance Analysis and Visualization
      •   Daniel Reed, Randy Ribler
•   Security, Accounting, and Assurance
      •   Clifford Neuman
•   Computing Platforms
      •   Andrew Chien
•   Network Protocols
      •   P.M. Melliar-Smith, Louise Moser
•   Network Quality of Service
      •   Roch Guerin, Henning Schulzrinne
•   Operating Systems and Network Interfaces
      •   Peter Druschel, Larry Peterson
•   Network Infrastructure
      •   Jon Postel, Joe Touch
•   Testbeds: Bridges from Research to Infrastructure
      •   Charlie Catlett, John Toole
2003
•   Tony Hey: “The Data Deluge: An e-Science Perspective”
2004
•   Collaborative Science
Big Data is a Big Deal
White House announcement:
    http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal
Big Data Across the Federal Government:
    http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_fact_sheet_final_1.pdf
More than $200M in new commitments (NSF, HHS/NIH, DOE, DOD, DARPA, USGS)
Goal: “improve the ability to extract knowledge and insights from large and complex
collections of digital data”.

   DataNet
      Long-term preservation and access of data
   Software Infrastructure for Sustained Innovation (SI2)
   Digging Into Data Challenge (NSF/NEH/IMLS & JISC)
      Computational Humanities
   Cyber-Enabled Discovery and Innovation (CDI)
      Data enabled science and engineering
   Core Techniques and Technologies for Advancing Big
    Data Science & Engineering (BIGDATA)
   Data Infrastructure Building Blocks (DIBBs)
   DataWay
      National Infrastructure for Heterogeneous Data
“Size Matters:
    Big Data, New Vistas in the Humanities and Social Sciences”:
                               DataEdge, UC Berkeley May 31, 2012



Geoffrey Nunberg Panel:

                          “Something seems to happen, people feel, when you
                          get to that 13th zero, or 15th zero, or 18th zero, or 21st
                          zero, wherever it is, and bingo it’s the petabyte age, it’s
                          the age of big data.
                          It’s like combing your hair, you just comb, and comb,
                          and comb, and all of a sudden it’s like big hair.”
                          “The question is whether the advent of big data
                          changes the way we do social science and also what
                          role social scientists will play…”

12/31/2012 Forbes article by Edd Dumbill: “Big Data, Big Hype: Big Deal”
“Big data is an imprecise term. As such it’s a huge boon to marketers… not
everyone is pleased with the ‘bigger is better’ argument. ‘Big data’ really means
‘smart use of data’.”
Alistair Croll: “Big Data is our Generation’s civil rights issue, and we don’t know it.”

  “Personalization” is another word for discrimination. We’re not discriminating if we tailor
  things to you based on what we know about you — right? That’s just better service.

                                       When bank managers tried to restrict loans to residents of
                                       certain areas (known as redlining) Congress stepped in to
                                       stop it (with the Fair Housing Act of 1968). They were able
                                       to legislate against discrimination, making it illegal to change
                                       loan policy based on someone’s race.
                                       Home Owners’ Loan Corporation map showing redlining of “hazardous”
                                       districts in 1936. See: DURHAM MAPS for the T-RACES project

Music selections and sharing with friends could make it possible to guess a
person’s racial background and deny them a loan.




                                                       Publicly available last name information can
                                                       be used to generate racial boundary maps.

                                                       From the Mapping London project
Big Data = Big Collaborations
May 2007
      Socializing CI:
Networking the Humanities,
 Arts, and Social Sciences
TUCASI data-Infrastructure Project (TIP)
  Managing Digital Research Data in Federated Storage Clouds
• Project Lead:        Richard Marciano (UNC/SALT)

• Project Manager:     Amy Shoop (UNC ITS)

• Oversight Council
  – CIOs
      • Tracy Futhey – Duke CIO
      • Marc Hoit – NCSU CIO
      • Larry Conrad – UNC CIO
  – Head Librarians
      • Deborah Jakubs – Duke Librarian
      • Susan Nutter – NCSU Librarian
      • Sarah Michalak – UNC Librarian
  – RENCI
      • Alan Blatecky – RENCI
      • Stan Ahalt – RENCI
  – DICE Center
      • Reagan Moore – DICE
  – SALT Lab
      • Richard Marciano – SALT
Focus Group Membership
University Teams by Focus Group

• Classroom Capture
  – Duke
      • Samantha Earp (CC lead) (OIT-Academic Services)
  – Chapel Hill
      • Suzanne Cadwell (ITS-Academic Outreach & Engagement)
      • Charlie Greene (ITS-Teaching & Learning)
      • Pam Sessoms (Lib-e-Reference)
  – NC State
      • Lou Harrison (DELTA)
      • Hal Meeks (OIT-Outreach, Communications and Consulting)

• Storage
  – Duke
      • Amy Brooks (OIT-Systems)
      • Klara Jelinkova (OIT-Shared Services & Infrastructure)
      • David Kennedy (Lib-Info. Sys. Support)
      • Molly Tamarkin (Lib-Systems)
      • Jim Tuttle (Lib-Systems)
  – Chapel Hill
      • Reagan Moore (S lead) (DICE)
      • Leesa Brieger (RENCI-Data)
      • Brent Caison (ITS-Storage)
      • Dave Pcolar (Lib-Systems)
      • Bill Schulz (Lib-Systems)
      • Lisa Stillwell (RENCI-Data)
  – NC State
      • Steve Morris (Lib-Systems)
      • Eric Sills (OIT-Research Computing)

• Future Data & Policy
  – Duke
      • Paolo Mangiafico (Provost-Dig. Info. Strategy)
      • Tim Pyatt (Lib-Archives)
  – Chapel Hill
      • Ruth Marinshaw (ITS-Research Computing)
      • Will Owen (Lib-Systems)
      • Rich Szary (Lib-Special Collections)
  – NC State
      • Kristin Antelman (FD&P lead) (Lib)
      • Susan Nutter (Lib-Head Librarian)
30 funded
57 total
“Public Scholarship”
                            Kathy Woodward, UW Simpson Center for the Humanities


UNC, Duke, Asheville collaboration
   •   University of North Carolina Asheville (UNCA): staff (provost, head librarian, head of
       special collections, library staff, departments of computer science / history / political
       science), centers (National Environmental Modeling and Analysis Center / Center for
       Diversity Education), and students
   •   community-based development organizations (Green Opportunities Corps, Asheville
       Design Center)
   •   neighborhood community group leaders and residents (Southside, Burton Street, East
       End)
   •   city of Asheville officials (Housing Authority of the City of Asheville, Planning &
       Development Department, West Asheville Public Library, Chamber of Commerce)
   •   county (head of Buncombe County Register of Deeds, Land-Of-Sky Regional Council)
   •   other groups including the North Carolina Humanities Council, Mountain Housing
       Opportunities Inc.
   •   “Twilight of a Neighborhood: Asheville’s East End, 1970” project. This project
       examined the process and aftermath of urban renewal and collected voices of residents,
       after the 2007 transfer of records to UNC Asheville. We have secured support
       and commitment from the community groups relevant to tackling this project.
   •   Asheville’s African-American Community Historical Bus Tour, June 19, 2012 (35
       people)
UNCA & Asheville Partners:
  • Dwight Mullen, UNCA Political Science
  • Priscilla Ndiaye, chair of Asheville's Southside Advisory Commi
Big Heterogeneous Data (with Duke)
                    Mapping historical residential segregation in the
                    US
                    Researching the cyberinfrastructure implications of
                    supporting large scale content based indexing of highly
                    heterogeneous digital collections potentially embodying non-
                    uniform or sparse metadata architectures…

Intellectual Merit:
Demonstrating the creation of national collections through automation and citizen-
scientist crowdsourcing efforts is the focus of this task.

Broader Impacts:
This case-study will bring heterogeneous content from a variety of sources:
census, economic, historic, planning, insurance, financial, and scientific.

Outcomes:
Workflows & visual prototype
From Crowdsourcing to Citizen-led Sourcing




•   Neighborhood community group leaders and residents (Southside, Burton Street, East End)

•   University of North Carolina Asheville staff (provost, head of special collections, library staff, departments of
    computer science / history / political science), centers (Renaissance Computing Institute / Center for Diversity
    Education), and students

•   Community-based development organizations (Green Opportunities Corps, Asheville Design Center)

•   City of Asheville officials (Housing Authority of the City of Asheville, Register of Deeds, GIS, Planning &
    Development Department, West Asheville Public Library, Chamber of Commerce, Regional Council)

•   Other groups including the North Carolina Humanities Council, Mountain Housing Opportunities Inc., Twilight
    of a Neighborhood.
SALT
  “We define the ‘discipline of data curation’ as the practice of collection,
annotation, conditioning, and preservation of data for both current and future
                                     use”
                      – Helen Tibbo & Bryan Heidorn



•   Governance: conditioning
•   Policy: annotation
•   Content: collection
•   Infrastructure: preservation
•   Evolution: current & future use


 Vectors – Annenberg Center for Communication                       SDSC: SALT

More Related Content

Viewers also liked

Fall Directors 2014: Junior/Upperclass Research Projects Presentation
Fall Directors 2014: Junior/Upperclass Research Projects PresentationFall Directors 2014: Junior/Upperclass Research Projects Presentation
Fall Directors 2014: Junior/Upperclass Research Projects PresentationBonner Foundation
 
Designing Course-Based, Student-Faculty Collaborative Research Projects Usi...
Designing Course-Based,  Student-Faculty Collaborative  Research Projects Usi...Designing Course-Based,  Student-Faculty Collaborative  Research Projects Usi...
Designing Course-Based, Student-Faculty Collaborative Research Projects Usi...Rebecca Davis
 
School Science Projects based on Experiments
School Science Projects based on ExperimentsSchool Science Projects based on Experiments
School Science Projects based on ExperimentsHiran Amarasekera
 
Job satisfaction Research based project
Job satisfaction Research based projectJob satisfaction Research based project
Job satisfaction Research based projectHARSH SHAH
 
Bootstrapping Machine Learning
Bootstrapping Machine LearningBootstrapping Machine Learning
Bootstrapping Machine LearningLouis Dorard
 
Data+Science : A First Course
Data+Science : A First CourseData+Science : A First Course
Data+Science : A First CourseArnab Majumdar
 
Project based learning with ICT
Project based learning with ICTProject based learning with ICT
Project based learning with ICTPetya Assenova
 

Viewers also liked (9)

Fall Directors 2014: Junior/Upperclass Research Projects Presentation
Fall Directors 2014: Junior/Upperclass Research Projects PresentationFall Directors 2014: Junior/Upperclass Research Projects Presentation
Fall Directors 2014: Junior/Upperclass Research Projects Presentation
 
Designing Course-Based, Student-Faculty Collaborative Research Projects Usi...
Designing Course-Based,  Student-Faculty Collaborative  Research Projects Usi...Designing Course-Based,  Student-Faculty Collaborative  Research Projects Usi...
Designing Course-Based, Student-Faculty Collaborative Research Projects Usi...
 
School Science Projects based on Experiments
School Science Projects based on ExperimentsSchool Science Projects based on Experiments
School Science Projects based on Experiments
 
Data Mining (Predict The Future)
Data Mining (Predict The Future)Data Mining (Predict The Future)
Data Mining (Predict The Future)
 
Job satisfaction Research based project
Job satisfaction Research based projectJob satisfaction Research based project
Job satisfaction Research based project
 
Bootstrapping Machine Learning
Bootstrapping Machine LearningBootstrapping Machine Learning
Bootstrapping Machine Learning
 
Data+Science : A First Course
Data+Science : A First CourseData+Science : A First Course
Data+Science : A First Course
 
Project based learning with ICT
Project based learning with ICTProject based learning with ICT
Project based learning with ICT
 
Doing a research project using the Big 6 Model
Doing a research project using the Big 6 ModelDoing a research project using the Big 6 Model
Doing a research project using the Big 6 Model
 

Similar to Socializing Big Data: Collaborative Opportunities in Computer Science, the Social Sciences, and the Humanitiesno

Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)
Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)
Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)Matthew Lease
 
Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)Matthew Lease
 
Profissão Market Research, Que Futuro?
Profissão Market Research, Que Futuro?Profissão Market Research, Que Futuro?
Profissão Market Research, Que Futuro?sandrina
 
Leveraging the INDECT Project: An Activist Strategy to Implement Privacy Ethi...
Leveraging the INDECT Project: An Activist Strategy to Implement Privacy Ethi...Leveraging the INDECT Project: An Activist Strategy to Implement Privacy Ethi...
Leveraging the INDECT Project: An Activist Strategy to Implement Privacy Ethi...Eddan Katz
 
Developing Staff Competencies in Emerging Technologies
Developing Staff Competencies in Emerging TechnologiesDeveloping Staff Competencies in Emerging Technologies
Developing Staff Competencies in Emerging TechnologiesDouglas Joubert
 
Big Data: What's it Really About?
Big Data: What's it Really About?Big Data: What's it Really About?
Big Data: What's it Really About?inside-BigData.com
 
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...Sirris
 
Metrocon-Rise-Of-Crowd-Computing
Metrocon-Rise-Of-Crowd-ComputingMetrocon-Rise-Of-Crowd-Computing
Metrocon-Rise-Of-Crowd-ComputingMatthew Lease
 
Internet of Things talk about crisis data, Feb 2012
Internet of Things talk about crisis data, Feb 2012Internet of Things talk about crisis data, Feb 2012
Internet of Things talk about crisis data, Feb 2012Sara-Jayne Terp
 
Recent developments in data analytics and big data
Recent developments in data analytics and big dataRecent developments in data analytics and big data
Recent developments in data analytics and big dataDez Blanchfield
 
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...summersocialwebshop
 
Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesKathirvel Ayyaswamy
 
Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...
Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...
Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...ASIS&T
 
Big and Small Web Data
Big and Small Web DataBig and Small Web Data
Big and Small Web DataMarieke Guy
 
Artificial intelegence semifinal round (3rd rank)
Artificial intelegence semifinal round (3rd rank)Artificial intelegence semifinal round (3rd rank)
Artificial intelegence semifinal round (3rd rank)Hîmãlåy Làdhä
 

Similar to Socializing Big Data: Collaborative Opportunities in Computer Science, the Social Sciences, and the Humanitiesno (20)

Big Data
Big Data Big Data
Big Data
 
Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)
Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)
Crowd Computing: Opportunities & Challenges (IJCNLP 2011 Keynote)
 
Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)
 
DBMS
DBMSDBMS
DBMS
 
Profissão Market Research, Que Futuro?
Profissão Market Research, Que Futuro?Profissão Market Research, Que Futuro?
Profissão Market Research, Que Futuro?
 
Leveraging the INDECT Project: An Activist Strategy to Implement Privacy Ethi...
Leveraging the INDECT Project: An Activist Strategy to Implement Privacy Ethi...Leveraging the INDECT Project: An Activist Strategy to Implement Privacy Ethi...
Leveraging the INDECT Project: An Activist Strategy to Implement Privacy Ethi...
 
Mining Social Data
Mining Social DataMining Social Data
Mining Social Data
 
Developing Staff Competencies in Emerging Technologies
Developing Staff Competencies in Emerging TechnologiesDeveloping Staff Competencies in Emerging Technologies
Developing Staff Competencies in Emerging Technologies
 
Big Data: What's it Really About?
Big Data: What's it Really About?Big Data: What's it Really About?
Big Data: What's it Really About?
 
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
 
Metrocon-Rise-Of-Crowd-Computing
Metrocon-Rise-Of-Crowd-ComputingMetrocon-Rise-Of-Crowd-Computing
Metrocon-Rise-Of-Crowd-Computing
 
Internet of Things talk about crisis data, Feb 2012
Internet of Things talk about crisis data, Feb 2012Internet of Things talk about crisis data, Feb 2012
Internet of Things talk about crisis data, Feb 2012
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Recent developments in data analytics and big data
Recent developments in data analytics and big dataRecent developments in data analytics and big data
Recent developments in data analytics and big data
 
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
 
Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research Opportunities
 
Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...
Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...
Curriculum Development at the Tetherless World Constellation - Peter Fox - RD...
 
Big and Small Web Data
Big and Small Web DataBig and Small Web Data
Big and Small Web Data
 
Duncan product tank
Duncan product tankDuncan product tank
Duncan product tank
 
Artificial intelegence semifinal round (3rd rank)
Artificial intelegence semifinal round (3rd rank)Artificial intelegence semifinal round (3rd rank)
Artificial intelegence semifinal round (3rd rank)
 

Recently uploaded

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 

Socializing Big Data: Collaborative Opportunities in Computer Science, the Social Sciences, and the Humanities

  • 1. "Socializing 'Big Data': Collaborative Opportunities in Computer Science, the Social Sciences, and the Humanities" Richard Marciano UNC Chapel Hill richard_marciano@unc.edu http://salt.unc.edu http://digitalinnovation.unc.edu
  • 2. Current research Areas •records in the cloud, •big cultural data, •access to big heterogeneous data, •federated grid/cloud storage, •visual interfaces to large collections, •policy-based frameworks to automate content management, •distributed cyberinfrastructure to enable data sharing.
  • 3. Records in the Cloud Kickoff meeting on Feb. 5, 2013 •UBC iSchool, Faculty of Law, School of Bus. •UW iSchool •Mid-Sweden Info. Tech and Media, Delegating to cloud providers the responsibility for security, accessibility, disposition and preservation.
  • 4. 1998 • Grids in Context (Larry Smarr) • Computational Grids (Ian Foster and Carl Kesselman) • Distributed Supercomputing Applications (Paul Messina) • Realtime Widely Distributed Instrumentation (William E. Johnston) • Data-Intensive Computing (Reagan Moore, … Richard Marciano, …) • Teleimmersion (Tom DeFanti and Rick Stevens) • Application-Specific Tools (Henri Casanova, Jack Dongarra, …) • Compilers, Languages, and Libraries (Ken Kennedy) • Object-Based Approaches (Dennis Gannon, Andrew Grimshaw) • High-Performance Commodity Computing (Geoffrey Fox, Wojtek Furmanski) • The Globus Toolkit (Ian Foster, Carl Kesselman) • High-Performance Schedulers (Francine Berman) • High-Throughput Resource Management (Miron Livny, Rajesh Raman) • Instrumentation and Measurement (Jeffrey Hollingsworth, Bart Miller) • Performance Analysis and Visualization (Daniel Reed, Randy Ribler) • Security, Accounting, and Assurance (Clifford Neuman) • Computing Platforms (Andrew Chien) • Network Protocols (P.M. Melliar-Smith, Louise Moser) • Network Quality of Service (Roch Guerin, Henning Schulzrinne) • Operating Systems and Network Interfaces (Peter Druschel, Larry Peterson) • Network Infrastructure (Jon Postel, Joe Touch) • Testbeds: Bridges from Research to Infrastructure (Charlie Catlett, John Toole) 2003 • Tony Hey: “The Data Deluge: An e-Science Perspective” 2004 • Collaborative Science
  • 5. Big Data is a Big Deal White House announcement: http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal Big Data Across the Federal Government: http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_fact_sheet_final_1.pdf More than $200M in new commitments (NSF, HHS/NIH, DOE, DOD, DARPA, USGS) Goal: “improve the ability to extract knowledge and insights from large and complex collections of digital data”. • DataNet: Long-term preservation and access of data • Software Infrastructure for Sustained Innovation (SI2) • Digging Into Data Challenge (NSF/NEH/IMLS & JISC): Computational Humanities • Cyber-Enabled Discovery and Innovation (CDI): Data enabled science and engineering • Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) • Data Infrastructure Building Blocks (DIBBs) • DataWay: National Infrastructure for Heterogeneous Data
  • 6. “Size Matters: Big Data, New Vistas in the Humanities and Social Sciences”: DataEdge, UC Berkeley, May 31, 2012. Geoffrey Nunberg Panel: “Something seems to happen, people feel, when you get to that 13th zero, or 15th zero, or 18th zero, or 21st zero, wherever it is, and bingo it’s the petabyte age, it’s the age of big data. It’s like combing your hair, you just comb, and comb, and comb, and all of a sudden it’s like big hair.” “The question is whether the advent of big data changes the way we do social science and also what role social scientists will play…” 12/31/2012 Forbes article by Edd Dumbill: “Big Data, Big Hype: Big Deal” “Big data is an imprecise term. As such it’s a huge boon to marketers… not everyone is pleased with the ‘bigger is better’ argument. ‘Big data’ really means ‘smart use of data’.”
  • 7. Alistair Croll: “Big Data is our Generation’s civil rights issue, and we don’t know it.” “Personalization” is another word for discrimination. We’re not discriminating if we tailor things to you based on what we know about you — right? That’s just better service. When bank managers tried to restrict loans to residents of certain areas (known as redlining) Congress stepped in to stop it (with the Fair Housing Act of 1968). They were able to legislate against discrimination, making it illegal to change loan policy based on someone’s race. Home Owners’ Loan Corporation map showing redlining of “hazardous” districts in 1936. See: DURHAM MAPS for the T-RACES project. Music selection and sharing with friends could make it possible to guess a person’s racial background and deny them a loan. Publicly available last name information can be used to generate racial boundary maps. From the Mapping London project
  • 8. Big Data = Big Collaborations
  • 9.
  • 10. May 2007 Socializing CI: Networking the Humanities, Arts, and Social Sciences
  • 11. TUCASI data-Infrastructure Project (TIP): Managing Digital Research Data in Federated Storage Clouds • Project Lead: Richard Marciano (UNC/SALT) • Project Manager: Amy Shoop (UNC ITS) • Oversight Council – CIOs: Tracy Futhey (Duke), Marc Hoit (NCSU), Larry Conrad (UNC) – Head Librarians: Deborah Jakubs (Duke), Susan Nutter (NCSU), Sara Michalak (UNC) – RENCI: Alan Blatecky, Stan Ahalt – DICE Center: Reagan Moore – SALT Lab: Richard Marciano
  • 12. Focus Group Membership University Teams: Duke, Chapel Hill, NC State • Classroom Capture: Samantha Earp (CC lead) (OIT-Academic Services), Suzanne Cadwell (ITS-Academic Outreach & Engagement), Lou Harrison (DELTA), Charlie Greene (ITS-Teaching & Learning), Hal Meeks (OIT-Outreach, Communications and Consulting), Pam Sessoms (Lib-e-Reference) • Storage: Amy Brooks (OIT-Systems), Klara Jelinkova (OIT-Shared Services & Infrastructure), Reagan Moore (S lead) (DICE), Leesa Brieger (RENCI-Data), Brent Caison (ITS-Storage), Steve Morris (Lib-Systems), David Kennedy (Lib-Info. Sys. Support), Dave Pcolar (Lib-Systems), Eric Sills (OIT-Research Computing), Bill Schulz (Lib-Systems), Molly Tamarkin (Lib-Systems), Lisa Stillwell (RENCI-Data), Jim Tuttle (Lib-Systems) • Future Data & Policy: Paolo Mangiafico (Provost-Dig. Info. Strategy), Ruth Marinshaw (ITS-Research Computing), Kristin Antelman (FD&P lead) (Lib), Tim Pyatt (Lib-Archives), Will Owen (Lib-Systems), Susan Nutter (Lib-Head Librarian), Rich Szary (Lib-Special Collections)
  • 14. “Public Scholarship” Kathy Woodward, UW Simpson Center for the Humanities UNC, Duke, Asheville collaboration • University of North Carolina Asheville (UNCA): staff (provost, head librarian, head of special collections, library staff, departments of computer science / history / political science), centers (National Environmental Modeling and Analysis Center / Center for Diversity Education), and students • community-based development organizations (Green Opportunities Corps, Asheville Design Center) • neighborhood community group leaders and residents (Southside, Burton Street, East End) • city of Asheville officials (Housing Authority of the City of Asheville, Planning & Development Department, West Asheville Public Library, Chamber of Commerce) • county (head of Buncombe County Register of Deeds, Land-Of-Sky Regional Council) • other groups including the North Carolina Humanities Council, Mountain Housing Opportunities Inc. • “Twilight of a Neighborhood: Asheville’s East End, 1970” project. This project examined the process and aftermath of urban renewal and collected voices of residents, after the 2007 transfer of records to UNC Asheville. We have secured support and commitment from the community groups relevant to tackling this project. • Asheville’s African-American Community Historical Bus Tour, June 19, 2012 (35 people)
  • 15.
  • 16.
  • 17.
  • 18. UNCA & Asheville Partners: • Dwight Mullen, UNCA Political Science • Priscilla Ndiaye, chair of Asheville's Southside Advisory Commi
  • 19. Big Heterogeneous Data (with Duke) Mapping historical residential segregation in the US Researching the cyberinfrastructure implications of supporting large scale content based indexing of highly heterogeneous digital collections potentially embodying non-uniform or sparse metadata architectures… Intellectual Merit: Demonstrating the creation of national collections through automation and citizen-scientist crowdsourcing efforts is the focus of this task. Broader Impacts: This case-study will bring heterogeneous content from a variety of sources: census, economic, historic, planning, insurance, financial, and scientific. Outcomes: Workflows & Visual prototype
  • 20. From Crowdsourcing to Citizen-led Sourcing • Neighborhood community group leaders and residents (Southside, Burton Street, East End) • University of North Carolina Asheville staff (provost, head of special collections, library staff, departments of computer science / history / political science), centers (Renaissance Computing Institute / Center for Diversity Education), and students • Community-based development organizations (Green Opportunities Corps, Asheville Design Center) • City of Asheville officials (Housing Authority of the City of Asheville, Register of Deeds, GIS, Planning & Development Department, West Asheville Public Library, Chamber of Commerce, Regional Council) • Other groups including the North Carolina Humanities Council, Mountain Housing Opportunities Inc., Twilight of a Neighborhood.
  • 21. SALT “We define the ‘discipline of data curation’ as the practice of collection, annotation, conditioning, and preservation of data for both current and future use” – Helen Tibbo & Bryan Heidorn Governance Policy conditioning annotation Content collection Infrastructure preservation current & future use Evolution Vectors – Annenberg Center for Communication SDSC: SALT

Editor's Notes

  1. Thank you for having me. Hood Canal… by Union on the other side of Bremerton… Tacoma & Ballard. South Lake Union by Amazon. Thank you for your hospitality. I understand you have a number of searches going on… This is quite a mouthful… of trendy terms…
  2. Obama administration’s Open Government Initiative, which encourages public participation and collaboration. “Citizen sourcing” has been defined as the “government adoption of crowdsourcing techniques for the purposes of (1) enlisting citizens in the design and execution of government services and to (2) tapping into the citizenry’s collective intelligence.” Vivek Kundra, Chief Information Officer of the United States from March 2009 to August 2011 under President Obama, described citizen sourcing as a way of driving “innovation by tapping into the ingenuity of the American people to solve those problems that are too big for government to solve on its own.” In the International Journal of Public Participation article, “Citizensourcing: Applying the Concept of Open Innovation to the Public Sector,” the authors present “a structural overview of how external collaboration and innovation between citizens and public administrations can offer new ways of citizen integration and participation, enhancing public value creation and even the political decision-making process.” Citizen sourcing is derived from the term crowdsourcing and emphasizes the type of civic engagement typically enabled through Web 2.0 participatory technologies, over a more impersonal crowd-based distributed problem-solving and production model. There are many excellent studies on the value of crowdsourcing for libraries, archives and museums. The Archivist of the United States, David Ferriero, introduced the concept of “citizen archivists” in 2010. He made a parallel with citizen scientists and spoke of increasing public engagement in the archives given the National Archives and Records Administration’s over-abundance of paper records and need to digitize and transcribe them. He concluded that it wasn’t clear yet what types of citizen archivist projects were possible.
At the August 2011 Society of American Archivists (SAA) annual meeting in Chicago, Kate Theimer offered the following definition: Participatory Archive: An organization, site or collection in which people other than archives professionals contribute “knowledge or resources, resulting in increased understanding about archival materials, usually in an online environment.”
  3. Sustainability is a frame of mind