This document discusses data visualization in Python and Django. It provides motivation for representing business analytic data graphically using charts and diagrams. It describes sources of data, data preprocessing, and the categorization of data as real-time or batch-based. Visualization can be done on the server or on the client. Tools for data analysis are discussed, and visualization libraries such as Matplotlib are mentioned. Appendices provide code examples for scatter plots, loading data from databases, and refreshing views.
3. Introduction
My background
Requirements (Python, Django, Matplotlib, Ajax) and other third-party libraries.
What this talk is not about (we are not trying to re-implement Google Analytics).
Source code is available at https://github.com/kenluck2001/PyCon2012_Talk
"Everything should be made as simple as
4. MOTIVATION
There is a need to represent business analytic data in graphical form. This is because a picture is worth a thousand words.
Source: en.wikipedia.org
5. Where do we find data?
Source: en.wikipedia.org
7. Data Processing
Identify the data source.
Preprocess the data (removing nulls, wide characters), e.g. with Google Refine.
Perform the actual data processing.
Present the clean data in a descriptive format, i.e. data visualization.
See Appendix 1
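The preprocessing step above can be sketched with the standard library alone. This is a minimal illustration, not the code used in the talk: the function name `clean_rows`, the set of null markers, and the sample data are assumptions made for the example.

```python
import csv
import io
import unicodedata


def clean_rows(rows, null_markers=("", "NA", "NULL")):
    """Yield rows with null-like cells replaced by None and text normalized.

    Hypothetical helper: the null markers are an assumption for this sketch.
    """
    for row in rows:
        cleaned = []
        for cell in row:
            # NFKC folds full-width ("wide") characters into their plain forms.
            text = unicodedata.normalize("NFKC", cell).strip()
            cleaned.append(None if text in null_markers else text)
        # Drop rows that carry no data at all.
        if any(value is not None for value in cleaned):
            yield cleaned


# Made-up sample: a full-width number, an "NA" cell, an empty row, padded cells.
sample = io.StringIO("month,radiation\n１２,NA\n,\n 3 ,4.5\n")
cleaned = list(clean_rows(csv.reader(sample)))
```

Whether a null should become `None`, `0` (as in Appendix 1), or cause the row to be dropped depends on the analysis that follows.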
8. Visual Representation of data
Charts / Diagram format
Text format
Tables
Log files
Source: devk2.wordpress.com Source: elementsdatabase.com
10. Rules of Data Collection
Keep data in the most easily processed form, e.g. database, CSV.
Keep collected data with a timestamp.
Gather data that is relevant to the business needs.
Remove old data.
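As a rough sketch of the first two rules, each record can be written to CSV with a timestamp in front of it. The helper name `append_record` and the sample values are assumptions for illustration only.

```python
import csv
import io
from datetime import datetime, timezone


def append_record(stream, values, now=None):
    """Write one CSV row prefixed with an ISO 8601 UTC timestamp.

    Hypothetical helper; `now` can be passed in to make the output repeatable.
    """
    now = now or datetime.now(timezone.utc)
    csv.writer(stream).writerow([now.isoformat()] + list(values))


# An in-memory buffer stands in for a real log file.
buffer = io.StringIO()
append_record(
    buffer,
    ["http://example.com", 200, 0.42],
    now=datetime(2012, 10, 1, 12, 0, tzinfo=timezone.utc),
)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
```

Storing the timestamp in a sortable format such as ISO 8601 also makes it straightforward to find and remove old data later.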
11. Where is the data visualization done?
Server
See Appendices 2 to 6
Client
Examples of JavaScript libraries
D3.js ( http://d3js.org/ )
gRaphael.js ( http://g.raphaeljs.com/ )
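When the drawing is left to a client-side library, the server only needs to hand over the data, typically as JSON. The sketch below shows one possible payload shape. The function name `chart_payload` and the key names are assumptions, not part of the talk's code or of any library's required format.

```python
import json


def chart_payload(x_values, y_values, x_label, y_label):
    """Serialize paired values as JSON for a client-side chart.

    Hypothetical payload layout chosen for this example.
    """
    points = [{"x": x, "y": y} for x, y in zip(x_values, y_values)]
    return json.dumps({"xLabel": x_label, "yLabel": y_label, "points": points})


payload = chart_payload(
    [3, 5, 8], [10, 12, 20], "Number of online users", "Number of bids"
)
```

In a Django project a view could return this string with a JSON content type, and the page's JavaScript would fetch it through Ajax and draw the chart.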
12. Factors to Consider for Choice of Visualization
Where do we perform the visualization processing?
Is it Server or Client?
It depends
Security
Scalability
15. Appendix 1
## This describes a scatter plot of solar radiation against the month.
It aims to illustrate the steps of data gathering. The CSV file comes from the data science hackathon website. The source code is available in a folder named “plotCode”.
import csv
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure

def prepareList(month_most_common_list):
    '''Prepare the input for processing by replacing every "NA" value with 0.'''
    output_list = []
    for x in month_most_common_list:
        if x != 'NA':
            output_list.append(x)
        else:
            output_list.append(0)
    return output_list
16. Appendix 1 contd.
def plotSolarRadiationAgainstMonth(filename):
    month_most_common_list = []
    Solar_radiation_64_list = []
    # Text mode with newline='' is how the csv module expects files to be opened in Python 3.
    with open(filename, newline='') as csvfile:
        trainRowReader = csv.reader(csvfile, delimiter=',')
        for row in trainRowReader:
            month_most_common = row[3]
            Solar_radiation_64 = row[6]
            month_most_common_list.append(month_most_common)
            Solar_radiation_64_list.append(Solar_radiation_64)
    # Convert all elements to float, skipping the first element because it is a description of the field.
    month_most_common_list = [float(i) for i in prepareList(month_most_common_list)[1:]]
    Solar_radiation_64_list = [float(i) for i in prepareList(Solar_radiation_64_list)[1:]]
    fig = Figure()
    ax = fig.add_subplot(111)
    title = 'Scatter Diagram of solar radiation against month of the year'
    ax.set_xlabel('Most common month')
    ax.set_ylabel('Solar Radiation')
    fig.suptitle(title, fontsize=14)
    try:
        ax.scatter(month_most_common_list, Solar_radiation_64_list)
        # It is possible to make other kinds of plots, e.g. bar charts, pie charts, histograms.
    except ValueError:
        pass
    canvas = FigureCanvas(fig)
    canvas.print_figure('solarRadMonth.png', dpi=500)

if __name__ == "__main__":
    plotSolarRadiationAgainstMonth('TrainingData.csv')
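The comment in the code above notes that other kinds of plots are possible. As a small sketch of that idea, the same Figure and Agg canvas can draw a histogram and write the PNG into memory instead of a file. The function name `histogram_png` and the sample values are made up for illustration.

```python
import io

from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure


def histogram_png(values, bins=10):
    """Render a histogram of `values` and return the image as PNG bytes.

    Hypothetical helper, not part of the talk's source code.
    """
    fig = Figure()
    ax = fig.add_subplot(111)
    ax.hist(values, bins=bins)
    ax.set_xlabel('Solar Radiation')
    ax.set_ylabel('Frequency')
    # The Agg canvas can write to any file-like object, not only to a filename.
    buffer = io.BytesIO()
    FigureCanvas(fig).print_png(buffer)
    return buffer.getvalue()


# Made-up readings, only to exercise the function.
png_bytes = histogram_png([1.0, 2.5, 2.7, 3.1, 4.8, 5.0], bins=5)
```

Writing to a buffer rather than to disk is the same idea a web view relies on when it streams the image straight into an HTTP response.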
18. Appendix 2
From the project in the folder named WebMonitor
class LoadEvent:
    …
    def fillMonitorModel(self):
        # Save one Monitor row for every collected measurement.
        for monObj in self.monitorObjList:
            mObj = Monitor(url=monObj[2], httpStatus=monObj[0],
                           responseTime=monObj[1], contentStatus=monObj[5])
            mObj.save()
# Also see the examples in tasks.py in the project named YAAS. They show how the analytic tables are loaded with real-time data.
19. Appendix 3
from django.contrib.admin.views.decorators import staff_member_required
from django.http import HttpResponse
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure
from YAAS.stats.models import RegisteredUser, OnlineUser, StatBid

# Scatter diagram of number of bids made against number of online users
# Weekly report
@staff_member_required
def weeklyScatterOnlinUsrBid(request, week_no):
    page_title = 'Weekly Scatter Diagram based on Online user versus Bid'
    weekno = week_no
    fig = Figure()
    ax = fig.add_subplot(111)
    # `stat` is not defined in this excerpt; it is assumed to come from elsewhere in the YAAS project.
    year = stat.getYear()
    onlUserObj = OnlineUser.objects.filter(week=weekno).filter(year=year)
    bidObj = StatBid.objects.filter(week=weekno).filter(year=year)
    onlUserlist = list(onlUserObj.values_list('no_of_online_user', flat=True))
    bidlist = list(bidObj.values_list('no_of_bids', flat=True))
    title = 'Scatter Diagram of number of online User against number of bids (week {0}){1}'.format(weekno, year)
    ax.set_xlabel('Number of online Users')
    ax.set_ylabel('Number of Bids')
    fig.suptitle(title, fontsize=14)
    try:
        ax.scatter(onlUserlist, bidlist)
    except ValueError:
        pass
    # Write the rendered PNG straight into the HTTP response.
    canvas = FigureCanvas(fig)
    response = HttpResponse(content_type='image/png')
    canvas.print_png(response)
    return response
More information can be found in YAAS/graph/ (the folder named "graph").
20. Appendix 4
# Example of how old database records may be deleted to recover some space.
From the folder named “YAAS”. Check task.py
from datetime import datetime, timedelta
from celery.schedules import crontab
from celery.task import periodic_task  # decorator available in older Celery releases

@periodic_task(run_every=crontab(hour=1, minute=30, day_of_week=0))
def deleteOldItemsandBids():
    # Item and Bid are YAAS models; their import is not shown on the slide.
    hunderedandtwentydays = datetime.today() - timedelta(days=120)
    Item.objects.filter(end_date__lte=hunderedandtwentydays).delete()
    Bid.objects.filter(end_date__lte=hunderedandtwentydays).delete()

# Populate the registereduser and onlineuser models at regular intervals.
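The 120-day clean-up in the deletion task can also be tried outside Django and Celery. The sketch below uses an in-memory SQLite table as a stand-in for the Bid model. The table layout, the function name `delete_old_bids`, and the sample dates are assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timedelta


def delete_old_bids(conn, days=120, now=None):
    """Delete rows whose end_date is at least `days` old; return the count.

    Hypothetical helper working on a table named `bid`.
    """
    now = now or datetime.now()
    cutoff = (now - timedelta(days=days)).isoformat()
    # ISO 8601 strings in one consistent format sort in chronological order.
    with conn:
        cursor = conn.execute("DELETE FROM bid WHERE end_date <= ?", (cutoff,))
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bid (id INTEGER PRIMARY KEY, end_date TEXT)")
now = datetime(2012, 10, 1)
# Four made-up bids that ended 10, 119, 121 and 300 days before `now`.
conn.executemany(
    "INSERT INTO bid (end_date) VALUES (?)",
    [((now - timedelta(days=d)).isoformat(),) for d in (10, 119, 121, 300)],
)
deleted = delete_old_bids(conn, days=120, now=now)
remaining = conn.execute("SELECT COUNT(*) FROM bid").fetchone()[0]
```

Passing `now` explicitly keeps the cutoff repeatable, which makes a scheduled clean-up job easier to check before it runs against real data.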