The things we found in your website
1. The things we found in your website
Hernâni Borges de Freitas
Technical Consultant
hernani@acquia.com
@hernanibf
Porto, 5th May, 2012
2. About us
Drupal & open source expertise
• Dries Buytaert, Drupal founder
• Gabor Hojtsy, D6
• Angie Byron, D7
• Many leading community
contributors in engineering,
support, and consulting.
Software industry experience
• Cloud site deployment
• Professional technical support
• Services delivery
3. Acquia Network
• Expert Drupal Support
• Optimized Drupal hosting
• Dev Cloud
• Managed Cloud
• Foster Drupal adoption
• Commons
• Drupalgardens.com
• Dev Desktop
4. About me
• .PT
• Acquia Professional Services
EMEA
• Technical Consultant
• Drupal* many things
• Passionate about web and
communities
• Travel lover
6. What we do
• Drupal Jumpstarts
• Architecture Workshop
• Discovery workshops
• Site Audit
• Performance Audit
• Security Audit
• On-site Consulting
7. Site Audit
• Over one week we review your
website to make sure it follows best
practices and does not present risks:
• Architecture
• Security
• Performance
• Infrastructure
• Maintenance headaches
8. Balance
• Understand the project history
• Understand the constraints
• Be clear that there is no single right way of
solving problems.
• Explain the right balance.
• Everyone makes mistakes, and should learn from
them!
• Long term solutions make everyone happier than
short term patchwork.
• The best tool: the one you know how to use.
9. Content architecture
“Editors don’t understand what to create.”
“The page content type article is similar to news. We
just used it for a few months to create special
news on the homepage.”
“We needed to change this template because we
wanted to show everything in that location and
we use school_location and teacher_city.”
10. Content architecture
Symptoms
• Similar content types
• Fields not reused
• Content types with almost no nodes
Chasing it
• Take a look at the field report page.
• Review the content type structure.
• Run simple database queries:
SELECT type, COUNT(*) FROM node GROUP BY type;
11. Display architecture
“Views_london, views_paris, views_porto show jobs
available in these cities.”
“The scores block in the sports section? Some PHP
code is controlling its visibility in the block
configuration.”
“We need those node_load() calls in preprocess_page
because we need to show those nodes on the
homepage.”
12. Display architecture
Chasing it
• Understand how pages are built.
• Look at views and how reusable they are.
• How many custom templates are used?
• How much logic do you have in templates?
• How easy is it to switch themes (mobile,
special occasions)?
• How long does it take to produce a totally
new design for your site?
13. Site architecture
Symptoms
• Modules installed
• Number of modules that are not useful at all.
• Hacked core and modules
• “There is a module for that” – does not
mean you need to use it!
• Modules used for things they were not
designed to do.
• Code in database
14. Reinventing the wheel
“This is a custom module we designed to create
forms on the fly that could be emailed to site
admins!”
“That custom module adds small hidden
tokens to control SPAM in our website.”
15. Site architecture
Chasing it
• Understand how pages are built.
• Look at views and how reusable they are.
• How many custom templates do you have?
• How much logic do you have in templates?
• How easy is it to switch themes (mobile,
special occasions)?
• How long does it take to produce a
totally new design for your site?
16. Extra complexity
“We thought we needed content translation but in
the end our website is just in English.”
“Right now we only have one type of user,
but in the future we might need to have
more roles, so we already have
content_access.”
“The Authcache module is used to speed up pages
for our 20 journalists.”
17. Site architecture
Chasing it
• Use the Hacked! module to compare the code
versions in use.
• Balance custom code / contributed code
or reusable ways of solving problems.
• Couldn’t that query be a view ?
• Couldn’t context or panels create that page?
• Couldn’t that custom action be controlled by a
rule?
18. Custom modules
Symptoms
• Not following coding standards
• Can be a warning for what is coming…
• Not using the right hooks
• Excessive usage of hook_init, hook_nodeapi
• Not using the API
• Reinventing something that Drupal is already doing
well
• Hardcoded strings (nids, tids, vids, urls).
• All code in .module file
19. Security
“That webservice path is impossible to find, it
does not need authentication. Only the mobile
app uses it.”
“You would need to be an administrator to
access that page.”
“We are the only ones who can access the
server, therefore we are not too worried
about it.”
20. Security
Basic problems
• Outdated core and contributed modules.
• Bad configuration
• Users have permissions to do things they shouldn’t
• Admins have easy passwords (similar to
usernames, hacked email accounts...)
• File uploads are not checked
• The code repository contains extra gifts
• Database dumps, files with information that should not be
there...
21. Security
SQL Injection
• Vulnerable: db_query("SELECT * FROM table WHERE id = " . $_GET['id']);
• example.com/index.php?id=1;drop database yoursite;--
• Just use the Drupal DB API
• Think about whether you really need to write that SQL query!
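The DB API advice above comes down to keeping user input out of the SQL text. Drupal 6's db_query() takes printf-style placeholders such as %d, and Drupal 7's takes named placeholders such as :id. The sketch below shows the same idea with plain PDO and an in-memory SQLite database, so it runs without Drupal. The items table and its rows are hypothetical, and it assumes the pdo_sqlite extension is available.

```php
<?php
// Minimal sketch: bound parameters keep user input out of the SQL text.
// Plain PDO + in-memory SQLite; table and data are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)');
$pdo->exec("INSERT INTO items (id, title) VALUES (1, 'first')");
$pdo->exec("INSERT INTO items (id, title) VALUES (2, 'second')");

// Hostile input of the kind shown on the slide.
$input = '1; DROP TABLE items;--';

// The value travels separately from the statement, so it cannot change it.
$stmt = $pdo->prepare('SELECT title FROM items WHERE id = :id');
$stmt->execute(array(':id' => $input));
$rows = $stmt->fetchAll(PDO::FETCH_COLUMN);

// The table is still intact after the attempted injection.
$count = (int) $pdo->query('SELECT COUNT(*) FROM items')->fetchColumn();
echo "Rows still in table: $count\n";
```

In a Drupal 7 module the equivalent call would look like db_query('SELECT title FROM {items} WHERE id = :id', array(':id' => $id)), with {items} again a hypothetical table.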
22. Security
XSS – Cross site scripting
• <?php echo "Your number is " . $_GET['id']; ?>
• index.php?id=<script>alert("UAAAT??");</script>
• Be careful with data you might think
is safe to use
• <?php echo $node->title; ?>
• <?php echo $node->field_location[0]['value']; ?>
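A minimal sketch of the usual fix for the examples above: escape untrusted values at output time. escape_html() is a hypothetical helper built on PHP's htmlspecialchars(); in Drupal, check_plain() plays this role for plain text.

```php
<?php
// Minimal sketch: escape untrusted values when printing them.
// escape_html() is a hypothetical helper; Drupal's check_plain() does this job.
function escape_html($text) {
  return htmlspecialchars((string) $text, ENT_QUOTES, 'UTF-8');
}

// The hostile query-string value from the slide.
$id = '<script>alert("UAAAT??");</script>';

$safe = escape_html($id);
echo 'Your number is ' . $safe . "\n";
```

The same applies to stored values such as a node title or a field value: they were typed by a user at some point, so they are escaped on output too.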
23. Security
CSRF – Cross site request forgery
<?php
function mymodule_menu() {
  $items['admin/cookies'] = array(
    'access callback' => 'user_access',
    'access arguments' => array('access cookies'),
    'page callback' => 'cookie_list',
  );
  $items['admin/cookies/add'] = array(
    'access callback' => 'user_access',
    'access arguments' => array('access cookies'),
    'page callback' => 'cookie_add',
  );
  // A destructive action reachable through a plain GET link, with no token check.
  $items['admin/cookies/%/delete'] = array(
    'access callback' => 'user_access',
    'access arguments' => array('access cookies'),
    'page callback' => 'cookie_delete',
  );
  return $items;
}
24. Security
CSRF – Cross site request forgery
• HTML Email
• <img src=‘http://example.com/admin/cookies/10/delete’ />
• HTTP Post to forms
• You expect the request to come from your site but it can
come from anywhere
• Drupal protects against both attacks using tokens and Form
API
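The token idea can be sketched in plain PHP as below. This is an illustration under stated assumptions, not Drupal's implementation: make_token(), valid_token(), the secret and the session ID are all hypothetical. In Drupal, drupal_get_token() and drupal_valid_token() provide this for links, and the Form API adds a token to every form.

```php
<?php
// Sketch of per-action tokens tied to the user's session.
// All names and values here are hypothetical stand-ins.
function make_token($secret, $session_id, $action) {
  return hash_hmac('sha256', $session_id . ':' . $action, $secret);
}

function valid_token($token, $secret, $session_id, $action) {
  // hash_equals() compares in constant time.
  return hash_equals(make_token($secret, $session_id, $action), (string) $token);
}

$secret = 'site-private-key';
$session_id = 'abc123';
$action = 'admin/cookies/10/delete';

// The site adds this token to its own delete link.
$link_token = make_token($secret, $session_id, $action);

$ok = valid_token($link_token, $secret, $session_id, $action);
// A forged request (for example an <img> tag in an email) carries no valid token.
$forged = valid_token('', $secret, $session_id, $action);
// A token issued for another session is rejected as well.
$other = valid_token($link_token, $secret, 'other-session', $action);

var_dump($ok, $forged, $other);
```

The delete callback would check the token before doing anything, and refuse the request when the check fails.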
25. Performance
What is your website doing?
• How long do most pages take to load
(common lists, node pages, homepage?)
• Why do they take so long? DB queries,
application requests?
• What about edge cases? Clear cache for
instance?
• What is your caching strategy?
• What are your logs telling you?
26. Performance
• How long do most pages take to load (common lists, node
pages, homepage?)
• Devel query log can show immediately some problems
• XhProf can do the rest
• NewRelic is pure gold!
• Why are CPU and memory wasted?
• Typically
• Complex queries that take too much time
• Functions called too many times
• Edge cases that are happening all the time
27. Performance
Why is the database so slow? Why is it only slow now?
• Databases not optimized to grow
• Complex queries that do not use indexes
• Select * from betterpoll_votes where poll_id= 10;
• Complex queries made automatically
SELECT node.nid AS nid, users.picture AS users_picture, users.uid AS users_uid, users.name AS
users_name, users.mail AS users_mail, node.title AS node_title, GREATEST(node.changed,
node_comment_statistics.last_comment_timestamp) AS node_comment_statistics_last_updated
FROM node node
INNER JOIN users users ON node.uid = users.uid
INNER JOIN node_comment_statistics node_comment_statistics ON node.nid =
node_comment_statistics.nid
ORDER BY node_comment_statistics_last_updated DESC
28. Performance
Is using InnoDB always good?
SELECT COUNT(*) FROM (
  SELECT DISTINCT node.nid AS nid
  FROM node node
  LEFT JOIN og_ancestry og_ancestry ON node.nid = og_ancestry.nid
  INNER JOIN users users ON node.uid = users.uid
  INNER JOIN node_comment_statistics node_comment_statistics ON node.nid = node_comment_statistics.nid
  WHERE og_ancestry.group_nid = 5
) count_alias
• Use Views Litepager, for example
29. Performance
Can it be cached?
• Make sure caching and aggregation are enabled. Yes, look at it!
• Review your caching strategy:
• https://www.acquia.com/blog/when-and-how-caching-can-save-yo
• Make sure caching is actually helping you.
• Don’t clear it too often.
• Check that it is not used only by a minority.
30. Performance
Careful with missing files
• A Drupal bootstrap is needed to serve each of them.
• Missing images on the homepage might kill your site.
• Pay attention to your logs
• Use fast 404
31. Performance
What is it doing?
function mytheme_preprocess($variables) {
  $parent = node_load(array('nid' => 45));
  $node = $variables['node'];
  if ($node->nid = 45) {
    $node->title = $parent->type . ' ' . $node->title;
  }
}
How many times is node_load() called?
node_load() uses a static variable. How many queries are
executed against the database to fetch node information?
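The static-variable point can be illustrated with a small, self-contained sketch. RecordLoader and its methods are hypothetical stand-ins, not Drupal code: the "database" is a counter, and the cache is keyed by ID. As far as Drupal 6 is concerned, node_load() only consults its static cache when it is given a numeric nid; a conditions array such as array('nid' => 45) goes to the database every time, which is what makes the snippet above expensive when it runs for every template.

```php
<?php
// Sketch of static caching: repeated loads of the same ID hit the database once.
// RecordLoader is a hypothetical stand-in, not Drupal's node_load().
class RecordLoader {
  public static $queries = 0;
  private static $cache = array();

  // Pretend database read; only counts how often it is called.
  private static function fetchFromDatabase($id) {
    self::$queries++;
    return array('id' => $id, 'title' => 'Record ' . $id);
  }

  public static function load($id) {
    if (!isset(self::$cache[$id])) {
      self::$cache[$id] = self::fetchFromDatabase($id);
    }
    return self::$cache[$id];
  }
}

// Fifty calls, as if a preprocess function ran for fifty templates.
for ($i = 0; $i < 50; $i++) {
  $record = RecordLoader::load(45);
}

echo RecordLoader::$queries . " query for 50 calls\n";
```

Without the cache lookup in load(), the same loop would run fifty queries per page.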
32. Infrastructure
“Our DB Server has 48Gb of memory. Enough to
handle all requests!”
• my.cnf
• innodb_buffer_pool_size = 1024M
• Adjust limits according to your resources.
• http://mysqltuner.pl
• Your slowest component sets the limit for the whole
system.
33. Infrastructure
“We don’t need that many web servers. As
Varnish is set in front and working as a reverse
proxy, most of the website’s traffic will be
cached.”
34. Infrastructure
“Our external firewall controls all sort of attacks.
We don’t use any specific firewall in the
servers.”
• 50-70% of attacks are internal. Remote connections to the DB,
Memcached and Solr should be forbidden.
• It is hard to keep track of details in fast-moving environments.
35. Infrastructure
This is where your website ends..
• What is the right size? How do you grow?
• Are the different servers well tuned ?
• Apache / PHP
• MySQL
• Varnish
• What are your logs telling you?
36. Maintenance
This is going to be most of the work!
• What is your deployment architecture?
• How hard is it to change?
• How do you test changes?
• How relaxed are you when you leave your desk?
37. Deployment
“We just copy the code directly to the server by
FTP.”
“Any developer can just take a snapshot from
production and install on their laptop.”
“Don’t touch that module. We made some
changes to the original.”
38. Maintenance
Control your code!
• Every piece of code should be under VCS.
• Git, Mercurial, Bazaar, SVN, CVS
• Copying to backup folders is not VCS.
• Yes, those log messages serve a purpose…
• No, your holiday pictures should not be under VCS.
• No, your database dumps should not be there either.
39. Maintenance
“We can only test that in production.”
“Yes we have a staging environment. But its data is
from last summer.”
“Sometimes problems occur when we upgrade.
But we always have a backup.”
40. Maintenance
Do once, prepare many!
• Several environments should exist
• Development, Staging and Production.
• Should be possible to deploy from VCS to them!
• Environments should be up to date and accessible
• Environments should be as similar as possible to real
life
• Environments should be easy to destroy and
replicate
41. Maintenance
This is going to be most of the work!
• Be prepared for changes
• You don’t control them most of the time!
• Periodically review the website architecture
• What you need today is not what you needed when you built it
• Pay attention to security updates
• Review your logs periodically
43. So, before your questions.
I do have a question.
Would you like to join Acquia?
We are hiring EVERYWHERE!
• Consultants
• Support
• Sales
• Engineering