Vancouver Drupal group presentation for April 25, 2013.
How to deploy Drupal on
- multiple web servers,
- multiple web and database servers, and
- how to join all that together and deploy the site on the Amazon Cloud (inside a Virtual Private Cloud), in
- one availability zone, or
- multiple availability zones.
This session covers what you need in order to get Drupal deployed on separate servers, what the issues and concerns are, and how to solve them.
2. about me
name: Vladimir Ilic
email: burger.boy.daddy@gmail.com
twitter: @burgerboydaddy
http://burgerboydaddy.com
3. agenda
why all of this?
step 1: test locally -> from one server to the server farm,
step 2: multiple web and database servers,
step 3: how to join all that together and deploy the site on the Amazon
Cloud, inside a Virtual Private Cloud
Amazon terms
benefits
single availability zone
multiple availability zones.
4. why?
if you want to increase site speed
if you want your site to be responsive and to work
under heavy stress
if you want to be in control of what goes on your server
5. get it divided / decouple
Easy to do inside local development/hosting
environment
Just separate web, database and cache servers
Problems
we can increase resources only vertically
Not all resources are used the same way (the web server
will probably die before cache or MySQL)
Multiple “single points of failure”
6. multiple web servers – one db
Apache load balancer in front of
2-3 web servers; each
server with integrated
APC cache
Multiple cache servers
Powerful MySQL server
In real life you can use some
other LB solution (this one is
great for proof-of-concept
moments).
No dedicated file server;
bi-directional rsync replication
is used instead
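A minimal sketch of the bi-directional rsync idea, assuming two web servers where "web2" is a hypothetical peer hostname and /var/www/html/mysite/sites/ is the shared folder mentioned in the speaker notes. It only builds the two rsync command lines (push, then pull); -a keeps permissions and timestamps, and --update skips files that are newer on the receiving side, so running both in turn approximates a two-way sync.

```python
import shlex

# Shared folder from the speaker notes; the trailing slash makes rsync copy the folder's contents.
SITES_DIR = "/var/www/html/mysite/sites/"


def rsync_pair(peer: str, path: str = SITES_DIR) -> list[list[str]]:
    """Build push and pull rsync commands between this server and one peer (hypothetical helper).

    -a keeps permissions and timestamps, -z compresses in transit and
    --update skips files that are newer on the receiving side. There is
    deliberately no --delete: with two-way sync it could remove files
    that so far exist on only one side.
    """
    base = ["rsync", "-az", "--update"]
    push = base + [path, f"{peer}:{path}"]  # local -> peer
    pull = base + [f"{peer}:{path}", path]  # peer -> local
    return [push, pull]


if __name__ == "__main__":
    # Print the commands for the hypothetical peer "web2"; from cron you would
    # pass each list to subprocess.run(cmd, check=True) instead.
    for cmd in rsync_pair("web2"):
        print(shlex.join(cmd))
```

Leaving out --delete is a design choice: removed files have to be cleaned up by hand, but a sync run can never wipe an upload that has not yet reached the other server.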
7. configuring Apache load balancer
Apache web server ships with a load balancer module called
mod_proxy_balancer (since version 2.2).
All you need to do is enable this module together with
mod_proxy and mod_proxy_http. Please note
that without mod_proxy_http, the balancer just won't work.
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
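Once the modules are loaded, the balancer itself is a short config block. The sketch below renders one for a list of backend web servers; the cluster name "drupalcluster" and the backend URLs are hypothetical, and lbmethod=byrequests (simple request-count balancing) is one assumed choice among the methods Apache offers. On Apache 2.4 the byrequests method lives in its own module (mod_lbmethod_byrequests) and the balancer also needs mod_slotmem_shm.

```python
def balancer_config(members: list[str], name: str = "drupalcluster") -> str:
    """Render a mod_proxy_balancer block for the given backend URLs (names are hypothetical)."""
    lines = [
        "ProxyRequests Off",  # reverse proxy only, never an open forward proxy
        f"<Proxy balancer://{name}>",
    ]
    # One BalancerMember line per web server behind the load balancer.
    lines += [f"    BalancerMember {url}" for url in members]
    lines += [
        "    ProxySet lbmethod=byrequests",  # spread requests evenly by count
        "</Proxy>",
        f"ProxyPass / balancer://{name}/",         # forward all paths to the cluster
        f"ProxyPassReverse / balancer://{name}/",  # rewrite redirects coming back
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Three hypothetical web servers, matching the "2-3 web servers" setup.
    print(balancer_config(["http://10.0.0.11", "http://10.0.0.12", "http://10.0.0.13"]))
```

No sticky sessions are configured here on the assumption that Drupal keeps its sessions in the database, so any web server can answer any request.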
8. many to many
In this case each web
server will have its
own db server.
Reason for this:
Higher site
availability; if one
db server is down,
second one can
continue to serve
customers.
9. Amazon AWS
Why Amazon (business point of view)
Most complete cloud solution on the market.
Almost zero upfront infrastructure investment
Just-in-time infrastructure
Pay as you go – pay for what you use
Constant price drops
Easy to deploy and scale
….
10. why Amazon (technical
benefits)
Automation – “Scriptable infrastructure”: You can
create repeatable build and deployment systems by
leveraging programmable (API-driven) infrastructure.
Auto-scaling: You can scale your applications up and
down to match unexpected demand without any
human intervention.
Proactive Scaling: Scale your application up and down
to meet your anticipated demand; Elasticity
11. why Amazon (technical
benefits)
More Efficient Development lifecycle: Production
systems may be easily cloned for use as development
and test environments.
Improved Testability: Never run out of hardware for
testing. Inject and automate testing at every stage
during the development process.
Disaster Recovery and Business Continuity: The cloud
provides a lower cost option for maintaining a fleet of
DR servers and data storage.
13. key Amazon terms – #1
AWS – Amazon Web Services
Amazon Web Services (AWS) is a collection of remote computing services (also called web
services) that together make up a cloud computing platform.
EC2 - Elastic Compute Cloud
EC2 allows users to rent virtual computers on which to run their own computer applications.
EC2 allows scalable deployment of applications by providing a Web service through which a
user can boot an Amazon Machine Image to create a virtual machine.
A user can create, launch, and terminate server instances as needed, paying by the hour for
active servers, hence the term "elastic".
S3 - Simple Storage Service
Amazon S3 (Simple Storage Service) is an online storage web service offered by AWS.
AMI - Amazon Machine Images
An Amazon Machine Image (AMI) is a special type of virtual appliance which is used to
instantiate (create) a virtual machine within the Amazon Elastic Compute Cloud ("EC2").
14. key Amazon terms - #2
EBS - Elastic Block Store
Amazon Elastic Block Store (EBS) provides raw block devices that can be attached
to Amazon EC2 instances.
An EBS volume can be used like any raw block device. In a typical use case, this would include
formatting the device with a filesystem and mounting that filesystem.
VPC - Virtual Private Cloud
Amazon Virtual Private Cloud (VPC) is a commercial cloud computing service that
provides a logically isolated section of the Amazon cloud (a virtual private cloud).
Unlike traditional EC2 instances, which are allocated internal and external IP addresses
by Amazon, the customer can assign IP addresses of their choosing from one or more
subnets.
VPC provides much more granular control over security.
ELB - Elastic Load Balancing
AZ - Amazon Availability Zones (Data Centers)
15. key Amazon terms - #3
RDS - Amazon Relational Database Service
Amazon RDS is a distributed relational database service by Amazon.com.
It is a web service running "in the cloud" and provides a relational database for use
in applications.
Supported database engines:
MySQL databases
Oracle databases
Microsoft SQL Server
ECU - EC2 Compute Unit
One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz
2007 Opteron or 2007 Xeon processor.
SQS - Simple Queue Service
17. humor
“We will launch site on EC2 with EBS behind ELB with
domain registered on Route 53
Your images will come from CloudFront,
backup will go to S3
and your DB on RDS with Multi-AZ availability”
18. first step first – create account
Go to aws.amazon.com
and just use your
amazon.com account
to start.
After login, go to IAM
(Identity and Access
Management) to add multi-factor
authentication: not
to your root account; instead
create a new account,
assign privileges to it and
add MFA. After that, use
only the new account to log in
to AWS (with the given
alias).
19. easy one – use CloudFormation
Fastest way to get Drupal on AWS
is to use predefined templates inside
the CloudFormation service.
At the moment you can find four
Drupal-specific templates:
Drupal_Simple.template
Drupal_Single_Instance.template
Drupal_Single_Instance_With_RDS.template
Drupal_Multi_AZ.template
You can use any other template as a
starting point and customize it to
your needs.
20. steps after
Create a KeyPair
Allow only your home/corporate IP to access the server over port 22
(SSH).
Create an AMI from the existing machine
Drop the original machine
Create a new EC2 instance using the just-created AMI and your key pair
Add an Elastic IP and associate it with your instance
Connect to the instance
Add a DNS CNAME record using the given Amazon DNS name:
ec2-54-225-110-202.compute-1.amazonaws.com
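The "only your IP on port 22" step boils down to one security group ingress rule with a /32 source. This sketch builds that rule as a dict in the IpPermissions shape that boto3's authorize_security_group_ingress accepts; the helper name and the 203.0.113.7 address are hypothetical, and it deliberately accepts only a single IPv4 address.

```python
import ipaddress


def ssh_ingress_rule(admin_ip: str) -> dict:
    """Security group ingress rule opening TCP 22 to exactly one IPv4 address (hypothetical helper)."""
    addr = ipaddress.IPv4Address(admin_ip)  # raises a ValueError subclass for bad input
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": f"{addr}/32"}],  # /32 = this one address only
    }


if __name__ == "__main__":
    # 203.0.113.7 stands in for your home/corporate IP.
    print(ssh_ingress_rule("203.0.113.7"))
```

With boto3 you would then call something like `boto3.client("ec2").authorize_security_group_ingress(GroupId="sg-xxxxxxxx", IpPermissions=[rule])`, where the group ID is a placeholder.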
22. Amazon VPC - ultimate goal
We can install the complete infrastructure required for Drupal using the public set of services
ELB (load balancer)
AMI (server images)
RDS (Amazon Relational Database Service)
ElastiCache...
BUT
Amazon VPC is a way to set up an isolated partition of AWS and control the network
topology.
Services
DynamoDB, ElastiCache, SQS, SES, and CloudSearch are not yet available in VPC
(things change on a daily basis)
RDS instances launched in a VPC cannot be accessed over the internet (through the
endpoint). You will need a bastion server to access them.
26. VPC subnets
IP Ranges - When setting up a VPC you are essentially fixing the network of the VPC.
Public and Private Subnets - The VPC network can be divided further into smaller network
segments called subnets. Any VPC will have at least one subnet.
You can set up a Public Subnet which will have internet connectivity. Instances launched within
a Public Subnet will have both outbound and inbound (through EIP) internet connectivity
through the Internet Gateway attached to the VPC (the Public Subnet's route table points to it).
Private Subnets are completely locked down. They do not have internet connectivity by default.
Create as many Public and Private Subnets as your architecture requires.
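Fixing the VPC range up front means carving it into subnets once and checking it against networks you will connect over VPN. This sketch uses Python's ipaddress module; the 10.0.0.0/16 VPC range, the /24 subnet size, the two-plus-two split and the 192.168.0.0/16 office network are all hypothetical choices for illustration.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr: str, public: int, private: int, prefix: int = 24):
    """Split a VPC range into consecutive public and private subnets of one size."""
    vpc = ipaddress.ip_network(vpc_cidr)
    wanted = public + private
    blocks = list(islice(vpc.subnets(new_prefix=prefix), wanted))
    if len(blocks) < wanted:
        raise ValueError("VPC range too small for the requested subnets")
    return blocks[:public], blocks[public:]


def conflicts_with(vpc_cidr: str, other_cidr: str) -> bool:
    """True if the VPC range overlaps another network, e.g. the office LAN behind a VPN."""
    return ipaddress.ip_network(vpc_cidr).overlaps(ipaddress.ip_network(other_cidr))


if __name__ == "__main__":
    pub, priv = plan_subnets("10.0.0.0/16", public=2, private=2)
    print("public: ", [str(n) for n in pub])   # web servers, load balancer
    print("private:", [str(n) for n in priv])  # DB servers, cache nodes
    print("clashes with office LAN:", conflicts_with("10.0.0.0/16", "192.168.0.0/16"))
```

One pair of public and private subnets per availability zone is a common way to use such a split in a multi-AZ deployment.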
32. how to autoscale
Install the AWS Command Line Tools from Amazon
Downloads
Download from: http://aws.amazon.com/developertools/2535
Note: AWS Auto Scaling needs the Amazon CloudWatch
monitoring service to function. Amazon CloudWatch is
billed on a usage basis.
33. step 1
Configuring AWS Auto Scaling with AWS ELB
elb-create-lb my-load-balancer --headers \
--listener "lb-port=80,instance-port=8080,protocol=HTTP" \
--availability-zones us-west-2c --region us-west-2
lb-port -- load balancer port
instance-port -- app server port to which requests
need to be forwarded
my-load-balancer -- name for my load balancer
34. step 2
Create a launch configuration
as-create-launch-config my-lconfig --image-id ami-e38823c8a \
--instance-type m1.small --key my-key-pair \
--group my-security-group --region us-west-2
my-lconfig -- name for launch configuration
ami-e38823c8a -- ID of the Amazon Machine Image (AMI) to be
launched during scaling
m1.small -- Amazon EC2 instance size
my-key-pair -- key pair for the Amazon
EC2 instances
my-security-group -- security group for the instances
35. step 3
Create an AWS Auto Scaling group
as-create-auto-scaling-group my-as-group --availability-zones us-west-2c \
--launch-configuration my-lconfig --max-size 11 --min-size 3 --cooldown 180 \
--desired-capacity 3 --load-balancers my-load-balancer --region us-west-2
my-load-balancer -- name of the load balancer to which newly launched Amazon EC2 instances will
be attached
my-as-group -- name of the Auto Scaling group
us-west-2c -- availability zone in which the auto-scaled Amazon EC2 instances will
be launched
11/3 -- maximum/minimum number of Amazon EC2 instances maintained by Auto
Scaling
180 -- cooldown, in seconds, between scaling activities
Desired capacity is an important component of the as-create-auto-scaling-group
command. Although it is an optional parameter, desired capacity tells Auto Scaling
the number of instances you want to run initially. It must lie between the minimum and maximum group size.
To adjust the number of instances you want running in your Auto Scaling group, you
change the value of --desired-capacity. If you don't specify --desired-capacity, its
value is the same as the minimum group size.
36. step 4
this step is not available in the Auto Scaling
API
Configure the Auto Scaling triggers / alarms
as-create-or-update-trigger my-as-trigger \
--auto-scaling-group my-as-group --namespace "AWS/EC2" \
--measure CPUUtilization --statistic Average \
--dimensions "AutoScalingGroupName=my-as-group" \
--period 60 --lower-threshold 20 --upper-threshold 80 \
--lower-breach-increment=-2 --upper-breach-increment 4 \
--breach-duration 180 --region us-west-2
Measures the average CPU utilization of the Auto Scaling group in 60-second periods
Scales out by 4 Amazon EC2 instances and scales in by 2
Amazon EC2 instances
Lower CPU limit is 20% and upper CPU limit is 80%; a breach must last 180 seconds before scaling starts
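To see what these numbers mean together, here is a simplified model of one trigger evaluation using the same parameters: the breach duration divided by the period gives how many consecutive CPU averages must be outside a threshold, and the new size is clamped to the group's min/max. This is an illustration under assumptions (strict > and < comparisons, one sample per period), not AWS's exact evaluation algorithm.

```python
def evaluate_trigger(cpu_samples, current, min_size=3, max_size=11,
                     lower=20.0, upper=80.0, period=60, breach_duration=180,
                     lower_inc=-2, upper_inc=4):
    """Return the group size after one evaluation of a CPU trigger (simplified model)."""
    needed = breach_duration // period  # consecutive periods that must breach
    recent = cpu_samples[-needed:]
    if len(recent) < needed:
        return current  # not enough data yet
    if all(s > upper for s in recent):
        return min(current + upper_inc, max_size)  # scale out, never above max-size
    if all(s < lower for s in recent):
        return max(current + lower_inc, min_size)  # scale in, never below min-size
    return current  # mixed readings: no breach for the full duration


if __name__ == "__main__":
    print(evaluate_trigger([85, 90, 95], current=3))  # sustained high CPU
    print(evaluate_trigger([10, 5, 12], current=4))   # sustained low CPU, clamped at min
```

The clamping is why the 11/3 limits from step 3 matter: with min-size 3, a scale-in of 2 from 4 instances stops at 3.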
37. shutdown auto scaling group
Shutting down an Auto Scaling group requires 3 commands: set min/max to 0 so the instances are terminated, check that no instances remain, then delete the group
as-update-auto-scaling-group bbd4me-as-group --min-size 0 \
--max-size 0 --region us-west-2
as-describe-auto-scaling-groups bbd4me-as-group --headers \
--region us-west-2
as-delete-auto-scaling-group bbd4me-as-group \
--force-delete --region us-west-2
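The order of those three commands matters, so scripting them usually means polling between the update and the delete. In this sketch the three callables (set_size, count_instances, delete_group) are hypothetical placeholders; in practice each would wrap one of the as-* commands above, e.g. via subprocess. The poll interval and timeout are assumed values.

```python
import time


def shutdown_group(set_size, count_instances, delete_group,
                   poll_seconds=15, timeout=900, sleep=time.sleep):
    """Scale a group to zero, wait until it is empty, then delete it."""
    set_size(0, 0)  # step 1: min-size 0 / max-size 0, Auto Scaling terminates instances
    waited = 0
    while count_instances() > 0:  # step 2: describe the group until no instances remain
        if waited >= timeout:
            raise TimeoutError("instances still running; not deleting the group")
        sleep(poll_seconds)
        waited += poll_seconds
    delete_group()  # step 3: delete the now-empty group


if __name__ == "__main__":
    # Fake callables standing in for the real commands.
    remaining = iter([2, 1, 0])
    shutdown_group(lambda mn, mx: print("update", mn, mx),
                   lambda: next(remaining),
                   lambda: print("delete"),
                   sleep=lambda s: None)
```

Injecting sleep keeps the sketch testable. Note that --force-delete would terminate leftover instances anyway; waiting first just makes the shutdown explicit.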
At the start we can add as many web servers as we want. One important part is configuring the web servers to share files. This was done using rsync replication, with the /var/www/html/mysite/sites folder kept as the shared one on all servers (assuming Drupal core is the same on all servers, I didn't want to share it). I like this solution since we have the source code on every web instance, and not on the file storage. This makes it possible to release new "source code" (not database!) versions of Drupal modules. Or you can quickly change some lines in a PROD environment for debugging (as long as you block visitor traffic to that web instance, of course ;-)).
Memcached
Move the caching mechanism to Memcached. Memcached stores all caching data in memory, so Drupal no longer uses the MySQL cache tables. Also, Memcached can run as a pool shared by all web servers, so there is no need to manually flush a remote cache; the Memcached Drupal module and the Memcached daemon take care of it. Because caching moves to Memcached, the databases are no longer under heavy load.
Database server
The database server can be just one server or a full MySQL cluster (depending on the amount of available $$$).
File storage replication
Another possible improvement to the above solution would be to store all data on NAS file storage. The NAS holds all data in the sites/##YOUR_SITE_NAME##/files directory. Compared with the previous solution, we don't need to sync data any more. One disadvantage here: if the NAS file storage goes down, no files will be served, neither by web-server1 nor by web-server2. As with the previous solution, the problem lies in single points of failure, such as having only one load balancer and possibly only one MySQL server.
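Why a shared Memcached pool removes the need to flush caches per server: every web server maps a given cache key to the same daemon, so a write or a clear from one server is seen by all. The sketch shows simple modulo hashing, which is only one possible strategy assumed here for illustration; your PHP Memcached client may use a different one, such as consistent hashing. The server addresses are hypothetical.

```python
import hashlib


def server_for_key(key: str, servers: list[str]) -> str:
    """Map a cache key to one Memcached server with simple modulo hashing."""
    if not servers:
        raise ValueError("empty server pool")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an unsigned integer bucket.
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


if __name__ == "__main__":
    pool = ["10.0.1.21:11211", "10.0.1.22:11211"]
    # web-server1 and web-server2 compute the same answer for the same key.
    print(server_for_key("cache_page:/node/1", pool))
```

The weakness of plain modulo hashing is that adding or removing a server remaps most keys, which is exactly what consistent hashing is designed to limit.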
Because mod_proxy can turn Apache into an (open) forward proxy server, and open proxy servers are dangerous both to your network and to the Internet at large, I completely disable this feature:
ProxyRequests Off
<Proxy *>
Order deny,allow
Deny from all
</Proxy>
So this option is in many aspects similar to the default one, with one big difference: each web server has its own db server. Reason for this: higher site availability; if one db server is down, the second one can continue to serve customers.
- Be sure to exclude some tables from replication: DrupalDB.cache%, DrupalDB.watchdog%, DrupalDB.temp_search_sids, DrupalDB.temp_search_results
- And exclude all databases that are local: mysql, test
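The exclusion list above maps onto MySQL's replication filter options in my.cnf. This sketch renders them for the DrupalDB schema; in a setup where each db server replicates from the other, both servers are replicas and would carry these lines. Using replicate-wild-ignore-table=mysql.% rather than replicate-ignore-db for the local databases is an assumed choice, made because replicate-ignore-db depends on the default database under statement-based replication.

```python
def replication_filters(db: str = "DrupalDB") -> str:
    """Render [mysqld] options that keep volatile and local tables out of replication."""
    lines = ["[mysqld]"]
    # Wildcard patterns: % matches any suffix (cache_page, cache_menu, watchdog, ...).
    for pattern in (f"{db}.cache%", f"{db}.watchdog%"):
        lines.append(f"replicate-wild-ignore-table={pattern}")
    # Exact names for the temporary search tables.
    for table in (f"{db}.temp_search_sids", f"{db}.temp_search_results"):
        lines.append(f"replicate-ignore-table={table}")
    # Local-only databases, ignored table by table.
    for local_db in ("mysql", "test"):
        lines.append(f"replicate-wild-ignore-table={local_db}.%")
    return "\n".join(lines)


if __name__ == "__main__":
    print(replication_filters())
```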
Almost zero upfront infrastructure investment: If you have to build a large-scale system, it may cost a fortune to invest in real estate, physical security, hardware (racks, servers, routers, backup power supplies), hardware management (power management, cooling), and operations personnel. Because of the high upfront costs, the project would typically require several rounds of management approvals before it could even get started. Now, with utility-style cloud computing, there is no fixed cost or startup cost.
Just-in-time Infrastructure: In the past, if your application became popular and your systems or your infrastructure did not scale, you became a victim of your own success. Conversely, if you invested heavily and did not get popular, you became a victim of your failure. By deploying applications in the cloud with just-in-time self-provisioning, you do not have to worry about pre-procuring capacity for large-scale systems. This increases agility, lowers risk and lowers operational cost because you scale only as you grow and only pay for what you use.
More efficient resource utilization: System administrators usually worry about procuring hardware (when they run out of capacity) and higher infrastructure utilization (when they have excess and idle capacity). With the cloud, they can manage resources more effectively and efficiently by having the applications request and relinquish resources on demand.
Usage-based costing: With utility-style pricing, you are billed only for the infrastructure that has been used. You are not paying for allocated but unused infrastructure. This adds a new dimension to cost savings. You can see immediate cost savings (sometimes as early as your next month’s bill) when you deploy an optimization patch to update your cloud application. For example, if a caching layer can reduce your data requests by 70%, the savings begin to accrue immediately and you see the reward right in the next bill. Moreover, if you are building platforms on top of the cloud, you can pass on the same flexible, variable usage-based cost structure to your own customers.
Automation – “Scriptable infrastructure”: You can create repeatable build and deployment systems by leveraging programmable (API-driven) infrastructure.
Auto-scaling: You can scale your applications up and down to match unexpected demand without any human intervention. Auto-scaling encourages automation and drives more efficiency.
Proactive Scaling: Scale your application up and down to meet your anticipated demand with proper planning and understanding of your traffic patterns, so that you keep your costs low while scaling.
More Efficient Development lifecycle: Production systems may be easily cloned for use as development and test environments. Staging environments may be easily promoted to production.
Improved Testability: Never run out of hardware for testing. Inject and automate testing at every stage during the development process. You can spawn an “instant test lab” with pre-configured environments only for the duration of the testing phase.
Disaster Recovery and Business Continuity: The cloud provides a lower cost option for maintaining a fleet of DR servers and data storage. With the cloud, you can take advantage of geo-distribution and replicate the environment in another location within minutes.
AWS – Amazon Web Services
Amazon Web Services (AWS) is a collection of remote computing services (also called web services) that together make up a cloud computing platform, offered over the Internet by Amazon.com.
EC2 - Elastic Compute Cloud
Amazon Elastic Compute Cloud (EC2) is a central part of Amazon.com's cloud computing platform, Amazon Web Services (AWS). EC2 allows users to rent virtual computers on which to run their own computer applications. EC2 allows scalable deployment of applications by providing a Web service through which a user can boot an Amazon Machine Image to create a virtual machine. A user can create, launch, and terminate server instances as needed, paying by the hour for active servers, hence the term "elastic". EC2 provides users with control over the geographical location of instances that allows for latency optimization and high levels of redundancy.
S3 - Simple Storage Service
Amazon S3 (Simple Storage Service) is an online storage web service offered by AWS. Amazon S3 provides storage through web services interfaces (REST, SOAP, and BitTorrent).
AMI - Amazon Machine Images
An Amazon Machine Image (AMI) is a special type of virtual appliance which is used to instantiate (create) a virtual machine within the Amazon Elastic Compute Cloud ("EC2"). It serves as the basic unit of deployment for services delivered using EC2.
EBS - Elastic Block Store
Amazon Elastic Block Store (EBS) provides raw block devices that can be attached to Amazon EC2 instances. An EBS volume can be used like any raw block device. In a typical use case, this would include formatting the device with a filesystem and mounting that filesystem. In addition, EBS supports a number of advanced storage features, including snapshotting and cloning.
VPC - Virtual Private Cloud
Amazon Virtual Private Cloud (VPC) is a commercial cloud computing service that provides a virtual private cloud, allowing enterprise customers to access the Amazon Elastic Compute Cloud over an IPsec-based virtual private network. Unlike traditional EC2 instances, which are allocated internal and external IP addresses by Amazon, the customer can assign IP addresses of their choosing from one or more subnets. By giving the user the option of selecting which AWS resources are public facing and which are not, VPC provides much more granular control over security.
ELB - Elastic Load Balancing
AZ - Amazon Availability Zones (Data Centers)
RDS - Amazon Relational Database Service
Amazon RDS is a distributed relational database service by Amazon.com. It is a web service running "in the cloud" and provides a relational database for use in applications. It is aimed at simplifying the setup, operation, and scaling of a relational database. Supported database engines:
MySQL databases
Oracle databases
Microsoft SQL Server
ECU - EC2 Compute Unit
One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.
SQS - Simple Queue Service
Note: Because the set of IP addresses associated with an ELB can change over time, you should never create an "A" record pointing at a specific IP address of the load balancer (an Elastic IP, by contrast, stays fixed until you release it). If you want to use a friendly DNS name for your ELB instead of the name generated by the Elastic Load Balancing service, you should create a CNAME record for the load balancer's DNS name, or use Amazon Route 53 to create a hosted zone.
NAT Instance - By default the Private Subnets in a VPC do not have internet connectivity. They cannot be accessed over the internet, and neither can they make outbound connections to internet resources. But let's say you have set up a database on an EC2 instance in the Private Subnet and have implemented a backup mechanism. You would want to push the backups to Amazon S3, but the Private Subnets cannot access S3 since there is no internet connectivity. You can achieve it by placing a NAT Instance in the VPC:
1. Through the NAT Instance, outbound connectivity for Private Subnet instances can be achieved. The instances will still not be reachable from the internet (inbound).
2. You need to configure the VPC routing table so that all outbound internet traffic from the Private Subnet goes through the NAT Instance.
3. AWS provides a ready NAT AMI (ami-f619c29f) which you can use to launch the NAT Instance.
4. You can have only one NAT Instance per VPC.
Since you can have only one NAT Instance per VPC, be aware that it becomes a Single Point Of Failure in the architecture. If the architecture depends on the NAT Instance for any critical connectivity, it is an area to be reviewed.
1. You are also limited by the bandwidth available to a single NAT Instance, so do not build an architecture with large internet bandwidth requirements from the Private Subnet through NAT.
2. You can create a network topology with multiple NAT servers.
IP Ranges - When setting up a VPC you are essentially fixing the network of the VPC. And if the VPC requires VPN connectivity (as in most cases), care should be taken to choose the IP range of the VPC so that it does not conflict with your existing networks.
Public and Private Subnets - The VPC network can be divided further into smaller network segments called subnets. Any VPC will have at least one subnet. You can set up a Public Subnet which will have internet connectivity. Instances launched within a Public Subnet will have both outbound and inbound (through EIP) internet connectivity through the Internet Gateway attached to the VPC. Private Subnets are completely locked down. They do not have internet connectivity by default. Create as many Public and Private Subnets as your architecture requires. Place all public-facing servers, such as web servers and search servers, in the public subnet. Keep DB servers, cache nodes and application servers in the private subnet.
- Use the simple GUI to build security groups
- Divide your resources into groups: Public, Web, DB, Network File Server (separate, or inside the Web group?)
VPC Security Groups are different from normal EC2 Security Groups. With EC2 Security Groups you can control the ingress into your EC2 instance. With VPC Security Groups, you have the option to control both inbound and outbound traffic. When something is not accessible, you have to check both the inbound and outbound rules set in the VPC Security Group.
ELB Security Group - When you launch an ELB within a VPC, you have the option to specify a VPC Security Group to be attached to the ELB. This is not available for an ELB launched outside a VPC in normal EC2. With this additional option, you can control access to specific ELB ports from specific IP sources. On the backend EC2 instances' Security Group, you can allow access from the VPC Security Group that you associated with the ELB.
Internal ELB - When you launch an ELB within a VPC, you also have the additional option to launch it as an "Internal Load Balancer". You can use an "Internal Load Balancer" to load balance your application tier from the web tier above.
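The "check both inbound and outbound rules" advice can be made concrete: for a new connection between two groups, the source group needs a matching outbound rule and the destination group a matching inbound rule (return traffic is allowed automatically because security groups are stateful). This is a simplified model with hypothetical group names; real rules also carry protocols, port ranges and CIDR sources, and new VPC security groups start with an allow-all outbound rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    port: int
    peer: str  # name of the other security group (hypothetical labels)


@dataclass
class SecurityGroup:
    name: str
    inbound: list[Rule]
    outbound: list[Rule]


def connection_allowed(src: SecurityGroup, dst: SecurityGroup, port: int) -> bool:
    """A new connection needs an outbound rule on src AND an inbound rule on dst."""
    out_ok = any(r.port == port and r.peer == dst.name for r in src.outbound)
    in_ok = any(r.port == port and r.peer == src.name for r in dst.inbound)
    return out_ok and in_ok


if __name__ == "__main__":
    elb = SecurityGroup("elb", inbound=[], outbound=[Rule(80, "web")])
    web = SecurityGroup("web", inbound=[Rule(80, "elb")], outbound=[Rule(3306, "db")])
    db = SecurityGroup("db", inbound=[Rule(3306, "web")], outbound=[])
    print(connection_allowed(elb, web, 80))   # ELB -> web servers
    print(connection_allowed(web, db, 3306))  # web servers -> MySQL
```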