3. ENDPOINTS
• REST API for everyone
• Great documentation
• http://aws.amazon.com/documentation/
4. AWS JAVA SDK
• One monolithic jar before 1.9.0
• Currently split into ~48 smaller modules dedicated to individual
Amazon services
• All depend on the aws-java-sdk-core module
• Other runtime dependencies:
• commons-logging
• Apache HttpClient (4.3.4)
• Joda-Time
5. CREDENTIALS
• Manually provide accessKey and secretKey (generated
by IAM)
• Manual key management
• No automatic rotation
• Leaked keys can lose you serious $$$
new AmazonS3Client(new BasicAWSCredentials(accessKey, secretKey));
6. CREDENTIALS
“I only had S3 keys on my GitHub and they where gone within 5 minutes!
Turns out through the S3 API you can actually spin up EC2 instances, and my key had
been spotted by a bot that continually searches GitHub for API keys. Amazon AWS
customer support informed me this happens a lot recently, hackers have created an
algorithm that searches GitHub 24 hours per day for API keys. Once it finds one it spins
up max instances of EC2 servers to farm itself bitcoins.
Boom! A $2375 bill in the morning.”
http://www.devfactor.net/2014/12/30/2375-amazon-mistake/
7. CREDENTIALS
• Use a credentials provider
• Default behaviour when the zero-argument constructor is invoked; providers are tried in this order:
• EnvironmentVariableCredentialsProvider
• SystemPropertiesCredentialsProvider
• ProfileCredentialsProvider
• InstanceProfileCredentialsProvider
• All but the last share the security problems of manual access/secret key management
new AmazonS3Client();
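The zero-argument constructor relies on a provider chain: each source is asked in turn and the first one that yields credentials wins. A minimal, JDK-only sketch of that lookup pattern follows. `SimpleCredentials` and `ProviderChain` are hypothetical stand-ins, not the SDK's own chain classes, and the environment variable and system property names are assumed to mirror the ones the SDK's providers read.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical value type standing in for the SDK's AWSCredentials.
final class SimpleCredentials {
    final String accessKey;
    final String secretKey;

    SimpleCredentials(String accessKey, String secretKey) {
        this.accessKey = accessKey;
        this.secretKey = secretKey;
    }
}

// Sketch of a provider chain: ask each source in order, return the first hit.
final class ProviderChain {
    private final List<Supplier<Optional<SimpleCredentials>>> providers;

    @SafeVarargs
    ProviderChain(Supplier<Optional<SimpleCredentials>>... providers) {
        this.providers = Arrays.asList(providers);
    }

    SimpleCredentials resolve() {
        for (Supplier<Optional<SimpleCredentials>> provider : providers) {
            Optional<SimpleCredentials> found = provider.get();
            if (found.isPresent()) {
                return found.get();
            }
        }
        throw new IllegalStateException("No credentials found in any provider");
    }

    // Environment variables first, then JVM system properties (names assumed to match the SDK's).
    static ProviderChain envThenSystemProperties() {
        return new ProviderChain(
            () -> fromPair(System.getenv("AWS_ACCESS_KEY_ID"), System.getenv("AWS_SECRET_ACCESS_KEY")),
            () -> fromPair(System.getProperty("aws.accessKeyId"), System.getProperty("aws.secretKey")));
    }

    private static Optional<SimpleCredentials> fromPair(String access, String secret) {
        return (access != null && secret != null)
            ? Optional.of(new SimpleCredentials(access, secret))
            : Optional.empty();
    }
}
```

Note that the first three sources in the slide's list all end up reading long-lived keys from somewhere on the machine, which is why only the last one avoids the manual key problem.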
8. CREDENTIALS
• Use InstanceProfileCredentialsProvider
• Requires the server's IAM role to grant the permissions needed by the service using this provider
• Calls the EC2 Instance Metadata Service to get the current security credentials:
• http://169.254.169.254/latest/meta-data/iam/security-credentials/
• Automatic management and rotation of keys
• Stored only in the memory of the calling process
9. CREDENTIALS
• Use InstanceProfileCredentialsProvider
• Credentials are reloaded under a lock, which may cause latency spikes (every hour)
• Instantiate with refreshCredentialsAsync == true
• Problems when starting on developers' machines
• Use AdRoll's Hologram to create a fake instance metadata environment locally
• https://github.com/AdRoll/hologram
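The idea behind `refreshCredentialsAsync` is to move the hourly reload off the request path. A general JDK-only sketch of that pattern: readers always get the last loaded value from a volatile field, while a single background thread reloads it on a schedule, so no caller waits on the reload. `BackgroundRefreshingValue` and its loader are hypothetical names, and this is not the SDK's actual implementation.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical holder: the first load is synchronous, later reloads run on a daemon thread.
final class BackgroundRefreshingValue<T> implements AutoCloseable {
    private final Supplier<T> loader;
    private final ScheduledExecutorService scheduler;
    private volatile T current; // readers never take a lock

    BackgroundRefreshingValue(Supplier<T> loader, long period, TimeUnit unit) {
        this.loader = loader;
        this.current = loader.get(); // fail fast if the very first load is impossible
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "credentials-refresh");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(this::refresh, period, period, unit);
    }

    T get() {
        return current;
    }

    // Run by the scheduler; if loading fails, keep serving the previous value.
    void refresh() {
        try {
            current = loader.get();
        } catch (RuntimeException e) {
            // keep the old value; a real implementation would log this
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Catching the exception inside `refresh` matters: an uncaught exception would silently cancel all further runs scheduled with `scheduleAtFixedRate`.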
10. BUILT-IN MONITORING
amazonS3Client.addRequestHandler(new RequestHandler2() {
@Override
public void beforeRequest(Request<?> request) {
}
@Override
public void afterResponse(Request<?> request, Response<?> response) {
request.getAWSRequestMetrics()...
}
@Override
public void afterError(Request<?> request, Response<?> response, Exception e) {
}
});
11. BUILT-IN MONITORING
AmazonS3Client amazonS3 = new AmazonS3Client(
new StaticCredentialsProvider(credentials),
new ClientConfiguration(),
new RequestMetricCollector() {
@Override
public void collectMetrics(Request<?> request, Response<?> response) {
}
}
);
12. TESTING WITH S3
• Use buckets located close to the testing site
• Use a fake S3 process:
• https://github.com/jubos/fake-s3
• https://github.com/tkowalcz/fake-s3
• the same thing, with a few bug fixes
• Not scalable enough
• Write your own :(
• Not that hard
// look out for issue 414
amazonS3.setEndpoint("http://localhost...");
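To give a feel for how small a custom fake can start, here is a hypothetical in-memory stand-in built only on the JDK's `com.sun.net.httpserver` package. It handles nothing but path-style `PUT`, `GET` and `DELETE` of `/bucket/key`, with no listing, XML responses, authentication or multipart upload, so it assumes the client under test is configured for path-style requests and only uses those operations. `TinyFakeS3` is an illustrative name, not one of the projects linked above.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory fake: objects are keyed by the raw request path "/bucket/key".
final class TinyFakeS3 {
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();
    private final HttpServer server;

    TinyFakeS3(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", this::handle);
    }

    // Starts serving and returns the bound port (useful when constructed with port 0).
    int start() {
        server.start();
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }

    private void handle(HttpExchange exchange) throws IOException {
        String path = exchange.getRequestURI().getPath();
        switch (exchange.getRequestMethod()) {
            case "PUT":
                objects.put(path, readAll(exchange.getRequestBody()));
                respond(exchange, 200, new byte[0]);
                break;
            case "GET":
                byte[] body = objects.get(path);
                respond(exchange, body == null ? 404 : 200, body == null ? new byte[0] : body);
                break;
            case "DELETE":
                objects.remove(path);
                respond(exchange, 204, new byte[0]); // S3 answers DELETE with 204 No Content
                break;
            default:
                respond(exchange, 501, new byte[0]);
        }
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    private static void respond(HttpExchange exchange, int status, byte[] body) throws IOException {
        // -1 tells HttpServer there is no body; otherwise send an exact Content-Length.
        exchange.sendResponseHeaders(status, body.length == 0 ? -1 : body.length);
        if (body.length > 0) {
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        }
        exchange.close();
    }
}
```

In a test, the port returned by `start()` would go into the endpoint passed to `setEndpoint`.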
13. SCARY STUFF
• #333 SDK can't list a bucket or delete S3 objects with characters in the range [0x00-0x1F]
• According to the S3 object naming rules, characters in [0x00-0x1F] are valid in object keys. However, the SDK cannot list a bucket containing such objects (the XML parser chokes on them), and they cannot be deleted through multi-object delete either (also an XML failure). Interestingly, downloads work just fine.
• #797 S3 delete_objects silently fails with object names containing
characters in the 0x00-0x1F range
• Bulk delete of more than 1024 objects fails with an unrelated exception
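S3's multi-object delete API accepts at most 1000 keys per request, and the control-character problems above hit the XML-based list and bulk-delete paths. A hedged sketch of defensive preparation: flag keys containing 0x00-0x1F so the caller can keep them out of bulk requests, and split the remaining keys into batches of at most 1000 before building each `DeleteObjectsRequest`. `DeleteBatching` and its methods are hypothetical helpers, and how the flagged keys should then be deleted is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for preparing keys before multi-object delete requests.
final class DeleteBatching {
    static final int MAX_KEYS_PER_DELETE = 1000; // S3 DeleteObjects per-request limit

    // True if the key contains a character in [0x00-0x1F], which breaks the XML-based paths.
    static boolean hasControlChars(String key) {
        for (int i = 0; i < key.length(); i++) {
            if (key.charAt(i) <= 0x1F) {
                return true;
            }
        }
        return false;
    }

    // Splits keys into consecutive batches of at most batchSize elements, preserving order.
    static List<List<String>> batches(List<String> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += batchSize) {
            int to = Math.min(from + batchSize, keys.size());
            result.add(new ArrayList<>(keys.subList(from, to)));
        }
        return result;
    }
}
```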
14. “ASYNCHRONOUS” VERSIONS
• There is no truly asynchronous mode in the AWS SDK
• Async versions of the clients make synchronous, blocking HTTP calls and just wrap them in a thread pool
• S3 has TransferManager (we have no experience here)
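The "async client = sync client + thread pool" design, reduced to the JDK: the call still blocks, it only blocks a pool thread instead of the caller, so concurrency is capped by the pool size and every in-flight request still occupies a thread. `PseudoAsyncClient` is a hypothetical name used for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical wrapper mirroring the "async = blocking call on a thread pool" design.
final class PseudoAsyncClient<Req, Resp> implements AutoCloseable {
    private final Function<Req, Resp> blockingCall; // e.g. a synchronous HTTP request
    private final ExecutorService pool;

    PseudoAsyncClient(Function<Req, Resp> blockingCall, int threads) {
        this.blockingCall = blockingCall;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Returns immediately, but a pool thread stays parked until blockingCall returns.
    Future<Resp> callAsync(Req request) {
        Callable<Resp> task = () -> blockingCall.apply(request);
        return pool.submit(task);
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

With `threads` set to N, the (N+1)-th concurrent request simply queues until one of the blocking calls finishes.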
15. BASIC S3 PERFORMANCE TIPS
• A pseudo-random key prefix spreads files evenly among S3 “partitions”
• Listing is usually the bottleneck. Cache list results.
• Or write your own microservice to eliminate lists
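One common way to build such a prefix, sketched under the assumption that objects are always fetched by their full key: prepend a few hex characters of a hash of the original key. The prefix is deterministic, so readers can recompute it, but listing by the original leading path no longer works, which is one more reason to avoid relying on lists. `PrefixedKeys`, MD5 and the 4-character prefix length are illustrative choices, not the authors' scheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper that spreads keys across the key space with a hash prefix.
final class PrefixedKeys {
    private static final int PREFIX_HEX_CHARS = 4;

    static String prefixed(String key) {
        return hashPrefix(key) + "/" + key;
    }

    static String hashPrefix(String key) {
        try {
            // MD5 is used only to spread keys uniformly, not for security.
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
                if (hex.length() >= PREFIX_HEX_CHARS) {
                    break;
                }
            }
            return hex.substring(0, PREFIX_HEX_CHARS);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }
}
```

For example, `PrefixedKeys.prefixed("foo")` yields `"acbd/foo"`, since the MD5 digest of `foo` starts with `acbd`.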
16. SDK PERFORMANCE
• Creates tons of short-lived objects
• Many locks guarding internal state
• Profiled with Java Mission Control (if it does not crash)
• Or YourKit
• Then test on production data
18.
public XmlResponsesSaxParser() throws AmazonClientException {
// Ensure we can load the XML Reader.
try {
xr = XMLReaderFactory.createXMLReader();
} catch (SAXException e) {
throw new AmazonClientException("Couldn't initialize a SAX driver to create an XMLReader", e);
}
}
19.
@Override
protected final CloseableHttpResponse doExecute(final HttpHost target, final HttpRequest request,
final HttpContext context)
throws IOException, ClientProtocolException {
Args.notNull(request, "HTTP request");
// a null target may be acceptable, this depends on the route planner
// a null context is acceptable, default context created below
HttpContext execContext = null;
RequestDirector director = null;
HttpRoutePlanner routePlanner = null;
ConnectionBackoffStrategy connectionBackoffStrategy = null;
BackoffManager backoffManager = null;
// Initialize the request execution context making copies of
// all shared objects that are potentially threading unsafe.
synchronized (this) {
20.
public synchronized final ClientConnectionManager getConnectionManager() {
if (connManager == null) {
connManager = createClientConnectionManager();
}
return connManager;
}
public synchronized final HttpRequestExecutor getRequestExecutor() {
if (requestExec == null) {
requestExec = createRequestExecutor();
}
return requestExec;
}
public synchronized final AuthSchemeRegistry getAuthSchemes() {
if (supportedAuthSchemes == null) {
supportedAuthSchemes = createAuthSchemeRegistry();
}
return supportedAuthSchemes;
}
public synchronized void setAuthSchemes(final AuthSchemeRegistry registry) {
supportedAuthSchemes = registry;
}
public synchronized final ConnectionBackoffStrategy getConnectionBackoffStrategy() {
return connectionBackoffStrategy;
}
21. OLD APACHE HTTP CLIENT (4.3.4)
• Riddled with locks
• Reusing the same client can save resources, but at the cost of performance
• different code paths may not target the same sites
• open sockets are not that costly
• better to use many client instances (e.g. one per thread)
• Make sure the number of threads using one client instance is less than the maximum number of connections in its pool
• otherwise there is severe contention when returning connections to the pool
• recent versions got better
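For the "one client per thread" option, a `ThreadLocal` keeps each thread on its own instance with no shared pool to contend on. The sketch is generic and JDK-only: the `Supplier` is assumed to build the real client (for example an `AmazonS3Client` with its own `ClientConfiguration`), and `PerThreadClients` is a hypothetical name.

```java
import java.util.function.Supplier;

// Hypothetical per-thread holder: each thread lazily builds and then reuses its own client.
final class PerThreadClients<T> {
    private final ThreadLocal<T> clients;

    PerThreadClients(Supplier<? extends T> factory) {
        this.clients = ThreadLocal.withInitial(factory);
    }

    // First call on a thread invokes the factory; later calls return the same instance.
    T get() {
        return clients.get();
    }
}
```

This fits long-lived worker threads; with short-lived threads, each new thread pays for building a client and nothing here shuts the old ones down.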
22. BASIC CONFIGURATION
<bean id="..." class="com.amazonaws.services.s3.AmazonS3Client" scope="prototype">
<constructor-arg>
<bean class="com.amazonaws.ClientConfiguration">
<property name="maxConnections"
value="#{T(Integer).parseInt('${storage.readingThreads}') * 2}"/>
<property name="protocol" value="HTTP"/>
</bean>
</constructor-arg>
</bean>
23. CLIENT POOL
<bean id="poolTargetSource" class="pl.codewise.voluum.util.AmazonS3ClientPool">
<property name="targetBeanName" value="amazonS3Client"/>
<property name="maxSize" value="10"/>
</bean>
<bean id="amazonS3Client" class="org.springframework.aop.framework.ProxyFactoryBean"
primary="true">
<property name="targetSource" ref="poolTargetSource"/>
<property name="interfaces">
<list>
<value>com.amazonaws.services.s3.AmazonS3</value>
</list>
</property>
</bean>
int index = ThreadLocalRandom.current().nextInt(getMaxSize());
return clients[index];
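The two lines above are the heart of the pool's target selection: every call picks a random pre-built client, with no locking and no check-out/return step. A self-contained sketch of the same idea, independent of Spring's `TargetSource` machinery, follows. `RandomClientPool` and its factory are hypothetical and not the `AmazonS3ClientPool` class from the slide.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Hypothetical fixed-size pool: clients are built up front, selection is lock-free.
final class RandomClientPool<T> {
    private final Object[] clients;

    RandomClientPool(Supplier<? extends T> factory, int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        clients = new Object[maxSize];
        for (int i = 0; i < maxSize; i++) {
            clients[i] = factory.get();
        }
    }

    int getMaxSize() {
        return clients.length;
    }

    // Any thread may get any client; clients are shared, not checked out.
    @SuppressWarnings("unchecked")
    T getTarget() {
        int index = ThreadLocalRandom.current().nextInt(getMaxSize());
        return (T) clients[index];
    }
}
```

Because clients are shared rather than checked out, this only spreads contention across instances; each client's connection limit still has to cover the threads that may land on it.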
24. WHAT TO DO WITH THIS?
• Hardcore approach (classpath overrides of the following classes):
• Our own AbstractAWSSigner that uses a third-party, lock-free HmacSHA1 implementation
• ResponseMetadataCache without locks (sends metadata to /dev/null)
• AmazonHttpClient with the call to System.getProperty removed
• DateUtils using Joda-Time (now fixed in the SDK itself)
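The slides swap in a third-party HmacSHA1 implementation. A different, JDK-only way to keep signing free of shared state is to give each thread its own `javax.crypto.Mac` instance. This is a substitute technique shown for illustration, not what the authors did. `ThreadLocalHmacSha1` is a hypothetical name, and `signBase64` is included on the assumption that the replaced signer needs the digest Base64-encoded.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical signer: one Mac per thread, so concurrent signing shares no mutable state.
final class ThreadLocalHmacSha1 {
    private static final String ALGORITHM = "HmacSHA1";

    private final ThreadLocal<Mac> macs;

    ThreadLocalHmacSha1(byte[] key) {
        SecretKeySpec keySpec = new SecretKeySpec(key, ALGORITHM);
        this.macs = ThreadLocal.withInitial(() -> {
            try {
                Mac mac = Mac.getInstance(ALGORITHM);
                mac.init(keySpec);
                return mac;
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException("Cannot initialise " + ALGORITHM, e);
            }
        });
    }

    byte[] sign(String data) {
        // doFinal also resets the Mac, so the per-thread instance is ready for the next call.
        return macs.get().doFinal(data.getBytes(StandardCharsets.UTF_8));
    }

    String signBase64(String data) {
        return Base64.getEncoder().encodeToString(sign(data));
    }
}
```

As a sanity check, the RFC 2202 test vector (key `Jefe`, data `what do ya want for nothing?`) produces the digest `effcdf6ae5eb2fa2d27416d5f184df9c259a7c79`.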
25. PERFORMANCE ACHIEVED
Dstat output. User-mode CPU usage is mostly related to data processing.
[Figure: dstat charts of CPU (user, system, idle), network transfer (IN/OUT) and IRQ/CNTX]
26. OPTIMISATIONS RESULT
com.amazonaws.services.s3.model.AmazonS3Exception:
Please reduce your request rate.
(Service: Amazon S3; Status Code: 503; Error Code: SlowDown)
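That 503 is S3 asking the client to back off. The SDK's default retry policy may already retry some throttling errors, so the following is only a sketch of the usual application-level response, exponential backoff with full jitter, written against the JDK alone. `Backoff.retry` and the throttling predicate are hypothetical, and matching on the `SlowDown` error code is an assumption about how the caller would classify errors.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Predicate;

// Hypothetical helper: retry throttled calls with exponential backoff and full jitter.
final class Backoff {
    static <T> T retry(Callable<T> call, Predicate<Exception> isThrottling,
                       int maxAttempts, long baseDelayMillis, long maxDelayMillis) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isThrottling.test(e)) {
                    throw e; // give up, or not a throttling error
                }
                // Upper bound doubles per attempt (base * 2^(attempt-1)), capped at maxDelayMillis.
                long cap = Math.min(maxDelayMillis, baseDelayMillis << Math.min(attempt - 1, 30));
                // Full jitter: sleep a random time in [0, cap] so clients do not retry in lockstep.
                Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
            }
        }
    }
}
```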
27. – HENRY PETROSKI
"The most amazing achievement of the computer
software industry is its continuing cancellation of
the steady and staggering gains made by the
computer hardware industry."