aws hadoop webservices spark streaming cloudera clairvoyant spark rest soap hive apache spark mapreduce big data transactions soa phoenix data conference ec2 self service analytics s3 data lake sparksql spark thrift server dataframe spark driver sqlcontext hivecontext sparkui cluster manager worker node spark architecture catalyst optimizer rdd datasets data profile data lake challenges cloudera navigator data lineage tokenization data organization data governance metadata management data democratization data security data catalog masking data provisioning data quality data discovery encryption replicasets slave upsert oplog primary read concern master mongodb write concern document collection secondary nosql hidden arbiter cap theorem mongodb replication asynchronous architectures dry premature optimization continuous integration ssl yagni distributed transactions kiss continuous deployment iot kinesis emr cloud get json delete namespaces http response codes http xml put rest clients security http headers post hive metastore pig hdfs bigdata usecases bigdata cdh databricks community cloud databricks spark sql workflow etl apache hadoop apache airflow airflow dag eda idempotency message oriented architecture databases redis logging logstash shipper log aggregation kibana elasticsearch hbase tuning hannibal region hotspotting apache phoenix hbase read path secondary indexes hbase write path rowkey hbase cluster hbase compactions column families hbase optimization zookeeper data locality byoc separation of storage and compute sentry navigator data lake on aws kafka cloudera security choice hotels