The document describes standardizing more than 113 million merchant names from transaction data using regular expressions and fuzzy matching. The process involved extracting features from the merchant names, cleaning the names with regular expressions, grouping similar names through fuzzy matching, and applying manual rules. The standardized names enabled a preliminary analysis showing that 90% of transactions and spending were concentrated in the top 7-8% of merchants. Customer segments were identified based on relative value added scores.
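The regex cleaning and fuzzy grouping steps summarized above can be illustrated with a minimal sketch. It is not the document's actual pipeline: the noise patterns, the `clean_name` and `group_names` helpers, and the 0.85 threshold are all assumptions chosen for illustration, and the similarity measure is the standard library's `difflib.SequenceMatcher` rather than whatever matcher the original work used.

```python
import re
from difflib import SequenceMatcher

# Hypothetical noise patterns; the source does not list its actual rules.
_STORE_NUMBER = re.compile(r"[#*]?\s*\d{2,}")        # store or terminal numbers
_NON_ALNUM = re.compile(r"[^a-z0-9 ]+")              # punctuation and symbols
_SUFFIXES = re.compile(r"\b(inc|llc|ltd|corp|co)\b")  # common legal suffixes


def clean_name(raw: str) -> str:
    """Lowercase a raw merchant string and strip common noise with regexes."""
    name = raw.lower()
    name = _STORE_NUMBER.sub(" ", name)
    name = _NON_ALNUM.sub(" ", name)
    name = _SUFFIXES.sub(" ", name)
    return " ".join(name.split())  # collapse repeated whitespace


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()


def group_names(raw_names, threshold=0.85):
    """Greedy grouping: attach each name to the first similar-enough group.

    The key of each group is the cleaned form of its first member.
    The threshold is an assumed value, not one taken from the source.
    """
    groups = {}
    for raw in raw_names:
        cleaned = clean_name(raw)
        for representative in groups:
            if similarity(cleaned, representative) >= threshold:
                groups[representative].append(raw)
                break
        else:
            groups[cleaned] = [raw]
    return groups


if __name__ == "__main__":
    sample = [
        "WAL-MART #1234",
        "Walmart Inc.",
        "WALMART 0042",
        "STARBUCKS #553",
        "Starbuks",
    ]
    for representative, members in group_names(sample).items():
        print(representative, "<-", members)
```

This greedy loop compares every name against every existing group, so it would not scale to 113 million names as written. A real pipeline would likely deduplicate cleaned names first and compare only within blocks (for example, names sharing a prefix), with manual rules handling the cases the matcher gets wrong.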
SCRIPT: This diagram depicts the Greenplum Unified Analytics Platform (UAP). Let’s take a high-level look at the stack. The foundation of UAP is Greenplum Database for analyzing your structured data, together with Greenplum Hadoop for co-processing unstructured data. These two components are fused together by Greenplum gNet, which allows for parallel data exchange and parallel query access. They are overlaid with a unified data access and query layer that combines the languages of choice for your analysts (SQL, MapReduce, etc.). Above the access layer sits our powerful partner tools and services layer. We are not about locking customers into a single tool or stack. Instead, we work with the tool vendor of your choice, be it SAS or R, MicroStrategy or Informatica. And what truly enables productivity and ensures you are getting maximum value out of your data science team is Greenplum Chorus. What sets this diagram apart from a typical vendor example is the inclusion of people: the Data Stakeholders. UAP is designed to enable an emerging group of talent, the new practitioners, that we refer to as the Data Science team. This team can include the data platform administrator, data scientists, analysts, engineers, BI teams, and, most importantly, the line-of-business user. We develop, package, and support this as a unified software platform available on your favorite commodity hardware, on cloud infrastructure, or on our modular Data Computing Appliance.