Denny Lee, Senior Director of Data Sciences Engineering, Concur
[Lumira + Spark + Simba connectivity] allows us to simplify expense-report classification. Customer users take a picture of their receipt, we OCR it, we classify it...automatically.
Denny Lee is responsible for data sciences engineering at Concur, the world leader in spend management solutions and services. Lee’s ambitious mission is to seek order in the petabytes of expense-related data collected by Concur’s 20,000+ customers around the world.
- Accelerated Hadoop data-processing engine
- Advanced BI analytics and visualization capabilities
- SQL Connectivity
- Apache Spark
- SAP Lumira
- Simba Technologies Spark Driver
- Simba Technologies Spark Extension for SAP Lumira
Learning From Data and Making Customer Reporting Easier
Lee works to derive actionable insight from the biggest of Big Data. Specifically, his task is two-fold: Learn from the data so Concur can deliver better services to its 20,000+ customers worldwide, and establish machine-learning algorithms to operate more efficiently, particularly when it comes to enabling customer fiduciary reporting capabilities. A few years ago, Lee delineated his approach, and broke his analytics challenge into four thematic components: Consolidation, Visualization, Insight, and Recommendation.
For Lee, consolidation meant looking to Hadoop data solutions for storing, processing, and uniting Concur’s petabytes of expense data. An early adopter of Apache Spark, Lee sought to leverage its data-processing power to accelerate Concur’s analytics, or, in Lee’s words, to make it “run really fast.”
Visualization would enable Lee to find meaning in raw data, and he turned to Lumira, the data-visualization and analytics solution from Concur’s parent company SAP. “Lumira enables us to analyze, extract, wrangle, visualize, and create stories around our data,” explains Lee.
With application and data tools in place, Lee would be able to gain insight, and importantly, make recommendations to Concur leadership. But he needed one more thing to get his analytics engine running at high speed. And that was connectivity from Simba.
Back up a couple of years: Before a Concur acquisition was a glimmer in SAP’s eye, the Lumira team recognized the need to provide open, cross-platform connectivity to emerging data sources. After evaluating competing technologies, the SAP Analytics group chose Simba Technologies to provide Lumira with connectors for Hive, Impala, and Redshift. The drivers offered not only open connectivity to those platforms but also SQL capabilities: Data scientists could use Lumira to get at their Hive, Impala, and Redshift data using their existing SQL expertise without having to learn a source-specific language like HiveQL.
The Lumira team recognized early on that Apache Spark could make efficient Big Data self-service analytics more powerful. That goal—paired with escalating customer demand for accelerated Hadoop data-processing—drove the Lumira team to embrace Apache Spark connectivity, and they went with a trusted partner to provide it. Paul Ekeland, SAP’s Senior Director of Product Management for Big Data, cites industry leaders’ adoption of Simba Spark connectivity as a factor in extending the Simba partnership.
“For Spark, particularly, it’s the presence that Simba has in the market,” notes Ekeland. “Having other vendors such as Hortonworks and Databricks already using that particular driver, and Simba showing a predominant presence in a market area which is so dynamic made us confident that we could have a good partner in Simba to go along on the Spark adventure.”
The Brave New World of Easy, Efficient Expense Reporting
Back to Concur. With Lumira, Spark, and Simba connectivity in place, Lee took aim at data analysis, specifically expense-report classification. He started with a seemingly “boring topic”: corporate receipt management.
“From a high level, [Lumira + Spark + Simba connectivity] allows us to simplify expense-report classification,” explains Lee. “Customer users take a picture of their receipt, we OCR it, we classify it…automatically.”
Lee’s expense-reporting classification model has provided tangible benefits–both to Concur and to Concur customers. “Expense reporting is easier for the user,” continues Lee, “and [the accuracy the capability delivers] actually improves the tax implications for the customer company.”
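To make the "OCR it, classify it" step concrete, here is a minimal Python sketch of the classification half only. It is not Concur's model, which this article does not describe. It swaps in a toy keyword-matching rule, and the category names, keyword lists, and the function name `classify_receipt` are all hypothetical, chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical categories and keywords, for illustration only.
# A production system would learn these from labeled receipts
# rather than hard-code them.
CATEGORY_KEYWORDS = {
    "lodging": {"hotel", "inn", "suite", "room"},
    "meals": {"restaurant", "cafe", "coffee", "lunch", "dinner"},
    "transport": {"taxi", "cab", "airline", "flight", "parking"},
}


def classify_receipt(ocr_text: str) -> str:
    """Return the category whose keywords occur most often in the OCR text."""
    # Lowercase the text and keep only alphabetic tokens.
    tokens = re.findall(r"[a-z]+", ocr_text.lower())

    # Count one hit per token that matches a category keyword.
    scores = Counter()
    for token in tokens:
        for category, keywords in CATEGORY_KEYWORDS.items():
            if token in keywords:
                scores[category] += 1

    # No keyword matched: leave the receipt for manual review.
    if not scores:
        return "uncategorized"
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(classify_receipt("Grand Hotel Seattle - Room 412 - 1 night"))
    print(classify_receipt("Joe's Cafe: coffee 3.50"))
```

In a pipeline like the one Lee describes, a step of this kind would sit after OCR, with a trained model in place of the keyword rule.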
What’s next? Giving back to the broader Apache Spark community. Lee heads the Seattle, Washington Apache Spark Meetup, and SAP’s Ekeland is committed to making the most of the Lumira team’s experience.
“Spark offers a lot of hope,” concludes Ekeland. “At SAP, we are investing from an analytical-tool perspective and a data-source perspective where we see opportunities to improve Spark itself.”