On January 15th, we had the pleasure of attending a local Vancouver Apache Spark Meetup hosted by Samir Gupta from Databricks and Hubert Duan from Microsoft. The event coincided with a snowstorm that most Vancouverites are not prepared for. There are very few snowplows, and the whole city basically shuts down when it snows. However, we braved the storm for this highly topical session with Raki Rahman, a solution architect from Slalom, on ML Ops with Spark, MLFlow and Azure.
Here’s a recap of the discussion highlights:
When it comes to Machine Learning Operations (ML Ops), data scientists and operations professionals need to collaborate and communicate. A data scientist may build a model on a local machine, and once the model is complete, the project has to move into production. That raises a number of considerations: What tools and pipelines are available? When the project hits the DevOps desk, does DevOps understand the model the data scientist created? Is there a knowledge gap between the two roles, and how can it be bridged?
Raki shared his best practices: make the model reproducible, centralize it, and automate continuous integration and continuous deployment (CI/CD).
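Reproducibility in practice often comes down to pinning random seeds and recording the exact configuration of every run, which is what tracking tools like MLFlow automate. Here is a minimal pure-Python sketch of the idea; the function names, config fields, and "training" step are illustrative, not taken from the talk:

```python
import hashlib
import json
import random

def run_experiment(config: dict) -> dict:
    """Run a (toy) training job deterministically from a pinned config."""
    # Pin the seed recorded in the config so the run is repeatable.
    rng = random.Random(config["seed"])
    # Stand-in for training: draw "weights" from the seeded generator.
    weights = [rng.random() for _ in range(config["n_weights"])]
    # Fingerprint the config so the result can always be traced back to
    # its exact inputs, the way a centralized tracking server logs params.
    config_hash = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()
    return {"weights": weights, "config_hash": config_hash}

config = {"seed": 42, "n_weights": 3}
first = run_experiment(config)
second = run_experiment(config)
assert first == second  # same config, same result: the run is reproducible
```

The same principle is what makes CI/CD for models possible: if a run is a pure function of its recorded config, the deployment pipeline can rebuild and verify it automatically.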
He dived into an example of how to architect an automated, operationalized Cloud Data Analytics Platform.
The goal was to get Bloomberg data displayed in Power BI. For those of us who are not in the financial sector, Raki explained that the Bloomberg terminal may not be optimized for user experience, which is why some may want to use other BI solutions to view the data.
Azure Solution Architecture
First, the raw data needs to be streamed through a secure network and into the system. As it's streamed, it's added to a data lake, where it undergoes cleaning and aggregation. The data is also loaded into a warehouse.
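The clean-then-aggregate flow above can be sketched in miniature. In the real architecture these stages would run on Spark over the data lake; the record shapes and stage functions below are illustrative stand-ins:

```python
from collections import defaultdict

# Toy stand-ins for raw ticks landing in the data lake.
raw_ticks = [
    {"symbol": "AAA", "price": 10.0},
    {"symbol": "AAA", "price": None},   # bad record, dropped in cleaning
    {"symbol": "BBB", "price": 20.0},
    {"symbol": "AAA", "price": 12.0},
]

def clean(records):
    """Cleaning step: drop records with missing prices."""
    return [r for r in records if r["price"] is not None]

def aggregate(records):
    """Aggregation step: average price per symbol."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in records:
        totals[r["symbol"]][0] += r["price"]
        totals[r["symbol"]][1] += 1
    return {sym: total / count for sym, (total, count) in totals.items()}

# The cleaned, aggregated result is what gets loaded into the warehouse.
warehouse = aggregate(clean(raw_ticks))
print(warehouse)  # → {'AAA': 11.0, 'BBB': 20.0}
```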
Our ears perked up as the discussion steered towards leveraging the Databricks JDBC Spark connector to accomplish one of these steps. This is exciting because our Simba team works with Databricks to build their ODBC and JDBC connectivity. It's gratifying to see our products form part of a solution used by others. This is what I feel any developer lives for: seeing their work make a difference and being part of someone else's solution. Huge thanks to Sneha Nagrikar, Daisy Wang and Holman Lan for their dedication to developing a reliable Spark driver.
Trading Strategy Output from News Data
Raki then continued his presentation, focusing on training ML models to process and analyze news and generate Buy/Sell signals. He revealed that peaks in his model's output coincided with actual Bitcoin price peaks.
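The last step of such a pipeline, turning model scores into Buy/Sell signals, can be sketched very simply. The thresholds and function below are illustrative assumptions; Raki's actual model would score news with trained NLP components rather than take sentiment values as given:

```python
def signals(sentiment_scores, buy_threshold=0.5, sell_threshold=-0.5):
    """Map per-article sentiment scores in [-1, 1] to trading signals."""
    out = []
    for score in sentiment_scores:
        if score >= buy_threshold:
            out.append("BUY")       # strongly positive news
        elif score <= sell_threshold:
            out.append("SELL")      # strongly negative news
        else:
            out.append("HOLD")      # inconclusive
    return out

print(signals([0.8, 0.1, -0.7]))  # → ['BUY', 'HOLD', 'SELL']
```

In a real strategy the thresholds themselves would be tuned and backtested, which is exactly the kind of experiment a tracking tool like MLFlow helps compare.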
The talk ended with an end-to-end demo showcasing MLFlow on an Airbnb dataset.
We’re fortunate to have been able to make it to the talk. No snow was going to stop us. Huge thank you to Microsoft and the organizers for putting on community events such as these.