I was recently reading an article titled “An Introduction to Data Wrangling” by Stephen Swoyer at TDWI, in which Stephen interviews Stephanie Langenfeld McReynolds at Trifacta. One interesting quote from Stephanie is: “The work that you do with data wrangling others would call ‘data plumbing’ or even janitorial work, but when you have somebody who knows how to wrangle data and gets into a flow of data wrangling, it’s an elegant dance to watch.” I think this is a very powerful concept. As the variety of data continues to increase, data wrangling will become more important, and having good tools for it will be a key differentiator for data scientists. Stephen’s article is definitely worth reading.
At Simba, we focus on data connectivity, which is one key component that feeds into data wrangling. Simba has a componentized SQL engine that software companies use to provide a generic SQL interface on their data engines. The Simba SQL engine allows a developer to quickly build a SQL interface onto data in any format. Once that interface is in place, users can access the data with standard tools.
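To make the idea of a SQL interface over non-SQL data concrete, here is a minimal sketch using only Python's standard library. It is not the Simba SQL engine or its API; it simply copies some CSV text into an in-memory SQLite table so that it can be queried with ordinary SQL. The sample data, the `sales` table name, and the helper functions are all hypothetical and exist only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a non-SQL source (here, CSV text).
RAW_CSV = """region,units
east,10
west,7
east,5
"""


def load_csv_as_table(conn, table_name, csv_text):
    """Copy CSV rows into a SQLite table so they can be queried with SQL."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row holds the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    conn.executemany(
        f'INSERT INTO "{table_name}" VALUES ({placeholders})', reader
    )


def total_units_by_region(csv_text):
    """Run a standard SQL aggregate over data that started life as CSV."""
    conn = sqlite3.connect(":memory:")
    try:
        load_csv_as_table(conn, "sales", csv_text)
        rows = conn.execute(
            "SELECT region, SUM(CAST(units AS INTEGER)) "
            "FROM sales GROUP BY region ORDER BY region"
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    print(total_units_by_region(RAW_CSV))
```

The point of the sketch is only that, once data is reachable through SQL, any tool that speaks SQL can work with it. A real connectivity layer would typically query the source in place rather than copy it, which this example does not attempt to show.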