Everyone knows that cloud data workloads are getting outrageously expensive. Many enterprises spend tens or even hundreds of millions of dollars on cloud annually, with data processing accounting for a big chunk of it.
I first heard about Sync from a friend, who claimed to know of a company that helps run data workloads up to 90% faster while also drastically reducing cost. Unlike other tools on the market, it lets engineers choose how much to optimize for cost versus performance. Even better, an engineer could deploy the solution with a simple copy and paste, and then see the value almost instantly. It seemed too good to be true. I had to meet the team behind the company.
That’s how I got to know Jeff and Suraj early this year. The idea for Sync spun out of their postdoctoral research at MIT Lincoln Laboratory, where they worked on distributed computing, high-performance computing, and quantum computing, among other deep-tech problems. It was no surprise that they would be the founders behind such a powerful optimization tool.
They started Sync with a very potent optimization algorithm but without a clear sense of their markets. After testing several markets, they found incredible pull from companies doing large-scale data processing, particularly with Spark. Think major enterprises like Disney and Duolingo that have massive audiences and user bases and need to process vast amounts of clickstream and product data to deliver their services.
The team launched a self-service Autotuner tool that gives engineers a sense of the product’s potential even before they become customers. The reception has already been incredible, with engineers learning about Sync through word of mouth on Reddit and Slack. Without bothering their internal platform teams, any engineer can try Sync on Spark jobs in a matter of minutes and see serious improvements in performance and cost.
Prior to using Sync, engineers had to configure a large set of Spark parameters and tune them so they would all work together effectively. This was almost always a guessing game. While researching Sync, I heard all sorts of nightmarish anecdotes, like the one about an engineer who inherited Spark workloads from a colleague who had recently left the company and was too scared to touch the configurations, fearing not just degraded performance and higher cost but an outright broken Spark job. That’s insane.
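To make that guessing game concrete, here is a sketch of the kind of hand-tuning a Spark engineer typically faces. These are standard Apache Spark configuration properties, but the specific values and the job script name are purely illustrative, not taken from Sync or any customer:

```shell
# Illustrative hand-tuned spark-submit invocation (values are hypothetical).
# Each setting interacts with the others and with the cluster's instance
# types, so changing one in isolation can quietly degrade cost or runtime.
spark-submit \
  --conf spark.executor.instances=12 \
  --conf spark.executor.cores=4 \
  --conf spark.executor.memory=14g \
  --conf spark.driver.memory=8g \
  --conf spark.sql.shuffle.partitions=400 \
  --conf spark.memory.fraction=0.6 \
  --conf spark.dynamicAllocation.enabled=false \
  my_etl_job.py
```

Multiply this by dozens of recurring jobs, each with its own data volumes and deadlines, and it is easy to see why engineers leave inherited configurations untouched.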
It’s no surprise that teams have been asking for Sync’s API product. Teams want Sync to configure and kick off all of their important recurring Spark jobs. It’s also worth noting that Sync’s technology itself is agnostic to specific distributed data processing frameworks.
Above all, I have been so impressed with Jeff and Suraj. They went from researchers with no background in enterprise software to building a product that resonates deeply with data engineers, engineering leaders and CIOs, particularly in a market where enterprises are looking to rein in profligate cloud spend. Many of the organizations with the largest cloud spend in the world are already seeing incredible results even before the public launch of Sync’s first commercial product. Duolingo saw significant cost reductions for jobs that processed terabytes of data, with no degradation of performance. Other companies, like Disney, are seeing frequently run jobs complete much more quickly and at a reduced cost. These improvements are not small: We are talking about speed gains of 40-70% and cost savings of 30-55%.
We are thrilled to partner with Jeff, Suraj and the Sync team as they march towards their vision of being the infrastructure that provisions cloud compute and improves the developer experience for companies with large distributed data workloads. Today’s launch of the API for the Apache Spark Autotuner marks an important step towards this vision.