Huawei has announced a new open source project called Astro at the O’Reilly Open Source Convention in Portland, Oregon.
Astro combines HBase and Apache Spark into a single solution, which Huawei has embedded in its big data product, FusionInsight. Huawei has also announced that it will launch Spark as a Service later this year.
This announcement shows how quickly Spark is gaining traction and beginning to make inroads into the Hadoop market. Over the last month, IBM and Hortonworks have both announced their support for Spark, as have many other members of the Hadoop community.
Why is Huawei getting into Spark?
Despite the perception of Huawei as a telecoms supplier that has only recently moved into enterprise computing, it has a long history of working with and contributing to open source software projects. This has saved the company from having to spend billions developing its own enterprise software business and has allowed it to collaborate with other software companies.
According to Wang Chenglu, President of Huawei Central Software: “Open source is Huawei’s company-wide strategy, from core business networking, where the firm helped drive the network openness as founding member of OPNFV, to new business frontiers cloud computing and IoT, where we open sourced LiteOS, the world’s most lightweight IoT Operating System to standardize and simplify the infrastructure and enhance the IoT connectivity.”
Huawei’s relationship with Spark is also long-standing: it was an early sponsor in 2011. Since then it has been a regular contributor to Spark releases, especially Spark 1.2 and Spark 1.3.
In June, Bing Xiao, Head of Big Data, Huawei USA and Sr. Director, Product Mgmt & Strategy, Huawei Software, wrote a blog post for Databricks, which is developing its own commercial Spark distribution. In it, he explained how Huawei was using Spark.
“With Spark, the raw data from multi-systems of multi-vendors (e.g., CRM, billing, OSS and network) can be easily loaded into a single data processing layer. Data scientists and data engineers can also use Spark SQL to explore the data, extract and group features, and develop models by leveraging MLlib algorithms.
“Application developers can leverage the output of these models or features to build specific applications (e.g. base station investment optimization), and publish dashboards or reports for subscriber profiling and network monitoring. Finally, business users can use Spark SQL for ad-hoc query, or continue to use existing BI systems or tools like SAS, R or Python with Spark’s powerful APIs.”
What is Huawei promising with Astro?
In the press release, Huawei is promising a range of new features that include:
- Improved data pruning
- Custom filters
- Support for Spark 1.4
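The press release does not describe how Astro's pruning works. As a general illustration of what data pruning means in a range-partitioned store such as HBase, where a table is split into regions by row-key range, the sketch below skips every region whose key range cannot overlap the range a query asks for. It is plain Python rather than Astro or Spark code, and the `Region` type, the `prune` function and the sample keys are all hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """Hypothetical stand-in for one key-range partition of a table."""
    name: str
    start_key: str  # inclusive lower bound
    end_key: str    # exclusive upper bound


def prune(regions: List[Region], lo: str, hi: str) -> List[Region]:
    """Keep only regions whose [start_key, end_key) overlaps the query range [lo, hi)."""
    return [r for r in regions if r.start_key < hi and lo < r.end_key]


# Four made-up regions covering the key space "a" .. "zz".
regions = [
    Region("r1", "a", "g"),
    Region("r2", "g", "n"),
    Region("r3", "n", "t"),
    Region("r4", "t", "zz"),
]

# A query for row keys in ["h", "p") only needs to read r2 and r3.
selected = prune(regions, "h", "p")
selected_names = [r.name for r in selected]
```

The point of pruning is that the two regions left out are never scanned at all, so the saving grows with the size of the table rather than with the size of the result.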
In addition to these features, Huawei says that it has contributed new features to Spark SQL, Machine Learning and SparkR to help enrich the standard libraries in Spark.
What is attracting people to Spark?
One of the challenges of working with very large data sets is the way that data is used by the query tools. The vast majority take a sequential processing approach, which means that queries have to be queued while they wait for data to become available.
The only way around this is for users to duplicate large data sets, which requires a lot of network, storage and processing time, all of which costs money. For cloud providers this is good news because they get to sell more resources to customers, but it does mean that customers can quickly exceed their budgets for complex data analysis.
This is where Spark comes in. It models each job as a Directed Acyclic Graph (DAG) of operations, which lets it run multi-step pipelines. For customers using in-memory databases, where they attempt to preload all the data into memory, Spark allows multiple DAGs to share that data. One benefit is that query results can be held in memory and passed immediately to other queries involved in a complex analysis.
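The idea of a multi-step pipeline whose intermediate results stay in memory can be sketched in a few lines of plain Python. This is a toy illustration of the concept, not Spark's API: the `Node` class, its methods and the sample data are hypothetical. Unlike this toy, which always keeps a result once computed, Spark only holds a result in memory when the program asks it to.

```python
from typing import Callable, List, Optional


class Node:
    """One step in a toy DAG. Nothing is computed until collect() is called."""

    def __init__(self, compute: Callable[[], List[int]], name: str):
        self._compute = compute
        self.name = name
        self._cached: Optional[List[int]] = None
        self.runs = 0  # how many times this step was actually executed

    def map(self, fn: Callable[[int], int], name: str) -> "Node":
        # The new node depends on this one; the edge is the call to self.collect().
        return Node(lambda: [fn(x) for x in self.collect()], name)

    def filter(self, pred: Callable[[int], bool], name: str) -> "Node":
        return Node(lambda: [x for x in self.collect() if pred(x)], name)

    def collect(self) -> List[int]:
        # Compute on first use, then serve the in-memory copy to every later caller.
        if self._cached is None:
            self.runs += 1
            self._cached = self._compute()
        return self._cached


# Two queries ("evens" and "large") branch off the same intermediate step.
source = Node(lambda: list(range(10)), "source")
squared = source.map(lambda x: x * x, "squared")
evens = squared.filter(lambda x: x % 2 == 0, "evens")
large = squared.filter(lambda x: x > 20, "large")

even_result = evens.collect()
large_result = large.collect()
```

Both queries read `squared`, but its `runs` counter stays at 1: the second query picks up the result the first one left in memory instead of recomputing it or rereading the source.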
Huawei already deploying to customers
Huawei is already claiming wins with some of its existing customers. China Unicom has deployed Spark in its mission-critical business areas to support real-time query and analysis over multiple data sources. It is also claiming success with an unnamed telco in South America where it has replaced existing analytics and BI solutions. The press release states that the customer is now achieving a 10x performance increase in customer insights.
It will be interesting to see when Huawei begins to add Spark to its SAP HANA appliances. Over the last couple of years, Huawei has set several benchmark records with those appliances. In 2014, SAP announced its own Spark distribution, working in partnership with Databricks. With both the Huawei and SAP distributions being based on the Databricks Spark distribution, it makes sense for Huawei to embed its own version of Spark in its SAP appliance.
Coming on the back of the IBM, Hortonworks and other Spark announcements, this is good timing by Huawei and excellent news for the supporters of Spark. If the last two years have been all about Hadoop and MapReduce, this year is definitely the year Spark lit a fire under data analytics.