Big data for small business
There is a widespread view that data – information represented or coded in a way that makes it usable – is the ‘fuel’ driving the fourth industrial revolution. Like the fossil fuels of earlier industrial eras, data-fuel has to be processed before it can be converted into energy for productive activities. Converting oil or coal into usable energy requires large-scale infrastructure and is therefore expensive, and the same is still largely true of renewable energy sources such as wind and solar power, certainly on an industrial scale. Moreover, as its popular sobriquet ‘big data’ suggests, the quantities of data involved in Data-Driven Innovation (DDI) are huge. Even if this data is generated at relatively low cost, for instance by vast numbers of people using social media sites, the analytical capability needed to turn it into DDI-energy is costly.
The logical question, therefore, is the one posed by Chancellor’s Fellow Yang Cao and his colleague Wenfei Fan: “Is big data analytics a privilege of big companies?” The corollary might be that big data analytics is “beyond the reach of small companies” that lack the necessary resources, notably large-scale parallel computer processing. This could hamper the spread and sustainability of DDI, but Dr Cao and his colleagues argue that big data analytics is not necessarily too expensive for small businesses. They propose a solution in the shape of BEAS, a query evaluation paradigm that offers “a framework for querying big relations with constrained resources, based on bounded evaluation and data-driven approximation”. BEAS builds on a series of papers published at top database conferences, including SIGMOD, PODS and VLDB, and in journals such as ACM TODS.
Yang Cao received his PhD from the University of Edinburgh in 2016 and is now based at the Bayes Centre, in the University’s School of Informatics, where he works on database systems and on fundamental methods for querying, integrating and cleaning large-scale datasets in order to extract useful knowledge from big data. His current research interests include database systems and applied database theory, especially data-driven query optimization and processing, the area in which BEAS was researched and developed. Dr Cao has received numerous awards for his work over the past decade, starting with first place in the China Undergraduate Mathematical Contest in Modelling in 2008, for which he also received the Gao-Jiao-She Cup. More recently, he won the Best Paper Award and the Research Highlight Award from the Association for Computing Machinery’s Special Interest Group on Management of Data (ACM-SIGMOD) in consecutive years.