Cloud Research at Harvard

High Performance Cloud Computing

Our research at Harvard addresses the challenges of cloud systems for the efficient and scalable execution of large-scale compute- and data-intensive applications. Our research agenda is driven by the critical role that large-scale computing and data processing will play in tomorrow’s society, research and business environments. Data is growing faster than ever before: more data has been created in the past two years than in all of previous history. By the year 2020, our accumulated data will grow from 4.4 zettabytes today to around 44 zettabytes, there will be over 6.1 billion smartphone users globally, and there will be over 50 billion connected IoT devices in the world, all developed to collect and share data. Our research group is extending the state of the art to achieve highly capable, robust, scalable cloud systems that are well suited to serving the infrastructure needs of large-scale computing and data processing in an increasingly distributed world.

Our main research interests within the area of high-performance cloud computing include:

Distributed and disaggregated cloud systems 

No matter how powerful individual computers, computing clusters or cloud providers become, there are still reasons to harness the power of multiple systems to meet the security, performance or cost needs of modern computing and data processing applications. One example is the use of hybrid cloud environments for the cost-effective execution of applications with variable workload patterns. Another example is the use of multiple cloud systems spread across large geographic areas for the execution of highly responsive services, which require low latency, or edge computing analytics, which require data to be processed where it is produced.

Data-centric cloud architectures 

Processing very large volumes of data to extract insights in real time presents new challenges and opportunities for future cloud infrastructures. Addressing these needs requires a redesign of future cloud architectures and of how they interoperate to create distributed environments. This means a paradigm shift in cloud design from computation to data exploration, one that requires infrastructures to be data-centric so that they minimize data movement by bringing compute to the data.