Next week's HotCloud conference on cloud computing in San Diego will boast a slew of fresh research into this hottest of IT topics. Here's a glimpse at the work to be showcased at the event (PDFs of some research papers will not be available until the week of June 15 at the HotCloud site):
* Nebulas
Researchers from the University of Minnesota have outlined a way to use “distributed voluntary resources — those donated by end-user hosts — to form nebulas” that would potentially complement today's managed clouds from companies such as Amazon, IBM and Google. Nebulas could address the needs of service classes that more traditional clouds could not, providing more scalability, more geographical dispersion of nodes and lower cost, the researchers say. Possible users would include those rolling out experimental cloud services and those looking to offer free public services or applications.
Unlike famed volunteer-based computing resources such as SETI@home (now BOINC), nebulas would need to support more complex tasks. Challenges needing to be addressed would include managing highly distributed data and computational resources and coping with failures. “We believe that nebulas can exist as complementary infrastructures to clouds, and can even serve as a transition pathway for many services that would eventually be hosted on clouds,” the researchers write in a paper titled “Nebulas: Using Distributed Voluntary Resources to Build Clouds.”
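The paper doesn't spell out an implementation, but the core idea, replicating work across unreliable donated hosts and re-dispatching it when a volunteer disappears, can be sketched in a few lines of Python. The node names, failure rates and dispatch logic below are made up for illustration and aren't from the paper:

```python
import random

# Hypothetical sketch (not from the paper): a nebula-style coordinator
# replicates each task across several donated hosts and re-dispatches
# work when volunteers drop offline mid-task.

class VolunteerNode:
    def __init__(self, node_id, failure_rate=0.3):
        self.node_id = node_id
        self.failure_rate = failure_rate  # donated hosts come and go

    def run(self, task):
        if random.random() < self.failure_rate:
            return None  # host went offline before finishing
        return f"{task} done on node {self.node_id}"

def dispatch(task, nodes, replicas=3, max_rounds=10):
    """Hand the same task to several volunteers; accept the first result,
    and pick a fresh set of hosts if every replica fails."""
    for _ in range(max_rounds):
        for node in random.sample(nodes, min(replicas, len(nodes))):
            result = node.run(task)
            if result is not None:
                return result
    raise RuntimeError(f"{task} failed on every volunteer tried")

if __name__ == "__main__":
    pool = [VolunteerNode(i) for i in range(20)]
    for chunk in ("chunk-0", "chunk-1", "chunk-2"):
        print(dispatch(chunk, pool))
```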
University of Minnesota researchers have a separate project dubbed “Virtual Putty” that sounds intriguing as well. It focuses on the reshaping of virtual machine footprints to satisfy user needs and ease VM management for resource providers.
* CloudViews
Amidst the hype surrounding cloud computing, security issues are often raised, such as those that arise when multiple customers' data and applications share the same cloud resources. But researchers at the University of Washington also see lots of opportunity in the fact that Web services and applications will be so closely situated. CloudViews is a Hadoop HBase-supported common storage system being developed by the researchers “to facilitate collaboration through protected inter-service data sharing.” The researchers say in a paper called “CloudViews: Communal Data Sharing in Public Clouds” that public cloud providers must facilitate such collaboration — in the form of data-driven, server-side mashups — to ensure the market's growth through development of new Web services.
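To get a feel for what protected inter-service sharing means in practice, here is a toy Python sketch of the idea: one service stores data, explicitly grants a co-located service read access, and the shared store enforces that grant before handing data over. The classes and method names are illustrative only; the actual CloudViews prototype sits on top of Hadoop HBase.

```python
# Hypothetical sketch of protected inter-service data sharing in the
# spirit of CloudViews (names and API are illustrative, not the actual
# system, which is built on Hadoop HBase).

class SharedStore:
    def __init__(self):
        self._data = {}    # (owner, dataset) -> rows
        self._grants = {}  # (owner, dataset) -> services allowed to read

    def put(self, owner, dataset, rows):
        self._data[(owner, dataset)] = rows
        self._grants.setdefault((owner, dataset), {owner})

    def grant_read(self, owner, dataset, other_service):
        """Owner explicitly shares a dataset with another co-located service."""
        self._grants[(owner, dataset)].add(other_service)

    def read(self, requester, owner, dataset):
        if requester not in self._grants.get((owner, dataset), set()):
            raise PermissionError(f"{requester} may not read {owner}/{dataset}")
        return self._data[(owner, dataset)]

store = SharedStore()
store.put("photo-service", "geotags", [{"photo": 1, "lat": 47.6, "lon": -122.3}])
store.grant_read("photo-service", "geotags", "map-mashup")

# A server-side mashup reads the shared data directly inside the cloud,
# with no copy shipped over the public Internet.
print(store.read("map-mashup", "photo-service", "geotags"))
```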
* Trusted Cloud Computing Platform
Researchers at the Max Planck Institute for Software Systems have outlined a Trusted Cloud Computing Platform that “enables Infrastructure as a Service (IaaS) providers such as Amazon EC2 to provide a closed box execution environment that guarantees confidential execution of guest virtual machines.” Such a platform would assure customers that service providers haven't been messing with their data and would enable service providers to secure data even across many VMs. The researchers, in a paper titled “Towards Trusted Cloud Computing,” acknowledge that details of how cloud providers set up their data centers are held pretty close to the vest, but base their system on an open source offering called Eucalyptus that they suspect is similar to at least some commercial implementations. A prototype based on the design is this research team's next step.
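As a rough illustration of the closed-box idea (not the paper's actual protocol, which relies on a trusted coordinator and TPM-backed attestation), the following Python sketch refuses to launch a guest VM on any node whose software stack doesn't hash to a known-good measurement:

```python
import hashlib

# Hypothetical, greatly simplified sketch of the trusted-launch idea:
# release a guest VM only to nodes whose software stack hashes to a
# known-good measurement. The real design uses TPM-backed attestation
# and a trusted coordinator; the values here are made up.

TRUSTED_MEASUREMENTS = {
    hashlib.sha256(b"hypervisor-v1.2+secure-vmm").hexdigest(),
}

def measure(node_software: bytes) -> str:
    """Stand-in for a hardware-backed measurement of the node's stack."""
    return hashlib.sha256(node_software).hexdigest()

def launch_vm(node_software: bytes, vm_image: bytes) -> str:
    if measure(node_software) not in TRUSTED_MEASUREMENTS:
        raise RuntimeError("node failed attestation; refusing to launch VM")
    return f"VM launched ({len(vm_image)} bytes) on attested node"

print(launch_vm(b"hypervisor-v1.2+secure-vmm", b"guest-image"))
```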
* Private Virtual Infrastructure (PVI) and Locator Bot
Also addressing the security and confidentiality issues surrounding cloud computing is University of Maryland, Baltimore County researcher John Krautheim. His proposal is aimed at better sharing the risk responsibility between the cloud provider and customer, giving the customer much more control than is typically the case. “A method of combining the requirements of the user and provider is to let the clients control the security posture of their applications and virtual machines while letting the service provider control the security of the fabric. This provides a symbiotic security stance that can be very powerful provided both parties hold up their end of the agreement,” Krautheim writes. He adds that this setup calls for big-time trust on both sides (including support for a virtualized Trusted Platform Module, or TPM, for storing cryptographic keys), since they'll need to share security information between themselves and possibly with others. Components of this approach will include having a method for shutting down VMs if necessary and monitoring/auditing from within and outside the PVI, Krautheim writes in a paper titled “Private Virtual Infrastructure for Cloud Computing.”
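Here is a loose Python sketch of the customer's side of that bargain: the client audits its own VMs against its own security policy and shuts down any that fall out of compliance, while trusting the provider to secure the fabric underneath. The names and policy checks are illustrative, not Krautheim's design.

```python
# Hypothetical sketch of the client-side half of a Private Virtual
# Infrastructure: the customer monitors its own VMs against its own
# policy (measurements that might come from a virtual TPM) and can shut
# a VM down, while the provider secures the underlying fabric.

from dataclasses import dataclass

@dataclass
class VM:
    name: str
    measurement: str   # e.g. a value reported by a virtualized TPM
    running: bool = True

    def shut_down(self):
        self.running = False

CLIENT_POLICY = {"approved-appliance-v3"}  # measurements the customer trusts

def audit(vms):
    """Client-side monitor: kill any VM whose measurement violates policy."""
    for vm in vms:
        if vm.running and vm.measurement not in CLIENT_POLICY:
            print(f"shutting down {vm.name}: untrusted measurement")
            vm.shut_down()

fleet = [VM("web-1", "approved-appliance-v3"),
         VM("web-2", "tampered-image")]
audit(fleet)
print([(vm.name, vm.running) for vm in fleet])
```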
* Trading storage for computation
One way to make cloud computing more efficient and cost effective might involve rethinking the way data is stored. Researchers at the University of California, Santa Cruz, NetApp and Pergamum Systems are looking at the trade-offs between storing data, including data that might not be called on that often, and simply recalculating results as needed. In a paper titled “Maximizing Efficiency By Trading Storage for Computation” the researchers write: “Recomputation as a replacement for storage fits well into the holistic model of computing described by the cloud architecture. With its dynamically scalable, and virtualized architecture, cloud computing aims to abstract away the details of underlying infrastructure. In both public and private clouds, the user is encouraged to think in terms of services, not structure.”
Determining the best way to store and retrieve data requires a cost-benefit analysis based on insights from both the cloud operator and the data user because “neither has a completely informed view,” the researchers write. They argue that the nature of cloud computing, with its dynamically allocated computing resources, could lend itself to storing information about the whereabouts and origins of data and then just recomputing results as needed. But they acknowledge that this sort of system would require forecasting where prices are headed and figuring out things such as the cost of not being able to immediately access data.
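For a sense of how that cost-benefit analysis might look, here is a back-of-the-envelope Python sketch with made-up prices: it compares the recurring cost of keeping a derived dataset in storage against the cost of recomputing it each time it's actually needed, including a rough penalty for the delay of not having it on hand.

```python
# Illustrative arithmetic only: all prices and workload numbers are
# placeholders, not figures from the paper.

def storage_cost(gb, months, price_per_gb_month=0.10):
    """Recurring cost of simply keeping the derived result around."""
    return gb * months * price_per_gb_month

def recompute_cost(cpu_hours, accesses, price_per_cpu_hour=0.20,
                   delay_penalty_per_access=0.05):
    """Cost of regenerating the result on each access, plus a rough
    penalty for not being able to read it immediately."""
    return accesses * (cpu_hours * price_per_cpu_hour + delay_penalty_per_access)

gb, months, cpu_hours = 500, 12, 50.0
for accesses in (1, 10, 100):
    keep = storage_cost(gb, months)
    redo = recompute_cost(cpu_hours, accesses)
    choice = "store" if keep < redo else "recompute"
    print(f"{accesses:3d} accesses/yr: store=${keep:.2f}  recompute=${redo:.2f} -> {choice}")
```

The crossover point moves with access frequency: rarely used results are cheaper to throw away and regenerate, while frequently read ones justify the storage bill, which is exactly the kind of dynamic decision the researchers suggest a cloud is well placed to make.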