Healing and Self-Repair in Large-Scale Distributed Computing Systems


This project will focus on developing fault-tolerance mechanisms that allow distributed systems to keep operating under a range of operating conditions.


Professor Albert Y. Zomaya

Research Location

Computer Science




As distributed systems grow more complex over time, there will be a need to endow them with capabilities that allow them to keep operating in disaster scenarios. What makes this problem especially hard is the heterogeneous nature of today’s distributed computing environments, which may be made up of hundreds or thousands of components (computers, databases, etc.). In addition, a user in one location may have no control over other parts of the system. There is therefore a need for “smart” algorithms (protocols) that can achieve an acceptable level of fault tolerance and account for a variety of disaster-recovery scenarios.



Parallel systems, distributed systems, internet-scale computing systems, distributed computing, complex systems, optimization, ICT

Opportunity ID

The opportunity ID for this research opportunity is: 978
