Research_

Algorithmics and analytics

Leveraging distributed techniques to untold scale computations

Algorithms that solve popular problems effectively and efficiently are the main pedestal for the theoretical foundations of distributed computing. We focus on problems in scheduling, bin-packing, assignment and network flow.

Most practical problems in distributed computing can be reduced to well-defined problems in algorithms. In particular, we study algorithms in the areas of scheduling problems, bin-packing problems, assignment problems and network flow problems. We investigate, for example, approximation algorithms that give a quality bound on the solution (where the computed schedule is no worse than 50 percent of the optimal schedule). We use techniques such as linear programming, probabilistic algorithms, convex optimisation and other methods to get a handle on these algorithmic problems.

We also look at data mining and big data analytics where there is a need for techniques to handle the massive data sets and be able to scale statistical inference to large and high dimensional data sets. Towards this end, we leverage techniques from distributed computing to scale computations to previously unimaginable levels.

**Our experts:** Professor Albert Zomaya

**Our partners**:* *Professor Paulo Pires and Professor Flavia Delicato (Federal University of Rio de Janeiro)

The proliferation of digital data sources is generating an unprecedented amount of data daily. This massive volume of data has the potential to generate new opportunities for the understanding of natural, engineering and human systems and can lead to extraordinary new knowledge and discoveries in many fields of application. However, analysing this massive volume of data with exponential growth is not a trivial task. Since its birth in the 1980s the rough sets theory has proven to be a useful tool for data analysis, and a wide variety of rough sets algorithms were implemented in scientific and commercial tools for data processing; however, they are very time consuming when data sets are in a large scale.

This project aims to investigate how to leverage rough sets calculations through field programmable gate arrays (FPGAs) and graphics processing units (GPUs), to support massive data-intensive applications.

**Our experts:** Professor Albert Zomaya,* *Professor Mikhail Prokopenko, Dr Mahendra Piraveenan

Security concerns are increasingly at the forefront in today’s world. A complex network’s topology plays an important role in determining how the network can resist random and targeted attacks. It has been shown by prominent researchers that the so-called scale-free networks display high resilience to random attacks, and this is perhaps partly the reason for most of the ‘real-world’ networks in any domain being scale free. Yet scale-free networks are vulnerable to targeted attacks. It is possible to design network topologies so that they display high levels of tolerance to both random and targeted attacks. However, quantifying the robustness and attack tolerance of a network topology is necessary to achieve this goal.

This project will attempt to define robustness measures for complex networks, which are particularly useful in sustained attack scenarios, and test their effectiveness in various domains, including computer networks, the Internet and World Wide Web, social networks and biological networks inside organisms.

**Our experts:** Professor Albert Zomaya, Dr Mohammadreza Hoseinyfarahabady

Unlike traditional NoSQL databases that use the Linux/OS file system that was built for rotational drives, modern NoSQL engines (such as Aerospike) bypass the native OS filesystem and have implemented their own systems to access solid-state drives directly. This capability makes it great for achieving high throughput for both reads and writes. Past experiments showed that it can easily hit one million writes per second with only 50 servers. On the other hand, the Amazon EC2 Instance Store provides a unique opportunity for a user to lease a temporary instance that the SSD storage attaches to the host computer physically. This means that now you can lease a powerful i2.xlarge computing instance with more than 60 cores and an SSD storage attached to the computing engine with very low latency, for only ~$3/hour in a PAYG fashion.

In this project our aim is to propose a cost-effective solution for using the above-mentioned promising technologies to handle the burst of read/write traffics coming into a system with big-data attributes.

**Our experts:** Professor Albert Zomaya

**Our partner:** Associate Professor Javid Taheri (University Karlstad, Sweden)

Optimisation algorithms can be used to solve a wide range of problems that arise in the design and operation of distributed computing environments (for example, data mining, scheduling and routing). However, the many classical optimisation techniques (for example, linear programming) are not suited for solving parallel processing problems due to their restricted nature.

This project is investigating the application of some new and unorthodox optimisation techniques such as fuzzy logic, genetic algorithms, deep learning and others. However, these techniques are computationally intensive and require enormous computing time. Distributed computing has the potential to reduce the computational load and enable the efficient use of these techniques to solve a wide variety of problems.

**Our expert:** Dr Dong Yuan

**Our industry partner:** NASDAQ Sydney

Market manipulation is a deliberate attempt to interfere with fair trading in stock markets. Circular trading collusion is one method of market manipulation, and involves two or more traders to trade large volumes between themselves with a small net change of stock ownership. It becomes very hard to detect when the collusion has large groups of participants.

This project will investigate and create an algorithm to detect n-party circular trading in the stock market with a very large number of traders. We use graph theory to formulate the problem and design parallel algorithm and efficiently process the graph, to find possible collusions.