Federated Learning: what and why?

Why did federated learning emerge?

Research on artificial intelligence (AI), machine learning (ML), and deep learning (DL) has led to disruptive innovations in many fields within a few decades. The value of AI applications and real-world data is unquestionable nowadays. However, one real and difficult challenge lies in how that data is handled. To be robust and performant, modern ML/DL models need to learn from sufficiently large, curated datasets. At the same time, since the GDPR (General Data Protection Regulation) went into effect, existing machine learning approaches that centralize data from multiple locations raise concerns about user data privacy and data protection. This is where federated learning comes into view.

For these reasons, the ability to train machine learning models across multiple locations without moving the data has become a critical and necessary technology. To solve this problem, Google AI first introduced a new methodology called “Federated Learning (FL)” in 2017[1]. In just a few years, this machine learning technique has taken significant strides forward and is now widely applied across industries, especially in healthcare.

Now, let’s explore what federated learning is!

Data does not move, only algorithms travel!

Traditional machine learning often relies on a central server that collects the data and hosts the trained model used to make predictions. Federated learning (FL), in contrast, is an alternative approach that trains machine learning algorithms on private, fragmented data stored across a variety of devices, clients, or servers. The machine learning process (i.e., training the algorithm) occurs locally at each participating device, and only model characteristics (e.g., weights, parameters, or gradients) travel. This enables people to build a consensus model collaboratively without moving data beyond the device on which it resides.

In other words, the data pipeline of FL differs from that of traditional machine learning. First, each device downloads the current model from the central server and computes an updated model using its own local data. These locally trained models (i.e., parameter weights, without any local data) are then sent from the devices back to the central server, where they are combined. Finally, a single consolidated and improved global model is sent back to the local devices. This procedure may be repeated several times to optimize the model and keep it fitted to up-to-date data.
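
To make the server-side step above concrete, here is a minimal sketch of how locally trained parameters could be combined. It assumes the server takes an average weighted by each device's sample count (commonly called federated averaging); the article does not prescribe a specific rule, and the function name `aggregate` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def aggregate(client_weights: Sequence[Sequence[float]],
              client_sizes: Sequence[int]) -> List[float]:
    """Server step: average client parameter vectors, weighted by sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    merged = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of the same length")
        for i, w in enumerate(weights):
            merged[i] += w * n / total
    return merged


# Three hypothetical devices return locally trained parameters (no raw data).
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [10, 30, 60]
global_model = aggregate(updates, sizes)
print(global_model)
```

Devices with more samples pull the global model further toward their parameters. Whether that weighting is appropriate depends on the application, so treat it as one option rather than the rule.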

Figure: A comparison between the traditional machine learning approach and federated learning

Google provides a mobile phone example to describe how FL works:

It works like this: your device downloads the current model, improves it by learning from data on your phone, and then summarizes the changes as a small focused update. Only this update to the model is sent to the cloud, using encrypted communication, where it is immediately averaged with other user updates to improve the shared model. All the training data remains on your device, and no individual updates are stored in the cloud.

If you would like to learn more about this example, see the Google AI blog post or the short animated video on YouTube[1][2].
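
The phone example can be sketched in a few lines. This is only an illustration under stated assumptions: each device sends the difference between its locally trained model and the downloaded one, and the server folds every update into a running sum and then discards it. The class `RunningAverage`, the helper `make_update`, the unweighted mean, and all numbers are hypothetical, and encrypted communication is not modeled.

```python
from typing import List, Sequence


class RunningAverage:
    """Keeps only a running sum and a count, never the individual updates."""

    def __init__(self, dim: int) -> None:
        self.total = [0.0] * dim
        self.count = 0

    def add(self, update: Sequence[float]) -> None:
        if len(update) != len(self.total):
            raise ValueError("update has the wrong length")
        for i, value in enumerate(update):
            self.total[i] += value
        self.count += 1

    def mean(self) -> List[float]:
        if self.count == 0:
            raise ValueError("no updates received")
        return [s / self.count for s in self.total]


def make_update(global_model: Sequence[float],
                local_model: Sequence[float]) -> List[float]:
    """Device side: summarize local training as the change to the model."""
    return [l - g for l, g in zip(local_model, global_model)]


global_model = [0.0, 1.0]
# Hypothetical models after on-device training; the training data stays on the device.
local_models = [[0.5, 1.0], [1.0, 2.0], [0.0, 3.0]]

averager = RunningAverage(dim=len(global_model))
for local in local_models:
    averager.add(make_update(global_model, local))  # folded in, then discarded

new_global = [g + d for g, d in zip(global_model, averager.mean())]
print(new_global)  # [0.5, 2.0]
```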

A typical example of federated learning

An FL workflow can be realized with various topologies and compute plans. Here is the most common one for industrial applications: the aggregation server approach.

Figure: Federated learning with an aggregation server

The aggregation server approach is depicted in the figure above. This typical FL workflow consists of a federation of training nodes, each of which first receives the global model. Each node trains the model on its local data, intermittently submits its partially trained model to a central server for aggregation, and then continues training on the consensus model that the server returns. Note that the trained algorithm parameters are pooled here, not the data. The server aggregates the contributions (algorithm parameters) from the training nodes into a new, compound model. This compound model is shipped back to each participant (node) for further training on more local data and is then returned to the central server for further optimization. Eventually, the algorithms on all local participants converge to a single optimized, trained algorithm. Throughout the FL procedure, only algorithm parameters travel; local data does not.
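
The repeated train-and-aggregate loop described above can be simulated on one machine. The sketch below is a toy under stated assumptions, not a production setup: the model is a single parameter `w` in `y = w * x`, each node runs a few gradient-descent steps, and the server averages the returned parameters weighted by sample count. The function names, learning rate, and datasets are all hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def local_train(w: float, data: List[Sample],
                lr: float = 0.05, steps: int = 5) -> float:
    """Node side: a few gradient-descent steps on mean squared error for y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def run_federation(nodes: List[List[Sample]], rounds: int = 20) -> float:
    """Server side: broadcast the model, collect node parameters, average, repeat."""
    global_w = 0.0
    total = sum(len(data) for data in nodes)
    for _ in range(rounds):
        # Each node starts from the current global model; only parameters come back.
        local_ws = [local_train(global_w, data) for data in nodes]
        global_w = sum(w * len(data) for w, data in zip(local_ws, nodes)) / total
    return global_w


# Hypothetical private datasets, all generated from y = 3x; they never leave their node.
nodes = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
    [(3.0, 9.0)],
]
print(round(run_federation(nodes), 3))
```

With these toy settings the printed value should be very close to 3, the slope that generated every node's data, even though no node ever shared a sample.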

FL implicitly offers a certain degree of privacy, because FL participants never directly access data from other participants and only receive model parameters that have been aggregated over several participants. In other words, instead of pooling data from local devices on a central server, FL lets all participating devices train the same ML algorithm on their own proprietary data. In an FL workflow with an aggregation server, the participants can even remain unknown to each other.

Advantages of federated learning

  • Devices are able to train and learn a shared model collaboratively without transferring the local data.
  • Model training is moved to the edge, for example to personal devices or to organizations that must operate under strict privacy constraints, such as medical centers. Keeping personal data local is a strong advantage now that the GDPR is in effect.
  • Because FL does not require raw local data to be transmitted, the time lag for model updates can be reduced, which makes real-time prediction possible.
  • Provided that privacy is ensured, federated learning allows faster deployment and testing of smarter models, with lower latency.

Challenges for federated learning

  • The communication process needs to be secured, as it is a critical procedure in FL networks. Training and updating a model with participants’ local data also calls for communication-efficient methods that reduce the total number of communication rounds and the time they cost.
  • The data on each node must be labeled in the same fashion so that it can be compared under the same criteria: apples to apples, and bananas to bananas.
  • FL needs to handle data heterogeneity carefully. Non-uniform data distributions across participants are a real challenge for FL algorithms, and they can lead to a situation in which the globally optimal solution is not optimal for an individual participant.
  • There are many ways to aggregate model updates in FL networks, and no FL topology (the communication architecture of a federation) is dominant for industrial applications. This critical design question is still under research.
  • FL still faces privacy concerns. The communication needed to update models might nonetheless reveal sensitive information, either to a third party or to the central server.
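
The data heterogeneity point in the list above can be seen in a tiny example. Assume a deliberately simple model that predicts one constant for every sample and is scored by mean squared error, so the best constant for any dataset is its mean. The two sites and their values are hypothetical.

```python
from typing import List


def mse(c: float, values: List[float]) -> float:
    """Mean squared error of predicting the constant c for every value."""
    return sum((v - c) ** 2 for v in values) / len(values)


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Two hypothetical participants whose local data follow different distributions.
site_a = [1.0, 2.0, 3.0]      # centred near 2
site_b = [9.0, 10.0, 11.0]    # centred near 10

local_best_a = mean(site_a)            # minimizes site A's own loss
local_best_b = mean(site_b)            # minimizes site B's own loss
global_best = mean(site_a + site_b)    # minimizes the loss over all data combined

for name, data, local_best in [("A", site_a, local_best_a),
                               ("B", site_b, local_best_b)]:
    print(name,
          "loss with local optimum:", round(mse(local_best, data), 3),
          "| loss with global optimum:", round(mse(global_best, data), 3))
```

The global optimum here is 6, which fits neither site well: each site's loss is far higher with the shared constant than with its own local optimum.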

Federated learning applications

  • FL is now widely used in the healthcare industry, where data privacy is paramount. Under an FL network, patients’ data (images, disease history, etc.) is not transferred to other medical institutions. NVIDIA has invested considerable effort in FL for healthcare[3][4].
  • Google uses FL to improve personalization on users’ smartphones[1][2].
  • Professor Qiang Yang, Chief AI Officer at WeBank, and his team are leading IEEE standards work on federated learning. They are donating open source software to the Linux Foundation so that anybody can use it for free to build their own federated system. They are also working with several organizations, such as hospitals, banks, and financial institutions.
  • Banks of any size can contribute to a global fraud detection model through FL networks.
  • Self-driving cars: a self-driving car combines various machine learning technologies to adapt to its environment (e.g., a sharp turn in the road or a rough slope). A traditional cloud approach may create safety risks because it cannot respond quickly enough to real-time situations. Federated learning is now applied as an alternative that limits the volume of data transfer and accelerates the learning process.
    • Pokhrel, Shiva Raj (2020). “Federated learning meets blockchain at 6G edge: a drone-assisted networking for disaster response”: 49-54. doi:10.1145/3414045.3415949.
    • Elbir, Ahmet M.; Coleri, S. (2 June 2020). “Federated Learning for Vehicular Networks”. arXiv:2006.01412 [eess.SP].
  • Several browsers, such as Mozilla Firefox and Google Chrome, have moved to phase out third-party cookies. A newer approach called “Federated Learning of Cohorts (FLoC)” could enable interest-based advertising on the web. FLoC targets digital ads to groups of like-minded users while keeping individual browsing histories private. (https://github.com/jkarlin/floc)


In this article, we introduced a popular distributed machine learning methodology called federated learning (FL). The core idea of FL is that users do not send their local data to a central server; they send only the parameters needed to update the machine learning model. Users only need to contribute part of their computational power for local training. FL remains a promising yet challenging technique, and AI researchers are actively working to advance it and apply it across industries. We hope you enjoyed this article!

Related articles:


  1. Google AI Blog (Federated Learning: Collaborative Machine Learning without Centralized Training Data)
  2. Making every phone smarter with Federated Learning
  3. What Is Federated Learning?
  4. NVIDIA Research: First Privacy-Preserving Federated Learning System for Medical Imaging

