AFML - Sharing knowledge without data

Public administration services struggle under the burden of tedious processes. For good reason, complaining about the slow grind of government agencies has nearly become a national pastime in Germany. Even so, technological progress has not entirely passed by German bureaucracy, which is often seen as behind the times. Machine learning in particular can help simplify and accelerate processes with data-driven methods. One of the major hurdles lies less in the algorithms than in processing the required data while complying with regulations such as the European General Data Protection Regulation. With this in mind, IBM and fortiss are working within the framework of the Center for AI on accountable federated machine learning (AFML) projects, in which data protection guidelines are strictly observed while still producing valid results.

How does federated machine learning function?

To create a machine learning model, you need as much detailed data as possible. Because data protection regulations often stand in the way, a method has to be found that allows the models to be trained well enough to function accurately while still adhering to strict data privacy requirements. This is where so-called federated machine learning comes into play.

Fig. 1: Federated Learning

With federated machine learning, an algorithm learns from strictly separated data sets. All of the process owners manage and store their own data locally. fortiss and IBM selected this approach for an AFML project for German public administration services with several German cities. The aim was to create a prototype of an idea classifier that can recognize how residents formulate civic ideas, which they can use to become actively involved in shaping their cities and communities. Ultimately, the machine learning algorithm helps to classify all civic ideas faster and thus to respond to them more quickly. Because they are not permitted to share their data, the individual cities trained their own models with their own data and forwarded the results to an aggregator by means of a predefined workflow protocol. The aggregator in turn collected all of the shared information and, using the input from all of the cities, built a machine learning model that could be fed back to them.
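The workflow described above, in which each city trains locally and an aggregator combines the results, can be sketched as follows. This is a minimal illustration and not the project's actual implementation: the article does not say which aggregation rule was used, so the sketch assumes a weighted average of model parameters (commonly called federated averaging), and a toy linear model stands in for the idea classifier. All function names are hypothetical.

```python
from typing import List, Tuple

Weights = List[float]
Sample = Tuple[List[float], float]  # (features, target)


def local_update(weights: Weights, data: List[Sample],
                 lr: float = 0.1, epochs: int = 20) -> Tuple[Weights, int]:
    """Train on one city's local data; only weights and sample count leave the city."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            # Gradient step for squared error on a single sample.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, len(data)


def aggregate(updates: List[Tuple[Weights, int]]) -> Weights:
    """Assumed aggregation rule: average weights, weighted by local sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def federated_round(global_weights: Weights,
                    city_datasets: List[List[Sample]]) -> Weights:
    """One round: every city trains locally, then the aggregator combines the results."""
    updates = [local_update(global_weights, data) for data in city_datasets]
    return aggregate(updates)


if __name__ == "__main__":
    # Two hypothetical cities whose data follow y = 2 * x; raw samples are never pooled.
    cities = [
        [([1.0], 2.0), ([2.0], 4.0)],
        [([3.0], 6.0)],
    ]
    model = [0.0]
    for _ in range(5):
        model = federated_round(model, cities)
    print(model)
```

The aggregator only ever sees weight vectors and sample counts, which is the property that lets the cities keep their raw data local.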

The model must provide accountability

A key point with every machine learning model is its transparency and explainability. Particularly when it comes to use in government agency applications, it is important to be able to comprehend and verify how the model was created. The question of accountability, in other words how precise and trustworthy the model is, was addressed in the next step by fortiss and IBM. In line with the motto “trust but verify”, an accountability framework was designed that enables the model to be tested with respect to reproducibility and potential errors that could occur during training. The system must also ascertain whether a bias exists, for example in terms of the gender, ancestry or socio-economic status of the citizen, and determine whether the model is fair or was manipulated in some way.

Federated machine learning models are difficult to verify because whoever checks them has no access to the raw data. Federated learning systems can be trusted in theory, but to make this particular model accountable, IBM and fortiss attached a range of claims to it by means of the Evidentia technology. The claims relate to different aspects of the model, such as the training process workflow or the processed data. Verifying these claims is what establishes accountability, and third parties such as auditors can access the verification as well.

From issue to verification

The accountable federated machine learning process consists of four steps and ends with the audit of the model. The first step involves setting up the project, understanding the actual issue and clarifying precisely what the project should achieve. In pre-processing, the data to be included in the model is selected, followed by the actual training of the model. The last step, post-processing, then involves deciding exactly how the model should be implemented.

Fig. 2: Accountability

While these four steps sound simple, the actual implementation is complex. There are numerous stumbling blocks that can impact the final model. The previously mentioned bias in the data can affect quality, and differences between the data sets can leave parts of the model over-trained or under-trained. Final implementation has to be properly coordinated as well so that the model actually functions as intended.

For this reason, verification of the model is carried out with a fact sheet developed by IBM and fortiss, which covers all four steps and all claims and is continuously updated during the process. All relevant information is maintained in this fact sheet, thus providing an explanation as to why the model can be trusted.

Accountability with data protection

The concrete verifications that have to be stored and entered into the system are determined at the very beginning of the project. This information is then verified and the result is continuously maintained in the fact sheet. The system shows the auditor at a glance which claims were verified, whether a problem has occurred or whether the information is in line with the required standard. This allows the auditor to review all of the relevant information and verify the machine learning model.
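To make the idea of a continuously maintained fact sheet more concrete, the following sketch models claims that are defined at the start of a project and later checked against recorded evidence. It is purely illustrative: the class names, fields, status values and example claims are assumptions made for this sketch and do not represent the Evidentia technology or the fact sheet developed by IBM and fortiss.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    PENDING = "pending"
    VERIFIED = "verified"
    FAILED = "failed"


@dataclass
class Claim:
    step: str                       # e.g. "pre-processing" or "training"
    statement: str                  # human-readable claim shown to the auditor
    check: Callable[[Dict], bool]   # hypothetical check against recorded evidence
    status: Status = Status.PENDING


@dataclass
class FactSheet:
    claims: List[Claim] = field(default_factory=list)

    def verify(self, evidence: Dict) -> None:
        """Re-evaluate every claim against the evidence recorded so far."""
        for claim in self.claims:
            claim.status = Status.VERIFIED if claim.check(evidence) else Status.FAILED

    def summary(self) -> Dict[str, int]:
        """At-a-glance count of claims per status, as an auditor might see it."""
        counts = {status.value: 0 for status in Status}
        for claim in self.claims:
            counts[claim.status.value] += 1
        return counts

    def trusted(self) -> bool:
        """True only if there is at least one claim and all claims are verified."""
        return bool(self.claims) and all(
            claim.status is Status.VERIFIED for claim in self.claims
        )


if __name__ == "__main__":
    sheet = FactSheet([
        Claim("pre-processing", "No raw data left the cities",
              lambda e: e.get("raw_data_shared") is False),
        Claim("training", "All expected cities contributed an update",
              lambda e: e.get("cities_reported") == e.get("cities_expected")),
    ])
    sheet.verify({"raw_data_shared": False, "cities_reported": 3, "cities_expected": 3})
    print(sheet.summary(), sheet.trusted())
```

Keeping the checks as functions over recorded evidence means the fact sheet can be re-verified whenever new evidence arrives, without the auditor ever needing the raw data.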

Particularly in areas where strict data protection regulations exist and highly sensitive data is being handled, designing a machine learning model in an understandable manner is a challenge. Federated machine learning allows the creation of such models without storing the raw data in a central repository. However, trust in the algorithm only develops through verification of the individually coordinated claims and a definitive, readable fact sheet. IBM and fortiss are thus continuing their research efforts to advance accountable federated machine learning.
