Privacy risks discourage data owners from sharing their valuable data for joint data mining applications. Privacy-preserving data mining techniques offer a solution: they make it possible to carry out distributed data mining on joint data without revealing any useful information about the underlying data.
In this presentation we discuss a well-known problem in Artificial Intelligence and Machine Learning: learning the structure of a Bayesian network that explains the causal relationships between the variables involved in an observed phenomenon.
Learning occurs in a distributed environment without revealing the actual data on which the network is built.
The algorithm works in a two-party architecture in which the client, who owns the data, asks the server for the Bayesian network structure of its data, using a randomised protocol to send the values of any pair of variables in the database.
Thanks to the properties of the protocol, we demonstrate that it is possible to reconstruct the true joint probability of any pair of variables without the need to know the true value of the variables.
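To illustrate how a joint probability can be recovered from randomised values, here is a minimal sketch. It is not the protocol presented in this talk, whose details are not given here. It assumes a simple generalised randomised-response scheme: each true pair is kept with probability `keep_prob` and otherwise replaced by a pair drawn uniformly from the domain. The names `randomise_pair`, `reconstruct_joint` and `keep_prob` are hypothetical.

```python
import random
from collections import Counter

def randomise_pair(pair, domain, keep_prob, rng):
    """Assumed scheme: keep the true pair with probability keep_prob,
    otherwise replace it with a pair drawn uniformly from the domain."""
    if rng.random() < keep_prob:
        return pair
    return rng.choice(domain)

def reconstruct_joint(observed_counts, domain, keep_prob):
    """Invert the randomisation on aggregate counts.

    Under the assumed scheme the observed frequency q(v) satisfies
    q(v) = keep_prob * p(v) + (1 - keep_prob) / k, with k = len(domain),
    so p(v) = (q(v) - (1 - keep_prob) / k) / keep_prob.
    """
    n = sum(observed_counts.values())
    noise = (1.0 - keep_prob) / len(domain)
    return {v: (observed_counts.get(v, 0) / n - noise) / keep_prob
            for v in domain}

# Usage: two binary variables, so the pair domain has four values.
domain = [(0, 0), (0, 1), (1, 0), (1, 1)]
rng = random.Random(0)
true_pairs = [(0, 0)] * 4000 + [(0, 1)] * 3000 + [(1, 0)] * 2000 + [(1, 1)] * 1000
sent = Counter(randomise_pair(p, domain, 0.5, rng) for p in true_pairs)
estimate = reconstruct_joint(sent, domain, 0.5)
```

The estimate is computed from aggregate counts only, so in this sketch the receiver never needs to know which individual values were replaced.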
Finally, to prevent attacks based on repeated observation of the same attribute in combination with several other attributes, the randomised values of the attribute pairs are communicated using a second randomised protocol.
In summary, the algorithm is organised as follows.
The client-side algorithm creates, for each ordered pair of columns, a private matrix by projecting and randomising the values of the two columns from the original database. It then randomises the order of the matrix rows and sends the rows to the server in that randomised order. Thanks to the randomised order, the communication can even take place over an insecure channel.
The server-side algorithm generates a contingency table that stores the joint probability of each pair of variables whose values it receives from the client. The server then constructs a Bayesian network from the contingency tables and sends the network structure to the client.
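The client-side steps can be sketched as follows. This is an illustrative outline under stated assumptions, not the implementation behind this talk: the database is assumed to be a list of dictionaries, and `randomise` is a hypothetical placeholder for the value-randomisation step, whose exact form is not specified here.

```python
import random
from itertools import permutations

def client_messages(table, columns, randomise, rng):
    """Yield (col_a, col_b, randomised_pair) messages for the server.

    table     : list of dict rows (assumed layout of the database)
    columns   : column names to pair up
    randomise : hypothetical function mapping a true pair to a randomised pair
    rng       : random.Random instance used to shuffle the row order
    """
    # permutations(..., 2) enumerates every ordered pair of distinct columns.
    for col_a, col_b in permutations(columns, 2):
        # Project the two columns and randomise each pair of values.
        private = [randomise((row[col_a], row[col_b])) for row in table]
        # Send the rows in a random order rather than the original row order.
        order = list(range(len(private)))
        rng.shuffle(order)
        for i in order:
            yield col_a, col_b, private[i]

# Usage with an identity "randomiser", only to show the message flow.
table = [{"A": 0, "B": 1, "C": 2}, {"A": 1, "B": 0, "C": 2}]
messages = list(client_messages(table, ["A", "B", "C"], lambda p: p, random.Random(0)))
```

Because each pair of columns is shuffled independently, the position of a message in the stream does not identify the database row it came from.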
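A minimal sketch of the server-side bookkeeping is shown below. The message format `(col_a, col_b, pair)` is an assumption. The text does not say which structure-learning algorithm the server uses, so the mutual-information function is only one example of a pairwise statistic that a structure learner could compute from such tables. In a real run the received values are randomised, so the counts would first need to be corrected for the randomisation.

```python
import math
from collections import Counter, defaultdict

def contingency_tables(messages):
    """Count received value pairs, one table per ordered pair of columns."""
    tables = defaultdict(Counter)
    for col_a, col_b, pair in messages:
        tables[(col_a, col_b)][pair] += 1
    return tables

def joint_probabilities(table):
    """Normalise a contingency table into an empirical joint distribution."""
    total = sum(table.values())
    return {pair: count / total for pair, count in table.items()}

def mutual_information(joint):
    """Mutual information (in nats) of a two-variable joint distribution."""
    px, py = Counter(), Counter()
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Usage: two perfectly dependent binary variables.
received = [("A", "B", (0, 0)), ("A", "B", (1, 1))] * 2
tables = contingency_tables(received)
joint = joint_probabilities(tables[("A", "B")])
score = mutual_information(joint)
```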
Using this protocol, the second party (the server) cannot learn any useful information about the client's data, which preserves the client's privacy. We believe that the presented solution provides complete accuracy, full privacy, ideal universality, and better performance.