Transparent AI

Technical (implications and solutions)

Ian Marsh, Ph.D. (RISE-SICS, Sweden)


Recent traffic accidents have shown that complex systems augmented with some “intelligence” have been at the center of fatal accidents [Uber, Tesla]. The exact roles of the system, the driver, external parties and the software engineers were not clear at the time of the accident, especially following the inevitable media storm. In some reports, sloppy or sensational journalism only heightened the question of who was to be held responsible as part of the ensuing public scrutiny. Furthermore, a legal conflict ensued between the parties over who exactly was responsible, with obvious consequences for the industries involved, and possibly for the field of artificial intelligence as well. The German car industry responded directly, stating that in Europe the driver will not be held responsible for any kind of accident involving any kind of “self-driving” vehicle, and Toyota initiated a program at MIT, led by Gerald Sussman, an expert on artificial intelligence and programming languages, to develop automated driving systems capable of explaining why particular solutions took particular actions. Work in Explainable AI has gained traction and popularity not only from a technical perspective, but also from legal, ethical, business and societal points of view.

Of course, vehicles are not the only systems to have faced problems. Microsoft’s chatbot Tay, once deployed, produced racist output [Microsoft]. Facebook’s chatbots developed a shorthand code between themselves that the developers had not envisaged. Images that simply confuse DNN solutions are relatively easy to find, and such misclassifications have been shown to be relatively difficult to detect. Researchers in Japan published a paper in 2018 called “1 pixel attack”, showing that changing a single pixel in images from two popular public datasets can cause misclassification rates of up to 41% in one dataset and 68% in the other [Pixel].
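To make the single-pixel idea concrete, here is a minimal sketch against a toy model rather than a real DNN. Assumptions: the “image” is a hypothetical 3×3 grayscale grid, the classifier is a hypothetical linear scorer with hand-picked weights, and the attack is a brute-force search over every pixel and a few candidate values, whereas the paper [Pixel] uses differential evolution against trained deep networks.

```python
from itertools import product

# Hypothetical toy classifier: a fixed linear score over a 3x3 grayscale image.
# Weights are illustrative only; the centre pixel is deliberately influential.
WEIGHTS = [
    [0.2, -0.1, 0.3],
    [0.1, 0.9, -0.2],
    [-0.3, 0.2, 0.1],
]
BIAS = -0.5

def score(image):
    return sum(w * p for w_row, p_row in zip(WEIGHTS, image)
               for w, p in zip(w_row, p_row)) + BIAS

def classify(image):
    return "stop" if score(image) > 0 else "other"

def one_pixel_attack(image, candidate_values=(0.0, 0.5, 1.0)):
    """Brute-force search for a single pixel change that flips the label.

    Returns (row, col, new_value) for the first flip found, or None.
    """
    original = classify(image)
    for r, c in product(range(len(image)), range(len(image[0]))):
        for v in candidate_values:
            perturbed = [row[:] for row in image]  # copy so the input is untouched
            perturbed[r][c] = v
            if classify(perturbed) != original:
                return r, c, v
    return None

image = [
    [0.5, 0.5, 0.5],
    [0.5, 0.8, 0.5],
    [0.5, 0.5, 0.5],
]
print(classify(image))          # prints: stop
print(one_pixel_attack(image))  # prints: (1, 1, 0.0)
```

In this toy the flip is easy to find because one weight dominates; the point of [Pixel] is that trained deep networks can show a similar sensitivity without it being obvious from their structure.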

Post Scope 

The questions that this document attempts to address are as follows:

  1. What are the main issues in data science, machine learning and artificial intelligence?
  2. What types of algorithms are there for which this is relevant? Can we classify them?
  3. What types are used most today? Will this change?
  4. What about algorithms that are too complex?
  5. At what level should information be provided (user understanding)?
  6. What about adversarial learning (security)?
  7. It’s not only the algorithms, it’s also the training data. What does that mean?
  8. How to characterise uncertainty in the output of a data analytics system?
  9. Do end users have the right to reverse engineer a solution?
Fig. 1: AI, ML and Deep Learning

The main problems around AI interpretability


Bias, whether accidental or intentional, can be introduced into processes through biased training data or through the code itself, e.g. its conditionals and switch statements.

In the following discussion, we assume the bias is accidental; for malicious bias we refer to [malicious].

Solutions include better training material, debiasing techniques, and standard data sets that should give the same results irrespective of the code.
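One way to use a standard data set for bias checking is to run the system over it and compare outcome rates between groups. The sketch below is a minimal illustration under our own assumptions: the (group, decision) records are hypothetical, and the metric is the disparate impact ratio with the commonly quoted four-fifths (0.8) threshold, which is one choice among several fairness metrics rather than something this document prescribes.

```python
from collections import defaultdict

def selection_rates(records):
    """Fraction of positive decisions per group.

    records: iterable of (group, decision) pairs, where decision is True/False.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, decision in records:
        totals[group] += 1
        positives[group] += int(decision)
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact(records, protected, reference):
    """Ratio of the protected group's selection rate to the reference group's."""
    rates = selection_rates(records)
    return rates[protected] / rates[reference]

# Hypothetical benchmark outcomes: 10 cases per group.
records = [("A", True)] * 6 + [("A", False)] * 4 + [("B", True)] * 3 + [("B", False)] * 7

ratio = disparate_impact(records, protected="B", reference="A")
print(f"disparate impact = {ratio:.2f}")
print("flag for review" if ratio < 0.8 else "within four-fifths rule")
```

Because the benchmark is fixed, the same check can be rerun on every new version of the code, which is what makes a standard data set useful here.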

Bias can also be introduced into the system because a developer or company wants the system to behave in a certain way, e.g. for business reasons. This can go against the values of society or the user community; in autonomous driving, for example, keeping the vehicle’s passengers safe can conflict with the safety of other road users. Medical trials are an example where the system must conform to certain enforced legal and safety limits. The side effects of some medicines have been shown to be somewhat conservative.

Is it possible to distill the key decision-making rules into their own module and review that module with stakeholders? Have solutions been black-box tested for bias detection, reverse engineering, etc.? What is the ground truth (e.g. in traffic flow), and can it be defined and encoded? What about decision making in situations with ethical conflict? Finally, who has the right to decide, and to make the decision making understandable?


  1. Developers do not understand the implications of their code
  2. Critical decision making is splintered across multiple modules
  3. Interdependencies are not understood
  4. This is similar to unexpected “bugs” but on a different level
  5. Especially difficult in situations that rarely happen
    1. (e.g. an emergency in an autonomous vehicle)
  6. How can we visualize the effects?
    1. Prove that software works correctly 
    2. Prove that software does the right thing

Data greediness

Classifiers often take tremendous amounts of data to learn accurate models for various tasks, from computer vision to natural language. While we increasingly live in a world where we can acquire a lot of data, for many tasks it is still extremely expensive to get labelled data. Consider, as an example, building a conversational dialogue system. It’s very hard to get hundreds of thousands of annotated conversations to feed a classifier, but that’s often what we need to achieve competent performance.


Classifiers learn very complex higher-order decision boundaries to model phenomena. While this allows them to perform so well on many tasks, it also makes it extremely difficult to probe the model to understand why and how it is doing so well. Much of the time, they effectively behave like black boxes that force you to pray to the almighty Deep Learning Gods for a generous bounty. While there has been ongoing work on increasing model interpretability, it is still very much an unsolved research problem. As a point of comparison, traditional ML algorithms such as decision trees, various types of regression, etc. are much easier to interpret.
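As an illustration of why decision trees are easier to interpret, a fitted tree can be printed directly as a rule. The sketch below is a minimal example under stated assumptions: a single hypothetical numeric feature (vehicles per minute), invented labels, and a one-split tree (a decision stump) found by exhaustive threshold search rather than a full tree-learning algorithm such as CART.

```python
def fit_stump(xs, ys):
    """Fit a one-split decision tree (a decision stump) on one numeric feature.

    Tries every midpoint between consecutive sorted values and keeps the
    threshold with the fewest training errors.
    Returns (threshold, label_if_below_or_equal, label_if_above).
    """
    pairs = sorted(zip(xs, ys))
    labels = sorted(set(ys))
    best = None
    for i in range(1, len(pairs)):
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        # Each side predicts its majority label.
        left_label = max(labels, key=left.count)
        right_label = max(labels, key=right.count)
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]

# Hypothetical traffic data: vehicles per minute and an observed state.
vehicles = [5, 8, 12, 15, 30, 35, 40, 42]
state = ["free"] * 4 + ["congested"] * 4

threshold, below, above = fit_stump(vehicles, state)
print(f"if vehicles_per_minute <= {threshold} then '{below}' else '{above}'")
```

The whole model is the printed rule, `if vehicles_per_minute <= 22.5 then 'free' else 'congested'`, which is exactly the kind of explanation a deep network cannot give directly.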

  1. What types of algorithms are there for which this is relevant? Can we classify them? 

Parallels with IT auditing

How to audit an algorithm?

In the social sciences, auditing detects bias by, for instance, sending two otherwise identical CVs, one with an English name and the other with an Arabic name. This is a kind of outside-in (probing) transparency. (It has drawbacks in social science because it can cause side effects and wastes the time and money of those being audited.) Forms of auditing, translated into auditing algorithms:

  1. Code auditing: Reddit opened up its algorithm, except for one part: the one that prevents spammers. One solution is, of course, a third-party escrow that scrutinizes the algorithm. However, code is hard to understand.
  2. Non-invasive user audit: asking users about their experience and their actual interactions. Problem: how to sample the behaviours?
  3. Scraping audit: issuing repeated queries and observing the behaviour. Scraping itself may violate terms and conditions in some cases, so the algorithm owner has to agree.
  4. Sock puppet audit: using testers to act as ‘normal’ users (legal issues arise here too). An interesting research question is how much false data would need to be injected to prove that an algorithm is misbehaving, while still injecting it in such a way that the algorithm under investigation is not itself perturbed by the false accounts.

These are techniques that could in some way aid reverse engineering.
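The CV experiment, the scraping audit and the sock puppet audit share one mechanism: query the black box with pairs of inputs that differ only in the attribute under test, and count how often the decision changes. A minimal sketch, where `screen_cv` is a hypothetical stand-in for the audited system (deliberately biased here so the audit has something to find) and the profiles and attribute names are invented for illustration.

```python
def screen_cv(cv):
    """Hypothetical black box under audit; the auditor cannot see this code.

    This toy version deliberately penalises one name group.
    """
    score = cv["years_experience"] * 10 + (20 if cv["degree"] else 0)
    if cv["name_group"] == "arabic":
        score -= 15
    return score >= 50

def paired_audit(black_box, base_profiles, attribute, value_a, value_b):
    """Query the black box with matched pairs that differ only in `attribute`.

    Returns the fraction of pairs where the decision changed.
    """
    flips = 0
    for profile in base_profiles:
        a = {**profile, attribute: value_a}
        b = {**profile, attribute: value_b}
        if black_box(a) != black_box(b):
            flips += 1
    return flips / len(base_profiles)

# Hypothetical matched profiles: every combination of experience and degree.
profiles = [
    {"years_experience": y, "degree": d}
    for y in range(1, 7)
    for d in (True, False)
]
rate = paired_audit(screen_cv, profiles, "name_group", "english", "arabic")
print(f"decision changed in {rate:.0%} of matched pairs")
```

An unbiased system would give 0% here; any non-zero rate is evidence, obtained purely from inputs and outputs, that the tested attribute influences the decision.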

Adversarial learning

Adversarial learning exploits the effect that, in some cases, only small changes to the inputs of an algorithm are required to make it behave in a completely different way. It works because it creates inputs that seem natural to a human but that nonetheless lie outside normal, realistic bounds in other respects. Examples:

  1. Traffic signs:
  2. Speech recognition: adversarial examples against machine learning systems are such a powerful exploit because they weaponize the gap in our understanding of those systems. One recent attack targets speech-to-text extremely effectively.
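The “small change, completely different behaviour” effect is easiest to see on a linear model. The sketch below applies the fast gradient sign method (FGSM), a standard attack we substitute here for illustration, to a hypothetical logistic-regression classifier with hand-picked weights: each input feature is nudged by at most epsilon in the direction that increases the loss.

```python
import math

# Hypothetical trained logistic-regression model; weights are illustrative only.
W = [1.5, -2.0, 0.5, 1.0]
B = -0.2

def predict_proba(x):
    """Probability of class 1 under the logistic model."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))

def sign(value):
    return (value > 0) - (value < 0)

def fgsm(x, y_true, epsilon):
    """Fast gradient sign method for a logistic-regression model.

    With cross-entropy loss, d(loss)/d(x_i) = (p - y) * w_i, so every feature is
    moved by exactly epsilon in the direction that increases the loss.
    """
    p = predict_proba(x)
    grad = [(p - y_true) * w for w in W]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

x = [0.4, 0.1, 0.3, 0.2]
x_adv = fgsm(x, y_true=1, epsilon=0.15)
print(f"before: p={predict_proba(x):.3f}, after: p={predict_proba(x_adv):.3f}")
```

Because the score shifts by epsilon times the sum of the absolute weights, many tiny per-feature changes add up; with thousands of input dimensions (pixels, audio samples) the individual changes can be imperceptible while the output still flips.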

If you can observe, for instance, a trader on a market, and the trader is transparent, you can devise ways to speculate against them. The paper “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR” (Sandra Wachter, Brent Mittelstadt and Chris Russell, Oxford Internet Institute) argues that ‘counterfactuals’ would be a way to explain algorithms, especially in cases where the algorithm is too complex to understand at once. However, the authors also acknowledge that this could be compared to adversarial learning. The difference is that adversarial learning usually changes many dimensions and/or data points slightly (in order to shift the algorithm to another output state), whereas ‘counterfactuals’ concentrate on changing only one (or a few) parameters. The approach also has parallels to the ‘sock puppet audit’ introduced above.
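The counterfactual idea can be sketched with nothing but query access to a model. Wachter et al. formulate counterfactuals as an optimisation over all features with a distance penalty; the simpler variant below, which matches the “one (or few) parameters” framing, searches each feature separately for the smallest change that flips the decision. The credit-style model, feature names and step sizes are hypothetical.

```python
def hypothetical_credit_model(applicant):
    """Stand-in black box: only queried by the explainer, never inspected."""
    score = applicant["income_k"] / 5 + applicant["years_employed"] - 2 * applicant["open_loans"]
    return score >= 10

def single_feature_counterfactuals(model, x, steps, max_steps=50):
    """For each feature, find the smallest change (a multiple of that feature's
    step, trying both directions) that flips the model's decision.

    Returns {feature: counterfactual_value} for features where a flip was found.
    A real implementation would also restrict candidates to plausible values.
    """
    original = model(x)
    result = {}
    for feature, step in steps.items():
        for k in range(1, max_steps + 1):
            candidates = [x[feature] + k * step, x[feature] - k * step]
            flipped = [c for c in candidates if model({**x, feature: c}) != original]
            if flipped:
                result[feature] = flipped[0]
                break
    return result

applicant = {"income_k": 40, "years_employed": 2, "open_loans": 1}
print(hypothetical_credit_model(applicant))  # prints: False (rejected)
print(single_feature_counterfactuals(
    hypothetical_credit_model, applicant,
    steps={"income_k": 1, "years_employed": 1, "open_loans": 1},
))
```

Each entry reads as an explanation of the form “had you had no open loans, the application would have been approved”. The same queries are also what someone gaming the system would run, which is the tension with adversarial learning described above.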

Interpretation of ML

Many scientific papers on algorithm transparency deal with machine learning. The problem is that interpreting machine learning models is a general and difficult problem. There is work on how models could explain their behaviour, but the results so far are limited. Beyond machine learning models, algorithm transparency naturally applies to other decision-making algorithms as well.

  1. A Survey of Methods for Explaining Black Box Models (Guidotti et al., CNR & UniPisa, 2018)
  2. Survey of issues related to algorithm transparency (Mittelstadt, Allo, Taddeo, Wachter, & Floridi, 2016)
  3. Some machine learning approaches try to generate rules that should be understandable to humans (Letham, Rudin, McCormick, & Madigan, 2015)
  4. Using dimensionality reduction to make machine learning models interpretable (Vellido, Martín-Guerrero, & Lisboa, 2012)
  5. Measures for detecting bias in black-box algorithms (Datta, Sen, & Zick, 2016)
  6. Algorithmic decision making handles at least four different tasks: prioritizing, classifying, associating and filtering (Diakopoulos, 2016)
  7. User experience of transparency information: there is a need for disclosure mechanisms, which can harm usability (Schaffer et al., 2015)
  8. Having access to the code is not enough, because for complex systems even their builders do not understand all the dependencies (Ananny & Crawford, n.d.). Furthermore, learning and evolving systems change continuously, so a stable snapshot for careful study is missing.
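As an example of measuring influence without reading the code, Datta, Sen & Zick’s Quantitative Input Influence (QII) intervenes on a feature and observes how the outcomes change. The sketch below is in the spirit of their simplest (unary) measure, under our own simplifications: the feature is replaced by values from its empirical marginal distribution (averaged exactly over all observed values rather than sampled), and influence is the share of decisions that change. The model and rows are hypothetical; the paper also covers set-based and Shapley-value-based influence.

```python
def hypothetical_model(applicant):
    """Stand-in black box: approves high incomes, but only for group 'A'."""
    return applicant["income"] >= 30 and applicant["group"] == "A"

def unary_influence(model, rows, feature):
    """Share of decisions that change when `feature` is replaced by a value from
    its empirical marginal distribution.

    The expectation is computed exactly by trying every observed value of the
    feature for every row, instead of by random sampling.
    """
    values = [row[feature] for row in rows]
    changed = total = 0
    for row in rows:
        original = model(row)
        for value in values:
            total += 1
            if model({**row, feature: value}) != original:
                changed += 1
    return changed / total

rows = [
    {"income": 10, "group": "A", "zip": 111},
    {"income": 20, "group": "B", "zip": 222},
    {"income": 40, "group": "A", "zip": 333},
    {"income": 50, "group": "B", "zip": 444},
]
for feature in ("income", "group", "zip"):
    print(feature, unary_influence(hypothetical_model, rows, feature))
```

Here `zip` has zero influence (the model ignores it), while `group` turns out to be as influential as `income`: the kind of signal a bias audit looks for, obtained through queries alone.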

IPR protection and limits to transparency

A possible solution for IPR protection: a trusted third party could act on behalf of the public, audit the algorithm and state that it is OK. Are there examples of this? Would it work? Would it be accepted? If algorithmic transparency included giving insight into the training sets, that might be a way to reduce unwanted bias in algorithms (models). But it would also create privacy and confidentiality problems. Additionally, algorithms (models) that update themselves (e.g. reinforcement learning) would be very hard to make transparent, since their training data is constantly changing.

Improving quality

Within BDVA we are interested in simplifying algorithms as far as possible, even sacrificing some accuracy for transparency, for example by using a decision tree instead of a random forest. An example from telecommunications: Principal Component Analysis (PCA) is used to reduce the number of dimensions in measurement data from 20+ down to 4-5 for analysis.
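A minimal sketch of the PCA reduction step, written in plain Python so it has no dependencies (in practice a library implementation such as scikit-learn’s `PCA` would be used). It computes the covariance matrix, extracts the leading eigenvalues by power iteration with deflation, and picks the smallest number of components that explain a target share of the variance. The 90% target and the synthetic six-channel data (driven by two hidden factors) are assumptions for illustration, not the telecom data mentioned above.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpair(matrix, iterations=500):
    """Dominant eigenvalue/eigenvector of a symmetric PSD matrix by power iteration."""
    d = len(matrix)
    v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary non-uniform start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm  # ||A v|| for unit v converges to the largest eigenvalue
    return eigenvalue, v

def explained_variance_ratios(rows, n_components):
    cov = covariance_matrix(rows)
    total_variance = sum(cov[i][i] for i in range(len(cov)))  # trace
    ratios = []
    for _ in range(n_components):
        eigenvalue, v = top_eigenpair(cov)
        ratios.append(eigenvalue / total_variance)
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return ratios

def components_needed(ratios, target=0.9):
    """Smallest k whose first k components reach the target share of variance."""
    cumulative = 0.0
    for k, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        if cumulative >= target:
            return k
    return len(ratios)

# Synthetic stand-in for measurements: 6 channels driven by 2 latent factors.
rows = []
for t in range(50):
    a, b = math.sin(t), math.cos(0.5 * t)
    noise = 0.01 * ((7 * t) % 5 - 2)
    rows.append([a, 2 * a, -a + noise, b, 0.5 * b, a + b])

ratios = explained_variance_ratios(rows, n_components=3)
print([round(r, 4) for r in ratios], components_needed(ratios))
```

The explained-variance ratios give an auditable justification for how many dimensions were kept, which is itself a small piece of transparency.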

In traffic flow, we use standard deep learning algorithms “out of the box”, e.g. RNN, feed-forward (FF) and LSTM networks, but if they are successful we break them back down into their statistical parts. We deconstruct the hidden layers (the non-linear functions) into functions we can model and use without the NN “black box”. Industry does not want NN systems it cannot understand, but NNs are useful for seeing whether there is potential without much effort. We are also applying “Keep it Simple, Stupid” to NNs, for example moving from LSTM (three gates plus a cell candidate, i.e. four internal weight blocks) to GRU (two gates plus a candidate) for traffic flow.
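The saving from replacing an LSTM with a GRU can be estimated by counting parameters. A minimal sketch, assuming the textbook single-bias formulation: an LSTM layer has four weight blocks (input, forget and output gates plus the cell candidate) and a GRU three (update and reset gates plus the candidate), each mapping the concatenated [input, hidden] vector to the hidden size. Some libraries, such as PyTorch, keep two bias vectors per block, so their counts differ slightly. The layer sizes are hypothetical.

```python
def gated_rnn_params(input_size, hidden_size, blocks):
    """Parameters of one recurrent layer with `blocks` gate/candidate blocks.

    Each block has a weight matrix of shape hidden x (input + hidden) and one
    bias vector of length hidden (single-bias convention).
    """
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return blocks * per_block

def lstm_params(input_size, hidden_size):
    return gated_rnn_params(input_size, hidden_size, blocks=4)

def gru_params(input_size, hidden_size):
    return gated_rnn_params(input_size, hidden_size, blocks=3)

# Hypothetical traffic-flow model: 8 sensor features, 32 hidden units.
lstm = lstm_params(8, 32)
gru = gru_params(8, 32)
print(lstm, gru, f"{1 - gru / lstm:.0%} fewer parameters")  # prints: 5248 3936 25% fewer parameters
```

Under this convention a GRU always has three quarters of the LSTM’s parameters for the same sizes, and fewer gates also means fewer intermediate quantities to inspect when deconstructing the network.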

(Example: craft.ai switched to rule-based algorithms instead of neural networks because users wanted to know why a certain piece of advice was given.) Models should be able to trace decisions back to data. How easy this is depends on the model: given an output, you should be able to trace back which data it came from.
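Some model families make this trace-back trivial. In a k-nearest-neighbour classifier, for instance, every prediction is justified by the specific training records it was computed from. A minimal sketch, assuming hypothetical labelled records with IDs and Euclidean distance; other model families need considerably more machinery to achieve something similar.

```python
import math
from collections import Counter

def knn_with_provenance(training, query, k=3):
    """Classify `query` and return the IDs of the training records behind the decision.

    training: list of (record_id, features, label) tuples.
    """
    nearest = sorted(training, key=lambda rec: math.dist(rec[1], query))[:k]
    votes = Counter(label for _, _, label in nearest)
    label = votes.most_common(1)[0][0]
    supporting = [rid for rid, _, lab in nearest if lab == label]
    return label, supporting

# Hypothetical labelled records with stable IDs.
training = [
    ("rec-01", (1.0, 1.0), "low risk"),
    ("rec-02", (1.2, 0.8), "low risk"),
    ("rec-03", (0.9, 1.3), "low risk"),
    ("rec-04", (4.0, 4.2), "high risk"),
    ("rec-05", (4.5, 3.9), "high risk"),
]
label, evidence = knn_with_provenance(training, query=(1.1, 1.0))
print(label, evidence)  # prints: low risk ['rec-01', 'rec-02', 'rec-03']
```

The returned IDs are the answer to “which data did this output come from?”, and they can be shown to a user or an auditor as they are.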

Technical goals of transparency

Auditing has two goals:

  1. For the developer / engineer: to check that it works as intended, find errors, etc.
  2. For the end user: to understand why a certain decision has been reached

There is an interesting parallel with open source (and open data): “given enough eyeballs, all bugs are shallow”. (It may also be a business case, because it engages the community to improve or give feedback on the data and the way it is processed, or to create a kind of ground truth.)

Level of transparency (interpretability)

Paper [1] addresses interpretability but does not mention transparency at all.

In some cases, a very simple explanation of the algorithm (say, pseudocode) is enough, without all the details. Such an explanation may be freely given while the details are kept secret. In other cases, however, the algorithm is explained one way but implemented differently, even though the input-output relations are equivalent. Is this a problem? Is it allowed?

It’s also a trade-off: what do you explain and what not; what is compatible with legal requirements, and what is compatible with your audience?

A simple example is encryption. You can run it as an algorithm (input to output), or you can implement it as a lookup table mapping inputs to outputs, which is a completely different implementation. Is that allowed? (example: Antonio will think of one)
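The encryption example can be made concrete with a toy byte cipher: an affine map on byte values, chosen only because it is tiny (it is not secure encryption, and the key is hypothetical). The same function is implemented once as an algorithm and once as a precomputed lookup table; anyone who can only observe inputs and outputs cannot tell the two apart.

```python
KEY_A, KEY_B = 5, 3  # hypothetical toy key; KEY_A must be odd so the map is invertible mod 256

def encrypt_algorithmic(byte):
    """Toy affine cipher on one byte, computed on demand."""
    return (KEY_A * byte + KEY_B) % 256

# The same mapping, precomputed once as a 256-entry lookup table.
LOOKUP_TABLE = bytes(encrypt_algorithmic(b) for b in range(256))

def encrypt_table(byte):
    """Same cipher, implemented as a table lookup only."""
    return LOOKUP_TABLE[byte]

def encrypt_message(message, encrypt_byte):
    return bytes(encrypt_byte(b) for b in message)

# Exhaustively check that both implementations agree on every possible input.
assert all(encrypt_algorithmic(b) == encrypt_table(b) for b in range(256))
msg = b"transparent"
print(encrypt_message(msg, encrypt_algorithmic) == encrypt_message(msg, encrypt_table))  # prints: True
```

For a realistic block cipher the full table would be astronomically large, but for a small enough input space the two implementations are interchangeable, and the table on its own reveals far less about how the mapping was derived.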

Reverse engineering

Do end users have the right to reverse engineer an algorithm that is kept secret? Is it legal? If the algorithm has IPR protection, this might be a problem.


White paper within BDVA,


[EU] High-Level Expert Group on Artificial Intelligence. Draft Ethics Guidelines for Trustworthy AI, European Commission, Dec. 2018.

[ACM] ACM Policy Council: Statement on Algorithmic Transparency and Accountability, 2017.


[BADA] Report on data readiness: the equivalent of a Technology Readiness Level (TRL) for data. The state of the data plays an important role in how useful it is. Should lots of wrangling be needed, up to 90% of a project’s resources may be used up getting the data into a meaningful state. Examples include missing fields and features, missing rows, NA values, local use of decimal points (commas are common in Europe) and so on. The report therefore classifies Data Readiness Levels using examples from 3 different projects.

[LIME] Local Interpretable Model-Agnostic Explanations: original blog post, with an updated and more readable version in [Oreilly].

[Stockholm AI] Distilling AI: Interpretability summarised. Ingrid af Sandeberg, Carl Samuelsson, Amir Rahnama, Yuchen Pei and Patrik Bergfeldt, December 2018. A summary of LIME and the book from [Molnar]. They define interpretability, interpretability requirements and trade-offs. A session on interpretability took place in Aug. 2019.

[Paradox] LL.M. S. Djafari et al., The Paradox of Algorithmic Transparency, BDVA, 2019. Governments and private organisations use algorithms as part of decision-making processes with potentially significant consequences for individuals and for society as a whole. Because algorithmic decisions may reflect bias, or may not offer explanations that satisfy our accustomed social and legal expectations, there is growing debate about whether our current frameworks for implementing transparency and accountability are sufficient as governance mechanisms.

[Automating] Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor, Virginia Eubanks.

[Weapons] Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, Cathy O’Neil.

Winograd concluded that it would be impossible to give machines true language understanding using the tools available then. The problem, as Hubert Dreyfus, a professor of philosophy at UC Berkeley, argued in a 1972 book called What Computers Can’t Do, is that many things humans do require a kind of instinctive intelligence that cannot be captured with hard-and-fast rules. This is precisely why, before the match between Lee Sedol and AlphaGo, many experts were dubious that machines would master Go.


[Holst] Correlation and causation

[EU] Ethics

[Bolukbasi] Detecting and removing gender bias in computer word embeddings in natural language processing (Bolukbasi, Chang, Zou, Saligrama, & Kalai, 2016).

[Ananny] Ananny, M., & Crawford, K. (n.d.). Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability.

[Bolukbasi], T., Chang, K.-W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.

[Datta], A., Sen, S., & Zick, Y. (2016). Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems. In Proceedings – 2016 IEEE Symposium on Security and Privacy, SP 2016 (pp. 598–617). IEEE.

[Diakopoulos], N. (2016). Accountability in algorithmic decision making. Communications of the ACM, 59(2), 56–62.

[Letham], B., Rudin, C., McCormick, T. H., & Madigan, D. (2015). Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. The Annals of Applied Statistics, 9(3), 1350–1371.

[Mittelstadt], B. D., Allo, P., Taddeo, M., Wachter, S., & Floridi, L. (2016). The ethics of algorithms: Mapping the debate. Big Data & Society, 3(2), 205395171667967.

Schaffer, J., Giridhar, P., Jones, D., Höllerer, T., Abdelzaher, T., & O’Donovan, J. (2015). Getting the Message? In Proceedings of the 20th International Conference on Intelligent User Interfaces – IUI ’15 (pp. 345–356). New York, New York, USA: ACM Press.

Vellido, A., Martín-Guerrero, J. D., & Lisboa, P. (2012). Making machine learning models interpretable. 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), (April), 163–172.

Guidotti, R., Monreale, A., Turini, F., Pedreschi, D. and Giannotti, F., 2018. A Survey Of Methods For Explaining Black Box Models. arXiv:1802.01933

Ribeiro, M.T., Singh, S. and Guestrin, C., 2016. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135–1144).

[DARPA] Video by John Launchbury, the Director of DARPA’s Information Innovation Office (I2O).


[ERCIM] Transparency in Algorithmic Decision Making, ERCIM News, Jan 2019.