Knowledge extraction using natural language processing

TL;DR.

The aim is to extract the electric motor design process from human guidelines written in English and German, utilising data sheets, formulae and online resources wherever possible. The local designation is “Knowledge Engines”, but think NLP knowledge extraction.

What is a knowledge engine?

A project-specific term for the combination of NLP and data sheets.

Document purpose

This document is i) a follow-up to Johannes’s first knowledge engine paper, ii) a result of work done within the K24402 group, iii) a roadmap to the future of knowledge engines, and iv) an introduction of machine learning to the field.

High level view

This project describes how to understand, capture, code and work with skills imparted by humans in order to build or operate semi-autonomous mechanical and electrical systems. Our goal is to answer the question “how do we encode skills of humans into electro-mechanical systems?”. From an AI perspective, this document is similar in style to “The measure of intelligence” by François Chollet of Google, published on arXiv in 2019 [Measure]. Whereas that paper has lofty ambitions and proposes first steps for how intelligence can be measured, i.e. how we can measure the intelligence of a system that solves problems it has not seen before, we want to research how we can encode human knowledge in existing or evolving systems. New form of digital twins from an Austrian real-world simulator.

Intended audience

As a direct result, the intended audience is i) at first, internal to LCM, ii) after feedback, a selected audience, for whom this serves as a white paper for discussion, iii) given concrete results, readers of a short publication, and iv) with the work done at LCM, RISE and Aalborg, readers of a full paper.

What this document contains

This document describes how to understand, capture, code and work with skills imparted by humans in order to build or operate semi-autonomous mechanical and electrical systems. Our goal is to answer the question “how do we encode skills of humans into electro-mechanical systems?” From an AI perspective, this document is similar in style to “The measure of intelligence” by François Chollet of Google, published on arXiv in 2019 [Measure]. Whereas that paper has lofty ambitions and proposes first steps for how intelligence can be measured (for example, can we measure the intelligence of a system that solves problems it has not seen before?), this document asks how human knowledge can be encoded in existing or evolving systems.

What this document does not contain

This document is not a description of electric motors, the design of which is done at LCM. It is not meant to be vague, with no real applications in mind, nor is it a summary of other works. Neither does it cover the certification process, ranging from peers to authorities, nor explain systems of systems (although references are given). It should be easy to follow, and only implementation work should be needed to realise a knowledge engine (KE).

Vocabulary, semantics and terms

Knowledge is a collection of facts; in this context it could be a set of PDF data sheets. A skill is the ability to perform a task without substantial cognitive load, for example writing a function to sort values from a linked list. Simulation is executing a model of a system in a software environment. Emulation is executing the actual production software in a simulated environment. A knowledge engine (KE) is a software unit which encodes a skill, and a network of knowledge engines is a connected system of knowledge engines, that is, a system of systems.

Technical scope

The area of knowledge embedded into a working system is a large and varied field. Classically it crosses existing engineering fields such as the mechanical and electrical disciplines. Within these large fields we have smaller disciplines, often mixing mechanical and electronic systems, such as mechatronics, control and computer systems. Newer concepts, such as systems of systems, also provide a framework we could benefit from.

As mentioned, the potential scope is wide. We therefore limit the work to operational systems rather than design systems (such as AutoCAD), and to systems that are open to the outside world, that is, with sensors and actuators (a closed system would be SymSpace). The result is something akin to a digital twin (see section 11) or a system of systems [SoS]. Table 1 shows this document’s position.

	Design knowledge engines	Operational knowledge engines
Open systems	Emulation (Software)	This document (Software); Control systems (Software + Hardware)
Closed systems	SymSpace (Software); AutoCAD (Software)	Predictive maintenance (Hardware); Virtualisation (Software)

Table 1: Knowledge engines, open and closed, design and operational.

Mechanical and electrical systems: Again the distinction seems clear, but modern practices like robotics and mechatronics combine these disciplines. Even AI, which is seen as a computer system, still needs a mechanical system to move a chess piece, turn a page over and so on. AI will move more and more into the physical world to become really useful.

Open and closed systems: A closed system is one that does not interact intensively with its environment in natural operation. An example is a jet engine, where most of the conditions are known: the oxygen content at different altitudes, or the response of the compressors to the thrust set by the control system. A car engine, on the other hand, might have to cope with the gearbox, the weight of the car, the type of fuel and so on. Interacting with the environment can make the design more complex, as there are more factors in operation. Digital twins work better in a closed, or partially closed, system, since there are fewer (complex) feedback loops from the environment in which the real system needs to operate.

At LCM, SymSpace is more or less a closed system. The electric motors can (and should) be designed in isolation. In operation SymSpace would be less useful, as the feedback from the system might not be known in advance. There are some caveats to this example, as a motor could be designed to react to different torque, speed and power, but essentially this can be done during the testing of the motor (see Knowledge engine correctness), not necessarily in an open system. An example of an open system is a small robot car in which the motors are installed and which should drive around the surface of the moon, interacting with its environment.

Design and operational modes: We touched upon whether the knowledge engine should be used in a design or an operational mode. A digital twin is really a design tool. One can perform what-if scenarios, but it is used primarily to explore the different scenarios in advance of the physical system being deployed. Simulation is essentially running a model of a system, whilst emulation is trying out the production code in a simulated environment. Control theory often applies to the operational environment: the system is sensed and actuators are applied to bring it to a desired point, the temperature of a room for example. Where the system cannot be measured, or is too complex, it is possible to use a model in place of the real system; in oil refineries, for example, models of the distillation process are used.
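
The sense-and-actuate loop described above can be sketched in a few lines of Python. This is a minimal sketch, not an LCM or SymSpace implementation: the first-order room model, the function names (heater_command, simulate_room) and all numerical parameters are assumptions chosen only for illustration.

```python
def heater_command(setpoint: float, measured: float,
                   gain: float = 2.0, max_power: float = 5.0) -> float:
    """Proportional controller: heater power grows with the temperature error."""
    error = setpoint - measured
    # The actuator cannot cool and has a limited maximum power.
    return min(max(gain * error, 0.0), max_power)


def simulate_room(setpoint: float = 21.0, start: float = 15.0,
                  outside: float = 10.0, loss: float = 0.1,
                  dt: float = 0.1, steps: int = 500) -> float:
    """Run the sense -> decide -> actuate loop against a toy room model."""
    temperature = start
    for _ in range(steps):
        power = heater_command(setpoint, temperature)  # sense and decide
        # Hypothetical room model: heating minus losses to the outside.
        temperature += dt * (power - loss * (temperature - outside))
    return temperature


final_temperature = simulate_room()
```

With these assumed parameters the room settles slightly below the setpoint, the steady-state offset typical of purely proportional control. The toy room model also stands in for the kind of model that is used in place of the real system when that system cannot be measured.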

Knowledge engine motivation

The motivation for a Knowledge Engine, and consequently a network thereof, is design flexibility. A simple example is vehicles, where many parts of a car are similar but the drive trains (diesel, petrol, electric) differ, as does the testing. The more design and testing that can be done purely digitally, the better: it is cheaper, tests can run faster than real time (for wear, for example), solutions can be A/B tested and so on. Tooling for predictive maintenance, especially making stubs and hubs to tap data already at the design stage, makes a data-driven approach much easier than adding it as an afterthought. Condition monitoring, for example, tends to be ‘tacked onto’ existing mechanical machines.

Knowledge engine users

One of the determining features of a Knowledge Engine is what and who its users are. The ‘what’ we have covered with the design and operational modes of a KE. For the ‘who’, we are focussing on the transfer from experts to apprentices, by capturing the knowledge in the head of the expert. In LCM K24402 we have ‘Automating Siggie’ as a use case: what knowledge would be needed to implement his design steps, so that it can be analysed, kept and potentially sold as a service to customers of SymSpace? More details on this are given in Implementation below. RISE and LCM have an ongoing task which ‘follows’ the expert through the design steps by recording the design decisions. In this case the users are new users of SymSpace, with a minimum viable product of a working, but not necessarily optimal, motor.

Composable engines are KEs which can interoperate and communicate. They can sit in the design space, for example with different purposes such as power control or energy consumption; they may be functional, such as a module that can classify data from a unit; or they may even be modules for operational mode only. Implementation is again covered below. Here there are groups of users, one for each KE and one for the system, probably as an operator.

System representations

A system of systems (SoS) is loosely defined by five attributes: i) operational independence of the elements, meaning the constituent systems may operate independently, ii) managerial independence of the elements, iii) evolutionary development, meaning the SoS is not fully formed but progresses, iv) emergent behaviour, meaning the purpose of the system is fulfilled by global behaviours, and v) geographical distribution, meaning constituents exchange information [SoS].

A digital twin (DT) can be thought of as a digital representation of a physical entity. From small electric motors to jet engines, a DT is a copy that is close enough to the original for the purposes of design or operation. DTs for the transport sector are scarce, as vehicles are proprietary and usually singly owned. Furthermore, a vehicle is subject to varying road conditions, driving situations and human behaviour [Towards]. An open question is whether knowledge engines can be viewed as a system of systems.

Data representation

Knowledge representation from a data perspective is well developed (and still developing). It is often coupled with reasoning systems, which constitute part of the AI field [Russell]. Ontologies, derived from philosophy, describe entities and their existence. Objects can be grouped according to the similarities and differences between them. For us they are useful where a vehicle moves in a context as well as in space and time; an example could be “machines numbered #40-100 report your condition monitoring data”. The de facto standard language, OWL, structures information and includes taxonomy and classification features [OWL]. A knowledge graph is an information base gathered from many sources, modelling attributes and their relations. Examples are Google’s graph, with 18B facts to enhance search results, and DBpedia’s graph of Wikipedia, with some 580M facts [DBpedia]. SymSpace already includes a knowledge graph to maintain and hold features of motor parts.
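
The context query quoted above can be illustrated with a toy triple store. This is only a sketch under stated assumptions: the triples, the predicate names (is_a, number, vibration_mm_s) and the helper functions are hypothetical, and a production system would more likely use OWL tooling or a graph database than a Python list.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, object]  # (subject, predicate, object)

# Hypothetical facts about three machines and their condition monitoring data.
TRIPLES: List[Triple] = [
    ("machine_39", "is_a", "machine"), ("machine_39", "number", 39),
    ("machine_39", "vibration_mm_s", 1.2),
    ("machine_40", "is_a", "machine"), ("machine_40", "number", 40),
    ("machine_40", "vibration_mm_s", 0.8),
    ("machine_77", "is_a", "machine"), ("machine_77", "number", 77),
    ("machine_77", "vibration_mm_s", 2.5),
]


def match(triples: List[Triple], subject: Optional[str] = None,
          predicate: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every field that is not None."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)]


def report_condition(triples: List[Triple], low: int, high: int) -> Dict[str, object]:
    """Answer: machines numbered low..high, report your condition data."""
    report: Dict[str, object] = {}
    for subject, _, number in match(triples, predicate="number"):
        if low <= number <= high:
            # Follow the graph from the machine to its monitoring value.
            for _, _, value in match(triples, subject, "vibration_mm_s"):
                report[subject] = value
    return report


report = report_condition(TRIPLES, 40, 100)
```

The query selects machines by an attribute (their number) and then follows a relation to the requested data, which is the kind of lookup a knowledge graph is meant to support.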

One method to “Automate Siggie” is to look at a simple rule-based representation. Where imperative programs say how to solve a problem through a sequence of commands, a declarative language like Prolog, Haskell, Lisp or ML describes what the problem is; in a logic language such as Prolog this is done by stating a number of rules from which a (possible) solution is found. This gives a rule-based predicate logic for representing knowledge that the apprentices of SymSpace could use, with the rules themselves being encoded by expert users such as Siggie. See the Implementation section below for examples.
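
To make the declarative idea concrete without leaving Python, the sketch below expresses rules as data and lets a small forward-chaining loop decide when each one fires. It is an illustration only: the rule names, the fact keys and the thresholds are invented placeholders, not design guidance from Siggie or SymSpace, and a Prolog engine would use backward chaining with unification rather than this loop.

```python
from typing import Callable, Dict, List, NamedTuple

Facts = Dict[str, object]


class Rule(NamedTuple):
    name: str
    condition: Callable[[Facts], bool]   # when does the rule apply?
    conclusion: Callable[[Facts], Facts]  # which new facts does it add?


# Placeholder rules; the second depends on a fact produced by the first,
# and they are deliberately listed out of order.
RULES: List[Rule] = [
    Rule("few_poles_for_high_speed",
         lambda f: f.get("speed_class") == "high",
         lambda f: {"pole_pairs": 1}),
    Rule("classify_speed",
         lambda f: f.get("rated_speed_rpm", 0) > 10000,
         lambda f: {"speed_class": "high"}),
]


def forward_chain(facts: Facts, rules: List[Rule]) -> Facts:
    """Fire every applicable rule once, repeating until nothing new is derived."""
    known = dict(facts)
    fired = set()
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.name not in fired and rule.condition(known):
                known.update(rule.conclusion(known))
                fired.add(rule.name)
                changed = True
    return known


design = forward_chain({"rated_speed_rpm": 12000}, RULES)
```

The expert only states the rules; the order in which they are applied is left to the engine, which is the property that makes such a representation attractive for capturing discrete design decisions.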

Existential questions like “is the representation format sufficient?” are important to ask when choosing a data representation. The representation format might also be questioned if the scope of KEs in general is extended beyond electric motors. If different KEs are coded with different formats (logic, graphs, databases), then an interchange format or protocol would need to be defined and evaluated. We have chosen to test ‘what resides where’ case by case, starting with the LCM motors. Any more elaborate KE (e.g. a digital twin) should encompass the motor base case.

Implementation

A language like Prolog is a suitable tool for representing relations, that is, symbols that express connections between their arguments. We therefore consider it a suitable choice as an engine to encode rules. By design decisions we mean factors such as the number of windings or poles of a motor: often discrete steps which can influence other steps and which can be codified. Composing KEs can be as simple as using callback functions, APIs (stateful or stateless), serialised data such as JSON, or virtualised compute with containers like Docker. As external partners contribute KEs, it is reasonable to think that the KEs will be composed, loosely as a system of systems. Graph databases such as Neo4j or GraphDB are good options for storing information in production, as is Prometheus for time series storage and for coupling to container environments like Kubernetes. Prometheus was originally developed at SoundCloud.
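
As a sketch of the simplest composition options mentioned above (callbacks exchanging serialised JSON), two toy KEs can be chained as follows. The engine names, the message keys and the compose helper are assumptions made for this example; the only domain content is the standard relation between mechanical power, torque and angular speed.

```python
import json
import math
from typing import Callable, List

KnowledgeEngine = Callable[[str], str]  # JSON message in, JSON message out


def speed_engine(message: str) -> str:
    """Hypothetical KE: convert shaft speed from rpm to rad/s."""
    data = json.loads(message)
    data["omega_rad_s"] = 2.0 * math.pi * data["speed_rpm"] / 60.0
    return json.dumps(data)


def power_engine(message: str) -> str:
    """Hypothetical KE: mechanical power = torque * angular speed."""
    data = json.loads(message)
    data["power_w"] = data["torque_nm"] * data["omega_rad_s"]
    return json.dumps(data)


def compose(engines: List[KnowledgeEngine]) -> KnowledgeEngine:
    """Chain engines so that each one receives the previous one's output."""
    def pipeline(message: str) -> str:
        for engine in engines:
            message = engine(message)
        return message
    return pipeline


motor_pipeline = compose([speed_engine, power_engine])
result = json.loads(motor_pipeline(json.dumps({"speed_rpm": 3000, "torque_nm": 10.0})))
```

Because each engine sees only a JSON string, the same functions could later sit behind an API or inside separate containers without changing their interface.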

Knowledge engine correctness

Correctness is a key issue, i.e. does the KE resemble the real world? In the LCM case, designs of up to 50 kW can be verified with real motors, which is closer to the field of verification. SymSpace is well equipped to perform such tests and at least give a tolerance on the specifications. For more generic KEs, we can rely on traditional software engineering methods such as unit and system tests. However, the assumptions made in the design and testing of a KE may be more difficult to prove correct. One method is to expose the thinking to groups further away from K24402, as well as to customers of LCM.
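
A tolerance check of this kind could look like the sketch below, which compares values predicted by a KE with measurements from a real motor. The function name, the quantity names, the 5% tolerance and all numbers are hypothetical and serve only to show the shape of such a test.

```python
import math
from typing import Dict, List


def out_of_tolerance(predicted: Dict[str, float], measured: Dict[str, float],
                     rel_tol: float = 0.05) -> List[str]:
    """List the quantities whose prediction is missing or outside tolerance."""
    failures = []
    for name, value in predicted.items():
        if name not in measured or not math.isclose(value, measured[name],
                                                    rel_tol=rel_tol):
            failures.append(name)
    return sorted(failures)


# Invented example values, not measurements from an LCM motor.
predicted = {"torque_nm": 10.0, "efficiency": 0.95}
measured = {"torque_nm": 10.3, "efficiency": 0.85}
failures = out_of_tolerance(predicted, measured)
```

Such a function drops straight into an ordinary unit test, so the comparison against real motors and the traditional software tests can share the same tooling.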

Example I: Electric motor design

LCM.

Example II: An entirely digitally designed vehicle, “ContextCar”

Setting: A future vision is a vehicle within a smart city accessing and partaking in data-driven services. Excluding low-level communication and specific vehicle details, ContextCar defines which primitives and services are needed for the vehicle to communicate with its environment. This is for the benefit of the driver, fellow motorists and city residents. Furthermore, services should not only be for singly owned vehicles, but also for fleets of vehicles, for example taxis. Essentially, ContextCar describes a software architecture for use in, and out of, the vehicle, as shown in Figure 1.

Design mode: i) designing the correct abstraction layers for communication protocols to access and store the right data, and the required amount of data, for the use cases defined in ContextCar, ii) finding an appropriate data exchange or conversion format suitable for a smart city, iii) satisfying timely data transfers in transport scenarios, iv) finding data representations for a context-aware vehicle, and v) maintaining trust, consistency and security of the gathered data. Each KE in this case would have a separate and independent representation. This is an ongoing project within RISE to extend twins into operations.

Operational mode (given Swedish sources): ContextCar needs access to data about vehicle operation that is held only by the OEM during the project. It will be important to identify these sources and how to process them, including anonymisation. It will also be important to identify potential business models and beneficial use cases for OEMs, operators and cities. ContextCar will also need access to open data sources such as weather data from SMHI, commuter information, e.g. from Trafiklab, highway and friction data as provided by Trafikverket, open city information from openlabsthlm, as well as mapping information from OpenStreetMap. These open data sources, as well as those in the OEM proprietary format, are candidates for the operational ContextCar.

Figure 1: ContextCar & environment

References

Much focus is on Industry 4.0, digital twins and digitalisation of traditional industries [Industry40].

[ContextCar] Ian Marsh (RISE), ContextCar: A digital twin prestudy, 2018.

[SoS] Jakob Axelsson, System of systems, personal blog.

[ERCIM] A special issue on digital twins, Oct. 2018.

[FMI] Functional Mock-up Interface.

[Measure] François Chollet, The measure of intelligence, arXiv:1911.01547, Nov. 2019.

[Johannes] Instantly deployable expert knowledge Network of Knowledge Engines.

[Russell] Stuart Russell & Peter Norvig, Artificial Intelligence: A Modern Approach.

[Towards] Towards a Generic IoT Platform for Data-driven Vehicle Services.

[Industry40] Industry 4.0, wikipedia entry.