20 Nov Identifying and Acquiring Knowledge
One of the simplest knowledge systems is a photograph. It consists of a systematically arranged collection of pixels, and its design is based almost completely on framing and focusing. Specifying knowledge software likewise involves framing the knowledge domain and focusing on the aspects that are meaningful to users and on the constraints that affect meaning. From this, you may correctly infer that I mean my system to serve people, not the other way around. My posts about planning and about determining requirements and environments make clear that the costs of building this system must be justified by human needs.
Because my goal is to build a knowledge system capable of understanding and translating human language (see my earlier post for Step 1), I’ll concentrate on acquiring knowledge in a way suitable for this type of knowledge system project. As we live in a commercially interconnected, but linguistically disjoint world, I believe the opportunity for improving linguistic interconnectedness is huge.
|Understanding Context Cross-Reference|
|knowledge worker acquisition||Lenat 1989 Carrico 1989 Davies 2009|
|domain constraints||Hewitt 1986 Minsky 1975|
|knowledge representation||Weiss 1984 Jakus 2013|
|inference MT||Nirenburg 1987 Wysocki 1997|
I discussed Step 1 in Planning a Knowledge Project. Now let’s determine where to go to find answers to the questions that define our need for a knowledge system.
Step 2 – Knowledge Source Identification and Acquisition
The work in the Knowledge Definition process includes finding the knowledge that will be needed to feed the software that empowers the knowledge workers. The two tasks in Step 2 are to find your sources of knowledge and get that knowledge into a usable digital form (Minsky 1975).
Step 2: Task 1 – Knowledge source identification and selection ==>
A primary source for many expert systems is human experts, and the way to get their knowledge is through the old-fashioned interview process. Most domains have experts whose brains you can pick. Since each domain can have different sources of knowledge, talking with experts is a good place to start: if they cannot provide the core knowledge themselves, they can at least point you to the best sources (Weiss 1984).
Typical knowledge sources for automatic translation of language are either human or published. Human sources include linguists and translators. Because the translation task is fairly straightforward, a linguist or two (preferably with experience in translation) should be permanently assigned to the development team. Published sources may include these items:
- printed and machine-readable dictionaries and bilingual dictionaries;
- grammars for the target languages;
- a large corpus or corpora for extracting salient language data automatically and for testing the grammars and the final product.
Many resources are available to help you fulfill these requirements. You want the ones that will be the most useful for the least cost in terms of money, time, and computational or data-entry requirements.
The Internet is a great place to search for useful data. Narrow the field by seeking sources that already have the data formatted in a way that will be easy to convert to the knowledge representation (KR) scheme. This overlaps both tasks of Step 3. After ranking the available knowledge sources, select the most cost-effective and complete set.
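The ranking step can be sketched as a simple scoring exercise. This is only an illustration: the source names, the cost and coverage figures, and the weighting in `score` are all hypothetical, and a real project would pick its own criteria.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeSource:
    name: str
    cost: float            # hypothetical combined cost (money, time, data entry), 0..1
    coverage: float        # hypothetical share of the needed knowledge supplied, 0..1
    easy_to_convert: bool  # already formatted for easy conversion to the KR scheme?


def score(source: KnowledgeSource) -> float:
    """Higher is better: reward coverage and convertibility, penalize cost."""
    bonus = 0.2 if source.easy_to_convert else 0.0
    return source.coverage + bonus - source.cost


def rank_sources(sources):
    """Return the sources ordered from most to least attractive."""
    return sorted(sources, key=score, reverse=True)


candidates = [
    KnowledgeSource("printed bilingual dictionary", cost=0.8, coverage=0.7, easy_to_convert=False),
    KnowledgeSource("machine-readable dictionary", cost=0.3, coverage=0.7, easy_to_convert=True),
    KnowledgeSource("raw web corpus", cost=0.2, coverage=0.4, easy_to_convert=False),
]

ranked = rank_sources(candidates)
for source in ranked:
    print(f"{source.name}: {score(source):.2f}")
```

With these made-up numbers, the machine-readable dictionary ranks first because it offers the same coverage as the printed one at a lower cost and in an easily converted format.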
Once you have identified the knowledge sources, you must turn the knowledge into data in a usable form. This may involve unlocking knowledge that is hard to get, either because it is locked in people’s brains in places they have trouble articulating, or because it is in unstructured formats that require intelligent parsing. It does not need to go directly into the final KR scheme, but, as you store it, put it into a form that will permit automatic conversion to the final format (XML or labeled rows and columns may be the easiest).
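As a minimal sketch of such an intermediate form, the example below stores a few entries as labeled rows and columns and then converts them automatically to XML. The column names (`headword`, `pos`, `translation`) and the `<lexicon>`/`<entry>` element names are assumptions chosen for illustration, not a prescribed schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical intermediate store: labeled rows and columns (CSV text).
RAW_ROWS = """headword,pos,translation
chat,noun,cat
manger,verb,eat
"""


def rows_to_xml(csv_text: str) -> ET.Element:
    """Convert labeled rows into an XML tree with one <entry> per row."""
    root = ET.Element("lexicon")
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.SubElement(root, "entry")
        for label, value in row.items():
            # Each column label becomes a child element of the entry.
            ET.SubElement(entry, label).text = value
    return root


lexicon = rows_to_xml(RAW_ROWS)
print(ET.tostring(lexicon, encoding="unicode"))
```

Because every value keeps its label, the same rows could later be converted to whatever final KR format the design step settles on.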
Step 2: Task 2 – Knowledge acquisition and extraction ==>
For the linguistic application domain, Task 2 (acquiring knowledge) will overlap with Step 3 (knowledge analysis and design). As machine-readable dictionaries and grammars are processed, the most logical approach is to automatically extract the information and convert it into the final form to be used by the inference engine and parser. These systems usually have an interactive component, so “automatically” is used loosely. In order to determine the final form, regularities in the structure of the data must be sought and described.
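One way to picture this semi-automatic extraction is a pattern that describes a regularity in the dictionary data, with anything that does not fit set aside for a person to review. The entry layout assumed here (headword, part of speech in parentheses, glosses separated by semicolons) is hypothetical; real machine-readable dictionaries each have their own layout.

```python
import re

# Assumed entry layout: headword, part of speech in parentheses, then glosses
# separated by semicolons. Real dictionaries will differ.
ENTRY_PATTERN = re.compile(
    r"^(?P<headword>\S+)\s+\((?P<pos>[a-z]+)\)\s+(?P<glosses>.+)$"
)


def parse_entry(line: str):
    """Return a dict for a line that fits the pattern, or None to flag it for review."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "headword": match.group("headword"),
        "pos": match.group("pos"),
        "glosses": [g.strip() for g in match.group("glosses").split(";")],
    }


lines = ["maison (n) house; home", "courir (v) run", "### damaged line"]
parsed = [parse_entry(line) for line in lines]

# Lines that break the regularity go to a human: the interactive component.
needs_review = [line for line, entry in zip(lines, parsed) if entry is None]
print(needs_review)
```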
Grammars have been designed as production systems. Many machine-readable grammars use top-down productions, so if the grammar fits your model, the knowledge acquisition task can be simplified by using one of these. If the grammar does not express the information required by the machine translation (MT) system, you may need to start from scratch (Nirenburg 1987). In such cases, the large corpora become much more important in the discovery process.
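To make "top-down productions" concrete, here is a toy recognizer that starts from the sentence symbol and expands productions downward until it reaches words. The grammar, the lexicon, and the function names are invented for this sketch; they are not taken from any particular machine-readable grammar.

```python
# Toy grammar: each nonterminal maps to a list of alternative productions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}

# Toy lexicon: word -> part of speech.
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N", "sees": "V", "sleeps": "V"}


def expand(symbol, words, pos):
    """Yield every position reachable after deriving `symbol` starting at `pos`."""
    if symbol not in GRAMMAR:
        # Preterminal: consume one word if its part of speech matches.
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield pos + 1
        return
    for production in GRAMMAR[symbol]:
        positions = [pos]
        for child in production:
            # Expand each child in turn from every position reached so far.
            positions = [end for start in positions for end in expand(child, words, start)]
        yield from positions


def recognizes(sentence: str) -> bool:
    """True if the whole sentence can be derived top-down from S."""
    words = sentence.lower().split()
    return any(end == len(words) for end in expand("S", words, 0))


print(recognizes("the dog sees a cat"))
print(recognizes("dog the sees"))
```

This sketch only accepts or rejects a sentence. An MT system would also need the structure that was built along the way, which is one reason an off-the-shelf grammar may not express everything required.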
For our application domain of MT, the expert knowledge includes at least two human languages (such as French and English). Understanding two languages implies knowing the words or lexicon of the languages and understanding their syntax or grammar.
Deliverables: Two languages’ worth of lexicon and syntax in machine-readable form.
MT Knowledge Sources
Dictionaries contain elements (words) that are generally the smallest lexical items useful in interpretation. Smaller chunks of meaning (sememes), however, might be useful in cases of unknown words compounded from a root and one or more affixes. The question of whether to only have single words in the lexicon or to have smaller elements such as morphemes and sememes or larger elements such as idioms with two or more words is a Step 3 (knowledge design) question. Clearly, the Step 3 question must be based on constraints discovered in Step 2 (knowledge definition).
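The idea of handling an unknown word by splitting it into a root and affixes can be sketched as follows. The roots, prefixes, and suffixes are a tiny hypothetical English sample, and the function simply tries every prefix and suffix combination; it does not handle spelling changes at the joins (for example, "happy" becoming "happi" in "happiness").

```python
# Hypothetical toy data; a real lexicon and affix inventory would be far larger.
ROOTS = {"happy", "kind", "read", "do"}
PREFIXES = {"un": "not", "re": "again"}
SUFFIXES = {"ness": "state of", "able": "capable of being", "ly": "in a manner"}


def decompose(word: str):
    """Try to split a word into (prefix, root, suffix); return None if no split fits."""
    for prefix in [""] + sorted(PREFIXES, key=len, reverse=True):
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + sorted(SUFFIXES, key=len, reverse=True):
            if suffix and not rest.endswith(suffix):
                continue
            root = rest[: len(rest) - len(suffix)]
            if root in ROOTS:
                return (prefix, root, suffix)
    return None


print(decompose("unkindness"))
print(decompose("xyzzy"))
```

Whether the lexicon stores only whole words or also stores pieces like these remains a design decision for Step 3; the sketch only shows what the smaller units would make possible.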
MT systems come in various flavors, ranging from mostly automatic systems to online translation aids such as bilingual dictionaries. These represent points on a continuum: automatic is at one end and interactive is at the other. In documenting requirements and test plans, and in writing the preliminary users’ manual, the MT system’s commitment to automation and interaction will have to be clearly stated. This, in turn, will aid in the decisions regarding KR and inference techniques. Also, most MT systems are domain-limited, such as those restricted to medical or automotive terminology, so finding where you are on the continuum will have to be part of the scope-narrowing process. It will also need to be described in the users’ manual.
Step 3 – Knowledge Design
Knowledge design has two tasks: definition and detailed design.
Step 3: Task 1 – Definition ==>
As mentioned earlier, waiting until Step 2 (knowledge definition) is completed to begin Step 3 is not advisable. Because the final KR scheme may dictate the types of knowledge that are necessary, the knowledge search can be facilitated by some idea, if not a complete specification, of the knowledge required for the KR scheme. In this sense, language can be a fickle mistress. Most people know how to use it, and many people know more than one language. Still, that does not mean we are experts on language.
In Europe, most people know more than one language, but being bilingual, or even being a linguist, does not mean a person knows enough to specify an MT system. This was part of the problem encountered by the multinational EUROTRA European Community automated translation project. Despite heavy funding and participation by many nations, the collaboration did not yield useful products in proportion to the investment. An inadequate KR scheme contributed to the problem.
This task corresponds to back-end design in a database application model. The back end contains the database and the DBMS functions (see Section 9). The definition task produces these artifacts:
- Knowledge model
- Rule base model
- Entity relationship model or ontology taxonomy
- Object role model
With these artifacts in hand, your project engineers will have a sound working foundation.
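As a rough illustration of one of these artifacts, an ontology taxonomy can be pictured as concepts linked to their parents, with an "is a" check that walks up the chain. The `Concept` class and the sample concepts are hypothetical and far simpler than a real knowledge model would be.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    name: str
    parent: Optional["Concept"] = None  # None marks the top of the taxonomy

    def is_a(self, other: "Concept") -> bool:
        """True if `other` is this concept or one of its ancestors."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


# Hypothetical sample taxonomy.
entity = Concept("entity")
animal = Concept("animal", parent=entity)
dog = Concept("dog", parent=animal)
vehicle = Concept("vehicle", parent=entity)

print(dog.is_a(animal), dog.is_a(vehicle))
```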