10 Dec Measuring Knowledge
Sometimes you need to know about your knowledge. When you’re in the middle of trying to build a system that knows stuff, you may ask, how much does the system know after this training or learning cycle as a percent of the total knowable amount? When we test students in their learning cycles, we use a set of metrics from ABC grades to college entrance exam scores to see how well they are learning. There are many standard tools used for measuring the quality and completeness of data, and they are often categorized as “Data Profiling” tools. “Data profiling tools track the frequency, distribution and characteristics of the values that populate the columns of a data set; they then present the statistical results to users for review and drill-down analysis” (TechTarget on data profiling).
Data, however, has different, probably simpler characteristics than knowledge, and the differences could make traditional data profiling tools less useful. The data profiling process typically captures metrics applicable to the values in individual columns of data, such as the number or percentage of populated versus empty fields, the number of unique values, and the frequency of occurrence for each unique value and for unique patterns in the data. The profile may also include the maximum and minimum values, along with information on data types and the length of character strings in data elements. “More sophisticated data profiling software will provide additional details about dependencies between columns and relationships across tables. The goal is to identify data anomalies or shared characteristics among data elements” (ibid).
Evaluating knowledge involves looking at fragments of knowledge in the context of their categories and associations. A knowledge profile would look for problems such as:
- Categories, knowledge fragments or associations missing when they should be present
- Weak or poorly represented categories of knowledge values and associations
- Invalid knowledge fragments or associations present when they should be absent
- Knowledge fragments or associations appearing in one or more categories with unexpectedly high or low frequency
- Knowledge fragments, associations or category distributions that don’t match an expected pattern or format
- Outlier knowledge values that fall far below or far above a defined range
The following metrics are associated with determination of the range of performance or status characteristics and their distributions:
Unlike the metrics above for measuring knowledge, nonparametric (distribution-free) statistics rely on fewer and less strict assumptions. They may, therefore, be even more useful for fuzzy or irregular data with characteristics such as:
- Distinctly non-normal elements that cannot be transformed
- A sample set too small for the central limit theorem to lead to normality of averages
- A distribution not covered by parametric methods
- Elements from an unknown distribution
- Nominal or ordinal data sets
I’ll address techniques briefly in this post, but please follow the references for a more in-depth understanding (iSixSigma):
- Making Sense of Mann-Whitney Test for Median Comparison
- Using the 1-Sample Sign Test for Paired Data
- Understanding the Uses for Mood’s Median Test
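As one illustration of the nonparametric tools listed above, here is a minimal Python sketch of an exact two-sided sign test for paired data. The function name `sign_test` and the sample scores are hypothetical, chosen only for this example; the test itself assumes nothing about the distribution of the values, only that each pair can be compared.

```python
from math import comb

def sign_test(before, after):
    """Exact two-sided sign test for paired samples (illustrative sketch).

    Returns (positives, n, p_value), where n is the number of pairs
    that actually differ and positives counts pairs where after > before.
    """
    # Keep only the pairs that changed; ties carry no sign information.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return positives, n, p_value

# Hypothetical test scores before and after one learning cycle.
before = [10, 12, 9, 11, 13, 8, 10, 12]
after = [12, 14, 10, 13, 15, 9, 12, 11]
positives, n, p_value = sign_test(before, after)
print(positives, n, round(p_value, 4))
```

A small p-value suggests the learning cycle shifted the scores in one direction more often than chance alone would explain.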
Any of these tools can be of service to the expert system designer. Given a knowledge source or sample set, we can create processes to simply ingest the data, or to learn it. Neural networks and other knowledge learning models are evaluated by comparing the output set with the expected outputs. For neural networks, the comparison is strictly statistical: with each cycle, the outputs should grow statistically more accurate if the network is properly set up. When the data is semantic (i.e., in a non-neural format) and is being learned as ontology or rules, the metrics may require a combination of statistical and semantic comparison. They can then be used to evaluate the amount of knowledge correctly learned and to determine the percentages of false positives and false negatives.
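The comparison of learned knowledge against expected knowledge can be sketched in Python as a set comparison. This is only an illustration under simple assumptions: knowledge fragments are represented as hypothetical (subject, relation, object) triples, matching is exact, and `evaluate_learned` is a name invented for the example. A real system would likely need semantic matching as well.

```python
def evaluate_learned(expected, learned):
    """Compare learned knowledge fragments with the expected set.

    Percentages are relative to different bases:
    - percent_learned and percent_false_negative: share of the expected set
    - percent_false_positive: share of the learned set
    """
    expected, learned = set(expected), set(learned)
    correct = expected & learned      # learned and expected
    false_pos = learned - expected    # learned but should be absent
    false_neg = expected - learned    # expected but missing
    return {
        "percent_learned": 100.0 * len(correct) / len(expected) if expected else 0.0,
        "percent_false_positive": 100.0 * len(false_pos) / len(learned) if learned else 0.0,
        "percent_false_negative": 100.0 * len(false_neg) / len(expected) if expected else 0.0,
    }

# Hypothetical triples, for illustration only.
expected = {
    ("robin", "is_a", "bird"),
    ("bird", "can", "fly"),
    ("penguin", "is_a", "bird"),
    ("bird", "has", "feathers"),
}
learned = {
    ("robin", "is_a", "bird"),
    ("bird", "can", "fly"),
    ("bird", "has", "feathers"),
    ("penguin", "can", "fly"),   # invalid association
}
print(evaluate_learned(expected, learned))
```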
When the system is used to solve problems of different categories, you can define a topological space with different areas or sectors of the topology assigned to each category. I selected this image to represent a topology, even though it doesn’t show hierarchy and inheritance, because it represents two of the most important attributes of a topological space: 1) spheres showing distinct groupings or categories (that may have a tight interconnection structure within the category), and 2) interconnections between categories that may be looser and irregular. If the approach is to run automated learning algorithms and test the acquired knowledge against a set of problems the knowledge should be able to solve, then measure the success rate for different classifications of such problems. Each category of problems can have its own sector in a general topology, or each rule can have its own sector in a category-specific topology.
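Measuring the success rate per category of problem can be sketched as follows. The category names and the `(category, solved)` result format are assumptions made for this example, not part of any particular system.

```python
from collections import defaultdict

def success_rate_by_category(results):
    """Return the fraction of solved problems for each category.

    `results` is an iterable of (category, solved) pairs, where
    `solved` is True when the system handled the problem correctly.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for category, solved in results:
        totals[category] += 1
        if solved:
            passed[category] += 1
    return {category: passed[category] / totals[category] for category in totals}

# Hypothetical test outcomes for two problem categories.
results = [
    ("diagnosis", True),
    ("diagnosis", False),
    ("scheduling", True),
    ("scheduling", True),
]
print(success_rate_by_category(results))
```

Categories with a low success rate point to the sectors of the topology where knowledge is weak or missing.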
Besides evaluating the knowledge in a system, the statistical methods described can be used to help deliver knowledge to end users in the form of numeric and graphical reports and dashboards. The trick is to use the right tool for the task at hand. For example, if the requirements for the system specify an ability to perform predictive trend analysis, frequency distribution, dispersion, and normal curve can be good tools. You may also need to add specialized tools such as moving averages to show progress or decay over time.
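A simple moving average, one of the specialized tools mentioned above, could look like this in Python. The function name and the weekly accuracy figures are hypothetical, and this sketch uses an unweighted trailing window; weighted or exponential variants are equally plausible choices.

```python
def moving_average(values, window):
    """Unweighted trailing moving average.

    Returns one average per position where a full window is available,
    so the result has len(values) - window + 1 entries.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical weekly accuracy scores for a learning system.
weekly_accuracy = [62, 65, 71, 70, 76, 80]
print(moving_average(weekly_accuracy, 3))
```

A rising series suggests progress over time, while a falling one suggests decay.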
The equations for these metrics are relatively simple and easy to implement. For example, to derive the arithmetic mean of a set of n numbers, a common measure of central tendency, add the numbers in the set and divide the sum by n. Central tendency metrics can be used both in mining information, especially unstructured information from large data sets such as corpora or web pages, into graph data stores, and in testing the proximity of learned knowledge once stored in a graph such as a semantic network.
The trick is in applying the metrics to the most meaningful data or knowledge objects. For the arithmetic mean equation, it is critical to define the list “a”, its membership, and its size “n” to give the most useful insights.
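Assuming the standard definition, with a_1 through a_n standing for the members of the list “a” and n for its size, the arithmetic mean can be written as:

```latex
\bar{a} = \frac{1}{n} \sum_{i=1}^{n} a_i = \frac{a_1 + a_2 + \cdots + a_n}{n}
```

For example, for the list a = (2, 4, 9), n = 3 and the mean is (2 + 4 + 9) / 3 = 5.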
The intent of looking at these tools is to find those that will be most appropriate for providing the information that experts or other users will need to make better decisions.
Central tendency is another valuable resource for analysis. Using the several measures of central tendency, analysts can determine whether data fall within the range of “normal,” “exceptional” or “anomalous.” Normal results would gravitate to the center or fall close to a large cluster of other results. Exceptional values are those further from the center of mass but within the limits of previously observed data. Anomalous data violates the mold in some way and falls outside the expected range. These are the measures of central tendency:
- MEAN: The mean or average is probably the most reliable and widely used measure of central tendency. It is calculated by adding the values of all cases in a data set and dividing by the number of cases.
- MODE: Mode is another extremely useful figure in that it indicates the “center of mass”: the point on the scale with the maximum frequency.
- MEDIAN: The median is the point on a scale of measurement above which exactly half the cases occur and below which the other half occur.
- CENTILE: The centile scale is divided into 100 units or centiles that indicate the rank of the value in relation to the entire set of values.
When data is not grouped, the mode is the measurement that occurs most frequently. If the data is symmetrical, the mean, mode and median will be clustered together. There may not be any actual cases that occupy the median point. The median is found by sorting the cases and taking the middle value, or by splitting the difference between the two middle values when the number of cases is even. Splitting the difference between the top and bottom values gives the midrange instead, which can be useful in coarse ranking. The median value is the 50th centile or percentile; the 80th centile lies above 80 percent and below 20 percent of the cases. Centile is most useful in establishing rank.
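The measures of central tendency described above can be computed with Python’s standard `statistics` module. The helper names `central_tendency` and `centile_rank` are invented for this sketch, and `centile_rank` uses one simple convention (the percentage of cases strictly below a value); other percentile conventions exist.

```python
import statistics

def central_tendency(values):
    """Return the mean, median and mode of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),  # averages the two middle values when the count is even
        "mode": statistics.mode(values),      # most frequent value
    }

def centile_rank(values, x):
    """Percentage of cases that fall strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100.0 * below / len(values)

# Hypothetical scores, for illustration only.
scores = [1, 2, 2, 3, 4, 5, 6, 7, 8, 12]
print(central_tendency(scores))
print(centile_rank(scores, 8))
```

Note that the median here (4.5) is not an actual case in the data, which matches the point made above.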
Frequency distribution describes the range of values, intervals and frequencies of a data set. It is useful in determining average or typical values, amount of variability, magnitude of individual differences, and the general dispersion of values within the range. Scatter diagrams such as the one shown here are useful representations of frequency distribution.
Determining frequency distribution is essential to deriving most of the other salient results in statistical analysis. This section does not address determination of frequency distribution per se, but the results of frequency distribution analyses are an excellent starting point for establishing weights for fuzzy analysis techniques.
One of the ways we can teach an expert system (or a robot such as MIPUS) to learn is to develop frequency distribution models for certain tasks at certain times of the day. If, for example, the robot keeps track of all tasks performed and the times during which they are required, these would serve as a basis for predicting upcoming tasks. With this data, it could review scatter diagrams during idle times and build up a set of expectations for the most likely upcoming tasks.
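A frequency distribution model of tasks by time of day can be sketched with counters. The log format of `(hour, task)` pairs, the task names, and the function names are all assumptions made for this example; it predicts only the single most frequent task seen for a given hour.

```python
from collections import Counter, defaultdict

def build_task_model(log):
    """Count how often each task was performed in each hour of the day."""
    model = defaultdict(Counter)
    for hour, task in log:
        model[hour][task] += 1
    return model

def predict_task(model, hour):
    """Return the most frequent task for the hour, or None if none was seen."""
    if hour not in model or not model[hour]:
        return None
    return model[hour].most_common(1)[0][0]

# Hypothetical task log as (hour of day, task) pairs.
log = [
    (7, "make coffee"),
    (7, "make coffee"),
    (7, "fetch paper"),
    (18, "cook dinner"),
]
model = build_task_model(log)
print(predict_task(model, 7))
print(predict_task(model, 3))
```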
Sometimes this type of model can backfire. When, for example, the system predictively begins a time-consuming task just as a more important task is about to come due, it may be unavailable for the priority task. This has happened to MIPUS several times when he became enamored with new kitchen appliances.
There are no standard benchmark suites for comparing the capabilities or content of knowledge systems. As you consider how to apply quality assurance or other testing to your knowledge system, look at the range of tools available, including nonparametric methods, and choose those that will give you the most useful benchmarks for describing its capabilities, completeness of knowledge, and accuracy of rules and processes.