09 Jun In Search of Depth and Breadth
Depth-First Search
Search is one of many functions needed in intelligent systems. Web search in systems like Google, Bing, Yahoo and Alta Vista use complex algorithms to help you find the information you want based on words. Words are patterns of letters strung together in a unique way. Searching is a kind of pattern matching. An easy search to implement is the depth-first search. In depth-first searches, you traverse nodes down until you find a match or hit the end. If you hit the end without a match, you go back up until you can go down again to a node you have not yet visited. This video from Nathaniel Fan demonstrates depth-first search for a graph (such as s semantic network).
The search space in these tutorials and examples is so small that, no matter how you search it, you’ll get an answer quickly. Imagine a search space with 5 million nodes instead of 9 or 18. You could speed up the search by limiting the depth to which the search would go in each pass to an arbitrary number of generations (say, great, great, great grandchildren). Then, if the search failed to reach a goal in the first pass, it would search the next five generations. We’ll call the technique of limiting the number of generations to search in a single pass depth-limited search. In a small search space, a depth-first search takes may take fewer jumps, while a depth-limited search, limited to very few generations may take more. In very large spaces, the calculus may be the opposite, especially in a graph optimized for conceptual proximity.
Understanding Context Cross-Reference |
---|
Click on these Links to other posts and glossary/bibliography references |
|
|
Prior Post | Next Post |
Survival of the Fittest Knowledge | |
Definitions |
References |
Go Gate IIT on DFS | |
Nathaniel Fan on BFS | |
Winston 1984 |
Breadth-first searches keep track of each generation, eliminating the need to retrace steps through parents and grandparents (see Bill Byrne’s video on Search Techniques). The value of breadth-first searches becomes extremely apparent in larger (or deeper) search spaces. In our example of the 5 million node network, however, the relative efficiency of a depth-first versus a breadth-first search would be difficult to predict without knowing the structure of the data or the relative frequency of specific searches or goals.
Optimizing Search
Now, consider what would happen if you knew some unique characteristic of the search space, such as the relative frequency of each goal. In many situations, such as spell checking, you can use simple measurements to determine an optimal ordering. The words the, an, some and in appear more frequently in English-language documents than words like starcruiser and fishmonger. Many of the techniques used to optimize search algorithms are described as heuristic searches.
Among search techniques well-suited for heuristic searches are hill climbing, beam search and best first. Hill climbing traverses nodes based on the relative distance between nodes. This might be a good approach for the traveling salesman problem. Beam search limits the search to an area of the network or tree known to be most likely to contain the goal. Best-first search is useful when groupings of nodes have similar characteristics because it automatically gravitates toward the goal of the search.
Some search techniques, such as hill climbing, organize the data into a search space that resembles a mountain range with peaks and valleys. Reaching the goal in these models means getting to the top of the highest peak. Recognizing the highest peak can sometimes be a challenge, as shown at left. For a good overview of these and other useful AI search techniques, see Winston, 1984.
In our attempt to optimize search mechanisms, it is important that we avoid some of the minefields that can make a search fail to find the right or the best match. One of those problems is the pitfall of local minima. If we are trying to get to the peak, and we get to a high place and start down again, the clouds may deceive us into mistakenly thinking we were at the peak. Remember, computers always operate in the clouds, unable to see what’s ahead.
The saddle between peaks in the image above could represent a local minimum that could trap less robust search mechanisms looking for the lowest point and finding the terrain is rising again, thus incorrectly assuming that they have hit bottom.
As I proceed through this analysis, I’d like to show how to select a search technique based on the characteristics of the knowledge we are searching.
Click below to look in each Understanding Context section |
---|