Content-Based Searching of Large Image Databases

Mary Lynette Larsgaard

Introduction

It all started so harmlessly. An employee of ESRI (Environmental Systems Research Institute - Arc/Info software) attending a meeting, concerning online access to spatial data, at the Map and Imagery Lab, Davidson Library, University of California at Santa Barbara asked the librarian (me) if it were possible to use an online catalog to find, for example, all the maps the library had that had beaches. I replied, aghast, that that wasn't the way cataloging worked, that for a cataloger to itemize, for each piece cataloged, map or air photo, every generic feature on that item would be impractical because of the time involved. Strangely enough, it was the first time in 27 years of working in map libraries that I'd ever been asked that question.

Time passed. The U.S. Federal Geographic Data Committee sent out the drafts, and eventually the final version, of its Content Standard for Digital Geospatial Data. And the idea appeared again, in the "Preliminary Revised Draft, Content Standards for Spatial Metadata, July 23, 1993", as:

"Geographic Keyword Type -- the geographic type of significant areas and or places that fall within the extent of the data set. Type: text Domain: airport arch area arroyo bar basin bay beach bench bend bridge building canal cape cave cemetery channel church civil cliff crater crossing dam falls flat forest gap geyser glacier gut harbor hospital island isthmus lake lava levee locale mine oilfield park pillar plain populated place range rapids reserve reservoir ridge school sea slope spring stream summit swamp trail tower tunnel valley well woods " [problem: no others allowed!]

But by the time of the "final" standard, this field had disappeared, perhaps on the grounds that such information would or could appear in Section 5, Entity and Attribute data (for data in geographic information systems - GIS). The following paper is a presentation from the point of view of an extremely interested librarian/cybrarian as to how the Alexandria Digital Library is attempting to deal with this matter.

Alexandria Digital Library (ADL)

First, a few words on what Alexandria is and does. The primary goal of the Alexandria Project is to design, implement, and deploy a digital library for spatially-indexed information. A digital library supporting such materials is needed because spatially-indexed information is an extremely valuable resource in many applications but is currently costly to access, or - as a general rule - impossible to access off-site, since materials may be too valuable or too fragile to send through interlibrary loan. Many important collections of such information, such as maps, photographs, atlases, and gazetteers, are currently stored only in non-digital form, and collections of considerable size and diversity are found only in the largest research libraries. Although a growing amount of such information is available in digital form, it is still inaccessible to most individuals. The Alexandria Digital Library (ADL) will provide a framework for putting these collections online, providing search and access to these collections to broad classes of users, and allowing both collections and users to be distributed throughout the Internet.

The development of ADL commenced in late 1994 as part of a national Digital Library Initiative sponsored by NSF, ARPA (Advanced Research Projects Agency), and NASA. The Alexandria Project at the University of California at Santa Barbara (UCSB) is one of six projects supported under the Initiative. These projects are viewed by their sponsors as the cornerstone in a national effort to develop digital libraries. The remaining five projects are located at Carnegie Mellon University, the University of Illinois at Urbana-Champaign, the University of Michigan, the University of California at Berkeley, and Stanford University. There is significant cooperation among the six projects.

The Alexandria Project is a consortium of several groups. Academic areas from UCSB include: the Map and Imagery Laboratory, the Department of Computer Science, the Department of Electrical and Computer Engineering, the National Center for Geographic Information and Analysis (NCGIA), and the Center for Remote Sensing and Environmental Optics. This team is augmented by researchers from the NCGIA at SUNY at Buffalo and the University of Maine at Orono. Libraries participating in the project include the Library of Congress, the University of California Division of Library Automation, the library of SUNY (Buffalo), the library of the United States Geological Survey, and the St. Louis Public Library. Other partners include AT&T, Digital Equipment Corporation (DEC), Environmental Systems Research Institute (ESRI), E-Systems, Lockheed, San Diego Supercomputer Center, US Navy, Xerox, and Excalibur.

The strategy adopted by the Alexandria Project for achieving its goals is that of an incremental approach to the development of a digital library comprising many nodes distributed over the Internet. Each node will support a variety of library components that include interfaces, catalogs, databases, and ingest facilities. The two major classes of user activity supported by ADL include access to many classes of spatially-indexed materials and the application of procedures that extract useful information from accessed items. Access to ADL for users may, for example, take the form of browsing, viewing, processing, and downloading data and metadata. ADL will incrementally extend the services offered by analog libraries to include services - such as content-based searching of images - that are economically feasible only with the use of digital technology. While the initial focus of ADL is on accessing and processing geographically referenced materials, there will be phased extensions to more general classes of spatially-indexed and textual materials.

During its first six months, the Alexandria Project completed the design and implementation of a successful "rapid prototype" system (RPS). The RPS is a "stand-alone" digital library built primarily from three large software packages: the Sybase relational database management software; the Tcl/Tk scripting language and user interface toolkit; and the ArcView GIS. The RPS involves interface, catalog, storage, and ingest components and is running in the Map and Imagery Laboratory at UCSB, which has 5.2 million items, of which approximately 4.7 million are remote-sensing imagery (aerial photographs and Landsat satellite images in the main). The collections supported by the RPS are a small group (ca. 100) of geographically-referenced materials, such as maps, satellite images, digitized aerial photographs, and associated metadata. These collections are focused on Santa Barbara, Ventura, and Los Angeles Counties in southern California.

During its second six months, the Alexandria Project is extending the RPS to a system comprising multiple instances of the ingest, storage, catalog, and user interface components distributed over the Internet. In line with its basic strategy, the second version of ADL will be connected to the World Wide Web (WWW), aiming for the end of 1995. The collections supported by the initial Web version of ADL will include graphical materials involving more general forms of spatially-indexed and referenced materials, such as astronomical images, digitized plans, digitized images of artwork, multi-media, and remote services such as WWW sites. For further information, visit the ADL's Website: http://alexandria.sdc.ucsb.edu/

The Challenge

Categories of Spatial Data

Spatial data is, for the purposes of this presentation, limited to data that has coordinate references; it may be of Earth or of any of the other planets in our solar system, or the universe. We do need a few more definitions of terms. The term "images," when used on its own in this paper, refers to any graphic; "remote-sensing images" refers to spatial data, such as aerial photographs and satellite images. In its most general usage, the latter term is considered to include medical imaging also, which is not at this point a part of Alexandria's sphere of work. Spatial data may be hardcopy or digital; the most common forms are: maps; atlases; profiles; sections; views; diagrams; remote-sensing images; globes; and models. Globes and models depict surfaces three-dimensionally, while all the rest depict them in two dimensions. For the purposes of this presentation, it works well to divide them into two groups, maps and remote-sensing images.

Remote-sensing images are photographs (aerial) and images from non-camera sensors, such as scanners. Maps are everything else in the list, since they are in each case selections, abstractions and symbolizations; the cartographer, whether working with analog or digital materials, is first selecting what information is to be shown, summarizing and shrinking it (in spite of Lewis Carroll's statement that the really useful scale is 1:1), and using symbols to represent that information. Remote-sensing images, at least at the time of collection, are very different in that the selection is on the basis of what the sensor (a camera, or a non-camera sensor such as a scanner) is able to "see" in the electromagnetic spectrum (e.g., visible light only; portions of visible light plus infrared; and so on). Human beings working with remote-sensing images do indeed do selection, along with some abstraction and symbolization, in the process of manipulation. Let's deal with maps first.

Hard-Copy Items

Maps of any pedigree whatsoever - and of any usefulness - are accompanied by a legend, which explains what the symbols (and here color must be understood to be a symbol or a part of a symbol) used on the map are intended to mean. Medium- and large-scale series (especially topographic and geologic series) have separately issued symbols sheets that relate to the series as a whole. The U.S. Geological Survey, for example, has a symbols sheet for its suite of topographic series (mainly 1:250,000, 1:100,000, 1:62,500, 1:24,000) and for its geologic series (primarily 1:24,000, 1:62,500, 1:250,000).

Click: Geology symbols sheet. Topographic symbols sheet, in 5 separate files, one panel per file. Section of 2 USGS topographic sheets. (1)

These symbols sheets that are intended to serve a multitude of scales assume the human ability to understand that a symbol may be a different size on each of several different maps (e.g., rivers of varying widths) but still represents the same feature; in some cases (e.g., the benchmark symbol) the symbol may be the same size on each map. Because this paper is to be available over the Web, these items were scanned at 150 dots per inch so that the files would be relatively small; even so, the map is about 5 megabytes.

Aerial photographs also come in many different scales - the Map and Imagery Lab has them from about 1:6,000 to about 1:160,000. Interpreting aerial photographs is by no means a simple matter; geography and geology departments offer classes either on the subject as a whole, or as a part of field-work classes (Avery and Berlin 1992; Sabins 1987). Interpreting the products of non-camera sensors (e.g., Landsat Thematic Mapper; SPOT imagery) is even more complex, with not just courses but degrees focussing on the subject. A major difference from aerial photography is that very often the data is collected in digital form in the first place and it is therefore always most efficient and effective to manipulate it using digital equipment, which brings us to the next point.

Digital Items

Here there are two main categories: either items are initially produced in hardcopy and are changed into digital form, by scanning (raster; data as pixels) or by digitizing (vector; points connected by arcs); or the data is collected in digital form in the first place, e.g., satellite imagery (non-camera sensors) or GIS (geographic information systems). It is important to note here that digital files of spatial data tend to be relatively large. A black and white aerial photograph, nine inches by nine inches, scanned at 600 dots per inch is 26 megabytes, while a color aerial photograph is 98 megabytes; a full SPOT satellite image can be 500 megabytes, and in the near future, EOS (Earth Observing System) images will be 1.5 gigabytes each.
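A rough feel for where such file sizes come from can be had with a back-of-the-envelope calculation. The sketch below is only an estimate of uncompressed size; it assumes one byte per pixel for black and white and three bytes per pixel for color, and it ignores file-format overhead, so its results are of the same order as the figures quoted above rather than identical to them. The function name is invented for illustration.

```python
def raw_scan_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Uncompressed size in bytes of a scan, ignoring file-format overhead."""
    pixels_wide = round(width_in * dpi)
    pixels_high = round(height_in * dpi)
    return pixels_wide * pixels_high * bytes_per_pixel


# A 9 x 9 inch aerial photograph scanned at 600 dots per inch.
gray_bytes = raw_scan_bytes(9, 9, 600, bytes_per_pixel=1)   # assumed 8-bit grayscale
color_bytes = raw_scan_bytes(9, 9, 600, bytes_per_pixel=3)  # assumed 24-bit color

for label, size in (("black and white", gray_bytes), ("color", color_bytes)):
    print(f"{label}: {size:,} bytes, about {size / 2**20:.1f} MiB")
```

The actual size of any given file depends on the area scanned, the bit depth, and the file format used.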

Finding Features

For both of these general types - maps; remote-sensing images - visual interpretation, either by a researcher or by a cataloger, is extremely time-consuming, both in terms of training and of time required to extract information. In addition, the human eye, marvelous though this first ever remote sensor is, has a limited ability to differentiate between tones; it is also quite difficult for an interpreter to analyze large numbers of images simultaneously (Bow 1992 pp. 16-17). It is desirable to have computer software that can identify features for users, as spectral signatures do to some degree. In a way, this is expert systems revisited: an expert in a field such as aerial-photograph interpretation communicates object-recognition knowledge to software, and thus to other persons.

There are two ways to develop a database for content-based searching. One way is to have a cataloger garner all generic features on each item cataloged and record latitude and longitude information for each feature. This is very time-intensive and is not practical in the vast majority of cases. The second way is to have computer software process digital files and then search for features. The human eye (and brain) have a limited ability to discern tonal values, and to analyze numerous images simultaneously; human interpretation is more qualitative than quantitative. It is thus far better in terms of accuracy and consistency of interpretation to have numerical values manipulated by a computer. This latter method is the focus of this paper.

Raster Data

Once again, we go to a division, based on whether or not data are initially collected in digital form. Let us begin with items collected in digital form, and specifically with those items such as non-camera satellite images that are in raster form, as for example the Landsat satellite MSS (MultiSpectral Scanner) and TM (Thematic Mapper). In each case, the sensor senses spectral reflectance in a number of different bands of the electromagnetic spectrum. The idea is that different items on the Earth's surface reflect energy from a source (e.g., the Sun) differently and therefore each will have a unique spectral signature. The researcher first locates on the Earth's surface an area that the researcher knows contains objects of interest to the study in hand; this is called "ground-truthing." It is most accurate when the researcher actually visits a spot in question at the time that the sensor is scanning the area. The researcher then requests that the software search a digital image, looking for exactly the same spectral signature as appears at a given spot on that digital image, and display all occurrences of areas with that spectral signature. The problem with this is that if the bands are relatively few in number, and the bands themselves relatively broad, there will be many items that have the same spectral reflectance as other items. For example, when thus queried, software will show an MSS image of the area of southern California, centered on Santa Barbara County, with kelp not only off-shore, but also up in the hills behind the city of Santa Barbara; it seems that certain hill/pond vegetation has the same MSS spectral signature as does kelp. But the more bands one has, the larger the image file will be.
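The signature search described above, and the confusion that arises when there are few bands, can be sketched in a few lines. This is a minimal illustration and not the behavior of any particular remote-sensing package; the reflectance values, the tolerance, and the function names are all invented. One pixel stands in for the ground-truthed kelp site, and another stands in for hill vegetation that differs from kelp only in a third band.

```python
def matching_pixels(image, signature, tolerance=0):
    """Return (row, col) of every pixel whose band values all lie
    within `tolerance` of the reference signature."""
    hits = []
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if all(abs(p - s) <= tolerance for p, s in zip(pixel, signature)):
                hits.append((r, c))
    return hits


def keep_bands(image, count):
    """Simulate a coarser sensor by keeping only the first `count` bands."""
    return [[pixel[:count] for pixel in row] for row in image]


# Invented reflectance values (0-255), three bands per pixel, 2 x 3 image.
# (0, 0) is the ground-truthed kelp site; (1, 2) is the hill vegetation.
full_image = [
    [(40, 120, 200), (42, 118, 198), (10, 30, 60)],
    [(90, 90, 90), (10, 32, 58), (41, 119, 80)],
]
two_band = keep_bands(full_image, 2)

coarse_hits = matching_pixels(two_band, two_band[0][0], tolerance=3)
fine_hits = matching_pixels(full_image, full_image[0][0], tolerance=3)

print("two bands:  ", coarse_hits)  # includes the hill pixel (1, 2)
print("three bands:", fine_hits)    # the hill pixel is no longer matched
```

With only two bands the hill pixel is reported as kelp; the third band separates the two, at the cost of a file half again as large.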

Vector Data

Another way to search for like features is to have a geographical information system (GIS) database that includes a layer which records that feature. For example, a GIS of Santa Barbara County that has DLG (Digital Line Graph) files from the U.S. Geological Survey, or TIGER (U.S. Bureau of the Census) data, may be queried for airports, or for roads, or for a number of other features for which there already exist data in digital form that separate out these features. Indeed, the query that started this paper - show me all the beaches in a given area - is a fairly simple GIS-type enquiry. But if you need a subdivision that goes beyond the way in which the data were collected - if, for example, you want to look at all four-lane highways, and the only breakdown in the original data collection was by material (e.g., concrete/asphalt; gravel/dirt) - then you are out of luck. Every feature that is separately identified is tied to a list of attributes, which includes latitude/longitude or some other location device. As long as information has been recorded in numeric attribute fields, you may request high, low, median, mean, and standard deviation values. This is obviously the state of matters where everyone wants to end up! But there is a considerable amount of work and thus expense to do this. Estimates as to the percentage of the total cost of a GIS that is spent on collecting data and putting it into digital form are up around 80 percent.
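The kind of attribute query described above can be sketched with plain records standing in for a GIS layer. This is not the interface of any actual GIS product; the road records, attribute names, and helper functions are invented for illustration. The sketch selects features by a recorded attribute, summarizes a numeric field, and shows that a query on an attribute that was never collected (number of lanes) simply returns nothing.

```python
from statistics import mean, median, pstdev

# Hypothetical road features; every name and value here is invented.
roads = [
    {"name": "Road A", "surface": "concrete/asphalt", "length_km": 12.0,
     "lat": 34.42, "lon": -119.70},
    {"name": "Road B", "surface": "concrete/asphalt", "length_km": 8.0,
     "lat": 34.44, "lon": -119.83},
    {"name": "Road C", "surface": "gravel/dirt", "length_km": 3.5,
     "lat": 34.51, "lon": -119.79},
    {"name": "Road D", "surface": "concrete/asphalt", "length_km": 4.0,
     "lat": 34.40, "lon": -119.52},
]


def select(features, **criteria):
    """Features whose recorded attributes equal every given criterion."""
    return [f for f in features
            if all(f.get(key) == value for key, value in criteria.items())]


def summarize(features, field):
    """High, low, median, mean, and standard deviation of a numeric field."""
    values = [f[field] for f in features]
    return {"high": max(values), "low": min(values), "median": median(values),
            "mean": mean(values), "std_dev": pstdev(values)}


paved = select(roads, surface="concrete/asphalt")
stats = summarize(paved, "length_km")
four_lane = select(roads, lanes=4)  # "lanes" was never collected

print([f["name"] for f in paved])
print(stats)
print(four_lane)
```

The empty result for the four-lane query is the "out of luck" case: the software can only answer questions in terms of the attributes that were recorded when the data were collected.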

Texture Search

The last method is to have software that will search by texture/pattern, shape, color, or a combination of these, and that will deal with the effects of orientation, scale, and resolution, which is the area where research is being done in the Alexandria Digital Library (ADL). This has apparently been a hot topic for some years in the computer-science world and is becoming more so, especially with such applications as automatic face recognition (e.g., for credit-card verification) (Jones 1994). The technique used is in some ways analogous to that mentioned above, using spectral signatures. One starts with a "library" of textures or patterns, that is, one has ground truth for whatever objects one wishes to search. One "describes" an image texture to the software; the image features that are like that texture are shown. It seems likely that certain patterns or textures will be easier to find than others; generally speaking, human-made objects with relatively strict or distinctive patterns (e.g., orchards; roads) may be easier for software to identify than will be natural objects, such as trees. Nature seems to abhor not only a vacuum but also straight lines; perhaps the work done with fractals could provide a method of working with identifying natural features (Mandelbrot 1977; Briggs 1992).

Questions immediately arise as to how much of this may be done as pre-processing, and how much at the time of request; one approach would be that the material likely to be most frequently requested should be pre-processed (which is computationally intensive), so that such materials will be relatively quickly searched. It is impossible to anticipate all possible content-based search requests so inevitably there will be slower, on-the-fly searches. Also, what about three-dimensional representations of planets or portions thereof? How does one perform a texture search of a hologram?
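The trade-off between pre-processing and on-the-fly searching raised above can be sketched as a simple feature cache. This is an illustration only, not a description of how ADL is built; the class, the image identifiers, and the stand-in feature extractor (mean and variance of gray values) are all invented, and a real texture analysis would be far more costly to compute.

```python
from statistics import mean, pvariance


def texture_features(pixels):
    """Stand-in feature extractor: mean and variance of the gray values."""
    flat = [value for row in pixels for value in row]
    return (mean(flat), pvariance(flat))


class FeatureIndex:
    """Holds pre-computed features and computes the rest when first asked."""

    def __init__(self, extractor):
        self.extractor = extractor
        self.cache = {}
        self.on_the_fly = 0  # number of slow computations done at query time

    def preprocess(self, images):
        for image_id, pixels in images.items():
            self.cache[image_id] = self.extractor(pixels)

    def features(self, image_id, pixels):
        if image_id not in self.cache:
            self.on_the_fly += 1
            self.cache[image_id] = self.extractor(pixels)
        return self.cache[image_id]


collection = {
    "photo_001": [[10, 12], [11, 13]],
    "photo_002": [[0, 255], [255, 0]],
    "photo_003": [[100, 100], [100, 100]],
}

index = FeatureIndex(texture_features)
# Assume photo_001 and photo_002 are the frequently requested items.
index.preprocess({key: collection[key] for key in ("photo_001", "photo_002")})

for image_id, pixels in collection.items():
    index.features(image_id, pixels)

print("computed at query time:", index.on_the_fly)
```

Only the item that was not anticipated pays the cost of computation at query time; once computed, its features are kept so that a repeat request is fast.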

This all becomes especially challenging when one is dealing with large databases, and even more so when each file in the database tends to be large, as happens so often with databases composed of spatial data. The ADL will be a pluperfect example of a large database with large files; at the same time it is attempting to be an exemplar of how a user-oriented large database works. Thus an important area of research for Alexandria is that of content-based image searching.

The UCSB Department of Electrical and Computer Engineering has several researchers working on this matter, specifically on the use of wavelets; see Manjunath and Ma for a recently issued paper on this topic (2). What they did was to take a volume of texture patterns (Brodatz 1966), run Gabor filters over the digital files of selected patterns, search for the best pattern-retrieval performance, and then apply the technique to working with aerial photographs. They are particularly interested in applying this technique to aerial photographs, at different scales, of the same area, since extracting features at multiple scales and orientations is a key part of their research. A research project will be to take a collection of aerial photographs, grid them, and create from them a texture-shape dictionary/library.
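The general idea of describing a texture by its responses to Gabor filters at several scales and orientations, and then retrieving the nearest entry from a texture library, can be sketched as follows. This is a simplified illustration and not the researchers' implementation: it uses only the real (cosine) part of the filter, two wavelengths and four orientations chosen arbitrarily, tiny synthetic patterns in place of the Brodatz album, and plain Euclidean distance. All function names and parameter values are assumptions.

```python
import math


def gabor_kernel(wavelength, theta, sigma=2.0, half=3):
    """Real (cosine) Gabor kernel, shifted to zero mean so that
    flat regions give no response."""
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    kernel_mean = sum(map(sum, kernel)) / (2 * half + 1) ** 2
    return [[value - kernel_mean for value in row] for row in kernel]


def filter_response(image, kernel):
    """Absolute filter response wherever the kernel fits inside the image."""
    k = len(kernel)
    out = []
    for top in range(len(image) - k + 1):
        for left in range(len(image[0]) - k + 1):
            total = sum(kernel[j][i] * image[top + j][left + i]
                        for j in range(k) for i in range(k))
            out.append(abs(total))
    return out


def texture_features(image, wavelengths=(4, 8), orientations=4):
    """Mean and standard deviation of the response for each filter."""
    features = []
    for wavelength in wavelengths:
        for n in range(orientations):
            kernel = gabor_kernel(wavelength, n * math.pi / orientations)
            response = filter_response(image, kernel)
            mu = sum(response) / len(response)
            sd = math.sqrt(sum((r - mu) ** 2 for r in response) / len(response))
            features += [mu, sd]
    return features


def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def make(size, rule):
    """Synthetic black-and-white pattern from a rule on (x, y)."""
    return [[255 if rule(x, y) else 0 for x in range(size)] for y in range(size)]


library = {
    "vertical stripes": make(16, lambda x, y: x % 4 < 2),
    "horizontal stripes": make(16, lambda x, y: y % 4 < 2),
    "checkerboard": make(16, lambda x, y: (x // 2 + y // 2) % 2 == 0),
}
index = {name: texture_features(image) for name, image in library.items()}

# Query: vertical stripes shifted sideways by one pixel.
query = make(16, lambda x, y: (x + 1) % 4 < 2)
query_features = texture_features(query)
ranked = sorted(index, key=lambda name: distance(index[name], query_features))
print(ranked)
```

The library entry with the same orientation and spacing as the query should rank first even though the query is shifted, because the features summarize the responses over the whole patch rather than recording them pixel by pixel.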

While the default in the world of cartographic materials is to have north at the top, and some symbols (e.g., the symbol for swamps) always appear on a map at the same angle, such fixed orientation is the exception rather than the rule. Another point is that some requests, such as, "Show me all the barns that appear on aerial photographs of Iowa and North Dakota," may run into problems because barns can be of many different shapes, not just rectangular but in some cases - admittedly rarely - circular or octagonal.

The literature of pattern recognition and image search is extensive. Just as examples, two other relatively recent approaches to content-based searching are IBM's QBIC (Query By Image Content) software, and Avian Systems' image-pattern search software, FullPixelSearch. QBIC can index and retrieve images by content, with database queries able to select for color, texture, shape, and layout, and to query by example. This software's designers see its purpose as not to find the one perfect photograph but rather to filter out all the images that definitely are not the ones needed, and thus reduce the number of images that the querier will need to inspect. QBIC is intended to be used in combination with traditional full-text and relational-database searching. One cautionary statement in the very last paragraph is that "There is still no substitute for trained librarians tagging the photos with keywords" (IBM ... 1994). FullPixelSearch, for searching PICT and TIFF images, allows users either to define a search criterion, or to select a portion of an image to be matched (Norr 1995).

There will be a researchers' workshop on content-based retrieval on November 8, 1995, at the University of California, Santa Barbara, hosted by the Alexandria Digital Library, at which the chief topics for discussion will be:

a. applications: what are the representative applications that cannot be realized without content-based retrieval?

b. digital-library projects and content-based retrieval: how is this issue being handled by the various digital-library projects?

c. solutions: what does this involve? does content-based search basically mean extracting "metadata" either with human assistance or automatically? or is there something besides creating metadata structures for each application?

d. other data types: are these solutions applicable to other media types (e.g., audio; video)?

e. library applications and content-based retrieval: in this context, does it mean integrating the storage and catalog components? may content-based search be considered to be second-generation coding (knowledge-based coding)?

f. role of pattern-recognition techniques: image, speech, and video processing

g. database issues vs. pattern recognition issues

h. enabling applications going beyond the functionality of a "traditional" library, e.g., query by example; data-mining; creating personalized catalogs, and so on.

We plan to have a wavelet demonstration up on the Alexandria homepage by the end of this year.

Summary/Conclusion

Will it be possible to have a gigantic texture/shape library that includes terms associated with textures/shapes? Thus when a user queries the system for areas with sand beaches on the central California coast, the user would get an image that satisfies that query, with beaches marked, and ties to images that show specific beach areas in detail. Keep tuned to the Alexandria Web page, and find out!

Footnotes

(1) In order to present the University of Kentucky on one sheet, I was forced to cut up two U.S. Geological Survey topographic quadrangles; Murphy's Law of Map Use postulates that any area needed will most often be at the intersection of four sheets so this is better than one would have expected. Map Link of Santa Barbara generously provided me with two kamikaze sheets to use for this purpose.

(2) Wavelet image compression is a technique to deal with "file obesity," "that describes data in terms of its frequency, energy, and timing, and 'remembers' certain attributes of an image" ("Wavelet-compressed ..." 1995, p. 31). Alexandria is planning on using wavelet compression to allow for progressive transmission of image data; thus a user would first request a thumbnail of an image, then a browse file, and finally the actual item, with data being added in increments instead of the image being completely rewritten each time, and with the image stored only once, instead of storing, for each image, a thumbnail, a browse file, and the full file.
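The progressive transmission described in footnote (2) can be illustrated with the simplest wavelet, the Haar wavelet, applied to a single row of pixels. This is a sketch of the principle only; it is an assumption that a Haar-style decomposition is a fair stand-in, and nothing here describes the wavelet or file format that Alexandria will actually use. The pixel values and function names are invented.

```python
def haar_step(values):
    """One level of the unnormalized Haar transform:
    pairwise averages and pairwise half-differences."""
    pairs = list(zip(values[0::2], values[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages, details


def haar_inverse_step(averages, details):
    """Rebuild the finer level from averages plus detail coefficients."""
    out = []
    for avg, det in zip(averages, details):
        out += [avg + det, avg - det]
    return out


def encode(values, levels):
    """Coarsest approximation plus detail layers, coarsest layer first."""
    layers = []
    for _ in range(levels):
        values, details = haar_step(values)
        layers.insert(0, details)
    return values, layers


scanline = [10, 12, 14, 20, 200, 210, 90, 94]  # one row of invented pixel values
thumbnail, layers = encode(scanline, levels=2)

browse = haar_inverse_step(thumbnail, layers[0])  # thumbnail plus first increment
full = haar_inverse_step(browse, layers[1])       # browse plus second increment

print("thumbnail:", thumbnail)
print("browse:   ", browse)
print("full:     ", full)
stored = len(thumbnail) + sum(len(layer) for layer in layers)
print("values stored:", stored, "for", len(scanline), "pixels")
```

Each increment adds detail to what the user already has, and the thumbnail and the two increments together contain exactly as many values as the original row, which is the sense in which the image is stored only once.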

Bibliography

Alexandrov, A. D.; Ma, W. Y.; El Abbadi, A.; and Manjunath, B. S. 1995. Adaptive filtering and indexing for image databases. Proceedings of SPIE on Storage and Retrieval of Image and Video Databases - III, pp. 12-23, San Jose CA, February 1995.

Avery, Thomas Eugene; and Berlin, Graydon Lennis. 1992. Fundamentals of remote sensing and airphoto interpretation. 5th ed. New York: Macmillan.

Briggs, John. 1992. Fractals, the patterns of chaos: a new aesthetic of art, science, and nature. New York: Simon & Schuster.

Brodatz, P. 1966. Textures: a photographic album for artists & designers. New York: Dover.

Chang, T.; and Kuo, C.-C. Jay. 1993. Texture analysis and classification with tree-structured wavelet transform. IEEE Trans. Image Processing 2(4):429-41, October.

Handbook of pattern recognition and computer vision. 1992. River Edge: World Scientific Publishing Company.

IBM unleashes QBIC image-content search; queries images by color, texture, shape. 1994. Seybold report on desktop publishing 9(1):34-37, September 12.

Jones, Jennifer. 1994. The new face of recognition technology. Federal computer week 8(18):1, July 11.

Ma, W. Y.; and Manjunath, B. S. 1995. A comparison of wavelet features for texture annotation. To be presented at IEEE International Conference on Image Processing '95, Washington, D.C. October 1995.

Mandelbrot, Benoit B. 1977. Fractals: form, chance, and dimension. San Francisco: W.H. Freeman.

Manjunath, B. S.; and Ma, W. Y. 1995. Texture features for browsing and retrieval of image data. Santa Barbara: University of California, Center for Information Processing Research. (CIPR TR-95-06)

Manjunath, B. S.; Shekhar, C.; and Chellappa, R. A new approach to image feature detection with applications. To appear in Pattern Recognition Journal.

Manual of remote sensing. 1983. Falls Church VA: American Society of Photogrammetry. - this publication is now being issued on CD-ROM; there is a 1995 edition which the author of this paper has not yet seen

Niblack, W.; et al. 1993. The QBIC project: querying images by content using color, texture, and shape. SPIE 1908:173-81, February.

Norr, Henry. 1995. FullPixelSearch locates patterns in image data. macWEEK 9(8):6, February 20.

Picard, R. W.; and Kabir, T. 1993. Finding similar patterns in large image databases. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '93), Minneapolis 5:161-64, April.

Sabins, Floyd F. 1987. Remote sensing, principles and interpretation. 2d ed. New York: W.H. Freeman.

Smith, J. R.; and Chang, S. F. 1994. Transform features for texture classification and discrimination in large image databases. IEEE International Conference on Image Processing (ICIP '94), Austin TX. Proceedings 3:407-11, October.

United States. Geological Survey. [195-?] Geologic map symbols commonly used in maps of the United States Geological Survey. [Reston, VA: Geological Survey].

____. 1987a. Lexington East, Ky. 1987 ed. Reston, VA: Geological Survey. 1:24,000.

____. 1987b. Lexington West, Ky. 1987 ed. Reston, VA: Geological Survey. 1:24,000.

____. [199-?] Topographic map symbols. [Reston, VA]: Geological Survey.

"Wavelet-compressed, ultra-tiny files." 1995. PC magazine, May 16, 1995, p. 31.