Statement of Competency E
In which I discuss design, query and evaluation of information retrieval systems.
Paper Mulberry (Broussonetia papyrifera)
Hanshi, being made of kozo, were very durable and had no fear of sticking together when soaked in water. Therefore, in case of fire, books made of hanshi could be thrown into a well, which when taken out after the fire and dried would not only remain intact, but separated sheet by sheet.
Narita, K. (1980). A Life of Ts'ai Lung and Japanese Paper-Making. Tokyo: The Paper Museum, p. 28.
Linocut by Vlasta Radan, 2002.
Before enrolling in the SLIS program, I took an introductory MS Access class at Riverside Community College. It proved to be one of those choices that show their real value only later down the road. I must admit that I had a hard time grasping the concepts of relational databases, primary keys, and queries. My mind was conditioned by Excel spreadsheets, and it took some effort before I was able to absorb the new concepts. Eventually, towards the end of the class, the pieces finally fell together and I was able to create a simple database and get it to do something. Admittedly, I did not particularly enjoy the experience, although I did learn a lot. Secretly, I was hoping that I would never have to deal with Access or a database ever again.
However, librarianship is all about satisfying informational needs, which is really the definition of information retrieval. And information retrieval is all about databases. Because of the fortuitous decision to take the Access class, my experience of the Information Retrieval class was one of excited wonder rather than total bewilderment. Once I understood the basic ways to structure a query, all the information about precision and recall, classification, and the other tools of information professionals made perfect sense. Even more, when our group needed to create a simple object database for a class assignment, I volunteered to do it.
The technical knowledge of Access software was even handier when I was asked to enter the processing information about the Anne McCaffrey Papers into an Access database. However, as can be seen from the description of the development of the Anne McCaffrey Papers database, my EVIDENCE 1 for this competency, technical knowledge of the software is only a minor element in designing a database.
Before even starting to build a database, it is necessary to understand the collection that will be entered into it. Any system for structuring information in a database needs to represent the information contained in the collection. The organization implies the relationships between pieces of information, which ultimately helps to direct the search. In the case of the McCaffrey Papers, the classification of the materials was related to the book, and the books were related to each other in bibliographical series. The information in the Anne McCaffrey Papers database was not indexed, but Access allows key-word searches inside the text fields and queries according to different criteria.
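The relationships described above (materials belong to a book, books belong to a series) can be sketched as three linked tables. This is only a minimal illustration using Python's built-in sqlite3 module rather than Access; the table names, column names, and sample rows are all made up for the example and do not reproduce the actual Anne McCaffrey Papers database.

```python
import sqlite3

# Hypothetical schema: names are illustrative only, not the structure
# of the actual Anne McCaffrey Papers database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE series (
        series_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE book (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        series_id INTEGER REFERENCES series(series_id)
    );
    CREATE TABLE material (
        material_id INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        book_id     INTEGER REFERENCES book(book_id)
    );
""")

# Made-up sample rows, for illustration only.
conn.execute("INSERT INTO series VALUES (1, 'Series A')")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [(1, "Book One", 1), (2, "Book Two", 1)],
)
conn.executemany(
    "INSERT INTO material VALUES (?, ?, ?)",
    [
        (1, "Typescript draft with corrections", 1),
        (2, "Galley proofs", 1),
        (3, "Correspondence with editor about draft", 2),
    ],
)


def search_materials(keyword):
    """Key-word search in a text field, joined back to book and series."""
    rows = conn.execute(
        """
        SELECT m.description, b.title, s.name
        FROM material AS m
        JOIN book AS b ON b.book_id = m.book_id
        JOIN series AS s ON s.series_id = b.series_id
        WHERE m.description LIKE ?
        ORDER BY m.material_id
        """,
        (f"%{keyword}%",),
    )
    return rows.fetchall()


results = search_materials("draft")
for description, title, series in results:
    print(f"{series} / {title}: {description}")
```

Because each material row points to its book and each book to its series, a single key-word hit in a text field can be placed in context without the record itself repeating the book or series information.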
Natural language searches of full text are very popular with users. However, in cases where a single word has a number of possible meanings, indexing done by humans gives much better search results. If we look for a recipe for salsa sauce, a Google search will, in addition to references to food, bring up a number of hits about Latin music and dance. To improve the precision of the results and retrieve only information about salsa sauce, the query needs to define the scope of the search precisely and exclude all other options.
One option is to use the Boolean operators AND or NOT. We can run a Google query such as “salsa and sauce” or “salsa and food.” However, that requires a fair idea of what it is that we want. In one science fiction story, people find a computer that contains all the knowledge of the universe. However, when humans try to ask age-old questions like “Why are we here?”, they find out that in order to ask the right question, one that the computer can answer, one needs to know half of the answer. Therefore, to be able to ask the right question about salsa, one needs to know that there is a dance, a music style, and a sauce with the same name. Key-word searches require a fair amount of knowledge about the subject in order to succeed.
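The effect of AND and NOT on the salsa search can be sketched with a toy inverted index, in which each word points to the set of documents that contain it. The five "documents" below are invented for the example, and the code is a simplified illustration of Boolean retrieval in general, not a description of how Google works.

```python
from collections import defaultdict

# Made-up miniature collection: document id -> text.
documents = {
    1: "fresh tomato salsa sauce recipe",
    2: "salsa dance lessons for beginners",
    3: "salsa music and dance history",
    4: "hot pepper sauce recipe",
    5: "salsa music festival tickets",
}


def build_index(docs):
    """Map every word to the set of document ids in which it appears."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


index = build_index(documents)


def docs_with(word):
    """Return the ids of documents containing the word (empty set if none)."""
    return set(index.get(word.lower(), set()))


salsa = docs_with("salsa")                          # every sense of the word
salsa_and_sauce = salsa & docs_with("sauce")        # AND is set intersection
salsa_not_dance = salsa - docs_with("dance")        # NOT is set difference
salsa_not_dance_not_music = salsa_not_dance - docs_with("music")

print(sorted(salsa))
print(sorted(salsa_and_sauce))
print(sorted(salsa_not_dance))
print(sorted(salsa_not_dance_not_music))
```

In this toy collection, excluding "dance" alone still lets the music festival through. The searcher has to know in advance that salsa is also a music style, which is exactly the "half of the answer" problem.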
One of the ways to help the user perform a successful search is to pre-order the data in the database by indexing it. This description of aboutness and relationships can be a string of words, like Library of Congress subject headings, or single words, like the tags used on a number of social information-sharing web sites (Flickr, del.icio.us). I had an opportunity to explore the issues and pitfalls of indexing during the assignment on indexing and vocabulary design in the Information Retrieval class, my EVIDENCE 2 for this competency.
As a team, we needed to create a database where records would be indexed using pre-coordinated (pre-co) and post-coordinated (post-co) index terms. From the pool of suggested literature, the group selected twenty articles with a focus on systems design and end-user research. We developed lists of pre-coordinated and post-coordinated controlled vocabulary that best described what the articles were about. Using DB/TextWorks database software, the group created a database in which the author, title, citation, and abstract text fields were key-word searchable, and two further fields offered drop-down lists of pre-coordinated and post-coordinated terms.
To test the quality of our indexing, we performed a series of searches and analyzed the results. The results had no real scientific value: we knew exactly what information was in the database, so our queries were inevitably biased. However, our queries pointed to the superiority of human indexing over natural language searching in retrieving the most relevant records.
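The comparison above rests on the two standard measures, precision (the share of retrieved records that are relevant) and recall (the share of relevant records that are retrieved). A minimal sketch of the calculation follows; the record numbers are invented for illustration and are not the results our group obtained.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant records retrieved / all records retrieved
    recall    = relevant records retrieved / all relevant records
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: records 2, 5, 9 and 14 are the ones truly about the topic.
relevant = {2, 5, 9, 14}

# A hypothetical key-word search finds all of them, plus four false hits.
keyword_hits = {2, 3, 5, 7, 9, 11, 14, 18}

# A hypothetical controlled-vocabulary search finds three, with no false hits.
index_term_hits = {2, 5, 9}

keyword_scores = precision_recall(keyword_hits, relevant)
index_term_scores = precision_recall(index_term_hits, relevant)

print("key-word search:   precision %.2f, recall %.2f" % keyword_scores)
print("index-term search: precision %.2f, recall %.2f" % index_term_scores)
```

The invented numbers show the usual trade-off: the broad key-word search misses nothing but buries the relevant records among false hits, while the index-term search returns only relevant records at the cost of missing one.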
On the Internet, this problem is addressed through the metadata fields in the head of HTML documents, where web page creators insert index words or a short description of what the web page is about. Most search engines use this information in addition to the key-word search to improve the precision of queries. However, as the relevance of information is influenced by a number of factors, the opinion of end-users not the least important among them, social tagging has become very popular. Using web-based software, users assign tags (i.e., index words) to information, and the tags are then pulled together in various ways, tag clouds being one of the most familiar.
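How tags from many users are "pulled together" into a cloud can be sketched in a few lines: count how often each tag was assigned, then scale the count to a display size. The tags and the font-size range below are assumptions made up for the example, not the method of any particular web site.

```python
from collections import Counter

# Made-up tags that three users assigned to the same item.
user_tags = [
    ["salsa", "food", "recipe"],
    ["salsa", "tomato", "food"],
    ["salsa", "dance"],
]

# Pool all tags and count how many times each one was used.
counts = Counter(tag for tags in user_tags for tag in tags)


def font_sizes(counts, smallest=10, largest=30):
    """Scale each tag's count linearly to a font size in points."""
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        tag: smallest + (n - low) * (largest - smallest) // span
        for tag, n in counts.items()
    }


cloud = font_sizes(counts)
for tag, size in sorted(cloud.items(), key=lambda item: -item[1]):
    print(f"{tag}: {size}pt")
```

The most frequently assigned tag is drawn largest, so the cloud reflects what the users collectively think the item is about, including the stray "dance" tag that a controlled vocabulary would have caught.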
Since all libraries should follow the same cataloging rules, and given that most libraries are members of OCLC, which provides cataloging services, one would expect to be able to sort and search through library collections in the same way. However, that is not so. Even if we assume that the same type of data is entered into library catalogs, as we can see from my EVIDENCE 3 for this competency, catalog interfaces, search engines, and the types of information they retrieve are customized to the type of library and the informational needs of its patrons.
The paper analyzes the OPAC of the Riverside County Public Library System and the Riverside Community College District (RCCD) library catalog LAMP (Library Access to Monographs and Periodicals), and compares their pros and cons in terms of interface, search options, and display of retrieved information. The same query was performed on each catalog, using first the basic and then the power search. Both catalogs were quite intuitive and easy to use. It was interesting that the public library catalog offered more advanced searching options than the one in the academic library (albeit a community college).
The County OPAC also displays records enhanced with various information and options, which are not necessarily part of the bibliographical entry but could be useful for the general public. With sidebars suggesting various subject or classification categories, the catalog provides numerous possible roads to discovery. In contrast, the LAMP catalog concentrates on straightforward bibliographic information. Although the lack of an advanced (Boolean) search is rather odd, the catalog follows the traditional university display of hits, subject headings, and records. The introductory home page explains every function in the catalog and provides various educational links, links to the subscription databases, and some external links. The OPAC itself is reserved exclusively for searches inside the RCCD collections.
However, evaluating any information retrieval system can be very tricky, because it is quite difficult to separate the usability of the interface from the quality of the underlying data.
This web site was developed to satisfy the graduation requirements for
the School of Library and Information Science at San Jose State University, California
Text, design, and digital imaging by Vlasta Radan