Use of Images in Understanding of Documents in Cross-Language Information Retrieval
The introduction of the research paper clearly presents its proposed solution for Cross-Language Information Retrieval (CLIR): using images to help readers understand documents written in foreign languages.
The author goes on to say that a document can be represented by a series of images drawn from the significant terms in the document itself, so that the document can be understood quite simply, in whole or in part.
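The idea of representing a document by images of its significant terms can be illustrated with a minimal sketch. This is not the paper's method: it assumes that "significant" simply means the most frequent non-stopword terms, and the stopword list, the IMAGE_INDEX lookup table and the image paths are all hypothetical stand-ins for a real image search on the internet.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "are", "to", "by"}

# Hypothetical term-to-image lookup standing in for a web image search.
IMAGE_INDEX = {
    "dog": "images/dog.jpg",
    "ball": "images/ball.jpg",
    "park": "images/park.jpg",
}

def significant_terms(text, k=2):
    """Return the k most frequent non-stopword terms in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [term for term, _ in counts.most_common(k)]

def represent_as_images(text, k=2):
    """Map each significant term to an image, skipping terms with no image."""
    return [IMAGE_INDEX[t] for t in significant_terms(text, k) if t in IMAGE_INDEX]

document = ("The dog chased the ball in the park. "
            "The dog dropped the ball and the dog slept.")
image_summary = represent_as_images(document)
```

Here image_summary holds the images for "dog" and "ball", the two most frequent content words, which a reader could interpret without knowing the language of the document.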
The research gives a clear introduction to CLIR. The researcher says that if the above-mentioned technique works, there would be no requirement for translation, as these images can be used for multi-lingual representation.
Reduced dependency on lexicons. No need for maintenance.
No need for human translation. No need for computer based translation.
The technique would use images that are available on the internet. The researcher then tries to derive sub-sets of images for languages. The aim of the paper is to see how images can be used in document understanding, so that all the above advantages can be realised. The paper is a general piece of research looking into the following areas:
Whether search terms and images are similar in meaning. Theory development on what subjects understand from the images. Images for language sub-sets. Research into the uses involved. Research into the search categories of words and images returned.
The research context takes the reader through the entire cycle of CLIR: how the research started and how it has evolved over time. CLIR itself is described, defined and explained in different ways so that the reader can understand its depth.
Documents are available in different languages, and the computer user needs at least a minimal understanding of a document's language to comprehend it. Document representation has not been very effective, particularly for documents that are technical or that need a higher level of understanding. CLIR is used in the following situations:
A multi-language search using only one query language. Searchers who understand the document but are not proficient enough to query in the same language.
A person who does not understand English can retrieve documents in English by a query in their own language or a language they understand.
All the above points are reflected in research done by Grefenstette (1998a), Oard (2001), Sanderson and Clough (2002), Pirkola et al (2001), and Scott McCarley and Roukos (1998).
According to Rosch et al (1976), object categorisation is done with reference to a ‘basic level’ of categorisation. The basic requirement for CLIR is the World Wide Web (Scott McCarley and Roukos (1998), Ballesteros and Croft (1998a) and Grefenstette (1998a)) and the on-line documentation available on it.
Some of the approaches to CLIR are Document Translation, Query Translation (Dorr (1996), Resnik (1997), Hull (1998), Fluhr et al (1998) and Ballesteros and Croft (1998a)), Parallel Corpora (Scott McCarley and Roukos (1998)) and Latent Semantic Indexing (Dumais et al (1996)). The researcher has very effectively explained the different approaches to CLIR, covering the methods adopted from the very beginning.
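Of the approaches listed, Query Translation is the easiest to illustrate. The sketch below assumes a simple dictionary-based variant; the toy French-to-English LEXICON and the function name translate_query are hypothetical and are not taken from the paper or from the cited works.

```python
# Hypothetical French-to-English lexicon (toy data for illustration only).
LEXICON = {
    "chien": ["dog"],
    "avocat": ["lawyer", "avocado"],  # one source word, two possible senses
    "parc": ["park"],
}

def translate_query(query, lexicon):
    """Replace each query term with all of its lexicon translations.

    Terms missing from the lexicon (for example proper nouns) are
    passed through unchanged.
    """
    translated = []
    for term in query.lower().split():
        translated.extend(lexicon.get(term, [term]))
    return translated

english_query = translate_query("avocat Paris", LEXICON)
```

In this sketch "avocat" expands to both "lawyer" and "avocado", and "Paris" is passed through because it is not in the lexicon. Even at this scale, the example touches on the ambiguity, lexicon and proper-noun limitations that the paper discusses.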
The advantages and the disadvantages are clearly explained with reference to Oard (1998) and Scott McCarley and Roukos (1999). The sheer number of pages (Google (2003)) makes documents in foreign languages very difficult to index and translate. CLIR with images, which started with the research of Sanderson and Clough (2002), requires no form of gisting to judge the accuracy of a returned item, because a correlation is obtained between the retrieved image and the searched text.
The only area that the researcher does not explain is the kind of differences in subject, style and type of recovery. As a result, the possible errors or misinterpretations that could arise when these points are taken into account remain unclear.
The machine translation types (Hutchins and Somers (1992) and Somers (2003)), namely direct, transfer and interlingua, have been explained, along with their limitations (Leech et al (1989)). The limitations lie in the areas of speed (Somers (2003) and www.speechtechnology.com (2003)) and ambiguity (O’Grady et al (1996:270), Hutchins and Somers (1992)).
Context and Real World Knowledge (Somers (2003)), Problems with Lexicons (Reeder and Loehr (1998)), Not Translated Words (Reeder and Loehr (1998)), Unknown Proper Nouns (Ballesteros and Croft (1998a)), Compound Words (Hutchins and Somers (1992), Sheridan and Ballerini (1998)), New Words (O’Grady (1997)), Document Context (Somers (2003)), Minority Languages (Somers (2003)), Babelfish (Hutchins and Somers (1992)) and Sub Languages (Somers (2003)) are all well explained with examples.