Osku Salerma. Design of a Full Text Search index for a database management system. Master's thesis, University of Helsinki, January 2006.
Abstract: Full Text Search (FTS) is a term used to refer to technologies that allow efficient retrieval of relevant documents matching a given search query. Going through each document in a collection and determining if it matches the search query does not scale to large collection sizes, so more efficient methods are needed.
We start by describing the technologies used in FTS implementations, concentrating specifically on inverted index techniques. Then we conduct a survey of six existing FTS implementations, of which three are embedded in database management systems and three are independent systems. Finally, we present our design for how to add FTS index support to the InnoDB database management system. The main difference compared to existing systems is the addition of a memory buffer that caches changes to the index before flushing them to disk, which gives us such benefits as real-time dynamic updates and less fragmentation in on-disk data structures.
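To make the buffering idea in the abstract concrete, here is a minimal sketch of an inverted index that keeps recent changes in memory and merges them into a larger store only once enough have accumulated. It is an illustration under stated assumptions, not the design from the thesis: the class name `BufferedInvertedIndex`, the `flush_threshold` parameter, and the plain dictionary standing in for the on-disk index are all hypothetical, and tokenization is just lowercased whitespace splitting.

```python
from collections import defaultdict


class BufferedInvertedIndex:
    """Toy inverted index with a write buffer (illustrative only)."""

    def __init__(self, flush_threshold=1000):
        # Hypothetical tuning knob: number of buffered postings that triggers a flush.
        self.flush_threshold = flush_threshold
        self.buffer = defaultdict(set)   # term -> doc ids not yet flushed
        self.on_disk = defaultdict(set)  # stand-in for the on-disk index
        self.buffered_postings = 0
        self.flush_count = 0

    def add_document(self, doc_id, text):
        # Record each distinct term of the document in the memory buffer.
        for term in set(text.lower().split()):
            if doc_id not in self.buffer[term]:
                self.buffer[term].add(doc_id)
                self.buffered_postings += 1
        if self.buffered_postings >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Merge all buffered postings into the "disk" store in one batch.
        for term, doc_ids in self.buffer.items():
            self.on_disk[term] |= doc_ids
        self.buffer.clear()
        self.buffered_postings = 0
        self.flush_count += 1

    def search(self, query):
        # AND query: a document must contain every query term.
        # Both the flushed store and the buffer are consulted, so a
        # document is searchable as soon as it has been added.
        terms = query.lower().split()
        if not terms:
            return set()
        result = None
        for term in terms:
            postings = self.on_disk.get(term, set()) | self.buffer.get(term, set())
            result = postings if result is None else result & postings
        return result


index = BufferedInvertedIndex(flush_threshold=4)
index.add_document(1, "full text search")
print(index.search("text"))       # found in the buffer, before any flush
index.add_document(2, "text index")
print(index.search("full text"))  # found after the buffer was flushed
```

Because `search` reads from both places, updates are visible immediately, while writes to the larger store happen in batches rather than once per document.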
- Thesis, formatted for 2-sided printing (PDF, 357 KB)
- Thesis, formatted for 1-sided printing (PDF, 358 KB)
July 19, 2013: I was alerted to the fact that my thesis has been blatantly plagiarized in a published journal paper. Though plagiarists are by definition lazy and incompetent, these two really hit their marks: they didn't even bother changing the sections where I used my name as an example, making it trivial for a suspicious reader of their paper to find my thesis and alert me.
The journal is the International Journal of Computer Science Issues, published in that hotbed of scientific research, Mauritius (I had to look it up myself; a hint: east of Madagascar). The paper was published in July 2011, in Volume 8, Issue 4. Here's a link to the paper: http://www.ijcsi.org/papers/IJCSI-8-4-1-384-392.pdf.
The two authors are Ajit Kumar Mahapatra and Sitanath Biswas, both from Siksha ‘O’ Anusandhan University.
I have contacted the journal and the university (the dean of the university, the dean of the department, and the two full professors in the authors' department) but have received no responses at all. I am thus making the issue a matter of public record, since obviously these institutions do not care about such trivial things as ensuring published research is actually original.