Home > Uncategorized > Full text searching in MSDE (without MSSEARCH)

Full text searching in MSDE (without MSSEARCH)

One of the drawbacks of MSDE is that Full Text Searching (FTS) us unavailable. However, you can put together your own version of FTS with a stored procedure and a few tables.

The benifit of FTS is that it can match queries against english text without great difficullty. Say we had some textual data "The cat sat on the mat" and we wanted to make a select statement for text containing the word "cat" we would use: select from sometable where text like ‘% cat %’. This has two knock on effects – (a) text "like" comparisons are relatively slow. and (b) since we have no idea how many records the query will return, we cannot use select top to speed up the query. Furthermore, if we searched for a collection of words such as "cat", "mat" and "floor", the above text should still come with a match, albeit with a lower degree of certainty, since the word "floor" is not in the data.

Instead, if we create a table conatining all the words in our table, joined on another table containing a link to each word, and a link to the document’s primary key. we can then do queries such as

select ID into #phraseIDs from phrases where phrase = ‘ cat’
select DocumentID into #DocIDs from links L join #phraseIDs P on P.ID = L.ID
select * from documents d join #DocIDs d2 on d.ID = D2.DocumentID

Although this looks more complex, since we are only using exact string comparison and joins on integers (assuming all the relevant columns are properly indexed). The query should work out to execute quicker.

The trick to doing this starts with a nice user defined function called fn_Split (http://www.windowsitpro.com/Windows/Articles/ArticleID/21071/pg/2/2.html)

Then I created two temporary tables in a seperate database

create database dinfoFTS
go
create table dinfoFTS..Phrases
( id int identity(1,1) not null,
  Phrase varchar(1000)
)
go
create table dinfoFTS..Links
( id int identity(1,1) not null,
  PhraseID int,
  DocumentID int
)
go

I then created a stored procedure to populate these tables, making use of fn_split

alter procedure CreateFTS as
SET NOCOUNT on
delete from  dinfoFTS..Phrases
delete from dinfoFTS..Links
create table #phrases (phrase varchar(1000),minID int)
DECLARE @name varchar(1000)
DECLARE @id int
DECLARE @lastInserted int
DECLARE companies_cursor CURSOR FOR
SELECT name + ‘ , ‘ + address,id
FROM companies
OPEN companies_cursor
FETCH NEXT FROM companies_cursor INTO @name,@id
WHILE @@FETCH_STATUS = 0
BEGIN
 print @id 
 — parse the company name and address
 insert into #phrases (phrase)
 select value from dbo.fn_Split(@name,’ ‘)
  
 — set the minID for any duplicates
 update #phrases
 set #phrases.minID = dinfoFTS..Phrases.id
 from dinfoFTS..Phrases 
 where dinfoFTS..Phrases.Phrase = #phrases.phrase
 
 set @lastInserted = scope_identity()

 — insert any non-duplicates
 insert into dinfoFTS..Phrases (phrase)
 select phrase from #phrases where minID is null
 
 — set the minID for recently inserted phrases
 update #phrases
 set #phrases.minID = dinfoFTS..Phrases.id
 from dinfoFTS..Phrases  
 where dinfoFTS..Phrases.Phrase = #phrases.phrase
 and #phrases.minID is null
 and dinfoFTS..Phrases.id>@lastInserted
 

 — Then insert #phrases into the dinfoFTS..Links table
 insert into dinfoFTS..Links (PhraseID,DocumentID)
 select minID,@id from #phrases
 
 — clear out #phrases
 delete from #phrases
 FETCH NEXT FROM companies_cursor INTO @name,@id
END
CLOSE companies_cursor
DEALLOCATE companies_cursor
drop table #phrases

As a benchmark it took 20 seconds to read 1000 rows, and I’m now running it on a 11M row database!.

If anybody has any optimization hints, just post a comment.

Advertisements
Categories: Uncategorized
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: