Google
 

Monday, November 12, 2007

MS SQL Server 2005 full-text indexing

Instead of buying Visual Developer Studio 2005 and SQL Server, I made my database-driven website using the free versions: Visual Web Developer Express and SQL Server Express. However, I still wanted to be able to search my database of product reviews using a full-text search, because I was sure that Microsoft had put a lot of work into developing this, and so it must be infinitely better than any crummy search mechanism I could cook up myself. Here's how I did it, and what I found...

Getting started with full-text search
Make sure that you download the version of SQL Server Express with the additional full-text search capabilities. This is the larger one of the two downloads possible.
Read up a bit on Full Text Search concepts.
Making the CatalogNow you need to create a full-text catalog. The catalog seems to be a container to put all your full-text indexes into. So, if you are going to need to search the text of three tables, you make three indexes (one for each table) and put all three indexes in the one catalog.WARNING: Some of the information of the Microsoft website relates to the full version of SQL Server, such as the "Storage folder" that Express doesn't have. Don't panic, you just need to create the catalog by hand, rather than through the Management Interface.For my application, I just made one catalog and put two of my indexes into it. From what I understand, if your indexes are going to be huge, you should make a separate catalog for each index. Here is how to make a catalog:

Start SQL Server Management Studio Express, and connect to your database.
Click on New Query to get a box where you can type raw SQL.
Type:use mydatabasegoEXEC sp_fulltext_database 'enable'goCREATE FULLTEXT CATALOG mycatalog go
Click "Execute" to run the SQL. This will enable fulltext searching on the database, and create the catalog, which is in fact a folder on your computer called "mycatalog" somewhere within the Microsoft SQL Server folder. If your catalog is going to be HUGE, it may be worth researching some extra parameters for this command to tell it where to put the catalog, e.g. on a different disk to the database itself. This would be for performance improvement more than anything else.

Making the Index
Next we are going to create a full-text index. But if you want to create a full-text index on a table, the table must have a "unique, single-column, non-nullable index". In most cases this will be your primary key. What Microsoft is saying here is that you must have a non-null index on the table (all primary keys automatically are this), and that the index must be on just one column. Oh Drat! My table has a primary key that is on two columns, so I have to make an index that is on a single unique column in order to use full-text searching. What to do, what to do? Well, my solution was to add a new column to the table called "text_id". I made this an int column, and also made it an identity column. This way, the text_id field gets automatically filled with a unique integer for each record that is entered, without me needing to change any of my existing aspx pages.Also, if you already have a primary key, but it is on a really long field, I suggest you make an int field and use that instead, because the full-text index that gets created will be huge if the primary key you use is big, such as a GUID.
Ok, here is how to create the full-text index:
If you need to add an identity column, do this first.
Make sure you have an index on the identity column. I called mine "myindex".
Type in and execute this SQL:CREATE FULLTEXT INDEX ON mydatabase.dbo.mytable(column_to_index Language 0X0)KEY INDEX myindex ON mycatalogWITH CHANGE_TRACKING AUTOWhere mytable is the name of the table, column_to_index is the name of the column that is full of text you want to be able to search on, myindex is the name of the index (could be PK_something if you used your primary key), and mycatalog is the name of the catalog you created earlier. The "Language" bit just tells SQL Server to not treat this as being in any specific language. I don't know how to specify another language here, but you can change it later by using the SQL Server Manager interface. I didn't notice any difference between a UK English index and a language-neutral index. I suspect it is only important for languages other than English.
I did discover that sometimes when you try to make the index, it doesn't get fully made until you do a little bit through the user interface. To verify that things have worked, start the SQL Server Manager interface, right-click on the table name and choose "Modify". Right-click on the column and choose "Full text index...". In here you need to check that "Columns" is set to the column name you want to index. Also check whether "Active" is Yes — if not, it means the index isn't built yet.
You may have to wait a while for the index to be created. If you have a large table, the CPU usage on your PC will probably go up to 100% while this happens. If you get problems, see the troubleshooting section below...
Performing a FindThe actual SELECT statement to find records is then very easy. Here is an example:SELECT descriptionFROM mytableWHERE CONTAINS(description, '"shark attack"') (You need to put the search terms in double-quotes, within the single-quotes needed for the SQL)
Eliminating Noise WordsIf your text fields are in HTML, then you should add things lik "BR" and "P" to the list of noise words.
What I discovered
At first, full-text search queries were taking about 30 seconds to return results. This was pretty disappointing, and I started looking for ways to tune the system up. But while I was looking, the whole thing sorted itself out, and suddenly queries became almost instant. It is very impressive, I can search for a phrase like "african drumming" on my 20,000 record database almost instantly. I don't know what caused this delay then speed-up. Presumably the SQL Server was building the indexes, and it took a while. I'm not sure how come I could do queries while the index was being built though. It's a puzzle.
Also, I discovered that searching for "drum* lesson*" returns better results than searching for "drum lesson", because the first one will also find "drumming lessons". So, I made it so that queries that people enter automatically have a * put after each word.
A Bit of Troubleshooting
If it doesn't work, here are some of the problems I encountered, and the ways of debugging and solving the problems:
Application Events
Right Click on My Computer, choose Manage, then Event Viewer, then Application. This shows you the Application Event Log. In here you will see any error messages from the Full Text service. Usually one of these errors will just redirect you to the server logs themselves.
SQL Server Logfile & Fulltext Index Log-file
Have a look in the logs. I found mine in C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG. The fulltext logs are called SQLFTxxxxxx.LOG. They are hard to understand, but they may help.
Try ReBooting
You need to reboot after you install FullText Indexing, and a reboot is always a good place to start when troubleshooting. I had the error message "Error '0x80040e09' occurred during full-text index population" appearing in my logfiles, and no fulltext indexes created at all until I figured out that I needed to reboot.
Commands & Queries that you can type into SQL Server 2005:
How to check the status of the full-text indexes in a catalog:
USE mydatabase;GOEXEC sp_help_fulltext_tables mycatalog;GO
Enable fulltext search on a table:
USE mydatabase;GOALTER FULLTEXT INDEX ON mytable ENABLE;GO
Check if full text indexing is enabled:
USE mydatabase;GOSELECT DATABASEPROPERTY('mydatabase', 'IsFullTextEnabled');GO
Update
I've been looking at this a bit more recently, and worked out a few things:
You can have more than one fulltext indexed column per table. Old SQL Server versions prevented this, but 2005 allows it. So, you can have a table with a title and a summary column, and both can be full-text indexed.
The language setting is important, despite what I wrote above. I now use the code '2057' to set my indexes to be in UK English. This means that searches on "dogs" will also find "dog".
You can find the status of your fulltext indexes like this:exec sp_help_fulltext_columns mytableThis shows you which columns in a table are set up for full-text searching.select * from sys.fulltext_indexesThis shows you the status of the current indexes and when they were last updated. I think the key is to make sure that change_tracking_state_desc is set to "AUTO" and that has_crawl_completed is "1". You can also see the time of the last crawl, i.e. when the index was last updated, which can help you see why things aren't working.
Here is my example .sql statement, that sets up three columns of a table for fulltext indexing, and sorts out the change tracking state, and starts a full crawl:
use contentexec sp_fulltext_database 'enable'goexec sp_fulltext_catalog 'cmscatalog', 'create'goexec sp_fulltext_table 'document', 'create', 'cmscatalog', 'PK_document'goexec sp_fulltext_column 'document', 'title', 'add', '2057'exec sp_fulltext_column 'document', 'keywords', 'add', '2057'exec sp_fulltext_column 'document', 'search_text', 'add', '2057'goexec sp_fulltext_table 'document', 'activate'goEXEC sp_fulltext_table 'document', 'Start_background_updateindex';goexec sp_fulltext_table 'document', 'start_full'go
Conclusion
Full Text Searching is easy to set up and use in SQL Server Express. When it works, it works fast!