Subject: Re: Indexing by words
Author: Scott Lawton
Posted: 2001/05/31 18:53:45
 


>That is in fact what I'm doing. But it is still not ideal. Using this
>strategy, searching for
>
>Smith-Watson
>
>will find records containing the words
>
>Smith AND Watson
>
>(not necessarily together)

I've never found a product (SQL or otherwise) that provides the mix of fast "full text" and "exact" searching that I need. Meanwhile, here are 3 ways to handle the above with Valentina.

1. in the user's query, convert the dashed-text (to coin a phrase) to a quoted string, i.e.
Smith-Watson
becomes
"Smith Watson"

(At least, I assume that Valentina supports this; I can't remember if I've tried it.)


2. create 2 copies of the field, one that has dash replaced with space (so that Watson will match), the other that has dash replaced with nothing (therefore Smith-Watson becomes SmithWatson, and the above match works). Create a method that concatenates the two fields; search this method.


3. if the above can't all be done with methods and you want to avoid doubling the length of each field (and you have a decent programming language available), just replace every word that contains a dash with two copies, e.g. Smith-Watson becomes
Smith Watson SmithWatson

then index that. (I also convert to lower case so that my searches are case-insensitive.)

Here's roughly how to do that with Text Machine (a commercial product for the Mac; callable from anything that can send arbitrary AppleEvents):
replace all "[(1+ letter)1, dash, (1+ letter)2]"
with "[group1, space, group2, space, group1, group2]"

where "1+" can be read as "one or more". A standard regex would be something like
replace all "([A-Za-z]+)-([A-Za-z]+)"
with "\1 \2 \1\2"

Of course you may need to make the pattern fancier, e.g. allow digits in the "word" and/or support punctuation other than dash.
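
For anyone without Text Machine, here is a minimal Python sketch of the same replacement, including the lower-casing step. The function name expand_dashed is made up for illustration, and the pattern only handles letters on either side of a single dash, so a word with two dashes would need a fancier pattern.

```python
import re

# Letters, a dash, then letters: the same pattern as the regex above.
DASHED_PAIR = re.compile(r"([A-Za-z]+)-([A-Za-z]+)")

def expand_dashed(text):
    """Replace each dashed word with both halves plus the joined form.

    Hypothetical helper for preparing text before it is indexed;
    it is not part of Valentina or Text Machine.
    """
    # "Smith-Watson" -> "Smith Watson SmithWatson"
    expanded = DASHED_PAIR.sub(r"\1 \2 \1\2", text)
    # Lower-case so that searches are case-insensitive.
    return expanded.lower()

print(expand_dashed("Dr. Smith-Watson, Boston"))
# prints: dr. smith watson smithwatson, boston
```

The search string would need the same treatment (at least the lower-casing) before it is matched against the indexed text.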

You could use similar patterns (with a single group) for solution #1.
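
As a rough sketch of solution #1 in Python: the function name quote_dashed is made up for illustration, and it assumes (as noted above, untested) that Valentina treats a quoted string as a phrase. Rather than a backreference, this version uses a replacement function, so words with more than one dash are handled too.

```python
import re

# One or more letters, followed by one or more "-letters" parts.
DASHED_WORD = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)+")

def quote_dashed(query):
    """Turn each dashed word in a user's query into a quoted phrase.

    Hypothetical helper; whether the search engine honors quoted
    phrases is an assumption, not something confirmed here.
    """
    def to_phrase(match):
        # Swap dashes for spaces and wrap the result in double quotes.
        return '"' + match.group(0).replace("-", " ") + '"'

    return DASHED_WORD.sub(to_phrase, query)

print(quote_dashed("Smith-Watson"))
# prints: "Smith Watson"
```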

Scott

P.S. The usual URL for Text Machine is http://www.prefab.com/ but the DNS server is being moved so you may only be able to reach it at http://www.prefabsoftware.com/



-------------------------------------------------------------
The Valentina mailing list is brought to you by MacServe.net
For info on lists services, see http://www.macserve.net/lists.html
 
©2001 Scott Lawton