History of SearchPackage

Differences from version 9 to 12



@@ -1,34 +1,23 @@

 {maketoc}
-This package makes content on your site searchable. Currently this covers wiki pages, blog headers and entries, articles and comments.
-
 !Overview
-!!Random Indexing
-The indexing engine in Bitweaver would, every so many clicks, pick a random piece of content and index it. (This number of clicks is the "Search Refresh Rate" in administration->search->search settings.) This tactic, I presume, is designed to limit index table additions and prevent overloading the server or database engine.
-
-The issue with "random" indexing is that when you submit an article or wiki page, that page won't necessarily ever get into the index. It is based on chance. If your site has enough traffic, the page will eventually get picked up. For smaller sites, however, this technique fails to deliver the expected result, namely that submitted content gets indexed and is findable using search.
+This package makes all liberty content on your site searchable. Currently this includes articles, wiki pages, blog headers, blog posts and comments. Search is a service and automatically indexes content when the content is saved. The exception is articles, which are indexed only after they are approved.
 
-!!Index On Submit
-There is now in CVS (after the 1.2.1 release) an additional indexing option called "Index On Submit". This causes content to be indexed immediately after it is created. The exception is articles that require approval, which are indexed only after approval.
+Developers: check SearchPackageDevNotes
 
-!Search Options
-!!Full Text Search
-This switch isn't used and has been removed
+!Search Settings
+Administration -> Search -> Search Settings
 !!Search Statistics
 Record searches made and their frequency. Checking this makes "search statistics" appear on the search admin menu, and gives you access to the statistics page.
-!!Index On Submit
-Described above - forces the index words for the content you are creating or editing to be created or refreshed immediately upon submit.
-!!Search Refresh Rate
-This applies only if you do __not__ have Index On Submit checked. This setting is the number of page reads between each time a random piece of content is chosen and reindexed. There is no control over which content is chosen; it is random.
 !!Minimum number of letters in search word
 Only words with at least this many letters are indexed. I set mine to 4, so words like "the" aren't indexed.
 !!Maximum number of words
-This applies to partial word searches. It is the maximum number of words containing a partial word that can be searched for in any one search.
+This applies to partial word searches. It is the maximum number of words containing a "partial word" that can be used in any one search. So, if you type in "three", search will pick up "three" and "twentythree". That counts as two words; the limit is 100 by default.
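To illustrate how the minimum letter count and the maximum number of words interact, here is a minimal Python sketch. It is not Bitweaver's code (which is PHP); the function names `index_words` and `expand_partial` are hypothetical, and the tokenizing rule is an assumption. Only the value 4 and the default limit of 100 come from the text above.

```python
import re

MIN_WORD_LENGTH = 4      # "Minimum number of letters in search word" (the author's setting)
MAX_PARTIAL_WORDS = 100  # "Maximum number of words" (default per the text)


def index_words(text, min_length=MIN_WORD_LENGTH):
    """Return the set of lowercase words long enough to be indexed.

    Splitting on runs of letters and digits is an assumption for this sketch.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if len(w) >= min_length}


def expand_partial(term, indexed_words, max_words=MAX_PARTIAL_WORDS):
    """Return at most max_words indexed words that contain the partial term."""
    matches = sorted(w for w in indexed_words if term.lower() in w)
    return matches[:max_words]


index = index_words("The three of us counted twentythree pages")
print(sorted(index))                   # "the", "of" and "us" are too short to index
print(expand_partial("three", index))  # matches both "three" and "twentythree"
```

A higher minimum length keeps the index smaller, while the word limit caps how many expanded words a single partial search can pull in.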
 !!Age in hours of Search cache
 Again, this applies to partial word searches. It defines the maximum age of cached search results for any given partial word. The results cache is used to provide a search result if one is available, and an entry is cleared either when it reaches this age or when the results cache reaches its limit (LRU List Length).
 !!LRU Purging Rate
 Applies to partial word searches. LRU = Least Recently Used. Every "rate" pages, purges the oldest cache entries if there are more than LRU List Length entries. This keeps space available in the cache for new search results.
 !!LRU List Length
-Applies to partial word searches. Limit the results cache to this number of entries. I'd say the 100 default here is pretty low.
+Applies to partial word searches. Limits the results cache to this number of entries. I'd say the 100 default here is pretty low. I set mine to 10000.
 !!Clear Search Words
 Deletes all words from the partial word search cache
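The cache settings above (age of the search cache, LRU purging rate, LRU list length, and clearing the search words) can be pictured with the following Python sketch. It is only an illustration under stated assumptions, not the package's actual PHP implementation: the class name `PartialWordCache`, its method names, and the default age and purge rate are all hypothetical. Only the default list length of 100 comes from the text.

```python
import time
from collections import OrderedDict


class PartialWordCache:
    """Hypothetical results cache with a maximum age and periodic LRU purging."""

    def __init__(self, max_age_hours=1.0, lru_length=100, purge_rate=5,
                 clock=time.time):
        self.max_age = max_age_hours * 3600  # "Age in hours of Search cache", in seconds
        self.lru_length = lru_length         # "LRU List Length"
        self.purge_rate = purge_rate         # "LRU Purging Rate", in page views
        self.clock = clock                   # injectable so the sketch can be tested
        self.entries = OrderedDict()         # term -> (stored_at, results), least recently used first
        self.page_views = 0

    def get(self, term):
        """Return cached results, or None if the term is missing or too old."""
        entry = self.entries.get(term)
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at > self.max_age:
            del self.entries[term]           # expired by age
            return None
        self.entries.move_to_end(term)       # mark as most recently used
        return results

    def put(self, term, results):
        """Store results for a partial word and mark it most recently used."""
        self.entries[term] = (self.clock(), results)
        self.entries.move_to_end(term)

    def page_view(self):
        """Call once per page; every purge_rate pages, trim to lru_length."""
        self.page_views += 1
        if self.page_views % self.purge_rate == 0:
            while len(self.entries) > self.lru_length:
                self.entries.popitem(last=False)  # drop the least recently used entry

    def clear(self):
        """Equivalent in spirit to "Clear Search Words": empty the cache."""
        self.entries.clear()
```

Note that in this sketch the cache may temporarily grow past the list length between purges, which matches the description of purging only every "rate" pages.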
 !!Delete Index

@@ -37,21 +26,19 @@

 Deletes and rebuilds the index for the content type selected. This can take a really long time if you have a lot of content. There will be a command line tool available soon for those that need it. The page timeout will be set to 5 minutes during this operation - however, I have been told this may not be even close to the time needed if your site has thousands of pages.
 
 !After installing the new search module ...
-After installing the new search module, you will have to do a reindex at least once on your site to get everything in there. This hasn't been tested yet on a large data set, so beware, and back up.
-
-!Remaining Issues
-Remaining issues that need to be tackled:
-*Fix partial word searching
-*Fix pagination of results
-*Fix the "find part" portion of search
-*Allow Stop Words
-*Allow search to search more than articles, blog posts, wiki pages and comments
-**phpbb
-**generic tiki content
-*Make the search cache work
-*Make search into a Bitweaver "service"
+After installing the new search module, you will have to do a reindex at least once on your site to get everything in there (unless you have already done this using CVS versions). This hasn't been tested yet on a large data set, so beware, and back up.
 
+!Command Line Reindex
+There is a ))cmd_line_reindex.php(( file under the search directory. It can be run from the command line in case you have a large site that needs a complete reindex. This way, web page time-outs don't interfere with the indexing.
 
+It has been tested on Windows and Linux. From a command prompt in the search directory, type:
+php ))cmd_line_reindex.php((
 
+There are several options for cmd_line_reindex, all reasonably well documented in the source code.
 
+!Remaining Issues
+Remaining issues that I'd like to see tackled:
+*Allow Stop Words
+*Allow "common" word list (to prevent indexing of common words)
+*Allow search to cover more than liberty content, such as phpbb
 
Page History
Version | User | IP | Date/Comment
12 | Kozuch | 85.207.244.160 | 30 Jun 2008 (09:30 UTC)
10 | Sean Lee | 71.254.8.123 |
9 | Sean Lee | 71.254.3.128 |
8 | Sean Lee | 71.161.201.190 |
7 | Sean Lee | 71.161.201.190 |
6 | Sean Lee | 71.161.201.190 |
5 | Sean Lee | 71.161.201.190 |
4 | Sean Lee | 71.161.201.190 |
3 | Sean Lee | 71.161.201.190 |
2 | Sean Lee | 71.161.201.190 |
1 | SEWilco | 207.195.192.9 |