



Need sphinx configuration for a non-integer primary key

I want to create a sphinx search for the following table structure:

CREATE TABLE IF NOT EXISTS `books` (
  `productID` varchar(20) NOT NULL,
  `productName` varchar(256) NOT NULL,
  `ISBN` varchar(20) NOT NULL,
  `author` varchar(256) DEFAULT NULL,
  `productPrice` float(10,2) NOT NULL,
  `discount` float(10,2) NOT NULL,
  `brandID` int(11) NOT NULL,
  `qty` int(11) NOT NULL,
  `status` tinyint(1) NOT NULL,
  PRIMARY KEY (`productID`),
  KEY `status` (`status`),
  KEY `ISBN` (`ISBN`),
  KEY `author` (`author`),
  KEY `brandID` (`brandID`),
  KEY `books_index` (`productName`)
) ENGINE=innodb DEFAULT CHARSET=latin1;

I can't alter the productID column in the above table.

I have dependent tables for authors and brands:

CREATE TABLE IF NOT EXISTS `authors` (
      `authorID` int(11) NOT NULL,
      `author_name` varchar(256) NOT NULL,
      PRIMARY KEY (`authorID`)
    ) ENGINE=innodb DEFAULT CHARSET=latin1;


CREATE TABLE IF NOT EXISTS `brands` (
      `brandID` int(11) NOT NULL,
      `brandName` varchar(256) NOT NULL,
      PRIMARY KEY (`brandID`)
    ) ENGINE=innodb DEFAULT CHARSET=latin1;

Please can someone provide a configuration for sphinx search?

I am using the following config:

source src1
{
        type                    = mysql


        sql_query               = SELECT CRC32(productID) as productid,productID,productName,ISBN,brandID,author FROM sapna_ecom_products

        sql_attr_uint           = productID
        sql_field_string        = ISBN
        sql_field_string        = productName
        sql_field_string        = brandID
        sql_attr_multi         = uint brandID from field; SELECT brandID,brandName FROM sapna_ecom_brands
        sql_attr_multi         = uint author from field; SELECT authorID,author_name FROM sapna_ecom_authors

        sql_query_info          = SELECT productID,productName,ISBN,brandID,author  FROM sapna_ecom_products  WHERE CRC32(productID)=$id
}

I am getting results if I search by productName, but not for author or brand.

My aim is to get results when the user searches by productName, author, or brand.

Please somebody provide me a suitable configuration.

Thanks.
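A quick way to sanity-check the CRC32 document IDs outside MySQL: Python's zlib.crc32 uses the same polynomial as MySQL's CRC32(), so (assuming latin1/ASCII product IDs, and remembering CRC32 is not collision-free) you can recompute a row's Sphinx document ID on the application side. The product ID below is made up.

```python
import zlib

def sphinx_doc_id(product_id: str) -> int:
    # Same unsigned 32-bit value as MySQL's CRC32(productID),
    # i.e. the document ID that sql_query generates above.
    return zlib.crc32(product_id.encode("latin1")) & 0xFFFFFFFF

# MySQL's manual uses this example: SELECT CRC32('MySQL') -> 3259397556
print(sphinx_doc_id("MySQL"))  # 3259397556

# After a Sphinx match you can then fetch the original row:
#   SELECT * FROM books WHERE CRC32(productID) = <doc_id>
doc_id = sphinx_doc_id("BK-0001")
```

Because two different productIDs can share a CRC32 value, over a large catalogue you should check for collisions before relying on this mapping.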


In Sphinx Search, how do I add “hashtag” to the charset_table?

I would like people to be able to search #photography as well as photography. Those should be treated as two different words in Sphinx. By default, #photography maps to photography, and I can't search for hashtags.

I read on this page that you can add the hash tag to the charset_table to accomplish this. I am completely clueless on how to do that. I don't know unicode, and I don't know what my charset_table should be.

Can someone tell me what my charset_table should be? Thanks.

# charset_table     = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F

Note: I plan on using real-time index. (not sure if this makes a difference)


It's U+0023 according to the Unicode table, so the final configuration should look like this:

charset_table     = 0..9, A..Z->a..z, _, a..z, U+23, U+410..U+42F->U+430..U+44F, U+430..U+44F

Don't forget the charset_type variable; AFAIK this example charset_table is for utf-8. Besides that, you should remove U+23 from the blend_chars variable to allow Sphinx to index it as a legitimate character.
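For reference, a one-liner confirms the code point (nothing Sphinx-specific here):

```python
# '#' is U+0023, so "U+23" in charset_table tells Sphinx to index
# the hash sign as an ordinary character instead of stripping it.
print(hex(ord("#")))  # 0x23
```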

Thanks Paul. What would the entire string look like? Do I just add that to the end, with a comma before it? Not sure what the final result will be... – TIMEX Apr 30 '12 at 18:49

Updated and provided some more info. – Pavel Selitskas May 2 '12 at 16:26

In addition to the current requirement, is there any way to make it so that when users search "photography", it also returns results from "#photography"? But not the other way around... – TIMEX May 4 '12 at 23:45

expand_keywords should resolve this issue, though infix search ought to be used instead of prefix search. I don't know if it works with special characters, such as hash sign. – Pavel Selitskas May 7 '12 at 11:20


Good day.

I think there are some workarounds for you, however:

Calling the search function directly with the raw user query is a bad approach.

The user string needs some kind of processing before you call the search function in the sphinx engine. For example, you can check the user string for certain special characters and strip them from the query, then call the search function with the cleaned query.

Good luck.


sphinx index with many-to-many relation

I am trying to set up a Sphinx index with a basic many-to-many relation between artworks and genres:

artworks
---------------
id
title
description

genres
---------------
id
name

artwork_genres
---------------
artworks_id
genres_id

In my sphinx config file I have something like

source src_artwork {
    ...
    sql_query    = SELECT id, title, description FROM artworks
    sql_attr_multi = uint tag from query; SELECT id,name FROM genres
}

This is from the docs, as far as I can understand, on multi-valued attributes and sql_attr_multi

But obviously there is no mention of the tie table in there and I can't understand how that is brought into the config. I'd simply like for a search on "Impressionism" to result in artworks belonging to that genre (weighted as appropriate if the term is seen in the other fields)


I would consider ignoring the attributes feature in this case. The simplest way to create a genre field by which to search artworks is to "de-normalise" the genres table into the sql_query.

In the FROM clause of your SQL query, you would JOIN the genres table to the artworks via the linking table. In the SELECT clause, you can then GROUP_CONCAT genres.name into a column, which becomes a Sphinx field to search on.

Your sql_query might look like this:

source src_artwork {
        ...
    sql_query    = SELECT a.id, a.title, a.description, GROUP_CONCAT( DISTINCT g.name SEPARATOR ' ') AS genre 
        FROM artworks AS a 
        LEFT JOIN artwork_genres AS ag ON ag.artworks_id = a.id  
        LEFT JOIN genres AS g ON g.id = ag.genres_id
        GROUP BY a.id;
}

Then a sphinx search for artworks looking for "impressionism" in the @genre field will return the "row".
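The shape of that query is easy to verify outside Sphinx. A minimal sketch using Python's bundled SQLite, whose group_concat stands in for MySQL's GROUP_CONCAT (it doesn't accept DISTINCT together with a custom separator, which is fine for this demo; all data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artworks (id INTEGER PRIMARY KEY, title TEXT, description TEXT);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE artwork_genres (artworks_id INTEGER, genres_id INTEGER);
    INSERT INTO artworks VALUES (1, 'Water Lilies', 'Pond scene');
    INSERT INTO genres VALUES (1, 'Impressionism'), (2, 'Landscape');
    INSERT INTO artwork_genres VALUES (1, 1), (1, 2);
""")

# One row per artwork, genres collapsed into a single space-separated
# column -- exactly what becomes the @genre full-text field in Sphinx.
row = conn.execute("""
    SELECT a.id, a.title, group_concat(g.name, ' ') AS genre
    FROM artworks AS a
    LEFT JOIN artwork_genres AS ag ON ag.artworks_id = a.id
    LEFT JOIN genres AS g ON g.id = ag.genres_id
    GROUP BY a.id
""").fetchone()
print(row)  # one artwork row; genre column contains both genre names
```

Note that SQLite does not guarantee concatenation order, so treat the genre column as an unordered bag of words, which is all a full-text field needs anyway.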

Had to add GROUP BY id at the end of the SQL statement for it to work! I'd heard of GROUP_CONCAT, but had never put together that it could build a relation list like this - an old problem in SQL. A huge lifesaver in this case. Thanks! – sbeam Jan 17 '11 at 21:37

Oh yes - good point. I've edited the answer to reflect your comment. Glad you got it working! – 富翁 Jan 18 '11 at 22:32

Really did. Thanks! – Samin Jun 3 '13 at 11:29


Sphinx main/delta indexing, sql_query_killlist

I am currently using Sphinx for indexing a MySQL query with 20+ million records.

I am using a delta index to update the main index and add all new records.

Unfortunately, a lot of the changes to the tables are deletions.

I understand that I can use sql_query_killlist to get all document IDs that need to be deleted or updated. Unfortunately, I don't understand how this actually works, and the documentation from Sphinx does not have a good enough example for me to understand it.

If I use the following example, how could I implement the killlist?

in MySQL

CREATE TABLE sph_counter
(
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);

in sphinx.conf

source main
{
    # ...
    sql_query_pre = SET NAMES utf8
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents
    sql_query = SELECT id, title, body FROM documents 
        WHERE id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}

source delta : main
{
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT id, title, body FROM documents 
        WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}

index main
{
    source = main
    path = /path/to/main
    # ... all the other settings
}

note how all other settings are copied from main, but source and path are overridden (they MUST be)

index delta : main
{
    source = delta
    path = /path/to/delta
}

The specifics depend a lot on how you mark deleted documents, but you would just add something like

 sql_query_killlist = SELECT id FROM documents 
                      WHERE status='deleted' 
                            AND id<=( SELECT max_doc_id FROM sph_counter 
                                      WHERE counter_id=1 )

to the delta index. This catches the IDs of records deleted from the main index and adds them to the kill-list, so they never show up in search results.

If you also want to catch updated records, you need to arrange for the new rows to be included in the delta's main sql_query, and put their IDs in the kill-list too.

Hi Barry. How can I select something if it is deleted out of my table? – gt-info Aug 9 '12 at 22:05

Well, if you really do 'delete' rather than just changing some sort of status flag, then you will need another way to get a list of deleted documents. When you delete a document in the application, you could insert the id into a new table. And use that? – barryhunter Aug 10 '12 at 12:41

It is clear to me now Barry, thanks. I am going to add a table which consists of all deleted ID's from the main table. I can do a select * from table, for my kill-list. Does the row actually get deleted from the index? Or only ignored? – gt-info Aug 12 '12 at 8:27

The kill-list just kills it from the result set; it's still in the actual index. searchd doesn't in general modify indexes once they have been created by indexer. (UpdateAttributes is basically the only exception) – barryhunter Aug 13 '12 at 11:08

Worth mentioning that "Kill-list for a given index suppresses results from other indexes, depending on index order in the query" [source: sphinxsearch.com/docs/current.html#conf-sql-query-killlist] – Alex Jun 26 '15 at 15:46
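Picking up the suggestion in the comments above, the extra-table approach is a small sketch (the deleted_documents table and column names are assumptions, not anything from the Sphinx docs):

```
# in MySQL, a table the application fills whenever it deletes a row:
#   CREATE TABLE deleted_documents (id INTEGER PRIMARY KEY);
#
# then, inside the delta source in sphinx.conf:
sql_query_killlist = SELECT id FROM deleted_documents
```

The delta build then suppresses those IDs from the main index's results, and the table can be truncated after each successful merge of delta into main.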


Does Sphinx auto update its index when you add data to your SQL?

I am curious as to whether or not Sphinx will auto update its index when you add new SQL data or whether you have to tell it specifically to reindex your db.

If it doesn't, does anyone have an example of how to automate this process when the database data changes?


As the sphinx documentation section on real-time indexes says:

Real-time indexes (or RT indexes for short) are a new backend that allows you to insert, update, or delete documents (rows) on the fly.

So to update the index on the fly, you only need to run a query like:

{INSERT | REPLACE} INTO index [(column, ...)]
VALUES (value, ...)
[, (...)]

So where do you run this SQL-like statement? I am reading through the documentation, but all their examples show it being queried within mysql as it is. – lockdown Sep 29 '11 at 19:02

You could issue that via your favorite MySQL client – tmg_tt Sep 30 '11 at 8:00
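For completeness, a minimal RT index declaration in sphinx.conf might look like this (index name, path, and fields are placeholders); the INSERT/REPLACE statements are then issued through searchd's SphinxQL listener (e.g. listen = 9306:mysql41) using any ordinary MySQL client:

```
index rt_books
{
    type         = rt
    path         = /path/to/rt_books
    rt_field     = title
    rt_field     = body
    rt_attr_uint = status
}
```

Unlike disk indexes built by indexer, an RT index has no source block; documents only get in through SphinxQL statements against searchd.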


The answer is no; you need to tell sphinx to reindex the database.

There are some steps and requirements you need to know:

  1. Main and delta indexes are required.
  2. On the first run, you need to index the main index.
  3. After the first run, you can index the delta with rotation (to make sure the service keeps running and the data on the web stays available in the meantime).
  4. Before going further, you need to create a table to mark the "last indexed rows". The last indexed row ID can be used for the next delta indexing and for merging the delta into main.
  5. You need to merge the delta index into the main index, as described in the sphinx documentation: http://sphinxsearch.com/docs/current.html#index-merging
  6. Restart the sphinx service.

    Tip: write your own program that performs the indexing, in C# or another language. You can also try the Windows Task Scheduler.

Here is my conf:

source Main
{
type            = mysql

sql_host        = localhost
sql_user        = root
sql_pass        = password
sql_db          = table1
sql_port        = 3306  # optional, default is 3306
sql_query_pre = REPLACE INTO table1.sph_counter SELECT 1, MAX(PageID) FROM table1.pages;
sql_query       = 
    SELECT  pd.`PageID`, pd.Status from table1.pages pd
    WHERE pd.PageID>=$start AND pd.PageID<=$end 
    GROUP BY pd.`PageID`

sql_attr_uint       = Status

sql_query_info      = SELECT * FROM table1.`pages` pd WHERE pd.`PageID`=$id
sql_query_range     = SELECT MIN(PageID),MAX(PageID)
              FROM table1.`pages`
sql_range_step      = 1000000
}


source Delta : Main
{
sql_query_pre = SET NAMES utf8

sql_query = 
    SELECT  PageID, Status from pages 
    WHERE PageID>=$start AND PageID<=$end 

sql_attr_uint       = Status

sql_query_info      = SELECT * FROM table1.`pages` pd WHERE pd.`PageID`=$id
sql_query_range     = SELECT (SELECT MaxDoc FROM table1.sph_counter WHERE ID = 1) MinDoc,MAX(PageID) FROM table1.`pages`;
sql_range_step      = 1000000
}


index Main
{
source          = Main
path            = C:/sphinx/data/Main
docinfo         = extern
charset_type        = utf-8
}


index Delta : Main
{
    source = Delta
path = C:/sphinx/data/Delta
charset_type = utf-8
}

You do not need to restart searchd if you pass the --rotate param. – Christian Apr 1 '13 at 23:30


Expanding on Anne's answer - if you're using SQL-backed indexes, they won't update automatically. You can manage the process of reindexing after every change - but that can get expensive. One way around this is a core index containing everything, plus a delta index with the same structure that indexes only the changes (which can be done with a boolean or timestamp column).

That way, you reindex the delta index (smaller, and thus faster) on a very regular basis, and then process core and delta together less often (but preferably at least daily).

Otherwise, the new RT indexes are worth a look - you still have to update the contents yourself, and it's decoupled from the database, so it's a different mindset. Also: RT indexes don't have all the features that SQL-backed indexes do, so you'll need to decide which matters more.
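That schedule often comes down to two cron entries; a sketch (the index names, times, and indexer path are assumptions; --rotate lets searchd swap in the freshly built files without downtime):

```
# reindex the small delta index every five minutes
*/5 * * * *  /usr/local/bin/indexer --rotate delta
# rebuild the full core index once a day, off-peak
30 3 * * *   /usr/local/bin/indexer --rotate core
```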


How do I use multiple sources in one index in Sphinx?

The Sphinx config file hints to it supporting multiple sources for one index, how do I actually specify it?

Here's the snippet from the config file:

# document source(s) to index
# multi-value, mandatory
# document IDs must be globally unique across all sources
source                  = src1

I've tried setting it in the following formats:

source = src1, src2
source = [src1, src2]

and I've also tried using the source variable twice, eg:

source = src1
source = src2

I suspect that I'm just being a dunce, as I'm not sure of the syntax to use in the config file. Any ideas?

Using the second code snippet I get the following error:

ERROR: index 'iwa': fulltext fields count mismatch (me=iwa_publications, in=iwa_events, myfields=3, infields=8).

The two sources are iwa_events and iwa_publications. Both have unique id columns, and both sources work when indexed individually.


When updating an index in sphinx.conf is restarting searchd in sphinx always required?

If I update a resource in my sphinx.conf file I can reindex with --rotate and everything works fine. If I update an index in my sphinx.conf or add a new index --rotate has no effect and I have to restart searchd.

Am I doing this correctly? I feel like --rotate should correctly index the new or modified index configurations.


It depends on your sphinx version. In the latest versions, just about anything (except maybe the searchd config section) can be changed via the config file.

If you just change the settings of an individual index, a --rotate indexing of that particular index is enough. If you change the settings of a particular index and don't actually reindex it, searchd probably won't pick up the changes (because it reads them from the index header, not directly from the conf file).

I just tested adding an index and removing an index; both happened with a seamless rotate. Sphinx 2.0.1-beta (r2792)

Prior to 0.9.9-rc1, a restart would be required for most config file changes.


You have to restart searchd when modifying the sphinx.conf file.

Rotate will not affect new index additions to your sphinx.conf file - it reindexes an analogous index of the original. Kind of like having a file and file-copy(1), then swapping them over. If you modify the .conf file, it's sort of like declaring a brand new index. Thus --rotate does not work if the exact index does not previously exist. See: http://sphinxsearch.com/docs/2.0.1/ref-indexer.html

It seems your explanation is correct, but I couldn't find any reference in the sphinx docs (sphinxsearch.com/docs/archives/2.0.1/ref-indexer.html). Do you have another reference? – gerky Dec 7 '16 at 11:02

Would --rotate work if a config file is specified (with -c)? I would imagine it builds the new index based on the configured sources? – gerky Dec 7 '16 at 11:21


Thinking sphinx fuzzy search?

I am implementing sphinx search in my rails application.
I want to search with fuzzy matching on. It should handle spelling mistakes, e.g. if I enter the search query charactaristics, it should search for characteristics.

How should I implement this?


Sphinx doesn't naturally allow for spelling mistakes - it doesn't care if the words are spelled correctly or not, it just indexes them and matches them.

There are two options around this - either use thinking-sphinx-raspell to catch spelling errors by users when they search, and offer them the choice to search again with an improved query (much like Google does); or use the soundex or metaphone morphologies so words are indexed in a way that accounts for how they sound. Search this page for stemming and you'll find the relevant section. Also have a read of Sphinx's documentation on the matter.

I've no idea how reliable either option would be - personally, I'd opt for #1.
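If you try the second option, the phonetic processing is a one-line, per-index setting in sphinx.conf (a sketch; the index name is a placeholder, and the index must be rebuilt after changing it):

```
index products
{
    # index words by how they sound, so sound-alike misspellings
    # can collapse to the same token
    morphology = metaphone   # or: soundex
    ...
}
```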

Thanks Pat, I wanted to use raspell, but it doesn't fit my requirements. I am reading email content and searching for possible product names ordered via email, so I can't offer the user a choice of corrections. And with raspell it happened that it replaced some abbreviated names with irrelevant substitutes, e.g. led (LED) replaced with lid. Tried soundex and metaphone; they improved my results but are not accurate. – Pravin May 20 '11 at 6:44


By default, Sphinx does not pay any attention to wildcard searching using an asterisk character. You can turn it on, though:

development:
  enable_star: true
  # ... repeat for other environments

See http://pat.github.io/thinking-sphinx/advanced_config.html Wildcard/Star Syntax section.


Yes; Sphinx generally always uses the extended match modes.

There are the following matching modes available:

SPH_MATCH_ALL, matches all query words (default mode);
SPH_MATCH_ANY, matches any of the query words;
SPH_MATCH_PHRASE, matches query as a phrase, requiring perfect match;
SPH_MATCH_BOOLEAN, matches query as a boolean expression (see Section 5.2, “Boolean query syntax”);
SPH_MATCH_EXTENDED, matches query as an expression in Sphinx internal query language (see Section 5.3, “Extended query syntax”);
SPH_MATCH_EXTENDED2, an alias for SPH_MATCH_EXTENDED;
SPH_MATCH_FULLSCAN, matches query, forcibly using the "full scan" mode as below. NB, any query terms will be ignored, such that filters, filter-ranges and grouping will still be applied, but no text-matching.

SPH_MATCH_EXTENDED2 was used during 0.9.8 and 0.9.9 development cycle, when the internal matching engine was being rewritten (for the sake of additional functionality and better performance). By 0.9.9-release, the older version was removed, and SPH_MATCH_EXTENDED and SPH_MATCH_EXTENDED2 are now just aliases.

enable_star

Enables star-syntax (or wildcard syntax) when searching through prefix/infix indexes. Optional, default is 0 (do not use wildcard syntax), for compatibility with 0.9.7. Known values are 0 and 1.

For example, assume that the index was built with infixes and that enable_star is 1. Searching should work as follows:

"abcdef" query will match only those documents that contain the exact "abcdef" word in them.
"abc*" query will match those documents that contain any words starting with "abc" (including the documents which contain the exact "abc" word only);
"*cde*" query will match those documents that contain any words which have "cde" characters in any part of the word (including the documents which contain the exact "cde" word only).
"*def" query will match those documents that contain any words ending with "def" (including the documents that contain the exact "def" word only).

Example:

enable_star = 1


Ideas for full text search MongoDB & node.js [closed]

I am developing a search engine for my website and I want to add the following features to it:

  1. Full text search
  2. A "did you mean" feature
  3. Data stored in MongoDB

I want to build a RESTful backend. I will add data to MongoDB manually and it will be indexed (which should I prefer: MongoDB indexing, or an external search indexing library like Lucene?). I also want to use node.js. This is what I found from my research; any idea about the architecture would be appreciated.

Thanks in advance


I'm using Node.js / MongoDB / Elasticsearch (based on Lucene). It's an excellent combination. The flow is stunning as well, since all 3 packages (can) deal with JSON as their native format, so no need for transforming DTO's etc.

Have a look: http://www.elasticsearch.org/

Thanks for your reply, I also arrived at these 3 paired together, but do you use a Google-style search for the "did you mean" feature? I need it too; I don't think elasticsearch has such a feature – Hüseyin BABAL Aug 4 '12 at 19:05

Yes, you're right, they are waiting for Lucene 4.0 to be released before implementing it: github.com/elasticsearch/elasticsearch/issues/911 – Geert-Jan Aug 6 '12 at 10:00

Thanks, same here. I'll wait for that release. – Hüseyin BABAL Aug 6 '12 at 10:31

Could you post a simple tutorial or guide on how to add Elastic search to node.js + mongodb? – Marwan Roushdy Feb 11 '13 at 10:45

Maybe in the future. For now, talk to ES over http/rest. Just use the HTTP stuff in Node, or get a decent library with github.com/phillro/node-elasticsearch-client. My approach is to do the search query in ES and retrieve docids, then do a multi-key fetch in mongoDB with those docids. This allows ES to stay lean (no need to store anything except what you need for the fetch in Mongo), and there are fewer problems keeping the data stores (mongodb and ES) in sync, etc. – Geert-Jan Apr 4 '13 at 21:59


I personally use Sphinx and MongoDb, it is a great pair and I have no problems with it.

I back MongoDB onto a MySQL instance which Sphinx just quickly indexes. You should never need to actively index _id (I have no idea who is going to know the _id of one of your objects to search for), so you can just stash it in MySQL as a string field and it will work just fine.

When I pull the results back out of Sphinx all I do is convert to (in PHP) a new MongoId, or in your case a ObjectId and then simply query on this object id for the rest of the data. It couldn't be simpler, no problems, no hassle, no nothing. And I can spin off the load of reindexing delta indexes off to my MySQL instance keeping my MongoDB instance dealing with what it needs to: serving up tasty data for the user.

Thanks for the reply, I will try sphinx with mongodb – Hüseyin BABAL Aug 4 '12 at 19:07

@cubuzoa np, one piece of advice: if you use the MySQL connector (like me), make sure to use MySQL's default auto-increment ID as the key for Sphinx's delta index. I would also suggest real-time indexes over delta indexes atm. – Sammaye Aug 4 '12 at 19:14


Php connection to Sphinx refused

I have installed Sphinx on my server and everything seems to be working, except that when I run test.php from a web browser, I get this error: Query failed: connection to localhost:9312 failed (errno=111, msg=Connection refused).

I have searched online, including Stack Overflow; almost all suggestions were to make sure the searchd service is running and listening on the right port. I can say yes to both, because if I run the same test program directly on the command line, everything works. I understand that the hosting provider may not have opened the port externally, but the port should still be listening on the server, right? The fact that it works from the command line should confirm that the service is listening on that port and that a php program can get through to it. So I don't understand why the same program, run from the web browser, has its connection refused.

I have also enabled fsockopen in php.ini.

Any suggestion to help understand why the connection is rejected or even better how to solve it is very welcome!

Thanks