{"id":2753,"date":"2010-07-30T20:34:43","date_gmt":"2010-07-31T04:34:43","guid":{"rendered":"https:\/\/www.ultrasaurus.com\/?p=2753"},"modified":"2010-07-30T20:34:43","modified_gmt":"2010-07-31T04:34:43","slug":"lucenesolr-meetup-july-28","status":"publish","type":"post","link":"https:\/\/www.ultrasaurus.com\/2010\/07\/lucenesolr-meetup-july-28\/","title":{"rendered":"lucene\/solr meetup, july 28"},"content":{"rendered":"
I attended the Lucene\/Solr meetup<\/a> this week, quite a swank event sponsored by Salesforce<\/a> with tasty appetizers, beers and an incredible view of the bay. The three speakers were very knowledgeable and well spoken and I enjoyed hearing about the different applications of Lucene and Solr. Below are my rough notes.\u00a0 For folks who want to learn more about Lucene and Solr, check out the upcoming conference Lucene Revolution<\/a>, Oct 5-8, 2010 in Boston.<\/p>\n Salesforce uses Lucene 2.2 (not Solr) and shared some stats about their seriously large-scale operation:<\/p>\n It’s a multi-tenant architecture: each org has from 1 to 100,000s of users, and there is a single codebase, which means there is just one version to support at any one time.<\/p>\n They use post-filtering for:<\/p>\n They query the database to bridge the gap caused by indexing lag.<\/p>\n They are faced with new search challenges driven by what the Salesforce CEO calls “the Facebook imperative.” When he started Salesforce, he used to ask “why doesn’t every enterprise app look like Amazon?” Now he asks: “why doesn’t every enterprise app look like Facebook?”\u00a0 (side note: this is an echo of what many folks have been saying for a while, that social networking makes sense as a feature of an app, rather than just destinations like Facebook and LinkedIn.)<\/p>\n Salesforce allows you to have a feed on a record, to follow accounts, and to have status updates for accounts.\u00a0 They index tracked changes.\u00a0 They need to search this rich set of data, which is people articulating their interests. 
Bill noted that the needs of structured data are really different from those of unstructured data.<\/p>\n Grant Ingersoll spoke of “two tales of relevance”<\/p>\n Better search results = less time searching, more time acting<\/p>\n Other cases to consider:<\/p>\n Before undertaking any relevance tuning, you need to define what “better search” means to you.\u00a0 There are many ways to test and measure:<\/p>\n Capturing user feedback:<\/p>\n Grant notes that Lucene searches default to “or” out of the box, when “and” is typically better today.\u00a0 He had a list of links that he suggested we check out (sadly I couldn’t type fast enough, but here are some I wrote down):<\/p>\n auto-add phrases to your queries (surround with quotes): automatic win Logfile management in the cloud (no Hadoop).\u00a0 Logs are painful: distributed, large, ephemeral.\u00a0 Most log search is highly skewed.\u00a0 “We’re just implementing grep across terabytes of data.”\u00a0 This was a compelling talk, but it took most of my attention to follow, so my notes are weak and may make sense to no one except me:<\/p>\n syslog + 0MQ + SolrCloud run many indexers, “hot shards”: the indexers update small shards<\/p>\n 0MQ gives us node-specific input queues for Solr<\/p>\n NRT + SolrCloud = Our Nirvana<\/p>\n Hot shards are chilled when we stop writing to them<\/p>\n Solr is awesome at what it does, but not so good for data mining Search@salesforce.com, Bill Press, Salesforce<\/h2>\n
\n
\n
\n
Practical Relevance, Grant Ingersoll, Lucid Imagination<\/h2>\n
\n
\n
\n
\n
\n
\nauto-add a “sloppy phrase”: with a large slop factor it acts like an AND, with a boost when words are close<\/p>\nLogs, Search, Cloud, Jon Gifford, Loggly<\/h2>\n
\n0MQ: not traditional queuing; it fails, and when it fails we lose data, but it is very fast
\nSolr gives us facets, which give us graphs<\/p>\n
\n— plan to plug in Hadoop for large-volume analytics
\nSyslog is the only way in for now; they are adding others: HTTP, Scribe, Flume.<\/p>\n","protected":false},"excerpt":{"rendered":"