Just the other day we wrote about Sensei, the new distributed, real-time, full-text search database built on top of Lucene, and here we are again writing about another “new” distributed, real-time, full-text search server also built on top of Lucene: SolrCloud.
In this post we’ll share some interesting SolrCloud bits and pieces that matter mostly to those working with large data and query volumes, but that all search lovers should find really interesting, too. If you have any questions about what we wrote (or did not write!) in this post, please leave a comment – ...
Mark your calendars today! The largest worldwide conference dedicated to Lucene and Solr will take place in Boston May 7-10.
The 2012 conference will build on the success of last year’s Lucene Revolution in San Francisco. Sponsored by Lucid Imagination with additional support from community and other commercial co-sponsors, we’ll be adding new sessions, new speakers, and new training sessions to the agenda. Lucid Imagination is the commercial entity exclusively dedicated to Apache Lucene/Solr open …
Once upon a time there was no decent open-source search engine. Then, at the very beginning of this millennium, Doug Cutting gave us Lucene. Several years later Yonik Seeley wrote Solr. In 2010 Shay Banon released ElasticSearch. And just a few days ago John Wang and his team at LinkedIn announced Sensei 1.0.0 (also known as SenseiDB). Here at Sematext we’ve been aware of Sensei for a while now (2 years?) and are happy to have one more great piece of search software available for our own needs and those of our customers. As a matter of fact, we are so excited about Sensei that ...
The second phase of SolrCloud has been in full swing for a couple of months now and it looks like we are going to be able to commit this work to trunk very soon! In Phase 1 we built on top of Solr’s distributed search capabilities and added cluster state, central config, and built-in read-side fault tolerance. Phase 2 is even more ambitious and focuses on the write side. We are talking full-blown fault tolerance for reads …
The Solr Reference Guide has been updated for the 3.5 release of Solr and Lucene. Only minor changes were needed this time around. In particular, we added information on:
Support for the Hunspell stemmer
The new langid UpdateProcessor
Numeric types now support sortMissingFirst/Last
New parameter hl.q for use with highlighting
Field types supported by the StatsComponent now include date and string fields
The Solr Reference Guide is available for free online or as a downloadable …
Yes, Berlin Buzzwords is back on the 4th & 5th June 2012! This really is the only conference for developers and users of open source software projects focusing on the issues of scalable search, data analysis in the cloud and NoSQL databases. All the talks and presentations are specific to three tags: "search", "store" and "scale".
Looking back to last year, this event had a great turnout. There were well over 440 attendees, 130 of them international (from all over, including Israel, US, UK, NL, Italy, Spain, Austria and more), and an impressive show of 48 speakers. It was a 2-day event ...
Date: Thursday, January 19, 2012
Time: 7:00 PM – 9:00 PM
Location: 12200 Olympic Blvd, Los Angeles, CA
The latest Los Angeles/ OC Apache Lucene/Solr User group meeting was held at Shopzilla in LA. We had Grant Ingersoll from Lucid Imagination speaking at the event. In this talk, Grant spoke about some of the tools available (recommendations, faceting options, amongst others) in Solr and Mahout to aid in the discovery process and how these two …
2011 was a good year for Sematext. Here are some highlights.
Products
In 2011, we released several new versions of our popular AutoComplete, Key Phrase Extractor, and DYM ReSearcher products and witnessed a number of organizations adopting them.
SaaS
After months of hard work, we’ve opened up our Search Analytics and Performance Monitoring services to the public. Anyone can sign up for an account and use either or both of these services for free. Yes, both services are completely free now and can be used without any restrictions.
Services – Tech Support
In addition ...
Here are two cool things about Search Analytics that I’d like to point out. The slides are stolen from our Search Analytics presentation at Enterprise Search Summit 2011 in Washington DC.
Search Analytics for A/B testing, relevance tuning and improvements
This slide shows how Search Analytics can be used to help with A/B testing. Concretely, in this slide we see two Solr Dismax handlers selected on the right side. If you are not familiar with Solr, think of a Dismax handler as an API that search applications call to execute searches. In this example, each Dismax handler is ...
The big Hadoop 1.0.0 release has arrived. The general notes about releases from the dev team include:
security
Better support for HBase (append/hsync/hflush, and security)
webhdfs (with full support for security)
performance enhanced access to local files for HBase
other performance enhancements, bug fixes, and features
You can also find the complete release notes here and see all fixes, improvements and new features included in the release. To save you time, please find below additional information about some of the items that attracted our attention from the Hadoop 1.0.0 ...
We get asked a lot by customers what’s in a new Solr/Lucene release that applies to them, and with our own LucidWorks Platform available, customers naturally want to know what they’ll get that they don’t already have. If you’re happily running along on Solr 1.4, why or when should you update to a newer version? Should you migrate to LucidWorks?
So we decided to try to put together a matrix of major features and show …
LucidWorks Enterprise 2.0.1 is an interim bug-fix release. We have resolved a couple of critical bugs and LDAP integration issues. The list of issues resolved with this update is available here.
Download
You can download the latest version 2.0.1 here.
Install
If you are running LucidWorks Enterprise 1.7 or LucidWorks 1.8, you can use the upgrade scripts and move to version 2.0.1.
For those of you running LucidWorks Enterprise 2.0, you can now …
I really dislike the so-called "Boolean Operators" ("AND", "OR", and "NOT") and generally discourage people from using them. It's understandable that novice users may tend to think about the queries they want to run in those terms, but as you become more familiar with IR concepts in general, and with what Solr specifically is capable of, I think it's a good idea to try to "set aside childish things" and start thinking (and encouraging your users to think) in terms of the superior "Prefix Operators" ("+", "-").
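To make the prefix-operator idea concrete, here is a minimal Python sketch of how "+" (required), "-" (prohibited) and unprefixed (optional) terms might be evaluated against a document. It is a toy model, not Solr or Lucene code: the `parse` and `matches` helpers are hypothetical names invented for this illustration, and real query parsers also handle fields, phrases, scoring and analysis.

```python
def parse(query):
    """Split a query into (occur, term) pairs.

    '+' marks a required term, '-' a prohibited term,
    and no prefix an optional term (hypothetical toy parser).
    """
    clauses = []
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            clauses.append(("must", token[1:].lower()))
        elif token.startswith("-") and len(token) > 1:
            clauses.append(("must_not", token[1:].lower()))
        else:
            clauses.append(("should", token.lower()))
    return clauses


def matches(query, text):
    """Return True if the text satisfies the prefix-operator query."""
    words = set(text.lower().split())
    clauses = parse(query)
    must = [term for occur, term in clauses if occur == "must"]
    must_not = [term for occur, term in clauses if occur == "must_not"]
    should = [term for occur, term in clauses if occur == "should"]

    # Any prohibited term rules the document out.
    if any(term in words for term in must_not):
        return False
    # Every required term has to be present; optional terms then only
    # matter for ranking, which this sketch does not model.
    if must:
        return all(term in words for term in must)
    # Assumption: with no required term, at least one optional term must match.
    return any(term in words for term in should)


print(matches("+solr -lucene java", "solr and java"))
print(matches("+solr -lucene java", "solr and lucene"))
```

Each term carries its own requirement, so the query reads left to right without the precedence questions that "AND", "OR" and "NOT" tend to raise.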
The year 2011 is coming to an end and it’s time to reflect on the past 12 months. Without further fluff, let’s look back and summarize all the significant events that happened in the Lucene and Solr world over the last dozen months. In the next few paragraphs we’ll go over major changes in Lucene and Solr, new blood, relevant conferences and books.
We should start by pointing out that this year Apache Lucene celebrated its 10-year anniversary as an Apache Software Foundation project. Lucene itself is actually over 10 years old. Otis is one of the very few people from ...
While working at Lucid Imagination, I was once asked by a customer how they could modify the score of documents in Solr in order to get the most relevant results higher in the results list. While I was trying to answer the question I realized that there are too many different options, and that not all of them are very easy to understand, so I decided to write some notes summarizing the most common/most used ways to …
The next JavaMUG meeting is on December 14th 2011. Erik Hatcher from Lucid Imagination will be presenting at the event. He will talk about Apache Solr, its features and benefits. This will be an introductory Solr talk.
Apache Solr serves search requests at enterprises and the largest companies around the world. Built on top of the top-notch Apache Lucene library, Solr makes integrating indexing and searching into your applications straightforward.
Solr provides faceted navigation, spell …
The first San Francisco Apache Mahout user meeting was held on November 29th 2011 at Lucid Imagination headquarters in Redwood City. The 3-hour session hosted 2 talks followed by networking, food and drinks.
Session topics -
“Using Mahout to cluster, classify and recommend, plus a demonstration of using scripts packaged with Mahout” by Grant Ingersoll from Lucid Imagination.
“How using random projection in Machine learning can benefit performance without sacrificing quality” …
[ Tuesday, 13 December 2011; ] Last minute mention, in case you happen to be in the Central VA area (Richmond and surrounding areas) tomorrow night… I’ll be discussing Solr and the latest greatest techniques folks are using to work with Solr from Ruby. The abstract blurb follows: “Erik Hatcher will discuss and demonstrate the state of the art with using Solr from Ruby. He’ll cover RSolr (and the forthcoming deprecation and removal of solr-ruby, …
Wildcard query terms aren’t analyzed, why is that?
Prior to the current 3x branch (which will be released as 3.6) and the trunk (4.0) Solr code, users have frequently been perplexed by wildcard searching being un-analyzed, often manifesting in case sensitivity. Say you have an analysis chain in your schema.xml file defined as follows and a field named lc_field of this type:
<fieldType name="lowercase" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Now, you ...
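The case-sensitivity symptom can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not how Solr is implemented: `analyze` and `wildcard_search` are hypothetical helpers that mimic a whitespace tokenizer plus lowercase filter at index time and a wildcard match against the indexed terms at query time.

```python
from fnmatch import fnmatchcase


def analyze(text):
    """Mimic the analysis chain: split on whitespace, then lowercase each token."""
    return [token.lower() for token in text.split()]


def wildcard_search(indexed_terms, pattern, analyze_pattern=False):
    """Match a wildcard pattern against indexed terms (case-sensitive).

    When analyze_pattern is False, the pattern is used exactly as typed,
    which models wildcard terms skipping analysis.
    """
    if analyze_pattern:
        pattern = pattern.lower()
    return [term for term in indexed_terms if fnmatchcase(term, pattern)]


indexed = analyze("Apache Solr Search")  # terms are stored lowercased

# The pattern as typed has an uppercase "S", so it misses the lowercased terms.
print(wildcard_search(indexed, "Sol*"))
# Lowercasing the pattern first lets it line up with what was indexed.
print(wildcard_search(indexed, "Sol*", analyze_pattern=True))
```

The first call finds nothing because the indexed terms were lowercased while the wildcard pattern was not; the second call applies the same lowercasing to the pattern and finds the term.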
Official release announcement for Lucene/Solr 3.5:
November 27 2011, Apache Lucene™ 3.5.0 available
The Lucene PMC is pleased to announce the release of Apache Lucene 3.5.0.
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.
This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below. The release…