Apache Solr

  • Search Engine
  • Issues with traditional searching
    • Performs only substring matching
    • Doesn't understand linguistic variations (buy vs. buying should match)
    • Doesn't understand synonyms (buying vs. purchasing)
    • Doesn't omit unimportant words (a, the, of, ...)
    • Has no sense of relevancy in results
    • Slows down as the data grows
  • How Solr solves the traditional search issues
    • Solr uses an inverted index that maps terms (content) to the documents containing them, instead of mapping documents to their content
    • The inverted index is at the heart of how search engines work
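
A minimal sketch of the inverted index idea, in plain Python rather than Solr itself. The sample documents, the stop-word set, and the crude "-ing" stripping are all assumptions made for illustration; Solr's real analysis and index structures are far more involved.

```python
from collections import defaultdict

# Hypothetical sample documents, keyed by document id.
DOCS = {
    1: "Buying a new phone",
    2: "The phone was a good purchase",
    3: "Buy the best laptop",
}

# Illustrative stop words only, not Solr's actual list.
STOP_WORDS = {"a", "the", "of", "was"}


def normalize(text):
    """Lowercase, split on whitespace, drop stop words, strip a trailing 'ing'."""
    terms = []
    for word in text.lower().split():
        if word in STOP_WORDS:
            continue
        # Very crude stand-in for stemming: "buying" -> "buy".
        if word.endswith("ing") and len(word) > 5:
            word = word[:-3]
        terms.append(word)
    return terms


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in normalize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = normalize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


INDEX = build_inverted_index(DOCS)
print(sorted(search(INDEX, "buying")))     # [1, 3]
print(sorted(search(INDEX, "the phone")))  # [1, 2]
```

A lookup is a dictionary access per query term, so the cost depends on the number of query terms and matching documents rather than on scanning every document's text.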

Characteristics of Search engine

  • Text Centric
  • Read dominant
  • Document Oriented
  • Large amount of data
  • Flexible schema (a relational DB is less flexible, as every row requires the same structure)

Install and Startup

  • Download the Solr archive (.tgz or .zip) & extract it
  • bin/solr start -p 8983
    • Starts a Java web server listening on port 8983
    • Solr is a web application that runs in a Jetty web server by default

Solr 

  • Stores data in documents
  • Documents are more flexible than rows in an RDBMS
  • Documents can be hierarchical; an RDBMS needs different tables & rows for this
  • Indexing process

Solr Core

  • Single physical index
  • directory structure
    • server
      • solr
        • core1
          • conf -> contains managed-schema.xml & solrconfig.xml
          • data -> contains the index files

Solr Document

  • Basic unit of information in Solr
  • JSON object with key-value pairs
  • Similar to a row in an RDBMS table, but more flexible
  • Can be hierarchical
  • Can have an array of values
  • Can have an object as a value
  • Denormalized: all data belonging to an entity is in the same document
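
To make the shape concrete, here is a sketch of a denormalized document as a Python dict serialized to JSON. Every field name and value is hypothetical, and how Solr actually indexes a nested object depends on the schema and Solr version; this only shows the structure described above.

```python
import json

# Hypothetical product document; field names are made up for illustration.
product_doc = {
    "id": "prod-1",
    "name": "Trail running shoe",
    "tags": ["shoes", "running"],  # array of values
    "manufacturer": {              # object as a value
        "name": "Acme",
        "country": "US",
    },
}


def to_json(doc):
    """Serialize a document dict to pretty-printed JSON."""
    return json.dumps(doc, indent=2, sort_keys=True)


print(to_json(product_doc))
```

In a relational design the tags and the manufacturer would likely live in separate tables joined by keys; here everything about the entity travels in one document.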

Indexing Process

  • The document is converted into a format Solr accepts, such as JSON
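
A rough sketch of what sending a JSON document to Solr could look like, using only the Python standard library. It assumes a core named core1 on localhost:8983 (both taken from these notes) and a made-up document; the request is built but not sent, so it runs without a Solr server.

```python
import json
import urllib.request


def build_update_request(docs, core="core1", base_url="http://localhost:8983/solr"):
    """Build (but do not send) a POST that adds `docs` to a core and commits."""
    url = f"{base_url}/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document, only for illustration.
request = build_update_request([{"id": "1", "title": "Hello Solr"}])
print(request.get_method(), request.full_url)

# To actually send it (requires a running Solr with that core):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```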

Designing Schema

  • Create a sub-directory in the configsets directory
  • Copy the configuration from the _default directory
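
The two steps above can be sketched as a single directory copy. This assumes the usual layout of server/solr/configsets/_default under the Solr install directory; the function name and the "twitter" configset name in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def create_configset(solr_root, name):
    """Copy the _default configset to a new configset called `name`."""
    configsets = Path(solr_root) / "server" / "solr" / "configsets"
    source = configsets / "_default"
    target = configsets / name
    # copytree creates `target`; it raises FileExistsError if it already exists.
    shutil.copytree(source, target)
    return target
```

For example, create_configset("/path/to/solr", "twitter") would leave a copy of the default conf directory under configsets/twitter, ready for schema edits.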

Twitter Search Application

Elements of text analysis

  • analyzer
  • tokenizer
  • chain of token filters
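
A toy illustration of how these three pieces fit together: an analyzer runs one tokenizer and then passes the tokens through a chain of filters. This is plain Python, not Solr's analysis classes; the class, the filter functions, the stop-word set, and the synonym map are all assumptions for the sketch.

```python
def whitespace_tokenizer(text):
    """Split raw text into tokens on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Lowercase every token."""
    return [token.lower() for token in tokens]


def stop_filter(tokens):
    """Drop unimportant words (illustrative list only)."""
    stop_words = {"a", "the", "of"}
    return [token for token in tokens if token not in stop_words]


def synonym_filter(tokens):
    """Map synonyms onto one canonical term (illustrative map only)."""
    synonyms = {"purchasing": "buying"}
    return [synonyms.get(token, token) for token in tokens]


class Analyzer:
    """One tokenizer followed by an ordered chain of token filters."""

    def __init__(self, tokenizer, filters):
        self.tokenizer = tokenizer
        self.filters = filters

    def analyze(self, text):
        tokens = self.tokenizer(text)
        for token_filter in self.filters:
            tokens = token_filter(tokens)
        return tokens


analyzer = Analyzer(
    whitespace_tokenizer,
    [lowercase_filter, stop_filter, synonym_filter],
)
print(analyzer.analyze("The Purchasing of a Phone"))  # ['buying', 'phone']
```

Filter order matters: lowercasing runs first here so that the stop-word and synonym lookups see lowercase tokens.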

Faceted Search

  •  

Other points

  • If the use case needs fast writes, use a NoSQL DB such as Cassandra
