Apache Solr

November 05, 2020

Search Engine
Issues with traditional search

It performs only substring matching
It doesn't understand linguistic variations (e.g. "buy" and "buying" are the same word)
It doesn't understand synonyms (e.g. "buying" and "purchasing")
It doesn't ignore unimportant words (a, the, of, ...)
There is no sense of relevance when ranking results
It slows down as the amount of data increases

How Solr solves the traditional search issues

Solr uses an index that maps content (terms) to the documents that contain them, instead of mapping documents to their content
This inverted index is at the heart of how search engines work
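To make the idea concrete, here is a toy inverted index in Python. Each term maps to the set of ids of the documents that contain it, so a query looks terms up directly instead of scanning every document. This is only a sketch under simple assumptions (regex word splitting, lowercasing, AND semantics for multi-word queries); Lucene, the library Solr is built on, stores its postings on disk in a compact form and does far more.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # intersect the postings of each term
    return result


docs = {
    1: "Solr is a search server",
    2: "Lucene powers the Solr search engine",
    3: "Cassandra is a NoSQL database",
}
index = build_inverted_index(docs)
print(sorted(index["solr"]))                     # [1, 2]
print(sorted(search_all(index, "Solr search")))  # [1, 2]
print(sorted(search_all(index, "nosql")))        # [3]
```

Lookup cost now depends on the number of query terms and the size of their postings, not on the total size of the collection, which is why this structure scales better than substring scanning.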

Characteristics of a search engine

Text-centric
Read-dominant
Document-oriented
Large amounts of data
Flexible schema (a relational DB is not flexible, as every row must have the same structure)

Install and Startup

Download the Solr distribution archive (.tgz or .zip) and extract it
bin/solr start -p 8983

This starts a Java web server listening on port 8983
Solr is a web application that runs by default in the Jetty web server
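Because Solr is a web application, clients talk to it over HTTP. As a hedged sketch, the Python below builds a query URL for Solr's standard /select handler; the core name "core1" and the field "title" are assumptions for illustration, and nothing is sent unless you uncomment the last lines while Solr is running.

```python
from urllib.parse import urlencode

# Assumption: Solr on the default port 8983; the core name is passed in by the caller.
SOLR_BASE = "http://localhost:8983/solr"


def select_url(core: str, query: str, rows: int = 10) -> str:
    """Build a URL for the /select request handler of the given core."""
    params = urlencode({"q": query, "rows": rows})  # percent-encodes characters such as ':'
    return f"{SOLR_BASE}/{core}/select?{params}"


url = select_url("core1", "title:solr")
print(url)  # http://localhost:8983/solr/core1/select?q=title%3Asolr&rows=10

# To send it to a live server:
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     print(json.load(resp)["response"]["numFound"])
```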

Solr

Solr stores data in documents
Documents are more flexible than rows in an RDBMS
Documents can be hierarchical, whereas an RDBMS needs separate tables and rows

Solr Core

A core is a single physical index
Directory structure:

server
    solr
        core1
            conf -> contains managed-schema.xml and solrconfig.xml
            data -> contains the index files

Solr Document

The basic unit of information in Solr
It can be represented as a JSON object with key-value pairs (fields)
It is similar to a row in a DBMS table, but more flexible
It can be hierarchical
It can have arrays of values
It can have objects as values
It is denormalized: all data belonging to an entity is in the same document
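A small example helps show what these properties look like together. The Python dict below is a hypothetical denormalized "book" document with a multi-valued field and a nested object; the field names are illustrative, and whether Solr accepts the nested object as a child document depends on the schema being set up for nested documents.

```python
import json

# Hypothetical denormalized document: everything about the book lives in one place.
book = {
    "id": "book-1",
    "title": "Solr in Action",
    "authors": ["Trey Grainger", "Timothy Potter"],  # array of values (multi-valued field)
    "publisher": {                                   # object as a value (nested child document)
        "id": "pub-1",
        "name": "Manning",
    },
}

print(json.dumps(book, indent=2))
```

In a relational database the same data would typically be split across books, authors and publishers tables joined by keys.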

Indexing Process

The source document is converted into Solr's document format (JSON)
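A minimal sketch of this step, assuming the source data is a hypothetical (id, title, comma-separated tags) row: each row is converted to a JSON document and the batch is prepared as a POST to the core's /update handler. The core name "core1" and the field names are assumptions; the request is only built, not sent, so the sketch runs without a server.

```python
import json
import urllib.request


def to_solr_doc(row: tuple[str, str, str]) -> dict:
    """Convert a hypothetical (id, title, comma-separated tags) row into a Solr document."""
    doc_id, title, tags = row
    return {"id": doc_id, "title": title, "tags": tags.split(",")}


rows = [("42", "Buying a laptop", "laptop,shopping")]
body = json.dumps([to_solr_doc(r) for r in rows]).encode("utf-8")

# Assumption: a core named "core1" on localhost:8983; commit=true makes the docs searchable right away.
request = urllib.request.Request(
    "http://localhost:8983/solr/core1/update?commit=true",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending is left commented out:
# with urllib.request.urlopen(request) as resp:
#     print(resp.status)
```

Committing on every request is convenient for experiments; for bulk loads it is usually better to send larger batches and commit less often.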

Designing Schema

Create a sub-directory in the configsets directory
Copy the configuration from the _default directory into it
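These two steps amount to copying one directory tree (on the command line, roughly `cp -r server/solr/configsets/_default server/solr/configsets/<name>`). The Python sketch below does the same with shutil.copytree; it assumes the standard layout where configsets lives under server/solr, and the configset name you pass is your own choice.

```python
import shutil
from pathlib import Path


def create_configset(configsets_dir: Path, name: str) -> Path:
    """Copy configsets/_default into a new configsets/<name> directory and return it.

    `configsets_dir` is assumed to be server/solr/configsets in a standard install.
    """
    source = configsets_dir / "_default"
    target = configsets_dir / name
    # copytree raises FileExistsError if target exists, so an existing configset is never overwritten.
    shutil.copytree(source, target)
    return target
```

For example, `create_configset(Path("server/solr/configsets"), "tweets")` would create server/solr/configsets/tweets with its own conf directory, whose schema can then be edited without touching _default.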

Twitter search application

Elements of text analysis

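Analyzer: the full analysis chain applied to a field's text
Tokenizer: splits the text into tokens
Chain of token filters: transforms the tokens (e.g. lowercasing, removing stop words, stemming)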
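The pipeline above can be sketched in plain Python: a tokenizer produces tokens, and each filter in the chain transforms the token list. This is an illustration only; the stop-word list, the synonym map and the crude "-ing" stemmer are assumptions, and real stemmers such as Porter are far more careful. It shows how analysis addresses the variation, synonym and stop-word issues listed at the start.

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "and", "to"}      # assumed stop-word list
SYNONYMS = {"purchasing": "buying", "purchase": "buy"}  # hypothetical synonym map


def tokenizer(text):
    """Split text into word tokens (a rough stand-in for a standard tokenizer)."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def synonym_filter(tokens):
    return [SYNONYMS.get(t, t) for t in tokens]


def naive_stem_filter(tokens):
    """Strip a trailing 'ing' from longer words (deliberately crude)."""
    return [t[:-3] if t.endswith("ing") and len(t) > 5 else t for t in tokens]


def analyzer(text, filters=(lowercase_filter, stop_filter, synonym_filter, naive_stem_filter)):
    """Run the tokenizer, then apply each token filter in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyzer("Purchasing a ticket of the Show"))  # ['buy', 'ticket', 'show']
```

Filter order matters: the synonym map here expects lowercase input, so it must run after the lowercase filter. In Solr itself the chain is declared per field type in the schema, for example solr.StandardTokenizerFactory followed by filters such as solr.LowerCaseFilterFactory and solr.StopFilterFactory, and the same analysis is applied to queries so that query terms match indexed terms.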

Faceted Search
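Faceted search shows, alongside the results, how many matching documents have each value of a field (for example, per brand), so users can narrow results by clicking a value. A minimal Python sketch of the counting, assuming a hypothetical list of matching documents:

```python
from collections import Counter


def facet_counts(docs, field):
    """Count documents per distinct value of `field`, most frequent first.

    A multi-valued (list) field contributes each distinct value once per document.
    """
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # documents without the field are not counted
        values = value if isinstance(value, list) else [value]
        counts.update(set(values))
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical documents matching some query.
results = [
    {"id": "1", "brand": "Dell", "category": ["laptop"]},
    {"id": "2", "brand": "Lenovo", "category": ["laptop", "tablet"]},
    {"id": "3", "brand": "Dell", "category": ["desktop"]},
]
print(facet_counts(results, "brand"))     # [('Dell', 2), ('Lenovo', 1)]
print(facet_counts(results, "category"))  # [('laptop', 2), ('desktop', 1), ('tablet', 1)]
```

In Solr the equivalent is requested with parameters such as facet=true&facet.field=brand on a /select query, and Solr computes the counts from the index rather than by looping over documents like this.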

Other points

If the use case needs fast writes (a write-heavy workload), use a NoSQL database such as Cassandra instead
