Posts

Showing posts from November, 2020

Apache Spark

Spark is implemented in Scala. It is a parallel execution framework for processing big data sets.

Partition: a chunk of the data
Task: Java code executing against a partition of the data
RDD: Resilient Distributed Dataset
DAG: Directed Acyclic Graph

POM dependencies: spark-core, spark-sql, hadoop-hdfs

Initial dataset
    JavaRDD<Integer> myRdd = sc.parallelize(inputdata);

Reduce: takes two values as input and returns an output of the same type, e.g.
    result = myRdd.reduce((value1, value2) -> value1 + value2);

Mapping: the return type can differ from the input type. It transforms an RDD from one form to another, e.g.
    JavaRDD<Double> sqrtRdd = myRdd.map(value -> Math.sqrt(value));

Procedure
    sqrtRdd.foreach(value -> System.out.println(value));

Collect: brings all data from the different nodes to the current working node

Tuples: store related objects together instead of defining a new class, e.g.
    var items = ("one", "two", "three")

PairRDD: allows rich operations against keys. Group by key produces another RDD of t...
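The reduce and map notes above can be tried without a cluster. The following is a minimal plain-Java sketch, not Spark code: it uses only the standard library to imitate partitions, and the class and method names (`PartitionedReduceSketch`, `partition`, `reduce`, `mapSqrt`) are made up for illustration. It shows why the reduce function must return the same type it takes: partial results from each partition are fed back into the same function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

// Plain-Java imitation of reduce and map over partitioned data (no Spark dependency).
class PartitionedReduceSketch {

    // Split the input into roughly equal chunks, standing in for RDD partitions.
    static <T> List<List<T>> partition(List<T> data, int numPartitions) {
        List<List<T>> partitions = new ArrayList<>();
        int size = (int) Math.ceil((double) data.size() / numPartitions);
        for (int start = 0; start < data.size(); start += size) {
            partitions.add(data.subList(start, Math.min(start + size, data.size())));
        }
        return partitions;
    }

    // Reduce every partition on its own (in parallel), then reduce the partial results.
    // Both steps use the same function, so its output type must match its input type.
    static <T> T reduce(List<List<T>> partitions, BinaryOperator<T> op) {
        List<T> partials = partitions.parallelStream()
                .map(p -> p.stream().reduce(op).orElseThrow())
                .collect(Collectors.toList());
        return partials.stream().reduce(op).orElseThrow();
    }

    // Map changes the element type: Integer in, Double out.
    static List<Double> mapSqrt(List<List<Integer>> partitions) {
        return partitions.stream()
                .flatMap(List::stream)
                .map(value -> Math.sqrt(value))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> inputData = Arrays.asList(35, 12, 90, 20);
        List<List<Integer>> partitions = partition(inputData, 2);

        Integer sum = reduce(partitions, (value1, value2) -> value1 + value2);
        System.out.println("sum = " + sum); // sum = 157
        mapSqrt(partitions).forEach(System.out::println);
    }
}
```

In real Spark the partitions live on different nodes and the framework does the splitting and combining; only the two lambdas are written by hand.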

Vault

HashiCorp Vault
Manages (generates, stores, revokes) static and dynamic secrets
Provides an encryption service
Auditing

Start the server (development mode): vault server -dev
Create a secret: vault write secret/cookie recipe=sugar
Read the secret: vault read secret/cookie

Vault is a platform to secure, store and tightly control access to tokens, passwords, certificates and encryption keys, protecting sensitive data and other secrets in dynamic infrastructure.

Protects sensitive data such as: usernames and passwords, API keys, certificates, tokens, encryption keys

Benefits of Vault
Centralizes secrets across the organization
Eliminates long-lived secrets
Provides encryption as a service
Automates generation of certificates for authentication

Features

How does Vault protect data?
Vault creates an encryption key
The encryption key is used to encrypt the data stored in Vault
The encryption key is stored alongside the data
Vault needs a master key to protect the encryption key
The master key is not stored on any persistent storage
Master key is generated when Vault is init...
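The key hierarchy described above (an encryption key protects the data, a master key protects the encryption key, and only the master key stays out of storage) can be sketched with the JDK's own crypto classes. This is an illustrative sketch under assumptions, not Vault's implementation: the class name `EnvelopeEncryptionSketch`, its methods, and the choice to keep the master key in a local variable are all made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Two-level key hierarchy: master key -> encryption key -> data.
class EnvelopeEncryptionSketch {
    private static final int NONCE_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts with AES-GCM and returns nonce followed by ciphertext.
    static byte[] encrypt(SecretKey key, byte[] plaintext) {
        try {
            byte[] nonce = new byte[NONCE_BYTES];
            RANDOM.nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] out = new byte[NONCE_BYTES + ciphertext.length];
            System.arraycopy(nonce, 0, out, 0, NONCE_BYTES);
            System.arraycopy(ciphertext, 0, out, NONCE_BYTES, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses encrypt(): reads the nonce from the front of the blob.
    static byte[] decrypt(SecretKey key, byte[] blob) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, blob, 0, NONCE_BYTES));
            return cipher.doFinal(blob, NONCE_BYTES, blob.length - NONCE_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String roundTrip(String secret) {
        SecretKey masterKey = newAesKey();      // assumption: held in memory only
        SecretKey encryptionKey = newAesKey();

        // These two byte arrays are what would be written to storage.
        byte[] storedSecret = encrypt(encryptionKey, secret.getBytes(StandardCharsets.UTF_8));
        byte[] storedKey = encrypt(masterKey, encryptionKey.getEncoded());

        // Reading back: unwrap the encryption key first, then decrypt the data.
        SecretKey unwrapped = new SecretKeySpec(decrypt(masterKey, storedKey), "AES");
        return new String(decrypt(unwrapped, storedSecret), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("sugar")); // sugar
    }
}
```

Without the master key, the stored bytes alone are not enough to recover either the encryption key or the secret, which is the point of keeping the master key off persistent storage.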

Apache Solr

Search Engine

Issues with traditional searching
Performs only substring matching
Doesn't understand linguistic variations (buy vs. buying are the same)
Doesn't understand synonyms (buying vs. purchasing)
Doesn't omit unimportant words (a, the, of, ...)
There is no sense of relevancy in the results
It slows down as the data grows

How does Solr solve the traditional search issues?
Solr uses an index that maps contents to documents instead of mapping documents to contents
The inverted index is at the heart of how search engines work

Characteristics of a search engine
Text centric
Read dominant
Document oriented
Large amounts of data
Flexible schema (a relational DB is not flexible, as every row requires the same structure)

Install and startup
Download the Solr archive and extract it
bin/solr start -p 8983
Starts a Java web server listening on port 8983
Solr is a web application that runs in the Jetty web server by default

Solr stores data in documents
Documents are more flexible than rows in an RDBMS
Documents can be hirera...
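The inverted index mentioned above, a map from terms to the documents that contain them, can be sketched in a few lines of plain Java. This is a toy illustration, not how Solr or Lucene is implemented: the class `InvertedIndexSketch`, its stop-word list and its AND-only query logic are assumptions made for the example, and it does no stemming, synonym handling or relevancy ranking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: term -> ids of the documents containing that term.
class InvertedIndexSketch {
    // Assumed stop-word list, taken from the examples in the notes.
    private static final Set<String> STOP_WORDS = Set.of("a", "the", "of");

    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Lowercase, split on non-alphanumerics, drop stop words.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of documents that contain every (non-stop-word) query term.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "The price of a laptop");
        index.add(2, "Buying a laptop bag");

        System.out.println(index.search("the laptop")); // [1, 2]
        System.out.println(index.search("laptop bag")); // [2]
    }
}
```

A lookup touches only the posting sets of the query terms, so query time does not depend on scanning every document, which is what lets this approach cope with growing data better than substring matching.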