Apache Cassandra
What is Apache Cassandra?
Apache Cassandra is an open-source, non-relational, distributed
database that manages large amounts of data across commodity
servers.
It is a column-oriented (wide-column) database.
It was initially released in July 2008.
In terms of the CAP theorem, it favors Availability and Partition
Tolerance (AP).
Why was Apache Cassandra implemented?
Avinash Lakshman and Prashant Malik initially
developed Apache Cassandra at Facebook to power the
Facebook inbox search feature.
Components of Apache Cassandra
• Node: A Cassandra node is where data is stored.
• Data center: A data center is a collection of related nodes.
• Cluster: A cluster contains one or more data centers.
• Commit log: In Cassandra, the commit log is a crash-recovery mechanism. Every write operation is
written to the commit log.
• Memtable: A memtable is a memory-resident data structure. After the commit log, the data is written
to the memtable. Sometimes a single column family has multiple memtables.
• SSTable: A disk file to which data is flushed from the memtable when its contents reach a
threshold value.
• Bloom filter: A quick, probabilistic data structure for testing whether an element is a member of a
set. It acts as a special kind of cache. Bloom filters are consulted on every read query.
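To make the Bloom filter bullet concrete, here is a minimal sketch of the idea in Python. It is not Cassandra's implementation: the class name, the bit-array size, the number of hashes, and the use of salted SHA-256 are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Both sizes are illustrative assumptions, not Cassandra defaults.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for partition_key in ("user:1", "user:2", "user:3"):
    bf.add(partition_key)
```

A key that was added always answers "possibly present"; a key that was never added usually answers "definitely absent", but can occasionally collide with bits set by other keys, which is the false-positive case.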
Apache Cassandra Architecture:
Write Operations:
i. When a client issues a write request, Cassandra stores the data
in an in-memory structure, the memtable (RAM). At the same time,
the write is appended to the commit log (disk), so it survives
even if the node loses power.
ii. Data in the memtable (RAM) is flushed to SSTables (disk), and a
partition index is created that points to the location of the data
on disk. A flush is triggered when a configurable memtable
threshold is reached or when the commit log threshold
commitlog_total_space_in_mb is exceeded.
iii. SSTables are immutable: when a memtable is flushed, a new
SSTable file is created and the data in existing SSTables is not
overwritten. A partition can therefore be spread across multiple
SSTables, all of which may need to be searched on a read.
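The write steps above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: `ToyNode` and `flush_threshold` are hypothetical names, the "commit log" and "SSTables" are plain Python lists rather than files, and the flush trigger is a simple row count instead of Cassandra's size-based thresholds.

```python
class ToyNode:
    """Illustrative write path: commit log -> memtable -> immutable SSTables."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: max rows in memtable
        self.commit_log = []  # stands in for the on-disk append-only log
        self.memtable = {}    # in-memory and mutable
        self.sstables = []    # each flush appends one new immutable table

    def write(self, key, value):
        self.commit_log.append((key, value))  # record the write for recovery
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort by key and freeze as a tuple; existing SSTables are never modified.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replay


node = ToyNode(flush_threshold=2)
node.write("a", 1)
node.write("b", 2)   # triggers the first flush
node.write("a", 10)  # newer value for "a" goes to the fresh memtable
node.write("c", 3)   # triggers the second flush; the first SSTable is untouched
```

After these four writes the key "a" exists in two SSTables with different values, which is why a read has to consult more than one file.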
Read Operations:
i. The client sends a read request.
ii. Cassandra checks the memtable (RAM) for the requested data. If
the data is present, it is read from the memtable (RAM) and merged
with data from the SSTables (disk) before the final result is sent to
the client.
iii. If the row cache is enabled, it is checked for the data.
iv. Bloom filters, held in memory, are checked to find the SSTables
that may contain the requested partition. Because Bloom filters are
probabilistic, they can return false positives, but never false
negatives. For each SSTable that its Bloom filter does not rule out,
Cassandra next checks the partition key cache.
v. The partition key cache stores partition index entries in heap
memory. If the partition key is found in the partition key cache,
Cassandra goes directly to the compression offset map to find the
location on disk that holds the data. If the partition key is not in
the cache, the partition summary is searched next to locate the
entry in the partition index.
vi. The partition index stores the partition keys of the data together
with their positions, which are then used with the compression offset
map to find the exact location on disk where the data is stored.
vii. The compression offset map holds the exact on-disk location of
the data. Once it has identified where the data is stored, Cassandra
fetches the data and returns it to the client.
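As a rough sketch of the merge described in the read steps, the function below combines a memtable with several SSTables and returns the newest value. The data layout is an assumption made for illustration: each row carries a `(timestamp, value)` pair, the newest timestamp wins, and a plain Python set stands in for each SSTable's Bloom filter. The caches, the partition summary, and the compression offset map are left out.

```python
def read(key, memtable, sstables):
    """Return the newest value for key, merging the memtable with SSTables.

    memtable: dict mapping key -> (timestamp, value)
    sstables: list of dicts with "keys" (a set standing in for the Bloom
              filter) and "rows" (dict mapping key -> (timestamp, value))
    """
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for sstable in sstables:
        if key not in sstable["keys"]:  # filter says "definitely absent"
            continue
        row = sstable["rows"].get(key)  # a real filter may be a false positive
        if row is not None:
            candidates.append(row)
    if not candidates:
        return None
    # The row with the highest timestamp wins.
    return max(candidates, key=lambda row: row[0])[1]


memtable = {"user:1": (30, "carol")}
sstables = [
    {
        "keys": {"user:1", "user:2"},
        "rows": {"user:1": (10, "alice"), "user:2": (20, "bob")},
    },
]
```

Here `read("user:1", memtable, sstables)` prefers the memtable row because its timestamp is newer, while `read("user:2", memtable, sstables)` falls through to the SSTable.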
Features of Apache Cassandra:
Distributed
Scalability
Fault Tolerance
Query Language
Virtual Nodes:
A virtual node (vnode) is the data storage layer within a server.
There are 256 virtual nodes per server by default (the num_tokens
setting; the default is lower in newer releases). Each node is
assigned a range of tokens, and every virtual node covers a
sub-range of the tokens of the node it belongs to. Virtual nodes
make the system more flexible, so it is easier for Cassandra to
add new nodes to the cluster when we need them. When tokens are
unequally distributed between nodes, we can extend the storage
capacity by adjusting the virtual nodes assigned to the more
heavily loaded node.
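The token and virtual-node idea can be illustrated with a small token ring. This is a sketch under assumptions, not Cassandra's partitioner: `token_for`, `build_ring`, and `owner` are hypothetical helpers, tokens come from a SHA-256 hash reduced to a 32-bit ring, and each node gets 8 virtual nodes instead of the real default.

```python
import bisect
import hashlib


def token_for(value, ring_size=2**32):
    # Hypothetical hash-to-token function; Cassandra's partitioners differ.
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ring_size


def build_ring(nodes, vnodes_per_node=8):
    # Each physical node owns several tokens (its virtual nodes) on the ring.
    return sorted(
        (token_for(f"{node}#vnode{i}"), node)
        for node in nodes
        for i in range(vnodes_per_node)
    )


def owner(ring, partition_key):
    # The first vnode token at or after the key's token owns it; wrap around.
    tokens = [token for token, _ in ring]
    index = bisect.bisect_left(tokens, token_for(partition_key)) % len(ring)
    return ring[index][1]


ring = build_ring(["node-a", "node-b", "node-c"])
print(owner(ring, "user:42"))
```

Because every node holds many small token ranges, adding a fourth node with `build_ring([...,"node-d"])` takes over slices from all existing nodes rather than splitting a single neighbor's range; keys either stay where they were or move to the new node.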
Advantages of Apache Cassandra:
Open source
Peer to Peer Architecture
Scalable
High Efficiency
Consistency adjustable
Schema Less
Easy to Learn and Use
Distributed and Decentralized
Ability to Analyse
Disadvantages of Apache Cassandra:
It does not support full ACID transactions or relational data properties.
Because it handles large amounts of data and many requests,
transactions can slow down, leading to latency issues.
Data is modelled around queries rather than structure, so the
same information is often stored multiple times.
Since Cassandra stores vast amounts of data, users may experience
JVM memory management issues.
It offers no join or subquery support.
Cassandra offers only limited support for aggregates.
Cassandra was optimized from the start for fast writes; reads got
the short end of the stick, so they tend to be slower.
Finally, it lacks thorough official documentation from Apache, so you
may need to look for it among third-party companies.
