April 10-12 | Chicago, IL
NoSQL: An Analysis
Andrew J. Brust, Founder and CEO, Blue Badge Insights
Please silence
cell phones
Meet Andrew
CEO and Founder, Blue Badge Insights
Big Data blogger for ZDNet
Microsoft Regional Director, MVP
Co-chair VSLive! and 17 years as a speaker
Founder, Microsoft BI User Group of NYC
• http://www.msbinyc.com
Co-moderator, NYC .NET Developers Group
• http://www.nycdotnetdev.com
“Redmond Review” columnist for Visual Studio Magazine and Redmond Developer News
brustblog.com, Twitter: @andrewbrust
Andrew’s New Blog (bit.ly/bigondata)
Read all about it!
Agenda
Why NoSQL?
Concepts
NoSQL Categories
Provisioning, market, applicability
Take-aways
Why NoSQL?
NoSQL Data Fodder
Addresses
Preferences
Notes
Friends, Followers
Documents
“Web Scale”
This is the term used to justify NoSQL
The scenario: simple needs, but “made up for in volume”
• Millions of concurrent users
Think of sites like Amazon or Google
Think of non-transactional tasks, like loading catalog data to display a product page, or loading environment preferences
NoSQL Common Traits
Non-relational
Non-schematized/schema-free
Open source
Distributed
Eventual consistency
“Web scale”
Developed at big Internet companies
CONCEPTS
Consistency
CAP Theorem
• A distributed database can fully provide only two of the following three attributes at once: consistency, availability and partition tolerance
Most NoSQL databases do not offer full “ACID” guarantees
• Atomicity, consistency, isolation and durability
Instead, they offer “eventual consistency”
Similar to DNS propagation
Things like inventory, account balances should be consistent
• Imagine updating a server in Seattle that stock was depleted
• Imagine not updating the server in NY
• Customer in NY goes to order 50 pieces of the item
• Order processed even though no stock
Things like catalog information don’t have to be, at least not immediately
• If a new item is entered into the catalog, it’s OK for some customers to see it
even before the other customers’ server knows about it
But catalog info must come up quickly
• Therefore don’t lock data in one location while waiting to update the other
Therefore, OK to sacrifice consistency for speed, in some cases
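The Seattle/New York inventory scenario above can be sketched in a few lines of Python. This is a toy model rather than any product's replication protocol: the `Replica` class, the `replicate` function, the "widget" item and the stock numbers are all assumptions made for illustration.

```python
class Replica:
    """A toy in-memory replica holding stock counts per item (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.stock = {}
        self.pending = []  # writes accepted locally but not yet sent to peers

    def write(self, item, quantity):
        # Accept the write locally without locking or waiting for other replicas.
        self.stock[item] = quantity
        self.pending.append((item, quantity))

    def can_order(self, item, quantity):
        # Answer from local state only, which may be stale.
        return self.stock.get(item, 0) >= quantity


def replicate(source, target):
    """Ship pending writes from source to target (the 'eventual' part)."""
    for item, quantity in source.pending:
        target.stock[item] = quantity
    source.pending.clear()


seattle = Replica("Seattle")
new_york = Replica("New York")

# Both replicas start out agreeing that 50 units are in stock.
for replica in (seattle, new_york):
    replica.stock["widget"] = 50

# Stock is depleted in Seattle; New York has not heard about it yet.
seattle.write("widget", 0)
stale_answer = new_york.can_order("widget", 50)  # order accepted on stale data

# Once the write propagates, the replicas converge and the order is refused.
replicate(seattle, new_york)
fresh_answer = new_york.can_order("widget", 50)
```

The window between `write` and `replicate` is where the "order processed even though no stock" problem lives. For catalog data, the same window is harmless.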
CAP Theorem (diagram): Consistency, Availability, Partition Tolerance; labeled regions: Relational, NoSQL
Indexing
Most NoSQL databases are indexed by key
Some allow so-called “secondary” indexes
Often the primary key indexes are clustered
HBase uses HDFS (the Hadoop Distributed File System), which is
append-only
• Writes are logged
• Logged writes are batched
• File is re-created and sorted
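The log, batch and re-create cycle described above can be illustrated with a small in-memory sketch. It is only loosely modeled on the idea and is not HBase's actual implementation: the `AppendOnlyStore` class, its method names and the `flush_threshold` parameter are hypothetical.

```python
class AppendOnlyStore:
    """Toy store that never edits a file in place (illustrative only)."""

    def __init__(self, flush_threshold=3):
        self.log = []            # append-only write log
        self.buffer = {}         # batched writes not yet in a sorted file
        self.sorted_files = []   # immutable files, each a list sorted by key
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.log.append((key, value))   # 1. the write is logged
        self.buffer[key] = value        # 2. logged writes are batched
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. Merge the batch with existing data and write out a brand-new
        #    file sorted by key, instead of updating the old file.
        merged = {}
        for sorted_file in self.sorted_files:
            merged.update(sorted_file)
        merged.update(self.buffer)
        self.sorted_files = [sorted(merged.items())]
        self.buffer = {}

    def get(self, key):
        # Recent writes live in the buffer; older ones in the sorted file.
        if key in self.buffer:
            return self.buffer[key]
        for sorted_file in self.sorted_files:
            for stored_key, value in sorted_file:
                if stored_key == key:
                    return value
        return None


store = AppendOnlyStore(flush_threshold=3)
for key, value in [("row3", "c"), ("row1", "a"), ("row2", "b"), ("row0", "z")]:
    store.put(key, value)
```

After the third `put`, the buffer is flushed into one key-sorted file. The fourth write sits in the buffer until the next flush.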
Queries
Typically no query language
Instead, create procedural program
Sometimes SQL is supported
Sometimes MapReduce code is used…
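To make "create procedural program" concrete, here is a minimal sketch of a query written as code instead of as a declarative statement. The data mirrors the Customers example used later in the deck; the function name and the in-memory dictionary are assumptions, not any product's API.

```python
# Rows keyed by ID, as a key-based store would hold them.
customers = {
    101: {"First_Name": "Andrew", "Last_Name": "Brust"},
    202: {"First_Name": "Jane", "Last_Name": "Doe"},
}


def first_names_with_last_name(rows, last_name):
    """Procedural equivalent of:
    SELECT First_Name FROM Customers WHERE Last_Name = :last_name
    """
    matches = []
    for row in rows.values():
        # Without a secondary index, every row has to be inspected.
        if row.get("Last_Name") == last_name:
            matches.append(row["First_Name"])
    return matches


result = first_names_with_last_name(customers, "Doe")
```

The loop does the work that a relational query optimizer would otherwise plan, which is one reason ad hoc querying costs more developer time here.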
MapReduce
This is not Hadoop’s MapReduce, but it’s conceptually related
Map step: pre-processes data
Reduce step: summarizes/aggregates data
Will show a MapReduce code sample for Mongo soon
Will demo map code on CouchDB
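The two steps can be sketched in plain Python as follows. This is not the MongoDB or CouchDB sample from the session (those use JavaScript); the function names, the `currency` field and the third order are assumptions added for illustration.

```python
from collections import defaultdict

orders = [
    {"id": 1501, "price": 300, "currency": "USD"},
    {"id": 1502, "price": 2500, "currency": "GBP"},
    {"id": 1503, "price": 200, "currency": "USD"},  # hypothetical extra order
]


def map_order(order):
    # Map step: pre-process each record into (key, value) pairs.
    yield order["currency"], order["price"]


def reduce_prices(key, values):
    # Reduce step: summarize all values that share a key.
    return sum(values)


def map_reduce(records, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


totals = map_reduce(orders, map_order, reduce_prices)
```

Because each record is mapped independently, the map step can run on whichever server holds the record, and only the grouped values need to travel to the reducer.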
Sharding
A partitioning pattern where separate servers store partitions
Fan-out queries supported
Partitions may be duplicated, so replication also provided
• Good for disaster recovery
Since “shards” can be geographically distributed, sharding can act like a
CDN
Good for keeping data close to processing
• Reduces network traffic when MapReduce splitting takes place
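A minimal sketch of the routing and fan-out ideas above, assuming a simple hash-based placement scheme. The `ShardedStore` class and its methods are hypothetical, each "server" is just a dictionary, and replication of partitions is not modeled.

```python
import zlib


class ShardedStore:
    """Toy sharded store: each shard stands in for a separate server."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns the record.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, record):
        self._shard_for(key)[key] = record

    def get(self, key):
        # A lookup by key touches exactly one shard.
        return self._shard_for(key).get(key)

    def fan_out(self, predicate):
        # A query that is not by key must be sent to every shard and merged.
        results = []
        for shard in self.shards:
            results.extend(r for r in shard.values() if predicate(r))
        return results


store = ShardedStore(shard_count=3)
store.put("101", {"first_name": "Andrew", "last_name": "Brust"})
store.put("202", {"first_name": "Jane", "last_name": "Doe"})

by_key = store.get("101")
by_scan = store.fan_out(lambda record: record["last_name"] == "Doe")
```

`zlib.crc32` is used instead of Python's built-in `hash` because string hashes are randomized per process, which would send the same key to different shards on different runs.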
NOSQL CATEGORIES
Key-Value Stores
The most common; not necessarily the most popular
Has rows, each with something like a big dictionary/associative array
• Schema may differ from row to row
Common on cloud platforms
• e.g. Amazon SimpleDB, Azure Table Storage
MemcacheDB, Voldemort, Couchbase, DynamoDB (AWS), Dynomite,
Redis and Riak
Key-Value Stores
Table: Customers
Row ID: 101
First_Name: Andrew
Last_Name: Brust
Address: 123 Main Street
Last_Order: 1501
Row ID: 202
First_Name: Jane
Last_Name: Doe
Address: 321 Elm Street
Last_Order: 1502
Table: Orders
Row ID: 1501
Price: 300 USD
Item1: 52134
Item2: 24457
Row ID: 1502
Price: 2500 GBP
Item1: 98456
Item2: 59428
Database
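The tables above map naturally onto nested dictionaries. The sketch below is an in-memory illustration only, not the API of SimpleDB, Azure Table Storage or any other product; the `Twitter` attribute and the `last_order_for` helper are hypothetical.

```python
# Each table maps a row ID to a dictionary of attribute names and values.
customers = {
    101: {"First_Name": "Andrew", "Last_Name": "Brust",
          "Address": "123 Main Street", "Last_Order": 1501},
    202: {"First_Name": "Jane", "Last_Name": "Doe",
          "Address": "321 Elm Street", "Last_Order": 1502},
}
orders = {
    1501: {"Price": "300 USD", "Item1": 52134, "Item2": 24457},
    1502: {"Price": "2500 GBP", "Item1": 98456, "Item2": 59428},
}

# Schema may differ from row to row: only row 202 gets this attribute.
customers[202]["Twitter"] = "@janedoe"  # hypothetical attribute


def last_order_for(customer_id):
    # There is no join; the application follows the stored key itself.
    customer = customers[customer_id]
    return orders[customer["Last_Order"]]


order = last_order_for(101)
```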
Wide Column Stores
Has tables with declared column families
• Each column family has “columns” which are KV pairs that can vary from row to row
These are the most foundational for large sites
• BigTable (Google)
• HBase (Originally part of Yahoo-dominated Hadoop project)
• Cassandra (Facebook)
• Calls column families “super columns” and tables “super column families”
They are the most “Big Data”-ready
• Especially HBase + Hadoop
Table: Customers
  Row ID: 101
    Super Column: Name
      Column: First_Name: Andrew
      Column: Last_Name: Brust
    Super Column: Address
      Column: Number: 123
      Column: Street: Main Street
    Super Column: Orders
      Column: Last_Order: 1501
  Row ID: 202
    Super Column: Name
      Column: First_Name: Jane
      Column: Last_Name: Doe
    Super Column: Address
      Column: Number: 321
      Column: Street: Elm Street
    Super Column: Orders
      Column: Last_Order: 1502
Table: Orders
  Row ID: 1501
    Super Column: Pricing
      Column: Price: 300 USD
    Super Column: Items
      Column: Item1: 52134
      Column: Item2: 24457
  Row ID: 1502
    Super Column: Pricing
      Column: Price: 2500 GBP
    Super Column: Items
      Column: Item1: 98456
      Column: Item2: 59428
Wide Column Stores
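As a rough illustration of the layout above, a wide column table can be modeled as row ID, then column family, then column. The helper functions, the `DECLARED_FAMILIES` set and the `Apartment` column are assumptions for this sketch and do not correspond to the HBase or Cassandra client APIs.

```python
# row ID -> column family -> column -> value
customers = {
    101: {
        "Name": {"First_Name": "Andrew", "Last_Name": "Brust"},
        "Address": {"Number": 123, "Street": "Main Street"},
        "Orders": {"Last_Order": 1501},
    },
    202: {
        "Name": {"First_Name": "Jane", "Last_Name": "Doe"},
        "Address": {"Number": 321, "Street": "Elm Street"},
        "Orders": {"Last_Order": 1502},
    },
}

# Column families are declared up front; the columns inside them are not.
DECLARED_FAMILIES = {"Name", "Address", "Orders"}


def put_column(table, row_id, family, column, value):
    if family not in DECLARED_FAMILIES:
        raise KeyError(f"undeclared column family: {family}")
    table.setdefault(row_id, {}).setdefault(family, {})[column] = value


def read_family(table, row_id, family):
    # Reading one family does not require touching the row's other families.
    return table.get(row_id, {}).get(family, {})


# Columns can vary from row to row: only row 202 has an Apartment column.
put_column(customers, 202, "Address", "Apartment", "4B")  # hypothetical column
name_101 = read_family(customers, 101, "Name")
```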
Demo
Wide Column Stores
Document Stores
Have “databases,” which are akin to tables
Have “documents,” akin to rows
• Documents are typically JSON objects
• Each document has properties and values
• Values can be scalars, arrays, links to documents in other databases, or sub-documents (i.e. contained JSON objects, which allow for hierarchical storage)
• Can have attachments as well
Old versions are retained
• So Doc Stores work well for content management
Some view doc stores as specialized KV stores
Most popular with developers, startups, VCs
The biggies:
• CouchDB
• Derivatives
• MongoDB
Document Store Application Orientation
Documents can each be addressed by URIs
CouchDB supports full REST interface
Very geared towards JavaScript and JSON
• Documents are JSON objects
• CouchDB/MongoDB use JavaScript as native language
In CouchDB, view functions also have unique URIs; views return JSON, and the related “show” and “list” functions can return HTML
• So you can build entire applications in the database
Database: Customers
  Document ID: 101
    First_Name: Andrew
    Last_Name: Brust
    Address:
      Number: 123
      Street: Main Street
    Orders:
      Most_recent: 1501
  Document ID: 202
    First_Name: Jane
    Last_Name: Doe
    Address:
      Number: 321
      Street: Elm Street
    Orders:
      Most_recent: 1502
Database: Orders
  Document ID: 1501
    Price: 300 USD
    Item1: 52134
    Item2: 24457
  Document ID: 1502
    Price: 2500 GBP
    Item1: 98456
    Item2: 59428
Document Stores
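The customer document from the diagram can be written as a JSON-style object with sub-documents. The sketch below uses Python's standard `json` module and plain dictionaries in place of a real document store; the `most_recent_order` helper and the use of string document IDs are assumptions.

```python
import json

# Each "database" maps a document ID to a JSON-style document.
customers_db = {
    "101": {
        "First_Name": "Andrew",
        "Last_Name": "Brust",
        "Address": {"Number": 123, "Street": "Main Street"},  # sub-document
        "Orders": {"Most_recent": "1501"},  # link to a document in orders_db
    },
}
orders_db = {
    "1501": {"Price": "300 USD", "Item1": 52134, "Item2": 24457},
}


def most_recent_order(customer_id):
    # Follow the link stored in the customer document to the other database.
    customer = customers_db[customer_id]
    return orders_db[customer["Orders"]["Most_recent"]]


# Documents serialize to JSON and back without any schema definition.
as_json = json.dumps(customers_db["101"], indent=2)
round_tripped = json.loads(as_json)
street = round_tripped["Address"]["Street"]
```

Compared with the key-value version of the same data, the address is now a nested object, so the hierarchy is stored rather than flattened into one string.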
Demo
Document Stores
Graph Databases
Great for social network applications and others where relationships are
important
Nodes and edges
• Edge like a join
• Nodes like rows in a table
Nodes can also have properties and values
Neo4j is a popular graph db
Graph database (diagram)
Person nodes: Joe Smith, Jane Doe, Andrew Brust, George Washington
Address node: Street: 123 Main Street, City: New York, State: NY, Zip: 10014
Item nodes: ID: 52134, Type: Dress, Color: Blue; ID: 24457, Type: Shirt, Color: Red
Order node: ID: 252, Total Price: 300 USD
Edge labels: Sent invitation to, Commented on photo by, Friend of, Address, Placed order, Item 1, Item 2
Graph Databases
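A graph like the one in the diagram can be sketched as a set of nodes plus a list of labeled edges. The node properties come from the slide, but which nodes each edge connects is not recoverable from the transcript, so the pairings below are assumptions. The helper functions are hypothetical and unrelated to Neo4j's API.

```python
# Nodes carry properties, much like rows in a table.
nodes = {
    "andrew": {"name": "Andrew Brust"},
    "jane": {"name": "Jane Doe"},
    "joe": {"name": "Joe Smith"},
    "order252": {"id": 252, "total_price": "300 USD"},
    "item1": {"id": 52134, "type": "Dress", "color": "Blue"},
    "item2": {"id": 24457, "type": "Shirt", "color": "Red"},
}

# Edges are (source, label, target); the pairings are assumed, see above.
edges = [
    ("andrew", "FRIEND_OF", "jane"),
    ("joe", "SENT_INVITATION_TO", "andrew"),
    ("andrew", "PLACED_ORDER", "order252"),
    ("order252", "ITEM_1", "item1"),
    ("order252", "ITEM_2", "item2"),
]


def neighbors(node_id, label):
    # Following an edge plays the role a join plays in a relational query.
    return [target for source, edge_label, target in edges
            if source == node_id and edge_label == label]


def item_types_ordered_by(person_id):
    # Two hops: person -> orders -> items.
    types = []
    for order_id in neighbors(person_id, "PLACED_ORDER"):
        for source, label, target in edges:
            if source == order_id and label.startswith("ITEM_"):
                types.append(nodes[target]["type"])
    return types


friends = neighbors("andrew", "FRIEND_OF")
ordered = item_types_ordered_by("andrew")
```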
PROVISIONING, MARKET, APPLICABILITY
NoSQL + BI
NoSQL databases are bad for ad hoc query and data warehousing
BI applications involve models; models rely on schema
Extract, transform and load (ETL) may be your friend
Wide-column stores, however, are good for “Big Data”
• See next slide
Wide-column stores and column-oriented databases are similar
technologically
NoSQL + Big Data
Big Data and NoSQL are interrelated
Typically, Wide-Column stores used in Big Data scenarios
Prime example:
• HBase and Hadoop
Why?
• Lack of indexing not a problem
• Consistency not an issue
• Fast reads very important
• Distributed file systems important too
• Commodity hardware and disk assumptions also important
• Not Web scale but massive scale-out, so similar concerns
Going “NoSQL-Like” on the MS Cloud
Azure Table Storage (a key-value store)
SQL Azure XML columns (supports variable schema, hierarchy)
SQL Azure Federation (a sharding implementation)
OData (HTTP/JSON data APIs)
Running NoSQL database products using Azure VMs…
NoSQL on Windows Azure
Platform as a Service
• Cloudant: https://cloudant.com/azure/
• MongoDB (via MongoLab): http://blog.mongolab.com/2012/10/azure/
MongoDB, DIY:
• On an Azure Worker Role:
http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+Worker+Roles
• On a Windows VM:
http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Windows+Installer
• On a Linux VM:
http://www.mongodb.org/display/DOCS/MongoDB+on+Azure+VM+-+Linux+Tutorial
http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/
NoSQL on Windows Azure
Others, DIY (Linux VMs):
• Couchbase:
http://blog.couchbase.com/couchbase-server-new-windows-azure
• CouchDB:
http://ossonazure.interoperabilitybridges.com/articles/couchdb-installer-for-windows-azure
• Riak:
http://basho.com/blog/technical/2012/10/09/Riak-on-Microsoft-Azure/
• Redis:
http://blogs.msdn.com/b/tconte/archive/2012/06/08/running-redis-on-a-centos-linux-vm-in-windows-azure.aspx
• Cassandra:
http://www.windowsazure.com/en-us/manage/linux/other-resources/how-to-run-cassandra-with-linux/
And With MS On-Premise Technologies
SQL Server 2008/2008R2/2012 “Beyond Relational” Features
• Sparse columns (like Wide Column Stores)
• Geospatial (geometry, geography data types)
• FILESTREAM, FileTable (like Document Store attachments)
• Full Text Search, Semantic Similarity Search
• HierarchyID (can simulate Graph Database functionality)
SQL Server Parallel Data Warehouse Edition (PDW)
• Distributed architecture (like MapReduce/Hadoop)
• PolyBase in PDW v2 (interfaces PDW and HDFS)
TAKE-AWAYS
Compromises
Eventual consistency
Write buffering
Often, only primary keys can be indexed
Queries must be written as programs
Tooling
• Productivity (= money)
Summing Up
• Line of Business -> Relational
• Large, public (consumer)-facing sites -> NoSQL
• Complex data structures -> Relational
• Big Data -> NoSQL
• Transactional -> Relational
• Content Management -> NoSQL
• Enterprise -> Relational
• Consumer Web -> NoSQL
Thank you
• andrew.brust@bluebadgeinsights.com
• @andrewbrust on twitter
• Want to get on Blue Badge Insights’ list?
Text “bluebadge” to 22828
Win a Microsoft Surface Pro!
Complete an online SESSION EVALUATION
to be entered into the draw.
Draw closes April 12, 11:59pm CT
Winners will be announced on the PASS BA
Conference website and on Twitter.
Go to passbaconference.com/evals or follow the QR code link displayed on
session signage throughout the conference venue.
Your feedback is important and valuable. All feedback will be used to improve
and select sessions for future events.


Editor's Notes

  • #6: http://www.chegg.com/textbooks/foundations-of-sql-server-2008-r2-business-intelligence-2nd-edition-9781430233244-1430233249
    http://www.chegg.com/textbooks/smart-business-intelligence-solutions-with-microsoft-sql-server-2008-1st-edition-9780735625808-0735625808