Scalability
Contents
- Contents
- Scalability: A Summary
- Scalability for Dummies: Notes
- Google I/O 2009 - Transactions Across Datacenters..
Scalability: A Summary
CS75 (Summer 2012), Lecture 9: Scalability, Harvard Web Development, David Malan
Qualities to look for in a hosting provider: SFTP support; be cautious of ‘unlimited’ offers
VPS: Virtual Private Server (e.g. on AWS): a virtual machine sold as a service by an Internet hosting provider
Vertical Scaling:
- Running low on RAM? Add more resources (CPU, RAM, disk) and throw hardware at the problem. Shortcoming: real-world constraints on how big a single machine can get
- CPU: more cores, more threads, etc.
- More resources also lets hosts chop servers into smaller VPS’s
- SAS: mechanical drives that spin at 15,000 RPM, versus the more typical 7,200 RPM (as of 2012)
- SSD: electronic storage, no moving parts
Horizontal Scaling:
- Run on many slower, cheaper machines instead of one fast machine
Load Balancer: Introduction
- Distributes client requests across computing resources, i.e. app servers and databases. In each case, the LB returns the response from the computing resource to the appropriate client.
- Effective usage: prevents directing requests to unhealthy servers, prevents overloading any one resource, and helps eliminate single points of failure.
Load Balancing:
- Distributes traffic from clients among the servers
- What IP address should we return when a client looks up the website’s address?
  - DNS can return the load balancer’s IP instead of a specific backend server’s
- Private IPs are local to the network and not visible to the public
- How does a load balancer decide which backend server should handle a request?
  - Can dedicate servers to specific roles, or scale identical codebases across all servers
  - How do we determine which server has unused processing power?
  - Simplest approach: use a DNS that does ‘Round Robin’, returning a different server for each lookup
  - Problem with RR: by bad luck, the heaviest users can all end up on the same server
- BIND: Berkeley Internet Name Daemon, a DNS server that can do round robin
  - e.g. successive lookups of www return 64.131.79.131, …132, …
- Caching issue: DNS answers get cached, so a power user can still put heavy load on a single server
- Sessions complicate this: sessions are typically implemented per server, so plain RR breaks them and you can’t do true load balancing
  - i.e. user Alice needs to keep landing on server 1 for her session to remain useful
- What if you turn the load balancer into a server that stores the sessions? Well, what if it dies? What are the redundancies?
  - Instead, we can use RAID, Redundant Array of Independent Disks, which gives redundancy within a server
    - RAID0, 1, …, 10, …
    - RAID0: typically 2 identical hard drives; ‘stripe’ data across them by writing alternately to both drives
    - RAID1: mirror every write to 2 drives simultaneously; one drive can die and the other still works
    - RAID10: typically 4 drives, combining RAID0 and RAID1
    - RAID5, 6: variations; RAID5 can have 3, 4, 5, … drives with only 1 used for redundancy (better economy of scale). RAID6 uses two drives for redundancy
- Sticky Sessions (a minimal sketch combining round robin with a sticky cookie appears after this list)
  - Shared storage?
    - FC, iSCSI, MySQL, NFS, etc.
  - Cookies?
    - Can store a server ID in the cookie so the LB sends the user back to the same server
      - Downside: exposes backend server IDs and breaks if the backend IP changes; no need to show backend IPs
    - Better: store a key and have the load balancer remember how to map it back to a server ID; this hides the backend and takes away the ability to spoof the cookie
- Problem: downtime?
  - Mitigating the risk of the file server going down? A: replication
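A minimal sketch of the two ideas above: round-robin assignment plus a sticky-session cookie so a user keeps hitting the same backend. The backend pool, cookie name, and request shape are hypothetical stand-ins, not from the lecture; real load balancers (HAProxy, ELB, ...) do this for you.

```python
import itertools

# Hypothetical backend pool, keyed by an opaque server ID (not the private IP).
BACKENDS = {"s1": "10.0.0.1", "s2": "10.0.0.2", "s3": "10.0.0.3"}
_round_robin = itertools.cycle(BACKENDS)  # cycles over the opaque IDs


def pick_backend(cookies):
    """Return (backend_ip, cookies_to_set) for one request.

    If the client carries a sticky cookie naming a known backend, honor it;
    otherwise pick the next server round-robin and set the cookie so later
    requests stick to the same server. The cookie holds an opaque ID, not
    the backend's address, so clients can't see or spoof backend IPs.
    """
    sticky = cookies.get("BACKEND_ID")
    if sticky in BACKENDS:
        return BACKENDS[sticky], {}
    server_id = next(_round_robin)
    return BACKENDS[server_id], {"BACKEND_ID": server_id}


if __name__ == "__main__":
    cookies = {}
    for i in range(4):
        ip, to_set = pick_backend(cookies)
        cookies.update(to_set)
        print(f"request {i} -> {ip}")  # sticks to one backend after the first hit
```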
Load Balancers:
- Software
  - ELB, HAProxy, LVS, …
- Hardware
  - Barracuda, Cisco, Citrix, F5, …
- PHP gets a bad rap for performance since it is interpreted rather than compiled like C++. However:
  - PHP accelerators keep the compiled PHP opcodes around so the compilation result doesn’t get thrown away on every request
    - similar to Python’s .py and .pyc
Caching:
- .html caching, MySQL query cache, Memcached
- Craigslist serves back pre-generated .html pages
  - Advantage: if you store .html, you don’t have to regenerate it every time
  - Disadvantage: changing the aesthetics of the page becomes non-trivial, requiring changes across tens of thousands of files, i.e. a massive find-and-replace or regeneration of those pages
  - Redundancy: every page stores the same boilerplate tags
  - Trades disk space for better performance
- MySQL query cache
- Memcached: a memory cache
  - Stores key-value data in RAM
  - What if you ran out of space in Memcached? What could you do?
    - Garbage-collect the oldest entries
- MySQL offers replication features (see the read/write-splitting sketch after this list)
  - Master-slave: slaves connect to the master over the network and receive a copy of every row written to the master db
    - Pro: read load can be balanced evenly across the slaves
    - Con: if the master dies, writes stop until a slave is promoted
  - Master-master
    - Pro: redundancy you can implement yourself; either master can take writes
- Multi-tiered architecture
  - Can have a load balancer (LB) in front of web servers, then another LB in front of MySQL slaves, with the master replicating to the slaves
  - Run multiple LBs that send heartbeat packets to each other; if the active one goes down, the passive one takes over the incoming packets
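A minimal sketch of read/write splitting on top of master-slave replication, as described above. The `master` and `slaves` arguments are assumed to be DB-API style connections from a real MySQL client; only the routing logic is shown here.

```python
import random


class ReplicatedDB:
    """Route writes to the master and reads to a random slave.

    Assumes `master` and `slaves` are DB-API style connections; replication
    itself is handled by MySQL, not by this class.
    """

    WRITE_VERBS = ("insert", "update", "delete", "replace")

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def run_query(self, sql, params=()):
        verb = sql.lstrip().split(None, 1)[0].lower()
        conn = self.master if verb in self.WRITE_VERBS else random.choice(self.slaves)
        cur = conn.cursor()
        cur.execute(sql, params)
        if verb in self.WRITE_VERBS:
            conn.commit()      # writes hit the master only; replication copies them out
            return cur.rowcount
        return cur.fetchall()  # reads are spread across the slaves
```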
Building a Network
- Want the property of sticky sessions
  - To preserve the session, have the LB insert a cookie that lets it remember the user belongs on server 1
- Could stick the db on the web server itself
  - But then persistent actions aren’t visible on other servers unless the LB keeps sending the user to the same server
  - In this example, load balancing would have to be handled in code
- Networking would be done with switches, so have 2 switches for redundancy
- What if the power goes out?
- Cloud computing: outsourcing security and db storage to a ‘cloud’ (e.g. EC2, the Elastic Compute Cloud)
- Multiple data centers
  - Can do geography-based load balancing, where requests are routed at the DNS level
Datacenter 1: 2 LB –> 2 Web Servers –> another 2 LB –> 2 master databases, 2 switches
Datacenter 2: 2 LB –> 2 Web Servers –> another 2 LB –> 2 master databases, 2 switches
Security
- Traffic:
  - Incoming users: want a firewall that only allows TCP 80 and 443 (HTTP and HTTPS/SSL) connections
  - Offload SSL to the LB or a special device, so traffic behind it can travel unencrypted
  - LB –> web server: TCP 80
  - LB –> DB: TCP 3306
  - Why restrict each hop to specific ports? Principle of least privilege: outsiders should not be able to query the db directly.
Summary: As soon as you have too many users, new problems arise
Scalability for Dummies: Notes
Clones
Golden Rule of Scalability: every server contains exactly the same codebase and does not store any user-related data, like sessions or profile pictures, on local disc or memory
- Sessions should be stored in a centralized, accessible database or an external persistent cache like Redis (better performance than an external database); either way, keep it near your app servers’ data center (see the session-store sketch at the end of this subsection)
- In deployment, the problem of applying code changes consistently to multiple servers can be solved with Capistrano (requires some learning, especially if you are not into Ruby on Rails)
- Once codebases are uniform, you can create an image (an Amazon Machine Image on AWS) as a ‘super clone’ from which new instances are launched
Summary: every server runs the same codebase and keeps no user data locally; sessions go to a central db or persistent cache, deployments use a tool like Capistrano, and a ‘super clone’ image is used to propagate new instances.
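A minimal sketch of the centralized session store idea, assuming a local Redis reachable via the `redis` Python package; the key names and TTL are arbitrary choices for illustration.

```python
import json
import uuid

import redis

r = redis.Redis()      # assumes Redis running on localhost:6379
SESSION_TTL = 30 * 60  # 30 minutes, an arbitrary example value


def create_session(user_id):
    """Store session data centrally so any app server (clone) can read it."""
    session_id = uuid.uuid4().hex
    r.setex(f"session:{session_id}", SESSION_TTL, json.dumps({"user_id": user_id}))
    return session_id


def load_session(session_id):
    """Any clone can load the session; nothing lives on a single server's disk."""
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```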
Database
- Even after scaling the app servers horizontally, the MySQL database becomes slower and slower and eventually breaks down.
There are two paths to solving this issue:
Path 1:
Stick with MySQL, hire a database admin (DBA), do master-slave replication (reads from slaves, writes to the master) and keep adding RAM to the master server.
DBA will use terms like ‘sharding’, ‘denormalization’, and ‘SQL tuning’.
Path 2:
Denormalize from the beginning and include no joins in any db query. Either keep MySQL but use it like a NoSQL store, or switch to an easier-to-scale NoSQL db such as MongoDB or CouchDB. Do joins in application code (the sooner the better). Even so, db requests will again become slower and slower and you will need a cache.
Summary of the 2 paths: (1) stay on MySQL, apply master-slave replication and keep expanding RAM on the master server; or (2) remove joins from db queries and either use MySQL as a NoSQL store or migrate to MongoDB/CouchDB, placing joins in app code.
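A minimal sketch of path 2’s “joins in app code”: two separate, join-free queries stitched together in Python. The `fetch_users` / `fetch_orders` helpers are hypothetical stand-ins for simple single-table db queries.

```python
def fetch_users():
    # stand-in for: SELECT id, name FROM users  (no JOIN)
    return [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]


def fetch_orders():
    # stand-in for: SELECT id, user_id, total FROM orders  (no JOIN)
    return [{"id": 10, "user_id": 1, "total": 9.99},
            {"id": 11, "user_id": 1, "total": 4.50},
            {"id": 12, "user_id": 2, "total": 20.00}]


def orders_with_user_names():
    """Join in application code instead of in SQL: index one result set by
    key, then attach it to the rows of the other."""
    users_by_id = {u["id"]: u for u in fetch_users()}
    return [{**order, "user_name": users_by_id[order["user_id"]]["name"]}
            for order in fetch_orders()]


if __name__ == "__main__":
    for row in orders_with_user_names():
        print(row)
```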
Cache
- Cache here means an in-memory cache like Memcached or Redis. Never do file-based caching; it makes cloning and auto-scaling of servers a pain
- A cache is a simple key-value store and should sit as a buffering layer between the app and the data storage
- The app should always try the cache first and only then go to the data source, since the cache is fast and holds its datasets in RAM
Summary: caching is faster, but never do it file-based. The cache is a key-value store between app and storage, and any query should go to the cache first, then and only then to the source.
2 data caching patterns:
Cached Database Queries
- Most common pattern
- Query the db, store the result dataset in the cache
- A hashed version of the query is the cache key
- Issues: expiration. It is hard to know when to delete a cached result for a complex query, and when one table cell changes you need to delete all cached queries that might include that cell
Summary: a hashed version of the query is the cache key for the stored dataset, but expiration is hard with complex queries, and changing one table cell can invalidate many cached queries.
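A minimal sketch of the cached-database-queries pattern: the cache key is a hash of the query text. The in-memory `dict` stands in for Memcached/Redis, and `run_query` is a hypothetical database call.

```python
import hashlib

cache = {}  # stand-in for Memcached/Redis


def run_query(sql):
    # hypothetical slow database call
    return [("row-for", sql)]


def cached_query(sql):
    """Hash the query text to form the cache key; serve from cache when possible."""
    key = "sql:" + hashlib.sha256(sql.encode()).hexdigest()
    if key in cache:
        return cache[key]        # cache hit
    result = run_query(sql)      # cache miss: go to the database
    cache[key] = result          # expiration/invalidation is the hard part
    return result


if __name__ == "__main__":
    print(cached_query("SELECT * FROM products WHERE price < 10"))  # miss
    print(cached_query("SELECT * FROM products WHERE price < 10"))  # hit
```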
Cached Objects
- The preferable pattern
- Let a class assemble its dataset from the db, then store the complete instance in the cache
- Example: a Product class
  - Data: prices, texts, pictures, reviews
  - Filled by several methods doing several requests that are hard to cache individually
- Once the class has finished assembling the data array, store the array, or the complete instance of the class, in the cache
- Lets you simply get rid of the object whenever something changes
- Makes asynchronous processing possible
- Objects to cache: user sessions (never in the db), fully rendered blog articles, activity streams, user <-> friend relationships
- Two main options for the cache: Redis and Memcached
  - Redis: extra db features like persistence and built-in data structures, i.e. sets and lists
  - Memcached: if you only need caching; scales easily
Summary: Cached Objects is the preferred pattern: the class stores its complete instance in the cache after assembly, the object is simply discarded when the underlying data changes, and the approach enables asynchronous processing. Redis is the more feature-rich option.
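A minimal sketch of the cached-objects pattern with a hypothetical `Product` class: the assembled instance is cached whole and dropped on change. The `dict` cache and the `fetch_*` helpers are illustrative stand-ins.

```python
cache = {}  # stand-in for Redis/Memcached


def fetch_prices(pid):   return {"amount": 9.99}            # hypothetical db/API calls
def fetch_texts(pid):    return {"title": f"Product {pid}"}
def fetch_reviews(pid):  return ["great", "ok"]


class Product:
    def __init__(self, pid):
        self.pid = pid
        # several requests that would be hard to cache one by one
        self.data = {
            "prices": fetch_prices(pid),
            "texts": fetch_texts(pid),
            "reviews": fetch_reviews(pid),
        }


def get_product(pid):
    """Return the cached instance if present; otherwise assemble and cache it."""
    key = f"product:{pid}"
    if key not in cache:
        cache[key] = Product(pid)  # store the complete assembled instance
    return cache[key]


def invalidate_product(pid):
    """When anything about the product changes, just drop the object."""
    cache.pop(f"product:{pid}", None)
```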
Asynchronism
- Imagine going to buy bread and being told to come back in 2 hours when it is baked; asynchronism is about avoiding that kind of waiting
Paradigms of Asynchronism:
Async #1
- Do time-consuming work in advance and serve the finished work with a low request time
- Dynamic –> static
- Pages of a website built by a framework or CMS are re-rendered and stored locally as static HTML on each change
- Often these computing tasks are done on a regular basis, e.g. by a script run every hour via cronjob
- Pre-computing makes websites and apps performant and scalable
- Imagine the scalability of your website if the script uploaded these pre-rendered HTML pages to AWS S3 or CloudFront or another Content Delivery Network!
Summary: pre-rendering dynamic pages into static HTML allows serving them with low request times; the computing tasks are done regularly, in advance.
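A minimal sketch of async #1 (dynamic to static pre-rendering). The page list, `render_page` template, and output directory are hypothetical; a cronjob would run this script on a schedule.

```python
from pathlib import Path

PAGES = {"index": "Welcome!", "about": "About us"}  # hypothetical CMS content
OUT_DIR = Path("static_site")


def render_page(title, body):
    # stand-in for a real template engine
    return f"<html><head><title>{title}</title></head><body>{body}</body></html>"


def prerender_all():
    """Render every page to static HTML ahead of time; a web server (or S3/a CDN)
    can then serve the files without touching the app or the database."""
    OUT_DIR.mkdir(exist_ok=True)
    for name, body in PAGES.items():
        (OUT_DIR / f"{name}.html").write_text(render_page(name, body))


if __name__ == "__main__":
    prerender_all()  # e.g. run hourly from a cronjob
```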
Async #2
What if there is something that can’t be pre-computed, or a query that is custom?
- The frontend sends a signal to start a job, the backend signals back that it is being worked on, and the frontend tells the user to continue browsing while it periodically asks the backend whether the job is done. When the job completes, the result is sent back up the chain and the user is notified.
- Recommendation: the first 3 tutorials on RabbitMQ, one of many systems that help implement async processing. ActiveMQ or a Redis list also work.
- Keep a queue of tasks for the workers
- If it is time consuming, do it asynchronously
Summary: the second flavor of asynchronism is to queue tasks for workers, using a system like RabbitMQ, ActiveMQ, or a Redis list to implement asynchronous processing.
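A minimal sketch of a task queue built on a Redis list (one of the options mentioned above), using the `redis` Python package; the queue name and job format are arbitrary.

```python
import json

import redis

r = redis.Redis()  # assumes Redis on localhost:6379
QUEUE = "jobs"


def enqueue(job):
    """Frontend/backend side: push a job and return to the user immediately."""
    r.rpush(QUEUE, json.dumps(job))


def worker_loop():
    """Worker side: block until a job arrives, then process it."""
    while True:
        _, raw = r.blpop(QUEUE)   # blocks until something is queued
        job = json.loads(raw)
        print("processing", job)  # the time-consuming work happens here


if __name__ == "__main__":
    enqueue({"task": "resize_image", "id": 42})
    # a separate worker process would run worker_loop()
```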
Google I/O 2009 - Transactions Across Datacenters..
Context
- multihoming: operating from multiple datacenters simultaneously
- read/write data
Weak Consistency
- after write, reads may or may not see it
- best effort, “message in a bottle”
- App Engine: memcache
- VoIP, online video
Eventual Consistency
- better than weak, after write, reads eventually see it
- App Engine: mail
- Search engine indexing
- DNS, SMTP, snail mail
- Amazon S3, SimpleDB
Strong Consistency
- after write, guaranteed reads see it
- App Engine: datastore
- Filesystems
- RDBMSes
- Azure tables
Why Transactions?
- You will want to build redundancies and make sure transactions are:
- Correct
- Consistent
- Enforce invariants
- ACID (Atomic, consistent, isolated, durable)
Across Datacenters
Pros:
- If you have failures (catastrophic, expected, maintenance), you will want a redundancy, or uptime during maintenance, or even geolocality (CDNs, edge caching)
Cons:
- Within one datacenter: high bandwidth, low latency, little to no networking cost
- Between datacenters: low bandwidth and high latency
Multihoming
- Hard problem. Consistently? Harder. Real time? Hardest
- Option 1: don’t do multihoming
- Bunkerize: most common; harden a single datacenter with 5 sources of redundant power and several backbone connections. Microsoft Azure (tables) and Amazon SimpleDB work this way
- Weak against catastrophic failure
- Not geolocated
- Option 2: Primary with hot failover
- similar to master/slave
- better but not ideal
- mediocre catastrophic failure
- Window of lost data
- inconsistent failover
- Amazon Web Services
- EC2, S3, SQS: US/EU
- EC2: Availability Zones, Elastic Load Balancing
- Banks, brokerages, etc
- Geolocated for reads, but not writes
- Option 3: True Multihoming
- Serve reads and writes from different DCs simultaneously
- Two-way: hard
- N-way: harder
- Handles catastrophic failure and gives geolocality, but you pay in latency
Techniques and Tradeoffs
|   | Backups | M/S | M/M | 2PC | Paxos |
|---|---|---|---|---|---|
| Consistency | Weak | Eventual | Eventual | Strong | Strong |
| Transactions | No | Full | Local | Full | Full |
| Latency | Low | Low | Low | High | High |
| Throughput | High | High | High | Low | Medium |
| Data Loss | Lots | Some | Some | None | None |
| Failover | Down | Read-Only | Read/Write | Read/Write | Read/Write |
Backups
- Make a copy, sledgehammer, weak consistency, no transactions, datastore
Master/Slave Replication
- Usually Asynchronous
- good for throughput, latency
- Most RDBMSes
- e.g. MySQL binary logs
- Weak/Eventual consistency
- Granularity matters
- Datastore: current
Master/Master Replication
- Umbrella term for merging concurrent writes
- Asynchronous, eventual consistency
- Need Serialization protocol
- timestamp oracle: monotonically increasing timestamps
- SPOF with master election
- distributed consensus protocol
- No global transactions
- Datastore: no strong consistency
Tree Replication
- The master replicates to slaves, and some slaves act as masters for further slaves, which creates a tree of replicas
Buddy Replication
- Each node keeps a backup of a neighboring node’s data, arranged in a ring: Node A holds Data A + Backup E –> Node B holds Data B + Backup A –> … –> Node E holds Data E + Backup D
- Upon failure of a node, the following node takes over serving the failed node’s data (it already holds the backup), and the backup responsibilities shift along the ring accordingly (see the sketch below)
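A minimal sketch of the ring-buddy idea, assuming a simple list of node names; the assignment and failover logic here is an illustration of the description above, not any specific system.

```python
def buddy_assignments(nodes):
    """Each node backs up the data of the node before it in the ring."""
    return {node: nodes[i - 1] for i, node in enumerate(nodes)}


def fail_node(nodes, failed):
    """On failure, drop the node from the ring; its successor already holds the
    backup and can take over its data, and backup responsibilities are
    recomputed around the smaller ring."""
    survivors = [n for n in nodes if n != failed]
    return survivors, buddy_assignments(survivors)


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D", "E"]
    print(buddy_assignments(nodes))  # {'A': 'E', 'B': 'A', 'C': 'B', 'D': 'C', 'E': 'D'}
    print(fail_node(nodes, "C"))     # D held C's backup; backups recomputed for A, B, D, E
```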
2PC, Two Phase Commit
- Semi-distributed consensus protocol
- Deterministic coordinator
- 1: propose, 2: vote, 3: commit/abort
- Heavyweight, synchronous, high latency
- 3PC buys asynchrony with an extra round trip
- Datastore: poor throughput
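A minimal in-process sketch of the two-phase commit flow named above (propose/vote, then commit or abort); the `Participant` class is a toy stand-in for real resource managers.

```python
class Participant:
    """Toy resource manager that votes on and then applies a transaction."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "idle"

    def prepare(self, txn):  # phase 1: coordinator proposes, participant votes
        self.state = "prepared"
        return self.will_vote_yes

    def commit(self, txn):   # phase 2a: everyone voted yes
        self.state = "committed"

    def abort(self, txn):    # phase 2b: someone voted no (or timed out)
        self.state = "aborted"


def two_phase_commit(txn, participants):
    """Deterministic coordinator: collect votes, then commit or abort everywhere.
    Every step is a synchronous round trip, which is why latency is high."""
    votes = [p.prepare(txn) for p in participants]
    if all(votes):
        for p in participants:
            p.commit(txn)
        return "committed"
    for p in participants:
        p.abort(txn)
    return "aborted"


if __name__ == "__main__":
    ok = two_phase_commit("txn-1", [Participant("db1"), Participant("db2")])
    bad = two_phase_commit("txn-2", [Participant("db1"), Participant("db2", will_vote_yes=False)])
    print(ok, bad)  # committed aborted
```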
Paxos
- Fully distributed consensus protocol
- Mike Burrows: every consensus protocol is either Paxos, Paxos with extra cruft, or broken
- Majority writes; survives minority failure
- Protocol similar to 2PC/3PC
- lighter but still high latency
Slides Notes
- BASE: Basically Available Soft state Eventually consistent
- Scalability Patterns: State
- Partitioning
- HTTP Caching
- Reverse Proxy
- Varnish, Squid, rack-cache, Pound, Nginx, Apache mod_proxy, Traffic Server
- Reverse Proxy
- RDBMS (relational database) sharding
- Scale via partitioning and replication
- Partitioning: Application –> User –> Load Balancer –> Servers (A-C, D-F, …)
- Replication: same, but each server holds its own range plus a replica of another, e.g. (A-C; D-F), (D-F; A-C)
- ORM + rich domain model (anti-pattern)
- Attempt: read one object. Result: sit with the whole DB on your lap.
- Scaling reads is hard, scaling writes is impossible
- Needed sometimes, but often not
- NoSQL (Not Only SQL)
- Key-value databases, column DBs, document DBs, graph DBs, data-structure DBs
- e.g. Google: Bigtable; Amazon: Dynamo, SimpleDB; Facebook: Cassandra; LinkedIn: Voldemort
- Dist. Caching
- Data Grids
- Concurrency