Wikimedia's (Cache) Network

Mark Bergsma <mark@nedworks.org>

IRC: mark @ Freenode / IrcNet / Blitzed...

Slides available online:
http://www.nedworks.org/~mark/presentations/hd2006/

Overview

Wikimedia clusters

Wikimedia Sysadmins

Squid caching architecture

Most content is served from Tampa, Florida, with the exception of some Asian-language wikis, which are served from Seoul, South Korea.

Two-tier setup

Squid caching architecture

How to distribute users?

Geographically distributed clusters are nice, but how to distribute users over them?

Geographical DNS

Observations:

So why not have the Wikimedia DNS servers determine the location of the querying DNS resolvers, and give out answers based on that?
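
A minimal sketch of that idea, assuming a small hand-written prefix table. The prefixes (documentation ranges), the cluster names and the pick_cluster helper are hypothetical and only illustrate the lookup; they are not the actual Wikimedia or PowerDNS configuration.

```python
import ipaddress

# Hypothetical table mapping resolver address ranges to a cluster.
# A real setup would load a full IP-to-country database instead.
PREFIX_TO_CLUSTER = [
    (ipaddress.ip_network("192.0.2.0/24"), "tampa"),
    (ipaddress.ip_network("203.0.113.0/24"), "seoul"),
    (ipaddress.ip_network("203.0.113.128/25"), "tampa"),
]

# Assumption: resolvers we know nothing about go to the main cluster.
DEFAULT_CLUSTER = "tampa"


def pick_cluster(resolver_ip: str) -> str:
    """Return the cluster to hand out, based on the resolver's address.

    The most specific (longest) matching prefix wins.
    """
    addr = ipaddress.ip_address(resolver_ip)
    best = None
    for net, cluster in PREFIX_TO_CLUSTER:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, cluster)
    return best[1] if best else DEFAULT_CLUSTER
```

Note that the decision is made on the address of the querying resolver, not of the end user, so it is only as good as the assumption that users sit near their resolvers.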

Geographical DNS

PowerDNS and Geobackend

Therefore:

Alternative methods

Some alternative geographical balancing options could be:

A combination of all these options might work.

Object Purging

Old method

HTTP PURGE requests, sent over TCP connections from every MediaWiki server to every Squid
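
A rough sketch of what this old method amounts to, not the actual MediaWiki code. The function names, the Squid host names and the exact request layout are assumptions made for illustration.

```python
import socket
from urllib.parse import urlsplit


def build_purge_request(url: str) -> bytes:
    """Build a raw HTTP PURGE request for one URL (assumed layout)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"PURGE {path} HTTP/1.0\r\nHost: {parts.netloc}\r\n\r\n".encode("ascii")


def send_over_tcp(squid_host: str, request: bytes, port: int = 80,
                  timeout: float = 2.0) -> bytes:
    """Open one TCP connection to one Squid, send the purge, read the reply."""
    with socket.create_connection((squid_host, port), timeout=timeout) as conn:
        conn.sendall(request)
        return conn.recv(4096)


def purge_everywhere(url: str, squids: list, send=send_over_tcp) -> int:
    """Purge one URL on every Squid; returns the number of connections used."""
    request = build_purge_request(url)
    for squid in squids:
        send(squid, request)  # one TCP connection per Squid, per purged URL
    return len(squids)
```

Every edit on every MediaWiki server triggers one such loop, so the number of connections grows with the number of Squids times the number of purged URLs.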

Object Purging

This didn't scale well:

First idea: have MediaWiki contact a single external daemon, which takes care of the rest.

A little better, but this mostly just moved the same problems to the daemon.

Object Purging using Multicast

Object purging is simply a broadcast or multicast message! Let the network take care of it.

  1. MediaWiki sends a single purge message
  2. The network delivers it to all interested hosts (only)

But... Only one-way communication possible

  • No TCP / HTTP!
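
The fire-and-forget idea can be sketched with plain UDP multicast sockets. This is an illustration only: the group address is hypothetical, the port is an assumption (4827 is Squid's default HTCP port), and the payload is just the URL rather than a real HTCP packet.

```python
import socket
import struct

MCAST_GROUP = "239.0.0.1"  # hypothetical administratively scoped group
MCAST_PORT = 4827          # assumption: Squid's default HTCP port


def encode_purge(url: str) -> bytes:
    """Placeholder payload: just the URL, not a real HTCP CLR packet."""
    return url.encode("utf-8")


def membership_request(group: str) -> bytes:
    """Pack an ip_mreq structure: group address plus 'any' local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))


def send_purge(url: str, group: str = MCAST_GROUP, port: int = MCAST_PORT,
               ttl: int = 1) -> None:
    """Sender side: one datagram, no connection, no reply expected."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", ttl))
    try:
        sock.sendto(encode_purge(url), (group, port))
    finally:
        sock.close()


def open_receiver(group: str = MCAST_GROUP, port: int = MCAST_PORT) -> socket.socket:
    """Receiver side: join the group so the network delivers purges to us."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock
```

The sender never learns which hosts received the message, which is exactly the one-way limitation noted above.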

HTCP

  • Already using the HTCP inter-cache protocol... UDP-based!
  • The HTCP specification includes a purge method: HTCP CLR
  • ...but HTCP CLR wasn't implemented in Squid
  • A buggy, non-working patch was found on the Internet
    Got it working after heavy modification
  • Squid's HTCP code was rather immature, despite being the reference implementation
    Incorrect implementation according to the RFC, memory leaks, inefficiencies...

Multicast between clusters

  • Multicast on our internal networks is no problem, but...
  • Multicast routing between our clusters over the Internet is a problem

Solution: convert each multicast packet to unicast, send it over the Internet, and convert it back to multicast at the receiving cluster.
A simple Python script accomplishes this.
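
The original script is not shown in these slides. A minimal sketch of such a relay, assuming the same forwarding step runs on both ends with different sockets and destinations, might look like this (relay_once and run_relay are hypothetical names):

```python
def relay_once(rx, tx, destinations, bufsize=65535):
    """Receive one datagram on rx and resend it unchanged to each destination.

    Multicast to unicast: rx has joined the local multicast group and
                          destinations are the remote relays' unicast addresses.
    Unicast to multicast: rx is a plain UDP socket reachable from the Internet
                          and destinations is the local multicast group.
    """
    data, _sender = rx.recvfrom(bufsize)
    for dest in destinations:
        tx.sendto(data, dest)
    return data


def run_relay(rx, tx, destinations):
    """Forward datagrams forever; rx and tx are UDP sockets set up by the caller."""
    while True:
        relay_once(rx, tx, destinations)
```

Because the payload is copied byte for byte, the Squids at the remote cluster see the same purge message they would have received over native multicast, apart from the source address.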

The Wikimedia Network

Florida, currently:
  • 2 VLANs, external and internal
  • Failover setup:
    • Two uplinks to ISP, using BGP
    • HSRP
  • 2 layer 3 switches (Cisco Catalyst 3560G)
  • 4 access switches (Cisco Catalyst 2948G, Netgear)

The Wikimedia Network

Florida, soon:
  • One core switch/router
    • But more resilient
  • Older switches used for "access ports" and emergency backup
  • Multihomed network

Foundry BigIron RX-8

Questions

Questions?

  • Who needs MS PowerPoint?