Wednesday, September 27, 2006

Book Review: Linux(R) Quick Fix Notebook (Bruce Perens Open Source) by Peter Harrison


I'm a big fan of the cookbook approach to tech books. I usually don't have time to read a book to get a broad and general understanding of a topic. I'm usually after what this book promises: a quick fix. I want answers to discrete problems. That's what _Linux(R) Quick Fix Notebook_ delivers. When I did have time to read an entire chapter, I learned a lot. When I flipped to random pages, there was a good chance I learned something. It's full of gold nuggets and neat tricks.

I work in IT, and I often show someone something that, to me, is pretty basic. But it saves them a lot of time. This book addresses those gaps in my own knowledge: the basic stuff I never happened to pick up. Sometimes it's so basic nobody bothers to write about it. Until this book.

Unfortunately, I couldn't hold on to the review copy long enough to finish it. But I'm buying a copy for myself. That should tell you something! Especially when I have about a dozen books on Linux and Unix system administration already. This approach works for me, and this book implements that approach really well.
Book Review: Tao of Network Security Monitoring


(From my review on Amazon)
This is a great book. With most geek books, I browse and grab what I need. With this one, I even read the appendices!

At first, the author's tone put me off. He spends the introductory chapters talking about the "Way" of Network Security Monitoring (capitalized), and how it's much better than other approaches. It felt a little like, "My Burping Crane Kung-Fu will defeat your Shining Fist techniques!" I really didn't see much difference between what he was talking about and other approaches. I admit to being much newer to this discipline than the author, and he has an impressive appendix on the intellectual history of intrusion detection (uncapitalized). So it may be that the lessons he advocates have already been internalized; my exposure may have been to a field that has already moved up to his standard. But I have a hard time imagining that intrusion analysts were ever satisfied with a single approach and no correlation. As I understand what he means by upper-case NSM, it's basically the efficient use of multiple techniques to detect intrusions. I can't see trying to argue the contrary position.

Ah, but then we get to the good stuff. He goes through the major types of indicators and the means of reviewing them. He covers the use of a number of important tools, but doesn't rehash what is better covered elsewhere. For example, he doesn't bother covering Snort, because there are plenty of books on Snort already. If you are reading the book, it's almost a certainty that you are familiar with Snort. Good call to skip over that. Instead, he covers some other tools that might be useful in the same area. He also refers to tons of other books. I made a lengthy wish-list based on his recommendations, and they've been good. (He also reviews exhaustively here on Amazon.) So this book is like the first stone in an avalanche: it triggers the acquisition of many other books.

The book provided many 'light bulb' moments. For example, he talks about giving up on source-based focus. In a world where a DDoS attack is currently using 23,000 separate bots, we may exhaust our resources tracking low-value drones. So focus on the targets they are after: light-bulb! In spite of my earlier resistance, I was soon going through it as eagerly as I did with the Patrick O'Brian Aubrey/Maturin novels. It's fun to read such clear, authoritative writing.

One quibble - he trashes the SANS intrusion detection course, which I took and thought was terrific. He has taught the class and considers the course material out of date. Maybe they have updated it since, but his book didn't contradict anything in the course as I took it 1.5 years ago.
XP SP2 Firewall is a joke!

(14:24:22) jimmythegeek: HAHAHAHAHA!!! OMFG!
(14:24:46) jimmythegeek: I installed XP sp2 on a workstation. Get a report that a terminal emulator is very slow
(14:25:03) jimmythegeek: takes like 15 minutes to connect to this old minicomputer, but it connects
(14:25:17) jimmythegeek: Fix is to add an exception in Windows Firewall for that port.
(14:25:41) jimmythegeek: WTF? "We don't know how to drop packets, but we can sure slow 'em down! It's called playing to your strengths."
(14:26:22) jimmythegeek: A firewall should stop traffic or permit it. Or rate-limit, if it is fancy. This is...just sad.
(14:27:05) WuTang: :D
(14:27:10) WuTang: that's awesome

Monday, September 25, 2006

Book Review: Visible Ops Handbook, Kevin Behr, Gene Kim, and George Spafford

You know the old saw, "Sorry for the long letter; I didn't have time to write you a short one"? This is a short one, but I don't feel at all cheated on page count. It's a Good Thing when a book covers the topic...and stops.

The authors codify the operational approaches that highly proficient IT shops have adopted. I'm hostile to dumb performance metrics, but using some measurements even I can agree are useful, they identify high-performing organizations. Some of the metrics: unplanned downtime, Mean Time Between Failure (MTBF), Mean Time to Repair, % of staff time spent on unscheduled|unplanned work, and the ratio of servers to the administrators supporting them. They noticed a quantum gap: the outfits that did well in one of these areas tended to do well in all of them, and there wasn't really a continuum. Organizations were either high performers or low performers, with not a lot in the middle.
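For my own notes, here is roughly how I'd compute those metrics from incident records. A minimal sketch: the metric names come from the book, but the record layout and the numbers are entirely invented for illustration.

```python
# A minimal sketch of the Visible Ops-style metrics, computed from hypothetical
# incident and staffing records. The record layout and numbers are invented.
from dataclasses import dataclass

@dataclass
class Outage:
    downtime_hours: float   # how long the service was down
    repair_hours: float     # hands-on time to restore it
    planned: bool           # was this a scheduled maintenance window?

def metrics(outages, period_hours, staff_hours, unplanned_work_hours,
            server_count, admin_count):
    failures = [o for o in outages if not o.planned]
    uptime = period_hours - sum(o.downtime_hours for o in failures)
    return {
        "unplanned downtime (h)": sum(o.downtime_hours for o in failures),
        "MTBF (h)": uptime / max(len(failures), 1),
        "MTTR (h)": sum(o.repair_hours for o in failures) / max(len(failures), 1),
        "% unplanned work": 100.0 * unplanned_work_hours / staff_hours,
        "servers per admin": server_count / admin_count,
    }

print(metrics([Outage(4, 2, False), Outage(1, 1, True)],
              period_hours=720, staff_hours=320, unplanned_work_hours=80,
              server_count=120, admin_count=3))
```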

Turns out the high performers all independently adopted similar operational approaches, and there really isn't a middle way. There's a discipline to the discipline, and it starts with "the only acceptable number of unauthorized changes is zero."

High-performing IT shops have a culture of change management. They cite a stat indicating that 80% of outages (incidents or time? both?) are self-inflicted. That's obviously the place to look for improvements. And you won't get improvements with just a little change management.

This approach has a lot to offer besides operational efficiency. IT goons have to deal with useless auditing and compliance directives. (Some of this is worthwhile, but it looks like even the worthwhile efforts are not well done in practice.) Having effectively managed controls in place makes for auditable networks.

The Mean Time to Repair is improved by a Culture of Causality. Once you have the change control in place, you can have faith that you know when something changed, and look in those places for the cause. Proficient shops - even those running Windows - reboot 1/10 as often as their less proficient counterparts. It's possible to hit a 90% first-fix rate.

They claim (and I believe) that it's also cheaper to rebuild than repair. Figuring out what went sour is time-consuming and uncertain. Automated rebuild is the way to go.

Interestingly, they claim that the frenzy of patching that many of us go through is not part of the culture at these proficient shops. A patch is a change, and subject to the same build verification any other architecture change would be. Consequently, OS patches get rolled out more or less organically, as part of a whole system. This is a little harder for me to swallow. I have been immersed in the SANS koolade. Depending on the application, I don't think you can wait for some patches. I do grok the "one, few, many" approach used by Tom Limoncelli (http://www.aw-bc.com/catalog/academic/product/0,1144,0201702711,00.html): patch one machine, test, expand to a pilot group, test, release to production, cross fingers. If you have a disciplined shop, this approach works. If you are like the rest of us, the balance of risk suggests to me you are better off dealing with the turmoil a patch might cause than living with a worm outbreak.
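To make that concrete, here is a toy rollout driver in the one/few/many spirit. It's my own sketch, not Limoncelli's code; the host names are invented and deploy()/healthy() are hypothetical hooks you would wire up to your real patch tool and monitoring.

```python
# A toy "one, few, many" rollout driver (my own sketch, not Limoncelli's code).
# deploy() and healthy() are hypothetical hooks for your patch tool/monitoring.
import sys

STAGES = [
    ("one",  ["canary01"]),
    ("few",  ["pilot01", "pilot02", "pilot03"]),
    ("many", ["prod%02d" % n for n in range(1, 41)]),
]

def deploy(host, patch):
    print(f"applying {patch} to {host}")   # call your patch tool here

def healthy(host):
    return True                            # check your monitoring here

def rollout(patch):
    for stage, hosts in STAGES:
        for h in hosts:
            deploy(h, patch)
        bad = [h for h in hosts if not healthy(h)]
        if bad:
            print(f"stage '{stage}' failed on {bad}; stopping rollout")
            sys.exit(1)
        print(f"stage '{stage}' looks good, moving on")

rollout("example-hotfix")
```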

Another interesting point is that change management is MORE important during a crisis. Convene your change approval team, but stick to the discipline lest you make things worse.

The authors claim that a transition to the techniques and culture that the high performing organizations have in common has been done in a few months. I don't see it happening that quickly around here, but we could certainly get started. They outline 4 phases:

1) stabilize the patient - set up an "electric fence" so that you can monitor configuration changes and hold staff responsible for unauthorized changes. Confining changes to those approved by a change management team, and only during maintenance windows, will have an immediate effect, they claim. But accountability is key - without it, there's an inevitable slip back to the sloppy practices everyone is used to. The fence comes from tools like Tripwire, which can tell you when things change. You can then refer to authorized changes/work orders and see if they match up (see the sketch after this list). If not, some coaching is in order.

2) ID the fragile systems you don't dare touch

3) develop a repeatable build library so you can start moving services off the systems identified in #2

4) continuous improvement
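Here's a toy version of the phase 1 "electric fence" - not actual Tripwire, just my own sketch of the idea: hash the configs you care about, compare against a stored baseline, and flag any drift that doesn't match an approved change ticket. The paths, baseline location, and ticket list are made up.

```python
# Toy "electric fence": hash watched files, diff against a stored baseline,
# and flag changes with no matching change ticket. Paths/tickets are invented.
import hashlib, json, pathlib

WATCHED = ["/etc/ssh/sshd_config", "/etc/resolv.conf", "/etc/sudoers"]
BASELINE = pathlib.Path("/var/lib/fence/baseline.json")   # hypothetical location
APPROVED = {"/etc/resolv.conf": "CHG-1042"}               # open, approved changes

def digest(path):
    try:
        return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()
    except OSError:
        return "UNREADABLE"

current = {p: digest(p) for p in WATCHED}
baseline = json.loads(BASELINE.read_text()) if BASELINE.exists() else {}

for path, h in current.items():
    if baseline.get(path) not in (None, h):
        if path in APPROVED:
            print(f"{path} changed under {APPROVED[path]} - fine, update the baseline")
        else:
            print(f"{path} changed with NO ticket - unauthorized change, time for coaching")

BASELINE.parent.mkdir(parents=True, exist_ok=True)
BASELINE.write_text(json.dumps(current))
```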

I am not sure how much this applies to my environment - we are pretty stable, but not proficient. Most of the action on our net is at the desktop level, and I think this is aimed more at network operations and the data center than the Help Desk. It has me thinking, though. If we could extend the principles to everybody, what would change? What would it look like?
Rant: arithmetic operations on ordinal numbers

In virtually every discussion of computer|network security and asset protection, people trot out a risk equation on the lines of:

Risk = Threat x Vulnerability x Cost

This seems brain dead to me. Risk is the expected monetary loss from an event. This is a little better:

Risk = (Impact of an Event) * (Probability of an event)

Let's look at these factors. The Impact can have a dollar value associated with it, which can be more or less successfully generated by looking at replacement cost, revenue loss, etc.
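Just to keep the units straight, here's that equation with toy numbers, entirely invented; the probability figure is exactly the sort of guess I complain about below.

```python
# Risk = Impact x Probability, with invented numbers, just to show the units:
# dollars per year, not a unitless score.
impact_dollars = 50_000        # rebuild the server, eat the downtime, etc.
probability_per_year = 0.10    # a guess: one such event per decade

print(f"expected loss: ${impact_dollars * probability_per_year:,.0f} per year")
```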

The other factor, Probability, is going to be at one of two general levels of accuracy. In some cases, you can know the probability of an event is one (that is, certain). You can be certain that an unpatched Windows file server exposed to the internet will be violated, probably within 2 hours (see http://isc.sans.org/survivalhistory.php). In all other cases, you are pulling a number out of the air. Or out of your ass. I've actually read a web publication that claimed to assess earthquake frequency and felt it could do something with that data in a risk equation. I don't buy it. But on to the rant.

Usually, the Risk Equation is done with qualitative factors, for example, at

http://www.sans.org/reading_room/whitepapers/auditing/1204.php , in section 2.2.4 on page 4

The author describes "Qualitative Risk Defined Mathematically".

Relative Risk = Asset Value x Vulnerability x Threat

To the author's credit, there is no actual attempt at doing math. But I have seen (and, at gunpoint, participated in) security assessments where these factors are assigned numeric values. So for example, a file server might get a 4 on a scale of 1-5. A vulnerability guesstimate would be, oh, 3. (But again, that number is pulled out of the air or wherever. YOU DON'T KNOW how vulnerable an OS is. Is there a Zero-Day attack employed by the bad guys? You either are, or are not, vulnerable. I don't know which is the case. And neither do you. The best you could do is a qualitative ranking based on history, which is of unmathematical accuracy when predicting future performance. This ranking could be useful in thinking about which platforms are used for which purposes, but it should revolve around the skill level required to successfully compromise the asset. For example, "This is unpatched - Vulnerability = 5. This is patched, but the OS has a monthly patch cycle so it's almost certain that holes exist which haven't been found by the good guys - Vulnerability = 4. This OS has had one remote root in the default install in 6 years; we'd have to posit an unknown vulnerability in the absence of any history of published exploits - Vulnerability = 1.")

Where these things go sour is when you multiply rankings. (Impact = 5) * (Vulnerability = 5) = (Risk = 25). BZZZT!!!!

Ichiro Suzuki had the most hits in Major League Baseball in 2004. Ranking = 1

He was (I'm making this up) the 500th tallest guy in the League (MLB players tend to be tall). Ranking = 500.

1 * 500 = Nothing. Nothing real can be generated from multiplying two rankings together.

Rankings are ordinal numbers. You can say that 1 is higher|lower than 5. You can't say that it is 5 times better|worse. (In pro sports, being champ, #1, is INFINITELY better than #2.) You can't say it is 4 better|worse. You can't infer any precise degree at all.

So: don't multiply ordinal (ranking) numbers. Make a matrix, sure. It probably is useful to rely on your subjective evaluation of where an asset fits (this has an impact or value of "9", that's a "3"). Then make a matrix of impact vs. vulnerability or whatever, and remediate accordingly. But DON'T use bogus math to drive decisions. ("This 4 x 4 = 16 is greater than that 5*3 = 15")
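Here's what I mean, as a sketch. The bands and actions are my own invented policy, not anything from the SANS paper: use the rankings as coordinates into a matrix and look up a decision, rather than doing arithmetic on them.

```python
# Use impact and vulnerability rankings as matrix coordinates, not as factors
# to multiply. The bands and actions are an invented example policy.
ACTION = {
    ("high", "high"): "remediate now",
    ("high", "low"):  "monitor closely",
    ("low",  "high"): "patch in the next window",
    ("low",  "low"):  "accept the risk",
}

def band(rank, threshold=3):
    """Collapse a 1-5 ordinal ranking into a coarse band (no math on the rank)."""
    return "high" if rank >= threshold else "low"

assets = [
    ("file server",       4, 3),   # name, impact rank, vulnerability rank
    ("kiosk PC",          1, 5),
    ("domain controller", 5, 2),
]

for name, impact, vuln in assets:
    print(f"{name:18s} -> {ACTION[(band(impact), band(vuln))]}")
```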

Now, without making up fairy tales about infinitely skilled attackers and such, you can generate some actual data for security performance metrics. Richard Bejtlich (who, though way smarter than me (and probably you), is guilty of doing math on ordinal stuff in this post: http://taosecurity.blogspot.com/2003/10/dynamic-duo-discuss-digital-risk-ive.html) suggests the way to get real metrics on useful subjects is to do timed pen-testing and the like. Did it take longer for a skilled|unskilled team than last year? In other words, don't measure your team members' shoe sizes, look at the scoreboard! Here's the post: http://taosecurity.blogspot.com/2006/07/control-compliant-vs-field-assessed.html

This is hard, and expensive. But if you want useful metrics, it's what you do.
Review of Designing Large-Scale LANs, by Kevin Dooley

Good book! This is what the title implies: a book about designing large networks. It's not primarily an implementation book. It treats its subject rigorously, but without tons of detail at the end points. For example, you won't find cat5e pinouts discussed. You will see a redundant, hierarchical network design. I like a book with real math, and the author actually provides some for aggregate Mean Time Between Failure (MTBF) calculations. Stats and probability! Cool! He gives less rigorous but useful rules of thumb for capacity planning.
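For reference, here's the standard serial/parallel availability math (my own summary, not a quote from the book): availability is MTBF/(MTBF+MTTR), a chain of components multiplies availabilities, and redundant paths multiply the unavailabilities. All the MTBF/MTTR figures below are invented.

```python
# Standard serial/parallel availability math (my summary, not quoted from the
# book). All the MTBF/MTTR numbers below are invented for illustration.
def availability(mtbf_hours, mttr_hours):
    return mtbf_hours / (mtbf_hours + mttr_hours)

def serial(*parts):      # every component must be up
    p = 1.0
    for a in parts:
        p *= a
    return p

def parallel(*paths):    # only one path needs to be up
    q = 1.0
    for a in paths:
        q *= (1.0 - a)
    return 1.0 - q

switch = availability(mtbf_hours=50_000, mttr_hours=4)
trunk  = availability(mtbf_hours=20_000, mttr_hours=8)

single_path    = serial(switch, trunk, switch)
redundant_path = parallel(single_path, single_path)   # two independent paths

print(f"single path:    {single_path:.6f}")
print(f"redundant path: {redundant_path:.6f}")
```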

Lots of advice reflecting his extensive real-world experience, like the importance of physically redundant trunk links (rather than just two circuits in the same fiber bundle|conduit). My impression was that stuff never failed unless a backhoe severed it, but I was...incorrect. Thanks! I will be working on a plan to get redundant links in place.

I had an intuitive sense that there is a trade-off between redundancy and complexity. Reliability is the goal, and you can add features (primarily redundant circuits and components) to a point where the complexity reduces reliability. Dooley gives a fairly clear impression of where the trade off is profitable.

The VLAN treatment is extensive. Again, I knew that trunking all VLANs on the campus net across all trunks was wasteful; he quantifies it.

Overall, the book stands up well after 4 years. He doesn't spend much more than a sentence or two on wildly obsolete media like 10Base2 (coax). There's the occasional PanAm moment (the shuttle taken to the space station in the movie "2001" is operated by PanAm) like when he refers to Compaq as a manufacturer of network interface cards. I still see issues with 10BaseT and probably you do too, so I don't begrudge him any space on the topic. He was forward thinking enough to mention gigabit ethernet. He refers to Cat6 cable as a future standard. He cautions against using intermediate patch panels, which I was given to understand are o.k. One major building on our campus uses them, at the behest of the wiring designer. Oops. I haven't noticed any problems, but now I know to look.

Wireless is the area where change has been fastest, I think. Probably something to do with inexpensive, commodity hardware (with broken initial specs) leading to faster refresh rates. He mentions (back in 2002, I remind you) the utter brokenness of the WEP encryption standard. But if wireless in detail is your thing, this is not your book.


There isn't much on different types of fiber optic cable. (Not in the book - this is my own accretion of data.) What I know of is: single-mode has 9 µm cores and goes from 10 km to 80 km depending on the fiber transceivers. Multi-mode comes in 50 µm (newer(?), better distances|speeds) and 62.5 µm (more common) cores. If you reached this page trying to see what the difference is, you can actually substitute the multimode cables pretty freely. You will lose signal going from 62.5 µm to 50 µm, but the optical power budget may support a connection even with the loss. Every splice and connection costs signal power. Every meter of distance costs signal power. The takeaway is that SX transceivers (for multimode) don't care which you use, so you might as well install 50 µm fiber. Single-mode transceiver vendors HP, Cisco, and Transition Networks use different names to designate their 10 km vs. 80 km parts. For Transition, you have to look at the specs for particular units. They make a variety and call them all LX.
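A quick way to sanity-check a link is a power-budget calculation. The numbers below are my own rule-of-thumb figures (typical multimode attenuation, generic connector/splice losses, example SX transmit power and receive sensitivity), not anything from the book; check the datasheets for your actual transceivers.

```python
# Toy optical power-budget check. All dB figures are rule-of-thumb/example
# values; use your transceiver and cable datasheets for real planning.
def link_loss(length_km, connectors, splices,
              db_per_km=3.5,          # multimode around 850 nm, typical
              db_per_connector=0.75,
              db_per_splice=0.3):
    return (length_km * db_per_km
            + connectors * db_per_connector
            + splices * db_per_splice)

budget_db = -9.5 - (-17.0)   # example: min TX power minus RX sensitivity (dB)
loss_db = link_loss(length_km=0.3, connectors=4, splices=2)

print(f"budget {budget_db:.1f} dB, loss {loss_db:.2f} dB, "
      f"margin {budget_db - loss_db:.2f} dB")
```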

Vendor                 10 km     80 km
HP                     LX        LH
Cisco                  LX/LH     ZX
Transition Networks    LX        LX (doesn't follow the pattern)


The IP routing/subnetting stuff is good.

QoS treatment is good: he shows why you can't just throw bandwidth at a problem to get good video|voice. Variable latency (called "jitter") makes it hard for voice|video apps to buffer, leading to pops and crackle that drive users up a tree. Of the three approaches, he concludes that only Guaranteed Delivery will suffice.
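Here's a toy illustration of the point (my numbers, not the book's): two streams with the same average latency, but very different jitter.

```python
# Same mean latency, very different jitter - the second stream is the one that
# makes voice/video buffering miserable. Numbers are invented.
import statistics

steady  = [20.0, 20.1, 19.9, 20.0, 20.1, 19.9]   # per-packet latency, ms
jittery = [5.0, 35.0, 8.0, 40.0, 6.0, 26.0]      # same mean, wild swings

for name, latencies in (("steady", steady), ("jittery", jittery)):
    print(f"{name:8s} mean={statistics.mean(latencies):5.1f} ms  "
          f"jitter (stdev)={statistics.stdev(latencies):5.1f} ms")
```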

Multicast treatment is good. I have never had a handle on that stuff. Now I do.

Some good operational details - in the network monitoring section, he urges us to monitor even quiet backup links. If the backup failed and nobody noticed, they will when the primary dies.

In sum, this book is worth the time to read it. It's a little old, but the stuff that is essential to its topic has not changed. Heck, the age just means you can get it dirt cheap. Check eBay or Amazon used.
  • Syslog-ng is the platform for the central log server

http://www.loganalysis.org/

Tina Bird runs a loganalysis list

Swatch; sec.pl (the Simple Event Correlator) - lets you watch for and act on combinations of events (see the sketch after this list)

splunk - a search tool, the google approach to logs, rather than trimming and wading through

logwatch

logsentry
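The correlation idea, in miniature (my own toy, not swatch or SEC): watch a log stream and only act when a combination of events shows up, like repeated failed logins from one host inside a short window. The threshold, window, and filename are all arbitrary.

```python
# Toy event correlation in the spirit of swatch/SEC (not either tool): alert
# only when 5 failed ssh logins arrive from the same host within 60 seconds.
# Feed it a syslog stream, e.g.:  tail -f /var/log/auth.log | python3 correlate.py
import collections, re, sys, time

WINDOW_SECONDS, THRESHOLD = 60, 5
failures = collections.defaultdict(collections.deque)
pattern = re.compile(r"Failed password for .* from (\S+)")

for line in sys.stdin:
    match = pattern.search(line)
    if not match:
        continue
    host, now = match.group(1), time.time()
    recent = failures[host]
    recent.append(now)
    while recent and now - recent[0] > WINDOW_SECONDS:
        recent.popleft()
    if len(recent) >= THRESHOLD:
        print(f"ALERT: {len(recent)} failed logins from {host} in {WINDOW_SECONDS}s")
        recent.clear()
```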
AFS notes -

At a geek meeting (Seattle Area System Administrator's Guild, formerly Seattle SAGE)
Random recommendation to use Heimdal, not MIT Kerberos

Wednesday, September 20, 2006

I went to a meeting of the local InfraGard chapter (http://www.infragard.net/) today.

Couple of interesting things: a presentation by one of the agents who worked on the Zotob case. That case resulted in the arrest and conviction of a Moroccan citizen (and the ongoing prosecution of a Turkish citizen). It didn't hurt the investigation that the bots phoned home to a server in a domain named for one of the suspects. (Note to self: don't set up a botnet that uses irc.jimmythegeek.com for command and control. Other note to self: don't use the googlemaps link to my house for a domain name for botnet C&C, either. Other, other note to self: don't set up a botnet at all.)

A Cisco guy gave a presentation on "Self-defending" networks, with the usual credibility-augmenting bashing of his own company's marketing department. Overall, I'd say there's a case to be made for multiple layers/levels of defense, all coordinated. The guy cited a competitor's approach (ISS?) that's "all about the math". No layer has to be perfect if, in the aggregate, the layers reduce successful exploitation chances to near zero. There was a little magic security spray (http://www.ranum.com/security/computer_security/marketing/index.html), but I suspect the claimed 1,500 programmers/researchers are able to gin up some useful behavioral characteristics to alert on. It would take time I don't have to evaluate whether it actually worked well.

Besides, it's unaffordable and annoyingly à la carte. Want an IPS? Sure! Just send massive ducats. Want reports out of it? That's extra. Want stats from the router or switch? That's extra. Ick. I don't want to spend another minute of my life managing licenses for tools to manage my real work.