The dark side of OSPF transit capability (2)

Last week I wrote part 1 of this series on how to do nasty things to OSPF's transit areas. This time we'll jump right in with two more examples, so I'd suggest starting with part 1 if you haven't read it.

And remember, all cases in this series assume that transit area capability is not enabled!

Case number 2

This example builds on Case number 1, so it's technically more like case number 1.5. Details.

I thought it would be amusing if I introduced two equal-cost paths to make matters even more chaotic. Basically, I wanted R5 to load-balance towards Vlan100, so I changed the cost on the R5-R4 link to 90 - the result you can see below.

Remember, R4 has to use R3 as its only exit point, so nothing much changes (apart from the overall cost).

R5 has two options, but they're now equally desirable: getting to Vlan100 through R3 or through R2 costs the same, 120. It installs both routes in its routing table, and the result looks pretty much like this:

R4#ping 100.100.100.100 r 10

Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 100.100.100.100, timeout is 2 seconds:
!.!.!.!.!.

Success rate is 50 percent (5/10), round-trip min/avg/max = 60/62/68 ms

We still have a routing loop, but only half of the time. When R4 routes a packet through R5, if R5 decides to forward it to R3, all's well. But there's a 50-50 chance it'll forward the packet back to R4.
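To make the "part-time" loop concrete, here is a minimal sketch that follows a packet from R4 along each of R5's two equal-cost next hops. The next-hop table is my reading of the description above, and treating R3 as handing the packet straight to Vlan100 is a simplifying assumption (the hops beyond R3 are omitted). It does not model how R5 actually picks a branch for a given packet.

```python
# Next hops towards Vlan100 as described in the text. R5 holds two
# equal-cost entries. R3 delivering the packet directly is an assumption
# made to keep the sketch small.
NEXT_HOPS = {
    "R4": ["R5"],
    "R5": ["R3", "R4"],
    "R3": ["Vlan100"],
}

def trace(node, visited=()):
    """Return every forwarding outcome from `node`, one per ECMP branch."""
    if node == "Vlan100":
        return ["delivered"]
    if node in visited:
        # The packet is back at a router it already crossed: a loop.
        return ["loop"]
    outcomes = []
    for hop in NEXT_HOPS[node]:
        outcomes.extend(trace(hop, visited + (node,)))
    return outcomes

outcomes = trace("R4")
print(outcomes)  # ['delivered', 'loop']
```

One branch delivers and the other comes straight back to R4, which matches the 50 percent success rate in the ping above.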

Obviously, the same solution applies: just let capability transit do its magic and R4 will choose R2 as its exit towards Area 0, thereby avoiding this "part-time" loop.

Case number 3

Now, I can hear some of you grumbling that the previous examples are far-fetched: why would you put the VL across the area like that, are you even serious about those costs, etc.

This case is a bit different: there's a partitioned backbone area now. But the most important difference here is that I haven't touched the transit area costs. The problem is hidden by an asymmetric cost out in the backbone, which makes it much harder to notice.

Now let's look at what happens after the Virtual-Link is configured (with capability transit disabled). You're probably saying: a routing loop, obviously. And you are totally right.

This time R1 was already an ABR, but now, as it has an Area 0 adjacency with R4, it can stop ignoring its LSAs and calculate some paths as internal routes. Therefore, it calculates the Vlan600 path as R1->R2->R4->R6 with a cost of 10+10+100+10.

But R2 and R3, the routers internal to Area 100, have two choices again. They get Type3 LSAs from both R4 and R5 (with a cost of 110 and 20, respectively). So R2, in order to get to Vlan600, chooses the R2->R1->R3->R5->R6 path with a cost of 10+10+10+10+10.

And there's our loop.
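The arithmetic behind this loop can be sketched in a few lines. The numbers are the ones quoted above; the mapping of each number to a specific link (for example, 10 from R2 to R4) is my assumption, inferred from the two paths. The sketch also relies on OSPF preferring intra-area routes over inter-area ones regardless of cost, which is why R1 sticks with its 130-cost path.

```python
# R1's intra-area path through the Virtual-Link: R1->R2->R4->R6 plus Vlan600.
r1_intra_area = 10 + 10 + 100 + 10

# Type 3 LSA costs advertised into Area 100, from the text.
summary_cost = {"R4": 110, "R5": 20}

# Assumed distance from R2 to each ABR inside Area 100.
r2_to_abr = {"R4": 10, "R5": 10 + 10 + 10}  # R5 is reached via R1 and R3

# R2 adds its own distance to the advertised cost and picks the cheapest ABR.
r2_total = {abr: r2_to_abr[abr] + summary_cost[abr] for abr in summary_cost}
r2_best_abr = min(r2_total, key=r2_total.get)

# First hop each router uses towards Vlan600.
first_hop = {"R1": "R2", "R2": "R1" if r2_best_abr == "R5" else "R4"}
is_loop = first_hop[first_hop["R1"]] == "R1"

print(r1_intra_area, r2_total, r2_best_abr, is_loop)
# 130 {'R4': 120, 'R5': 50} R5 True
```

R1 hands the packet to R2 on its way to R4, and R2 hands it right back to R1 on its way to R5.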

For the return path (R6->Vlan100), I'll leave it to you to answer the following questions: Is it reachable? How many paths does it have? Are they a loop, sub-optimal, or what you'd expect? Let me know in the comments below.

In order to fix this loop, capability transit makes sure that R1 uses R5 as the ABR to reach Vlan600 due to its lower cost.

Another solution that works is to bring up another Virtual-Link between R1 and R5. More VLs, yay!

Wrapping up

I hope you enjoyed these examples. The main thing to take away from this post is that the trouble-maker might not be where you expect it to be. In a bigger routing domain with bad (or non-existent) documentation, it can be very hard to predict and troubleshoot what happens, so do your designs properly!

We'll conclude the series with an example of why summarization is not allowed into transit areas.

And, as always, thanks for reading.

