Guidelines for providers offering VOIP service over Amplex’s network

February 21st, 2010 by mark

Amplex provides service to customers using a fixed position wireless technology.  The network is capable of providing excellent performance for voice services - provided that the guidelines in this post are followed.  Failure to follow these recommendations will likely result in poor call quality and significant customer dissatisfaction.

#1. Before selling or purchasing voice services please check with Amplex to determine if the existing Internet service is appropriate. We do not recommend deploying VoIP services to locations served by 900MHz or 802.11 equipment. In some cases we may be able to upgrade service to accommodate VoIP services. Significant costs may be encountered if additional height is needed for the antenna. The type of equipment in use cannot be determined by looking at our invoices - you must contact us.

#2. Notify Amplex of the IP address(es) of the service provider’s VoIP gateway. Amplex runs a ‘Quality of Service’ (QoS) enabled network. This means that we classify and prioritize traffic flowing over the network. For traffic entering the network at our external borders (from the Internet) any existing QoS marking is reset to default priority. Traffic is then classified into several categories. For the purpose of this discussion RTP (the voice part of an IP call) from KNOWN PROVIDERS is set to high priority (DSCP 46). If you notify Amplex of the VoIP gateway address we will mark traffic from that gateway as high priority. As of the publication date Buckeye Telesystems is the only provider to have supplied this information.

#3. Notify Amplex of the amount of voice bandwidth needed at the customer location, calculated as the number of concurrent call sessions * codec bandwidth. Based on this information we will enable and set a high priority CIR (committed information rate) for the desired amount of bandwidth. Do not request more bandwidth than is needed as excessive bandwidth reservations will negatively affect other network traffic.
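
As a rough illustration of the "call sessions * codec bandwidth" arithmetic, here is a minimal Python sketch. The codec figures (G.711 and G.729 with 20 ms of audio per packet and 40 bytes of IPv4/UDP/RTP headers) are common textbook values and are assumptions, not numbers supplied by Amplex. The function name is hypothetical, and Ethernet or radio framing overhead is ignored, so treat the result as a starting point for the conversation rather than a final figure.

```python
# Hypothetical helper: estimate the IP bandwidth needed for voice, per direction.
# Payload sizes assume 20 ms of audio per RTP packet (50 packets per second).
PAYLOAD_BYTES_PER_PACKET = {
    "G.711": 160,  # 64 kbit/s codec -> 160 bytes per 20 ms
    "G.729": 20,   # 8 kbit/s codec -> 20 bytes per 20 ms
}
HEADER_BYTES = 40        # IPv4 (20) + UDP (8) + RTP (12)
PACKETS_PER_SECOND = 50  # one packet every 20 ms


def voice_bandwidth_kbps(concurrent_calls: int, codec: str = "G.711") -> float:
    """Return the estimated bandwidth in kbit/s for one direction."""
    packet_bytes = PAYLOAD_BYTES_PER_PACKET[codec] + HEADER_BYTES
    per_call_bps = packet_bytes * 8 * PACKETS_PER_SECOND
    return concurrent_calls * per_call_bps / 1000


if __name__ == "__main__":
    for calls in (1, 4, 8):
        print(calls, "calls:", voice_bandwidth_kbps(calls), "kbit/s with G.711")
```

Under these assumptions a single G.711 call needs about 80 kbit/s in each direction, so four concurrent calls would call for roughly 320 kbit/s of high priority bandwidth.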

#4. Set the correct DSCP headers in outgoing RTP traffic. Outgoing traffic to be handled as high priority is determined by the CPE (Amplex’s Customer Premise Equipment) based on the low latency bit of the DSCP header - specifically bit 3 of the 6 bit DSCP header. We suggest using DSCP code point 46 (or 101110 in binary, some equipment may refer to this as ‘EF’ for Expedited Forwarding). Note that DSCP is a 6 bit field that occupies the upper 6 bits of an 8 bit field in the IP header. Equipment that requests an 8 bit value should use 10111000 as a binary value. Either the customer VoIP gateway or the SIP phones must set DSCP appropriately.
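
The relationship between the 6 bit and 8 bit values can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and most VoIP gateways and SIP phones expose DSCP as a configuration setting rather than code. The socket example uses the standard IP_TOS socket option, which some operating systems ignore or restrict.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, 101110 in binary


def dscp_to_tos(dscp: int) -> int:
    """Shift a 6 bit DSCP value into the upper 6 bits of the 8 bit field."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def open_marked_udp_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Create a UDP socket whose outgoing IPv4 packets request this DSCP value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    print(format(DSCP_EF, "06b"))               # 6 bit form
    print(format(dscp_to_tos(DSCP_EF), "08b"))  # 8 bit form
```

The 8 bit form is simply the 6 bit code point followed by two zero bits, which is why 101110 becomes 10111000.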

#5. The customer router MUST clear, at a minimum, the low latency bit of the DSCP field of all non-voice traffic. Failure to accomplish this step will allow other common traffic (SCP, SSH, video) to consume high priority upload bandwidth resulting in poor call quality. Manipulation of the DSCP values can be handled by many business class routers and firewalls. We have found the Juniper SRX series to be cost effective and capable.
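
The bit manipulation the router has to perform can be sketched as follows. This is an illustration of the arithmetic only, not router configuration. The mask below assumes that "bit 3" is counted from the most significant bit starting at zero, which corresponds to the traditional low delay bit (0x10) of the 8 bit field; that reading is an assumption, so confirm the exact bit with Amplex before relying on it. The function names are hypothetical.

```python
# Assumed position of the low latency bit within the 8 bit field (0x10).
LOW_LATENCY_MASK = 0b00010000


def has_low_latency(tos: int) -> bool:
    """Return True if the assumed low latency bit is set."""
    return bool(tos & LOW_LATENCY_MASK)


def clear_low_latency(tos: int) -> int:
    """Return the 8 bit value with the assumed low latency bit cleared."""
    return tos & ~LOW_LATENCY_MASK & 0xFF


if __name__ == "__main__":
    voice = 0b10111000  # DSCP 46 (EF) in 8 bit form
    print(has_low_latency(voice))                     # voice keeps the bit
    print(format(clear_low_latency(voice), "08b"))    # what non-voice traffic marked EF would become
```

Non-voice traffic that arrives at the router with this bit set (some SSH and SCP clients set it by default) would be rewritten with the bit cleared, while RTP from the phones or gateway is left alone.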

#6.  Check with Amplex to verify that we are seeing traffic flowing in the high priority queue of the radio, in both directions, during a VOIP call.

Following these guidelines should result in a quality VoIP experience. If you are considering either purchasing or selling voice service over Amplex’s network we strongly encourage you to ensure that the VoIP provider and your internal networking team (or consultant) are capable of understanding and following the recommendations in this post.

Send questions or comments to support@amplex.net

Why does my router keep locking up?

February 15th, 2010 by mark

Alternate title:  Why do you keep telling me I have a virus when my Internet quits working?

Home routers are technically not really routers at all - they are network address translation (NAT) boxes.   So what is NAT and why do I care?

NAT was developed in order to conserve address space. It is used in consumer routers because it conserves address space, is easy to configure, and provides some firewall protection to the computers behind it.

How NAT works is pretty simple. The router (what we are going to call the NAT box) has an outside and an inside interface. The ‘outside’ is the side connected to the Internet. The ‘inside’ is the side connected to the computers in your house.

For demonstration purposes let’s have 2 computers that we call “A” and “B” in the house.

When computer A connects to a site on the Internet the router makes an entry in the NAT table that says “computer A is talking to Google”. Computer B wants to connect to Yahoo. The router makes an entry in its NAT table to remember that B is talking to Yahoo. So far so good. When responses come back from Google the router knows that “Google” traffic goes to computer A and that “Yahoo” traffic goes to computer B. The router now has 2 entries in its NAT table - one for computer A to Google and one for computer B to Yahoo. Computer A now goes to a different site - this adds another entry in the NAT table.

So - each new connection from a computer to a site on the Internet uses up one slot in the NAT table (actually it uses several as web pages are composed of multiple images, text, advertising, etc.). Most consumer routers have NAT tables that can hold a few thousand entries. How does the router decide when to discard the NAT table entries? If the connection between the computer and the site is terminated cleanly (the TCP protocol has a way to do this) the entries are removed from the NAT table. Entries that are not cleanly terminated (and some protocols do not have a method to indicate they are done transferring data) are eventually timed out of the table. Many routers will also start discarding the oldest entries if the NAT table is full or close to full.

So what happens when the NAT table is full? The router no longer has a place to store information required to process the data coming back from the Internet. The computer will not be able to establish a connection and the connection will time out. Since web sites are actually composed of many items, when the NAT table is nearly full parts of the page may load while the remainder loads slowly or not at all. Some routers (that don’t expire entries when the NAT table is nearly full) will appear to lock up at this point and need to be rebooted. Others will reboot spontaneously or recover if the computers are shut off.
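
To make the behaviour concrete, here is a toy model of a NAT table in Python. It is a sketch only: the class and method names are hypothetical, the 300 second idle timeout is an assumed value, and a real router tracks addresses, ports and protocol state rather than simple names. It models the kind of router that does not discard old entries when the table fills up.

```python
class NatTable:
    """Toy NAT table with a fixed capacity and an idle timeout."""

    def __init__(self, capacity=4096, idle_timeout=300.0):
        self.capacity = capacity
        self.idle_timeout = idle_timeout  # seconds; assumed value
        self.entries = {}                 # (inside_host, remote_site) -> last seen time

    def expire(self, now):
        """Time out entries that were never cleanly closed."""
        stale = [key for key, seen in self.entries.items()
                 if now - seen > self.idle_timeout]
        for key in stale:
            del self.entries[key]

    def open_connection(self, inside_host, remote_site, now):
        """Return True if the connection got a slot, False if the table is full."""
        self.expire(now)
        key = (inside_host, remote_site)
        if key not in self.entries and len(self.entries) >= self.capacity:
            return False  # no slot left: replies cannot be matched to a computer
        self.entries[key] = now
        return True

    def close_connection(self, inside_host, remote_site):
        """A clean close removes the entry immediately."""
        self.entries.pop((inside_host, remote_site), None)


if __name__ == "__main__":
    table = NatTable(capacity=3)
    print(table.open_connection("A", "google", now=0))     # gets a slot
    print(table.open_connection("B", "yahoo", now=1))      # gets a slot
    print(table.open_connection("A", "other-site", now=2)) # gets the last slot
    print(table.open_connection("B", "fourth-site", now=3))  # table is full
```

With a tiny capacity of 3 the fourth connection is refused until an earlier one is closed or times out, which is what a page that "loads partly and then stalls" looks like from the router's side.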

So why would a NAT table be full? The most common reasons a NAT table is full (or overloaded) are that the computers are trying to talk to too many sites and/or the connections are not being properly terminated (and therefore not being removed from the NAT table). What kinds of software try to talk to large numbers of computers on the Internet? Peer-to-Peer file sharing and Viruses. Let’s take each one separately.

Peer-to-Peer networks are programs that enable you to share files from your computer with others on the Internet who would like to download them.   This is most commonly used for (illegally, but that’s another matter) downloading music and video files from others.   The Wikipedia page has a good description of how peer-to-peer networks work.     Depending on the configuration of the peer-to-peer software the program may not limit the number of computers it is sharing files with and/or may not limit the amount of bandwidth being used.    All of the programs we have seen have options for limiting the number of concurrent connections and the amount of bandwidth.   We suggest setting those as low as possible if you are having lockup issues.

Viruses: Pretty much by definition viruses try to propagate themselves by attacking other computers. Once a computer has been taken over by a virus or other malware it is impossible to say what it is going to do - but they often try establishing so many connections that they quickly overload the NAT table.

So what is Amplex looking at when I call in? Amplex also uses NAT in our customer premise equipment (CPE). The NAT table in our equipment is limited to 4096 entries. When a customer calls in with a connection issue one of the first things we check is to see if the NAT table in the CPE is full. If it is and the customer says they are not running file sharing we are going to assume it is a virus issue. If you are running file sharing we are going to suggest turning it off or adjusting its settings.

When we tell you we are seeing signs of virus activity it is not that we are looking at your computer or even seeing the specific traffic. We are seeing the large number of entries in the NAT table of the CPE.

How does an end user figure out which computer is causing the problem?   It can be difficult as viruses do their best to hide themselves.   Easiest is usually to try turning off one computer at a time and see if the problem goes away.    Keep in mind more than one computer may be infected.

But but but.. we don’t want Amplex to do NAT.  I want to have a transparent connection to the Internet!    Ok - no problem, just let us know.   You will need to understand how to set a static IP address on your router.     Please research how to do that before contacting us and we will happily disable NAT on our CPE.

Update on new tower sites

December 8th, 2009 by mark

Seems like projects always take longer than they should.   In any case…

The Gibsonburg site is up and running. I am not completely happy with the coverage area we are getting from the 2.4GHz sector at the site but the 5.7GHz transmitter is working very well. As soon as we have the funds we will swap the 2.4 for a couple of sectors which should improve coverage in the area.

The Dirlam Road site just east of Bowling Green is up and running - we will be converting many of the 900MHz customers south of SugarRidge and/or north-east of the Bays Rd tower to the new site over the next couple of weeks. This will result in a significant performance increase for those customers.

Rising Sun is on the back burner for the winter - I do not expect to have equipment at Rising Sun until spring 2010.

The North Baltimore / Hoytville site is a project for late December or early January - funding and weather may delay this though.

New tower sites coming soon!

October 8th, 2009 by mark

Amplex has received permission from Countyline Co-Op to use the grain silos at Gibsonburg and Rising Sun. This will give us much better coverage in the western and central portions of Sandusky and south-eastern Wood county. We expect to have the Gibsonburg site operational by October 19th and the Rising Sun site by November 15th.

We have purchased the rights to place our equipment on a tower just east of Bowling Green on Dirlam Rd.   This tower should be operational in early November.   This tower will provide improved service to customers south of SR-105 between Bowling Green and Pemberville.

A new water tower is being constructed by the Northwest Water and Sewer District at  Hoytville.   This tower is located at the new rail yard being constructed west of North Baltimore.  This tower should be operational by late November.

The Northwest Water and Sewer District water tower at Weston is being replaced with a newer (and much larger) water tower in the spring of 2010.    Amplex will be moving our equipment from the existing tower in Weston to the new tower in the spring.

A new Northwest Water and Sewer District tower east of Luckey will be operational in the spring of 2010.

POP3 versus IMAP mail

March 1st, 2009 by mark

Methods to check your mail:

Amplex supports several different ways to access your email:

  • POP3 (Post Office Protocol version 3)
  • IMAP (Internet Message Access Protocol)
  • Webmail

The major difference between POP3 and IMAP is where the messages are stored.

When retrieving messages with POP3 the default behaviour is to:

  1. Retrieve from the mail server (at your ISP) the number of new messages on the server.
  2. Transfer the messages from the ISP to your computer
  3. Delete the messages from the mail server.

When checking a mailbox using IMAP a completely different thing happens:

  1. Compare the list of messages at the server and the local computer to determine message state (new, read, deleted, replied to, etc.)
  2. Show the current state of the mailbox.  Synchronize the state of the messages on the server and the local computer.

The big difference between the two is that POP3 REMOVES the messages from the server once it has transferred them to your local computer. That behaviour is very important to understanding the difference between the two protocols. IMAP leaves the messages on the server until they are deleted by you.
Webmail is simply a way of using a web browser to read your mail using IMAP.  Webmail interacts with your mailbox using IMAP.
Nearly all mail client software (Outlook, Outlook Express, Thunderbird, Incredimail, Entourage, Vista Mail, etc.) can be set up to check mail using either POP3 or IMAP but all default to POP3 unless told otherwise.

So why would you want to use POP3 or IMAP?   Which one should you choose?

If you always check your mail from the same computer then POP3 is a good choice.   Since POP3 transfers the mail to your computer you always have a copy of your mail and you can read it when you are not connected to the Internet.  Remember - POP3 will transfer the mail and then delete it from the server.   Once you retrieve your mail using POP3 it is erased from the ISP’s mail server.
If you check your mail from multiple computers then IMAP is a better method.   Since IMAP keeps the mail on the server along with the state of the mail (read, unread, replied to) it makes it much easier to check your mail from multiple computers.   If you have a computer at work and at home both set up to check the same account using IMAP you will see the same messages on both computers.   When you read a message on one computer and then check the other one the message will show up as having been read already.
If you set up two computers to check mail using POP3 then something really confusing happens. If both computers are set to check mail every 10 minutes (the default) then the first computer to check after a new message arrives retrieves it and deletes it from the server. Let’s say for example your ‘home’ computer is checking for messages using POP3 at 5, 15, and 25 minutes after the hour. Your ‘work’ computer is checking at 0, 10, and 20 minutes after the hour. When a new message arrives at 2 minutes after the hour it will show up only on the home machine. A message that arrives at 8 minutes after the hour ends up only on the work machine. A message arriving at 12 minutes would only show up on the ‘home’ machine. Very confusing if you’re at work waiting for a message to arrive.
Things can get very confusing if you are using both IMAP and POP3 at the same time. Keep in mind that Webmail is really an IMAP client. Let’s assume your home computer is set up to use POP3 and you leave it running and it’s checking for new mail every 10 minutes. If you’re at work and decide to check your mail using webmail you log in and don’t see any messages - because your home computer is retrieving and deleting the messages from the server every 10 minutes. Or you get lucky and catch the message before your home computer retrieves it - and then you check again 15 minutes later and it’s gone - because your home computer just retrieved it and deleted it off the server.
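
The two-computer POP3 example above can be worked through with a short Python sketch. The function name and data layout are hypothetical; the check times are the ones from the example, and the sketch assumes default POP3 behaviour where whichever client checks first downloads the message and deletes it from the server.

```python
def first_to_retrieve(arrival_minute, schedules):
    """Return the client whose next POP3 check comes first after the message arrives.

    schedules maps a client name to the minutes after the hour at which it checks.
    Returns None if no client checks again within the listed times.
    """
    best = None  # (check_minute, client_name)
    for name, check_minutes in schedules.items():
        upcoming = [m for m in check_minutes if m >= arrival_minute]
        if upcoming and (best is None or min(upcoming) < best[0]):
            best = (min(upcoming), name)
    return best[1] if best else None


SCHEDULES = {
    "home": [5, 15, 25],
    "work": [0, 10, 20],
}

if __name__ == "__main__":
    for arrival in (2, 8, 12):
        print("message at", arrival, "minutes ends up on:",
              first_to_retrieve(arrival, SCHEDULES))
```

Running it shows the messages landing on home, work and home respectively, matching the example: which computer ends up with a given message depends entirely on timing.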

So what’s the moral of this story?

Pick a method of checking mail and stick with it - if you use webmail then always use webmail.

If you want to use both webmail and a mail program like Outlook then set it up to check mail using IMAP.

If you want to use POP3 to check your mail then make sure you DO NOT leave it running when you are not using it.

If your messages all suddenly disappear off webmail it’s a safe bet that somewhere you have a computer checking your mail using POP3 and all of your mail was transferred to that computer.

Are there exceptions to the above discussion?

Yes - there are options available in most mail clients to tell POP3 not to delete messages off the server, to delete them after a certain amount of time, or based on other criteria. These options are available to make POP3 behave more like IMAP - but they are something of a kludge - you’re probably better off using a protocol like IMAP.

Two other things occasionally happen with mail:

When using POP3 if the connection to the server is interrupted before all of the messages are retrieved the next time you connect you will get another copy of all the messages you already received. This is because messages are not deleted until after all the messages are transferred.
When using both POP3 and IMAP the POP3 client will occasionally show a message in your mailbox that says “DO NOT DELETE THIS MESSAGE - INTERNAL MESSAGE DATA”.   This message is stored on the mail server and contains information used by IMAP.   Occasionally a POP3 client accidentally retrieves this message.   You can safely delete the message without hurting anything.

Partial Internet outage 11/12/08 4:24pm to 4:45pm

November 12th, 2008 by mark

We noticed a brief loss of connectivity to some destinations on the Internet this afternoon. The problem occurred in a portion of the Verizon network and affected traffic to some popular destinations such as CNN, MySpace, and Facebook. The problem cleared while we were analyzing the situation and deciding on a course of action.

Numerous network operators are reporting the problem on outage mailing lists.   Verizon has not issued a statement at this time.   The rumor mill is pointing the finger at Level3 claiming bad announcements from Level3 (another very large network).

So how does all this work you ask? (or the really short introduction to BGP).

The Internet is not a single entity but rather a collection of independent networks connected together.  The networks connect to each other at gateway routers.   The gateway routers speak a language (actually a protocol) called BGP where they announce to each other what networks (and destinations) are available by sending traffic through the gateway.

Amplex maintains connections to two large networks (Verizon and Cogent) and we receive information from both telling our router the fastest way to deliver traffic to its destination. Should a network cease to be able to carry traffic to a particular destination (say MySpace) the neighbor router is supposed to ‘withdraw’ its offer to carry traffic to that destination. When that happens, if we still have a route to the requested destination via our other connection, we will send data out the working connection. Sometimes the route is withdrawn by both providers at the same time - this likely indicates that the destination network itself is no longer online.
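
The announce and withdraw behaviour described above can be modelled with a small Python sketch. This is a deliberately simplified toy: real BGP best path selection weighs many attributes, and here a single made-up "path length" number stands in for all of them. The class name, method names and the path lengths are hypothetical.

```python
class RouteTable:
    """Toy routing table fed by announcements from upstream networks."""

    def __init__(self):
        self.routes = {}  # destination -> {upstream: path_length}

    def announce(self, upstream, destination, path_length):
        """An upstream offers to carry traffic to a destination."""
        self.routes.setdefault(destination, {})[upstream] = path_length

    def withdraw(self, upstream, destination):
        """An upstream takes back its offer for a destination."""
        self.routes.get(destination, {}).pop(upstream, None)

    def best_upstream(self, destination):
        """Pick the upstream with the shortest path, or None if nobody offers one."""
        candidates = self.routes.get(destination)
        if not candidates:
            return None
        return min(candidates, key=candidates.get)


if __name__ == "__main__":
    table = RouteTable()
    table.announce("Verizon", "myspace", path_length=2)
    table.announce("Cogent", "myspace", path_length=3)
    print(table.best_upstream("myspace"))  # shortest offer wins
    table.withdraw("Verizon", "myspace")
    print(table.best_upstream("myspace"))  # falls back to the other upstream
    table.withdraw("Cogent", "myspace")
    print(table.best_upstream("myspace"))  # nobody can reach it
```

The failover only works if the upstream actually sends the withdrawal. If it keeps announcing a route it can no longer deliver on, the table keeps preferring it and traffic keeps going to the broken path.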

In today’s outage Verizon continued to tell our router that the best path to MySpace, CNN, and other sites was to deliver the traffic to Verizon. Unfortunately Verizon was not keeping that promise but rather dropping the traffic inside its own network. While that situation is not supposed to happen it does on fairly rare occasions.

Verizon will likely issue a ‘root cause analysis’ regarding the outage at a later date to explain to the routing engineers at other companies how and why this happened and how to prevent it in the future.

How could Amplex work around this problem?

We would shut down the connection to Verizon which then routes all traffic to Cogent.  Unfortunately this is not a decision to be made lightly since shutting down an upstream carrier causes our own announcements to the rest of the Internet to change.   There can be fairly long waits (and disconnections of existing VPN, Video, and other sessions) while the Internet determines the new best path to reach us.

Once we had established that the problem was at Verizon we were preparing to shut down the connection when the problem in Verizon’s network was resolved.

Why is it so hard to make a small router that works properly?

November 8th, 2008 by mark

How Netgear routers manage to blow up the network:

We have a customer that was reporting frequent temporary lockups on his wireless connection.   To diagnose a situation like this we have a variety of standard things that we do:

  • Check the signal strength at the customer premise radio and at the transmitting tower.
  • Check for a high number of re-registrations of the customer radio.
  • Check for errors on the Ethernet interface at the customer site.
  • Verify that the software load on the Canopy radio is current.

Assuming none of the above reveal any problems we use a program called Multiping to ping the customer radio and the customer router. Multiping sends an ICMP Echo Request to the target computer or router and waits for the response. If there is a reply the round trip time is plotted on a graph. If there is no reply that is marked on the graph as well.

In this case Multiping was showing only an occasional dropped packet (no reply). This is relatively normal behavior and when kept below 1% it is not an issue unless the drops are sequential. It is important to note when looking at ICMP reply times that routers (and computers) consider responding to ICMP requests a very low priority - if they respond at all. The lack of a response, or a high ping time to a router in the network path, does NOT necessarily imply a problem - it’s just another piece of information and must be evaluated along with other troubleshooting steps.
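
The rule of thumb above (loss below 1% is fine unless the drops are sequential) can be expressed as a short Python sketch. The function names are hypothetical and this is not how Multiping itself works; it simply applies the same two checks to a list of ping results, where a dropped packet is recorded as None.

```python
def summarize_pings(replies):
    """Return (loss_percent, longest_run_of_drops) for a list of ping results.

    replies holds round trip times in milliseconds, or None for no reply.
    """
    total = len(replies)
    dropped = sum(1 for r in replies if r is None)
    longest_run = run = 0
    for r in replies:
        run = run + 1 if r is None else 0
        longest_run = max(longest_run, run)
    loss_percent = 100.0 * dropped / total if total else 0.0
    return loss_percent, longest_run


def looks_healthy(replies, max_loss_percent=1.0):
    """Apply the rule of thumb: under 1% loss and no back-to-back drops."""
    loss_percent, longest_run = summarize_pings(replies)
    return loss_percent < max_loss_percent and longest_run <= 1


if __name__ == "__main__":
    isolated = [10.0] * 199 + [None]          # one drop in 200 pings
    sequential = [10.0] * 298 + [None, None]  # two drops in a row
    print(summarize_pings(isolated), looks_healthy(isolated))
    print(summarize_pings(sequential), looks_healthy(sequential))
```

As the text notes, a result like this is only one piece of information: a device that deprioritizes ICMP can fail this check while forwarding customer traffic perfectly well.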

If we can’t find any problem at this point well… hard to say. The problem could be the customer’s computer, perhaps the customer’s router, maybe the site they are trying to reach, or some other issue outside of our control. In this case we noticed that the packet loss occurred at the same time for the devices between the Oak Harbor router and the Carroll Water customers. This pointed to a possible issue at Oak Harbor or with the VLAN we use for the Carroll Water tower. Last week we tried removing the VLAN from the router at Oak Harbor and moving its gateway back to the core router at Lemoyne. While this initially appeared to have no effect the amount of packet loss on the network radically increased as the network load picked up during the day. Monitoring the network at the network tap locations did not show any obvious reason for the increased loss. Due to multiple customer complaints we removed the changes made to Carroll Water midday (something we normally try to avoid during weekdays).

It was very odd that moving the VLAN made things worse - it shouldn’t but it did.   The only possibility left is that the problem is something at Carroll Water or Oak Harbor.    On Wednesday we replaced the router at Oak Harbor - which helped nothing.

On Thursday night around 11:45pm the network monitor indicated problems with much of the network.  Normally when this happens (not that it happens often) it indicates a loop on the network or a broadcast storm.   While troubleshooting something very odd appeared - large quantities of ICMP traffic destined to the customer we have been having a problem with.  The traffic was coming from the public IP address of other customers on the network but carried the payload of the packets from the machine running Multiping.  Even worse - the packets have the ‘broadcast’ flag turned on.

Tracking down the routers the packets are coming from reveals that they are all Netgear routers with static IP addresses assigned. ARG! Now it’s obvious what is happening… A packet destined to the customer gets slightly mangled on the way, turning on the broadcast bit. The Netgear routers fail to detect that the packet checksum doesn’t match (since it’s mangled) and, far far worse, proceed to create a copy of the packet and send it to the original destination. All the other Netgear routers on the network hear this broadcast packet and do the same thing. This is like throwing a ball in a room full of mousetraps - the whole thing blows up.

So now it’s obvious… The reason the customer is having problems isn’t that he is losing connectivity - it’s that he is being buried under bogus traffic from a bunch of buggy Netgear routers.   When we moved the VLAN back to Lemoyne earlier in the week this traffic overload hit the entire network rather than being directed at Carroll Water.

The Solution:

Since we were able to identify all of the customer routers involved we contacted the customers on Friday and had them change the type of connection they use (from Static to NAT).  This prevents the routers from doing what they have been doing.

What a mess…..

Mark

Mail servers were slow today

October 20th, 2008 by mark

Mail processing was slowed today due to a high load on the machine that checks mail for viruses and spam. The problem occurred while performing upgrades to the operating system.

How is mail processed?   It’s far more complex than it appears…

There are 3 machines responsible for processing mail - 2 machines (named sylvio and paulie) serve as the front end and are responsible for initially receiving incoming and outgoing mail, making a few preliminary checks to see if the recipient is valid, and storing the mail to disk (a process called queuing). Once the mail is queued a separate process sends the mail to a third server (tony) to be checked for spam and viruses and then (presuming no viruses were found) returns it to sylvio or paulie where it is again queued to disk. A third process then collects the queued mail and performs final delivery to the local mailbox (for local users) or the recipient’s mail server (for non-local users).

Why so complex?   A bunch of good reasons actually…

  • 2 front end machines allow us to work on one machine without disrupting mail processing.
  • Spam filtering and virus checking is a slow and difficult process and requires considerable resources (CPU, Memory). Separating the storage and processing helps prevent client timeouts. Many mail clients (e.g. Outlook Express, Outlook, Thunderbird, etc.) will generate error messages if the mail server does not accept mail quickly.
  • Delivering mail from disk (rather than from memory) is safer.   By queuing mail to disk before acknowledging acceptance we do not lose mail in the event of a software or server crash.
  • Mail is often bursty in nature - a few messages a minute to hundreds a minute.   Since it’s possible for the incoming rate to exceed the rate that messages can be checked for spam and viruses the front end servers hold the mail until the scanner can check it.
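
The last point, about the front end servers absorbing bursts, can be illustrated with a tiny Python simulation. The arrival counts and the scanner capacity below are made-up numbers chosen only to show the shape of the behaviour, and the function name is hypothetical.

```python
def queue_backlog(arrivals_per_minute, scanner_capacity_per_minute):
    """Return the number of messages waiting in the queue at the end of each minute."""
    backlog = 0
    depths = []
    for arrived in arrivals_per_minute:
        backlog += arrived
        # The scanner checks as many queued messages as it can this minute.
        backlog -= min(backlog, scanner_capacity_per_minute)
        depths.append(backlog)
    return depths


if __name__ == "__main__":
    # A quiet minute, a two minute burst, then quiet again (hypothetical figures).
    arrivals = [5, 300, 300, 5, 5]
    print(queue_backlog(arrivals, scanner_capacity_per_minute=100))
```

During the burst the queue grows because mail arrives faster than it can be scanned, and it drains again once the burst passes. If the scanner slows down, as it did during the upgrade, the same arrivals produce a deeper queue and delivery is delayed rather than mail being lost.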

The servers have had an issue for some time where they will lock up when requested to make a ‘snapshot’ (backup) of the disk. The lockup issue is a known problem with the operating system version we have been using. We are in the process of upgrading the operating system, and that upgrade caused the high load on the server today.

Upcoming Maintenance, AKA: Nuintari Will be Sleepy

July 16th, 2008 by nuintari

The Domain Name System, often referred to simply as DNS, is fundamentally important to internet communications. DNS is the system that translates easily remembered names, such as www.amplex.net, into something a computer can understand. Computers don’t understand English, or even the crazy subset of English that makes up most domain names. When you type www.amplex.net into your browser, one of the first things your computer tries to do is figure out what that really is in a form it can understand. It asks one of your ISP’s DNS servers; Amplex has three. One of those DNS servers should get back to you, and inform your computer that www.amplex.net is in fact 64.246.100.105. This makes far more sense to your computer, and it will begin connecting to the server that houses www.amplex.net. This holds true for just about any site you wish to get to. Most DNS servers don’t actually know the answer to most sites you are going to try to find, but they know who to ask. Amplex’s DNS servers know everything about anything at amplex.net, as well as any other websites we happen to host. For anything else, we ask another set of DNS servers. Those servers might know the answer, or more likely, they also know who to ask. After many questions, you will usually get the information you seek. Our DNS servers will then hold onto that information for a little while, in case someone else also needs that information. This process is called “caching.”
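
Here is a minimal Python sketch of the caching idea. It is a toy, not how our DNS software is implemented: the class name is hypothetical, the upstream lookup is a stand-in function rather than a real DNS query, and the fixed 300 second lifetime is an assumption (real DNS records carry their own time-to-live).

```python
class CachingResolver:
    """Toy resolver that remembers answers for a limited time."""

    def __init__(self, upstream_lookup, ttl_seconds=300.0):
        self.upstream_lookup = upstream_lookup  # function: name -> address
        self.ttl_seconds = ttl_seconds          # assumed fixed lifetime
        self.cache = {}                         # name -> (address, expires_at)
        self.upstream_queries = 0

    def resolve(self, name, now):
        """Answer from the cache if the entry is still fresh, otherwise ask upstream."""
        cached = self.cache.get(name)
        if cached and cached[1] > now:
            return cached[0]
        self.upstream_queries += 1
        address = self.upstream_lookup(name)
        self.cache[name] = (address, now + self.ttl_seconds)
        return address


if __name__ == "__main__":
    # Stand-in for "asking another set of DNS servers".
    known = {"www.amplex.net": "64.246.100.105"}
    resolver = CachingResolver(known.__getitem__)
    print(resolver.resolve("www.amplex.net", now=0))    # asks upstream
    print(resolver.resolve("www.amplex.net", now=10))   # served from the cache
    print(resolver.resolve("www.amplex.net", now=400))  # entry expired, asks again
    print("upstream queries:", resolver.upstream_queries)
```

Three lookups produce only two upstream queries, which is the whole point of caching. It is also why the contents of the cache matter: whatever answer is stored there is handed to everyone who asks until it expires.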

Unfortunately, almost all caching DNS servers have recently been observed to contain a software issue where a malicious third party could interfere with the data in the DNS server’s cache. So, when you try to reach www.whatever.com, your computer might be tricked into going to someplace nasty, with a virus waiting for you. Never fear, this was just a proof of concept, no known exploits exist for this yet, and most DNS software vendors, ours included, had patches released this past weekend. Of the three Amplex DNS servers, I have successfully patched two of them without incident. The third one, the busiest of the three, and the one most likely to cause a disruption when I bring it down for a patch, is getting patched tonight, very late tonight. Like I said, we have three DNS servers, and most customers are configured to use the two closest to them for name resolution. But the most centrally located, and therefore closest to the most people, is the one I am working on tonight. This basically means that if you are surfing the web tonight, you might notice, as I have to reboot the machine twice to properly apply this patch.

Long story short, expect a brief outage, starting at about 1am tonight, as I work my magic.

New Webserver!

May 11th, 2008 by nuintari

I have just finished what I hope is the last of the bug tests for the new FreeBSD/Apache web server. This will replace the current web server, which has been very stable and very reliable, and I hope I am not changing that, as it is running the same base software as before. What is different is that my sanity will no longer be negatively affected by administration of this machine. The current web server requires me to log in and make changes to the configuration, by hand, every time someone needs a new site, or a change to a site. This is needlessly wasteful of my time, and slows down site implementation times. New sites take me well over ten minutes to manually enter and bring live. Site visitor tracking has to be manually configured as well, and I have a nasty habit of forgetting to set that up for new sites. There is also the issue that sites on the web server do not always get entered into the billing system. Data that needs to be entered in two places and kept consistent is never a good idea.

So, the wonderful solution was to build a new machine that is completely tied into our billing system. You want a new site, I enter it into the billing system, which provisions the site for me. I only have to input a sitename, a username, and a password. You get a site that is ready to go, ftp access, and stats tracking, all automatically.

Now I have to start moving customer sites over to the new machine, a process that should take about the next eternity. In addition to the new automated provisioning system, a few upgrades were in order while I was at it. Most notable being that PHP version 4 is no longer supported, so the new webserver has version 5 installed. This is almost guaranteed to break several websites, as the differences between versions 4 and 5 are immense. But security patches for version 4 no longer exist, meaning any new bugs found in version 4 will never be fixed. Kind of a bad thing to leave PHP4 installations around at this time.

If you have a site on the current webserver, and want to migrate it to the new machine yourself, please let me know. I can give you access to both machines at once, and you would be saving me a load of time.