March 04, 2003

Not-quite-async HTTP

From the first day of Syndirella development (when it was still called RSSistor), I decided that I would use the .NET asynchronous functions instead of explicit threading. I have quite a lot of experience writing multithreaded programs in the traditional way, using the Win32 API and pthreads, so the async functions looked both more interesting and more like the "right way" to do things. It took me a while to realize that the async functions don't make the usual threading problems magically go away: I still need to do proper locking, and I still need to do UI work in the UI thread, using BeginInvoke() to hand control over to it. The "not enough worker threads" problem was more of a surprise to me, but it was still quite easy to understand and to fix. Putting a reasonable limit on the number of feeds updated at the same time is a good thing in any case.

However, there is still one more problem that is not so easy to work around. If you start Syndirella and click on "Update all feeds", you will probably notice that the UI gets "locked" several times and does not respond to your actions for a couple of seconds. Not proper behavior for asynchronous updates, right?

What Syndirella is actually doing while it's locked is resolving DNS host names. Although HttpWebRequest.BeginGetResponse() downloads the response asynchronously, the DNS resolution is still done synchronously, and the function does not return until the DNS query is complete.

As can be seen from the Rotor sources, the class responsible for DNS resolution is called ServicePoint, and the query is done in its GetNextIPAddressInfo() method. And while it is simple to do DNS queries asynchronously (Dns.BeginResolve() is readily available), it is not so easy to pass the DNS information to HttpWebRequest. So far, I have thought of only one solution that does not resort to manual threading: I can parse the request URLs and replace the host name with a dotted-decimal IP address after the DNS resolution is complete (for example, a request to "home.yole.ru/weblog/index.xml" would be rewritten as "80.70.224.68/weblog/index.xml").
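To make the URL rewriting idea concrete, here is a minimal sketch of the host replacement step. It is written in Python rather than .NET and is only an illustration, not Syndirella code: the function name rewrite_host is hypothetical, and it assumes absolute URLs (with a scheme) and an IPv4 address. It also returns the original host name, since a server that hosts several sites on one address generally needs that name in the HTTP "Host:" header.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_host(url, ip):
    """Replace the host name in an absolute URL with an IPv4 address.

    Hypothetical helper for illustration. Returns a tuple of
    (rewritten_url, original_host). Any user:password part of the
    URL is dropped; an explicit port is preserved.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        # URLs without a scheme (e.g. "example.com/x") parse as a bare path.
        raise ValueError("URL has no host: %r" % url)

    netloc = ip
    if parts.port is not None:
        netloc = "%s:%d" % (ip, parts.port)

    rewritten = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    return rewritten, host


if __name__ == "__main__":
    new_url, original_host = rewrite_host(
        "http://home.yole.ru/weblog/index.xml", "80.70.224.68"
    )
    print(new_url)
    print(original_host)
```

The IP address would come from the asynchronous DNS query; the sketch deliberately leaves out the request itself, because whether the HTTP client allows the "Host:" header to be overridden depends on the library.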

I don't want to implement this right now, because this will involve some risky changes. I'll need to implement a cache so that each host is not resolved on every request, and it would be best to have a limited time to live for each name resolution result (like 6 or 12 hours). But unless a better solution is found, that's what I will do in 0.92.
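The cache with a limited time to live could look roughly like the sketch below. Again, this is Python for illustration only, not the planned .NET implementation: the class name DnsCache and its parameters are hypothetical, the 6-hour default is just the lower figure mentioned above, and the blocking socket.gethostbyname() call stands in for whatever asynchronous resolver the real code would use. The sketch is not thread-safe; real code would need a lock around the dictionary.

```python
import socket
import time


class DnsCache:
    """Tiny host -> IP cache whose entries expire after a fixed time to live.

    Hypothetical sketch: resolver and clock are injectable so the
    behavior can be checked without touching the network.
    """

    def __init__(self, ttl_seconds=6 * 3600,
                 resolver=socket.gethostbyname, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._resolver = resolver
        self._clock = clock
        self._entries = {}  # host name -> (ip address, expiry time)

    def lookup(self, host):
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh, no DNS query needed

        # Missing or expired: resolve again and remember the new result.
        ip = self._resolver(host)
        self._entries[host] = (ip, now + self._ttl)
        return ip


if __name__ == "__main__":
    cache = DnsCache()
    print(cache.lookup("localhost"))
```

With a cache like this, each host is resolved at most once per time-to-live period, no matter how many feeds are served from it.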

Hints on better ideas to solve this problem are welcome. :-)

Posted by yole at March 4, 2003 10:31 PM
Comments

I hope I'm not being patronising, but: Be careful to take into account multiple sites running on the same host.

www.groovymother.com and www.laughatlantis.com both resolve to the same IP, but need the Host: HTTP header to be set to return the right files.

rOD.

Posted by: rODbegbie on March 4, 2003 11:01 PM

rOD: Yes, this is a valid concern, which I somehow missed. :-) I hope that it will be possible to put in a correct "Host:" header manually.

Posted by: Dmitry Jemerov on March 5, 2003 12:50 AM

I don't think you need to worry about passing the DNS info to HttpWebRequest; DNS has lots of caching already built into the system.

Once you've done the DNS resolution (asynchronously), then (assuming the DNS resolution succeeds) use HttpWebRequest as you currently are doing. The DNS info will be cached someplace "close" (if not on the local computer itself, then on the DNS server in use), so that the DNS resolution that is done by HttpWebRequest will be very quick.

This also frees you from manually having to cache DNS info.

Posted by: Dan on March 5, 2003 06:57 PM

Dan,
That's the kind of seriously good idea that makes you whack yourself in the head.

Dmitry, consider resolving all the DNS entries asynchronously and, when that's done, starting all the async web feed requests.

I can't test the theory right now. I gathered a few obscure domain names you might use to benchmark lookups to see if the idea is successful, though:

www.blargle.net
www.blargle.co.uk
www.serialpurrs.org
pub70.ezboard.com
www.clankah.net
butter.blogspot.com
gcc.gnu.org
www.brint.com
lists.fresco.org
www.exim.org
pub55.ezboard.com
www.zvon.org
www.biglist.com
ghost-planet.warped.com
www.surfpoint.com
mingoia.tripod.com
web.pitas.com
www.classicgaming.com
www.starletsdatabase.com
varnish89.blogspot.com
www.pgsc.freeservers.com
www.blackbook.org
www.surfpoint.com
benderl.diaryland.com
www.braineater.com
www.50cups.com
elfwood.lysator.liu.se
indiboi.com
www.badgertronics.com
www.twisted.co.nz
sagewire.sage.org
vibigclan.recongamer.com
angband.oook.cz
www.goolak.pwp.blueyonder.co.uk
www.codes-sources.com
www.ebroadcast.com.au
barachan.meirse.nu

As a side note, the power of open source shines here... Changing this to be async in the offending class would be trivial.

Posted by: Jeremy Dunck on March 5, 2003 09:25 PM