Problem with the Crawlers

-

We upgraded our infrastructure this weekend. Everything went according to plan, except that our crawlers are very slow right now. There are plenty of pages that have not been checked for as long as three days.

We are doing our best to isolate and fix the problem, but there is no ETA right now. We hope everything will be back to normal in a couple of days. We will keep you updated.

Update (March 16): Our crawlers now hum along even faster than before. We expect to clear the whole backlog in a couple of hours, and things will be back to normal again.

RSS Feed for National Press Club Upcoming Library Events

-

The National Press Club posted instructions on how to obtain an RSS feed of upcoming library events:

The official club calendar doesn’t have RSS feeds yet, but there is still a way to get an RSS feed of all the upcoming library events. You can enter the library events page URL, http://www.press.org/library/events.cfm, into the Page URL box on Page2RSS, http://page2rss.com/. After the resulting link is plugged into your feed reader you will receive a notice every time something is added to the library events page.

There is a simple way to announce that a page has an RSS feed. As a webmaster, you can include a direct link to the Page2RSS feed. The format is very straightforward: http://page2rss.com/rss?url=http://yourwebsite.com/mypage.html. Substitute the address of your page for http://yourwebsite.com/mypage.html and that's it.

This is what a direct link to the National Press Club library events feed looks like: http://page2rss.com/rss?url=www.press.org/library/events.cfm. Note that you can safely omit the http:// part of the URL.
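
The link format above can be sketched in a few lines of Python. This is only an illustration: the helper name `page2rss_feed_url` and its `keep_scheme` parameter are hypothetical, and the page address is appended exactly as in the examples above, without any percent-encoding.

```python
# Endpoint taken from the link format described above.
PAGE2RSS_ENDPOINT = "http://page2rss.com/rss"


def page2rss_feed_url(page_url: str, keep_scheme: bool = True) -> str:
    """Build a direct Page2RSS feed link for page_url (hypothetical helper)."""
    page_url = page_url.strip()
    # The http:// prefix is optional according to the text above.
    if not keep_scheme and page_url.startswith("http://"):
        page_url = page_url[len("http://"):]
    # The address is appended as-is, mirroring the examples in the post.
    return f"{PAGE2RSS_ENDPOINT}?url={page_url}"


if __name__ == "__main__":
    print(page2rss_feed_url("http://www.press.org/library/events.cfm", keep_scheme=False))
    # prints http://page2rss.com/rss?url=www.press.org/library/events.cfm
```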

Adding RSS autodiscovery makes all recent browsers show a nice RSS icon in the address bar, which makes it easier for people to subscribe to your RSS feed. To enable autodiscovery, include one line of code inside the head tag of your HTML document:

<link
rel="alternate"
type="application/rss+xml"
title="The Eric Friedheim Library: Events and Classes - Page2RSS"
href="http://page2rss.com/rss?url=www.press.org/library/events.cfm">
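
If your pages are generated by a script, the tag above can be produced programmatically. Below is a minimal Python sketch; the function name `autodiscovery_link` is hypothetical and not something Page2RSS provides. It escapes the title and the feed address so that characters such as & or " do not break the HTML attributes.

```python
from html import escape


def autodiscovery_link(feed_url: str, title: str) -> str:
    """Return an RSS autodiscovery <link> tag for the HTML head (hypothetical helper)."""
    return (
        '<link rel="alternate" type="application/rss+xml" '
        f'title="{escape(title, quote=True)}" '
        f'href="{escape(feed_url, quote=True)}">'
    )


if __name__ == "__main__":
    print(autodiscovery_link(
        "http://page2rss.com/rss?url=www.press.org/library/events.cfm",
        "The Eric Friedheim Library: Events and Classes - Page2RSS",
    ))
```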

Page2RSS Downtime

-

Perhaps everybody noticed the several hours of Page2RSS downtime on May 15. Here is the story.

That morning we found that our domain did not exist anymore. Our registrar, GoDaddy, had suspended the domain and sent an email demanding that we remove specific content. The problem is that they did it without any advance notification. No phone calls. Yes, sometimes they call us to pitch some additional service. No, they didn't call this time.

A quick investigation revealed that somebody had used Page2RSS to obtain a feed of a warez site featuring the full text of Half-Life 2 Mods For Dummies (ISBN: 0470096314) and Windows Vista For Dummies (ISBN: 0471754218). Investigators hired by the copyright owner, John Wiley & Sons, Inc., didn't contact us, but contacted our domain registrar GoDaddy instead. They sent a letter with the subject "Demand for Immediate Take-Down: Notice of Infringing Activity" at 8:09, and an IP address owned by Go Daddy Software visited our site at 8:12. Very shortly after that, the domain page2rss.com disappeared from the internet.

It took 30 seconds to remove the content and write an email to CopyrightClaims@GoDaddy.com. It took several hours of waiting until they read the email and turned the domain back on. Then we had to fix the DNS records GoDaddy broke and wait until all the changes propagated across the internet.

Thank you, GoDaddy, for teaching us such a great lesson - we didn't know that it was a bad idea to deal with GoDaddy.

Thank you, Ms. Internet Investigator from far overseas, for your hard work keeping the US Internet free of links to illegal copies of Windows Vista For Dummies. Keep it up! Next time try contacting ICANN, maybe they will shut down the entire .COM. Thank you for not shutting down Google for exactly the same thing.

Thank you, unknown hacker from somewhere else far overseas, for keeping Ms. Internet Investigator employed and for revealing what GoDaddy actually is.

Thank you, Slashdot crowd, for your support in the similar case with seclists.org.

Thank you, Fyodor, for your nodaddy.com.

Well... Could somebody recommend a registrar who at least contacts its customers first?





Valentine's Day, Google's way

-

Google is celebrating Valentine's Day with a chocolate-covered strawberry.
Add this feed to your RSS reader and see it for yourself next time.

What People Are Saying About Page2RSS

-

Monitor any webpage with Page2RSS - sumanthtechsavvy.blogspot.com

Page2RSS is just a perfect solution. Just enter your favorite site which you want to monitor and it will display a message as below:
This page is monitored for updates. There are no any changes detected since 02/05/07 06:57:40.
To receive updates for this page in RSS format copy-paste this link into your feed reader. And add the link to your RSS readers and you will start receiving the changes.

Page2RSS - www.librarystuff.net

I’m back in Keeping Current mode these days, wanting to come up with some case studies on a "How-To-Do-It-Good" approach. One tool that I have been playing with is Page2RSS, which basically takes any static URL and creates a feed with any new changes in that page. A great idea, theoretically. The one problem it has is that I can’t rule out specific parts of a page, like the date, which changes daily on most sites (usually via javascript). Take a look at this page. See what I mean? If only Page2RSS took a "page" from Web Site Watcher (the king of keeping current tools), I’d be thrilled. Page2RSS is really close to being an amazing tool. I hope they continue to advance it.

Page2RSS - www.lockergnome.com

Just type in the URL, click the button, and you’re done. All that’s left to do is subscribe to the feed by copying the RSS or Atom feed into your feed reader. The page will then be monitored for updates, and you’ll receive all of them in your familiar aggregator. For example, the custom feed for the sparsely populated Google home page will notify you whenever a new text blurb or logo makes it to the front page. Try the example on Page2RSS to see what I’m talking about. A Page2RSS bookmarklet is also available to help simplify the creation of these feeds.

4 Tips for getting the most from Google reader - www.gearfire.net

If you frequently visit a website, but it does not have a feed, you aren’t necessarily out of luck. A new web app called Page2RSS will turn any page into a subscribe-able RSS feed for you.

RHS CDT Page - stuartmeldrum.co.uk

One thing I really used to miss about their site was a feed so I didn’t have to check if it had been updated (you do use feeds by now don’t you?) so I went away and made one using page2rss.com, so if you want to be kept up to date with what they’ve been doing then you can use it too: RSS feed for the RHS CDT Department.

Thanks for the kind words, guys and gals.

New Feeds

-

PostgreSQL vs. MySQL Performance

-

Whenever I read MySQL vs. PostgreSQL benchmarks, my question was always: "How close is that to real-life workloads? My real-life workloads?" I don’t want to argue about how closely the TPC-C test simulates real scenarios for the majority of MySQL installations. The only thing this test tells you is how well MySQL runs the TPC-C test. That test suits MySQL very well: the database doesn’t have any variable-length text fields, and the total database size is small. But if you try to run forum or portal software, which is where MySQL has its major installed base, you will see a completely different picture.

At Page2RSS.com we have experienced very poor MySQL performance. The workload is essentially one ~2 GB table with a dozen fields. A couple of the fields are variable-length text fields with up to 32 KB of data. There are 1-2 inserts per second and up to 10 read requests per second. We used to run MySQL 5.0.24 with InnoDB and no transactions. Now we run PostgreSQL 8.2.0. The server is a PC-class Linux box: P4 2.8 GHz, 1 GB of memory, SATA disks, nothing special at all.

The graph below shows Google's "Time spent downloading a page" statistic (in milliseconds).



That sharp drop in the middle of December is when we switched to PostgreSQL.