Converting FOAF to OPML

My first professional contract is coming to an end, so I'll soon have time again to indulge my newsfeed addiction.

Previously I used Bloglines, which is certainly one of the better online feed readers, but I'm switching to a desktop feed reader: I'm trying out Liferea.

I'd like to add all the Planet UGent blogs to my list of feeds, but the available OPML export contains no URLs, which makes it pretty useless. Not all is lost, though: Planet UGent also provides a FOAF export. Unfortunately, the FOAF file contains only the blog URLs, not the actual feed URLs.
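The Hpricot script further down depends on how the FOAF export is laid out. Here is a minimal sketch of the assumed structure and of reading it with REXML from Ruby's standard library instead of Hpricot. The sample document, its nesting (foaf:member > foaf:Agent > foaf:weblog > foaf:Document) and the `foaf_members` helper are illustrative assumptions consistent with the element names the script queries, not a copy of the actual Planet UGent export.

```ruby
require 'rexml/document'

# Namespace prefixes used in the XPath queries below.
NS = {
  'rdf'  => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
  'foaf' => 'http://xmlns.com/foaf/0.1/'
}.freeze

# Hypothetical sample in the assumed foafroll shape: each member has a
# name and a weblog whose rdf:about is the blog's front page, not its feed.
SAMPLE_FOAF = <<~XML
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <foaf:Group>
      <foaf:member>
        <foaf:Agent>
          <foaf:name>Alice</foaf:name>
          <foaf:weblog><foaf:Document rdf:about="http://alice.example.org/"/></foaf:weblog>
        </foaf:Agent>
      </foaf:member>
      <foaf:member>
        <foaf:Agent>
          <foaf:name>Bob &amp; Co</foaf:name>
          <foaf:weblog><foaf:Document rdf:about="http://bob.example.org/blog/"/></foaf:weblog>
        </foaf:Agent>
      </foaf:member>
    </foaf:Group>
  </rdf:RDF>
XML

# Returns [name, blog_url] pairs; either value is nil if its element is missing.
def foaf_members(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//foaf:member', NS).map do |member|
    name = REXML::XPath.first(member, './/foaf:name', NS)
    blog = REXML::XPath.first(member, './/foaf:Document', NS)
    [name && name.text, blog && blog.attributes['rdf:about']]
  end
end

foaf_members(SAMPLE_FOAF).each { |name, url| puts "#{name}: #{url}" }
```

Note that REXML's `text` returns the unescaped value, so the second name comes back as "Bob & Co".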

Ruby to the rescue!

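require 'rubygems'
require 'open-uri'  # lets open() fetch http:// URLs
require 'time'      # Time#rfc822
require 'cgi'       # CGI.escapeHTML
require 'hpricot'

doc = Hpricot(open('http://planet-ugent.be/foafroll.xml'))

# Load a blog's front page and return the URL of its feed, preferring
# Atom over RSS. Failures are reported as error:// pseudo-URLs.
def extract_feed(url)
  return 'error://no url' if url.nil? || url.empty?
  begin
    page = Hpricot(open(url))
    %w(application/atom+xml application/rss+xml).each do |t|
      link = page.at("link[@type='#{t}']")
      next unless link && link.attributes['href']
      # Resolve absolute, host-relative and path-relative hrefs
      # against the URL of the page they were found on.
      return URI.join(url, link.attributes['href']).to_s
    end
    'error://not found'
  rescue Timeout::Error
    'error://timeout'
  rescue SocketError
    'error://socket error'
  rescue OpenURI::HTTPError
    'error://http error'
  rescue URI::InvalidURIError
    'error://invalid url'
  end
end

feeds = doc.search('foaf:member').map do |m|
  name = m.at('foaf:name').inner_html
  document = m.at('foaf:document')
  url = document && document.attributes['rdf:about']
  [name, extract_feed(url)]
end

now = Time.now.rfc822
puts %(<opml version="1.1">
  <head>
    <title>Planet UGent</title>
    <dateCreated>#{now}</dateCreated>
    <dateModified>#{now}</dateModified>
    <ownerName>Ikke</ownerName>
    <ownerEmail>eikke at eikke dot commercial</ownerEmail>
  </head>
  <body>)

feeds.each do |name, url|
  # inner_html is already XML-escaped text, so only double quotes need escaping
  text = name.gsub('"', '&quot;')
  puts %(    <outline text="#{text}" xmlUrl="#{CGI.escapeHTML(url)}"/>)
end

puts "  </body>\n</opml>"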

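The script extracts each member's name and blog URL from the FOAF file, loads each blog's front page and takes the feed URL from its link tags. Atom feeds take precedence over RSS feeds. Relative feed URLs should be handled, but this has not been thoroughly tested. Blogs whose feed can't be found get an error:// pseudo-URL instead. The OPML is written to standard output.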
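The feed-selection step can be tried out without fetching anything. This is a minimal sketch, assuming the page's link tags have already been collected as `[type, href]` pairs; `pick_feed` is a hypothetical helper. It keeps the Atom-before-RSS precedence and resolves relative hrefs with the standard library's `URI.join`, which also copes with `../` paths.

```ruby
require 'uri'

# Feed MIME types in order of preference: Atom first, then RSS.
FEED_TYPES = %w[application/atom+xml application/rss+xml].freeze

# links: array of [type, href] pairs taken from a page's <link> tags.
# Returns the absolute URL of the preferred feed, or nil if none matches.
def pick_feed(page_url, links)
  FEED_TYPES.each do |type|
    _, href = links.find { |t, _| t == type }
    # URI.join handles absolute URLs, "/path", "file" and "../file" alike.
    return URI.join(page_url, href).to_s if href
  end
  nil
end

links = [['application/rss+xml', 'rss.xml'], ['application/atom+xml', '../atom.xml']]
puts pick_feed('http://blog.example.org/2008/post.html', links)
# prints http://blog.example.org/atom.xml
```

Resolving against the page URL rather than the site root matters for blogs hosted in a subdirectory, where a bare `feed/` href points below the blog's own path.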

Find the result here.
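Because the OPML is assembled by string interpolation, a name or URL containing `&`, `<` or a double quote produces invalid XML unless it is escaped by hand. One alternative, sketched here under the assumption that REXML (shipped with Ruby) is acceptable, is to build the document as a tree and let the library do the escaping. `build_opml` is a hypothetical helper that only writes the title and creation date.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'
require 'time'

# Hypothetical helper: builds an OPML 1.1 subscription list from
# [name, feed_url] pairs. REXML escapes special characters in
# attribute values itself, so no manual escaping is needed.
def build_opml(title, feeds)
  doc = REXML::Document.new
  opml = doc.add_element('opml', 'version' => '1.1')
  head = opml.add_element('head')
  head.add_element('title').text = title
  head.add_element('dateCreated').text = Time.now.rfc822
  body = opml.add_element('body')
  feeds.each do |name, url|
    body.add_element('outline', 'text' => name, 'xmlUrl' => url)
  end

  out = ''.dup
  formatter = REXML::Formatters::Pretty.new(2)
  formatter.compact = true # keep text-only elements such as <title> on one line
  formatter.write(doc, out)
  out
end

puts build_opml('Planet UGent', [['Bob & Co', 'http://bob.example.org/feed/?a=1&b=2']])
```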
