rubric - Ruby Aggregator and Portal Creator

About

rubric (see the etymology at the end of this page) is a program that creates an HTML portal page from local copies of RSS feeds downloaded by rss_fetcher (included).

Author: Josef 'Jupp' Schugt <jupp@rubyforge.org>
Homepage: http://rubric.rubyforge.org/
Download: http://rubyforge.org/projects/rubric/
License: GPL 2.0 or later

rubric and rss_fetcher are written in plain Ruby. They do not install any libraries or extensions. The only requirement is an installation of Ruby 1.8 or higher.

The development platform is GNU/Linux. I am willing to support other platforms, including Windows, but please note that I do not have access to systems running them.

rubric is Free Software hosted at RubyForge; you can redistribute and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

Screenshot / Example Output

Are you looking for a screenshot of a program that generates an HTML file? I guess you would rather see an example of the output generated by rubric.

Documentation

Config file

Most configuration is done in configuration files located in $HOME/.rubric/. A Ruby-centric sample configuration file is ruby.txt (it is $HOME/.rubric/ruby on my local system). More details about the configuration files are in the Files section of this document.

rubric

Here's the output of rubric --help. It should be all the help you need but please make sure you also read the Known Issues section.

rubric  [--help|-h] [--only <regular_expression>] [-o <regular_expression>]
        [<file_name> ...]

--help, -h
        show this help

--only, -o
        Only import feeds with URLs matching <regular_expression>

The configuration file <file_name> defaults to $HOME/.rubric/default
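The effect of --only can be pictured with a few lines of Ruby. This is only a sketch of the idea, not the code rubric actually runs: the helper name select_feeds is made up, and it is an assumption that the expression is matched unanchored and case-sensitively against the url of each feed entry.

```ruby
# Hypothetical helper: keep only the feeds whose URL matches the pattern.
# `feeds` mirrors the `feeds:` list of the configuration file.
def select_feeds(feeds, only = nil)
  return feeds if only.nil?          # no --only given: keep every feed
  pattern = Regexp.new(only)
  feeds.select { |feed| feed["url"] =~ pattern }
end

feeds = [
  { "url" => "http://rubyforge.org/export/rss_sfnews.php" },
  { "url" => "http://www.ruby-lang.org/en/index.rdf" },
  { "url" => "http://www.rubygarden.com/rdf/cached/rurl.rdf" }
]

# Comparable in spirit to: rubric --only 'rubyforge\.org'
selected = select_feeds(feeds, 'rubyforge\.org')
selected.each { |feed| puts feed["url"] }
```

With the three URLs above only the rubyforge.org feed is kept.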

rss_fetcher

Here's the output of rss_fetcher --help. It should be all the help you need.

rss_fetcher [--help|-h] [--only <regular_expression>] [-o <regular_expression>]
                       [<file_name> ...]

--help, -h
        show this help

--only, -o
        Only download feeds with URLs matching <regular_expression>

The configuration file <file_name> defaults to $HOME/.rubric/default
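For readers who want to script around the same command line, the interface described above could be parsed roughly as follows. This is a sketch using Ruby's standard optparse library; the function name parse_args is invented for the example, and nothing here claims to be how rss_fetcher itself reads its arguments.

```ruby
require "optparse"

# Hypothetical parser for the interface shown in the help text.
# Returns the options and the list of configuration files.
def parse_args(argv)
  options = { :only => nil }
  parser = OptionParser.new do |opts|
    opts.on("-o", "--only REGEXP",
            "Only download feeds with URLs matching REGEXP") do |value|
      options[:only] = Regexp.new(value)
    end
  end
  # parse leaves argv untouched and returns the non-option arguments.
  files = parser.parse(argv)
  if files.empty?
    # Documented default: $HOME/.rubric/default
    files = [File.join(ENV.fetch("HOME", "."), ".rubric", "default")]
  end
  [options, files]
end

options, files = parse_args(["--only", "rubyforge", "ruby"])
puts options[:only].source
puts files.inspect
```

OptionParser supplies --help and -h on its own, which is why the sketch does not define them.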

Known Issues

While usable, rubric is not yet mature. This means that there are bugs, shortcomings and missing features. When I become aware of them or am informed about them, they will be added here.

Download

rubric is available at the rubric project page.

Installation

Installation is quite simple:

  1. Download the archive and unpack it.
  2. Move ruby.txt to $HOME/.rubric/default and edit it according to your needs.
  3. Put rubric and rss_fetcher somewhere in your PATH.
  4. On Windows rename rubric to rubric.rb and rss_fetcher to rss_fetcher.rb

Usage Example

Once they are installed, using rss_fetcher and rubric is as simple as can be:

  1. Run rss_fetcher to download the feeds.
  2. Run rubric to generate a page from them.

Take a look at a sample page generated by 'rubric'.

The sample configuration file and the sample output are snapshots of the configuration I use and the output that has been generated using that configuration so the programs and the configuration are known to work at least on my system.

Please use a caching proxy

You may have noticed that rss_fetcher does not try to find out whether a feed has been modified since the last visit before downloading it. This article shows why that can be a problem (buzzword: slashdotting). I am fully aware of that problem but I will not implement such a feature. The reason is that I do not want to re-invent the wheel.

The name of the wheel is caching proxy. A caching proxy works as follows:

  1. Your web client (in this case rss_fetcher) asks your caching proxy (in my case WWWOFFLE) to provide the information to be found at some URL.
  2. Now two cases are possible:
    1. The URL is new to the proxy:
      1. The proxy connects to the site in question and asks for the information (this is the reason for the proxy part of the name).
      2. The proxy then simultaneously delivers the information to your client and stores a copy of the information (this is the reason for the caching part of the name).
    2. The URL has already been visited:
      1. The proxy connects to the site in question and asks whether the information has been updated since the last visit.
      2. If the information has not changed, it delivers the local copy; otherwise it proceeds as in the first case by downloading the information, delivering it and storing a copy.

Note that I have simplified things and ignored all special cases. In real life things are much more involved, and this is why it is a good idea to delegate the task to another program rather than implementing it in rss_fetcher.
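The two cases described above can be modelled in a few lines of Ruby. This is a toy model only: it involves no network access, the class and method names (ToyOrigin, ToyCachingProxy, publish, modified_at) are invented for the illustration, and modification times are plain integers. A real caching proxy such as WWWOFFLE has to deal with far more than this.

```ruby
# Stand-in for a remote site; counts how often a page is really downloaded.
class ToyOrigin
  attr_reader :downloads

  def initialize
    @pages = {}
    @downloads = 0
  end

  def publish(url, body, time)
    @pages[url] = { :body => body, :time => time }
  end

  # Cheap question: when was this URL last changed?
  def modified_at(url)
    @pages.fetch(url)[:time]
  end

  # Expensive operation: transfer the whole page.
  def fetch(url)
    @downloads += 1
    @pages.fetch(url)[:body]
  end
end

class ToyCachingProxy
  def initialize(origin)
    @origin = origin
    @cache = {}   # url => { :body => ..., :time => ... }
  end

  def get(url)
    entry = @cache[url]
    if entry.nil? || @origin.modified_at(url) > entry[:time]
      # New URL or outdated copy: download, store a copy, then deliver.
      entry = { :body => @origin.fetch(url), :time => @origin.modified_at(url) }
      @cache[url] = entry
    end
    entry[:body]   # otherwise the local copy is delivered
  end
end

origin = ToyOrigin.new
url = "http://example.org/feed.rss"
origin.publish(url, "<rss>v1</rss>", 1)

proxy = ToyCachingProxy.new(origin)
first  = proxy.get(url)   # URL is new to the proxy: downloaded
second = proxy.get(url)   # unchanged: served from the cache
origin.publish(url, "<rss>v2</rss>", 2)
third  = proxy.get(url)   # changed: downloaded again

puts origin.downloads     # prints 2
```

Three requests from the client result in only two downloads from the site, which is exactly the saving a caching proxy gives rss_fetcher for free.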

Files

A description of all files in the archive follows:

ChangeLog

Just a reference to the RSS feed.

GPL.txt

The GNU General Public License (which is the license for rubric)

index.html

This page.

news.html

HTML version of RSS feed generated using rubric.

README

Just a reference to this page.

rss_fetcher

A Ruby program for fetching RSS news feeds from remote servers.

rubric

The rubric RSS aggregator and portal creator itself.

rubric.rubyforge.org.rss

RSS feed that serves as ChangeLog.

ruby.txt

An example configuration file. The entries should be self-explanatory but you can follow the hyperlinks for explanation.

title:  Ruby-related News


proxy:
        host:   localhost
        port:   8080


hold_time:
        items:  36 hours
        hashes: 40 days


dir:
        rss:    $HOME/rubric/RSS
        db:     $HOME/rubric/ruby



portal_file:    $HOME/www/ruby.html


feeds:
        - url:  http://codedbliss.com/weblog/syndication.rss
        - url:  http://homepages.ihug.com.au/~naseby/rss.xml
        - url:  http://kapheine.hypa.net/rcrchive.xml
        - url:  http://raa.ruby-lang.org/raa-rdf10.xml
        - url:  http://rubyforge.org/export/rss_sfnewreleases.php
        - url:  http://rubyforge.org/export/rss_sfnews.php
        - url:  http://www.blogtari.com/index.rb/rss/
        - url:  http://www.pragprog.com/pragdave/index.rss
        - url:  http://www.raphinou.com/rubynews/backend/rubynews.rdf
        - url:  http://www.ruby-doc.org/index.rb/rss0.91.xml
        - url:  http://www.ruby-lang.org/en/index.rdf
        - url:  http://www.rubygarden.com/rdf/cached/rubygarden-wiki.rdf
        - url:  http://www.rubygarden.com/rdf/cached/rurl.rdf
        - url:  http://www.rubynet.org/index.rdf
        - url:  http://www.rubyxml.com/index.rb/rss
        - url:  http://www.whytheluckystiff.net/why.xml
        - { type: pre, url: http://rubyforge.org/export/rss_sfnews.php }
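Because the configuration file is YAML, it can be inspected with Ruby's standard yaml library. The following sketch loads a trimmed copy of the sample and prints a few values. It makes no claim about how rubric reads the file internally; the variable names are arbitrary, and the URL inside the curly braces is quoted here (unlike in the sample) because some YAML parsers reject an unquoted colon in that position.

```ruby
require "yaml"

# Trimmed copy of the sample configuration (spaces only, no tabs).
sample = <<CONFIG
title:  Ruby-related News

proxy:
        host:   localhost
        port:   8080

hold_time:
        items:  36 hours
        hashes: 40 days

feeds:
        - url:  http://www.ruby-lang.org/en/index.rdf
        - { type: pre, url: "http://rubyforge.org/export/rss_sfnews.php" }
CONFIG

config = YAML.load(sample)

puts config["title"]
puts "proxy: #{config["proxy"]["host"]}:#{config["proxy"]["port"]}"

config["feeds"].each do |feed|
  # Entries without a type are ordinary feeds.
  type = feed["type"] || "(default)"
  puts "#{feed["url"]}  type=#{type}"
end
```

The port is loaded as an integer, while hold times such as 36 hours arrive as plain strings that the reading program has to interpret itself.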

Details on configuration files

title
Title for generated page.
proxy
Proxy configuration. If absent (deprecated!) no proxy server is used.
host
Host name of proxy server.
port
Port proxy server listens on, usually 3128 or 8080.
hold_time
Hold time configuration, defines how long data is stored.
items
Hold time for message texts, defines how long a message text will show up on the generated page.
hashes
Hold time for message hashes. Hashes are used to identify old news because feeds often don't provide information about the date when an entry was added. Make sure this value is sufficiently large. Otherwise very old news may pop up again.
dir
Configuration of directories where data is stored.
rss
Directory where downloaded copies of feeds are stored.
db
Directory where the database is stored.
portal_file
Portal file. This is the file you want to read.
feeds
List of feeds to be downloaded by rss_fetcher. Usually you simply provide the feed's url. Some feeds contain pre-formatted text; for those you also need to provide the type. Not shown in the example above is cdata, which is specified in the same way as url and type: if it has the value broken, the feed is assumed to misuse the CDATA specifier (I am presently not subscribed to such a broken feed). Note that YAML requires curly braces and separation using commas if you provide more than just a url.
url
URL the feed is actually located at.
type
You can usually leave this unspecified. Only use type: pre if the feed uses pre-formatted text (rarely used).
cdata
CDATA makes it easy to embed HTML without quoting < and other characters that have special meanings in XML. Some feeds use CDATA but nevertheless escape those characters, so that a correct interpretation of CDATA results in rubric displaying the HTML source. Defining cdata: broke enables a bugfix for such feeds.
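Why the hold time for hashes must be large enough is easiest to see in a small sketch. Everything below is an assumption made for illustration: the helper hold_time_seconds, the class SeenItems, the choice of MD5 over title and text, and the handling of only the two units that appear in the sample configuration. rubric's real bookkeeping may look quite different.

```ruby
require "digest/md5"

# Convert a hold time such as "36 hours" or "40 days" into seconds.
def hold_time_seconds(text)
  units = { "hour" => 3600, "day" => 86_400 }
  unless text =~ /\A(\d+)\s+(hour|day)s?\z/
    raise ArgumentError, "unsupported hold time: #{text}"
  end
  $1.to_i * units[$2]
end

# Hypothetical store that remembers item hashes for a limited time.
class SeenItems
  def initialize(hold_seconds)
    @hold = hold_seconds
    @seen = {}   # hash of item => time it was first seen
  end

  # True if the item is not among the remembered hashes.
  def fresh?(title, text, now)
    expire(now)
    key = Digest::MD5.hexdigest("#{title}\n#{text}")
    return false if @seen.key?(key)
    @seen[key] = now
    true
  end

  private

  # Forget hashes that are older than the hold time.
  def expire(now)
    @seen.delete_if { |_key, first_seen| now - first_seen > @hold }
  end
end

hold = hold_time_seconds("40 days")
seen = SeenItems.new(hold)

a = seen.fresh?("Example headline", "Example text", 0)          # first sighting
b = seen.fresh?("Example headline", "Example text", 86_400)     # one day later
c = seen.fresh?("Example headline", "Example text", hold + 1)   # hash has expired

puts [a, b, c].inspect   # prints [true, false, true]
```

The third call shows the effect the hashes entry warns about: once a hash has been dropped, an item that is still present in the feed is treated as new again.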

News

To be informed about the latest changes to rubric use the RSS feed. Of course you can also read the HTML version generated by rubric.

Feedback

Feel free to contact Josef 'Jupp' SCHUGT <jupp@rubyforge.org> (note that I silently discard any message that exceeds 100 kB).

Etymology of 'rubric'

Middle French rubrique, literally, "red ocher", from Latin rubrica, from ruber "red".

Derived ultimately from Latin ruber, "red", rubric was originally used in Middle English to name red ocher, a red pigment. Yet in present-day English rubric is used to mean "an authoritative rule" or "an explanatory commentary". This semantic transformation is derived from the practice, originated centuries ago, of putting instructions or explanations in a manuscript or printed book in red ink to contrast with the black ink of the text. [Source: Webster's New Encyclopedic Dictionary]

Last changed: 2004-02-16