Introduction

This is the document of rss2sql

Specification

There are three class and a configuration file.

Classes

ToolKit

This class define some useful static method to handle parsed feed

RSS

Just a representation of a single RSS feed item.

SQL

Creating table according to configuration file, call its fetch method to store data after that table created scuessfully.

Configuration File

Thanks to the diversity of RSS feed, it is necessary to configure settings manually. Here is some examples:

Minimal configure

Store nothing but id.

rss:
  url: "http://songshuhui.net/feed"
sql:
  tablename: "songshuhui"
  field:
    - name: id
      val: "x.get('id')"
      type: VARCHAR
      type_parameter: 64
      nullable: false
      primary_key: true
      autoincrement: false

Using built-in reference table type instead of ENUM type

Using REFTABLE instead of ENUM, then it will define a reference table with field id as primary key and the field you named.

For example:

rss:
  url: https://nyaa.si/?page=rss
sql:
  tablename: nyaa
  field:
    - name: id
      val: "x.get('id')"
      type: VARCHAR
      type_parameter: 256
      nullable: false
      primary_key: true
      autoincrement: false
    - name: cate
      val: "x.get('nyaa_category')"
      type: REFTABLE
      type_parameter:
        - VARCHAR
        - 20

It will create a reference table with field id and cate, and in table nyaa the cate field is a INT type.

Common configure

rss:
  url: "https://share.dmhy.org/topics/rss/rss.xml"
  proxies:
    https: "https://127.0.0.1:1080"
sql:
  tablename: "dmhy"
  field:
    - name: id
      val: "x.get('id')"
      type: VARCHAR
      type_parameter: 256
      nullable: false
      primary_key: true
      autoincrement: false
    - name: title
      val: "x.get('title')"
      type: TEXT
    - name: link
      val: "x.get('link')"
      type: TEXT
    - name: pubtime
      val: "ToolKit.struct_time_To_datetime(x.get('published_parsed'))"
      type: TIMESTAMP
      index: true
    - name: summary
      val: "x.get('summary')"
      type: TEXT

Just remember the x in val denotes the dict instance of an item which parsed by feedparser library

Usage

Within code

from rss2sql import SQL
SQL('/path/to/configuration','uri://of:your@own/database').fetch()

Within commandline

python rss2sql.py -c /path/to/configuration -d uri://of:your@own/database --hide_banner

Discover mode

Configuration file is needed, omit the field section, and run

python rss2sql.py -c /path/to/configuration  --discover

the configuration file should look like

rss:
  url: http://songshuhui.net/feed
sql:
  tablename: nyaa