This repository has been archived by the owner on Nov 20, 2017. It is now read-only.

Program Requirements

kaspermarkus edited this page Nov 19, 2011 · 5 revisions

"Boxes"

Input

  • Relational databases (MySQL, PostgreSQL, mSQL, etc.)
  • Spreadsheets (Excel, OO Calc, etc.)
  • .csv
  • Structured files: XML, JSON, etc.
  • Other? (doc oriented DBs, outlook, log, console, word, pdf, ...)

Output

  • Relational databases (MySQL, PostgreSQL, mSQL, etc.)
  • Spreadsheets (Excel, OO Calc, etc.)
  • .csv
  • Structured files: XML, JSON, etc.

Data-manipulation

  • Join tables
  • Split tables
  • Only select some columns
  • Search/Replace
  • Generate text
  • Rename column
  • Multiple columns -> Single column
  • One column -> Multiple columns (e.g. tokenize)
  • One row (split field) -> Multiple rows (e.g. tokenize)
  • Multiple rows (merge fields) -> Single row (join with separator)
  • Text manipulation: add prefix, add postfix, all uppercase, all lowercase, substring, Google Translate?, etc.
  • Filter (if a cell has value X, send the row to one output pipe, otherwise to another pipe)
  • Convert datatypes
  • Skip row if field has PATTERN value
  • Sort?
  • Math (simple +/-/*/%/mod/etc)
  • Count (with GROUP BY)
  • Append rows (appends one list to the other)
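A few of the boxes listed above can be sketched as plain functions over rows. This is only an illustration, assuming a row is a Python dict and a pipe is a list of rows; the names `split_field_to_rows`, `filter_rows` and `count_by` are hypothetical and not part of any existing code.

```python
from collections import Counter

def split_field_to_rows(rows, field, sep=","):
    """One row (split field) -> multiple rows: emit one row per token."""
    out = []
    for row in rows:
        for token in row[field].split(sep):
            out.append({**row, field: token.strip()})
    return out

def filter_rows(rows, field, value):
    """Filter box: matching rows go to the first pipe, the rest to the second."""
    matched, unmatched = [], []
    for row in rows:
        (matched if row.get(field) == value else unmatched).append(row)
    return matched, unmatched

def count_by(rows, field):
    """Count with GROUP BY: number of rows per distinct value of `field`."""
    return dict(Counter(row[field] for row in rows))

# Hypothetical sample data, for illustration only.
rows = [
    {"name": "ann", "tags": "a, b"},
    {"name": "bob", "tags": "a"},
]
expanded = split_field_to_rows(rows, "tags")
matched, unmatched = filter_rows(expanded, "tags", "a")
counts = count_by(expanded, "name")
```

Returning two lists from the filter is one way to model the two output pipes; a real implementation might stream rows instead of holding them in memory.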

Needs to:

  • Support data types (since both input and output can be data-type specific)
  • Be able to group a pipeline section (a group of connected boxes) together and use it as a new unit
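The grouping requirement could be met by treating a box as a function from rows to rows, so that a group of connected boxes is itself a box. The sketch below assumes that model; `group`, `rename_column` and `convert_type` are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Box = Callable[[List[Row]], List[Row]]

def group(*boxes: Box) -> Box:
    """Combine connected boxes into one new box that runs them in order."""
    def grouped(rows: List[Row]) -> List[Row]:
        for box in boxes:
            rows = box(rows)
        return rows
    return grouped

def rename_column(old: str, new: str) -> Box:
    """Box that renames column `old` to `new` in every row."""
    def box(rows: List[Row]) -> List[Row]:
        return [{(new if k == old else k): v for k, v in row.items()} for row in rows]
    return box

def convert_type(field: str, to_type: Callable[[object], object]) -> Box:
    """Box that converts the value of `field` with `to_type` (e.g. int)."""
    def box(rows: List[Row]) -> List[Row]:
        return [{**row, field: to_type(row[field])} for row in rows]
    return box

# A grouped section behaves like any other box and can be nested in further groups.
clean_age = group(rename_column("Age ", "age"), convert_type("age", int))
result = clean_age([{"name": "ann", "Age ": "42"}])
```

Because the grouped unit has the same shape as a single box, it can be reused or placed inside another group without special handling.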