
On Sun, Sep 18, 2016 at 10:54:06AM +1000, h wrote:
There are two areas I have been unable to find information on:
* Writing sql / script files
An sql script file is just a bunch of sql commands in a sequence. There are numerous ways to run such a script, including piping or redirecting it into /usr/bin/psql, using `\i scriptname.sql` from within an existing psql session, writing a simple perl DBI wrapper (or similar in python or whatever) to connect to the DB and start issuing SQL commands, etc. The latter is most useful if you want conditional execution of commands depending on the results of previous commands, but don't want to write in a db-specific language like pl/sql (note: postgresql, at least, has options to embed the perl, python, lua, sh, and/or tcl languages into the postgres server itself so you can write stored procedures in those languages too). It's usually best to run sql scripts wrapped in a transaction, so that either all of the commands in the script succeed or they all get rolled back as if they'd never happened. What else do you need to know?
* Structuring tables either within a database, or on a server
That's a much more complex topic. There's no set answer, it depends entirely on your data and how you intend to use it. Wikipedia has a set of articles on this topic, which serve as a great introduction to the concepts. I'd start with: https://en.wikipedia.org/wiki/Database_normalization Read that, and then follow your interests with the links at the bottom. It won't tell you everything you need to know, but a few hours reading will at least teach you enough to know what to search for. Stack Exchange (SE) also has a site dedicated to db-related questions and answers, at http://dba.stackexchange.com/ http://dba.stackexchange.com/help/on-topic says: dba.se is for those needing expert answers to advanced database-related questions concerning traditional SQL RDBMS and NoSQL alternatives. If you have a question about... * Database Administration including configuration and backup / restore * Advanced Querying including window-functions, dynamic-sql, and query-performance * Data Modelling and database-design, including referential-integrity * Advanced Programming in built-in server-side languages including stored-procedures and triggers. * Data Warehousing and Business Intelligence including etl, reporting, and olap ...then you're in the right place to ask your question! Even if you don't post a question yourself, there are lots of good questions and answers in an easy to find format (unlike a forum site, you won't have to wade through page after page of inconsequential chat, bickering, ill-informed nonsense, obsolete information etc just to find the few hidden gems of useful information)
The particular problems I have are:
* I regularly update my tables from multiple csv files, all residing in the same folder. Currently I have a script with a hardwired path for each csv file. I would like to have a single 'variable' I could change to define the path to all the csvs.
No matter which language the script is written in, the best solution for this is to use getopts to process command line option. Even in sh or bash, it's a lot easier than you might think to get good option and argument processing, just like "real programs" :) e.g. your script could have a '-p pathname' or '--path pathname' command-line option. Alternatively (or in addition), it could get the path from an environment variable - e.g. if your script is called myscript, then set and export 'MYSCRIPT_CSV_PATH' in your environment any time before running it. You could then have: - a hard-coded default - which can be overridden by the environment variable - which can be overridden by -p or --path on the command-line. For Bourne-like shells (ash, dash, bash, ksh, etc. even zsh), you have the choice of either: - built in getopts (can only do short single-character options) - getopt from util-linux (can do both short and --long options) I wrote an example back in June, showing/comparing how to use both at: http://unix.stackexchange.com/a/287344/7696 NOTE: if you use getopt, use ONLY the version from util-linux. Most (all?) other versions have serious flaws and are dangerous to use. For perl, use Getopt::Std for short options, or Getopt::Long for both short and long. There's also Getopt::ArgParse, which implements something a lot like python's argparse in perl. argparse is probably overkill for your needs but it's worth knowing about because it's an easy way to implement sub-commands (e.g. like git, which has numerous subcommands, like 'git add', 'git commit', 'git log', and many more, each with their own set of options and args) For python, there's getopt which provides short and long options. if you need something fancier, use argparse or maybe gflags. There may be the odd exception, but every other language available on linux will have some kind of standard option parsing library. Many will also implement something like argparse.
* I have a database which I have decided should contain three types of table - core definitions / lookups, - raw data and - derived data for specific tasks. I would like to create separate areas in the database / (server?) for each table type, or learn of other options for how the tables could be structured.
Start with the database normalisation wikipedia article I mentioned above. BTW, defining a view is often a good substitution for filling your database with derived data. depends on whether you need to do a lot of heavy processing on the existing data (derived copy is best) or just a complicated select with multiple joins, constraints etc (a view is good for this)
I am NOT looking for specific answers to these problems, but rather some teaching / groundwork information about relational databases / sql / postgis concepts.
Packt, O'Reilly, etc have numerous books on Data Science, Data Analysis, Data Modelling, GIS, postgres etc. These, btw, are good keywords to google for. Wikipedia is great for general overview type material (on science / tech / fact-based topics, at least - too many edit wars on political or controversial topics) dba.stackexchange.com is good for specific answers to specific questions, and many of the people writing answers are serious experts in the field who put in significant effort to write good answers with both background info/theory as well as a practical howto. I'm a big fan of SE sites, the Q&A format with up/down-voting and reputation scores really encourages this, and discourages people from being jerks. Knowledgeable and helpful people end up with huge scores from answering question and you can partially use the rep score as a proxy for "does this person know what they're talking about" when reading an answer. High-score says "almost certainly". Low score says "unknown, not enough data to judge" craig -- craig sanders <cas@taz.net.au>