Re: Can anyone make suggestions for training / learning resources wrt databases.

18 Sep 2016

      On Sun, Sep 18, 2016 at 10:54:06AM +1000, h wrote:
...
There are two areas I have been unable to find information on:
*  Writing sql / script files
An sql script file is just a bunch of sql commands in a sequence.

There are numerous ways to run such a script, including piping or
redirecting it into /usr/bin/psql, using `\i scriptname.sql` from within
an existing psql session, writing a simple perl DBI wrapper (or similar
in python or whatever) to connect to the DB and start issuing SQL
commands, etc.

The latter is most useful if you want conditional execution of commands
depending on the results of previous commands, but don't want to write
in a db-specific language like pl/sql (note: postgresql, at least, has
options to embed the perl, python, lua, sh, and/or tcl languages into
the postgres server itself so you can write stored procedures in those
languages too).

It's usually best to run sql scripts wrapped in a transaction, so that
either all of the commands in the script succeed or they all get rolled
back as if they'd never happened.

What else do you need to know?
...
*  Structuring tables either within a database, or on a server
That's a much more complex topic.  There's no set answer, it depends
entirely on your data and how you intend to use it.

Wikipedia has a set of articles on this topic, which serve as a great
introduction to the concepts.  I'd start with:

https://en.wikipedia.org/wiki/Database_normalization

Read that, and then follow your interests with the links at the bottom.
It won't tell you everything you need to know, but a few hours reading
will at least teach you enough to know what to search for.

Stack Exchange (SE) also has a site dedicated to db-related questions
and answers, at http://dba.stackexchange.com/

http://dba.stackexchange.com/help/on-topic says:

	dba.se is for those needing expert answers to advanced database-related
	questions concerning traditional SQL RDBMS and NoSQL alternatives.

	If you have a question about...

	* Database Administration including configuration and backup / restore
	* Advanced Querying including window-functions, dynamic-sql, and 
      query-performance
	* Data Modelling and database-design, including referential-integrity
	* Advanced Programming in built-in server-side languages including
      stored-procedures and triggers.
	* Data Warehousing and Business Intelligence including etl, reporting,
      and olap

	...then you're in the right place to ask your question!

Even if you don't post a question yourself, there are lots of good
questions and answers in an easy to find format (unlike a forum site,
you won't have to wade through page after page of inconsequential chat,
bickering, ill-informed nonsense, obsolete information etc just to find
the few hidden gems of useful information)
...
The particular problems I have are:
* I regularly update my tables from multiple csv files, all residing
  in the same folder. Currently I have a script with a hardwired path
  for each csv file. I would like to have a single 'variable' I could
  change to define the path to all the csvs.
No matter which language the script is written in, the best solution
for this is to use getopts to process command line option.  Even in sh
or bash, it's a lot easier than you might think to get good option and
argument processing, just like "real programs" :)

e.g. your script could have a '-p pathname' or '--path pathname'
command-line option.

Alternatively (or in addition), it could get the path from an
environment variable - e.g. if your script is called myscript, then
set and export 'MYSCRIPT_CSV_PATH' in your environment any time before
running it.  You could then have:

 - a hard-coded default
 - which can be overridden by the environment variable
 - which can be overridden by -p or --path on the command-line.

For Bourne-like shells (ash, dash, bash, ksh, etc. even zsh), you have
the choice of either:

 - built in getopts (can only do short single-character options)
 - getopt from util-linux (can do both short and --long options)

I wrote an example back in June, showing/comparing how to use both at:

http://unix.stackexchange.com/a/287344/7696

NOTE: if you use getopt, use ONLY the version from util-linux.  Most
(all?) other versions have serious flaws and are dangerous to use.

For perl, use Getopt::Std for short options, or Getopt::Long for both
short and long.  There's also Getopt::ArgParse, which implements
something a lot like python's argparse in perl.

argparse is probably overkill for your needs but it's worth knowing
about because it's an easy way to implement sub-commands (e.g. like git,
which has numerous subcommands, like 'git add', 'git commit', 'git log',
and many more, each with their own set of options and args)

For python, there's getopt which provides short and long options. if you
need something fancier, use argparse or maybe gflags.

There may be the odd exception, but every other language available on
linux will have some kind of standard option parsing library.  Many will
also implement something like argparse.
...
* I have a database which I have decided should contain three types of
  table - core definitions / lookups, - raw data and - derived data
  for specific tasks.  I would like to create separate areas in the
  database / (server?) for each table type, or learn of other options
  for how the tables could be structured.
Start with the database normalisation wikipedia article I mentioned
above.

BTW, defining a view is often a good substitution for filling your
database with derived data. depends on whether you need to do a lot of
heavy processing on the existing data (derived copy is best) or just a
complicated select with multiple joins, constraints etc (a view is good
for this)
...
I am NOT looking for specific answers to these problems, but rather
some teaching / groundwork information about relational databases /
sql / postgis concepts.
Packt, O'Reilly, etc have numerous books on Data Science, Data Analysis,
Data Modelling, GIS, postgres etc.  These, btw, are good keywords to
google for.

Wikipedia is great for general overview type material (on science /
tech / fact-based topics, at least - too many edit wars on political or
controversial topics)

dba.stackexchange.com is good for specific answers to specific
questions, and many of the people writing answers are serious experts in
the field who put in significant effort to write good answers with both
background info/theory as well as a practical howto.

I'm a big fan of SE sites, the Q&A format with up/down-voting and
reputation scores really encourages this, and discourages people from
being jerks. Knowledgeable and helpful people end up with huge scores
from answering question and you can partially use the rep score as
a proxy for "does this person know what they're talking about" when
reading an answer. High-score says "almost certainly". Low score says
"unknown, not enough data to judge"

craig

--
craig sanders <cas@taz.net.au>

Re: Can anyone make suggestions for training / learning resources wrt databases.

Craig Sanders