I feel that spatial databases should be treated no differently to traditional databases. They are essentially doing the same thing, storing large amounts of data for fast retrieval. As an example, in PostgreSQL / PostGIS, the geometry is just another datatype. Just like text, or integer. Same in SQL Server 2008. Same in Oracle. If the "spatial" part is just another field type in the database, then is it really that different from the original database? Does this mean we should throw out all the rules of traditional database design?
Obviously normalization can be taken too far, just as with traditional databases, so it's a trade off to find the best design that suits your needs.
If you are planning on created a highly de-normalized structure with tables of 100 columns, then you have to ask yourself what is likely to change in the future? With a vast increase in rows, is this also going to affect querying performance? Is this going to affect maintainability in the future?
What's wrong with created a normalized structure and using views to expose all the data to the database client, be it GIS, or any other client?
All of these questions apply to both traditional databases and spatial databases. If you go through http://en.wikipedia.org/wiki/Database_normalization you will find that it applies to spatial databases as well.
If the software you are using on top of the database is forcing you to use highly de-normalized structures, then this is a different argument. You are constrained by the software and not the database, so you have no choices in the best database design.
So I think, the short answer is (in my opinion) database design is just as important with spatial databases as with traditional databases.
"Point data that doesn't always have spatial data" -- actually, if it's point data it is spatial data.
If I understand you correctly, you've got to a static survey point and record one or more species observed and a species can be observed at more than one static survey point. This is a many-to-many relationship. You need three tables for it.
And the transect observations would be the same? If the only difference is one is persistent and another is a one-time event they can be stored in the same table. You can use a field to flag a survey location as static or not. You might even infer it from time of observation (transects only have one, statics have multiple).
Here's one suggestion...
Think of it as Locations, Observations and Species. Locations have geometry (points), Species do not (unless you're talking about plants that don't move?). You observe species at locations, static or not. So, Species are related to Locations by Observations.
Store the data in these tables (not actual SQL):
- Survey Locations (pkey id, geometry, static or transect?, other attributes...)
- Species (pkey id, other attributes...)
- Observations (fkey to survey locations, fkey to species, datetime, other attributes...)
In answer to your numbered questions:
- No, no, no. Do not duplicate data in an RDBMS. You'll have a maintenance nightmare.
- NULLs are legal in a geometry column. NULL means unknown but you won't have observations with unknown locations, will you? And if you haven't observed a species, there's no observations in the Observations table.
- More maintenance headaches, work on modeling and relating your data better.
Best Answer
Aside from Spatialite, you might also want to consider PostGIS. Think of it as Spatialite's big brother ;-) It's just another data source for QGIS while you can connect to it using the RODBC package in R.