[GIS] Three table postgres sql join query

postgispostgresqlsql

I'm having trouble getting my SQL query to work with three tables. I'm new to SQL to I've already spent considerable time trying to understand the documentation and making some sense of it.
I'm using Postgres 8.3 with POSTGIS.
I have three tables:

  1. A table holding polygon geometry called subshed2
  2. A table holding point geometry called withdrawalsites with a primary key ic_site_id
  3. A table holding values I want to summarize on a monthly basis, both within a single specific year as well as the average over all years of each month. Table name is withdrawals.
    This table has a record with the same ic_site_id as found in the second table above, but there are many records for each ic_site_id. This table has my dates in timestamp form. Here's a few records:
    withdrawal table example

I need my final output to look like this:
output table

The shed_id parameter has already been supplied from another function, so it's being supplied to this query as ${param.shed_id}.
I'll be letting my user select a specific year to summarize to get the first table (and likely suppling it using another ${param.requested_year} ), but then I also want to average the values by month over all the years in the dataset. I might have to run a separate slightly altered query to get the second table, but I was hoping I could do it in one query.

My query looks like this:

SELECT
    to_char(withdrawals.begintime, 'YYYY') AS year,
    to_char(withdrawals.begintime, 'MONTH') AS month,
    sum(withdrawals.withdrawalvalue_mgd) AS mgdsum,
    withdrawals.begintime AS date
FROM
    withdrawalsites
    INNER JOIN subshed2
    on (st_contains(subshed2.the_geom, withdrawalsites.the_geom))
    INNER JOIN withdrawals
    on withdrawals.ic_site_id = withdrawalsites.ic_site_id
where
    subshed2.id = '${param.shed_id}',
    withdrawals.intervalcodes_id = '2'
GROUP BY to_char(withdrawals.begintime, 'YYYY MONTH')
ORDER BY YYYY MONTH

When I run the query, it gives a syntax error on a comma, but it doesn't tell me which one, but I suppose there's a lot more wrong with this query. Any ideas on how to query these three tables correctly, or at least point me in a direction to investigate? It's also unclear to me if I should instead be using the date_part function.

Final Solution:
I solved this problem using some of the thoughts Mike provided. I couldn't get the first function to work, but realized I was getting years with decimals (i.e. 2008.0) and was getting the month number, not the month name. I then wanted to sort those month names correctly. Here's what my worked in the end:

SELECT
    (EXTRACT(YEAR FROM withdrawals.begintime))::int AS year,
    to_char(date_trunc('month', withdrawals.begintime), 'MON') AS month,
    sum(withdrawals.withdrawalvalue_mgd) AS mgdsum
FROM
    medford.withdrawalsites
    JOIN medford.subshed2 ON
    ST_Contains(medford.subshed2.the_geom,medford.withdrawalsites.the_geom)
    JOIN medford.withdrawals ON medford.withdrawals.ic_site_id =
        medford.withdrawalsites.ic_site_id
where
    subshed2.id = '${param.shed_id}'
AND
    withdrawals.intervalcodes_id = '2'
GROUP BY (EXTRACT(YEAR FROM withdrawals.begintime))::int, date_trunc('month', 
    withdrawals.begintime)
ORDER BY (EXTRACT(YEAR FROM withdrawals.begintime))::int, date_trunc('month', 
    withdrawals.begintime)

Best Answer

The aggregate functions could be improved. I'm a fan of the first/last aggregate functions, which are used here:

SELECT
    first(date_part('year', withdrawals.begintime)) AS year,
    first(to_char(withdrawals.begintime, 'FMMONTH')) AS month,
    sum(withdrawals.withdrawalvalue_mgd) AS mgdsum
FROM
    withdrawalsites
    JOIN subshed2 ON ST_Contains(subshed2.the_geom, withdrawalsites.the_geom)
    JOIN withdrawals ON withdrawals.ic_site_id = withdrawalsites.ic_site_id
where
    subshed2.id = '${param.shed_id}'
    AND withdrawals.intervalcodes_id = '2'
GROUP BY date_part('year', withdrawals.begintime), date_part('month', withdrawals.begintime)
ORDER BY date_part('year', withdrawals.begintime), date_part('month', withdrawals.begintime)

Note there are several other changes, such as proper ordering of months, using their month numbers via date_part. Also I generally prefix the format of to_char with FM to avoid fill spaces.

Related Question