I have seen this technique used before. It was explained to me by Zain Memon (from Trulia), who gave some input while Michal Migurski was creating TileStache. Zain walked through it while explaining his Trulia demo, which uses this technique, at one of our SF GeoMeetup meetings a while back. In fact, if you are in SF next week, he will touch on this, so feel free to show up (this is my shameless attempt at a plug :)
OK, now to the explanation.
First, you are looking slightly in the wrong place when looking at the json files above.
Let me explain (as short as I can), why.
The tiles are being served just as regular rendered tiles - no big deal there; we know how to do that, so I don't need to explain it.
If you inspect it in Firebug, you will see that you also get a whole bunch of images that seem to be blank, like this one.
Why is it blank? It is not. The pixels contain data - just not traditional visible image data. They are using a very clever technique to pass data encoded in the pixels themselves.
What has been going on in the past decade is that people have been trading storage efficiency for the readability and portability of their data formats.
Take this example of xml sample data:
<data>
<feature>
<point>
<x> -32.1231 </x>
<y> 10.31243 </y>
</point>
<type>
sold
</type>
</feature>
<feature>
<point>
<x> -33.1231 </x>
<y> 11.31243 </y>
</point>
<type>
available
</type>
</feature>
</data>
OK, how many bytes to transfer this? Assuming UTF-8 (1 byte per character for this content), we have around 176 characters (not counting tabs or spaces), which makes this roughly 176 bytes (and that is being optimistic, for various reasons I will omit for the sake of simplicity). Mind you, this is for just 2 points!
Still, some smartass somewhere, who doesn't understand what he is talking about, will claim that "JSON gives you higher compression".
Fine, let's put the same xml nonsense as json:
{ "data": [
{ "x": -32.1231, "y": 10.31243, "type": "sold" },
{ "x": -33.1231, "y": 11.31243, "type": "avail" }
] }
How many bytes here? Say ~115 characters. I even cheated a bit and made it smaller.
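If you want to check the arithmetic yourself, here is a quick sketch (whitespace stripped, so the counts come out slightly different from the hand counts above):

```javascript
// Measure the UTF-8 size of the XML and JSON payloads above.
// TextEncoder is available in modern browsers and in Node.js.
const xml = '<data><feature><point><x>-32.1231</x><y>10.31243</y></point>' +
  '<type>sold</type></feature><feature><point><x>-33.1231</x>' +
  '<y>11.31243</y></point><type>available</type></feature></data>';

const json = JSON.stringify({
  data: [
    { x: -32.1231, y: 10.31243, type: 'sold' },
    { x: -33.1231, y: 11.31243, type: 'avail' }
  ]
});

const enc = new TextEncoder();
console.log(enc.encode(xml).length);  // 180 bytes for two features
console.log(enc.encode(json).length); // 95 bytes for the same two features
```

Either way, the point stands: both formats spend most of their bytes on markup, not on data.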
Say that my area covers 256x256 pixels, that I am at a zoom level high enough that each feature renders as one pixel, and that I have so many features that the tile is full. How much data do I need to show those 65,536 features?
54 characters (or UTF-8 bytes - and I am even ignoring some other things) per "feature" entry, multiplied by 65,536 = 3,538,944 bytes, or about 3.4MB.
I think you get the picture.
But this is how we transport data in a service oriented architecture. Readable bloated crap.
What if I wanted to transport everything in a binary scheme that I invented myself? Say that, instead, I encoded that information in a single-band image (i.e. grayscale). And I decided that 0 means sold, 1 means available, and 2 means I don't know. Heck, in 1 byte I have 256 options I can use - and I am only using two or three of them for this example.
What is the storage cost of that? 256 x 256 x 1 (one band only) = 65,536 bytes, or about 0.06MB. And this doesn't even take into consideration the compression I get for free from several decades of research in image compression.
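To make that concrete, here is a minimal sketch of the single-band idea (the 0/1/2 status codes are the made-up ones from above):

```javascript
// One status byte per pixel for a 256x256 tile.
// 0 = sold, 1 = available, 2 = unknown (arbitrary codes for this example).
const SIZE = 256;
const band = new Uint8Array(SIZE * SIZE); // one band, one byte per pixel

// Mark the pixel at (x=10, y=20) as "available".
band[20 * SIZE + 10] = 1;

console.log(band.byteLength); // 65536 bytes, before any image compression
```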
At this point, you should be asking yourself why people do not simply send data encoded in binary instead of serializing it to JSON. Well, it turns out JavaScript has historically sucked big time at transporting binary data, so that is why people have not done this.
An awesome workaround appeared when the new features of HTML5 came out, particularly canvas. So what is this awesome workaround? It turns out you can send data over the wire encoded in what appears to be an image, then shove that image into an HTML5 canvas, which lets you manipulate the pixels directly! Now you have a way to grab that data, decode it on the client side, and generate the JSON objects in the client.
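Here is a minimal sketch of that decoding step. The tile URL, the status codes, and the "255 means empty" convention are all made up for illustration; the pure `decodeTile` function is separated out so the canvas part is just plumbing:

```javascript
// Turn raw RGBA tile bytes (4 bytes per pixel, row-major) into plain
// client-side feature objects.
function decodeTile(rgba, width, height) {
  const features = [];
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const status = rgba[(y * width + x) * 4]; // red band carries the code
      if (status !== 255) { // pretend 255 means "no feature at this pixel"
        features.push({ x: x, y: y, status: status });
      }
    }
  }
  return features;
}

// In the browser, the buffer comes out of an HTML5 canvas:
//   const img = new Image();
//   img.crossOrigin = 'anonymous'; // server must allow CORS pixel reads
//   img.onload = function () {
//     const canvas = document.createElement('canvas');
//     canvas.width = img.width;
//     canvas.height = img.height;
//     const ctx = canvas.getContext('2d');
//     ctx.drawImage(img, 0, 0);
//     const data = ctx.getImageData(0, 0, img.width, img.height).data;
//     const features = decodeTile(data, img.width, img.height);
//   };
//   img.src = '/tiles/10/163/395.png'; // hypothetical tile URL
```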
Stop a moment and think about this.
You have a way of encoding a huge amount of meaningful geo-referenced data in a highly compressed format, orders of magnitude smaller than anything else done traditionally in web applications, and manipulate them in javascript.
The HTML canvas doesn't even need to be used to draw, it is only used as a binary decoding mechanism!
That is what all those images that you see in Firebug are about. One image, with the data encoded for every single tile that gets downloaded. They are super small, but they have meaningful data.
So how do you encode these on the server side? Well, you do need to generalize the data on the server side and create a meaningful tile, with the data encoded, for every zoom level. Currently you have to roll your own - an out-of-the-box open source solution doesn't exist - but all the tools you need are available. PostGIS will do the generalization through GEOS, and TileCache can be used to cache the tiles and help you trigger their generation. On the client side, you will need to use HTML5 canvas to read the special "fake tiles", and then you can use OpenLayers to create real client-side JavaScript objects that represent the vectors, with mouse-over effects.
If you need to encode more data, remember that you can always generate RGBA images per pixel (which gives you 4 bytes per pixel or 4,294,967,296 numbers you can represent per pixel). I can think of several ways to use that :)
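A sketch of that packing idea (the bit layout here is just one obvious choice, not a standard):

```javascript
// Pack a 32-bit value into the four bands of one RGBA pixel, and back.
function packPixel(value) {
  return [
    (value >>> 24) & 0xff, // R: highest byte
    (value >>> 16) & 0xff, // G
    (value >>> 8) & 0xff,  // B
    value & 0xff           // A: lowest byte
  ];
}

function unpackPixel(r, g, b, a) {
  // ">>> 0" forces the result back to an unsigned 32-bit integer.
  return ((r << 24) | (g << 16) | (b << 8) | a) >>> 0;
}

console.log(unpackPixel(...packPixel(4000000000))); // 4000000000
```

One practical caveat: when an image with partial transparency is drawn into a canvas, browsers premultiply the color channels by alpha, which can corrupt the low-order bytes on read-back. A common workaround is to keep alpha at 255 and pack data into the 3 RGB bytes only (still 16,777,216 values per pixel).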
Update: Answering the QGIS question below.
QGIS, like most other desktop GISes, does not have a fixed set of zoom levels. It has the flexibility of zooming to any scale and just rendering. Can it show data from WMS or tile-based sources? Sure it can, but most of the time such clients are really dumb about it: zoom to a different extent, calculate the bounding box, calculate the required tiles, grab them, show them. Most of the time they ignore other things, like HTTP cache headers, which would save them from refetching. Sometimes they implement a simple cache mechanism (store the tile; when it is requested again, check the cache instead of asking for it). But this is not enough.
With this technique, the tiles and the vectors need to be refetched at every zoom level. Why? Because the vectors have been generalized to accommodate each zoom level.
As for the whole trick of putting the tiles onto an HTML5 canvas so you can access the buffers - that part is not necessary here. QGIS lets you write code in Python and C++, and both languages have excellent support for handling binary buffers, so this workaround is irrelevant on that platform.
**UPDATE 2**:
There was a question about how to create the generalized vector tiles in the first place (baby step 1, before being able to serialize the results into images). Perhaps I did not clarify enough. TileStache will allow you to create effective "vector tiles" of your data at every zoom level (it even has an option that lets you either clip or not clip the data where it crosses a tile boundary). This takes care of separating the vectors into tiles at various zoom levels. I would choose the "not clip" option (it will pick the tile where the feature covers the most area). Then you can feed every vector through the GEOS generalize operation with a big tolerance - in fact, you want it big enough that polylines and polygons collapse onto themselves, because when they do, you can drop them from that zoom level entirely, since at that stage they are irrelevant. TileStache even allows you to write easy pythonic data providers where you can put this logic. At that stage, you can choose to serve the tiles as JSON files (like they do with some of the African map samples) or as geometries serialized into PNGs, like they do in the other samples (or the Trulia one) I gave above.
The issue here is TopoJSON's default quantization behavior; you need to increase the quantization precision if you want to preserve detail when zoomed in. Try -q 1e5 to increase the quantization factor from the default by a factor of 10, or -q 1e6 for a factor of 100.
The appropriate value of Q depends on the maximum effective size of your map and is limited by the original precision of the geometry. Your map appears to have 11 zoom levels by powers of 2, so starting from the smallest size of 560×410 at zoom level 0, the effective size of the map at zoom level 10 is 573,440×419,840. If you want to maintain this precision, you will need Q = 1e6 (and a significantly larger TopoJSON file).
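The arithmetic behind those numbers, spelled out (the 560×410 base size is taken from the map in question):

```javascript
// Effective map size at the deepest zoom level, with powers-of-two tiling.
const width0 = 560, height0 = 410; // pixel size at zoom level 0
const maxZoom = 10;                // zoom levels 0 through 10
const scale = Math.pow(2, maxZoom); // each level doubles both dimensions

console.log(width0 * scale, height0 * scale); // 573440 419840
// Both dimensions are under 1e6, so Q = 1e6 gives at least one
// distinguishable quantization step per on-screen pixel.
```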
More details: the quantization factor Q determines the maximum number of differentiable points and defaults to 10,000. For best efficiency, Q should be a power of 10, because digits are base-10 encoded in JSON. The default value is appropriate for displaying a map on a computer screen, which typically has a resolution of at most 2,560×1,440 pixels, well under 10,000×10,000. (Q is an upper bound on differentiability; if points are not uniformly spaced, you may get fewer differentiable points.)
TopoJSON uses quantization to determine whether two points are coincident for the purpose of simplification. This is required because GeoJSON does not encode topology, and exact matches would be overly strict due to floating point error in GeoJSON coordinates. Also, quantization is a major factor in reducing the size of the TopoJSON encoding, especially in conjunction with the delta encoding.
Slightly related, here is an example of the Asia Lambert Conic Conformal projection used for Russia.
You would need to create your own filter. You can do this by getting the JSON via ajax:
https://api.jquery.com/jquery.ajax/
Then you will need to delete the elements you do not want from jsonResult and re-stringify it. You'll need to be somewhat familiar with the structure of the GeoJSON data you're working with, with GeoJSON in general (so you don't delete something OL needs in order to parse it), and with working with JSON.
Then you can read the remaining elements into features and add them to your source:
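A sketch of those steps, assuming the OpenLayers 3+ API (ol.format.GeoJSON, source.addFeatures); the URL, the status property, and the filter condition are all placeholders for your own data:

```javascript
// Keep only the features that pass the filter; everything else is dropped.
function filterGeoJSON(geojson, keep) {
  return {
    type: 'FeatureCollection',
    features: geojson.features.filter(keep)
  };
}

// Usage with jQuery + OpenLayers (vectorSource is your existing source):
//   $.getJSON('/data/parcels.geojson', function (jsonResult) {
//     const filtered = filterGeoJSON(jsonResult, function (f) {
//       return f.properties.status !== 'sold'; // made-up condition
//     });
//     const features = new ol.format.GeoJSON().readFeatures(filtered, {
//       featureProjection: 'EPSG:3857' // adjust to your map's projection
//     });
//     vectorSource.addFeatures(features);
//   });
```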