How big is the OGF data set?

Posted by Luciano on 2 January 2016 in Korean (한국어)

Earlier today we had some down-time... I'm not sure why but it seems to have recovered.

Since I had planned on working on OGF during that time, I started messing around with my most recent downloaded backup file (dated 12/31).

There are some interesting utilities that can be used to manage OSM data offline, in order to get manageable data sets in OSM format (or other formats). The two tools I use all the time are osmconvert and osmfilter.

OSM format files can be edited in JOSM and uploaded later, but these utilities have other abilities too.

Out of curiosity, today, I decided to get some statistics on our current OGF map, and make comparisons with my country, Ardisphere (abbreviated FA). I made two OSM files - one was for the entire OGF planet and the other was using my FA polygon file (.poly) which gives me a .osm file limited to my country's borders (roughly).
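For anyone who wants to try the same kind of extract, here is a minimal sketch of how the osmconvert call could be assembled from Python. The file names (ogf-planet.osm, FA.poly, FA.osm) are placeholders rather than the actual files used for this post; the `-B=` (border polygon) and `-o=` (output file) options are the ones osmconvert documents for clipping to a .poly file.

```python
import shlex


def osmconvert_extract_cmd(planet_path, poly_path, out_path):
    """Build an osmconvert command that clips planet_path to a .poly border."""
    return [
        "osmconvert",
        planet_path,
        f"-B={poly_path}",  # border polygon file limiting the extract
        f"-o={out_path}",   # output file; format follows the file extension
    ]


# Placeholder file names, assumed for illustration only.
cmd = osmconvert_extract_cmd("ogf-planet.osm", "FA.poly", "FA.osm")
print(shlex.join(cmd))

# To actually run it (requires osmconvert to be installed and on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list keeps the file names safe from shell quoting problems if a path ever contains spaces.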

I found that the OGF planet file (uncompressed .osm) is currently about 4.05 GB, while the Ardisphere alone is 618 MB.

"Wow," I thought. It looks like the Ardisphere, a relatively tiny country (about the size of Uruguay) on the world map, accounts for about 15% of OGF's data volume. I guess I've drawn a lot of nodes! I'm nowhere close to finished, either.

As a point of comparison, OGF's entire Western Hemisphere, where only vandals and admins dare map, is approximately 65MB.

As another point of comparison, the REAL WORLD .osm file is 46 GB, so our imaginary world has less than 10% of the data. Meanwhile, Uruguay's .osm is 342 MB, and Switzerland's is 3.6 GB. Hmm. Ardisphere has nearly twice as much data as its real-world counterpart, while tinier but well-mapped Switzerland is almost the size of the entire OGF planet.
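The proportions quoted above can be checked with a few lines of arithmetic. This is only a sketch using the file sizes given in the post, and it assumes 1 GB = 1000 MB for the conversion.

```python
# File sizes as quoted in the post, converted to MB (assuming 1 GB = 1000 MB).
sizes_mb = {
    "OGF planet": 4050,
    "Ardisphere": 618,
    "OGF Western Hemisphere": 65,
    "Real-world planet": 46000,
    "Uruguay": 342,
    "Switzerland": 3600,
}


def share(part, whole):
    """Return the size of `part` as a percentage of `whole`."""
    return 100 * sizes_mb[part] / sizes_mb[whole]


for part, whole in [
    ("Ardisphere", "OGF planet"),
    ("OGF planet", "Real-world planet"),
    ("Ardisphere", "Uruguay"),
    ("Switzerland", "OGF planet"),
]:
    print(f"{part} is {share(part, whole):.1f}% of {whole}")
```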

Both of the files I made, OGF.osm and FA.osm, are much too large to edit in JOSM - my desktop computer will crash on .osm files larger than about 100 MB. I have other polygon files of smaller areas for when I'm making an extract for editing.

However, seeing those relative sizes, I was interested in compiling some statistics, using a feature of the osmfilter utility. Here is a summary of the stats I found (cut off arbitrarily at ogf-counts > 10000) - screenshot from OpenOffice spreadsheet:
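A comparison like this could be put together roughly as follows. osmfilter has an `--out-count` option that lists tag keys with their occurrence counts; the sketch below assumes each output line is a count followed by whitespace and the key, and it skips anything that does not fit that shape. The sample listings are invented for illustration and are not the real OGF or Ardisphere counts.

```python
def parse_counts(text):
    """Parse 'count<whitespace>key' lines into a {key: count} dict."""
    counts = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # ignore lines that are not in the assumed format
        counts[parts[1].strip()] = int(parts[0])
    return counts


def compare(ogf_counts, fa_counts, min_ogf=10000):
    """Return (key, ogf, fa, fa share in %) for keys with ogf count > min_ogf."""
    rows = []
    for key, ogf_n in ogf_counts.items():
        if ogf_n > min_ogf:
            fa_n = fa_counts.get(key, 0)
            rows.append((key, ogf_n, fa_n, 100 * fa_n / ogf_n))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Invented sample listings (NOT real data), in the assumed output format.
OGF_SAMPLE = "  500000\thighway\n   40000\tnatural\n    9000\tleisure\n"
FA_SAMPLE = "   25000\thighway\n   30000\tnatural\n"

rows = compare(parse_counts(OGF_SAMPLE), parse_counts(FA_SAMPLE))
for key, ogf_n, fa_n, pct in rows:
    print(f"{key:12} {ogf_n:>8} {fa_n:>8} {pct:5.1f}%")
```

In practice the two listings would come from running osmfilter once on the planet file and once on the country extract, then saving each output to a text file.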


Analysis and thoughts:

Most of these stats make sense to me.

I am not surprised to see that Ardisphere accounts for 75% of the world's "natural" keys - I have been working hard to completely cover the country with detailed landuse and land cover polygons, and to create a realistic hydrologic system.

My custom-made tag "ruta:survey" has become a place to record shorthand tags that I use with JOSM filters to control my edit space - when I complete an area I try to remove those. I was somewhat surprised to see that Ardisphere's 13000 "places" account for 25% of the world.

The very low proportion of highways in the country makes perfect sense - many mappers are road-crazy and neglect other features, under the mistaken belief that a road map is a complete map.

I was surprised by the low proportion of buildings, but it's true I haven't gotten down to mapping many individual neighborhoods, so far, while some other users have done quite a bit with this. I will note, however, that we have many cases where users have imported detailed information from OSM (despite policy against it) and much of that detail is dense with buildings. I have no idea what proportion of our map can be attributed to imported data.

Anyway happy new year! / ¡feliz nuevo año! / 새해 복 많이 받으세요!

and as always, happy mapping

Location: Calle Ficticia, Samosata, Comuna Vías, Boreal, 화구체연합

Comment from Leowezy on 2 January 2016 at 12:35

Wow, that's a lot of awesome data you prepared there for us! Thanks a lot. I'm always amazed by your I think quite unique mapping style in the Ardisphere, and especially in the beginning that was a huge inspiration to me; so I guess, your 15% are very well deserved ;D From me as well a cheerful 'Frohes neues Jahr!', looking forward to 2016 :)

Comment from histor on 2 January 2016 at 14:08

Thank you for these statistics, Luciano. That was a lot of work, I think.

Some tags, I think, need not be used so often:

a) electrified = most railways, subways and streetcars today are electrified. This is not important information. (17.333 *)

b) ruta:survey = what is that? (380.853 *)

c) lanes = on map you see nothing of it (98.960 *)

d) access = o.k. - may be important, but in most cases not

e) population = can be written in the wiki (11 270 *)

Nevertheless, these roughly 400 000 items are only a few compared with all the nodes, ways and areas. But each of us can ask himself / herself whether tags that nobody sees on the map (lanes, high speed limits, electrification) are necessary.

And indeed the OSM software has its data redundancies. Why must a bridge (or tunnel) be tagged "bridge = yes" and "layer = 1" instead of "bridge = 1" (where "1" is the layer)? There are other examples where more data is created than needed.

Comment from Luciano on 2 January 2016 at 14:58

As I explained in my comments, the "ruta:survey" is a tag I invented to keep track of "survey" information - it's temporary and I use it to control filter behavior in JOSM. In a completed map, it will be deleted.

I disagree that info not shown on the map is always useless. In fact, for example, I keep track of the population of my towns using the OGF data - I use a program to "pull" the information out of the OSM file and build a wiki table (such as my community lists). Likewise for my registers of various objects. Basically, I am storing information in the OGF database and then I can pull that information out into my wiki pages. If I make a change in the map (say I change a town's population or a church's name), the change will appear in the wiki list automatically, the next time I make an update! I don't have to take the time to make a change to the wiki, except for periodic updates.
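The kind of "pull" described above could look something like the following sketch. It is not the actual program used for the Ardisphere wiki lists; the function names, the sample town names and the table layout are all assumptions made for illustration. It reads place nodes that carry both a name and a population tag from an .osm (XML) file and prints a MediaWiki table.

```python
import io
import xml.etree.ElementTree as ET


def town_populations(osm_file):
    """Yield (name, population) for nodes tagged place, name and population."""
    for _, elem in ET.iterparse(osm_file, events=("end",)):
        if elem.tag == "node":
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            if "place" in tags and "name" in tags and "population" in tags:
                yield tags["name"], tags["population"]
            elem.clear()  # drop the node's children to limit memory use


def wiki_table(rows):
    """Format (name, population) pairs as a MediaWiki table, sorted by name."""
    lines = ['{| class="wikitable"', "! Town !! Population"]
    for name, population in sorted(rows):
        lines.append("|-")
        lines.append(f"| {name} || {population}")
    lines.append("|}")
    return "\n".join(lines)


# Tiny made-up sample in OSM XML format (hypothetical towns, not real OGF data).
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6">
  <node id="1" lat="0" lon="0">
    <tag k="place" v="town"/>
    <tag k="name" v="Town B"/>
    <tag k="population" v="12000"/>
  </node>
  <node id="2" lat="0" lon="0">
    <tag k="place" v="village"/>
    <tag k="name" v="Town A"/>
    <tag k="population" v="800"/>
  </node>
  <node id="3" lat="0" lon="0"/>
</osm>
"""

rows = list(town_populations(io.BytesIO(SAMPLE)))
table = wiki_table(rows)
print(table)
```

Passing a file path instead of the in-memory sample works the same way, since `iterparse` accepts either a file name or a file object.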

You might be right about lanes. I put lane information because it helps me to think clearly about how my towns are evolving as I work on the map, since I try to view their growth in historical terms. I start a street with 1 or 2 lanes, but then if I imagine it becoming more important, I will increase the number of lanes. Perhaps if my map reaches a "finished" state, I will delete the lane information.

Comment from Thunderbird on 2 January 2016 at 15:37

Very nice, I was always impressed with your country and its level of detail. I'm trying to get that far myself but it's certainly a slow process.

Comment from BelpheniaProject on 2 January 2016 at 18:11

4.05GB of OpenGeofiction world data? Wow, that is interesting! Thanks a lot for the statistics Luciano!

Comment from Voytek on 3 January 2016 at 02:26

Great entry Luciano! Interesting to read about it and see those statistics :) I'm also using lanes. When I draw a road in JOSM and choose a preset (idk if it's called that in the English version) it shows a window with tag fields. So I set maxspeed, lanes, surface. Maybe we can make a plugin that will generate a route in the future and it can be useful for that OGF GPS (or whatever a gps system is called on our planet). It would also be a great tool to check if everything is connected correctly ;)

Can't wait to read more from you.

