Jan 2, 2013

Visualizing Unicode Codespace

Unicode defines a vast address space for all the world's languages — but what does it all look like, exactly?

Unicode at 1 pixel per code point. Click on the squares.

Wikipedia vis, not bad ☺
Wikipedia has a nice space-filling block layout for the Basic Multilingual Plane. I find it instantly intuitive, but it isn't interactive, the address labels are distracting, the range name labels sit too far from the data, and the ranges don't correspond directly to the canonical Unicode block names.

What I see

  • The Unicode codespace is mostly empty
  • Even within the first 3, mostly populated planes there are unexplained gaps. Perhaps they are padding for future expansion, or previously used space that was revoked? Resource allocation is hard.
  • Technical note about the visualization itself: the image looks noticeably different on each of Chrome, Firefox, Safari, Internet Explorer and Opera. It looks best in Chrome and Safari and worst in Internet Explorer.
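
The first two observations can be checked against Blocks.txt directly. Below is a minimal sketch, not the script from the project: the function names and the four-line sample are my own choices for illustration, and it measures block allocation only (a code point inside a block may still be unassigned). It parses the `START..END; Name` lines, counts the code points covered by blocks, and lists the holes between consecutive blocks.

```python
import re

CODESPACE_SIZE = 0x110000  # 17 planes x 65,536 code points = 1,114,112

# Matches data lines such as "2F00..2FDF; Kangxi Radicals"
BLOCK_LINE = re.compile(r"^([0-9A-Fa-f]+)\.\.([0-9A-Fa-f]+);\s*(.+)$")


def parse_blocks(text):
    """Return a sorted list of (start, end, name) from Blocks.txt-style text."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = BLOCK_LINE.match(line)
        if match:
            start, end, name = match.groups()
            blocks.append((int(start, 16), int(end, 16), name.strip()))
    return sorted(blocks)


def allocated_count(blocks):
    """Number of code points that fall inside some block."""
    return sum(end - start + 1 for start, end, _ in blocks)


def gaps_between(blocks):
    """Yield (start, end) ranges lying between consecutive blocks."""
    for (_, prev_end, _), (next_start, _, _) in zip(blocks, blocks[1:]):
        if next_start > prev_end + 1:
            yield (prev_end + 1, next_start - 1)


# Illustrative excerpt only; feed in the full Blocks.txt for real numbers.
SAMPLE = """\
# a few consecutive entries from the Basic Multilingual Plane
2E80..2EFF; CJK Radicals Supplement
2F00..2FDF; Kangxi Radicals
2FF0..2FFF; Ideographic Description Characters
3000..303F; CJK Symbols and Punctuation
"""

if __name__ == "__main__":
    blocks = parse_blocks(SAMPLE)
    print("code points in blocks:", allocated_count(blocks))
    print("share of codespace: {:.4%}".format(allocated_count(blocks) / CODESPACE_SIZE))
    for start, end in gaps_between(blocks):
        print("gap: {:04X}..{:04X}".format(start, end))
```

On this excerpt the only hole is 2FE0..2FEF, the sixteen code points between Kangxi Radicals and Ideographic Description Characters. Running the same functions over the whole file would show how much of the codespace has no block at all.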

Other Visualizations

Links

  1. Unicode Character Database: Blocks.txt
  2. Full-size interactive SVG image
  3. Project on GitHub with raw data and a Python script for generating the visualization


Ryan Flynn is a programmer and problem solver.