Unicode defines a vast address space for all the world's languages — but what does it all look like, exactly?
Unicode at 1 pixel per code point. Click on the squares.
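To get a feel for what "1 pixel per code point" implies, here is a minimal sketch that maps a code point to a pixel position. It assumes each of the 17 planes is drawn as a 256 x 256 square filled row by row; that layout and the name `code_point_to_pixel` are illustrative assumptions, not necessarily how the image above is built.

```python
PLANE_SIDE = 256            # 256 * 256 = 65,536 code points per plane
NUM_PLANES = 17             # planes 0 through 16
MAX_CODE_POINT = 0x10FFFF   # last code point in the Unicode codespace


def code_point_to_pixel(cp: int) -> tuple[int, int, int]:
    """Return (plane, row, col), assuming a row-major 256 x 256 square per plane."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    plane, offset = divmod(cp, PLANE_SIDE * PLANE_SIDE)
    row, col = divmod(offset, PLANE_SIDE)
    return plane, row, col


if __name__ == "__main__":
    total_pixels = NUM_PLANES * PLANE_SIDE * PLANE_SIDE
    print(f"total pixels: {total_pixels:,}")   # 1,114,112
    for cp in (0x41, 0x1F600, MAX_CODE_POINT):
        print(f"U+{cp:04X} -> {code_point_to_pixel(cp)}")
```

Under these assumptions the whole codespace fits in about 1.1 million pixels, roughly a 1056 x 1056 image.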
Wikipedia has a nice space-fill block layout for the Basic Multilingual Plane. I find it instantly intuitive, but it isn't interactive, the address labels distract, the range name labels sit too far from the data, and the ranges don't correspond directly to the canonical Unicode block names.
What I see
- The Unicode codespace is mostly empty
- Even within the first three, mostly populated planes there are unexplained gaps. Perhaps they are padding for future expansion, or previously used space that was revoked? Resource allocation is hard.
- Technical note about the visualization itself: the image looks noticeably different on each of Chrome, Firefox, Safari, Internet Explorer and Opera. It looks best in Chrome and Safari and worst in Internet Explorer.
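The emptiness can be checked roughly with Python's standard `unicodedata` module. This is a sketch under one assumption of mine: a code point counts as "assigned" when its general category is anything other than `Cn` (unassigned), so surrogates and private-use code points count as occupied. The exact numbers depend on the Unicode version bundled with your Python build (`unicodedata.unidata_version`).

```python
import unicodedata

PLANE_SIZE = 0x10000   # 65,536 code points per plane
NUM_PLANES = 17        # planes 0 through 16


def assigned_per_plane() -> list[int]:
    """Count code points per plane whose general category is not 'Cn'."""
    counts = []
    for plane in range(NUM_PLANES):
        start = plane * PLANE_SIZE
        assigned = sum(
            1
            for cp in range(start, start + PLANE_SIZE)
            if unicodedata.category(chr(cp)) != "Cn"
        )
        counts.append(assigned)
    return counts


if __name__ == "__main__":
    print(f"Unicode data version: {unicodedata.unidata_version}")
    for plane, assigned in enumerate(assigned_per_plane()):
        share = assigned / PLANE_SIZE
        print(f"plane {plane:2d}: {assigned:6d} / {PLANE_SIZE} ({share:.1%})")
```

Planes 15 and 16 show up as almost full here only because they are reserved wholesale for private use; filtering out category `Co` as well would make them look as empty as they do in practice.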
- Nice space-fill block layout of the Basic Multilingual Plane
- Table-based summary of Unicode planes and code point ranges; it condenses the unused planes nicely, something my visualization doesn't do
- Layout of UTF-8 variable-length encoding