Economists have spent 15 years measuring national complexity through exports, patents, and research output. Software was invisible in all of it. A new paper in Research Policy, authored by Sándor Juhász, Johannes Wachs, Jermain Kaminski, and César Hidalgo, fills that gap using GitHub Innovation Graph data covering 163 economies, 150 programming languages, and quarterly developer activity from 2020 to 2023. The finding: a software-derived Economic Complexity Index predicts GDP per capita and income inequality even after controlling for the traditional complexity measures built from trade, patents, and research.
The method works in two stages. First, the team queried the GitHub GraphQL API to map language co-occurrence across repositories, computed cosine similarity between languages (normalized so that polyglot repositories do not skew the results), and used hierarchical clustering to compress 150 languages into 59 coherent technology stacks called software bundles. They then ran the standard ECI pipeline: revealed comparative advantage, binarization, iterative scoring. A country specializing in rare, non-ubiquitous bundles (think certified embedded systems for aerospace) scores high. A country whose developers concentrate in widely used stacks such as Python and JavaScript scores low. The paper also confirms that the principle of relatedness holds in software: countries move into technology stacks adjacent to their existing specializations, not random ones.
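The bundle-building step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the input is a hypothetical list of repositories, each given as the languages it uses; the polyglot normalization (each repo spreads a weight of 1/k over its k languages) is an assumption, since the summary does not spell out the paper's exact scheme; and average linkage is an assumed choice of clustering criterion. All function names are hypothetical.

```python
import math
from itertools import combinations


def language_vectors(repos):
    """Map each language to a sparse {repo_index: weight} vector.

    Assumed normalization: a repo using k languages gives each of them
    weight 1/k, so sprawling polyglot repos count less than focused ones.
    """
    vectors = {}
    for repo_id, langs in enumerate(repos):
        langs = set(langs)
        if not langs:
            continue
        weight = 1.0 / len(langs)
        for lang in langs:
            vectors.setdefault(lang, {})[repo_id] = weight
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_languages(repos, n_bundles):
    """Average-linkage agglomerative clustering of languages into bundles."""
    vectors = language_vectors(repos)
    langs = sorted(vectors)
    sim = {(a, b): cosine(vectors[a], vectors[b]) for a, b in combinations(langs, 2)}

    def pair_sim(a, b):
        return sim[(a, b)] if (a, b) in sim else sim[(b, a)]

    clusters = [[lang] for lang in langs]
    while len(clusters) > n_bundles:
        # Find the pair of clusters with the highest mean pairwise similarity.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(pair_sim(a, b) for a in clusters[i] for b in clusters[j])
            avg = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or avg > best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge j into i (i < j always)
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy repositories.
repos = [
    ["C", "Assembly"],
    ["C", "Assembly", "Makefile"],
    ["Python", "Jupyter Notebook"],
    ["Python", "Jupyter Notebook"],
    ["JavaScript", "HTML", "CSS"],
    ["JavaScript", "HTML"],
]
print(sorted(cluster_languages(repos, n_bundles=3)))
```

The three toy stacks share no repositories, so their cross-similarities are zero and the clustering recovers {Assembly, C, Makefile}, {CSS, HTML, JavaScript}, and {Jupyter Notebook, Python}. On real repository data one would set `n_bundles=59` to match the paper's bundle count; a production version would use a library linkage routine rather than this quadratic loop.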
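The ECI pipeline the paper names (revealed comparative advantage, binarization, iterative scoring) follows the textbook recipe from the trade-complexity literature. The sketch below is a hedged illustration, not the paper's implementation: the activity matrix is assumed to hold something like developer counts per country and bundle, the RCA cutoff of 1 is the conventional threshold, and the scores come from power iteration on the country-country matrix, which is equivalent to running the method of reflections two steps at a time and z-scoring each round. Function names are hypothetical, and every country is assumed to have nonzero activity.

```python
import statistics


def rca(X):
    """Balassa revealed comparative advantage.

    X[c][p] is the activity of country c in bundle p.
    RCA_cp = (X_cp / X_c) / (X_p / X), with row, column and grand totals.
    """
    total = sum(sum(row) for row in X)
    col_tot = [sum(col) for col in zip(*X)]
    result = []
    for row in X:
        row_tot = sum(row)
        result.append([
            (x / row_tot) / (col_tot[p] / total) if row_tot and col_tot[p] else 0.0
            for p, x in enumerate(row)
        ])
    return result


def binarize(R, threshold=1.0):
    """M_cp = 1 if RCA_cp >= threshold, else 0."""
    return [[1 if r >= threshold else 0 for r in row] for row in R]


def eci(M, iterations=200):
    """Standardized ECI scores for the rows (countries) of binary matrix M."""
    n_c, n_p = len(M), len(M[0])
    diversity = [sum(row) for row in M]       # k_c0: bundles a country holds
    ubiquity = [sum(col) for col in zip(*M)]  # k_p0: countries holding a bundle
    # Country-country matrix: Mt[c][d] = sum_p M_cp * M_dp / (k_c0 * k_p0)
    Mt = [
        [
            sum(M[c][p] * M[d][p] / ubiquity[p] for p in range(n_p) if ubiquity[p])
            / diversity[c]
            for d in range(n_c)
        ]
        for c in range(n_c)
    ]
    x = [float(k) for k in diversity]
    for _ in range(iterations):
        # One multiplication by Mt equals two reflections; then z-score.
        x = [sum(Mt[c][d] * x[d] for d in range(n_c)) for c in range(n_c)]
        mu, sd = statistics.fmean(x), statistics.pstdev(x)
        if sd == 0:
            raise ValueError("ECI undefined: scores collapsed to a constant")
        x = [(v - mu) / sd for v in x]
    # Sign convention: ECI correlates positively with diversity.
    d_mu = statistics.fmean(diversity)
    if sum(v * (k - d_mu) for v, k in zip(x, diversity)) < 0:
        x = [-v for v in x]
    return x


# Hypothetical nested example: bundle 0 is ubiquitous, bundle 2 is rare.
M = [[1, 1, 1],
     [1, 1, 0],
     [1, 0, 0]]
print(eci(M))
```

In the nested toy matrix the country holding all three bundles, including the rare one, scores about 1.22, the middle country 0, and the country holding only the ubiquitous bundle about -1.22, which is the "rare bundles score high" logic described above.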
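The principle of relatedness is usually tested with a proximity matrix between activities and a density score measuring how close a country's current specializations sit to each activity it lacks. The sketch below assumes the proximity form common in product-space studies (co-specialization count divided by the larger of the two ubiquities); the summary does not say which relatedness measure the paper itself uses. The toy data and function names are hypothetical.

```python
def proximity(M):
    """Proximity phi[p][q] between bundles from a binary country x bundle matrix.

    phi_pq = (countries with M=1 in both p and q) / max(k_p, k_q).
    Self-proximity is left at 0 so a bundle does not count toward itself.
    """
    n_c, n_p = len(M), len(M[0])
    ubiquity = [sum(M[c][p] for c in range(n_c)) for p in range(n_p)]
    phi = [[0.0] * n_p for _ in range(n_p)]
    for p in range(n_p):
        for q in range(n_p):
            if p != q and ubiquity[p] and ubiquity[q]:
                both = sum(M[c][p] * M[c][q] for c in range(n_c))
                phi[p][q] = both / max(ubiquity[p], ubiquity[q])
    return phi


def density(M, phi, c):
    """Relatedness density of country c to every bundle p:
    omega_cp = sum_q phi_pq * M_cq / sum_q phi_pq."""
    result = []
    for p, row in enumerate(phi):
        denom = sum(row)
        num = sum(w * M[c][q] for q, w in enumerate(row))
        result.append(num / denom if denom else 0.0)
    return result


# Hypothetical toy data: bundles 0-1 are "embedded" stacks, 2-3 are "web" stacks.
M = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],  # country 3 has bundle 0 but not yet bundle 1
]
phi = proximity(M)
print(density(M, phi, 3))  # [0.0, 1.0, 0.0, 0.0]
```

Country 3 already specializes in bundle 0, which co-occurs with bundle 1 elsewhere, so its density toward bundle 1 is 1.0 while its density toward the unrelated web bundles is 0. Relatedness predicts it is far likelier to enter bundle 1 next than either web stack.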
The interview with all four researchers is worth reading in full, particularly the exchanges in which Hidalgo explains the kitchen analogy and Kaminski describes code crossing borders via git push as the economy's digital dark matter. The Q4 2025 Innovation Graph data release accompanies the paper. The underlying dataset and the 59-bundle taxonomy are the real assets here: researchers replicating or extending this work now have a cleaner foundation than anything previously available.