The data point to latitude and speaker population as having statistically significant effects on consonant inventories, and on word length as it relates to vowel inventories (multiple citations are omitted in the blockquote below without editorial indication):
[C]onsonants are more likely to be found further away from the equator and . . . more vowels are associated with shorter word lengths. . . .
The clustering of a large number of consonants north and south of the equator might be the result of climatic zones. Cold-climate languages, for instance, are found to have a high frequency of consonants at a very low frequency (obstruents). By contrast, languages in warm climatic zones possess sound classes exhibiting moderate to high levels of sonority (rhotic consonants, laterals, nasals and vowels).
As for word length: It is widely acknowledged in the literature that vowels and consonants subtend to different linguistic functions. Consonants are often associated with word identification, whereas vowels contribute to grammar and to prosody. Vowel quality is also implicated in regular sound change, which results in paradigmatic changes, whereas consonants are frequently cited as being changed through lexical diffusion. Lastly, consonant systems expand with “minimum articulatory cost”, whereas vowel systems appear to follow pressure for maximal perceptual differences.
All of this, in turn, is part of a larger slew of recent rebuttals of Atkinson's 2011 paper. That paper purported, quite unconvincingly, to use statistical tools and a popular online atlas of language features to show serial founder effects in the global patterns of phoneme inventories. Such effects would suggest that as people moved further from Africa, they lost sounds from their languages without gaining new ones.
Another key point made in the full post, but not really captured in the quote above, is that the data aren't as independent as they seem. Phonemes are embedded in an interdependent set of language features: particular features bias languages towards having many other complementary features, and when one of these features changes, other features typically change in response. Put another way, most language features are functional rather than neutral, even though they show great diversity from language to language.