Statistics on syllables in Italian

I generated some statistics on syllable frequencies in the Italian language, intended for use in cognitive psychology.

Here's an example of the computations that my scripts and programmes performed.

I started from a list of words with their frequencies in Italian.

Starting from this example file (the actual one was much longer):

  • 346 abacò
  • 130 abachi
  • 10982 abaco
  • 1090 abate
  • 11656 abati
  • 7117 abat-jour
  • 17595 abbacchi
  • 6415 abbacchiare
  • 22948 abbacchiarsi
  • 31126 abbacchiato
  • 14558 abbacchio
  • 3571 abbachi
  • 22879 abbacina
  • 18492 abbacinai
  • 1200 te

I hyphenated the words (separating the syllables with ~) and got:

  • 346 a~ba~cò
  • 130 a~ba~chi
  • 10982 a~ba~co
  • 1090 a~ba~te
  • 11656 a~ba~ti
  • 7117 a~bat-jour
  • 17595 ab~bac~chi
  • 6415 ab~bac~chia~re
  • 22948 ab~bac~chiar~si
  • 31126 ab~bac~chia~to
  • 14558 ab~bac~chio
  • 3571 ab~ba~chi
  • 22879 ab~ba~ci~na
  • 18492 ab~ba~ci~nai
  • 1200 te
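
As a rough illustration of how one hyphenated line can be turned into data for counting, here is a minimal Python sketch. It is not the original script: `parse_line` is a hypothetical name, and treating the "-" in a~bat-jour as a syllable boundary is an assumption inferred from the tables further down, where a, bat and jour are counted in positions 1, 2 and 3.

```python
import re

def parse_line(line):
    """Split a line such as '7117 a~bat-jour' into (frequency, syllables)."""
    freq, word = line.split(maxsplit=1)
    # Assumption: both '~' and '-' mark a syllable boundary.
    syllables = [s for s in re.split(r"[~-]", word.strip()) if s]
    return int(freq), syllables

print(parse_line("22948 ab~bac~chiar~si"))
print(parse_line("7117 a~bat-jour"))
```

The hyphenation step itself (deciding where the ~ goes) is not covered by this sketch.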

Then the scripts generated some statistical tables of syllable frequencies. Below you'll find some examples.

The meanings of the columns are:

  • Syllable: a syllable of the Italian language
  • % tot: the syllable's frequency as a percentage of the total
  • absolute frequency: the absolute frequency of the syllable
  • whole word: the absolute frequency when the syllable is the whole word
  • 1, 2, 3, 4: the absolute frequency when the syllable appears in that position in the word
  • last: the absolute frequency when the syllable appears at the end of the word

Syllable    % tot  absolute frequency  whole word       1       2       3       4    last
a          5.1365               31321           0   31321       0       0       0       0
ab        22.5631              137584           0  137584       0       0       0       0
ba        11.3396               69146           0       0   69146       0       0       0
bac       15.1928               92642           0       0   92642       0       0       0
bat        1.1672                7117           0       0    7117       0       0       0
chi        3.4924               21296           0       0       0   21296       0   21296
chia       6.1565               37541           0       0       0   37541       0       0
chiar      3.7634               22948           0       0       0   22948       0       0
chio       2.3874               14558           0       0       0   14558       0   14558
ci         6.7846               41371           0       0       0   41371       0       0
co         1.8010               10982           0       0       0   10982       0   10982
cò         0.0567                 346           0       0       0     346       0     346
jour       1.1672                7117           0       0       0    7117       0    7117
na         3.7520               22879           0       0       0       0   22879   22879
nai        3.0326               18492           0       0       0       0   18492   18492
re         1.0520                6415           0       0       0       0    6415    6415
si         3.7634               22948           0       0       0       0   22948   22948
te         0.3755                2290        1200       0       0    1090       0    1090
ti         1.9115               11656           0       0       0   11656       0   11656
to         5.1045               31126           0       0       0       0   31126   31126
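
The counting behind a table like this can be sketched in Python as follows. This is a reconstruction under assumptions, not the original program: `syllable_stats` and `COLUMNS` are hypothetical names, and the rule that a one-syllable word is counted only under "whole word" (not under position 1 or "last") is inferred from the te row.

```python
from collections import defaultdict

COLUMNS = ("absolute", "whole word", 1, 2, 3, 4, "last")

def syllable_stats(entries):
    """Build one row of counts per syllable from (frequency, syllables) pairs."""
    stats = defaultdict(lambda: dict.fromkeys(COLUMNS, 0))
    for freq, syllables in entries:
        for pos, syl in enumerate(syllables, start=1):
            row = stats[syl]
            row["absolute"] += freq
            if len(syllables) == 1:
                # Inferred rule: whole words are kept out of the
                # positional columns and out of "last".
                row["whole word"] += freq
                continue
            if pos <= 4:
                row[pos] += freq
            if pos == len(syllables):
                row["last"] += freq
    total = sum(row["absolute"] for row in stats.values())
    for row in stats.values():
        row["% tot"] = round(100 * row["absolute"] / total, 4)
    return dict(stats)

entries = [
    (1090, ["a", "ba", "te"]),
    (14558, ["ab", "bac", "chio"]),
    (1200, ["te"]),
]
stats = syllable_stats(entries)
print(stats["te"])
```

With these three sample words, the te row gets 2290 as absolute frequency, 1200 as whole word, and 1090 both in position 3 and as last, which agrees with the te row of the table. The "% tot" value differs, because it depends on the full word list.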

Here the syllables are grouped by “skeleton”, where c stands for a consonant and v for a vowel:

Skeleton    % tot  absolute frequency  whole word       1       2       3       4    last
Svvc       1.1672                7117           0       0       0    7117       0    7117
ccv        3.4924               21296           0       0       0   21296       0   21296
ccvv       8.5440               52099           0       0       0   52099       0   14558
ccvvc      3.7634               22948           0       0       0   22948       0       0
cv        35.9410              219159        1200       0   69146   65445   83368  107442
cvc       16.3600               99759           0       0   99759       0       0       0
cvv        3.0326               18492           0       0       0       0   18492   18492
v          5.1365               31321           0   31321       0       0       0       0
vc        22.5631              137584           0  137584       0       0       0       0
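
A skeleton can be derived from a syllable letter by letter. The sketch below is only an approximation of whatever the original script did: `skeleton` and `VOWELS` are hypothetical names, and every letter that is not a vowel is simply treated as a consonant. That means it yields cvvc for jour, whereas the table shows Svvc; the rule behind that S is not described here, so it is not reproduced.

```python
import unicodedata

# Assumption: plain and accented Italian vowels; everything else counts as "c".
VOWELS = set("aeiouàèéìíòóùú")

def skeleton(syllable):
    """Map each letter of a syllable to 'v' (vowel) or 'c' (anything else)."""
    # Normalise so that an accented vowel is a single character.
    text = unicodedata.normalize("NFC", syllable.lower())
    return "".join("v" if ch in VOWELS else "c" for ch in text)

for syl in ("a", "ab", "bac", "chia", "chiar", "nai", "cò"):
    print(syl, skeleton(syl))
```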

Here the syllables are grouped by number of letters:

N of letters    % tot  absolute frequency  whole word       1       2       3       4    last
1_let          5.1365               31321           0   31321       0       0       0       0
2_let         58.5040              356743        1200  137584   69146   65445   83368  107442
3_let         22.8850              139547           0       0   99759   21296   18492   39788
4_let          9.7111               59216           0       0       0   59216       0   21675
5_let          3.7634               22948           0       0       0   22948       0       0
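
Grouped tables like this one amount to summing the per-syllable rows that share a key. A minimal Python sketch, assuming the rows are held as counters keyed by column name (`group_rows` is a hypothetical helper, not part of the original scripts):

```python
from collections import Counter, defaultdict

def group_rows(rows, key):
    """Sum per-syllable rows into one row per group, as chosen by key()."""
    groups = defaultdict(Counter)
    for syllable, columns in rows.items():
        groups[key(syllable)].update(columns)  # adds the counts column by column
    return dict(groups)

# The three four-letter syllables from the per-syllable table.
rows = {
    "chia": Counter({"absolute": 37541, 3: 37541}),
    "chio": Counter({"absolute": 14558, 3: 14558, "last": 14558}),
    "jour": Counter({"absolute": 7117, 3: 7117, "last": 7117}),
}
by_length = group_rows(rows, key=lambda s: f"{len(s)}_let")
print(dict(by_length["4_let"]))
```

For these three syllables the 4_let group sums to 59216 for the absolute frequency and for position 3, and to 21675 for last, which agrees with the 4_let row. Passing a different key function would give the other groupings.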

Here the syllables are grouped by “type”. In Italian there are “open” and “closed” syllables.

Syllable type    % tot  absolute frequency  whole word       1       2       3       4    last
open           56.1464              342367        1200   31321   69146  138840  101860  161788
closed         43.8536              267408           0  137584   99759   30065       0    7117
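
The open/closed split can be sketched with the usual definition: a syllable is open if it ends in a vowel and closed if it ends in a consonant. That this is the rule the scripts used is an assumption, but it is consistent with the figures: ab, bac, bat, chiar and jour together add up to the closed total of 267408. The names `syllable_type` and `VOWELS` are hypothetical.

```python
# Assumption: plain and accented Italian vowels.
VOWELS = set("aeiouàèéìíòóùú")

def syllable_type(syllable):
    """Return 'open' if the syllable ends in a vowel, otherwise 'closed'."""
    return "open" if syllable[-1].lower() in VOWELS else "closed"

for syl in ("a", "ab", "bac", "chiar", "jour", "te", "cò"):
    print(syl, syllable_type(syl))
```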