## A324-x 

'''
Finding the average number of time units needed for the 
transmission of one character in Morse code.
'''


## 0. Data

# Frequencies of characters, including space '_'

F={'a': 651738, 'b':  124248, 'c': 217339, 'd': 349835, 'e':1041442, 
   'f': 197881, 'g':  158610, 'h': 492888, 'i': 558094, 'j':   9033, 
   'k':  50529, 'l':  331490, 'm': 202124, 'n': 564513, 'o': 596302, 
   'p': 137645, 'q':    8606, 'r': 497563, 's': 515760, 't': 729357, 
   'u': 225134, 'v':   82903, 'w': 171272, 'x':  13692, 'y': 145984, 
   'z':   7836, '_': 1918182}
   

# Dictionary M whose entries have the form x:(a,b), where
# x is a character, and a, b are the number of dots and dashes
# in the Morse encoding of x. Note the entry '_':(1,0)

M = {
'a': (1,1), 'b': (3,1), 'c': (2,2), 'd': (2,1), 'e':(1,0), 
   'f':(3,1), 'g':(1,2), 'h':(4,0), 'i': (2,0), 'j':(1,3), 
   'k':(1,2), 'l':(3,1), 'm':(0,2), 'n': (1,1), 'o':(0,3), 
   'p':(2,2), 'q':(1,3), 'r':(2,1), 's': (3,0), 't':(0,1), 
   'u':(2,1), 'v':(3,1), 'w':(1,2), 'x': (2,2), 'y':(1,3), 
   'z':(2,2), '_':(1,0)
}

##  1. 
''' Define a function time_units(x) that gives
    the number of time units needed for the transmission of x.
    Add comments that document your reasoning.
'''
#Counting time units
'''
A dot counts for 2: 
   1 for the dot and 1 for the space before next Morse element. 
A dash counts for 4: 
   3 for the dash and 1 for the space before next Morse element.
These two parts amount to 2a+4b.

For the separation before the next letter (3 units), we only have to add
2 units, since one is already counted with the preceding element.
Thus far we have 2a+4b+2.

The interval between words is 7 units. If we count the space as 
one more character, 3 of these units are accounted for by the previous
character, and of the remaining four, the last three are
trailing units that separate the space from the next character.
So a space is indeed well represented by (1,0), which gives 2*1+4*0+2 = 4 units.
To summarize, for all characters, including '_',
the number of units is 2a+4b+2. 
'''
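def time_units(x):
    a, b = M[x]               # number of dots and dashes in the encoding of x
    return 2*a + 4*b + 2      # elements with their gaps, plus the rest of the letter gap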
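
# A minimal sketch (not part of the original solution) that cross-checks the
# formula 2a+4b+2 by laying out the signal element by element. The name
# units_from_pattern and the '.'/'-' pattern strings are assumptions made
# for illustration only; the exercise itself works with the counts in M.
def units_from_pattern(pattern):
    """Time units for one character given as a string of '.' and '-'."""
    element_units = {'.': 1, '-': 3}          # a dot lasts 1 unit, a dash 3
    units = 0
    for element in pattern:
        units += element_units[element]       # the dot or dash itself
        units += 1                            # the 1-unit gap after each element
    return units + 2                          # 2 more units complete the 3-unit letter gap

# Usage: 'a' is '.-', i.e. (a, b) = (1, 1), so both counts give 8 units.
assert units_from_pattern('.-') == 2*1 + 4*1 + 2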

##  2. 
''' Produce an expression that gives the (weighted) average
    A of time units required to transmit a character.
'''

A = sum(F[c]*time_units(c) for c in F)/sum(F.values())

print("average =", A)


##  3. 
''' Which characters have maximum (minimum) time-length?
'''  

U = [time_units(c) for c in F]
m = max(U)

Umax = [c for c in F if time_units(c) == m]
print('maximum length:', m, ', achieved by the characters', Umax)

mn = min(U)   # computed rather than hard-coded; it equals 4 for the table M
Umin = [c for c in F if time_units(c) == mn]

print('minimum length:', mn, ', achieved by the characters', Umin)
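
# A possible extension, sketched under the assumption that the full
# distribution of time-lengths is of interest and not only its extremes.
# The helper name group_by_length is hypothetical and not part of the exercise.
def group_by_length(lengths):
    """Map each time-length to the sorted list of characters that have it."""
    groups = {}
    for ch, n in lengths.items():
        groups.setdefault(n, []).append(ch)
    return {n: sorted(groups[n]) for n in sorted(groups)}

# Toy usage with three characters and their time-lengths.
assert group_by_length({'e': 4, 't': 6, 'i': 6}) == {4: ['e'], 6: ['i', 't']}
# With the data of this file: group_by_length({c: time_units(c) for c in F})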

##  4.
''' Compare the average A with the average time-length 
    of Huffman encoding assuming that a bit transmission 
    amounts to a time unit. 
'''
# The average number of bits per character for the Huffman
# encoding was about 4.11, while the Morse encoding requires
# A = 8.07 time units per character, roughly twice as many.
# The comparison, however, is rather unfair, as Morse
# transmission was done under quite different conditions
# than those present in Huffman transmission. 
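
# A hedged sketch of how the Huffman figure could be recomputed. It is not
# the code of the earlier Huffman exercise; huffman_average_length is a
# hypothetical helper. It relies on the standard fact that the weighted
# code length of a Huffman tree equals the sum of the weights of all merged
# (internal) nodes, so no explicit code table is needed.
import heapq

def huffman_average_length(freqs):
    """Average bits per symbol of a Huffman code for the frequency table freqs."""
    heap = list(freqs.values())
    heapq.heapify(heap)
    total = sum(heap)
    cost = 0
    while len(heap) > 1:
        # Merge the two lightest nodes; every merge adds one bit to all
        # symbols below it, which is the weight of the merged node.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        cost += merged
        heapq.heappush(heap, merged)
    return cost / total

# Toy usage with made-up counts: code lengths are a:2, b:2, c:1,
# so the average is (1*2 + 1*2 + 2*1)/4 = 1.5 bits.
assert huffman_average_length({'a': 1, 'b': 1, 'c': 2}) == 1.5
# With the data of this file, huffman_average_length(F) can be compared with A.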




