1 Descriptive statistics

For the exercises below download the friendsAndCars.csv file and save it on your I:-drive. Make sure that you don't have any empty lines at the end of the file.

1.1 Preprocessing

The friendsAndCars.csv file contains a relation between people and the cars they own. In order to use descriptive statistics it is best to calculate frequencies:
  • people and how many cars they own, or
  • cars and how many people own these cars.

    The following Python script counts how often each type of car occurs in the list:

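    import networkx as nx
    from operator import itemgetter

    # Read the "person,car" pairs as directed edges (person -> car).
    F = nx.read_edgelist("friendsAndCars.csv", delimiter=",", create_using=nx.DiGraph())
    # Collect the car (the edge target) of every edge.
    a = [item[1] for item in sorted(F.edges(), key=itemgetter(1))]
    # Print each distinct car with the number of people who own it.
    for item in sorted(set(a)):
        print(item + "," + str(a.count(item)))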
    
    You can save this as countFreq.py and run it on the command line using
    python countFreq.py > CarsCounted.csv
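
The other frequency mentioned above, people and how many cars they own, can be counted in a similar way. The following is only a minimal sketch: it uses collections.Counter from the standard library instead of networkx, it assumes that every line of the file has the form person,car, and the sample rows are made up for illustration (they are not the contents of friendsAndCars.csv).

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows in the assumed "person,car" layout.
sample = io.StringIO("Ann,Ford\nBob,Ford\nAnn,Fiat\nCid,Fiat\nAnn,Volvo\n")

# Read all rows, skipping empty lines.
rows = [row for row in csv.reader(sample) if row]

# Count how many cars each person owns and how many people own each car.
cars_per_person = Counter(person for person, car in rows)
owners_per_car = Counter(car for person, car in rows)

for person, n in sorted(cars_per_person.items()):
    print(person + "," + str(n))
```

To try this on the real data, the sample could be replaced by open("friendsAndCars.csv", newline="").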
    

    1.2 Using Excel (or OpenOffice)

    If you double-click on CarsCounted.csv, it will open in Excel.

    Measures of central tendency and dispersion are available as functions in Excel. For example, AVERAGE(B1:B6) calculates the average (arithmetic mean) of the values in cells B1 to B6.

    measure              formula
    mode                 MODE()
    median               MEDIAN()
    mean                 AVERAGE()
    variance             VAR()
    standard deviation   STDEV()
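
For comparison, the same measures can be computed in Python with the statistics module from the standard library. This is a sketch only: the list of counts is hypothetical and stands in for the frequency column of CarsCounted.csv. Note that statistics.variance and statistics.stdev compute the sample variance and sample standard deviation (dividing by n-1), as Excel's VAR() and STDEV() do.

```python
import statistics

# Hypothetical frequency column; the real values will differ.
counts = [1, 2, 2, 3, 4, 6]

print("mode              ", statistics.mode(counts))
print("median            ", statistics.median(counts))
print("mean              ", statistics.mean(counts))
# Sample variance and standard deviation (divide by n-1).
print("variance          ", statistics.variance(counts))
print("standard deviation", statistics.stdev(counts))
```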

    1.3 Exercises

    1) Calculate the measures in the table above for the Cars data.

    2) How can you interpret the data: what is the central value? Is this a normal distribution?

    3) Produce a chart (diagram) of the data. In order to do this, you should highlight the data and then select the chart wizard. You may want to create a label for each column first.
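
If no spreadsheet is at hand, a rough text version of such a chart can be printed from Python. This is only an illustrative sketch and not a replacement for the Excel chart; the car names and counts are hypothetical and stand in for the two columns of CarsCounted.csv.

```python
# Hypothetical car counts; the real values will differ.
counts = {"Fiat": 2, "Ford": 4, "Volvo": 1}

# Build one line per car: the name, a bar of '#' characters, and the count.
lines = []
for car, n in sorted(counts.items()):
    lines.append(f"{car:<8}{'#' * n} {n}")

print("\n".join(lines))
```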