Short Introduction to Programming in Python

Overview

Teaching: 30 min
Exercises: 5 min

Questions

How do I program in Python?

How can I represent my data in Python?

Objectives

Describe the advantages of using programming vs. completing repetitive tasks by hand.

Define the following data types in Python: strings, integers, and floats.

Perform mathematical operations in Python using basic operators.

Define the following as it relates to Python: lists, tuples, and dictionaries.

Interpreter

Python is an interpreted language which can be used in two ways:

“Interactively”: when you use it as an “advanced calculator” executing one command at a time. To start Python in this mode, execute python on the command line:

$ python

Python 3.5.1 (default, Oct 23 2015, 18:05:06)
[GCC 4.8.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

Chevrons >>> indicate an interactive prompt in Python, meaning that it is waiting for your input.

2 + 2

4

print("Hello World")

Hello World

“Scripting” Mode: executing a series of “commands” saved in a text file, usually with a .py extension after the name of your file:

$ python my_script.py

Hello World
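As a minimal sketch, a script that produces the output above could contain a single call to print. The file name my_script.py comes from the command above; its contents are an assumption here, since the lesson does not show them.

```python
# my_script.py (contents assumed for illustration)
# When the script runs, Python executes each line in order, top to bottom.
message = "Hello World"
print(message)
```

Unlike the interactive prompt, a script only shows what is explicitly passed to print.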

Introduction to variables in Python

Assigning values to variables

One of the most basic things we can do in Python is assign values to variables:

text = "Data Carpentry"  # An example of assigning a value to a new text variable,
                         # also known as a string data type in Python
number = 42              # An example of assigning a numeric value, or an integer data type
pi_value = 3.1415        # An example of assigning a floating point value (the float data type)

Here we’ve assigned data to the variables text, number and pi_value, using the assignment operator =. To review the value of a variable, we can type the name of the variable into the interpreter and press Return:

text

'Data Carpentry'

Everything in Python has a type. To get the type of something, we can pass it to the built-in function type:

type(text)

<class 'str'>

type(number)

<class 'int'>

type(pi_value)

<class 'float'>

The variable text is of type str, short for “string”. Strings hold sequences of characters, which can be letters, numbers, punctuation or more exotic forms of text (even emoji!).

We can also see the value of something using another built-in function, print:

print(text)

Data Carpentry

print(number)

42

This may seem redundant, but in fact it’s the only way to display output in a script:

example.py

# A Python script file
# Comments in Python start with #
# The next line assigns the string "Data Carpentry" to the variable "text".
text = "Data Carpentry"

# The next line does nothing!
text

# The next line uses the print function to print out the value we assigned to "text"
print(text)

Running the script

$ python example.py

Data Carpentry

Notice that “Data Carpentry” is printed only once.

Tip: print and type are built-in functions in Python. Later in this lesson, we will introduce methods and user-defined functions. The Python documentation is excellent for reference on the differences between them.

Types of Data

How information is stored in Python objects affects what we can do with it and the outputs of calculations as well. There are two main types of data that we will explore: numeric and text data types.

Text Data Type

The text data type is known as a string in Python, or object in pandas. Strings can contain numbers and/or characters. For example, a string might be a word, a sentence, or several sentences. A pandas object might also be a plot name like 'plot1'. A string can also contain or consist of numbers. For instance, '1234' could be stored as a string, as could '10.23'. However, strings that contain numbers cannot be used for mathematical operations until they are converted to a numeric type!
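The short sketch below illustrates the point about numbers stored as strings. The variable name plot_count is hypothetical; int and float are Python's built-in conversion functions.

```python
# A number stored as a string is text, not a numeric value.
plot_count = '1234'   # hypothetical variable name

try:
    plot_count + 1    # mixing a string and an integer is not allowed
except TypeError as error:
    print('TypeError:', error)

# Convert the string to a numeric type first, then do the arithmetic.
as_integer = int(plot_count) + 1
as_float = float('10.23') * 2

print(as_integer)
print(as_float)
```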

Numeric Data Types

Numeric data types include integers and floats. A floating point (known as a float) number has decimal points even if that decimal point value is 0. For example: 1.13, 2.0, 1234.345. If we have a column that contains both integers and floating point numbers, pandas will assign the entire column to the float data type so the decimal points are not lost.

An integer will never have a decimal point. Thus if we wanted to store 1.13 as an integer it would be stored as 1. Similarly, 1234.345 would be stored as 1234. You will often see the data type int64 in pandas, which stands for 64-bit integer. The 64 refers to the number of bits of memory allocated to store each value, which determines how large a number each “cell” can hold. Allocating space ahead of time allows computers to optimize storage and processing efficiency.

So we’ve learned that computers store numbers in one of two ways: as integers or as floating-point numbers (or floats). Integers are the numbers we usually count with. Floats have fractional parts (decimal places). Let’s next consider how the data type can impact mathematical operations on our data. Addition, subtraction, division and multiplication work on floats and integers as we’d expect.

print(5+5)

10

print(24-4)

20
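As a small illustration of how the data type affects the result, the sketch below mixes integers and floats. The variable names are made up for this example.

```python
int_sum = 5 + 5        # both operands are integers
mixed_sum = 5 + 5.0    # one operand is a float
product = 24 * 0.5     # an integer times a float

# Printing the value next to its type shows that a float
# anywhere in the calculation makes the result a float.
print(int_sum, type(int_sum))
print(mixed_sum, type(mixed_sum))
print(product, type(product))
```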

If we divide one integer by another, we get a float. This differs from Python 2, where dividing one integer by another gives an integer (integer division).

print(5/9)

0.5555555555555556

print(10/3)

3.3333333333333335
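Python 3 still offers integer-style division through the floor-division operator //, which this lesson does not otherwise cover. The sketch below compares it with / and with the modulo operator %.

```python
quotient = 10 / 3      # true division always gives a float
whole = 10 // 3        # floor division discards the fractional part
remainder = 10 % 3     # modulo gives what is left over

print(quotient)
print(whole)
print(remainder)

# The whole part and the remainder recombine into the original number.
print(whole * 3 + remainder)
```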

We can also convert a floating point number to an integer or an integer to a floating point number. Notice that Python discards the fractional part (it truncates toward zero) when it converts from floating point to integer, so positive numbers are rounded down.

# Convert a to an integer
a = 7.83
int(a)

7

# Convert b to a float
b = 7
float(b)

7.0
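To see how conversion with int differs from rounding, the sketch below compares int with the built-in round and with math.floor from the standard library, for one positive and one negative value.

```python
import math

results = []
for value in [7.83, -7.83]:
    # int() drops the fractional part, math.floor() always rounds down,
    # and round() goes to the nearest whole number.
    results.append((int(value), math.floor(value), round(value)))
    print(value, int(value), math.floor(value), round(value))
```

For the negative value, int and math.floor disagree, because dropping the fractional part moves a negative number toward zero rather than down.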

Operators

We can perform mathematical calculations in Python using the basic operators +, -, /, *, ** and %:

2 + 2  # Addition

4

6 * 7  # Multiplication

42

2 ** 16  # Power

65536

13 % 5  # Modulo

3

We can also use comparison operators (<, >, ==, !=, <=, >=) and logical operators (and, or, not). The data type returned by a comparison is called a boolean.

3 > 4

False

True and True

True

True or False

True

True and False

False
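Comparison and logical operators are often combined. The sketch below uses a hypothetical temperature value to show this, along with the not operator, which is not demonstrated above.

```python
temperature = 18   # hypothetical measurement

# Each comparison gives a boolean; and/or combine them.
in_range = temperature >= 10 and temperature <= 25
out_of_range = temperature < 10 or temperature > 25

print(in_range)
print(out_of_range)
print(not in_range)      # not flips True to False and vice versa
print(type(in_range))
```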

Sequences: Lists and Tuples

Lists

Lists are a common data structure to hold an ordered sequence of elements. Each element can be accessed by an index. Note that Python indexes start with 0 instead of 1:

numbers = [1, 2, 3]
numbers[0]

1
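Because indexing starts at 0, a list with three elements has the valid indices 0, 1 and 2. The sketch below shows this, and what happens when an index past the end is requested.

```python
numbers = [1, 2, 3]

first = numbers[0]    # index 0 is the first element
third = numbers[2]    # index 2 is the third (and last) element
print(first, third)

try:
    numbers[3]        # there is no fourth element
except IndexError as error:
    print('IndexError:', error)
```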

A for loop can be used to access the elements in a list or other Python data structure one at a time:

for num in numbers:
    print(num)

1
2
3

Indentation is very important in Python. Note that the second line in the example above is indented. Just like three chevrons >>> indicate an interactive prompt in Python, the three dots ... are Python’s prompt for multiple lines. This is Python’s way of marking a block of code. [Note: you do not type >>> or ....]

To add elements to the end of a list, we can use the append method. Methods are a way to interact with an object (a list, for example). We can invoke a method using the dot . followed by the method name and a list of arguments in parentheses. Let’s look at an example using append:

numbers.append(4)
print(numbers)

[1, 2, 3, 4]

To find out what methods are available for an object, we can use the built-in help function:

help(numbers)

Help on list object:

class list(object)
 |  list() -> new empty list
 |  list(iterable) -> new list initialized from iterable's items
 ...

Tuples

A tuple is similar to a list in that it’s an ordered sequence of elements. However, tuples cannot be changed once created (they are “immutable”). Tuples are created by placing comma-separated values inside parentheses ().

# Tuples use parentheses
a_tuple = (1, 2, 3)
another_tuple = ('blue', 'green', 'red')

# Note: lists use square brackets
a_list = [1, 2, 3]

Tuples vs. Lists

What happens when you execute a_list[1] = 5?

What happens when you execute a_tuple[2] = 5?

What does type(a_tuple) tell you about a_tuple?

What information does the built-in function len() provide? Does it provide the same information on both tuples and lists? Does the help() function confirm this?

Dictionaries

A dictionary is a container that holds pairs of objects - keys and values.

translation = {'one': 'first', 'two': 'second'}
translation['one']

'first'

Dictionaries work a lot like lists - except that you index them with keys. You can think about a key as a name or unique identifier for the value it corresponds to.

rev = {'first': 'one', 'second': 'two'}
rev['first']

'one'

To add an item to the dictionary we assign a value to a new key:

rev['third'] = 'three'
rev

{'first': 'one', 'second': 'two', 'third': 'three'}

Using for loops with dictionaries is a little more complicated. We can do this in two ways:

for key, value in rev.items():
    print(key, '->', value)

first -> one
second -> two
third -> three

for key in rev.keys():
    print(key, '->', rev[key])

first -> one
second -> two
third -> three
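Iterating over a dictionary directly also yields its keys, so the second form can be written without calling keys(). The sketch below collects the formatted lines in a list (a name chosen for this example) before printing them.

```python
rev = {'first': 'one', 'second': 'two', 'third': 'three'}

# Looping over the dictionary itself visits each key in turn.
pairs = []
for key in rev:
    pairs.append(key + ' -> ' + rev[key])

for line in pairs:
    print(line)
```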

Changing dictionaries

First, print the value of the rev dictionary to the screen.

Reassign the value that corresponds to the key second so that it no longer reads “two” but instead 2.

Print the value of rev to the screen again to see if the value has changed.

For loops

Loops allow us to repeat a workflow (or series of actions) a given number of times or while some condition is true. We would use a loop to automatically process data that’s stored in multiple files (daily values with one file per year, for example). Loops lighten our workload by performing repeated tasks without our direct involvement, and they make it less likely that we’ll introduce errors while processing each file by hand.

Let’s write a simple for loop that simulates what a kid might see during a visit to the zoo:

animals = ['lion', 'tiger', 'crocodile', 'vulture', 'hippo']
print(animals)

['lion', 'tiger', 'crocodile', 'vulture', 'hippo']

for creature in animals:
    print(creature)

lion
tiger
crocodile
vulture
hippo

The line defining the loop must start with for and end with a colon, and the body of the loop must be indented.

In this example, creature is the loop variable that takes the value of the next entry in animals every time the loop goes around. We can call the loop variable anything we like. After the loop finishes, the loop variable will still exist and will have the value of the last entry in the collection:

animals = ['lion', 'tiger', 'crocodile', 'vulture', 'hippo']
for creature in animals:
    pass

print('The loop variable is now: ' + creature)

The loop variable is now: hippo

We are not asking Python to print the value of the loop variable anymore, but the for loop still runs and the value of creature changes on each pass through the loop. The statement pass in the body of the loop means “do nothing”.
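The loop body can do more than print. As a minimal sketch, the loop below builds a new list from the loop variable using the append method introduced earlier; the name sightings is made up for this example.

```python
animals = ['lion', 'tiger', 'crocodile', 'vulture', 'hippo']

sightings = []
for creature in animals:
    # On each pass, creature holds the next entry from animals.
    sightings.append('I saw a ' + creature)

print(sightings[0])
print(sightings[4])
print('The loop variable is now: ' + creature)
```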

Challenge - Loops

What happens if we don’t include the pass statement?

Rewrite the loop so that the animals are separated by commas, not new lines (Hint: You can concatenate strings using a plus sign. For example, print(string1 + string2) outputs ‘string1string2’).

Suppose you have a list of numbers xs = [3, 34, 23, 56, 14, 56]. Write a loop to sum the numbers of the list.

If Statements

The body of the test function yearly_data_arg_test has two conditionals (if statements) that check the values of start_year and end_year. If statements execute a segment of code when some condition is met. They commonly look something like this:

a = 5

if a<0:  # Meets first condition?

    # if a IS less than zero
    print('a is a negative number')

elif a>0:  # Did not meet first condition. meets second condition?

    # if a ISN'T less than zero and IS more than zero
    print('a is a positive number')

else:  # Met neither condition

    # if a ISN'T less than zero and ISN'T more than zero
    print('a must be zero!')

Which would return:

a is a positive number

Change the value of a to see how this code works. The statement elif means “else if”, and all of the conditional statements must end in a colon.
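One way to try several values of a at once is to put the same conditionals inside a for loop. This is only a sketch: the test values and the messages list are chosen for illustration.

```python
messages = []
for a in [-3, 0, 5]:
    if a < 0:      # first condition
        messages.append(str(a) + ' is a negative number')
    elif a > 0:    # checked only if the first condition was not met
        messages.append(str(a) + ' is a positive number')
    else:          # neither condition was met
        messages.append(str(a) + ' must be zero!')

for message in messages:
    print(message)
```

Only one branch runs for each value of a, even though all three are written out.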

The if statements in the function yearly_data_arg_test check whether there is an object associated with the variable names start_year and end_year. If those variables are None, the conditions of the if statements evaluate to True and whatever is in their bodies is executed. On the other hand, if the variable names are associated with some value (they got a number in the function call), the conditions evaluate to False and the bodies do not execute. The opposite conditional statements, whose conditions would be True if the variables were associated with objects (if they had received a value in the function call), would be if start_year and if end_year.
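The function yearly_data_arg_test is not shown in this episode, so the sketch below uses a hypothetical stand-in named describe_range to illustrate the same idea. It assumes that missing arguments default to None and that the checks are written as "is None"; the conditions in the actual function may be written differently.

```python
def describe_range(start_year=None, end_year=None):
    """Hypothetical stand-in, not the lesson's yearly_data_arg_test."""
    if start_year is None:
        # No start year was passed in the function call.
        start_year = 'the first available year'
    if end_year is None:
        # No end year was passed in the function call.
        end_year = 'the last available year'
    return 'from ' + str(start_year) + ' to ' + str(end_year)

print(describe_range())
print(describe_range(1977, 2002))
```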

Challenge - Loops + If statements

Suppose you have a list of numbers xs = [3, 34, 23, 56, 14, 56]. Write a loop to sum the even numbers from the list.

Functions

Defining a section of code as a function in Python is done using the def keyword. For example a function that takes two arguments and returns their sum can be defined as:

def add_function(a, b):
    result = a + b
    return result

z = add_function(20, 22)
print(z)

42
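Once defined, a function can be called as many times as needed, and its return value can be used like any other value. The sketch below repeats the definition of add_function so that it runs on its own; the variable names are made up for this example.

```python
def add_function(a, b):
    result = a + b
    return result

# The + operator works on floats and strings as well as integers.
float_sum = add_function(1.5, 2)
joined = add_function('Data ', 'Carpentry')

# A returned value can be passed straight into another call.
nested = add_function(add_function(1, 2), 3)

print(float_sum)
print(joined)
print(nested)
```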

Key Points

Python is an interpreted language which can be used interactively (executing one command at a time) or in scripting mode (executing a series of commands saved in a file).

One can assign a value to a variable in Python. Those variables can be of several types, such as string, integer, floating point and complex numbers.

Lists and tuples are similar in that they are ordered sequences of elements; they differ in that a tuple is immutable (cannot be changed).

Dictionaries are data structures that provide mappings between keys and values.
