Week 09 Laboratory Sample Solutions
Objectives
- Developing Python & Shell skills
- Exploring simple approaches to scraping data from the web
Preparation
Before the lab you should re-read the relevant lecture slides and their accompanying examples.
Getting Started
Set up for the lab by creating a new directory called lab09 and changing to this directory.
mkdir lab09
cd lab09
There are no provided files for this lab.
Exercise: What Courses Does UNSW Have this Year - Shell
Write a POSIX-compatible shell script courses.sh which given a course prefix, e.g. COMP, prints the course codes and names of all UNSW courses with that prefix offered this year on the Kensington Campus.
courses should be sorted by course number (lowest to highest).
duplicate course codes should be removed, keeping the course whose name is alphabetically first.
./courses.sh VISN
VISN1101 Seeing the World: Perspectives from Vision Science
VISN1111 Geometrical and Physical Optics
VISN1221 Visual Optics
VISN2111 Ocular Anatomy and Physiology
VISN2211 Organisation and Function of the Visual System
VISN3111 Development and Aging of the Visual System
VISN4016 Vision Science Honours
VISN5511 The Visual System, Impairments and Implications
VISN5512 Sensory Processes and Movement
VISN5513 Orientation and Mobility Foundations: Disability, Diversity and Inclusion
VISN5521 Orientation and Mobility Techniques
VISN5522 Vision Rehabilitation
VISN5523 Orientation and Mobility in Practice
VISN5531 Development and Ageing: Implications for Orientation and Mobility
./courses.sh COMP | tail
COMP9491 Applied Artificial Intelligence
COMP9511 Human Computer Interaction
COMP9517 Computer Vision
COMP9727 Recommender Systems
COMP9801 Extended Algorithm Design and Analysis
COMP9814 Extended Artificial Intelligence
COMP9900 Information Technology Project
COMP9991 Research Project A
COMP9992 Research Project B
COMP9993 Research Project C
Make the first line of your shell script #!/bin/dash
The information you need for the course code prefix COMP can be found in this web page: http://www.timetable.unsw.edu.au/2024/COMPKENS.html. You can assume this is the case for all valid prefixes.
The command curl will download a URL and print it to standard output, e.g.:
curl --location --silent http://www.timetable.unsw.edu.au/2024/COMPKENS.html | head
<title>Class Search by Teaching Period</title>
<link rel="stylesheet" type="text/css" href="../layout/2024/myunsw.css">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<table width="100%" cellspacing="0" cellpadding="0">
<form name="googleForm" method="GET" action="http://www.google.com/u/theuniversityofnewsouthwales" target="_blank">
<tr>
<td width="30%" style="height:120px; border-bottom:10px solid #FFCC00; background-color:#fff;"><a href="http://www.unsw.edu.au" target="_blank"><img border="0" src="/images-timetable/banner2020.jpg" alt="The University of New South Wales" width="836" height="179" style="float:left; margin-left:20px; margin-top:25px;"></a></td>
<td width="70%" style="height:120px; border-bottom:10px solid #FFCC00; background-color:#fff; vertical-align:bottom; " align="right">
In a script it is best run as curl --silent so it doesn't print extra information on standard error.
The --location option is required so curl will follow an HTTP redirect from the URL.
You may find uniq's -w option useful when removing duplicate courses.
You may not use non-POSIX-compatible shell features such as bash extensions.
Your script must work when run by /bin/dash on a CSE system.
You are not permitted to rely on the extra features provided by /bin/bash or /bin/sh.
You can assume anything that works with the version of /bin/dash on CSE systems is POSIX compatible.
You may not use Perl, C, Python, or any other language.
You must use the URL provided above to download the course list.
No error checking is necessary.
When you think your program is working, you can use autotest to run some simple automated tests:
2041 autotest shell_courses
When you are finished working on this exercise, you must submit your work by running give:
give cs2041 lab09_shell_courses courses.sh
before Monday 15 April 12:00 (midday) (2024-04-15 12:00:00) to obtain the marks for this lab exercise.
Sample solution for courses.sh
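#! /bin/dash

YEAR=2024

# Exactly one argument is expected: the course prefix.
case "$#" in
    1) course_prefix="$1" ;;
    *)
        echo "Usage: $0 <course prefix>"
        exit 1
        ;;
esac

# A course prefix is four upper-case letters.
case "${course_prefix}" in
    [A-Z][A-Z][A-Z][A-Z]) ;;
    *)
        echo "Invalid course prefix: ${course_prefix}"
        exit 1
        ;;
esac

url="http://www.timetable.unsw.edu.au/${YEAR}/${course_prefix}KENS.html"

# Keep lines linking to a course page, reduce each to "CODE text",
# drop lines where the text is just the code, then sort and keep
# the first line for each 8-character course code.
curl --location --silent "${url}" |
grep -E "${course_prefix}[0-9]{4}\.html" |
sed '
    s/.*href="//
    s/\.html">/ /
    s/<.*//
' |
grep -Ev "${course_prefix}[0-9]{4} ${course_prefix}[0-9]{4}$" |
sort |
uniq -w8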
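The stages of the pipeline above can be traced offline in Python. This is an illustrative sketch only (the exercise itself must be solved in shell). The sample lines are made up to resemble the rows the sed script expects and are not copied from the real timetable page, the second name for COMP1511 is invented to show duplicate handling, and extract_courses is a hypothetical helper name. Python sorts by code point, which may differ from the locale-dependent order used by sort.

```python
import re

# Hypothetical input lines (an assumption about the page's shape).
sample_lines = [
    '<td class="data"><a href="COMP1511.html">COMP1511</a></td>',
    '<td class="data"><a href="COMP1511.html">Programming Fundamentals</a></td>',
    '<td class="data"><a href="COMP1511.html">Introduction to Programming</a></td>',
    '<td class="data"><a href="index.html">Home</a></td>',
]


def extract_courses(lines, prefix):
    """Mirror each stage of the shell pipeline on a list of lines."""
    course_link = re.compile(rf'{prefix}[0-9]{{4}}\.html')
    code_only = re.compile(rf'{prefix}[0-9]{{4}} {prefix}[0-9]{{4}}$')
    rows = []
    for line in lines:
        if not course_link.search(line):                 # grep -E
            continue
        line = re.sub(r'.*href="', '', line)             # s/.*href="//
        line = re.sub(r'\.html">', ' ', line, count=1)   # s/\.html">/ /
        line = re.sub(r'<.*', '', line)                  # s/<.*//
        if code_only.search(line):                       # grep -Ev
            continue
        rows.append(line)
    rows.sort()                                          # sort
    unique = []
    for row in rows:                                     # uniq -w8
        if not unique or unique[-1][:8] != row[:8]:
            unique.append(row)
    return unique


for row in extract_courses(sample_lines, "COMP"):
    print(row)
```

Because the rows are sorted before the first 8 characters are compared, the row kept for each course code is the one whose name sorts first.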
Exercise: What Courses Does UNSW Have this Year - Python/subprocess
Write a Python script courses_subprocess.py which given a course prefix, e.g. COMP, prints the course codes and names of all UNSW courses with that prefix offered this year on the Kensington Campus.
./courses_subprocess.py VISN
VISN1101 Seeing the World: Perspectives from Vision Science
VISN1111 Geometrical and Physical Optics
VISN1221 Visual Optics
VISN2111 Ocular Anatomy and Physiology
VISN2211 Organisation and Function of the Visual System
VISN3111 Development and Aging of the Visual System
VISN4016 Vision Science Honours
VISN5511 The Visual System, Impairments and Implications
VISN5512 Sensory Processes and Movement
VISN5513 Orientation and Mobility Foundations: Disability, Diversity and Inclusion
VISN5521 Orientation and Mobility Techniques
VISN5522 Vision Rehabilitation
VISN5523 Orientation and Mobility in Practice
VISN5531 Development and Ageing: Implications for Orientation and Mobility
./courses_subprocess.py COMP | tail
COMP9491 Applied Artificial Intelligence
COMP9511 Human Computer Interaction
COMP9517 Computer Vision
COMP9727 Recommender Systems
COMP9801 Extended Algorithm Design and Analysis
COMP9814 Extended Artificial Intelligence
COMP9900 Information Technology Project
COMP9991 Research Project A
COMP9992 Research Project B
COMP9993 Research Project C
You should use the subprocess module to download the web page, using the same curl command as the last activity.
The information you need for the course code prefix COMP can be found in this web page: http://www.timetable.unsw.edu.au/2024/COMPKENS.html. You can assume this is the case for all prefixes.
Your answer must be Python only. You cannot use other languages such as Shell, Perl or C.
You may run external programs with the subprocess module.
You must use the URL provided above to download the course list.
No error checking is necessary.
When you think your program is working, you can use autotest to run some simple automated tests:
2041 autotest python_courses_subprocess
When you are finished working on this exercise, you must submit your work by running give:
give cs2041 lab09_python_courses_subprocess courses_subprocess.py
before Monday 15 April 12:00 (midday) (2024-04-15 12:00:00) to obtain the marks for this lab exercise.
Sample solution for courses_subprocess.py
#! /usr/bin/env python3

import subprocess
import sys
import re

YEAR = "2024"

assert len(sys.argv) == 2, f"Usage: {sys.argv[0]} <course prefix>"
course_prefix = sys.argv[1]
assert re.fullmatch(r"[A-Z]{4}", course_prefix), f"Invalid course prefix: {course_prefix}"

url = f"http://www.timetable.unsw.edu.au/{YEAR}/{course_prefix}KENS.html"

# Download the page with curl; --location follows the HTTP redirect.
proc = subprocess.run(['curl', '--location', '--silent', url], capture_output=True, text=True)

courses = []
# Braces are doubled in the rf-string so the regex sees the quantifier {4}.
for line in re.findall(rf"^.*{course_prefix}[0-9]{{4}}\.html.*$", proc.stdout, flags=re.MULTILINE):
    m = re.search(r"""<a href="(?P<code>[A-Z]{4}[0-9]{4})\.html">(?P<name>.*?)</a>""", line)
    code = m.group('code')
    name = m.group('name')
    # Skip links whose text is just the course code.
    if code != name:
        courses.append((code, name))

for code, name in sorted(set(courses)):
    print(f"{code} {name}")
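The shell exercise asks for duplicate course codes to be removed, keeping the course whose name is alphabetically first. Note that sorted(set(courses)) only removes pairs that are identical in both code and name. If the same rule is wanted in Python, one possible approach is sketched below. The helper name dedupe_by_code and the sample data are hypothetical and used only for illustration.

```python
def dedupe_by_code(courses):
    """Keep one (code, name) pair per code: the alphabetically first name."""
    first_name = {}
    # Sorting the pairs puts the alphabetically first name first for each code,
    # and setdefault ignores any later name for a code already seen.
    for code, name in sorted(courses):
        first_name.setdefault(code, name)
    return sorted(first_name.items())


# Illustrative data only, not taken from the timetable page.
courses = [
    ("COMP1511", "Programming Fundamentals"),
    ("COMP1511", "Introduction to Programming"),
    ("COMP1521", "Computer Systems Fundamentals"),
]

for code, name in dedupe_by_code(courses):
    print(f"{code} {name}")
```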
Exercise: What Courses Does UNSW Have this Year - Python/requests
Write a Python script courses_requests.py which given a course prefix, e.g. COMP, prints the course codes and names of all UNSW courses with that prefix offered this year on the Kensington Campus.
./courses_requests.py VISN
VISN1101 Seeing the World: Perspectives from Vision Science
VISN1111 Geometrical and Physical Optics
VISN1221 Visual Optics
VISN2111 Ocular Anatomy and Physiology
VISN2211 Organisation and Function of the Visual System
VISN3111 Development and Aging of the Visual System
VISN4016 Vision Science Honours
VISN5511 The Visual System, Impairments and Implications
VISN5512 Sensory Processes and Movement
VISN5513 Orientation and Mobility Foundations: Disability, Diversity and Inclusion
VISN5521 Orientation and Mobility Techniques
VISN5522 Vision Rehabilitation
VISN5523 Orientation and Mobility in Practice
VISN5531 Development and Ageing: Implications for Orientation and Mobility
./courses_requests.py COMP | tail
COMP9491 Applied Artificial Intelligence
COMP9511 Human Computer Interaction
COMP9517 Computer Vision
COMP9727 Recommender Systems
COMP9801 Extended Algorithm Design and Analysis
COMP9814 Extended Artificial Intelligence
COMP9900 Information Technology Project
COMP9991 Research Project A
COMP9992 Research Project B
COMP9993 Research Project C
You should use the requests module to download the web page.
You should use the BeautifulSoup and html5lib modules to parse the HTML.
The fetch_website_text.py lecture example uses BeautifulSoup (but not requests).
Use BeautifulSoup to find all links (a tags) in a page then look at the link's href and text fields.
The information you need for the course code prefix COMP can be found in this web page: http://www.timetable.unsw.edu.au/2024/COMPKENS.html. You can assume this is the case for all prefixes.
Your answer must be Python only. You cannot use other languages such as Shell, Perl or C.
You may not run external programs.
You must use the URL provided above to download the course list.
No error checking is necessary.
You can find documentation for the requests module here: https://pypi.org/project/requests/
You can find documentation for the BeautifulSoup module here: https://pypi.org/project/beautifulsoup4/
When you think your program is working, you can use autotest to run some simple automated tests:
2041 autotest python_courses_requests
When you are finished working on this exercise, you must submit your work by running give:
give cs2041 lab09_python_courses_requests courses_requests.py
before Monday 15 April 12:00 (midday) (2024-04-15 12:00:00) to obtain the marks for this lab exercise.
Sample solution for courses_requests.py
#! /usr/bin/env python3

import requests
import bs4
import sys
import re

YEAR = "2024"

assert len(sys.argv) == 2, f"Usage: {sys.argv[0]} <course prefix>"
course_prefix = sys.argv[1]
assert re.fullmatch(r"[A-Z]{4}", course_prefix), f"Invalid course prefix: {course_prefix}"

url = f"http://www.timetable.unsw.edu.au/{YEAR}/{course_prefix}KENS.html"
soup = bs4.BeautifulSoup(requests.get(url).text, 'html5lib')

courses = []
for tag in soup.find_all('a'):
    code = tag.get('href', '')
    name = tag.text
    if re.fullmatch(r'[A-Z]{4}[0-9]{4}\.html', code) and code[:-5] != name:
        courses.append((code[:-5], name))

for code, name in sorted(set(courses)):
    print(f"{code} {name}")
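The link-filtering step in the solution above can be tried without network access or third-party modules. The sketch below swaps BeautifulSoup for the standard library's html.parser, purely so it runs anywhere; it is not the approach the exercise asks for. The class name LinkCollector and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
import re


class LinkCollector(HTMLParser):
    """Collect an (href, text) pair for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


# Made-up HTML resembling the links the solution looks for.
sample_html = """
<a href="COMP1511.html">COMP1511</a>
<a href="COMP1511.html">Programming Fundamentals</a>
<a href="index.html">Home</a>
"""

parser = LinkCollector()
parser.feed(sample_html)

# Same filter as the solution: course links whose text is not just the code.
courses = [
    (href[:-5], text)
    for href, text in parser.links
    if re.fullmatch(r"[A-Z]{4}[0-9]{4}\.html", href) and href[:-5] != text
]
for code, name in sorted(set(courses)):
    print(f"{code} {name}")
```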
Challenge Exercise: What Can't Regexes Do?
Write a regular expression which matches a unary number iff it is composite (not prime).
In other words, write a regex that matches a string of n ones iff n is composite.
Here is a test program to assist you in doing this:
#! /usr/bin/env python3

from sys import argv
from re import search
from math import log, floor

assert len(argv) == 4, f"Usage: {argv[0]} <min> <max> <regex>"
min, max, regex = argv[1], argv[2], argv[3]
assert len(regex) <= 80, "regex too large"

padding = floor(log(int(max) + 1, 10)) + 1

for i in range(int(min), int(max) + 1):
    unary = '1' * i
    print(f"{i:{padding}} = {unary} unary -", "composite" if search(regex, unary) else "prime")
Download test_regex_prime.py, or copy it to your CSE account using the following command:
cp -n /import/ravel/A/cs2041/public_html/24T1/activities/regex_prime/test_regex_prime.py test_regex_prime.py
For example, to test the regex ^1{7,10}$ against the integers 2 to 12, you can run:
chmod 755 test_regex_prime.py
./test_regex_prime.py 2 12 '^1{7,10}$'
2 = 11 unary - prime
3 = 111 unary - prime
4 = 1111 unary - prime
5 = 11111 unary - prime
6 = 111111 unary - prime
7 = 1111111 unary - composite
8 = 11111111 unary - composite
9 = 111111111 unary - composite
10 = 1111111111 unary - composite
11 = 11111111111 unary - prime
12 = 111111111111 unary - prime
Put your solution in regex_prime.txt, for example:
./test_regex_prime.py 40 50 "$(cat regex_prime.txt)"
40 = 1111111111111111111111111111111111111111 unary - composite
41 = 11111111111111111111111111111111111111111 unary - prime
42 = 111111111111111111111111111111111111111111 unary - composite
43 = 1111111111111111111111111111111111111111111 unary - prime
44 = 11111111111111111111111111111111111111111111 unary - composite
45 = 111111111111111111111111111111111111111111111 unary - composite
46 = 1111111111111111111111111111111111111111111111 unary - composite
47 = 11111111111111111111111111111111111111111111111 unary - prime
48 = 111111111111111111111111111111111111111111111111 unary - composite
49 = 1111111111111111111111111111111111111111111111111 unary - composite
50 = 11111111111111111111111111111111111111111111111111 unary - composite
This exercise is not possible with true regular expressions, i.e. using |*() alone.
You will need to use additional regular expression syntax to achieve the same result.
Python regular expression syntax is described in the re module documentation.
Your regex must be less than 80 characters.
This is a (in)?famous problem to try and solve with regex.
There will be answers easily found online.
Don't google for other people's solutions - see if you can come up with your own.
When you think your program is working, you can use autotest to run some simple automated tests:
2041 autotest regex_prime
When you are finished working on this exercise, you must submit your work by running give:
give cs2041 lab09_regex_prime regex_prime.txt
before Monday 15 April 12:00 (midday) (2024-04-15 12:00:00) to obtain the marks for this lab exercise.
Sample solution for regex_prime.txt
^1?$|^(11+?)\1+$
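One way to read this regex: the first alternative, ^1?$, matches zero or one 1s, which are not prime. The second alternative captures a block of two or more 1s and then uses the backreference \1+ to require one or more further copies of that block, so it matches exactly when n = k × m with k ≥ 2 and m ≥ 2. The backreference is the syntax that goes beyond true regular expressions. The sketch below compares the regex with trial division for n from 0 to 100; the helper name is_prime_by_division is hypothetical.

```python
import re
from math import isqrt

COMPOSITE = re.compile(r"^1?$|^(11+?)\1+$")


def is_prime_by_division(n):
    """Plain trial division, used only as an independent reference."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))


# The regex should match a string of n ones exactly when n is not prime.
for n in range(0, 101):
    matched = COMPOSITE.search("1" * n) is not None
    assert matched == (not is_prime_by_division(n)), n

print("regex agrees with trial division for 0..100")
```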
Submission
When you are finished with each exercise, make sure you submit your work by running give.
You can run give multiple times. Only your last submission will be marked.
Don't submit any exercises you haven't attempted.
If you are working at home, you may find it more convenient to upload your work via give's web interface.
Remember you have until Week 10 Monday 12:00:00 (midday) to submit your work.
You cannot obtain marks by e-mailing your code to tutors or lecturers.
You can check the files you have submitted here.
Automarking will be run by the lecturer several days after the submission deadline, using test cases different to those autotest runs for you. (Hint: do your own testing as well as running autotest.)
After automarking is run by the lecturer you can view your results here. The resulting mark will also be available via give's web interface.
Lab Marks
When all components of a lab are automarked you should be able to view the marks via give's web interface or by running this command on a CSE machine:
2041 classrun -sturec
COMP(2041|9044) 24T1: Software Construction is brought to you by
the School of Computer Science and Engineering
at the University of New South Wales, Sydney.
For all enquiries, please email the class account at cs2041@cse.unsw.edu.au
CRICOS Provider 00098G