Lecture 14 – HTTP Basics

DSC 80, Spring 2022



Recap: Imputation

Example: Heights 🧍📏

Mean imputation

The 'child' column has missing values.

Probabilistic imputation

The 'child' column has missing values.

No spikes!



Multiple imputation


  1. Start with observed and incomplete data.
  2. Create several imputed versions of the data through a probabilistic procedure.

    • The imputed datasets are identical for the observed data entries.
    • They differ in the imputed values.
    • The differences reflect our uncertainty about what value to impute.
  3. Then, estimate the parameters of interest for each imputed dataset.

    • For instance, the mean, standard deviation, median, etc.
  4. Finally, pool the parameter estimates into one estimate.
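The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: it uses only the standard library so that it is self-contained, the `heights` list is made-up data rather than the heights_mcar dataset, and the helper names `impute_once` and `multiple_imputation_mean` are hypothetical. It assumes the probabilistic procedure is "sample from the observed values" and that pooling means averaging the per-dataset estimates.

```python
import random
import statistics

def impute_once(values, rng):
    """Fill each missing entry (None) with a random draw from the observed values."""
    observed = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(observed) for v in values]

def multiple_imputation_mean(values, m=100, seed=0):
    """Return the pooled mean and the list of per-dataset means."""
    rng = random.Random(seed)
    # Steps 2 and 3: create m imputed versions and estimate the mean of each.
    means = [statistics.mean(impute_once(values, rng)) for _ in range(m)]
    # Step 4: pool the estimates by averaging them.
    return statistics.mean(means), means

# Step 1: hypothetical heights in inches; None marks a missing value.
heights = [61.5, None, 66.0, 70.2, None, 64.8, 68.1, None, 72.0, 63.4]

pooled_mean, means = multiple_imputation_mean(heights)
print(round(pooled_mean, 2), round(statistics.stdev(means), 2))
```

The spread of `means` is a rough picture of how much the estimate depends on which values were imputed.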

Let's try this procedure out on the heights_mcar dataset.

Each time we run the following cell, it generates a new imputed version of the 'child' column.

Let's run the above procedure 100 times.

Let's plot some of the imputed columns above.

Let's look at the distribution of means across the imputed columns.

Summary of imputation techniques

See the end of Lecture 13 for a detailed summary of all imputation techniques that we've seen so far.

Introduction to HTTP

Collecting data

Data on the internet

Collecting data from the internet


UCSD was a node in ARPANET, the predecessor to the modern internet (source).

The request-response model

HTTP follows the request-response model.

Request methods

Example GET request

Below is an example GET HTTP request made by a browser when accessing datascience.ucsd.edu.

GET / HTTP/1.1
Host: datascience.ucsd.edu
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36
Connection: keep-alive
Accept-Language: en-US,en;q=0.9
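To make the structure of the request above concrete, here is a small sketch that splits a raw request into its request line and headers. The `parse_request` helper is hypothetical, the User-Agent header is left out for brevity, and the sketch assumes lines are separated by `\r\n`, as they are on the wire in HTTP/1.1.

```python
raw_request = (
    "GET / HTTP/1.1\r\n"
    "Host: datascience.ucsd.edu\r\n"
    "Connection: keep-alive\r\n"
    "Accept-Language: en-US,en;q=0.9\r\n"
    "\r\n"
)

def parse_request(raw):
    """Split a raw HTTP/1.1 request into method, path, version, and headers."""
    # A blank line separates the head of the request from the (here empty) body.
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    # Each header is "Name: value"; split only on the first ": ".
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, path, version, headers

method, path, version, headers = parse_request(raw_request)
print(method, path, version)
print(headers["Host"])
```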

Example GET response

The response below was generated by executing the request on the previous slide.

HTTP/1.1 200 OK
Date: Fri, 29 Apr 2022 02:54:41 GMT
Server: Apache
Link: <https://datascience.ucsd.edu/wp-json/>; rel="https://api.w.org/"
Link: <https://datascience.ucsd.edu/wp-json/wp/v2/pages/2427>; rel="alternate"; type="application/json"
Link: <https://datascience.ucsd.edu/>; rel=shortlink
Content-Type: text/html; charset=UTF-8

<!DOCTYPE html>
<html lang="en-US">
    <meta charset="UTF-8">
    <link rel="profile" href="https://gmpg.org/xfn/11">
    <style media="all">img.wp-smiley,img.emoji{display:inline !important;border:none

Consequences of the request-response model

Example: istheshipstuck.com

Read "Inside a viral website", an account of what it's like to run a site that gained 50 million+ views in 5 days.

Making HTTP requests


There are (at least) two ways to make HTTP requests: from the command line, with curl, and from Python, with the requests package.

Making HTTP requests using curl

curl is a command-line tool that sends HTTP requests, like a browser.

  1. The client, curl, sends an HTTP request.
  2. The request contains a method (e.g. GET or POST).
  3. The HTTP server responds with
    • a status line, indicating if things went well,
    • response headers, and
    • (usually) a response body, containing the requested data.
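The three parts of a response listed above can be pulled apart with a few string operations. This is only a sketch: `raw_response` is a shortened, hypothetical response, and `parse_response` is a made-up helper, not part of any library.

```python
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "\r\n"
    '<!DOCTYPE html>\n<html lang="en-US"></html>'
)

def parse_response(raw):
    """Split a raw HTTP/1.1 response into status code, reason, headers, and body."""
    # The first blank line separates the status line and headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    # Note: a dict keeps only the last value if a header name repeats.
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(code), reason, headers, body

code, reason, headers, body = parse_response(raw_response)
print(code, reason)
print(headers["Content-Type"])
print(body[:15])
```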

Example: GET requests via curl

curl -v https://httpbin.org/html
# (`-v` is short for verbose)

Queries in a GET request
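In a GET request, extra parameters travel in the URL itself, after a `?`, as `key=value` pairs joined by `&`. The sketch below builds and then re-reads such a query string with the standard library. The parameter names and values are made up for illustration, and the `https://httpbin.org/get` endpoint is an assumption that does not appear elsewhere in these notes; no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base_url = "https://httpbin.org/get"  # assumed endpoint, used only as a string here
params = {"name": "King Triton", "college": "Sixth"}  # hypothetical parameters

# urlencode joins the pairs with & and encodes the space as +.
url = f"{base_url}?{urlencode(params)}"
print(url)

# Going the other way: recover the parameters from the URL.
query = parse_qs(urlsplit(url).query)
print(query)
```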


Example: POST requests via curl

curl -d 'name=King Triton' https://httpbin.org/post
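For comparison, here is roughly how the same POST request could be prepared with Python's standard library. This is a sketch that only constructs the request object and does not send it. One difference worth noting: `urlencode` encodes the space in the value as `+`, while `curl -d` passes the string along as given.

```python
from urllib.parse import urlencode
from urllib.request import Request

form = {"name": "King Triton"}
# The request body must be bytes.
body = urlencode(form).encode("utf-8")

# Supplying data= makes this a POST request; nothing is sent at this point.
req = Request("https://httpbin.org/post", data=body)
print(req.get_method())
print(req.data)
```

Passing `req` to `urllib.request.urlopen` would actually send it.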

Making HTTP requests using requests

Example: GET requests via requests

To access the source code of the UCSD home page, all we need to run is the following:

resp = requests.get('https://ucsd.edu')
text = resp.text

resp is now a Response object.

The text attribute of resp is a string containing the entire response body.

The url attribute contains the URL that we accessed.

Example: POST requests via requests

HTTP status codes

Successful requests ✅
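Status codes are grouped by their first digit, and codes in the 200s indicate success. As a small illustration, Python's standard library knows the standard codes and their reason phrases. The `describe` helper below is hypothetical.

```python
from http import HTTPStatus

# The first digit of a status code determines its category.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe(code):
    """Return a one-line description such as '200 OK (success)'."""
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({CATEGORIES[code // 100]})"

for code in (200, 404, 500):
    print(describe(code))
```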

The data formats of the internet

The internet currently relies on two key data formats – HTML and JSON.


JSON data types

See json-schema.org for more details.

Example JSON object

See data/family.json.
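As a quick illustration of how JSON maps onto Python types, the sketch below parses a small JSON string with the built-in `json` module. The string is made up for this example and is not the contents of data/family.json.

```python
import json

# Hypothetical JSON text, not the contents of data/family.json.
text = """
{
  "name": "Grandma",
  "age": 94,
  "retired": true,
  "pets": null,
  "children": [{"name": "Dad", "age": 60}]
}
"""

record = json.loads(text)

# JSON objects become dicts, arrays become lists,
# true/false become True/False, and null becomes None.
print(type(record).__name__, type(record["children"]).__name__)
print(record["retired"], record["pets"])
print(record["children"][0]["name"])

# json.dumps goes the other way, from Python objects to a JSON string.
round_trip = json.loads(json.dumps(record))
```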

Aside: eval

eval gone wrong
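The sketch below shows, in a harmless way, why `eval` is risky on text you did not write: it runs whatever code the string contains, whereas `json.loads` only accepts data. The `side_effect` function and the `untrusted` string are hypothetical stand-ins for something more dangerous.

```python
import json

calls = []

def side_effect():
    """Stand-in for arbitrary code hidden inside 'data'."""
    calls.append("ran")
    return 0

untrusted = "side_effect()"

# eval executes the string as Python code, so the function really runs.
eval(untrusted, {"side_effect": side_effect})
print(calls)

# json.loads only parses JSON, so the same string is rejected.
try:
    json.loads(untrusted)
    rejected = False
except json.JSONDecodeError:
    rejected = True
print(rejected)
```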

Handling unfamiliar data

Summary, next time