Unit 2.2 Data Compression, Images
In this lab you will alter images, manipulate RGB values, and reduce the number of pixels. College Board requires you to learn about lossy and lossless compression.
- Enumerate "Data" Big Idea from College Board
- Image Files and Size
- Python Libraries and Concepts used for Jupyter and Files/Directories
- How does the meta data source and label relate to Unit 5 topics?
- Reading and Encoding Images (2 implementations follow)
- Data Structures, Imperative Programming Style, and working with Images
- Data Structures and OOP
- Additionally, review all the imports in these three demos. Create a definition of their purpose, specifically these ...
- Hacks
Enumerate "Data" Big Idea from College Board
Note some of the big ideas and vocabulary that you observe, and talk about them with a partner ...
- Data compression is the reduction of the number of bits needed to represent data
- Data compression is used to save transmission time and storage space.
- Lossy compression can reduce the data further, but the original data cannot be fully recovered.
- Lossless compression lets you restore and recover the original data exactly.
The Image Lab Project contains a plethora of College Board Unit 2 data concepts. Working with Images provides many opportunities for compression and analyzing size.
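To make the lossless versus lossy distinction concrete, here is a minimal sketch using Python's standard zlib module. The sample bytes and the "round down to a multiple of 8" rule are assumptions made up for illustration; they are not part of the Image Lab.

```python
import zlib

# Hypothetical sample data standing in for pixel values (an assumption)
samples = bytes([12, 13, 13, 14, 200, 201, 203, 203] * 50)

# Lossless: compress, then decompress, and the original comes back exactly
packed = zlib.compress(samples)
restored = zlib.decompress(packed)
print(len(samples), len(packed))   # fewer bytes after compression
print(restored == samples)         # True

# Lossy (illustrative rule): round every value down to a multiple of 8.
# Detail is thrown away, so the original values cannot be recovered.
quantized = bytes((value // 8) * 8 for value in samples)
print(list(quantized[:8]))         # [8, 8, 8, 8, 200, 200, 200, 200]
print(quantized == samples)        # False
```

The lossless path trades nothing for its savings, while the lossy path gives up detail (12, 13, and 14 all become 8) in exchange for data that is simpler to represent.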
Python Libraries and Concepts used for Jupyter and Files/Directories
Introduction to displaying images in Jupyter notebook
IPython
Supports visualization of data in Jupyter notebooks. Visualization is specific to the view; for the web, the visualization needs to be converted to HTML.
pathlib
File paths are different on Windows versus Mac and Linux. This can cause problems in a project as you work and deploy on different operating systems (OSs); pathlib is a solution to this problem.
- What are commands you use in terminal to access files?
- cd, ls, cat, mv, etc
- What are the commands you use in Windows terminal to access files?
- cd, dir, del, move, etc
- What are some of the major differences?
- Several commands have different names (for example, ls versus dir and mv versus move), and Windows paths use backslashes while Linux and Mac paths use forward slashes.
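As a small sketch of how pathlib hides these differences, the snippet below builds the same relative path in the Posix and Windows flavors. The file name is borrowed from the lab's sample data; whether the file actually exists on your machine is not assumed.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Path picks the right flavor for the OS the code is running on
image_path = Path("images") / "clouds-impression.png"
print(image_path.name)      # clouds-impression.png
print(image_path.suffix)    # .png
print(image_path.exists())  # True only if the file is present locally

# The "pure" classes show how each OS would write the same path
posix_style = PurePosixPath("images") / "clouds-impression.png"
windows_style = PureWindowsPath("images") / "clouds-impression.png"
print(posix_style)    # images/clouds-impression.png
print(windows_style)  # images\clouds-impression.png
```

Because the `/` operator joins path parts, the same line of code works on every OS without hard-coding a separator.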
Provide what you observed, struggled with, or learned while playing with this code.
- Why is path a big deal when working with images?
- If you list the wrong path, then the code will not be able to display the image because the image does not exist at that path.
- Look up IPython, describe why this is interesting in Jupyter Notebooks for both Pandas and Images?
- IPython is an interactive command-line interface for Python that provides features like tab completion, object introspection, and easy access to the system shell. IPython is interesting in Jupyter Notebooks because it provides a more powerful and flexible way to work with Pandas data frames and images.
- IPython provides tab completion for column names and easy access to the documentation of Pandas functions. This makes it easier to explore and manipulate data frames interactively. In addition, IPython provides the ability to display data frames in a more readable format, making it easier to view and analyze large data sets.
from IPython.display import Image, display
from pathlib import Path # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
# prepares a series of images
def image_data(path=Path("images/"), images=None): # path of static images is defaulted
    if images is None: # default image
        images = [
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file'] # file with path
    return images

def image_display(images):
    for image in images:
        display(Image(filename=image['filename']))

# Run this as standalone tester to see sample data printed in Jupyter terminal
if __name__ == "__main__":
    # print parameter supplied image
    green_square = image_data(images=[{'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"}])
    image_display(green_square)

    # display default images from image_data()
    default_images = image_data()
    image_display(default_images)
Reading and Encoding Images (2 implementations follow)
PIL (Python Imaging Library)
Pillow, the maintained fork of PIL, provides the ability to work with images in Python. Geeks for Geeks shows some ideas on working with images.
base64
Image formats (JPG, PNG) are often called binary file formats, and it is difficult to embed them directly in text-based content. Base64 solves this by converting binary data (8-bit bytes) into text: each group of 3 bytes (24 bits) is split into four 6-bit Base64 digits, and each digit is written as one printable ASCII character. Base64 is therefore used to transport and embed binary images in textual assets such as HTML and CSS.
- How is Base64 similar or different to Binary and Hexadecimal?
- Binary is a base-2 numbering system that uses only two symbols, typically 0 and 1, to represent data. Hexadecimal, on the other hand, is a base-16 numbering system that uses 16 symbols to represent data, typically the digits 0-9 and the letters A-F. Base64 is a method of encoding data using a set of 64 characters, typically consisting of letters, digits, and symbols. Base64 is often used to represent binary data as text, for example in email attachments or in URLs. It is more compact than hexadecimal (4 characters for every 3 bytes instead of 6), but like binary and hexadecimal it is meant for machines to transport data safely as text, not for people to read.
- Translate first 3 letters of your name to Base64.
- viv -> dml2
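The translation above can be checked with a short sketch that writes the same three bytes in binary, hexadecimal, and Base64, using only the standard library. The variable names are just for illustration.

```python
import base64

text = "viv"
raw = text.encode("ascii")  # three 8-bit bytes

binary = " ".join(f"{byte:08b}" for byte in raw)
hexadecimal = raw.hex()
encoded = base64.b64encode(raw).decode("ascii")

print(binary)       # 01110110 01101001 01110110
print(hexadecimal)  # 766976
print(encoded)      # dml2

# Base64 is reversible: decoding returns the original bytes
decoded = base64.b64decode(encoded)
print(decoded.decode("ascii"))  # viv
```

Regrouping the 24 bits as four 6-bit values (011101 100110 100101 110110) gives 29, 38, 37, and 54, which map to d, m, l, and 2 in the Base64 alphabet.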
numpy
Numpy is described as "The fundamental package for scientific computing with Python". In the Image Lab, a Numpy array is created from the image data in order to simplify access and change to the RGB values of the pixels, converting pixels to grey scale.
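As a rough sketch of why an array simplifies pixel access, the example below averages the RGB channels of a tiny made-up image in one step instead of looping pixel by pixel. The 2x2 pixel values are an assumption for illustration, and this vectorized approach is an alternative to the loop used later in the lab, not the lab's own code.

```python
import numpy as np

# A hypothetical 2x2 RGB image with shape (rows, columns, channels)
pixels = np.array([
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [30, 60, 90]],
], dtype=np.uint8)

# Widen to 16 bits before summing so three 8-bit values cannot overflow
averages = pixels.astype(np.uint16).sum(axis=2) // 3

# Copy each average into all three channels to form the grey pixel
grey = np.stack([averages] * 3, axis=2).astype(np.uint8)

print(grey[0, 0])  # [85 85 85]
print(grey[1, 1])  # [60 60 60]
```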
io, BytesIO
Input and Output (I/O) is fundamental to all computer programming. I/O buffering is a technique used to optimize I/O operations: data is held in a temporary area of memory, the buffer, until it can be processed. With large quantities of data, the buffer is the input that is currently queued and waiting. In this example, BytesIO acts as an in-memory buffer for the image bytes, and a very large picture can lag while it loads.
- Where have you been a consumer of buffering?
- When watching movies or shows on a computer, sometimes the screen will be in a buffering phase. This means that it's probably queuing the upcoming frames of the movie.
- From your consumer experience, what effects have you experienced from buffering?
- Usually buffering is associated with lag so when a movie I'm watching is buffering, I can expect the entire experience to be laggy and unenjoyable.
- How do these effects apply to images?
- Really big images could buffer since there is a lot of pixel data that needs to be loaded before the image can be shown.
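Here is a minimal sketch of BytesIO used as an in-memory buffer, which is the role it plays when the lab saves an image before Base64 encoding. The bytes written are placeholders (an assumption), not a real image.

```python
from io import BytesIO

# BytesIO behaves like a file, but the bytes live in memory instead of on disk
with BytesIO() as buffer:
    buffer.write(b"\x89PNG")       # placeholder bytes, not a real PNG
    buffer.write(b" pixel data")   # more placeholder bytes queued in the buffer
    contents = buffer.getvalue()   # read everything collected so far

print(len(contents))  # 15
print(contents)       # b'\x89PNG pixel data'
```

Anything that can write to a file can write to the buffer, which is why an image can be saved into it and then encoded without creating a temporary file.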
Data Structures, Imperative Programming Style, and working with Images
Introduction to creating meta data and manipulating images. Look at each procedure and explain the purpose and results of this program. Add any insights or challenges as you explored this program.
- Does this code seem like a series of steps are being performed?
- yes
- Describe Grey Scale algorithm in English or Pseudo code?
- Load the colored image into memory.
- Create a new empty image of the same size as the original image to store the grayscale version.
- For each pixel in the original image:
  - Retrieve the RGB color values for the pixel.
  - Calculate the average of the RGB values.
  - Assign the average value to all three RGB channels of the corresponding pixel in the new grayscale image.
- Save the grayscale image to disk.
- Describe scale image? What is before and after on pixels in three images?
- Scaling an image refers to the process of resizing it to a larger or smaller size while maintaining its aspect ratio. The aspect ratio of an image is the ratio of its width to its height.
- When an image is scaled, each pixel in the original image is either expanded or compressed to fit the new size. This can result in changes in the overall visual appearance of the image.
- Is scale image a type of compression? If so, line it up with College Board terms described?
- Scaling an image down can be seen as a form of lossy compression: it reduces the file size by discarding pixel data, and the original image cannot be fully reconstructed from the smaller one. In College Board terms, fewer bits are needed to represent the image, but the original data is not recovered.
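To see what scaling does to the pixel count, here is a small sketch of the width-based calculation, written without Pillow so it runs anywhere. The function name scaled_size and the 1280x960 and 16x16 sizes are assumptions for illustration; the 320 default matches the base width used in the lab code.

```python
def scaled_size(width, height, base_width=320):
    """Return (width, height) after scaling to base_width, keeping the aspect ratio."""
    scale = base_width / width
    return base_width, int(height * scale)

# Hypothetical large image: scaling down discards pixels
original = (1280, 960)
scaled = scaled_size(*original)
pixels_before = original[0] * original[1]
pixels_after = scaled[0] * scaled[1]
print(scaled)                         # (320, 240)
print(pixels_before, pixels_after)    # 1228800 76800
print(pixels_before // pixels_after)  # 16

# Hypothetical tiny image: scaling up adds pixels instead of removing them
print(scaled_size(16, 16))            # (320, 320)
```

Scaling down by a factor of 4 in each direction keeps only 1 pixel in 16, which is where the size reduction comes from. Scaling a small image up increases the pixel count, so it is not compression at all.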
import sys
print(sys.executable)
# c:\Users\vivia\AppData\Local\Programs\Python\Python310 -m pip install pillow
from IPython.display import HTML, display
from pathlib import Path # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage # as pilImage is used to avoid conflicts
from io import BytesIO
import base64
import numpy as np
# prepares a series of images
def image_data(path=Path("images/"), images=None): # path of static images is defaulted
    if images is None: # default image
        images = [
            {'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"},
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file'] # file with path
    return images

# Large image scaled to baseWidth of 320
def scale_image(img):
    baseWidth = 320
    scalePercent = (baseWidth/float(img.size[0]))
    scaleHeight = int((float(img.size[1])*float(scalePercent)))
    scale = (baseWidth, scaleHeight)
    return img.resize(scale)

# PIL image converted to base64
def image_to_base64(img, format):
    with BytesIO() as buffer:
        img.save(buffer, format)
        return base64.b64encode(buffer.getvalue()).decode()
# Set Properties of Image, Scale, and convert to Base64
def image_management(image): # image is one dictionary from image_data()
    # Image open return PIL image object
    img = pilImage.open(image['filename'])

    # Python Image Library operations
    image['format'] = img.format
    image['mode'] = img.mode
    image['size'] = img.size
    # Scale the Image
    img = scale_image(img)
    image['pil'] = img
    image['scaled_size'] = img.size
    # Scaled HTML
    image['html'] = '<img src="data:image/png;base64,%s">' % image_to_base64(image['pil'], image['format'])

# Create Grey Scale Base64 representation of Image
def image_management_add_html_grey(image):
    # Reuse the scaled PIL image stored by image_management()
    img = image['pil']
    format = image['format']

    img_data = img.getdata() # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data) # PIL image to numpy array
    image['gray_data'] = [] # key/value for data converted to gray scale

    # 'data' is a list of RGB data, the list is traversed and each pixel is replaced by its grey value
    for pixel in image['data']:
        # create gray scale of image, ref: https://www.geeksforgeeks.org/convert-a-numpy-array-to-an-image/
        average = (pixel[0] + pixel[1] + pixel[2]) // 3 # average pixel values and use // for integer division
        if len(pixel) > 3:
            image['gray_data'].append((average, average, average, pixel[3])) # PNG format, keep the alpha value
        else:
            image['gray_data'].append((average, average, average))
        # end for loop for pixels

    img.putdata(image['gray_data']) # overwrites the pixels of image['pil'] in place
    image['html_grey'] = '<img src="data:image/png;base64,%s">' % image_to_base64(img, format)
# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    # Load the default list of image dictionaries
    images = image_data()

    # Display meta data, scaled view, and grey scale for each image
    for image in images:
        image_management(image)
        print("---- meta data -----")
        print(image['label'])
        print(image['source'])
        print(image['format'])
        print(image['mode'])
        print("Original size: ", image['size'])
        print("Scaled size: ", image['scaled_size'])

        print("-- original image --")
        display(HTML(image['html']))

        print("--- grey image ----")
        image_management_add_html_grey(image)
        display(HTML(image['html_grey']))
    print()
Data Structures and OOP
Most data structures classes require Object Oriented Programming (OOP). Since this class is lined up with a college course, OOP will be talked about often. Functionality in the remainder of this blog is the same as the prior implementation. Highlight some of the key differences you see between the imperative and OOP styles.
- Read imperative and object-oriented programming on Wikipedia
- Consider how data is organized in the two examples, in relation to procedures
- Look at Parameters in Imperative and Self in OOP
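The contrast between parameters and self can be sketched in a few lines. The names describe and ImageInfo are hypothetical and much smaller than the lab's Image_Data class; the label and source values are borrowed from the sample data.

```python
# Imperative style: data lives in a dictionary and is passed to a function as a parameter
def describe(image):
    return f"{image['label']} by {image['source']}"

image = {'source': "Peter Carolin", 'label': "Lassen Volcano"}
print(describe(image))  # Lassen Volcano by Peter Carolin


# OOP style: data and behavior live together, and methods reach the data through self
class ImageInfo:
    def __init__(self, source, label):
        self._source = source
        self._label = label

    @property
    def label(self):  # read-only access to the stored label
        return self._label

    def describe(self):
        return f"{self._label} by {self._source}"

info = ImageInfo(source="Peter Carolin", label="Lassen Volcano")
print(info.describe())  # Lassen Volcano by Peter Carolin
```

In the first version the caller must remember which dictionary goes with which function; in the second, the object carries its own data, so the method needs no extra parameters.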
Additionally, review all the imports in these three demos. Create a definition of their purpose, specifically these ...
- PIL
- adds support for opening, manipulating, and saving many different image file formats
- numpy
- used to perform a wide variety of mathematical operations on arrays.
- base64
- provides functions for encoding binary data to printable ASCII characters and decoding such encodings back to binary data.
from IPython.display import HTML, display
from pathlib import Path # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage # as pilImage is used to avoid conflicts
from io import BytesIO
import base64
import numpy as np
class Image_Data:

    def __init__(self, source, label, file, path, baseWidth=320):
        self._source = source # variables with self prefix become part of the object
        self._label = label
        self._file = file
        self._filename = path / file # file with path
        self._baseWidth = baseWidth

        # Open image and scale to needs
        self._img = pilImage.open(self._filename)
        self._format = self._img.format
        self._mode = self._img.mode
        self._originalSize = self._img.size # captured before scaling
        self.scale_image()
        self._html = self.image_to_html(self._img)
        self._html_grey = self.image_to_html_grey()

    @property
    def source(self):
        return self._source

    @property
    def label(self):
        return self._label

    @property
    def file(self):
        return self._file

    @property
    def filename(self):
        return self._filename

    @property
    def img(self):
        return self._img

    @property
    def format(self):
        return self._format

    @property
    def mode(self):
        return self._mode

    @property
    def originalSize(self):
        return self._originalSize

    @property
    def size(self):
        return self._img.size

    @property
    def html(self):
        return self._html

    @property
    def html_grey(self):
        return self._html_grey
    # Large image scaled to baseWidth of 320
    def scale_image(self):
        scalePercent = (self._baseWidth/float(self._img.size[0]))
        scaleHeight = int((float(self._img.size[1])*float(scalePercent)))
        scale = (self._baseWidth, scaleHeight)
        self._img = self._img.resize(scale)

    # PIL image converted to base64
    def image_to_html(self, img):
        with BytesIO() as buffer:
            img.save(buffer, self._format)
            return '<img src="data:image/png;base64,%s">' % base64.b64encode(buffer.getvalue()).decode()

    # Create Grey Scale Base64 representation of Image
    def image_to_html_grey(self):
        img_grey = self._img.copy() # copy so the scaled color image is not overwritten
        pixels = np.array(self._img.getdata()) # PIL image to numpy array

        grey_data = [] # list of pixels converted to gray scale
        # 'pixels' holds the RGB data, it is traversed and each pixel is replaced by its grey value
        for pixel in pixels:
            # create gray scale of image, ref: https://www.geeksforgeeks.org/convert-a-numpy-array-to-an-image/
            average = (pixel[0] + pixel[1] + pixel[2]) // 3 # average pixel values and use // for integer division
            if len(pixel) > 3:
                grey_data.append((average, average, average, pixel[3])) # PNG format, keep the alpha value
            else:
                grey_data.append((average, average, average))
            # end for loop for pixels

        img_grey.putdata(grey_data)
        return self.image_to_html(img_grey)
# prepares a series of images, provides expectation for required contents
def image_data(path=Path("images/"), images=None): # path of static images is defaulted
    if images is None: # default image
        images = [
            {'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"},
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"}
        ]
    return path, images

# turns data into objects
def image_objects():
    id_Objects = []
    path, images = image_data()
    for image in images:
        id_Objects.append(Image_Data(source=image['source'],
                                     label=image['label'],
                                     file=image['file'],
                                     path=path,
                                     ))
    return id_Objects
# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    for ido in image_objects(): # ido is an Imaged Data Object

        print("---- meta data -----")
        print(ido.label)
        print(ido.source)
        print(ido.file)
        print(ido.format)
        print(ido.mode)
        print("Original size: ", ido.originalSize)
        print("Scaled size: ", ido.size)

        print("-- scaled image --")
        display(HTML(ido.html))

        print("--- grey image ---")
        display(HTML(ido.html_grey))

    print()
College Board practice problems for 2.2
Q1: Which of the following is an advantage of a lossless compression algorithm over a lossy compression algorithm?
- A lossless compression algorithm can guarantee reconstruction of original data, while a lossy compression algorithm cannot.
- This is correct because lossless compression algorithms are guaranteed to be able to reconstruct the original data, while lossy compression algorithms are not.
Q2: A user wants to save a data file on an online storage site. The user wants to reduce the size of the file, if possible, and wants to be able to completely restore the file to its original version. Which of the following actions best supports the user’s needs?
- Compressing the file using a lossless compression algorithm before uploading it
- This is correct because lossless compression algorithms allow for complete reconstruction of the original data and typically reduce the size of the data.
Q3: A programmer is developing software for a social media platform. The programmer is planning to use compression when users send attachments to other users. Which of the following is a true statement about the use of compression?
- Lossy compression of an image file generally provides a greater reduction in transmission time than lossless compression does.
- This is correct because lossy compression does not have to keep enough information to reconstruct the original exactly, so it can generally shrink a file further than lossless compression can. Fewer bits to transfer means a shorter transmission time.
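A rough way to see the effect described in Q3 is to compress the same data with and without a lossy step first. This sketch uses seeded random bytes as a stand-in for noisy pixel data and a made-up "round down to a multiple of 16" rule; real image codecs such as JPEG work differently, so treat it only as an illustration of the principle.

```python
import random
import zlib

# Hypothetical noisy pixel values; the seed keeps the run repeatable
rng = random.Random(0)
pixels = bytes(rng.randrange(256) for _ in range(10_000))

# Lossless only: every original value must survive, so little can be saved
lossless_size = len(zlib.compress(pixels))

# Lossy step first (illustrative): keep only 16 brightness levels, then compress
reduced = bytes((value // 16) * 16 for value in pixels)
lossy_size = len(zlib.compress(reduced))

print(len(pixels), lossless_size, lossy_size)
print(lossy_size < lossless_size)  # True: fewer bytes to transmit
```

The lossy version cannot be turned back into the original values, but it needs noticeably fewer bytes, and so less transmission time.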
Lossless and Lossy Images
- Choose 2 images, one that will more likely result in lossy data compression and one that is more likely to result in lossless data compression. Explain.
This image is lossy
This image is most likely lossy because it is a JPG. JPG is a lossy format: its compression algorithm discards some of the image data in order to reduce the file size.
This image is lossless
This image is more likely lossless because it is in PNG format. PNG is a lossless image format: its compression algorithm achieves a smaller file size while preserving all of the original image data, unlike JPG, which reduces the file size by discarding some of the data.
Numpy, manipulating pixels. As opposed to Grey Scale treatment, pick a couple of other types like red scale, green scale, or blue scale. We want you to be manipulating pixels in the image.
from IPython.display import HTML, display
from pathlib import Path
from PIL import Image as pilImage
from io import BytesIO
import base64
import numpy as np
# prepares a series of images
def image_data(path=Path("images/"), images=None): # path of static images is defaulted
    if images is None: # default image
        images = [
            {'source': "Internet", 'label': "mountains", 'file': "scenery.png"},
            {'source': "Internet", 'label': "dog", 'file': "test.png"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file'] # file with path
    return images

# Large image scaled to baseWidth of 320
def scale_image(img):
    baseWidth = 320
    scalePercent = (baseWidth/float(img.size[0]))
    scaleHeight = int((float(img.size[1])*float(scalePercent)))
    scale = (baseWidth, scaleHeight)
    return img.resize(scale)

# PIL image converted to base64
def image_to_base64(img, format):
    with BytesIO() as buffer:
        img.save(buffer, format)
        return base64.b64encode(buffer.getvalue()).decode()
# Set Properties of Image, Scale, and convert to Base64
def image_management(image): # image is one dictionary from image_data()
    # Image open return PIL image object
    img = pilImage.open(image['filename'])

    # Python Image Library operations
    image['format'] = img.format
    image['mode'] = img.mode
    image['size'] = img.size
    # Scale the Image
    img = scale_image(img)
    image['pil'] = img
    image['scaled_size'] = img.size
    # Scaled HTML
    image['html'] = '<img src="data:image/png;base64,%s">' % image_to_base64(image['pil'], image['format'])
# Create Red Scale Base64 representation of Image
def image_management_add_html_red(image):
    # Reuse the scaled PIL image stored by image_management()
    img = image['pil']
    format = image['format']

    img_data = img.getdata() # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data) # PIL image to numpy array
    image['red_data'] = [] # key/value for data converted to red scale

    # 'data' is a list of RGB data, the list is traversed and only the red channel is kept
    for pixel in image['data']:
        # create red scale of image
        red = pixel[0] # RGB = red, green, blue, so index 0 is the red value that remains
        if len(pixel) > 3:
            image['red_data'].append((red, 0, 0, pixel[3])) # PNG format, keep the alpha value
        else:
            image['red_data'].append((red, 0, 0))
        # end for loop for pixels

    img.putdata(image['red_data']) # overwrites the pixels of image['pil'] in place
    image['html_red'] = '<img src="data:image/png;base64,%s">' % image_to_base64(img, format)
# Create Blue Scale Base64 representation of Image
def image_management_add_html_blue(image):
    # Reuse the scaled PIL image stored by image_management()
    img = image['pil']
    format = image['format']

    img_data = img.getdata() # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data) # PIL image to numpy array
    image['blue_data'] = [] # key/value for data converted to blue scale

    # 'data' is a list of RGB data, the list is traversed and only the blue channel is kept
    for pixel in image['data']:
        # create blue scale of image
        blue = pixel[2] # index 2 is the blue value
        if len(pixel) > 3:
            image['blue_data'].append((0, 0, blue, pixel[3])) # PNG format, keep the alpha value
        else:
            image['blue_data'].append((0, 0, blue))
        # end for loop for pixels

    img.putdata(image['blue_data']) # overwrites the pixels of image['pil'] in place
    image['html_blue'] = '<img src="data:image/png;base64,%s">' % image_to_base64(img, format)

# Create Green Scale Base64 representation of Image
def image_management_add_html_green(image):
    # Reuse the scaled PIL image stored by image_management()
    img = image['pil']
    format = image['format']

    img_data = img.getdata() # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data) # PIL image to numpy array
    image['green_data'] = [] # key/value for data converted to green scale

    # 'data' is a list of RGB data, the list is traversed and only the green channel is kept
    for pixel in image['data']:
        # create green scale of image
        green = pixel[1] # index 1 is the green value
        if len(pixel) > 3:
            image['green_data'].append((0, green, 0, pixel[3])) # PNG format, keep the alpha value
        else:
            image['green_data'].append((0, green, 0))
        # end for loop for pixels

    img.putdata(image['green_data']) # overwrites the pixels of image['pil'] in place
    image['html_green'] = '<img src="data:image/png;base64,%s">' % image_to_base64(img, format)
# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    # Load the default list of image dictionaries
    images = image_data()

    # Display the original, red, blue, and green versions of each image
    for image in images:
        image_management(image)

        print("-- original image --")
        display(HTML(image['html']))

        image_management(image) # reload the original before each color pass, since putdata changes image['pil']
        print("--- red image ----")
        image_management_add_html_red(image)
        display(HTML(image['html_red']))

        image_management(image) # reload the original image again
        print("--- blue image ----")
        image_management_add_html_blue(image)
        display(HTML(image['html_blue']))

        image_management(image) # reload the original image again
        print("--- green image ----")
        image_management_add_html_green(image)
        display(HTML(image['html_green']))
    print()
PIL: Blur the image or write Meta Data on screen, aka Title, Author and Image size
from IPython.display import HTML, display
from pathlib import Path
from PIL import Image as pilImage
from io import BytesIO
import base64
import numpy as np
from PIL import ImageFilter
# prepares a series of images
def image_data(path=Path("images/"), images=None): # path of static images is defaulted
    if images is None: # default image
        images = [
            {'source': "Internet", 'label': "mountains", 'file': "scenery.png"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file'] # file with path
    return images

def image_data1(path=Path("images/"), images=None): # path of static images is defaulted
    if images is None: # default image
        images = [
            {'source': "Internet", 'label': "mountains", 'file': "scenery.png"},
            {'source': "Internet", 'label': "mountains", 'file': "blurImage.jpg"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file'] # file with path
    return images
# Large image scaled to baseWidth of 320
def scale_image(img):
    baseWidth = 320
    scalePercent = (baseWidth/float(img.size[0]))
    scaleHeight = int((float(img.size[1])*float(scalePercent)))
    scale = (baseWidth, scaleHeight)
    return img.resize(scale)

# PIL image converted to base64
def image_to_base64(img, format):
    with BytesIO() as buffer:
        img.save(buffer, format)
        return base64.b64encode(buffer.getvalue()).decode()
# Set Properties of Image, Scale, and convert to Base64
def image_management(image): # image is one dictionary from image_data()
    # Image open return PIL image object
    img = pilImage.open(image['filename'])

    # Python Image Library operations
    image['format'] = img.format
    image['mode'] = img.mode
    image['size'] = img.size
    # Scale the Image
    img = scale_image(img)
    image['pil'] = img
    image['scaled_size'] = img.size
    # Scaled HTML
    image['html'] = '<img src="data:image/png;base64,%s">' % image_to_base64(image['pil'], image['format'])
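# Create a blurred copy of the image and save it as a JPG file
def image_management_add_html_blur(image):
    # convert to RGB first: JPG cannot store an alpha channel and palette images cannot be filtered
    img = pilImage.open(image['filename']).convert("RGB")
    blur = img.filter(ImageFilter.BLUR)
    blur.save('images/blurImage.jpg')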
# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    # Load the original image and write the blurred copy to disk
    images = image_data()

    for image in images:
        image_management(image)
        image_management_add_html_blur(image)

    # Reload the list, which now includes the blurred file, and display both images
    images = image_data1()
    for image in images:
        image_management(image)
        print(image['filename'])
        display(HTML(image['html']))
    print()