Blog

  • German-Traffic-Sign-Classification

    Traffic Sign Recognition


    Build a Traffic Sign Recognition Project

    The goals / steps of this project are the following:

    • Load the data set (see below for links to the project data set)
    • Explore, summarize and visualize the data set
    • Design, train and test a model architecture
    • Use the model to make predictions on new images
    • Analyze the softmax probabilities of the new images
    • Summarize the results with a written report

    Data Set Summary & Exploration

    1. Provide a basic summary of the data set and identify where in your code the summary was done. In the code, the analysis should be done using python, numpy and/or pandas methods rather than hardcoding results manually.

    I use pickle to import the data into the IPython notebook. Then I use NumPy to calculate statistics for the data set.

      import numpy as np
    
      # Number of training examples
      n_train = len(X_train)
    
      # Number of testing examples.
      n_test = len(X_test)
    
      # the shape of a traffic sign image
      image_shape = X_train[0].shape
    
      # number of unique classes/labels there are in the dataset.
      n_classes = len(np.unique(y_train))
    
      print("Number of training examples =", n_train)
      print("Number of testing examples =", n_test)
      print("Image data shape =", image_shape)
      print("Number of classes =", n_classes)
    • Image data shape = (32, 32, 3)
    • Number of classes = 43

    2. Data set exploration and visualization

    Include an exploratory visualization of the dataset and identify where the code is in your code file.

    The chart below shows the label distribution, which is uneven: some classes have more data than others. label distribution

    The image below shows one image from each class. Some of them are quite dark, which makes classification more difficult. allClass

    import matplotlib.pyplot as plt
    # Visualizations will be shown in the notebook.
    %matplotlib inline
    # set the matplotlib figure size for the notebook
    plt.rcParams["figure.figsize"] = [15,18]
    
    
    fig = plt.figure(figsize=(5,5))
    f, axarr = plt.subplots(9, 5)
    # flatten the axes grid into a single 1-D array
    plts = np.reshape(axarr, -1)
    
    # display one sample from each class
    for classId in np.unique(y_train):
        thePicIndex = np.where(y_train == classId)[0]
        myplt = plts[classId]
        myplt.imshow(X_train[thePicIndex[25]])
        myplt.set_title("class " + str(classId))
    
    plt.tight_layout()

    Design and Test a Model Architecture

    Preprocessed data set

    1. Describe how, and identify where in your code, you preprocessed the image data. What techniques were chosen and why did you choose these techniques? Consider including images showing the output of each preprocessing technique. Pre-processing refers to techniques such as converting to grayscale, normalization, etc.

    I preprocess the data set in a few ways:

    1. Convert the raw images to grayscale
    2. Normalize the images to the range (0, 1)
    3. Augment the images

    The advantage of normalizing the images is that gradient descent converges faster.

    Grayscale images reduce the input size, which makes the model easier to train. Normalizing the images to the range (0, 1) helps the model learn from the data set. Since the data set is not large enough, image augmentation is necessary.

    The pre-processing parameters:

    • ANGLE_ROTATE = 25
    • TRANSLATION = 0.2
    • NB_NEW_IMAGES = 10000
    import cv2
    import numpy as np

    IMG_SIZE = 32  # image width/height (data set images are 32x32)

    def toGrayscale(rgb):
        result = np.zeros((len(rgb), 32, 32,1))
        result[...,0] = np.dot(rgb[...,:3], [0.299, 0.587, 0.114])  
        return result
    
    # normalize the images
    def normalizeGrascale(grayScaleImages):
        return grayScaleImages/255
    
    def processImages(rgbImages):
        return np.array(normalizeGrascale(toGrayscale(rgbImages)))
    
    def transformOnHot(nbClass, listClass):
        oneHot = np.zeros((len(listClass), nbClass))
        oneHot[np.arange(len(listClass)), listClass] = 1
        return np.array(oneHot)
    
    def augmenteImage(image, angle, translation):
        h, w, c = image.shape
    
        # random rotate
        angle_rotate = np.random.uniform(-angle, angle)
        rotation_mat = cv2.getRotationMatrix2D((w//2, h//2), angle_rotate, 1)
    
        img = cv2.warpAffine(image, rotation_mat, (IMG_SIZE, IMG_SIZE))
    
        # random translation
        x_offset = translation * w * np.random.uniform(-1, 1)
        y_offset = translation * h * np.random.uniform(-1, 1)
        mat = np.array([[1, 0, x_offset], [0, 1, y_offset]])
    
        # return the warped image
        return cv2.warpAffine(img, mat, (w, h))
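
    The functions above transform a single image. Below is a minimal sketch of how the NB_NEW_IMAGES additional samples might be generated from randomly chosen training images; the notebook's actual generation loop is not shown here, so treat this as an assumption.

    # pick random training images and augment them
    indices = np.random.randint(0, len(X_train), NB_NEW_IMAGES)
    X_aug = np.array([augmenteImage(X_train[i], ANGLE_ROTATE, TRANSLATION) for i in indices])
    y_aug = y_train[indices]

    # extend the training set before grayscale conversion and normalization
    X_train_extended = np.concatenate([X_train, X_aug])
    y_train_extended = np.concatenate([y_train, y_aug])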

    The image below shows the image augmentation result. ImageAugumentation

    The image below shows the grayscale version of the images above. grayScale

    2. Data set overview

    2. Describe how, and identify where in your code, you set up training, validation and testing data. How much data was in each set? Explain what techniques were used to split the data into these sets. (OPTIONAL: As described in the “Stand Out Suggestions” part of the rubric, if you generated additional data for training, describe why you decided to generate additional data, how you generated the data, identify where in your code, and provide example images of the additional data)

    Since the data set already provides a validation set, I do not split the training data for validation.

    • Number of training examples = 34799
    • Number of validation examples = 4410
    • Number of testing examples = 12630

    3. Model

    Describe, and identify where in your code, what your final model architecture looks like including model type, layers, layer sizes, connectivity, etc.) Consider including a diagram and/or table describing the final model.

    I use a model similar to AlexNet, with fewer kernels in the convolutional layers and fewer nodes in the hidden layers, because the input size for this problem is smaller.

    finalGraph
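
    The nn() function that builds this graph is not included in the excerpt below. The following is a minimal, illustrative sketch of an AlexNet-style network matching the signature used by the training loop; the kernel counts and hidden-layer sizes here are assumptions, not the exact values from the notebook.

    import tensorflow as tf

    def nn(learning_rate=0.005, n_classes=43):
        # placeholders for grayscale images, one-hot labels, and dropout keep probability
        x = tf.placeholder(tf.float32, (None, 32, 32, 1), name="x")
        y = tf.placeholder(tf.float32, (None, n_classes), name="y")
        keep_prob = tf.placeholder(tf.float32, name="keep_prob")

        # convolutional feature extractor (fewer kernels than AlexNet)
        conv = tf.layers.conv2d(x, 16, 3, padding="same", activation=tf.nn.relu)
        conv = tf.layers.max_pooling2d(conv, 2, 2)
        conv = tf.layers.conv2d(conv, 64, 3, padding="same", activation=tf.nn.relu)
        conv = tf.layers.max_pooling2d(conv, 2, 2)

        # fully connected classifier with dropout
        flat = tf.layers.flatten(conv)
        fc = tf.layers.dense(flat, 512, activation=tf.nn.relu)
        fc = tf.nn.dropout(fc, keep_prob)
        logits = tf.layers.dense(fc, n_classes)

        # loss and plain gradient-descent optimizer, as listed in the hyperparameters
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
        optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

        # predictions and accuracy ops
        predictions = tf.argmax(logits, 1)
        correct = tf.equal(predictions, tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

        return x, y, keep_prob, logits, optimizer, predictions, accuracy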

          # note: this snippet assumes an enclosing `with tf.Session() as sess:` block
          ## initialize the network
          x, y, keep_prob, logits, optimizer, predictions, accuracy = nn()
    
          # Saver to save the trained model
          saver = tf.train.Saver()
    
          # TensorBoard record
          train_writer = tf.summary.FileWriter("logs/train", sess.graph)  
    
          # Variable initialization
          init = tf.global_variables_initializer()
          sess.run(init)
    
          # save the acc history
          history = []
    
    
          # Record time elapsed for performance check
          last_time = time.time()
          train_start_time = time.time()
    
          # Run NB_EPOCH epochs of training
          for epoch in range(NB_EPOCH):
              generator = batchGenerator(x_train_processed, y_train_processed)
              while generator.hasNext():
                  x_, y_ = generator.next_batch(BATCH_SIZE)
                  sess.run(optimizer, feed_dict={x: x_, y: y_, keep_prob: DROPOUT_PROB})
    
              # Calculate Accuracy Training set
              train_acc = calculate_accuracy(32, accuracy, x, y, x_train_processed, y_train_processed, keep_prob, sess)
    
              # Calculate Accuracy Validation set
              valid_acc = calculate_accuracy(32, accuracy, x, y, x_valid_processed, y_valid_processed, keep_prob, sess)
    
              # Record and report train/validation/test accuracies for this epoch
              history.append((train_acc, valid_acc))
    
              # Print log
              if (epoch+1) % 10 == 0 or epoch == 0 or (epoch+1) == NB_EPOCH:
                  print('Epoch %d -- Train acc.: %.4f, valid. acc.: %.4f, used: %.2f sec' %\
                      (epoch+1, train_acc, valid_acc, time.time() - last_time))
                  last_time = time.time()
    
          total_time = time.time() - train_start_time
          print('Training time: %.2f sec (%.2f min)' % (total_time, total_time/60))
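
    The calculate_accuracy helper called above is defined elsewhere in the notebook. A minimal sketch, assuming its arguments are (evaluation batch size, accuracy op, placeholders, data, keep_prob placeholder, session) as suggested by the call sites:

    def calculate_accuracy(batch_size, accuracy, x, y, data_x, data_y, keep_prob, sess):
        total, n = 0.0, len(data_x)
        for start in range(0, n, batch_size):
            end = start + batch_size
            batch_x, batch_y = data_x[start:end], data_y[start:end]
            # dropout disabled at evaluation time (keep_prob = 1.0)
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y, keep_prob: 1.0})
            total += acc * len(batch_x)
        return total / n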

    Training

    I create a batchGenerator class to manage batch training, which keeps the training function cleaner. It also has a shuffle function that randomizes the data set.

    hyperparameters
    • learning rate 0.005
    • dropout rate 0.5
    • total epochs 40
    • batch size 128
    • optimizer: gradient descent algorithm
    • early stop patience 3 # how many epochs to watch for early stopping
    • early stop min_delta 0.02 # minimum threshold for the change in accuracy
    (a sketch of how these early-stopping parameters might be applied appears after the batch generator code below)
    class batchGenerator:
        def __init__(self, x, y, shuffle= True):
            self.dataX = x
            self.dataY = y
            self.totalData = len(self.dataX)
            if shuffle:
                self.shuffle()
    
        def printLog(self):
            # print progress as "processed/total"
            if len(self.dataX):
                print(str(self.totalData - len(self.dataX)) + "/" + str(self.totalData), end='\r')
            else:
                print(str(self.totalData - len(self.dataX)) + "/" + str(self.totalData))
    
        def shuffle(self):
            newOrder = np.arange(len(self.dataX))
            np.random.shuffle(newOrder)
            self.dataX = self.dataX[newOrder]
            self.dataY = self.dataY[newOrder]
    
        def hasNext(self):
            return len(self.dataX)>0
    
        def next_batch(self,size):
            if(len(self.dataX) < size):
                size = len(self.dataX)
            tempX = self.dataX[0: size]
            self.dataX = self.dataX[size:]
            tempY = self.dataY[0: size]
            self.dataY = self.dataY[size:]
    
            return np.array(tempX), np.array(tempY)
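
    The early-stopping parameters listed with the hyperparameters above are not applied in the excerpted training loop. Below is a minimal sketch of how they could be used with the recorded history of (train_acc, valid_acc) tuples; the helper name and exact stopping rule are assumptions.

    EARLY_STOP_PATIENCE = 3
    EARLY_STOP_MIN_DELTA = 0.02

    def should_stop(history, patience=EARLY_STOP_PATIENCE, min_delta=EARLY_STOP_MIN_DELTA):
        # history is a list of (train_acc, valid_acc) tuples, one entry per epoch
        if len(history) <= patience:
            return False
        recent_valid = [valid for _, valid in history[-(patience + 1):]]
        # stop when validation accuracy has not improved by at least min_delta
        # over the last `patience` epochs
        return (max(recent_valid[1:]) - recent_valid[0]) < min_delta

    # inside the epoch loop, after history.append((train_acc, valid_acc)):
    #     if should_stop(history):
    #         break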

    Training process

    accResult The chart above shows the training and validation accuracy over the 40 training epochs.

    The final model accuracies were:

    • training set accuracy of 0.9769
    • validation set accuracy of 0.9395
    • test set accuracy of 0.9375

    The code for calculating the accuracy of the model is located in the IPython notebook.


    I also tried a different model:

    • Conv 3x3x16, strides 1x1
    • Conv 5x5x64, strides 2x2
    • Conv 3x3x128, strides 1x1
    • FC 4096
    • FC 1024
    • FC 43 (note: no normalization applied)

    failGraph failResult

    This model needs a longer time to reach 0.8 accuracy, which can be considered an inefficient design. The major mistake in this model was not applying batch normalization during training, which makes it take longer to train and does not fully exploit the nonlinearity of ReLU.


    I also tried training a model with Keras. I used a smaller model, and it gave a pretty good result without data augmentation, ~93% (test set prediction accuracy 0.933096). code kerasGraph accKeras

    Test a Model on New Images

    Describe how certain the model is when predicting on each of the five new images by looking at the softmax probabilities for each prediction and identify where in your code softmax probabilities were outputted. Provide the top 5 softmax probabilities for each image along with the sign type of each probability.

    1. Six new German traffic signs

    Here are six German traffic signs that I found on the web: newImgs

    The fifth image is much more difficult because it does not contain only a single sign, which makes it harder for the model to classify it into one class.

    The others have different lighting compared to the data set; since they are completely new images, they are a challenge for the model.

    Predictions result

    Here are the results of the prediction:

    Image                | Prediction
    Roundabout mandatory | Roundabout mandatory
    Ahead only           | Ahead only
    Yield                | Yield
    Speed limit (30km/h) | Speed limit (30km/h)
    Road work            | Road work
    General caution      | Bicycles crossing

    The model was able to correctly guess 5 of the 6 traffic signs, which gives an accuracy of 83.33 %.

    Visualization of softmax predictions

    "newImgResult"

    For the sample image “General caution”, the model seems to predict a completely wrong class.

    Class                | Softmax
    Bicycles crossing    | 84.8%
    Bumpy road           | 13.9%
    Children crossing    | 0.3%
    General caution      | 0.3%
    Speed limit (30km/h) | 0.3%

    It seems the model does not handle images well when the sign does not fill the frame exactly. The General caution sample image has a small additional sign below the General caution sign, which might be the reason the model misclassifies it.

    For the other sample images, the model seems to predict well; all of them have a softmax value that dominates the other classes.
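
    A minimal sketch of how the top-5 softmax probabilities for the new images could be obtained from the trained graph; x_new_processed is an assumed name for the six preprocessed web images, not a variable shown in this excerpt.

    softmax = tf.nn.softmax(logits)
    top5 = tf.nn.top_k(softmax, k=5)

    probs, classes = sess.run(top5, feed_dict={x: x_new_processed, keep_prob: 1.0})
    for img_id in range(len(probs)):
        print("image", img_id)
        for p, c in zip(probs[img_id], classes[img_id]):
            print("  class %d: %.1f%%" % (c, p * 100))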

    Visit original content creator repository https://github.com/hiit-tabata/German-Traffic-Sign-Classification
  • DiztinGUIsh

    DiztinGUIsh (“Diz”)

    Build Status

    A Super NES ROM Disassembler and tracelog capture/analysis tool with a focus on collaborative workflow UX. Exports .asm files ready to be compiled back into the original binary. Written in Winforms/C#.

    Diz tools suite: image

    Official support channel is #diztinguish in the https://sneslab.net/ discord


    Features

    Main features

    Disassembling programs (like SNES games) for some CPU architectures (like the SNES’s 65816) is a pain because you have to know a lot of information about the program at the point where it’s running. Diz is designed to make this less of a nightmare.

    Demo of basic disassembling: ezgif com-gif-maker

    View more docs here: https://github.com/IsoFrieze/DiztinGUIsh/blob/master/Diz.App.Winforms/dist/docs/HELP.md


    Realtime tracelog capturing: We provide a tight integration with a custom BSNES build to capture CPU tracelog data over a socket connection. You don’t have to play the game at 2FPS anymore, or deal with wrangling gigabyte-sized tracelog files. Simply hit ‘capture’ and Diz will talk directly to a running BSNES CPU, capturing data for as long as you like. Turn the ROM visualizer on and watch this process in realtime.

    ezgif com-gif-maker image

    For more details, visit the Tracelog capturing tutorial

    Other useful features

    • Tracelog file import support for Bizhawk and BSNES (record where the CPU is executing and what flags are set)
    • BSNES usage map import / Bizhawk CDL import (record which sections of ROM are code vs data)
    • Annotation of ROM and RAM addresses, labels, and comments. These are exported in the assembly output for humans
    • Merge-friendly XML based file format. Save your project file with a .dizraw extension (~1.5MB), and the uncompressed XML is easy to share, collaborate, and merge with other people easily. Great for group aggregration projects or building a database from various sources of info laying around the internet. Re-export the assembly and generate code with everyone’s collective efforts stored in one place. Say goodbye to search+replace for adding labels and variable names all over the place.
    • ROM visualizer, view which parts of the ROM you’ve marked as code vs data, and see visual progress.
    • C# .NET WinForms app, easy to add features to. Write your own plugins or use our plumbing or GUI as a base for your own tools.

    NOTE: Works fine with stock asar, though there is a bugfix you may want.

    Details

    Doesn’t this already exist?

    There is at least one 65C816 disassembler out there already. The biggest issue with it (not with that program, but with disassembling 65C816 in general) is that some instructions assemble to different sizes depending on context. This makes it difficult to automate.

    A ROM contains two broad categories of stuff in it: code and data. A perfect disassembler would isolate the code, disassemble it, and leave the data as it is (or maybe neatly format it). Differentiating data from code is already kinda hard, especially if the size of the data isn’t explicitly stated. A perfect program would need context to do its job. Turns out that keeping track of all memory and providing context for these situations is pretty much emulation. Some emulators have code/data loggers (CDLs) that mark every executed byte as an instruction and every read byte as data for the purpose of disassembly. A naive approach to disassembling, then, would be to disassemble everything as code, then leave it up to a person to go back and mark the data manually. Disassembling code is the most tedious part, so this isn’t a bad approach.

    In the 65C816 instruction set, several instructions assemble to different lengths depending on whether or not a bit is currently set or reset in the processor flag P register. For example, the sequence C9 00 F0 48 could be CMP.W #$F000 : PHA or CMP.B #$00 : BEQ +72 depending on if the accumulator size flag M is 0 or 1. You could guess, but if you’re wrong, the next however many instructions may be incorrect due to treating operands (#$F0) as opcodes (BEQ). This is known as desynching. So now you need context just to be able to disassemble code too.
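
    To make the ambiguity concrete, here is a small illustrative Python sketch (not Diz’s decoder, which is written in C#) that decodes the same four bytes under both assumptions about the accumulator-size flag M:

    seq = [0xC9, 0x00, 0xF0, 0x48]

    def decode(m_flag):
        if m_flag == 0:
            # 16-bit accumulator: CMP #imm16 takes 3 bytes (opcode + little-endian word)
            imm = seq[1] | (seq[2] << 8)
            return "CMP.W #$%04X : PHA" % imm          # 0x48 is PHA
        else:
            # 8-bit accumulator: CMP #imm8 takes 2 bytes, so 0xF0 0x48 becomes BEQ +72
            return "CMP.B #$%02X : BEQ +%d" % (seq[1], seq[3])

    print(decode(0))  # CMP.W #$F000 : PHA
    print(decode(1))  # CMP.B #$00 : BEQ +72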

    Now for the most part, you can get away with just disassembling instructions as you hit them, following jumps and branches, and only keeping track of the M and X flags to make sure the special instructions are disassembled properly. But more likely than not there will be some jump instructions that depend on values in RAM. Keeping track of all RAM just to get those effective addresses would be silly–again, it would basically be emulation at that point. You’ll need to manually determine the set of jumps possible, and start new disassembly points from each of those values. Don’t forget to carry over those M and X flags!

    Things get more complicated if you want to determine the effective address of an instruction. Instructions like LDA.L $038CDA,X have the effective address right in the instruction ($038CDA). But most instructions look something like STA.B $03. The full effective address needs to be deduced from the data bank and direct page registers. Better keep track of those too!

    So to take all of this into consideration, DiztinGUIsh tries to make the manual parts of accurate disassembly as speedy as possible, while still automating the easy parts. The goal of this software is to produce an accurate, clean disassembly of SNES games in a timely manner. Any time things look dicey, or if it becomes impossible to disassemble something accurately, the program will pause and wait for more input. Of course, there are options to go ham and just ignore warnings, but proceed at your own risk!

    Features

    Implemented or currently in progress:

    • Manual and Auto Stepping
    • Stepping into a branch or call
    • Goto effective address
    • Goto first or nearby unreached data
    • Marking data types (plain data, graphics, pointers, etc.)
    • Tracking M & X flags, Data Bank & Direct Page registers
    • Producing a customizable output file that assembles with asar

    Planned stuff:

    • SPC700 & SuperFX architectures
    • Merging multiple project files together
    • Better labelling of effective addresses that aren’t ROM
    • Programmable data viewer to locate graphics easily
    • Setting a “base” per instruction for relocatable code
    • Option to put large data blocks into separate .bin files instead of in the .asm
    • Scripting engine & API

    “Distinguish” but with a ‘z’ because it’s rad. It’s also a GUI application so might as well highlight that fact.

    Other tips

    • On Win11, if you have DPI or screen issues (fonts messed up or bunched up or too small): Right click on Diztinguish.exe, Compatibility, change high dpi settings, Override high DPI scaling behavior and select Scaling performed by System.
    Visit original content creator repository https://github.com/IsoFrieze/DiztinGUIsh
  • pycep

    PyCEP

    Consulta CEPs em vários serviços (Correios, ViaCep, OpenCep) de maneira totalmente assíncrona

    Comece por aqui

    Nesta seção você encontrará instruções de como instalar o pacote e também encontrará exemplos de uso

    Requerimentos

    Esse projeto é compatível com as versões 3.10, 3.11 e 3.12 do python no momento. A compatibilização com versões anteriores está prevista, e qualquer contribuição é bem vinda.

    Instalação

    PIP

    pip install pycep
    
    Poetry

    poetry add pycep
    

    Making a query

    Keep in mind that the library returns the result from whichever service responds fastest.

    from pycep import Cep
    
    cep = Cep("75140070")

    Accessing the query data

    You can use the attributes listed below to access the Cep data:

    from pycep import Cep
    
    cep = Cep("75140070")
    
    print(cep.number) # 75140070
    print(cep.state) # GO
    print(cep.city) # Anápolis
    print(cep.street) # Rua Senador Mardocheu Diniz
    print(cep.district) # Dom Pedro II
    print(cep.query_service) #CorreiosService
    print(cep.status) # query_done

    You can also convert the data to a dict

    from pycep import Cep
    
    cep = Cep("75140070")
    print(dict(cep))
    
    {
     'street': 'Rua Senador Mardocheu Diniz',
     'district': 'Dom Pedro II',
     'city': 'Anápolis',
     'state': 'GO',
     'cep': '75140070',
     'provider': 'CorreiosService'
     }

    This project uses

    • HttpX – Default adapter for HTTP requests
    • AioHTTP – Alternative adapter for HTTP requests
    • Poetry – Dependency management and publishing
    • Pytest – Automated tests

    Author

    • Erick Duarte – Initial implementation – erickod

    License

    The project is available under the MIT license – see the LICENSE.md file for more details.

    Visit original content creator repository
    https://github.com/erickod/pycep

  • Scraping

    Visit original content creator repository
    https://github.com/silventesa/Scraping

  • Airbnb_Analysis

    Airbnb_Analysis

    Problem Statement

    This project involves the analysis of Airbnb data using MongoDB Atlas, focusing on data cleaning, geospatial visualization, and dynamic plotting. The primary goals are to establish a MongoDB connection, prepare the data, develop a Streamlit web application with interactive maps, perform price analysis, explore availability patterns, investigate location-based insights, and create a comprehensive dashboard. The key objectives include:

    1. MongoDB Data Retrieval: Connect to MongoDB Atlas, retrieve the Airbnb dataset, and ensure efficient data extraction for analysis.

    2. Data Cleaning and Preparation: Clean and preprocess the dataset, addressing issues such as missing values, duplicates, and data type conversions for accurate analysis.

    3. Interactive Web Application: Develop a Streamlit web application featuring interactive maps that display the distribution of Airbnb listings. Users can explore prices, ratings, and other relevant factors.

    4. Price Analysis and Visualization: Conduct price analysis and visualize variations based on location, property type, and seasons using dynamic plots and charts.

    5. Availability Pattern Analysis: Analyze availability patterns across seasons, visualizing occupancy rates and demand fluctuations through suitable visualizations.

    6. Location-Based Insights: Investigate location-based insights by extracting and visualizing data for specific regions or neighborhoods.

    7. Interactive Visualizations: Create interactive visualizations that allow users to filter and drill down into the data, gaining deeper insights.

    8. Comprehensive Dashboard: Build a comprehensive dashboard using tools like Tableau or Power BI, combining various visualizations to present key insights derived from the analysis.

    In summary, this project aims to leverage MongoDB Atlas and Streamlit to analyze Airbnb data, providing valuable insights into pricing, availability, and location-based trends. The ultimate goal is to create an interactive and informative dashboard that facilitates data exploration and decision-making for Airbnb hosts and users.

    Aim

    The primary aim of this project is to analyze Airbnb data effectively, utilizing MongoDB Atlas for data storage and retrieval. Key objectives include data cleaning, development of interactive geospatial visualizations, and the creation of dynamic plots to uncover insights regarding pricing variations, availability patterns, and location-based trends.
    The project’s specific goals are:

    • Establish a robust connection to MongoDB Atlas and retrieve the Airbnb dataset efficiently.

    • Perform comprehensive data cleaning and preparation, addressing issues like missing data, duplicates, and data type conversions for accurate analysis.

    • Develop an engaging Streamlit web application that features interactive maps, enabling users to explore Airbnb listing distribution, including prices, ratings, and other relevant attributes.

    • Conduct detailed price analysis and visualization, uncovering insights related to location, property types, and seasonal variations. Dynamic plots and charts will be utilized for clear presentation.

    • Analyze availability patterns across different seasons, visualizing occupancy rates and demand fluctuations using appropriate visualizations.

    • Investigate location-specific insights by extracting and visualizing data for particular regions or neighborhoods, enhancing geographical understanding.

    • Create interactive visualizations that empower users to filter and delve deeper into the data, facilitating a more personalized exploration.

    • Construct a comprehensive and informative dashboard, leveraging tools like Tableau or Power BI. This dashboard will consolidate various visualizations and key findings, offering a holistic view of the Airbnb data analysis.

    Requirements

    1. MongoDB Atlas Setup: Establish a connection to MongoDB Atlas, configure the database environment, and ensure seamless data retrieval.

    2. Data Retrieval: Retrieve the Airbnb dataset from MongoDB Atlas, ensuring efficient and optimized data extraction.

    3. Data Cleaning and Preparation: Implement data cleaning procedures to handle missing values, duplicates, and perform necessary data type conversions. Prepare the dataset for accurate analysis.

    4. Streamlit Web Application: Develop a Streamlit web application that includes interactive maps. The application should allow users to explore the distribution of Airbnb listings, including details such as prices, ratings, and other relevant factors.

    5. Price Analysis: Perform in-depth price analysis and visualization. Explore price variations based on location, property type, and seasons. Create dynamic plots and charts to present these insights.

    6. Availability Pattern Analysis: Analyze availability patterns across different seasons. Visualize occupancy rates and fluctuations in demand using appropriate visualizations.

    7. Location-Based Insights: Investigate location-based insights by extracting data for specific regions or neighborhoods. Visualize this data to provide location-specific information.

    8. Interactive Visualizations: Create interactive visualizations that empower users to filter and drill down into the data, enabling deeper exploration.

    9. Comprehensive Dashboard: Develop a comprehensive dashboard using tools such as Tableau or Power BI. The dashboard should combine various visualizations and insights derived from the analysis to present a holistic view of the data.

    Workflow

    Workflow for Airbnb Data Analysis Project:

    1. Data Retrieval and MongoDB Connection:

      • Establish a connection to MongoDB Atlas.
      • Retrieve the Airbnb dataset efficiently.
    2. Data Cleaning and Preparation:

      • Identify and handle missing values, ensuring data completeness.
      • Address duplicates in the dataset.
      • Perform necessary data type conversions for accurate analysis.
    3. Streamlit Web Application Development:

      • Create a Streamlit web application to provide an interactive interface for users.
      • Incorporate interactive maps to visualize the distribution of Airbnb listings.
      • Enable users to explore pricing information, ratings, and other relevant factors within the application.
    4. Price Analysis and Visualization:

      • Utilize dynamic plots and charts to conduct price analysis.
      • Explore pricing variations based on location, property types, and seasonal trends.
      • Visualize insights related to price dynamics for enhanced understanding.
    5. Availability Patterns Analysis:

      • Investigate availability patterns across different seasons.
      • Create visualizations to showcase occupancy rates and demand fluctuations.
      • Use suitable visualizations to present availability insights effectively.
    6. Location-Based Insights:

      • Extract and visualize data for specific regions or neighborhoods.
      • Provide location-specific insights to enhance geographical understanding.
    7. Interactive Visualizations:

      • Develop interactive visualizations that allow users to filter and drill down into the data.
      • Enable users to personalize their exploration and extract specific insights of interest.
    8. Comprehensive Dashboard Creation:

      • Build a comprehensive dashboard using tools like Tableau or Power BI.
      • Combine various visualizations, including price analysis, availability patterns, and location-based insights, into a single informative dashboard.
      • Present key findings and trends from the analysis in an accessible and consolidated format.

    By following this workflow, the project aims to leverage MongoDB Atlas and advanced visualization techniques to gain valuable insights into Airbnb data, benefiting both hosts and travelers in the vacation rental market.
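
    As a concrete illustration of the first two workflow steps, the sketch below connects to MongoDB Atlas, pulls a subset of fields, and applies basic cleaning. The connection string and the sample_airbnb.listingsAndReviews database/collection names are assumptions (they match Atlas’s sample data set), not necessarily this project’s actual configuration:

    import pandas as pd
    from pymongo import MongoClient

    # connect to the Atlas cluster (placeholder credentials)
    client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")
    collection = client["sample_airbnb"]["listingsAndReviews"]

    # retrieve only the fields needed for the analysis
    fields = {"_id": 1, "name": 1, "price": 1, "property_type": 1,
              "address.market": 1, "address.location.coordinates": 1,
              "availability.availability_365": 1,
              "review_scores.review_scores_rating": 1}
    df = pd.DataFrame(list(collection.find({}, fields)))

    # basic cleaning: drop duplicates and convert Decimal128 prices to float
    df = df.drop_duplicates(subset="_id")
    df["price"] = df["price"].astype(str).astype(float)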

    Conclusion

    In conclusion, this project successfully harnessed the power of data analysis and visualization techniques to extract valuable insights from Airbnb data. Through the establishment of a MongoDB connection and meticulous data cleaning, the foundation for accurate analysis was laid. The development of a user-friendly Streamlit web application empowered users to explore Airbnb listings with ease, and interactive geospatial visualizations provided a comprehensive view of pricing, ratings, and other crucial factors.

    Price analysis and visualization revealed intricate patterns based on location, property type, and seasons, enabling informed decision-making. Analysis of availability patterns shed light on occupancy rates and demand fluctuations, contributing to a better understanding of the market dynamics.

    Location-based insights extracted and visualized data for specific regions, offering a localized perspective on Airbnb trends. The creation of interactive visualizations allowed users to tailor their exploration and extract specific details from the dataset.

    The project’s pinnacle achievement was the construction of a comprehensive dashboard using Tableau or Power BI, consolidating various visualizations into a unified platform. This dashboard served as a valuable resource for presenting key findings and trends, facilitating data-driven decision-making for hosts and travelers in the vacation rental market.

    Ultimately, this project exemplified the power of data analysis and visualization in uncovering meaningful insights within a dynamic and ever-evolving market like Airbnb.

    Visit original content creator repository
    https://github.com/Go7bi/Airbnb_Analysis

  • Line-Follower-Car-Robot

    Line Follower Car Robot


    video.mp4


    A simple line-following robot built using an Arduino Uno, L298N motor driver, and IR sensors. The robot follows a black line on a white surface.

    Components

    • Arduino Uno
    • L298N motor driver
    • 4 TT motors with wheels
    • 2 IR sensors
    • Battery pack
    • Jumper wires

    How It Works

    The Line-Follower Car Robot uses infrared (IR) sensors to detect and follow a black line on a white surface. Here’s a brief explanation of the working mechanism:

    1. Sensors Detection: The IR sensors are placed at the front of the robot. These sensors detect the black line by measuring the reflected infrared light. When a sensor is over the black line, it detects a lower amount of reflected light and sends a LOW signal to the Arduino. When the sensor is over the white surface, it detects a higher amount of reflected light and sends a HIGH signal.

    2. Signal Processing: The Arduino processes these signals to determine the position of the line relative to the robot. If both sensors detect the white surface, the robot moves forward. If the left sensor detects the black line, the robot turns left. If the right sensor detects the black line, the robot turns right.

    3. Motor Control: Based on the processed signals, the Arduino sends commands to the L298N motor driver to control the motors. This allows the robot to adjust its direction and follow the line accurately.

    Creators

    • Fares Mohamed Elshahat Mahmoud
    • Amr Mohy Mohamed Yousef
    • Asmaa Mohamed Hamed Ibrahim
    • Ashrakat Samaha Elsayed Goda
    • Asmaa Mohamed Mohamed Elsayed
    • Gehad Basiouny Elsayed Basiouny

    Visit original content creator repository
    https://github.com/FaresM0hamed/Line-Follower-Car-Robot

  • tour-planning

    Open in GitHub Codespaces

    Tour Planning

    A demonstration of using hard and soft constraints on the Leap™ quantum-classical hybrid constrained quadratic model (CQM) solver.

    This example solves a problem of selecting, for a tour divided into several legs of varying lengths and steepness, a combination of locomotion modes (walking, cycling, bussing, and driving) such that one can gain the greatest benefit of outdoor exercise while not exceeding one’s budgeted cost and time.

    Example Solution

    The techniques used in this example are applicable to commercial problems such as traffic routing—selecting the optimal among available means of transportation, for commuters or deliveries, given constraints of pricing, speed, convenience, and green-energy preferences—or network routing, where the routing of data packets must consider bandwidth, pricing, reliance, service tiers, and latency across numerous hops.

    Hard and Soft Constraints

    Constraints for optimization problems are often categorized as either “hard” or “soft”.

    Any hard constraint must be satisfied for a solution of the problem to qualify as feasible. Soft constraints may be violated to achieve an overall good solution.

    By setting appropriate weights to soft constraints in comparison to the objective and to other soft constraints, you can express the relative importance of such constraints. Soft constraints on binary variables can be one of two types:

    • linear: the penalty for violating such a constraint is proportional to the value of the violation (i.e., by how much the constraint is violated).
    • quadratic: the penalty for violating such a constraint is proportional to the square of the value of the violation.

    For example, for a soft constraint on the tour’s maximum cost, with a price of 3 for driving, preferring to drive over free locomotion on a leg of length 2 adds a penalty of 6 or 36, for a linear or quadratic constraint, respectively, that goes up, for a leg length of 3, to 9 and 81, respectively. Such a quadratic constraint severely discourages driving on longer legs.

    This example enables you to set hard or soft constraints on the tour’s cost, its duration, and the steepest leg one can walk or cycle. The CQM has hard constraints that ensure a single mode of locomotion is selected for each leg and, optionally, prevent driving on legs with toll booths.

    Example Results

    Some of the variety of results you can obtain from the application of these constraints is shown below for an example tour [1].

    1. All constraints are hard. For this case, acceptable solutions must satisfy all constraints, and the solver was unable to find a feasible solution.

    Example All Hard Constraints

    2. Constraints on cost and time are relaxed to soft constraints. The solver tries to satisfy such constraints but accepts solutions that violate one or more. Now the solver returns a solution. However, it provides little exercise because cycling is not allowed on legs even slightly steeper than the configured maximum.

    Example Slope Hard Constraint

    3. Constraint on slope is also relaxed. Now the returned solution is to cycle on all but the steepest slopes, gaining exercise by tolerating a wide margin of violations of the slope constraint.

    Example All Soft Linear Constraints

    4. Soft constraint on slope is set to quadratic. Now the solver discriminates sharply between slopes that are just a bit over the configured maximum and those significantly too steep. The returned solution allows for cycling on legs that violate the slope constraint by a narrow margin.

    Example Slope Quadratic Constraint

    In general, the use of soft constraints can result in imperfect but good solutions to many optimization problems: for example, in three-dimensional bin packing, which addresses problems in areas such as containers, pallets and aircraft, boxes should be fully supported to ensure stability; however, satisfying such a hard constraint might not be possible due to a variety of box sizes or bin size. Using a soft constraint that enables solutions with 70% support might be acceptable. Another example is job shop scheduling, where jobs should complete on time. If this constraint cannot be met due to conflicting constraints, a soft constraint that penalizes delays by length might return good solutions.

    Installation

    You can run this example without installation in cloud-based IDEs that support the Development Containers specification (aka “devcontainers”).

    For development environments that do not support devcontainers, install requirements:

    pip install -r requirements.txt
    

    If you are cloning the repo to your local system, working in a virtual environment is recommended.

    Usage

    Your development environment should be configured to access Leap’s Solvers. You can see information about supported IDEs and authorizing access to your Leap account here.

    To run the demo:

    python app.py

    Access the user interface with your browser at http://127.0.0.1:8050/.

    The demo program opens an interface where you can configure tour problems, submit these problems to a CQM solver, and examine the results.

    Hover over an input field to see a description of the input and its range of supported values.

    Configuring the Tour

    The upper-left section of the user interface lets you configure the tour’s legs: how many, how long, and the maximum elevation gain. Additionally, you can configure your budgets of cost and time for the entire tour: modes of locomotion vary in price and speed. For example, walking is free but slower than driving. Finally, you can choose whether or not to add tollbooths randomly to 20% of the legs.

    Leg lengths are set to a uniform random value between your configured minimum and maximum values. Steepness is set uniformly at random between zero and ten.

    A leg’s steepness affects exercising: a constraint is set to discourage (soft constraint) or disallow (hard constraint) walking or cycling on those legs that exceed the maximum slope you configured.

    When you update a tour’s legs, toll booths may be placed at random on some of the legs (each leg has a 20% probability that it is given a tollbooth). These affect driving in a private car (but not bussing): the generated CQM has a hard constraint to not drive on legs with toll booths. This constraint is optional.

    Configuring the Constraints

    The upper-middle section of the user interface lets you tune the constraints on cost, time, and steepness.

    You can select whether to use hard or soft constraints, and for soft constraints, you can set weights and choose between linear or quadratic penalties.

    Submitting the Problem for Solution

    The upper-right section of the user interface lets you submit your problem to a Leap hybrid CQM solver. The default solver runtime of 5 seconds is used unless you choose to increase it.

    Problem Details and Solutions

    The lower section’s following tabs contain information about the problem and any found solutions.

    • Graph: displays the configured problem and any found solutions in three ways:

      • Space: displays relative leg lengths, steepness as a color heatmap, and toll booths as icons above the colored bar representing the tour. Modes of locomotion for the best solution found are displayed as icons below it.
      • Time: displays relative leg duration and, for the best found solution, the cost per leg as a color heatmap.
      • Feasibility: displays feasible and non-feasible solutions in a three-dimensional plot of exercise, cost, and time.
    • Problem: displays the legs of the tour (length, slope, and toll booths), formatted for reading and for copying into your code.

    • Solutions: displays the returned solutions, formatted for reading and as a dimod sampleset for copying into your code.

    • CQM: displays the constrained quadratic model generated for your configured tour and constraints. A good way to learn about the construction of a CQM, is to begin with a minimal problem (a single mode of locomotion, one leg, no tollbooths), study the simple CQM, and watch it change as you increase the problem’s complexity.

    • Locomotion: contains information about your configured tour, such as the minimum, maximum, and average values of cost and time, and the values for the available modes of locomotion (speed, cost, exercise) that you can configure.

    Model Overview

    The problem of selecting a mode of locomotion for every leg of the tour to achieve some objective (maximize exercise) given a number of constraints (e.g., do not overpay) can be modeled as an optimization problem with decisions that could either be true or false: for any leg, should one drive? Should one walk?

    This model uses up to four binary variables for each leg of the tour, each one representing whether one particular mode of locomotion is used or not. For example, leg number 5 might have the following binary variables and values in one solution:

    Binary Variable | Represents  | Value in a Particular Solution
    walk_5          | Walk leg 5  | False
    cycle_5         | Cycle leg 5 | True
    bus_5           | Bus leg 5   | False
    drive_5         | Drive leg 5 | False

    In the solution above, cycling is the mode of locomotion selected for leg 5.

    The CQM is built as follows with a single objective and several constraints:

    • Objective: Maximize Exercise

      To maximize exercise on the tour, the CQM objective is to minimize the negative summation of values of exercise set for each locomotion mode across all the tour’s legs.

      eq_exercise

      The terms above are as follows:

      eq_exercise_terms

      Because a single mode of locomotion is selected for each leg (as explained below), all the products but one are zeroed by the binary variables of that leg. For example, in leg 5 in the solution above, the leg length is multiplied by its slope and the exercise value of cycling because, for this leg, the binary variable representing cycling is the only non-zero variable.

    • Constraint 1: Cost

      To discourage or prevent the tour’s cost from exceeding your preferred budget, the CQM sets a constraint that the total cost over all legs is less or equal to your configured cost. It does this by minimizing the summation of leg lengths multiplied by the cost value of locomotion mode for the leg. This can be a hard or soft constraint.

      eq_cost

      eq_cost_terms

      Again, for each leg the only non-zero product has the binary variable representing the selected locomotion mode.

    • Constraint 2: Time

      To discourage or prevent the tour’s duration from exceeding your configured value, the CQM sets a constraint similar to that on cost but with the leg length divided by the value of speed for each mode of locomotion. This can be a hard or soft constraint.

      eq_time

      eq_time_terms

    • Constraint 3: Steep Legs

      To discourage or prevent the selection of exercising on legs where the slope is steeper than your configured maximum, the CQM sets a constraint that for each leg the binary variables representing walking and cycling multiplied by the slope be less or equal to your configured highest slope. This can be a hard or soft constraint.

    • Constraint 4: Single Mode of Locomotion Per Leg

      To ensure a single mode of locomotion is selected for each leg, the sum of each leg’s binary variables must equal one (a “one-hot” constraint). This is a hard constraint.

      eq_one_hot

    • Constraint 5: Toll Booths

      This optional constraint prevents driving on legs with toll booths. If you choose to enable the placement of toll booths on some legs (toll booths may be placed at random on a leg with 20% probability), the CQM sets a constraint that the binary variable representing driving be zero for any leg with a toll booth. This is a hard constraint.

      eq_toll
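
    To make this construction concrete, the following is a minimal sketch (not the repository's tour_planning.py code) of how such a CQM might be assembled with dimod. The leg data, exercise and cost values, budget, and the soft-constraint arguments (weight, penalty) are illustrative assumptions, and the soft constraint requires a dimod version with soft-constraint support.

    import dimod

    # Hypothetical tour data for illustration: (length, slope, has_toll_booth) per leg
    legs = [(2.0, 4.0, False), (2.0, 8.0, True), (2.0, 1.0, False)]
    modes = ["walk", "cycle", "bus", "drive"]
    exercise = {"walk": 1, "cycle": 2, "bus": 0, "drive": 0}   # assumed exercise values
    cost = {"walk": 0, "cycle": 2, "bus": 4, "drive": 7}       # assumed cost values
    max_cost = 20                                              # assumed configured budget

    cqm = dimod.ConstrainedQuadraticModel()

    # One binary variable per mode of locomotion per leg, e.g. "cycle_1"
    x = {(m, i): dimod.Binary(f"{m}_{i}") for i, _ in enumerate(legs) for m in modes}

    # Objective: maximize exercise by minimizing its negative
    cqm.set_objective(-dimod.quicksum(
        length * slope * exercise[m] * x[(m, i)]
        for i, (length, slope, _) in enumerate(legs) for m in modes))

    # Constraint 1 (soft, linear penalty): total cost must not exceed the budget
    cqm.add_constraint(dimod.quicksum(
        length * cost[m] * x[(m, i)]
        for i, (length, _, _) in enumerate(legs) for m in modes) <= max_cost,
        label="total cost", weight=5, penalty="linear")

    # Constraint 4 (hard): exactly one mode of locomotion per leg (one-hot)
    for i, _ in enumerate(legs):
        cqm.add_constraint(dimod.quicksum(x[(m, i)] for m in modes) == 1,
                           label=f"one mode, leg {i}")

    # Constraint 5 (hard): no driving on legs with toll booths
    for i, (_, _, has_toll) in enumerate(legs):
        if has_toll:
            cqm.add_constraint(x[("drive", i)] == 0, label=f"toll booth, leg {i}")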

    Code

    Most of the code related to configuring the CQM is in the tour_planning.py file. The remaining files mostly support the user interface.


    Note: Standard practice for submitting problems to Leap solvers is to use a dwave-system sampler; for example, you typically use LeapHybridCQMSampler for CQM problems. The code in this example uses the dwave-cloud-client, which enables finer control over communications with the Solver API (SAPI).

    If you are learning to submit problems to Leap solvers, use a dwave-system solver, with its higher level of abstraction and thus greater simplicity, as demonstrated in most of the code examples of the example collection and in the documentation.
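
    For reference, the standard dwave-system pattern mentioned in the note looks roughly like the following sketch (the cqm object, time limit, and label are illustrative):

    from dwave.system import LeapHybridCQMSampler

    sampler = LeapHybridCQMSampler()
    sampleset = sampler.sample_cqm(cqm, time_limit=5, label="Example - Tour Planning")

    # Keep only the solutions that satisfy all hard constraints
    feasible = sampleset.filter(lambda row: row.is_feasible)
    print(feasible.first.sample if len(feasible) else "No feasible solution found")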


    License

    Released under the Apache License 2.0. See LICENSE file.

    Footnotes

    1. The tour comprises 20 legs, each of length 2, with a budgeted cost of 150, a budgeted duration of 5, and a steepest leg for exercising of 2. Cycling (speed 3, cost 2) and bussing (speed 5, cost 4) are the available modes of locomotion. For soft constraints, weights are set to 5.

    Visit original content creator repository https://github.com/dwave-examples/tour-planning
  • prediction-using-netezza-in-database-analytics-functions

    Data analytics and prediction using Netezza Performance Server

    In this code pattern, we will learn how users and developers can leverage analytic algorithms to perform research or other business-related activities using Netezza Performance Server. Netezza (also known as INZA) enables data mining tasks on large data sets using the computational power and parallelization mechanisms provided by the Netezza appliance. The parallel architecture of the Netezza database environment enables high-performance computation on large data sets, making it an ideal platform for large-scale data mining applications.

    Netezza provides in-database analytics packages for mining data sets across a wide range of sizes. IBM Netezza In-Database Analytics is a data mining application that includes many of the key techniques and popular real-world algorithms used with such data sets.

    In this code pattern, we will load a Jupyter notebook on the IBM Cloud Pak for Data (CP4D) platform. The notebook walks through connecting to Netezza, using in-database analytic functions to analyze the data, and running machine learning algorithms that let you predict and forecast data. In order to access the analytical functions of Netezza, you must install the INZA module on the Netezza server. All of the analytical functions are under the INZA schema in the NZA database.

    In this code pattern, we will use an energy price dataset and analyze the data in a Jupyter notebook on the IBM Cloud Pak for Data (CP4D) platform. We will walk you through, step by step:

    1. Analyzing data using Netezza In-Database analytic functions.
    2. Creating machine learning models using Netezza In-Database machine learning algorithms.

    Flow

    Architecture

    1. User loads the Jupyter notebook into IBM Cloud Pak for Data.
    2. User connects to Netezza using the NZPY connector.
    3. User loads and analyzes data from Netezza Performance Server.
    4. Netezza creates models using in-database analytics functions.
    5. User forecasts and predicts energy price using the model.

    Included components

    • Netezza Performance Server: IBM Netezza® Performance Server for IBM Cloud Pak® for Data is an advanced data warehouse and analytics platform available both on premises and on cloud.
    • IBM Cloud Pak for Data Platform : IBM Cloud Pak® for Data is a fully-integrated data and AI platform that modernizes how businesses collect, organize and analyze data to infuse AI throughout their organizations.
    • Jupyter Notebook: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.

    Steps

    1. Clone the repo
    2. Create a new project in CP4D
    3. Add connection to Netezza server
    4. Upload data assets
    5. Load notebook to your project
    6. Install NZPY
    7. Configure NPS connection in notebook
    8. Load data to Netezza
    9. Visualize energy price data
    10. Analyze energy price data
    11. Create machine learning model using timeseries algorithm

    1. Clone the repo

    git clone https://github.com/IBM/prediction-using-netezza-in-database-analytics-functions.git
    

    2. Create a new project in CP4D

    • Log into IBM Cloud Pak for Data and create a new project by selecting Projects from the hamburger menu and clicking New Project +.

    Create new project

    Then choose Analytics project, select Create empty project, provide the project name, and click Create.

    Analytics Project

    Project details

    Project created

    3. Add connection to Netezza server

    • From the project page, select Add to project + and choose Connection

    Add connection

    • In the next screen, from the Global tab, choose NPS for pure analytics

    NPS selection

    • Fill out the connection details, test the connection, and, if it is successful, click Create.

    NOTE: For the database, you can use system for now. We will create our own database and use it in our notebook.

    connection details

    connection created

    NOTE: Save the name of the connection for later use.

    4. Upload data assets

    Upload energy_price.csv from the doc/source/data folder of the cloned repository. On the project home page, on the Assets tab, click the data icon and browse to upload the file. You will have to unzip the data locally before you upload it.

    Upload data assets

    5. Load notebook to your project

    • From the project page, click Add to project +, and select notebook from the options:

    add notebook

    • Select the From URL tab, fill in the name, provide the notebook URL below, and click Create notebook.
    https://raw.githubusercontent.com/IBM/prediction-using-netezza-in-database-analytics-functions/main/doc/source/notebooks/PredictionUsingINZAfunctions.ipynb

    6. Install NZPY

    Run the cell that contains pip install nzpy, which is the only prerequisite for this notebook; nzpy lets us connect to the server and run DDL and DML SQL statements. The cell is sketched below.

    add notebook
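
    The installation cell is simply the package install, run as a notebook shell command:

    # Notebook cell: install the nzpy driver, the notebook's only prerequisite
    !pip install nzpy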

    7. Configure NPS connection in notebook

    • Open the notebook in edit mode, and in the cell with the title Connecting to the database, provide the name of the connection that you created earlier in step 3.

    • Run that cell and the cell below it, and make sure you get the 10 database names. This confirms that we have successfully connected to our remote NPS server.

    OR

    Add the connection details directly to the notebook by replacing the values of the following variables in the connection cell.

    # Setup connection and use the credentials from the connection. Replace the following values before you start
    
    # from project_lib import Project
    # project = Project.access()
    # NPS_credentials = project.get_connection(name="NPS")
    
    ## OR
    
    username="<username>"
    password="<password>"
    host="<hostname or ip>"
    database="system"

    add notebook
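
    These values are then used to open the connection (con) that the rest of the notebook relies on. A minimal sketch, assuming the default NPS port 5480 (adjust for your deployment):

    import nzpy

    # Open the connection used by the later cells
    con = nzpy.connect(user=username, password=password, host=host,
                       port=5480, database=database)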

    8. Load data to Netezza

    We will load the energy_price.csv file into Netezza using Netezza's external table feature. First we create the table, then we load the CSV file directly into Netezza, as shown below:

    ## initialize a cursor on the nzpy connection
    cursor = con.cursor()
    ## drop the table if it already exists (use the same qualified name as the CREATE below)
    table = 'nzpy_test..energy_price'
    cursor.execute(f'drop table {table} if exists')
    
    ## create the target table
    cursor.execute('''
    CREATE TABLE nzpy_test..energy_price (
        temperature    REAL,
        pressure    REAL,
        humidity    REAL,
        wind_speed    REAL,
        precipitation    REAL,
        price    REAL,
        price_hour    TIMESTAMP
    )
    ''')
    print('Table energy_price successfully created')
    
    ## load the CSV data into Netezza through an external table
    with con.cursor() as cursor:
        cursor.execute('''
            insert into nzpy_test..energy_price
                select * from external '/project_data/data_asset/energy_price.csv'
                    using (
                        delim ','
                        remotesource 'odbc'
                        )''')
        print(f"{cursor.rowcount} rows inserted")
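
    A quick optional check that the load succeeded is to count the rows in the target table (a small sketch):

    import pandas as pd

    # Confirm the external-table load by counting rows
    pd.read_sql('select count(*) as n_rows from nzpy_test..energy_price', con)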

    9. Visualize energy price data

    In this part of the notebook, we explore the data, the data types, and the correlation of the different columns with price. You can run the cells in this part step by step. The overall graph, grouped by date, is shown below:

    updDf.groupby('DATES').sum().plot.line().legend(loc='upper left',bbox_to_anchor=(1.05, 1))

    Visualize energy data

    In the above graph, you can see how temperature, pressure, humidity, wind speed, and precipitation relate to price.

    Similarly, you can see the correlation of each individual column (temperature, pressure, humidity, wind speed, precipitation) with price as well; a small sketch for computing these correlations numerically follows the chart below.

    Visualize energy data
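
    If you prefer the numeric correlations to the plots, a sketch along the following lines could be used, assuming updDf is the dataframe built earlier in the notebook and that its column names are uppercase, as in the queries above:

    # Pairwise correlation of each weather column with PRICE (column names assumed uppercase)
    cols = ['TEMPERATURE', 'PRESSURE', 'HUMIDITY', 'WIND_SPEED', 'PRECIPITATION', 'PRICE']
    print(updDf[cols].corr()['PRICE'].sort_values(ascending=False))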

    10. Analyze energy price data

    In-database analytic functions such as summary1000 and cov let you analyze your data; they automatically give you a statistical analysis of each column. The summary1000 function returns statistics such as distinct values, average, variance, and standard deviation, as shown below:

    summaryDF = pd.read_sql("CALL nza..SUMMARY1000('intable=ENERGY_PRICE, outtable=PRICE_TEMP_ANALYSIS');", con)
    summaryAnalysisDF = pd.read_sql('select * from PRICE_TEMP_ANALYSIS', con)
    summaryAnalysisDF.head()

    Analyze energy data

    You can also call the nza..COV function to get the covariance. The code below shows the relation between the temperature and price columns.

    # cursor.execute("drop table PRICE_TEMP_ANALYSIS if exists")
    pd.read_sql("CALL nza..DROP_TABLE('PRICE_TEMP_ANALYSIS')", con)
    
    # use the Covariance function, store results in PRICE_TEMP_ANALYSIS
    pd.read_sql("CALL nza..COV('intable=ENERGY_PRICE, incolumn=TEMPERATURE;PRICE,outtable=PRICE_TEMP_ANALYSIS');", con)
    # bring the results table into the notebook - or just query it directly in Netezza
    pd.read_sql('select * from PRICE_TEMP_ANALYSIS', con)

    11. Create machine learning model using timeseries algorithm

    • First we will clean up the training data set. Since we are using a time series algorithm, the timestamp column has to be converted to date format to represent each day, and the row id is used as the unique id.
    # clean up the analysis tables
    pd.read_sql("CALL nza..DROP_TABLE('PRICE_TEMP_NEW')",con);
    # the INZA functions usually need a unique ID for each row of data; we use the internal ROWID for this
    cursor=con.cursor()
    cursor.execute("create table PRICE_TEMP_NEW as select *,DATE(PRICE_HOUR) AS DAY,ROWID as ID from ENERGY_PRICE")
    priceTempNewDf = pd.read_sql('select * from PRICE_TEMP_NEW limit 10', con)

    Clean up

    Now let's create the model using the time series algorithm by calling the nza..TIMESERIES function:

    # drop the model if it was already created. Initially you might want to comment this out
    # and run, as it throws an error if it doesn't find the model
    cursor.execute("CALL nza..DROP_MODEL('model=PRICE_TIME');")
    
    # we now call a timeseries algorithm to create a model, the model name is PRICE_TIME
    pd.read_sql("CALL nza..TIMESERIES('model=PRICE_TIME, intable=ADMIN.PRICE_TEMP_NEW, by=DAY, time=PRICE_HOUR, target=PRICE' );",con)

    Once the query execution is completed, you can check the v_nza_models table to see if the model has been created.

    # we can list our models here
    pd.read_sql("select * from v_nza_models;",con=con)

    Timeseries model

    The NZA_META_<model_name>_FORECAST table holds the forecast values: it contains one row for each time series and point in time for which a forecast has been made. The following query returns the forecasting results for the time series dataset.

    pd.read_sql("select * from NZA_META_PRICE_TIME_FORECAST;", con=con)

    forecast results

    License

    This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

    Apache Software License (ASL) FAQ

    Visit original content creator repository https://github.com/IBM/prediction-using-netezza-in-database-analytics-functions