Blog

  • German-Traffic-Sign-Classification

    Traffic Sign Recognition


    Build a Traffic Sign Recognition Project

    The goals / steps of this project are the following:

    • Load the data set (see below for links to the project data set)
    • Explore, summarize and visualize the data set
    • Design, train and test a model architecture
    • Use the model to make predictions on new images
    • Analyze the softmax probabilities of the new images
    • Summarize the results with a written report

    Data Set Summary & Exploration

    1. Provide a basic summary of the data set and identify where in your code the summary was done. In the code, the analysis should be done using python, numpy and/or pandas methods rather than hardcoding results manually.

    I use pickle to import the data into the IPython notebook. Then I use NumPy to calculate statistics for the data set.

      import numpy as np
    
      # Number of training examples
      n_train = len(X_train)
    
      # Number of testing examples.
      n_test = len(X_test)
    
      # the shape of a traffic sign image
      image_shape = X_train[0].shape
    
      # number of unique classes/labels there are in the dataset.
      n_classes = len(np.unique(y_train))
    
      print("Number of training examples =", n_train)
      print("Number of testing examples =", n_test)
      print("Image data shape =", image_shape)
      print("Number of classes =", n_classes)
    • Image data shape = (32, 32, 3)
    • Number of classes = 43

    2. Data set exploration and visualization

    Include an exploratory visualization of the dataset and identify where the code is in your code file.

    The chart below shows the label distribution, which is uneven: some classes have more data than others. label distribution

    The image below shows one image from each class. Some of them are quite dark, which makes classification more difficult. allClass

    import matplotlib.pyplot as plt
    # Visualizations will be shown in the notebook.
    %matplotlib inline
    # set the matplotlib figure size for the notebook
    plt.rcParams["figure.figsize"] = [15,18]
    
    
    fig = plt.figure(figsize=(5,5))
    f, axarr = plt.subplots(9, 5)
    # flatten the axes grid into a single 1-D array
    plts = np.reshape(axarr, -1)
    
    # display one sample from each class
    for classId in np.unique(y_train):
        thePicIndex = np.where(y_train == classId)[0]
        myplt = plts[classId]
        myplt.imshow(X_train[thePicIndex[25]])
        myplt.set_title("class " + str(classId))
    
    plt.tight_layout()

    Design and Test a Model Architecture

    Preprocessed data set

    1. Describe how, and identify where in your code, you preprocessed the image data. What techniques were chosen and why did you choose these techniques? Consider including images showing the output of each preprocessing technique. Pre-processing refers to techniques such as converting to grayscale, normalization, etc.

    I preprocess the data set in a few ways:

    1. Convert the raw images to grayscale
    2. Normalize the images to the range (0, 1)
    3. Augment the images

    The advantage of normalizing the images is that gradient descent converges faster.

    Grayscale images reduce the input size, which makes the model easier to train. Normalizing the images to the range (0, 1) helps the model learn from the data set. Since the data set is not large enough, image augmentation is necessary.

    The pre-processing parameters:

    • ANGLE_ROTATE = 25
    • TRANSLATION = 0.2
    • NB_NEW_IMAGES = 10000
    import cv2
    import numpy as np

    IMG_SIZE = 32  # image width/height (data set images are 32x32)

    def toGrayscale(rgb):
        result = np.zeros((len(rgb), 32, 32,1))
        result[...,0] = np.dot(rgb[...,:3], [0.299, 0.587, 0.114])  
        return result
    
    # normalize the images
    def normalizeGrascale(grayScaleImages):
        return grayScaleImages/255
    
    def processImages(rgbImages):
        return np.array(normalizeGrascale(toGrayscale(rgbImages)))
    
    def transformOnHot(nbClass, listClass):
        oneHot = np.zeros((len(listClass), nbClass))
        oneHot[np.arange(len(listClass)), listClass] = 1
        return np.array(oneHot)
    
    def augmenteImage(image, angle, translation):
        h, w, c = image.shape
    
        # random rotate
        angle_rotate = np.random.uniform(-angle, angle)
        rotation_mat = cv2.getRotationMatrix2D((w//2, h//2), angle_rotate, 1)
    
        img = cv2.warpAffine(image, rotation_mat, (IMG_SIZE, IMG_SIZE))
    
        # random translation
        x_offset = translation * w * np.random.uniform(-1, 1)
        y_offset = translation * h * np.random.uniform(-1, 1)
        mat = np.array([[1, 0, x_offset], [0, 1, y_offset]])
    
        # return the warped image
        return cv2.warpAffine(img, mat, (w, h))
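
    The functions above transform a single image. Below is a minimal sketch of how the NB_NEW_IMAGES additional samples might be generated from randomly chosen training images; the notebook's actual generation loop is not shown here, so treat this as an assumption.

    # pick random training images and augment them
    indices = np.random.randint(0, len(X_train), NB_NEW_IMAGES)
    X_aug = np.array([augmenteImage(X_train[i], ANGLE_ROTATE, TRANSLATION) for i in indices])
    y_aug = y_train[indices]

    # extend the training set before grayscale conversion and normalization
    X_train_extended = np.concatenate([X_train, X_aug])
    y_train_extended = np.concatenate([y_train, y_aug])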

    The image below shows the image augmentation result. ImageAugumentation

    The image below shows the grayscale version of the images above. grayScale

    2. Data set overview

    2. Describe how, and identify where in your code, you set up training, validation and testing data. How much data was in each set? Explain what techniques were used to split the data into these sets. (OPTIONAL: As described in the “Stand Out Suggestions” part of the rubric, if you generated additional data for training, describe why you decided to generate additional data, how you generated the data, identify where in your code, and provide example images of the additional data)

    Since the data set already provides a validation set, I do not split the training data for validation.

    • Number of training examples = 34799
    • Number of validation examples = 4410
    • Number of testing examples = 12630

    3. Model

    Describe, and identify where in your code, what your final model architecture looks like including model type, layers, layer sizes, connectivity, etc.) Consider including a diagram and/or table describing the final model.

    I use a model similar to AlexNet, with fewer kernels in the convolutional layers and fewer nodes in the hidden layers, because the input size for this problem is smaller.

    finalGraph
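
    The nn() function that builds this graph is not included in the excerpt below. The following is a minimal, illustrative sketch of an AlexNet-style network matching the signature used by the training loop; the kernel counts and hidden-layer sizes here are assumptions, not the exact values from the notebook.

    import tensorflow as tf

    def nn(learning_rate=0.005, n_classes=43):
        # placeholders for grayscale images, one-hot labels, and dropout keep probability
        x = tf.placeholder(tf.float32, (None, 32, 32, 1), name="x")
        y = tf.placeholder(tf.float32, (None, n_classes), name="y")
        keep_prob = tf.placeholder(tf.float32, name="keep_prob")

        # convolutional feature extractor (fewer kernels than AlexNet)
        conv = tf.layers.conv2d(x, 16, 3, padding="same", activation=tf.nn.relu)
        conv = tf.layers.max_pooling2d(conv, 2, 2)
        conv = tf.layers.conv2d(conv, 64, 3, padding="same", activation=tf.nn.relu)
        conv = tf.layers.max_pooling2d(conv, 2, 2)

        # fully connected classifier with dropout
        flat = tf.layers.flatten(conv)
        fc = tf.layers.dense(flat, 512, activation=tf.nn.relu)
        fc = tf.nn.dropout(fc, keep_prob)
        logits = tf.layers.dense(fc, n_classes)

        # loss and plain gradient-descent optimizer, as listed in the hyperparameters
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
        optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

        # predictions and accuracy ops
        predictions = tf.argmax(logits, 1)
        correct = tf.equal(predictions, tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

        return x, y, keep_prob, logits, optimizer, predictions, accuracy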

          # note: this snippet assumes an enclosing `with tf.Session() as sess:` block
          ## initialize the network
          x, y, keep_prob, logits, optimizer, predictions, accuracy = nn()
    
          # Saver to save the trained model
          saver = tf.train.Saver()
    
          # TensorBoard record
          train_writer = tf.summary.FileWriter("logs/train", sess.graph)  
    
          # Variable initialization
          init = tf.global_variables_initializer()
          sess.run(init)
    
          # save the acc history
          history = []
    
    
          # Record time elapsed for performance check
          last_time = time.time()
          train_start_time = time.time()
    
          # Run NB_EPOCH epochs of training
          for epoch in range(NB_EPOCH):
              generator = batchGenerator(x_train_processed, y_train_processed)
              while generator.hasNext():
                  x_, y_ = generator.next_batch(BATCH_SIZE)
                  sess.run(optimizer, feed_dict={x: x_, y: y_, keep_prob: DROPOUT_PROB})
    
              # Calculate Accuracy Training set
              train_acc = calculate_accuracy(32, accuracy, x, y, x_train_processed, y_train_processed, keep_prob, sess)
    
              # Calculate Accuracy Validation set
              valid_acc = calculate_accuracy(32, accuracy, x, y, x_valid_processed, y_valid_processed, keep_prob, sess)
    
              # Record and report train/validation/test accuracies for this epoch
              history.append((train_acc, valid_acc))
    
              # Print log
              if (epoch+1) % 10 == 0 or epoch == 0 or (epoch+1) == NB_EPOCH:
                  print('Epoch %d -- Train acc.: %.4f, valid. acc.: %.4f, used: %.2f sec' %\
                      (epoch+1, train_acc, valid_acc, time.time() - last_time))
                  last_time = time.time()
    
          total_time = time.time() - train_start_time
          print('Training time: %.2f sec (%.2f min)' % (total_time, total_time/60))
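
    The calculate_accuracy helper called above is defined elsewhere in the notebook. A minimal sketch, assuming its arguments are (evaluation batch size, accuracy op, placeholders, data, keep_prob placeholder, session) as suggested by the call sites:

    def calculate_accuracy(batch_size, accuracy, x, y, data_x, data_y, keep_prob, sess):
        total, n = 0.0, len(data_x)
        for start in range(0, n, batch_size):
            end = start + batch_size
            batch_x, batch_y = data_x[start:end], data_y[start:end]
            # dropout disabled at evaluation time (keep_prob = 1.0)
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y, keep_prob: 1.0})
            total += acc * len(batch_x)
        return total / n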

    Training

    I create a batchGenerator class to manage batch training, which keeps the training function cleaner. It also has a shuffle function that randomizes the data set.

    hyperparameters
    • learning rate 0.005
    • dropout rate 0.5
    • total epochs 40
    • batch size 128
    • optimizer: gradient descent algorithm
    • early stop patience 3 # how many epochs to watch for early stopping
    • early stop min_delta 0.02 # minimum threshold for the change in accuracy
    (a sketch of how these early-stopping parameters might be applied appears after the batch generator code below)
    class batchGenerator:
        def __init__(self, x, y, shuffle= True):
            self.dataX = x
            self.dataY = y
            self.totalData = len(self.dataX)
            if shuffle:
                self.shuffle()
    
        def printLog(self):
            # print progress as "processed/total"
            if len(self.dataX):
                print(str(self.totalData - len(self.dataX)) + "/" + str(self.totalData), end='\r')
            else:
                print(str(self.totalData - len(self.dataX)) + "/" + str(self.totalData))
    
        def shuffle(self):
            newOrder = np.arange(len(self.dataX))
            np.random.shuffle(newOrder)
            self.dataX = self.dataX[newOrder]
            self.dataY = self.dataY[newOrder]
    
        def hasNext(self):
            return len(self.dataX)>0
    
        def next_batch(self,size):
            if(len(self.dataX) < size):
                size = len(self.dataX)
            tempX = self.dataX[0: size]
            self.dataX = self.dataX[size:]
            tempY = self.dataY[0: size]
            self.dataY = self.dataY[size:]
    
            return np.array(tempX), np.array(tempY)
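
    The early-stopping parameters listed with the hyperparameters above are not applied in the excerpted training loop. Below is a minimal sketch of how they could be used with the recorded history of (train_acc, valid_acc) tuples; the helper name and exact stopping rule are assumptions.

    EARLY_STOP_PATIENCE = 3
    EARLY_STOP_MIN_DELTA = 0.02

    def should_stop(history, patience=EARLY_STOP_PATIENCE, min_delta=EARLY_STOP_MIN_DELTA):
        # history is a list of (train_acc, valid_acc) tuples, one entry per epoch
        if len(history) <= patience:
            return False
        recent_valid = [valid for _, valid in history[-(patience + 1):]]
        # stop when validation accuracy has not improved by at least min_delta
        # over the last `patience` epochs
        return (max(recent_valid[1:]) - recent_valid[0]) < min_delta

    # inside the epoch loop, after history.append((train_acc, valid_acc)):
    #     if should_stop(history):
    #         break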

    Training process

    accResult The chart above shows the training and validation accuracy over the 40 training epochs.

    The final model accuracies were:

    • training set accuracy of 0.9769
    • validation set accuracy of 0.9395
    • test set accuracy of 0.9375

    The code for calculating the accuracy of the model is located in the IPython notebook.


    I also tried a different model:

    • Conv 3x3x16, strides 1x1
    • Conv 5x5x64, strides 2x2
    • Conv 3x3x128, strides 1x1
    • FC 4096
    • FC 1024
    • FC 43 (note: no normalization applied)

    failGraph failResult

    This model needs a longer time to reach 0.8 accuracy, which can be considered an inefficient design. The major mistake in this model was not applying batch normalization during training, which makes it take longer to train and does not fully exploit the nonlinearity of ReLU.


    I also tried training a model with Keras. I used a smaller model, and it gave a pretty good result without data augmentation, ~93% (test set prediction accuracy 0.933096). code kerasGraph accKeras

    Test a Model on New Images

    Describe how certain the model is when predicting on each of the five new images by looking at the softmax probabilities for each prediction and identify where in your code softmax probabilities were outputted. Provide the top 5 softmax probabilities for each image along with the sign type of each probability.

    1. Six new German traffic signs

    Here are six German traffic signs that I found on the web: newImgs

    The fifth image is much more difficult because it does not contain only a single sign, which makes it harder for the model to classify it into one class.

    The others have different lighting compared to the data set; since they are completely new images, they are a challenge for the model.

    Predictions result

    Here are the results of the prediction:

    Image                | Prediction
    Roundabout mandatory | Roundabout mandatory
    Ahead only           | Ahead only
    Yield                | Yield
    Speed limit (30km/h) | Speed limit (30km/h)
    Road work            | Road work
    General caution      | Bicycles crossing

    The model was able to correctly guess 5 of the 6 traffic signs, which gives an accuracy of 83.33 %.

    Visualization of softmax predictions

    "newImgResult"

    For the sample image “General caution”, the model seems to predict a completely wrong class.

    Class                | Softmax
    Bicycles crossing    | 84.8%
    Bumpy road           | 13.9%
    Children crossing    | 0.3%
    General caution      | 0.3%
    Speed limit (30km/h) | 0.3%

    It seems the model does not handle images well when the sign does not fill the frame exactly. The General caution sample image has a small additional sign below the General caution sign, which might be the reason the model misclassifies it.

    For the other sample images, the model seems to predict well; all of them have a softmax value that dominates the other classes.
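
    A minimal sketch of how the top-5 softmax probabilities for the new images could be obtained from the trained graph; x_new_processed is an assumed name for the six preprocessed web images, not a variable shown in this excerpt.

    softmax = tf.nn.softmax(logits)
    top5 = tf.nn.top_k(softmax, k=5)

    probs, classes = sess.run(top5, feed_dict={x: x_new_processed, keep_prob: 1.0})
    for img_id in range(len(probs)):
        print("image", img_id)
        for p, c in zip(probs[img_id], classes[img_id]):
            print("  class %d: %.1f%%" % (c, p * 100))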

    Visit original content creator repository https://github.com/hiit-tabata/German-Traffic-Sign-Classification
  • DiztinGUIsh

    DiztinGUIsh (“Diz”)

    Build Status

    A Super NES ROM Disassembler and tracelog capture/analysis tool with a focus on collaborative workflow UX. Exports .asm files ready to be compiled back into the original binary. Written in Winforms/C#.

    Diz tools suite: image

    Official support channel is #diztinguish in the https://sneslab.net/ discord


    Features

    Main features

    Disassembling programs (like SNES games) for some CPU architectures (like the SNES’s 65816) is a pain because you have to know a lot of information about the program at the point where it’s running. Diz is designed to make this less of a nightmare.

    Demo of basic disassembling: ezgif com-gif-maker

    View more docs here: https://github.com/IsoFrieze/DiztinGUIsh/blob/master/Diz.App.Winforms/dist/docs/HELP.md


    Realtime tracelog capturing: We provide a tight integration with a custom BSNES build to capture CPU tracelog data over a socket connection. You don’t have to play the game at 2FPS anymore, or deal with wrangling gigabyte-sized tracelog files. Simply hit ‘capture’ and Diz will talk directly to a running BSNES CPU, capturing data for as long as you like. Turn the ROM visualizer on and watch this process in realtime.

    ezgif com-gif-maker image

    For more details, visit the Tracelog capturing tutorial

    Other useful features

    • Tracelog file import support for Bizhawk and BSNES (record where the CPU is executing and what flags are set)
    • BSNES usage map import / Bizhawk CDL import (record which sections of ROM are code vs data)
    • Annotation of ROM and RAM addresses, labels, and comments. These are exported in the assembly output for humans
    • Merge-friendly XML based file format. Save your project file with a .dizraw extension (~1.5MB), and the uncompressed XML is easy to share, collaborate, and merge with other people easily. Great for group aggregration projects or building a database from various sources of info laying around the internet. Re-export the assembly and generate code with everyone’s collective efforts stored in one place. Say goodbye to search+replace for adding labels and variable names all over the place.
    • ROM visualizer, view which parts of the ROM you’ve marked as code vs data, and see visual progress.
    • C# .NET WinForms app, easy to add features to. Write your own plugins or use our plumbing or GUI as a base for your own tools.

    NOTE: Works fine with stock asar, though there is a bugfix you may want.

    Details

    Doesn’t this already exist?

    There is at least one 65C816 disassembler out there already. The biggest issue with it (not with that program, but with disassembling 65C816 in general) is that some instructions assemble to different sizes depending on context. This makes it difficult to automate.

    A ROM contains two broad categories of stuff in it: code and data. A perfect disassembler would isolate the code, disassemble it, and leave the data as it is (or maybe neatly format it). Differentiating data from code is already kinda hard, especially if the size of the data isn’t explicitly stated. A perfect program would need context to do its job. Turns out that keeping track of all memory and providing context for these situations is pretty much emulation. Some emulators have code/data loggers (CDLs) that mark every executed byte as an instruction and every read byte as data for the purpose of disassembly. A naive approach to disassembling, then, would be to disassemble everything as code, then leave it up to a person to go back and mark the data manually. Disassembling code is the most tedious part, so this isn’t a bad approach.

    In the 65C816 instruction set, several instructions assemble to different lengths depending on whether or not a bit is currently set or reset in the processor flag P register. For example, the sequence C9 00 F0 48 could be CMP.W #$F000 : PHA or CMP.B #$00 : BEQ +72 depending on if the accumulator size flag M is 0 or 1. You could guess, but if you’re wrong, the next however many instructions may be incorrect due to treating operands (#$F0) as opcodes (BEQ). This is known as desynching. So now you need context just to be able to disassemble code too.
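
    To make the ambiguity concrete, here is a small illustrative Python sketch (not Diz’s decoder, which is written in C#) that decodes the same four bytes under both assumptions about the accumulator-size flag M:

    seq = [0xC9, 0x00, 0xF0, 0x48]

    def decode(m_flag):
        if m_flag == 0:
            # 16-bit accumulator: CMP #imm16 takes 3 bytes (opcode + little-endian word)
            imm = seq[1] | (seq[2] << 8)
            return "CMP.W #$%04X : PHA" % imm          # 0x48 is PHA
        else:
            # 8-bit accumulator: CMP #imm8 takes 2 bytes, so 0xF0 0x48 becomes BEQ +72
            return "CMP.B #$%02X : BEQ +%d" % (seq[1], seq[3])

    print(decode(0))  # CMP.W #$F000 : PHA
    print(decode(1))  # CMP.B #$00 : BEQ +72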

    Now for the most part, you can get away with just disassembling instructions as you hit them, following jumps and branches, and only keeping track of the M and X flags to make sure the special instructions are disassembled properly. But more likely than not there will be some jump instructions that depend on values in RAM. Keeping track of all RAM just to get those effective addresses would be silly–again, it would basically be emulation at that point. You’ll need to manually determine the set of jumps possible, and start new disassembly points from each of those values. Don’t forget to carry over those M and X flags!

    Things get more complicated if you want to determine the effective address of an instruction. Instructions like LDA.L $038CDA,X have the effective address right in the instruction ($038CDA). But most instructions look something like STA.B $03. The full effective address needs to be deduced from the data bank and direct page registers. Better keep track of those too!

    So to take all of this into consideration, DiztinGUIsh tries to make the manual parts of accurate disassembly as speedy as possible, while still automating the easy parts. The goal of this software is to produce an accurate, clean disassembly of SNES games in a timely manner. Any time things look dicey, or if it becomes impossible to disassemble something accurately, the program will pause and wait for more input. Of course, there are options to go ham and just ignore warnings, but proceed at your own risk!

    Features

    Implemented or currently in progress:

    • Manual and Auto Stepping
    • Stepping into a branch or call
    • Goto effective address
    • Goto first or nearby unreached data
    • Marking data types (plain data, graphics, pointers, etc.)
    • Tracking M & X flags, Data Bank & Direct Page registers
    • Producing a customizable output file that assembles with asar

    Planned stuff:

    • SPC700 & SuperFX architectures
    • Merging multiple project files together
    • Better labelling of effective addresses that aren’t ROM
    • Programmable data viewer to locate graphics easily
    • Setting a “base” per instruction for relocatable code
    • Option to put large data blocks into separate .bin files instead of in the .asm
    • Scripting engine & API

    “Distinguish” but with a ‘z’ because it’s rad. It’s also a GUI application so might as well highlight that fact.

    Other tips

    • On Win11, if you have DPI or screen issues (fonts messed up or bunched up or too small): Right click on Diztinguish.exe, Compatibility, change high dpi settings, Override high DPI scaling behavior and select Scaling performed by System.
    Visit original content creator repository https://github.com/IsoFrieze/DiztinGUIsh
  • pycep

    PyCEP

    Consulta CEPs em vários serviços (Correios, ViaCep, OpenCep) de maneira totalmente assíncrona

    Comece por aqui

    Nesta seção você encontrará instruções de como instalar o pacote e também encontrará exemplos de uso

    Requerimentos

    Esse projeto é compatível com as versões 3.10, 3.11 e 3.12 do python no momento. A compatibilização com versões anteriores está prevista, e qualquer contribuição é bem vinda.

    Instalação

    PIP

    pip install pycep
    
    Poetry

    poetry add pycep
    

    Making a query

    Keep in mind that the library returns the result from whichever service responds fastest.

    from pycep import Cep
    
    cep = Cep("75140070")

    Accessing the query data

    You can use the attributes listed below to access the Cep data:

    from pycep import Cep
    
    cep = Cep("75140070")
    
    print(cep.number) # 75140070
    print(cep.state) # GO
    print(cep.city) # Anápolis
    print(cep.street) # Rua Senador Mardocheu Diniz
    print(cep.district) # Dom Pedro II
    print(cep.query_service) #CorreiosService
    print(cep.status) # query_done

    You can also convert the data to a dict

    from pycep import Cep
    
    cep = Cep("75140070")
    print(dict(cep))
    
    {
     'street': 'Rua Senador Mardocheu Diniz',
     'district': 'Dom Pedro II',
     'city': 'Anápolis',
     'state': 'GO',
     'cep': '75140070',
     'provider': 'CorreiosService'
     }

    This project uses

    • HttpX – Default adapter for HTTP requests
    • AioHTTP – Alternative adapter for HTTP requests
    • Poetry – Dependency management and publishing
    • Pytest – Automated tests

    Author

    • Erick Duarte – Initial implementation – erickod

    License

    The project is available under the MIT license – see the LICENSE.md file for more details.

    Visit original content creator repository
    https://github.com/erickod/pycep

  • Scraping

    Visit original content creator repository
    https://github.com/silventesa/Scraping

  • Airbnb_Analysis

    Airbnb_Analysis

    Problem Statement

    This project involves the analysis of Airbnb data using MongoDB Atlas, focusing on data cleaning, geospatial visualization, and dynamic plotting. The primary goals are to establish a MongoDB connection, prepare the data, develop a Streamlit web application with interactive maps, perform price analysis, explore availability patterns, investigate location-based insights, and create a comprehensive dashboard. The key objectives include:

    1. MongoDB Data Retrieval: Connect to MongoDB Atlas, retrieve the Airbnb dataset, and ensure efficient data extraction for analysis.

    2. Data Cleaning and Preparation: Clean and preprocess the dataset, addressing issues such as missing values, duplicates, and data type conversions for accurate analysis.

    3. Interactive Web Application: Develop a Streamlit web application featuring interactive maps that display the distribution of Airbnb listings. Users can explore prices, ratings, and other relevant factors.

    4. Price Analysis and Visualization: Conduct price analysis and visualize variations based on location, property type, and seasons using dynamic plots and charts.

    5. Availability Pattern Analysis: Analyze availability patterns across seasons, visualizing occupancy rates and demand fluctuations through suitable visualizations.

    6. Location-Based Insights: Investigate location-based insights by extracting and visualizing data for specific regions or neighborhoods.

    7. Interactive Visualizations: Create interactive visualizations that allow users to filter and drill down into the data, gaining deeper insights.

    8. Comprehensive Dashboard: Build a comprehensive dashboard using tools like Tableau or Power BI, combining various visualizations to present key insights derived from the analysis.

    In summary, this project aims to leverage MongoDB Atlas and Streamlit to analyze Airbnb data, providing valuable insights into pricing, availability, and location-based trends. The ultimate goal is to create an interactive and informative dashboard that facilitates data exploration and decision-making for Airbnb hosts and users.

    Aim

    The primary aim of this project is to analyze Airbnb data effectively, utilizing MongoDB Atlas for data storage and retrieval. Key objectives include data cleaning, development of interactive geospatial visualizations, and the creation of dynamic plots to uncover insights regarding pricing variations, availability patterns, and location-based trends.
    The project’s specific goals are:

    • Establish a robust connection to MongoDB Atlas and retrieve the Airbnb dataset efficiently.

    • Perform comprehensive data cleaning and preparation, addressing issues like missing data, duplicates, and data type conversions for accurate analysis.

    • Develop an engaging Streamlit web application that features interactive maps, enabling users to explore Airbnb listing distribution, including prices, ratings, and other relevant attributes.

    • Conduct detailed price analysis and visualization, uncovering insights related to location, property types, and seasonal variations. Dynamic plots and charts will be utilized for clear presentation.

    • Analyze availability patterns across different seasons, visualizing occupancy rates and demand fluctuations using appropriate visualizations.

    • Investigate location-specific insights by extracting and visualizing data for particular regions or neighborhoods, enhancing geographical understanding.

    • Create interactive visualizations that empower users to filter and delve deeper into the data, facilitating a more personalized exploration.

    • Construct a comprehensive and informative dashboard, leveraging tools like Tableau or Power BI. This dashboard will consolidate various visualizations and key findings, offering a holistic view of the Airbnb data analysis.

    Requirements

    1. MongoDB Atlas Setup: Establish a connection to MongoDB Atlas, configure the database environment, and ensure seamless data retrieval.

    2. Data Retrieval: Retrieve the Airbnb dataset from MongoDB Atlas, ensuring efficient and optimized data extraction.

    3. Data Cleaning and Preparation: Implement data cleaning procedures to handle missing values, duplicates, and perform necessary data type conversions. Prepare the dataset for accurate analysis.

    4. Streamlit Web Application: Develop a Streamlit web application that includes interactive maps. The application should allow users to explore the distribution of Airbnb listings, including details such as prices, ratings, and other relevant factors.

    5. Price Analysis: Perform in-depth price analysis and visualization. Explore price variations based on location, property type, and seasons. Create dynamic plots and charts to present these insights.

    6. Availability Pattern Analysis: Analyze availability patterns across different seasons. Visualize occupancy rates and fluctuations in demand using appropriate visualizations.

    7. Location-Based Insights: Investigate location-based insights by extracting data for specific regions or neighborhoods. Visualize this data to provide location-specific information.

    8. Interactive Visualizations: Create interactive visualizations that empower users to filter and drill down into the data, enabling deeper exploration.

    9. Comprehensive Dashboard: Develop a comprehensive dashboard using tools such as Tableau or Power BI. The dashboard should combine various visualizations and insights derived from the analysis to present a holistic view of the data.

    Workflow

    Workflow for Airbnb Data Analysis Project:

    1. Data Retrieval and MongoDB Connection:

      • Establish a connection to MongoDB Atlas.
      • Retrieve the Airbnb dataset efficiently.
    2. Data Cleaning and Preparation:

      • Identify and handle missing values, ensuring data completeness.
      • Address duplicates in the dataset.
      • Perform necessary data type conversions for accurate analysis.
    3. Streamlit Web Application Development:

      • Create a Streamlit web application to provide an interactive interface for users.
      • Incorporate interactive maps to visualize the distribution of Airbnb listings.
      • Enable users to explore pricing information, ratings, and other relevant factors within the application.
    4. Price Analysis and Visualization:

      • Utilize dynamic plots and charts to conduct price analysis.
      • Explore pricing variations based on location, property types, and seasonal trends.
      • Visualize insights related to price dynamics for enhanced understanding.
    5. Availability Patterns Analysis:

      • Investigate availability patterns across different seasons.
      • Create visualizations to showcase occupancy rates and demand fluctuations.
      • Use suitable visualizations to present availability insights effectively.
    6. Location-Based Insights:

      • Extract and visualize data for specific regions or neighborhoods.
      • Provide location-specific insights to enhance geographical understanding.
    7. Interactive Visualizations:

      • Develop interactive visualizations that allow users to filter and drill down into the data.
      • Enable users to personalize their exploration and extract specific insights of interest.
    8. Comprehensive Dashboard Creation:

      • Build a comprehensive dashboard using tools like Tableau or Power BI.
      • Combine various visualizations, including price analysis, availability patterns, and location-based insights, into a single informative dashboard.
      • Present key findings and trends from the analysis in an accessible and consolidated format.

    By following this workflow, the project aims to leverage MongoDB Atlas and advanced visualization techniques to gain valuable insights into Airbnb data, benefiting both hosts and travelers in the vacation rental market.
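
    As a concrete illustration of the first two workflow steps, the sketch below connects to MongoDB Atlas, pulls a subset of fields, and applies basic cleaning. The connection string and the sample_airbnb.listingsAndReviews database/collection names are assumptions (they match Atlas’s sample data set), not necessarily this project’s actual configuration:

    import pandas as pd
    from pymongo import MongoClient

    # connect to the Atlas cluster (placeholder credentials)
    client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")
    collection = client["sample_airbnb"]["listingsAndReviews"]

    # retrieve only the fields needed for the analysis
    fields = {"_id": 1, "name": 1, "price": 1, "property_type": 1,
              "address.market": 1, "address.location.coordinates": 1,
              "availability.availability_365": 1,
              "review_scores.review_scores_rating": 1}
    df = pd.DataFrame(list(collection.find({}, fields)))

    # basic cleaning: drop duplicates and convert Decimal128 prices to float
    df = df.drop_duplicates(subset="_id")
    df["price"] = df["price"].astype(str).astype(float)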

    Conclusion

    In conclusion, this project successfully harnessed the power of data analysis and visualization techniques to extract valuable insights from Airbnb data. Through the establishment of a MongoDB connection and meticulous data cleaning, the foundation for accurate analysis was laid. The development of a user-friendly Streamlit web application empowered users to explore Airbnb listings with ease, and interactive geospatial visualizations provided a comprehensive view of pricing, ratings, and other crucial factors.

    Price analysis and visualization revealed intricate patterns based on location, property type, and seasons, enabling informed decision-making. Analysis of availability patterns shed light on occupancy rates and demand fluctuations, contributing to a better understanding of the market dynamics.

    Location-based insights extracted and visualized data for specific regions, offering a localized perspective on Airbnb trends. The creation of interactive visualizations allowed users to tailor their exploration and extract specific details from the dataset.

    The project’s pinnacle achievement was the construction of a comprehensive dashboard using Tableau or Power BI, consolidating various visualizations into a unified platform. This dashboard served as a valuable resource for presenting key findings and trends, facilitating data-driven decision-making for hosts and travelers in the vacation rental market.

    Ultimately, this project exemplified the power of data analysis and visualization in uncovering meaningful insights within a dynamic and ever-evolving market like Airbnb.

    Visit original content creator repository
    https://github.com/Go7bi/Airbnb_Analysis

  • Line-Follower-Car-Robot

    Line Follower Car Robot


    video.mp4


    A simple line-following robot built using an Arduino Uno, L298N motor driver, and IR sensors. The robot follows a black line on a white surface.

    Components

    • Arduino Uno
    • L298N motor driver
    • 4 TT motors with wheels
    • 2 IR sensors
    • Battery pack
    • Jumper wires

    How It Works

    The Line-Follower Car Robot uses infrared (IR) sensors to detect and follow a black line on a white surface. Here’s a brief explanation of the working mechanism:

    1. Sensors Detection: The IR sensors are placed at the front of the robot. These sensors detect the black line by measuring the reflected infrared light. When a sensor is over the black line, it detects a lower amount of reflected light and sends a LOW signal to the Arduino. When the sensor is over the white surface, it detects a higher amount of reflected light and sends a HIGH signal.

    2. Signal Processing: The Arduino processes these signals to determine the position of the line relative to the robot. If both sensors detect the white surface, the robot moves forward. If the left sensor detects the black line, the robot turns left. If the right sensor detects the black line, the robot turns right.

    3. Motor Control: Based on the processed signals, the Arduino sends commands to the L298N motor driver to control the motors. This allows the robot to adjust its direction and follow the line accurately.

    Creators

    • Fares Mohamed Elshahat Mahmoud
    • Amr Mohy Mohamed Yousef
    • Asmaa Mohamed Hamed Ibrahim
    • Ashrakat Samaha Elsayed Goda
    • Asmaa Mohamed Mohamed Elsayed
    • Gehad Basiouny Elsayed Basiouny

    Visit original content creator repository
    https://github.com/FaresM0hamed/Line-Follower-Car-Robot

  • tour-planning

    Open in GitHub Codespaces

    Tour Planning

    A demonstration of using hard and soft constraints on the Leap™ quantum-classical hybrid constrained quadratic model (CQM) solver.

    This example solves a problem of selecting, for a tour divided into several legs of varying lengths and steepness, a combination of locomotion modes (walking, cycling, bussing, and driving) such that one can gain the greatest benefit of outdoor exercise while not exceeding one’s budgeted cost and time.

    Example Solution

    The techniques used in this example are applicable to commercial problems such as traffic routing—selecting the optimal among available means of transportation, for commuters or deliveries, given constraints of pricing, speed, convenience, and green-energy preferences—or network routing, where the routing of data packets must consider bandwidth, pricing, reliance, service tiers, and latency across numerous hops.

    Hard and Soft Constraints

    Constraints for optimization problems are often categorized as either “hard” or “soft”.

    Any hard constraint must be satisfied for a solution of the problem to qualify as feasible. Soft constraints may be violated to achieve an overall good solution.

    By setting appropriate weights to soft constraints in comparison to the objective and to other soft constraints, you can express the relative importance of such constraints. Soft constraints on binary variables can be one of two types:

    • linear: the penalty for violating such a constraint is proportional to the value of the violation (i.e., by how much the constraint is violated).
    • quadratic: the penalty for violating such a constraint is proportional to the square of the value of the violation.

    For example, for a soft constraint on the tour’s maximum cost, with a price of 3 for driving, preferring to drive over free locomotion on a leg of length 2 adds a penalty of 6 or 36, for a linear or quadratic constraint, respectively, that goes up, for a leg length of 3, to 9 and 81, respectively. Such a quadratic constraint severely discourages driving on longer legs.

    This example enables you to set hard or soft constraints on the tour’s cost, its duration, and the steepest leg one can walk or cycle. The CQM has hard constraints that ensure a single mode of locomotion is selected for each leg and, optionally, prevent driving on legs with toll booths.

    Example Results

    Some of the variety of results you can obtain from the application of these constraints is shown below for an example tour [1].

    1. All constraints are hard. For this case, acceptable solutions must satisfy all constraints, and the solver was unable to find a feasible solution.

    Example All Hard Constraints

    2. Constraints on cost and time are relaxed to soft constraints. The solver tries to satisfy such constraints but accepts solutions that violate one or more. Now the solver returns a solution. However, it provides little exercise because cycling is not allowed on legs even slightly steeper than the configured maximum.

    Example Slope Hard Constraint

    3. Constraint on slope is also relaxed. Now the returned solution is to cycle on all but the steepest slopes, gaining exercise by tolerating a wide margin of violations of the slope constraint.

    Example All Soft Linear Constraints

    4. Soft constraint on slope is set to quadratic. Now the solver discriminates sharply between slopes that are just a bit over the configured maximum and those significantly too steep. The returned solution allows for cycling on legs that violate the slope constraint by a narrow margin.

    Example Slope Quadratic Constraint

    In general, the use of soft constraints can result in imperfect but good solutions to many optimization problems: for example, in three-dimensional bin packing, which addresses problems in areas such as containers, pallets and aircraft, boxes should be fully supported to ensure stability; however, satisfying such a hard constraint might not be possible due to a variety of box sizes or bin size. Using a soft constraint that enables solutions with 70% support might be acceptable. Another example is job shop scheduling, where jobs should complete on time. If this constraint cannot be met due to conflicting constraints, a soft constraint that penalizes delays by length might return good solutions.

    Installation

    You can run this example without installation in cloud-based IDEs that support the Development Containers specification (aka “devcontainers”).

    For development environments that do not support devcontainers, install requirements:

    pip install -r requirements.txt
    

    If you are cloning the repo to your local system, working in a virtual environment is recommended.

    Usage

    Your development environment should be configured to access Leap’s Solvers. You can see information about supported IDEs and authorizing access to your Leap account here.

    To run the demo:

    python app.py

    Access the user interface with your browser at http://127.0.0.1:8050/.

    The demo program opens an interface where you can configure tour problems, submit these problems to a CQM solver, and examine the results.

    Hover over an input field to see a description of the input and its range of supported values.

    Configuring the Tour

    The upper-left section of the user interface lets you configure the tour’s legs: how many, how long, and the maximum elevation gain. Additionally, you can configure your budgets of cost and time for the entire tour: modes of locomotion vary in price and speed. For example, walking is free but slower than driving. Finally, you can choose whether or not to add tollbooths randomly to 20% of the legs.

    Leg lengths are set to a uniform random value between your configured minimum and maximum values. Steepness is set uniformly at random between zero and ten.

    A leg’s steepness affects exercising: a constraint is set to discourage (soft constraint) or disallow (hard constraint) walking or cycling on those legs that exceed the maximum slope you configured.

    When you update a tour’s legs, toll booths may be placed at random on some of the legs (each leg has a 20% probability that it is given a tollbooth). These affect driving in a private car (but not bussing): the generated CQM has a hard constraint to not drive on legs with toll booths. This constraint is optional.

    Configuring the Constraints

    The upper-middle section of the user interface lets you tune the constraints on cost, time, and steepness.

    You can select whether to use hard or soft constraints, and for soft constraints, you can set weights and choose between linear or quadratic penalties.

    Submitting the Problem for Solution

    The upper-right section of the user interface lets you submit your problem to a Leap hybrid CQM solver. The default solver runtime of 5 seconds is used unless you choose to increase it.

    Problem Details and Solutions

    The lower section’s following tabs contain information about the problem and any found solutions.

    • Graph: displays the configured problem and any found solutions in three ways:

      • Space: displays relative leg lengths, steepness as a color heatmap, and toll booths as icons above the colored bar representing the tour. Modes of locomotion for the best solution found are displayed as icons below it.
      • Time: displays relative leg duration and, for the best found solution, the cost per leg as a color heatmap.
      • Feasibility: displays feasible and non-feasible solutions in a three-dimensional plot of exercise, cost, and time.
    • Problem: displays the legs of the tour (length, slope, and toll booths), formatted for reading and for copying into your code.

    • Solutions: displays the returned solutions, formatted for reading and as a dimod sampleset for copying into your code.

    • CQM: displays the constrained quadratic model generated for your configured tour and constraints. A good way to learn about the construction of a CQM, is to begin with a minimal problem (a single mode of locomotion, one leg, no tollbooths), study the simple CQM, and watch it change as you increase the problem’s complexity.

    • Locomotion: contains information about your configured tour, such as the minimum, maximum, and average values of cost and time, and the values for the available modes of locomotion (speed, cost, exercise) that you can configure.

    Model Overview

    The problem of selecting a mode of locomotion for every leg of the tour to achieve some objective (maximize exercise) given a number of constraints (e.g., do not overpay) can be modeled as an optimization problem with decisions that could either be true or false: for any leg, should one drive? Should one walk?

    This model uses up to four binary variables for each leg of the tour, each one representing whether one particular mode of locomotion is used or not. For example, leg number 5 might have the following binary variables and values in one solution:

    Binary Variable | Represents  | Value in a Particular Solution
    walk_5          | Walk leg 5  | False
    cycle_5         | Cycle leg 5 | True
    bus_5           | Bus leg 5   | False
    drive_5         | Drive leg 5 | False

    In the solution above, cycling is the mode of locomotion selected for leg 5.

    The CQM is built as follows with a single objective and several constraints:

    • Objective: Maximize Exercise

      To maximize exercise on the tour, the CQM objective is to minimize the negative summation of values of exercise set for each locomotion mode across all the tour’s legs.

      eq_exercise

      The terms above are as follows:

      eq_exercise_terms

      Because a single mode of locomotion is selected for each leg (as explained below), all the products but one are zeroed by the binary variables of that leg. For example, in leg 5 in the solution above, the leg length is multiplied by its slope and the exercise value of cycling because, for this leg, the binary variable representing cycling is the only non-zero variable.

    • Constraint 1: Cost

      To discourage or prevent the tour’s cost from exceeding your preferred budget, the CQM sets a constraint that the total cost over all legs is less or equal to your configured cost. It does this by minimizing the summation of leg lengths multiplied by the cost value of locomotion mode for the leg. This can be a hard or soft constraint.

      eq_cost

      eq_cost_terms

      Again, for each leg the only non-zero product has the binary variable representing the selected locomotion mode.

    • Constraint 2: Time

      To discourage or prevent the tour’s duration from exceeding your configured value, the CQM sets a constraint similar to that on cost but with the leg length divided by the value of speed for each mode of locomotion. This can be a hard or soft constraint.

      eq_time

      eq_time_terms

    • Constraint 3: Steep Legs

      To discourage or prevent the selection of exercising on legs where the slope is steeper than your configured maximum, the CQM sets a constraint that for each leg the binary variables representing walking and cycling multiplied by the slope be less or equal to your configured highest slope. This can be a hard or soft constraint.

    • Constraint 4: Single Mode of Locomotion Per Leg

      To ensure a single mode of locomotion is selected for each leg, the sum of each leg’s binary variables must equal one (a “one-hot” constraint). This is a hard constraint.

      eq_one_hot

    • Constraint 5: Toll Booths

      This optional constraint prevents driving on legs with toll booths. If you choose to enable the placement of toll booths on some legs (toll booths may be placed at random on a leg with 20% probability), the CQM sets a constraint that the binary variable representing driving be zero for any leg with a toll booth. This is a hard constraint.

      eq_toll
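
    To make this construction concrete, the following is a minimal sketch (not the repository's tour_planning.py code) of how such a CQM might be assembled with dimod. The leg data, exercise and cost values, budget, and the soft-constraint arguments (weight, penalty) are illustrative assumptions, and the soft constraint requires a dimod version with soft-constraint support.

    import dimod

    # Hypothetical tour data for illustration: (length, slope, has_toll_booth) per leg
    legs = [(2.0, 4.0, False), (2.0, 8.0, True), (2.0, 1.0, False)]
    modes = ["walk", "cycle", "bus", "drive"]
    exercise = {"walk": 1, "cycle": 2, "bus": 0, "drive": 0}   # assumed exercise values
    cost = {"walk": 0, "cycle": 2, "bus": 4, "drive": 7}       # assumed cost values
    max_cost = 20                                              # assumed configured budget

    cqm = dimod.ConstrainedQuadraticModel()

    # One binary variable per mode of locomotion per leg, e.g. "cycle_1"
    x = {(m, i): dimod.Binary(f"{m}_{i}") for i, _ in enumerate(legs) for m in modes}

    # Objective: maximize exercise by minimizing its negative
    cqm.set_objective(-dimod.quicksum(
        length * slope * exercise[m] * x[(m, i)]
        for i, (length, slope, _) in enumerate(legs) for m in modes))

    # Constraint 1 (soft, linear penalty): total cost must not exceed the budget
    cqm.add_constraint(dimod.quicksum(
        length * cost[m] * x[(m, i)]
        for i, (length, _, _) in enumerate(legs) for m in modes) <= max_cost,
        label="total cost", weight=5, penalty="linear")

    # Constraint 4 (hard): exactly one mode of locomotion per leg (one-hot)
    for i, _ in enumerate(legs):
        cqm.add_constraint(dimod.quicksum(x[(m, i)] for m in modes) == 1,
                           label=f"one mode, leg {i}")

    # Constraint 5 (hard): no driving on legs with toll booths
    for i, (_, _, has_toll) in enumerate(legs):
        if has_toll:
            cqm.add_constraint(x[("drive", i)] == 0, label=f"toll booth, leg {i}")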

    Code

    Most of the code related to configuring the CQM is in the tour_planning.py file. The remaining files mostly support the user interface.


    Note: Standard practice for submitting problems to Leap solvers is to use a dwave-system sampler; for example, you typically use LeapHybridCQMSampler for CQM problems. The code in this example uses the dwave-cloud-client, which enables finer control over communications with the Solver API (SAPI).

    If you are learning to submit problems to Leap solvers, use a dwave-system solver, with its higher level of abstraction and thus greater simplicity, as demonstrated in most of the code examples of the example collection and in the documentation.
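
    For reference, the standard dwave-system pattern mentioned in the note looks roughly like the following sketch (the cqm object, time limit, and label are illustrative):

    from dwave.system import LeapHybridCQMSampler

    sampler = LeapHybridCQMSampler()
    sampleset = sampler.sample_cqm(cqm, time_limit=5, label="Example - Tour Planning")

    # Keep only the solutions that satisfy all hard constraints
    feasible = sampleset.filter(lambda row: row.is_feasible)
    print(feasible.first.sample if len(feasible) else "No feasible solution found")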


    License

    Released under the Apache License 2.0. See LICENSE file.

    Footnotes

    1. The tour comprises 20 legs, each of length 2, with a budgeted cost of 150, a budgeted duration of 5, and a steepest leg for exercising of 2. Cycling (speed 3, cost 2) and bussing (speed 5, cost 4) are the available modes of locomotion. For soft constraints, weights are set to 5.

    Visit original content creator repository https://github.com/dwave-examples/tour-planning
  • prediction-using-netezza-in-database-analytics-functions

    Data analytics and prediction using Netezza Performance Server

    In this code pattern, we will learn how users and developers can leverage analytic algorithms to perform research or other business-related activities using Netezza Performance Server. Netezza (also known as INZA) enables data mining tasks on large data sets using the computational power and parallelization mechanisms provided by the Netezza appliance. The parallel architecture of the Netezza database environment enables high-performance computation on large data sets, making it an ideal platform for large-scale data mining applications.

    Netezza provides in-database analytics packages for mining data sets across a wide range of sizes. IBM Netezza In-Database Analytics is a data mining application that includes many of the key techniques and popular real-world algorithms used with such data sets.

    In this code pattern, we will load a Jupyter notebook on the IBM Cloud Pak for Data (CP4D) platform. The notebook walks through connecting to Netezza, using in-database analytic functions to analyze the data, and running machine learning algorithms that let you predict and forecast data. In order to access the analytical functions of Netezza, you must install the INZA module on the Netezza server. All of the analytical functions are under the INZA schema in the NZA database.

    In this code pattern, we will use an energy price dataset and analyze the data in a Jupyter notebook on the IBM Cloud Pak for Data (CP4D) platform. We will walk you through, step by step:

    1. Analyzing data using Netezza In-Database analytic functions.
    2. Creating machine learning models using Netezza In-Database machine learning algorithms.

    Flow

    Architecture

    1. User loads the Jupyter notebook into IBM Cloud Pak for Data.
    2. User connects to Netezza using the NZPY connector.
    3. User loads and analyzes data from Netezza Performance Server.
    4. Netezza creates models using in-database analytics functions.
    5. User forecasts and predicts energy price using the model.

    Included components

    • Netezza Performance Server: IBM Netezza® Performance Server for IBM Cloud Pak® for Data is an advanced data warehouse and analytics platform available both on premises and on cloud.
    • IBM Cloud Pak for Data Platform : IBM Cloud Pak® for Data is a fully-integrated data and AI platform that modernizes how businesses collect, organize and analyze data to infuse AI throughout their organizations.
    • Jupyter Notebook: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.

    Steps

    1. Clone the repo
    2. Create a new project in CP4D
    3. Add connection to Netezza server
    4. Upload data assets
    5. Load notebook to your project
    6. Install NZPY
    7. Configure NPS connection in notebook
    8. Load data to Netezza
    9. Visualize energy price data
    10. Analyze energy price data
    11. Create machine learning model using timeseries algorithm

    1. Clone the repo

    git clone https://github.com/IBM/prediction-using-netezza-in-database-analytics-functions.git
    

    2. Create a new project in CP4D

    • Log into IBM Cloud Pak for Data and create a new project by selecting Projects from the hamburger menu and clicking New Project +.

    Create new project

    Then choose Analytics project, select Create empty project, provide the project name, and click Create.

    Analytics Project

    Project details

    Project created

    3. Add connection to Netezza server

    • From the project page, select Add to project + and choose Connection

    Add connection

    • In the next screen, from the Global tab, choose NPS for pure analytics

    NPS selection

    • Fill out the connection details, test the connection, and, if it is successful, click Create.

    NOTE: For the database, you can use system for now. We will create our own database and use it in our notebook.

    connection details

    connection created

    NOTE: Save the name of the connection for later use.

    4. Upload data assets

    Upload energy_price.csv from the doc/source/data folder of the cloned repository. On the project home page, on the Assets tab, click the data icon and browse to upload the file. You will have to unzip the data locally before you upload it.

    Upload data assets

    5. Load notebook to your project

    • From the project page, click Add to project +, and select notebook from the options:

    add notebook

    • Select the From URL tab, fill in the name, provide the notebook URL below, and click Create notebook.
    https://raw.githubusercontent.com/IBM/prediction-using-netezza-in-database-analytics-functions/main/doc/source/notebooks/PredictionUsingINZAfunctions.ipynb

    6. Install NZPY

    Run the cell that contains pip install nzpy, which is the only prerequisite for this notebook; nzpy lets us connect to the server and run DDL and DML SQL statements. The cell is sketched below.

    add notebook
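
    The installation cell is simply the package install, run as a notebook shell command:

    # Notebook cell: install the nzpy driver, the notebook's only prerequisite
    !pip install nzpy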

    7. Configure NPS connection in notebook

    • Open the notebook in edit mode, and in the cell with the title Connecting to the database, provide the name of the connection that you created earlier in step 3.

    • Run that cell and the cell below it, and make sure you get the 10 database names. This confirms that we have successfully connected to our remote NPS server.

    OR

    Add the connection details directly to the notebook by replacing the values of the following variables in the connection cell.

    # Setup connection and use the credentials from the connection. Replace the following values before you start
    
    # from project_lib import Project
    # project = Project.access()
    # NPS_credentials = project.get_connection(name="NPS")
    
    ## OR
    
    username="<username>"
    password="<password>"
    host="<hostname or ip>"
    database="system"

    add notebook
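
    These values are then used to open the connection (con) that the rest of the notebook relies on. A minimal sketch, assuming the default NPS port 5480 (adjust for your deployment):

    import nzpy

    # Open the connection used by the later cells
    con = nzpy.connect(user=username, password=password, host=host,
                       port=5480, database=database)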

    8. Load data to Netezza

    We will load the energy_price.csv file into Netezza using Netezza's external table feature. First we create the table, then we load the CSV file directly into Netezza, as shown below:

    ## initialize a cursor on the nzpy connection
    cursor = con.cursor()
    ## drop the table if it already exists (use the same qualified name as the CREATE below)
    table = 'nzpy_test..energy_price'
    cursor.execute(f'drop table {table} if exists')
    
    ## create the target table
    cursor.execute('''
    CREATE TABLE nzpy_test..energy_price (
        temperature    REAL,
        pressure    REAL,
        humidity    REAL,
        wind_speed    REAL,
        precipitation    REAL,
        price    REAL,
        price_hour    TIMESTAMP
    )
    ''')
    print('Table energy_price successfully created')
    
    ## load the CSV data into Netezza through an external table
    with con.cursor() as cursor:
        cursor.execute('''
            insert into nzpy_test..energy_price
                select * from external '/project_data/data_asset/energy_price.csv'
                    using (
                        delim ','
                        remotesource 'odbc'
                        )''')
        print(f"{cursor.rowcount} rows inserted")
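
    A quick optional check that the load succeeded is to count the rows in the target table (a small sketch):

    import pandas as pd

    # Confirm the external-table load by counting rows
    pd.read_sql('select count(*) as n_rows from nzpy_test..energy_price', con)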

    9. Visualize energy price data

    In this part of the notebook, we explore the data, the data types, and the correlation of the different columns with price. You can run the cells in this part step by step. The overall graph, grouped by date, is shown below:

    updDf.groupby('DATES').sum().plot.line().legend(loc='upper left',bbox_to_anchor=(1.05, 1))

    Visualize energy data

    In the above graph, you can see how temperature, pressure, humidity, wind speed, and precipitation relate to price.

    Similarly, you can see the correlation of each individual column (temperature, pressure, humidity, wind speed, precipitation) with price as well; a small sketch for computing these correlations numerically follows the chart below.

    Visualize energy data
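
    If you prefer the numeric correlations to the plots, a sketch along the following lines could be used, assuming updDf is the dataframe built earlier in the notebook and that its column names are uppercase, as in the queries above:

    # Pairwise correlation of each weather column with PRICE (column names assumed uppercase)
    cols = ['TEMPERATURE', 'PRESSURE', 'HUMIDITY', 'WIND_SPEED', 'PRECIPITATION', 'PRICE']
    print(updDf[cols].corr()['PRICE'].sort_values(ascending=False))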

    10. Analyze energy price data

    In-database analytic functions such as summary1000 and cov let you analyze your data; they automatically give you a statistical analysis of each column. The summary1000 function returns statistics such as distinct values, average, variance, and standard deviation, as shown below:

    summaryDF = pd.read_sql("CALL nza..SUMMARY1000('intable=ENERGY_PRICE, outtable=PRICE_TEMP_ANALYSIS');", con)
    summaryAnalysisDF = pd.read_sql('select * from PRICE_TEMP_ANALYSIS', con)
    summaryAnalysisDF.head()

    Analyze energy data

    You can also call the nza..COV function to get the covariance. The code below shows the relation between the temperature and price columns.

    # cursor.execute("drop table PRICE_TEMP_ANALYSIS if exists")
    pd.read_sql("CALL nza..DROP_TABLE('PRICE_TEMP_ANALYSIS')", con)
    
    # use the Covariance function, store results in PRICE_TEMP_ANALYSIS
    pd.read_sql("CALL nza..COV('intable=ENERGY_PRICE, incolumn=TEMPERATURE;PRICE,outtable=PRICE_TEMP_ANALYSIS');", con)
    # bring the results table into the notebook - or just query it directly in Netezza
    pd.read_sql('select * from PRICE_TEMP_ANALYSIS', con)

    11. Create machine learning model using timeseries algorithm

    • First we will clean up the training data set. Since we are using a time series algorithm, the timestamp column has to be converted to date format to represent each day, and the row id is used as the unique id.
    # clean up the analysis tables
    pd.read_sql("CALL nza..DROP_TABLE('PRICE_TEMP_NEW')",con);
    # the INZA functions usually need a unique ID for each row of data; we use the internal ROWID for this
    cursor=con.cursor()
    cursor.execute("create table PRICE_TEMP_NEW as select *,DATE(PRICE_HOUR) AS DAY,ROWID as ID from ENERGY_PRICE")
    priceTempNewDf = pd.read_sql('select * from PRICE_TEMP_NEW limit 10', con)

    Clean up

    Now let's create the model using the time series algorithm by calling the nza..TIMESERIES function:

    # drop the model if it was already created. Initially you might want to comment this out
    # and run, as it throws an error if it doesn't find the model
    cursor.execute("CALL nza..DROP_MODEL('model=PRICE_TIME');")
    
    # we now call a timeseries algorithm to create a model, the model name is PRICE_TIME
    pd.read_sql("CALL nza..TIMESERIES('model=PRICE_TIME, intable=ADMIN.PRICE_TEMP_NEW, by=DAY, time=PRICE_HOUR, target=PRICE' );",con)

    Once the query execution is completed, you can check the v_nza_models table to see if the model has been created.

    # we can list our models here
    pd.read_sql("select * from v_nza_models;",con=con)

    Timeseries model

    The NZA_META_<model_name>_FORECAST table holds the forecast values: it contains one row for each time series and point in time for which a forecast has been made. The following query returns the forecasting results for the time series dataset.

    pd.read_sql("select * from NZA_META_PRICE_TIME_FORECAST;", con=con)

    forecast results

    License

    This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

    Apache Software License (ASL) FAQ

    Visit original content creator repository https://github.com/IBM/prediction-using-netezza-in-database-analytics-functions