Category Archives: Database

Setting Up and Monitoring ElphaPex DG Home1 on PowerPool Using Home Assistant

Introduction

Home Assistant is an excellent tool for monitoring the ElphaPex DG Home1 miner. This post covers my observations on optimizing its setup for maximum efficiency with minimal downtime.

PowerPool Configuration

Setting up PowerPool is straightforward: you enter the same PowerPool stratum address in all three pool slots. This works because the PowerPool hostname resolves to multiple IP addresses, so if one connection fails, the miner automatically connects to the next available address.

Configuration Example

When the first active entry fails, the miner switches to the next, as shown in the ElphaPex logs:

Feb 17 08:49:34 DG-Home1 user.info health: min = 44250 max = 49812
Feb 17 08:49:49 DG-Home1 user.err cpuminer[1682]: [ttyS5][Stratum connection timed out]
Feb 17 08:49:49 DG-Home1 user.notice cpuminer[1682]: [ttyS5][current pool: stratum+tcp://scrypt.stratum.powerpool.io:3333, user: elsonico.dg1, pass: x]
Feb 17 08:49:49 DG-Home1 user.err cpuminer[1682]: [ttyS5][Stratum connection interrupted]
Feb 17 08:49:49 DG-Home1 user.info cpuminer[1682]: [ttyS5][Starting Stratum on stratum+tcp://scrypt.stratum.powerpool.io:3333]

When a connection fails, the failed entry shows as “Unreachable,” the next becomes “Alive,” and the remaining one stays on “Standby.”

This setup works with any mining pool that has multiple resolvable IP addresses. You can verify DNS resolution using:

🕙 11:07:57 ❯ nslookup scrypt.stratum.powerpool.io
Server: 192.168.0.210
Address: 192.168.0.210#53
Non-authoritative answer:
scrypt.stratum.powerpool.io canonical name = scrypt.powerpool.cloud.
Name: scrypt.powerpool.cloud
Address: 161.97.162.191
Name: scrypt.powerpool.cloud
Address: 149.102.159.179
Name: scrypt.powerpool.cloud
Address: 75.119.136.155
Name: scrypt.powerpool.cloud
Address: 158.220.80.69
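
If you prefer to script that check, the resolver can be queried directly from Python with nothing but the standard library. A minimal sketch, using the same hostname as in the nslookup output above:

import socket

# List the unique IPv4 addresses behind the pool hostname.
host = "scrypt.stratum.powerpool.io"
addresses = sorted({info[4][0] for info in socket.getaddrinfo(host, 3333, socket.AF_INET)})
print(f"{host} resolves to {len(addresses)} addresses:")
for ip in addresses:
    print(" ", ip)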

Monitoring

Hash Rate Monitoring

A low hash rate over an extended period indicates a malfunction. Initially, I monitored the hash rate via PowerPool’s API but noticed occasional inaccuracies. To address this, I now monitor the miner’s own reported hash rate alongside PowerPool’s API data.
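
In my setup both readings end up as Home Assistant sensors, but the same cross-check is easy to reproduce outside Home Assistant through its REST API. The sketch below is only an illustration: sensor.powerpool_hashrate_gh is a hypothetical entity name for the pool-side reading, and the URL and long-lived access token are placeholders.

import requests

HA_URL = "http://homeassistant.local:8123"      # placeholder
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"          # placeholder

def read_sensor(entity_id):
    """Read a sensor state from the Home Assistant REST API."""
    resp = requests.get(
        f"{HA_URL}/api/states/{entity_id}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return float(resp.json()["state"])

miner = read_sensor("sensor.miner_hashrate_gh")        # the miner's own reading
pool = read_sensor("sensor.powerpool_hashrate_gh")     # hypothetical pool-side sensor
if miner > 0 and abs(miner - pool) / miner > 0.2:
    print(f"Pool and miner disagree by more than 20%: miner={miner} GH/s, pool={pool} GH/s")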

If the hash rate stays at 0.0 for 15 minutes, Home Assistant automatically reboots the miner using the following automation:

- id: 28077b7f639644688b946f35ccf0127b
  alias: Reboot or Power Cycle Miner
  description: Reboot the miner on low hash rate, fallback to power cycle if needed
  triggers:
  - entity_id: sensor.miner_hashrate_gh
    to: '0.0'
    for:   
      minutes: 15
    trigger: state
  - minutes: /15 
    trigger: time_pattern
  conditions:
  - condition: numeric_state
    entity_id: sensor.miner_hashrate_gh
    below: 0.1
  - condition: numeric_state
    entity_id: sensor.electricity_cost_in_cents_per_kwh
    below: 62
  actions: 
  - target:
      entity_id: switch.shellyplug_s_3ce90ee04b97
    action: switch.turn_off
    data: {}
  - delay: 
      seconds: 10
  - target:
      entity_id: switch.shellyplug_s_3ce90ee04b97
    action: switch.turn_on
    data: {}
  mode: single

Note that I only do this when the electricity price is below 62 c/kWh. I will go into more detail later about why the limit is set where it is.

Temperature Monitoring

I monitor various circuit temperatures. If any exceed 55°C, an event triggers notifications via email and WhatsApp, giving me time to intervene before an automatic shutdown.

Here is an example of the WhatsApp message sent when any of them goes above 55°C:

This is done with the following snippet in my automations.yaml:

- id: miner_whats_app_temperature_alert
  alias: WhatsApp alert when miner temperatures are high
  trigger:
    platform: numeric_state
    entity_id:
    - sensor.miner_outlet_temperature_1
    - sensor.miner_outlet_temperature_2
    - sensor.miner_inlet_temperature_1
    - sensor.miner_inlet_temperature_2
    above: 55
  condition:
  - condition: template
    value_template: >
      {% set temps = namespace(high=false) %}
      {% for sensor in [
           'sensor.miner_outlet_temperature_1',
           'sensor.miner_outlet_temperature_2',
           'sensor.miner_inlet_temperature_1',
           'sensor.miner_inlet_temperature_2'
         ] %}
        {% if states(sensor) not in ['unknown', 'unavailable'] %}
          {% if states(sensor) | float > 55 %}
            {% set temps.high = true %}
          {% endif %}
        {% endif %}
      {% endfor %}
      {{ temps.high }}
  mode: single
  action:
  - service: persistent_notification.create
    data:
      title: 'Debug: Miner Temperature Alert Triggered'
      message: 'Current temperatures: Outlet 1: {{ states(''sensor.miner_outlet_temperature_1'')
        }}°C, Outlet 2: {{ states(''sensor.miner_outlet_temperature_2'') }}°C, Inlet
        1: {{ states(''sensor.miner_inlet_temperature_1'') }}°C, Inlet 2: {{ states(''sensor.miner_inlet_temperature_2'')
        }}°C'
  - service: notify.email_notify
    data:
      title: Miner Alert - High Temperature
      message: 'High temperature detected on your miner: Outlet 1: {{ states(''sensor.miner_outlet_temperature_1'')
        }}°C, Outlet 2: {{ states(''sensor.miner_outlet_temperature_2'') }}°C, Inlet
        1: {{ states(''sensor.miner_inlet_temperature_1'') }}°C, Inlet 2: {{ states(''sensor.miner_inlet_temperature_2'')
        }}°C. Please check your miner''s cooling system.'
  - service: notify.whatsapp
    data:
      title: Miner Alert - High Temperature
      message: 'High temperature detected on your miner: Outlet 1: {{ states(''sensor.miner_outlet_temperature_1'')
        }}°C, Outlet 2: {{ states(''sensor.miner_outlet_temperature_2'') }}°C, Inlet
        1: {{ states(''sensor.miner_inlet_temperature_1'') }}°C, Inlet 2: {{ states(''sensor.miner_inlet_temperature_2'')
        }}°C. Please check your miner''s cooling system.'
      target: 14164327184@c.us                                                                                                                                          

Electricity Price Monitoring

Mining is only profitable below a certain electricity cost. Based on AI-assisted calculations, I set my limit to 62c/kWh. If the price rises above this, Home Assistant turns off the miner:

- id: 634d0fe360ec41c3ac1ea0f3473ad603                                
  alias: Turn Off Miner When Electricity Price Is High                
  description: Turns off the miner when electricity price goes above 62c/kWh
  triggers:                                                           
  - entity_id:                                                        
    - sensor.electricity_cost_in_cents_per_kwh                        
    above: 62                                                         
    trigger: numeric_state                                            
  actions:                                                            
  - target:                                                           
      entity_id: switch.shellyplug_s_3ce90ee04b97                     
    action: switch.turn_off                                           
    data: {}                                                          
- id: 9c4564819e404cff95fcc447532bd19b                                
  alias: Turn On Miner When Electricity Price Is Low                  
  description: Turns on the miner when electricity price goes below 62c/kWh                                                                                             
  triggers:                                                           
  - entity_id:                                                        
    - sensor.electricity_cost_in_cents_per_kwh                        
    below: 62                                                         
    trigger: numeric_state                                            
  actions:                                                            
  - target:                                                           
      entity_id: switch.shellyplug_s_3ce90ee04b97                     
    action: switch.turn_on                                            
    data: {}                
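
In essence the 62 c/kWh limit is a break-even calculation: the value of a day's rewards has to cover the energy the miner burns in a day. Below is a rough sketch of the arithmetic. The power draw, price and reward value are placeholder assumptions, not my actual inputs or measurements from the DG Home1, so plug in your own numbers.

# Break-even electricity price: daily reward value vs. daily energy cost.
power_kw = 0.6               # assumed miner power draw in kW (placeholder)
price_cents_per_kwh = 15.0   # current all-in electricity price (placeholder)
reward_eur_per_day = 4.0     # expected value of daily rewards in EUR (placeholder)

cost_eur_per_day = power_kw * 24 * price_cents_per_kwh / 100
profit_eur_per_day = reward_eur_per_day - cost_eur_per_day
break_even_cents_per_kwh = reward_eur_per_day / (power_kw * 24) * 100
print(f"Daily cost {cost_eur_per_day:.2f} EUR, profit {profit_eur_per_day:.2f} EUR, "
      f"break-even at {break_even_cents_per_kwh:.1f} c/kWh")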

Room Temperature Monitoring

Since I also use a heat pump, I monitor indoor temperatures. If the bedroom temperature drops below 20°C, the heat pump turns on:

- id: bedroom_temperature_control                                                                                                                                       
  alias: Bedroom Temperature Control
  description: Control bedroom heating based on temperature between 9 AM and 7 PM
  trigger:
  - platform: numeric_state
    entity_id: sensor.temperature_bedroom
    below: 20 
  - platform: numeric_state
    entity_id: sensor.temperature_bedroom
    above: 20 
  - platform: time_pattern
    minutes: /15
  condition:
  - condition: time
    after: 09:00:00
    before: '19:00:00'
  action:
  - choose:
    - conditions:
      - condition: numeric_state
        entity_id: sensor.temperature_bedroom
        below: 20
      sequence:
      - service: climate.set_hvac_mode
        target:
          entity_id: climate.ac_12488762
        data:
          hvac_mode: heat
      - service: climate.set_temperature
        target:
          entity_id: climate.ac_12488762
        data:
          temperature: 20
    - conditions:
      - condition: numeric_state
        entity_id: sensor.temperature_bedroom
        above: 21
      sequence:
      - service: climate.turn_off
        target:
          entity_id: climate.ac_12488762
  mode: single

Rewards and Taxation

Initially, I set my mining rewards to be paid in BTC. However, I now distribute them as 1/3 BTC, 1/3 DOGE, and 1/3 LTC, balancing risk and market fluctuations.

For tax reporting, I direct rewards to Binance to simplify tracking. Home Assistant helps by exporting electricity costs directly from its database, making expense calculations easier.

I want to see plenty of things at a glance in the Home Assistant Companion App on my phone: expected rewards, current expenses, temperatures, and unpaid rewards in BTC, DOGE, LTC, USD and EUR. I also want to see the current electricity price with everything included, transfer tariff and taxes. Some of these I want as graphs, some as numbers. The one thing I still want to add is uptime, but other than that I’m happy with my current setup.

As you can see, the current expected profit for the whole day is only 0.60€. This does not take into account that I am also heating the whole house with the miner. You can also see that the current electricity spot price is well above average, while at the same time DOGE is valued significantly below its average, which lowers the expected rewards.

Reward payments and taxation

When I initially set up the miner and payments, I decided to have the rewards paid to a decentralized wallet. However, the conversion fees there are far higher than on Binance, for example, so I decided to get the rewards paid directly to Binance instead.

Maybe later, when I’m making millions with this, I might consider a decentralized wallet. But at the moment, when it seems the miner is not going to pay for itself within the next two years, if ever, I feel no pressure to move the little I earn anywhere outside Binance.

I also feel that it is easiest to do my tax return when everything I earn is on Binance. Home Assistant certainly helps with calculating the expenses, since I can export monthly electricity costs directly from the Home Assistant database tables. I’ve already created a few views there to make that part of mining easier.
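
To give an idea of the kind of view I mean, here is a minimal sketch that pulls a monthly cost out of Home Assistant's long-term statistics in PostgreSQL. It assumes the default recorder schema (statistics joined to statistics_meta via metadata_id, with an epoch start_ts column; column names vary between Home Assistant versions) and a cumulative cost statistic. sensor.energy_cost is a hypothetical statistic_id, and the simple max-minus-min ignores counter resets, so treat it only as a starting point.

import psycopg2

# Monthly electricity cost from Home Assistant long-term statistics.
query = """
SELECT date_trunc('month', to_timestamp(s.start_ts)) AS month,
       max(s.sum) - min(s.sum)                       AS cost
FROM statistics s
JOIN statistics_meta m ON m.id = s.metadata_id
WHERE m.statistic_id = %s
GROUP BY 1
ORDER BY 1;
"""

with psycopg2.connect("dbname=homeassistant user=ha password=yourpassword host=localhost") as conn:
    with conn.cursor() as cur:
        cur.execute(query, ("sensor.energy_cost",))
        for month, cost in cur.fetchall():
            print(month.date(), round(cost, 2))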

Final Thoughts

This setup provides an efficient and automated mining operation while reducing downtime. Let me know if you have suggestions for improvement or spot any flaws in my approach!

Why mine anything

I’ve been running a DOGE miner for about a week and a half now. I never really thought DOGE was going to be “the next big thing.” I see a bit more potential in Litecoin, but I’m mostly focused on Bitcoin. However, my miner currently doesn’t mine BTC. BTC miners earn about one-tenth of the profit compared to mining LTC or DOGE, so I’d rather mine where the money is and put everything into BTC. To be clear: I mine altcoins and take the profit in Bitcoin.

To be honest, I’m skeptical about the future of altcoins—especially meme coins. So why am I mining, given that I don’t see much sense in home BTC mining, and I’m not really sold on other cryptocurrencies either? The main reason is that I want to understand the technology. I’ve decided to only invest in BTC, but I still want to explore how other coins work and whether mining Bitcoin at home can ever be truly profitable.

Lately, I’ve become more interested in decentralized AI. The idea is that many of us—myself, you, and others—have GPUs at home. Instead of using them for crypto mining, we could put them to work on decentralized AI projects. Yes, it still consumes electricity, but at least it could serve a more meaningful purpose. We could even apply the same incentive structure that exists in blockchain to reward participants.

Rather than sending our money to large operations in China or the U.S., we could invest in our own local communities by redirecting home mining capacity to AI development. Let’s break free from the current systems!

I know I’m not the only one thinking this. Let’s collaborate and make it happen!

Decentralized AI – dAI.

Migrating from Gallery Menalto 1.x to Piwigo: An Open Source Solution

Introduction

Migrating a large collection of photos and albums from an outdated photo gallery software can be a daunting task. With the aim of addressing this challenge, I have developed a robust framework to facilitate the migration from Gallery Menalto to Piwigo. This open-source project, licensed under GPL 3, not only aids in migrating to Piwigo but also provides the flexibility to migrate to any photo gallery software with an open API.

Background

Around 2004, I deployed a self-hosted photo gallery to organize, store, and maintain digital photos. At the time, this was a new area for me, and I had little experience managing digital images. The motivation was clear: my first daughter had just been born, and I wanted to ensure that these precious memories were easily accessible not only for myself but also for other family members, including grandparents.

For various reasons, I chose Gallery Menalto as my gallery software and ran it on OpenBSD 4.0 with a Geeklog 1.3 website. This setup was hosted in my closet in an apartment in Finland. Ten years later, several more daughters had been born, and I had moved to Canada. Despite the passage of time, both GeekLog and Gallery Menalto were still running on the same old OpenBSD system, which I had virtualized on top of VMware Server 2.0. The whole system had moved from the closet to the basement.

By then, I had amassed over 10,000 photos, and I was increasingly stressed about how to migrate them from the legacy system. Although I had attempted several times to upgrade from OpenBSD to Linux, the process was far from simple.

Another ten years passed, and the system was back in Finland, still running as a virtualized OpenBSD on top of VMware Player. I couldn’t even get the VM to run on libvirt, because OpenBSD 4.0 can’t handle it. Upgrading had become even more challenging; what was a legacy system ten years ago had become practically prehistoric.

I searched extensively for a solution, but nothing seemed to work. The Gallery version was 1.4, which couldn’t be upgraded to 2.x and then to 3.x, making the transition to something like Piwigo impossible. OpenBSD hadn’t received any updates either, but the environment was jail-rooted, and Gallery 1.x, with its known internals, didn’t pose significant security risks – it was wide open anyway.

Since no ready-made solution was available, I decided to create my own. I started by migrating all the articles from Geeklog to WordPress, a task that took about two nights and was relatively straightforward – if you can call running a gateway MySQL in a Docker container because of the legacy source system, plus a bunch of Python scripts, straightforward. Next, I delved into the internals of Gallery 1.4, experimenting through trial and error. Within a few nights, I had a plan.

Instead of attempting to migrate everything at once, I decided to collect the metadata from Gallery 1.x in one phase and perform the actual migration in a second, separate phase. This approach meant that the first phase was not dependent on the second, allowing me to choose any target gallery software if I decided against Piwigo later on.

Once this idea took shape, I managed to get everything working relatively quickly. I started the migration yesterday, and as of now, I have successfully completed it. I had to make a few changes to the code along the way, but the final version is now available in my GitHub repository.

Project Overview

The project comprises two main Python scripts:

  1. Collect Gallery Metadata (collect_gallery_meta_data.py): This script gathers meta information for all albums and photos from Gallery Menalto.
  2. Execute Migration (execute_migration.py): This script performs the actual migration of albums and photos, along with their metadata, to Piwigo.

Collecting Gallery Metadata

The collect_gallery_meta_data.py script is designed to traverse through the albums and photos in a Gallery Menalto installation, collecting crucial metadata and storing it in a MySQL database (or any SQLAlchemy-supported database). This metadata includes:

  • Album name
  • Parent album
  • Album caption
  • Album title
  • Album description
  • Photo filename
  • Photo caption
  • Photo title

How It Works

  1. Command-line Argument: The script takes an album located in the Gallery 1.x root as a command-line argument.
  2. Traversal: It recursively goes through all sub-albums and photos, collecting metadata.
  3. Storage: The collected data is stored in database tables (albums and photos)

Executing the Migration

The execute_migration.py script utilizes the metadata collected to perform the migration to Piwigo. It:

  • Creates the root album and sub-albums in Piwigo.
  • Uploads photos along with their metadata (capture date, upload date, caption, title, description) to Piwigo.
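
For reference, the Piwigo side of this goes through the ws.php web API. The sketch below shows the general shape of the calls the migration relies on (pwg.session.login, pwg.categories.add, pwg.images.addSimple); it is a simplified illustration rather than the exact code in the repository, and the URL, credentials, album name and file name are placeholders.

import requests

PIWIGO_API_URL = "https://piwigo.example.com/piwigo/ws.php"   # placeholder

session = requests.Session()

# Authenticate; Piwigo keeps the session in a cookie.
session.post(PIWIGO_API_URL, params={"format": "json"},
             data={"method": "pwg.session.login",
                   "username": "admin", "password": "secret"})

# Create an album and read its id from the response.
resp = session.post(PIWIGO_API_URL, params={"format": "json"},
                    data={"method": "pwg.categories.add",
                          "name": "Holiday 2004", "parent": 0})
album_id = resp.json()["result"]["id"]

# Upload one photo with its metadata into the new album.
with open("photo.jpg", "rb") as fh:
    session.post(PIWIGO_API_URL, params={"format": "json"},
                 data={"method": "pwg.images.addSimple",
                       "category": album_id,
                       "name": "Photo title",
                       "comment": "Photo caption"},
                 files={"image": fh})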

Key Features

  1. Download Photos: Photos are downloaded from Gallery Menalto only once. The field downloaded is set to 1 when the download is complete.
  2. Upload Photos: Once a photo is successfully uploaded to Piwigo, the field uploaded is set to 1.
  3. Track Migration: The albums table has a column migrated, which is set to 1 once an album is migrated. This ensures the process can be safely resumed if interrupted.

Advantages and Applications

This framework is highly beneficial for users who wish to transition from Gallery Menalto to modern photo gallery software. Its primary advantages include:

  1. Flexibility: While designed for Piwigo, the framework can be adapted for any photo gallery software with an open API.
  2. Efficiency: Metadata and photos are efficiently handled and migrated, ensuring data integrity and completeness.
  3. Open Source: As an open-source project under the GPL 3 license, it encourages collaboration and customization.

Getting Started

Clone the Repository:

git clone https://github.com/elsonico/gallery-piwigo-migration.git
cd gallery-piwigo-migration

Set Up the Database:

Ensure you have a MySQL database (or any SQLAlchemy-supported database) ready. Here’s the DDL for the two tables:

albums

CREATE TABLE `albums` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `parent_id` int DEFAULT NULL,
  `metadata` text,
  `meta` text,
  `title` varchar(255) DEFAULT NULL,
  `caption` text,
  `description` text,
  `migrated` tinyint(1) DEFAULT '0',
  `created` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `parent_id` (`parent_id`),
  CONSTRAINT `albums_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `albums` (`id`)
)

photos

CREATE TABLE `photos` (
  `id` int NOT NULL AUTO_INCREMENT,
  `album_id` int DEFAULT NULL,
  `filename` varchar(255) NOT NULL,
  `caption` text,
  `metadata` text,
  `meta` text,
  `type` varchar(10) DEFAULT NULL,
  `description` text,
  `url` varchar(255) DEFAULT NULL,
  `downloaded` tinyint(1) DEFAULT '0',
  `uploaded` tinyint(1) DEFAULT '0',
  `capturedate` datetime DEFAULT NULL,
  `uploaddate` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `album_id` (`album_id`),
  CONSTRAINT `photos_ibfk_1` FOREIGN KEY (`album_id`) REFERENCES `albums` (`id`)
)
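
If you would rather let SQLAlchemy create the tables than run the DDL by hand, the models below mirror the two tables above (types are approximated; for example, the tinyint flags become plain integers). This is a sketch for illustration only; the repository has its own definitions, so treat it as a reference for the schema.

from sqlalchemy import (Column, DateTime, ForeignKey, Integer,
                        String, Text, create_engine)
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Album(Base):
    __tablename__ = "albums"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(255), nullable=False)
    parent_id = Column(Integer, ForeignKey("albums.id"))
    metadata_ = Column("metadata", Text)   # "metadata" is reserved on declarative classes
    meta = Column(Text)
    title = Column(String(255))
    caption = Column(Text)
    description = Column(Text)
    migrated = Column(Integer, default=0)
    created = Column(Integer, default=0)

class Photo(Base):
    __tablename__ = "photos"
    id = Column(Integer, primary_key=True, autoincrement=True)
    album_id = Column(Integer, ForeignKey("albums.id"))
    filename = Column(String(255), nullable=False)
    caption = Column(Text)
    metadata_ = Column("metadata", Text)
    meta = Column(Text)
    type = Column(String(10))
    description = Column(Text)
    url = Column(String(255))
    downloaded = Column(Integer, default=0)
    uploaded = Column(Integer, default=0)
    capturedate = Column(DateTime)
    uploaddate = Column(DateTime)

# Point this at the same database as DATABASE_URL below.
engine = create_engine("mysql+pymysql://user:password@host/database")
Base.metadata.create_all(engine)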

Collect Metadata

Before starting the metadata collection you need to have a few environment variables set, since the code expects them:

  • GALLERY_BASE_URL
    • This is your gallery URL including your album location: https://[gallery_host]/[albums]
    • This is the location where you can find your albums.dat
  • DATABASE_URL
    • This is your MySQL URL: mysql+pymysql://[user]:[password]@[host]/[database]

Once all of the above, including the database tables, is in place, you are good to go:

python collect_gallery_meta_data.py [root_album]

Execute Migration

As with collecting the metadata, you need several environment variables set for the migration:

  • GALLERY_BASE_URL
    • Same as above depending on your configuration: https://[gallery_host]/[albums]
  • PIWIGO_API_URL
    • This is: https://[piwigo_host]/piwigo/ws.php
  • PIWIGO_USERNAME / PIWIGO_PASSWORD
    • Piwigo credentials with privileges to create albums and add photos
  • MIG_DB_HOST / MIG_DB_NAME
    • The host and name of the database where you store the albums and photos tables
  • MIG_DB_USER / MIG_DB_PASSWORD
    • Credentials for the migration database
  • PW_DB_HOST / PW_DB_USER / PW_DB_PASSWORD / PW_DB_NAME
    • The Piwigo database host, database and credentials

Once you are certain the above environment variables are set correctly, you are good to go and can migrate the Gallery albums to Piwigo, album by album, starting from the root album.

The migration starts with the following command:

python execute_migration.py [root_album]

The program expects to find album information collected for [root_album], so make sure you have collected the data with collect_gallery_meta_data.py before getting to this phase. Note also that you can edit the metadata in the database tables if you want to change anything there.

Conclusion

Migrating from Gallery Menalto to Piwigo, or any other modern photo gallery software, is now more manageable with this open-source framework. By leveraging the power of Python and SQLAlchemy, this project ensures a seamless transition, preserving all your precious photo memories. I warmly welcome contributions and collaboration to further enhance and extend this tool.

Migrating Your Home Assistant Database from SQLite to PostgreSQL

Migrating your Home Assistant database from SQLite to PostgreSQL can significantly enhance performance, especially as your data grows. This guide will walk you through creating a database dump, converting data types, setting up your PostgreSQL database, and configuring Home Assistant to use the new database.

Creating the Database Dump from SQLite

To start, you need to create a dump of your existing SQLite database:

sqlite3 home-assistant_v2.db .dump > ha_dump.sql

This command creates a plain text file containing the SQL commands needed to reconstruct the database.

Converting Data Types

During the migration from SQLite to PostgreSQL, certain data types need to be converted:

  • DATETIME to TIMESTAMP: PostgreSQL uses TIMESTAMP instead of DATETIME.
sed -i 's/DATETIME/TIMESTAMP/g' ha_dump.sql
  • BLOB to BYTEA: Convert BLOB fields to BYTEA for binary data.
sed -i 's/BLOB/BYTEA/g' ha_dump.sql

Preparing PostgreSQL Database

Creating Database and User

Start by setting up your PostgreSQL database and user:

CREATE DATABASE homeassistant;
CREATE USER ha WITH ENCRYPTED PASSWORD 'yourpassword';
GRANT ALL PRIVILEGES ON DATABASE homeassistant TO ha;

Create the objects and load the data

Use the command line utility psql to create the database objects and load the data into the newly created PostgreSQL database:

psql -h [your_db_host] -U ha -d homeassistant -f ha_dump.sql -W > load.log 2>&1

Instructions:

  • Replace [your_db_host] with the actual hostname or IP address where your PostgreSQL database is hosted.
  • User and Database: Ensure that ha is the correct username and homeassistant is the correct database name you created for Home Assistant.

After running the command, the SQL dump file (ha_dump.sql) will be executed against your PostgreSQL database. The output and any errors encountered during the process will be redirected to load.log. This log file is essential for tracking the progress and identifying any issues that need resolution.

Check for Errors:

  • Review the load.log file for possible errors. This file contains all output from the psql command, including any SQL errors or warnings that were generated during the import process.
  • Iterate as Necessary: If errors are found, you may need to fix issues in ha_dump.sql and rerun the command. This step might need to be repeated several times. Modifications could involve correcting data types, adding missing sequences for auto-incrementing fields, or adjusting SQL syntax to be compatible with PostgreSQL.

This process can be time-consuming, but it is crucial for ensuring that your database is correctly set up with all the necessary data and schema configurations. As noted, your mileage may vary depending on the specifics of your data and the initial state of the ha_dump.sql file.

Adjusting the Schema

For tables needing auto-increment functionality (which is common in primary key columns), set up sequences:

CREATE SEQUENCE states_state_id_seq;
ALTER TABLE states ALTER COLUMN state_id SET DEFAULT nextval('states_state_id_seq');
SELECT setval('states_state_id_seq', (SELECT MAX(state_id) FROM states) + 1);

Repeat this pattern for other necessary tables and columns, such as events(event_id), state_attributes(attributes_id), and so on.
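
To avoid typing that block over and over, the same pattern can be applied in a small loop. The sketch below uses psycopg2 and covers the table and column pairs mentioned above; check the names against your own schema first, since the recorder tables differ between Home Assistant versions.

import psycopg2

# (table, primary key column) pairs that need an auto-increment sequence.
TABLES = [
    ("states", "state_id"),
    ("events", "event_id"),
    ("state_attributes", "attributes_id"),
    ("statistics", "id"),
]

conn = psycopg2.connect("dbname=homeassistant user=ha password=yourpassword host=localhost")
conn.autocommit = True
cur = conn.cursor()

for table, column in TABLES:
    seq = f"{table}_{column}_seq"
    cur.execute(f"CREATE SEQUENCE IF NOT EXISTS {seq}")
    cur.execute(f"ALTER TABLE {table} ALTER COLUMN {column} SET DEFAULT nextval('{seq}')")
    cur.execute(f"SELECT setval('{seq}', COALESCE((SELECT MAX({column}) FROM {table}), 0) + 1)")

cur.close()
conn.close()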

Configuring Home Assistant

Install SQLAlchemy and other dependencies

Home Assistant uses SQLAlchemy as the SQL toolkit and Object-Relational Mapping (ORM) system for Python. Install it, along with the PostgreSQL driver, using pip:

pip3 install SQLAlchemy
pip3 install psycopg2-binary

Modify the configuration

Modify the configuration.yaml file to point to the new PostgreSQL database:

recorder:
  db_url: postgresql://ha:yourpassword@localhost/homeassistant

This setup directs Home Assistant to use the newly configured PostgreSQL database.

Benefits of Migrating to PostgreSQL

Moving from SQLite to PostgreSQL offers several benefits:

  • Scalability: PostgreSQL handles larger databases and more concurrent connections.
  • Performance: Improved query performance and optimization options.
  • Reliability: Robust transaction support and recovery features.
  • Flexibility: Richer set of data types and full-text searching capabilities.

Conclusion

Migrating your Home Assistant database to PostgreSQL not only enhances performance but also provides a more robust and scalable backend, suitable for growing smart home environments. This migration ensures that your Home Assistant setup can handle increased data loads efficiently and reliably.

Famous last words

One of my primary concerns during this migration was the potential loss of historical data, particularly how it might affect critical metrics like energy usage. The statistics table, which was the last to have the auto-incremental column added, is pivotal as it houses the energy usage stats.

As the image below shows, there appears to be a gap of approximately four hours in the data on energy usage stats. However, it seems that Home Assistant has effectively compensated for this missing data. The system appears to have aggregated the missing energy usage from those four hours into the data represented in the 5 PM bar on the chart.

This outcome is quite reassuring and confirms that the system’s integrity remains intact despite the migration hiccups. I’m relieved to see that after all the adjustments and troubleshooting, everything is functioning as expected.

This experience underscores the importance of careful planning and execution in database migrations, especially when dealing with essential home automation systems like Home Assistant. The transition may require significant effort and attention to detail, but the end result can be gratifying, ensuring continuity and robustness in data handling.

Building a DIY AI Chatbot: Control Your Conversations

Introduction

A self-built AI chatbot is crafted entirely by an individual or team from scratch, without relying on pre-existing templates or platforms. This approach gives developers complete autonomy over the coding, features, and functionalities of the chatbot.

Creating a self-built AI chatbot demands a blend of programming expertise, a deep understanding of artificial intelligence, and inventive thinking. Developers can use a variety of programming languages, including Python, Java, or JavaScript, based on their preferences and the chatbot’s intended application.

One of the standout advantages of a self-built AI chatbot is its high level of customization. Developers can fine-tune the chatbot’s responses and functionalities to meet specific needs and objectives. Moreover, they can continually refine and enhance the chatbot on their own timetable, independent of external updates or support.

Getting started

Building a chatbot from scratch might seem daunting, but it’s quite feasible with the right tools. I used the OpenAI API and the Python openai library (version 1.23.2 as of this writing). While GPT-4 typically suggests using openai==0.28, the transition to versions above 1.0 signifies substantial changes and necessitates thoughtful consideration. However, this doesn’t mean that ChatGPT cannot assist in coding—it can, though it requires precise instructions.

Technical setup

For my project, the technical foundation included:

  • Python 3.9.x or higher: I chose Flask as the application server.
  • Access to the OpenAI API: Essential for integrating the AI logic into the chatbot

This setup is sufficient to establish a testing environment for the AI logic, connecting the Python code to the OpenAI API.
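
To give an idea of how little is needed for that testing environment, here is a minimal Flask sketch against the post-1.0 openai client. It is not the code behind my bot, just the smallest thing that answers a prompt; it assumes the OPENAI_API_KEY environment variable is set.

import os
from flask import Flask, jsonify, request
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.route("/chat", methods=["POST"])
def chat():
    """Forward the user's message to the model and return its reply."""
    user_message = request.get_json(force=True).get("message", "")
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful chatbot."},
            {"role": "user", "content": user_message},
        ],
    )
    return jsonify({"reply": response.choices[0].message.content})

if __name__ == "__main__":
    app.run(port=5000)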

Advanced configuration

After thorough testing, I moved on to production. I continued using Flask for its simplicity, but also added Gunicorn as a frontend server. The application runs either as a standalone version or embedded within a WordPress blog.

I explored different operational models, including storing interactions in a database and the Bring Your Own Data (BYOD) model, although the latter’s impact on performance is still unclear. Initially, I deployed the gpt-3.5-turbo-instruct model for its speed and contextual retention. However, for superior output quality, I ultimately chose GPT-4 despite its slower response time.

The AI Bot Herself

The embedded chatbot uses gpt-3.5-turbo-instruct, whereas the one behind the links below uses the GPT-4 model. The latter needs a bit more time to think, but she will get there… You can compare the results.

Conclusions

A self-built AI chatbot can serve myriad purposes—customer support, entertainment, educational assistance, or personal aid, and can be integrated across websites, messaging platforms, or mobile apps.

For me, the project was primarily an exploration of AI technologies and the OpenAI API. It was also an invaluable learning experience in Python, application servers, and container technologies.

Building a self-built AI chatbot is undoubtedly a complex, resource-intensive endeavor that necessitates ongoing updates and maintenance. Yet, the potential for continuous learning and improvement through natural language processing and machine learning algorithms makes it increasingly efficient and precise over time.

From a Friday morning start to a productive Monday evening, my journey with this project underscores the potential and versatility of AI technologies, making a self-built AI chatbot a potent, customizable tool for any tech-driven initiative.


Streamlining Your Database Migration: A Guide to Leveraging OpenAI API for Seamless Assessments

Database migration is a complex process that demands careful assessment to ensure data integrity, application performance, and overall system reliability. The OpenAI API, with its advanced natural language processing capabilities, offers a way to simplify this process by automating assessments and summarizing key points. This guide will walk you through using the AWS Schema Conversion Tool (AWS SCT) for initial assessments, integrating the OpenAI API with Python to generate assessment summaries, and understanding the requirements for connecting with Azure OpenAI API, as well as its differences from ChatGPT OpenAI.

Kickstarting Your Migration: Utilizing AWS SCT for Comprehensive Database Assessment

The Amazon Web Services Schema Conversion Tool (AWS SCT) simplifies database migration from one platform to another. It assesses your existing database schema and generates a detailed report on potential migration issues. Supporting a wide range of source and target databases, AWS SCT is versatile for many migration scenarios.

AWS SCT examines your database schema, identifies non-convertible elements, and produces a comprehensive report. This report, containing potential action items, is crucial for planning your migration, offering an overview of the complexity, potential challenges, and the effort required.

The report, in PDF format, provides a detailed view of your database schema, potential issues, and recommendations. While invaluable for database administrators and engineers, the report’s extensive and complex nature makes OpenAI API a perfect tool for simplification and summarization.

Transforming PDFs into Comprehensive Assessment Summaries

With the AWS SCT report in hand, the next step is to utilize OpenAI API’s sophisticated natural language processing capabilities. By reading and understanding the PDF report, OpenAI can extract key points and summarize the information in a more accessible format.

Using the Python package pymupdf, we scan the PDF and convert its contents to text. This text is then fed to OpenAI API to highlight important sections and summarize the findings, including potential issues and recommended actions.

The Python method process_directory reads each PDF, converts it to text, and then passes this text to another method, generate_summary, which calls the OpenAI API to generate a concise assessment summary.

Method: process_directory()

import os

def process_directory(directory):
    """Processes each PDF file in the given directory to generate a summary."""
    hostname, port_number, database_name = directory.split('_')
    for file in os.listdir(directory):
        if file.endswith('.pdf'):
            file_path = os.path.join(directory, file)
            pdf_text = extract_text_from_pdf(file_path)
            summary = generate_summary(pdf_text)
            print(f"Summary for {file} ({hostname}, {port_number}, {database_name}):\n{summary}\n")
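
The helper extract_text_from_pdf used above is not shown in the post; a minimal version built on pymupdf (imported as fitz) looks roughly like this:

import fitz  # pymupdf

def extract_text_from_pdf(file_path):
    """Extracts plain text from every page of the given PDF file."""
    with fitz.open(file_path) as doc:
        return "\n".join(page.get_text() for page in doc)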

Method: generate_summary()

import openai

def generate_summary(text):
    """Generates a summary for the given text using OpenAI's API."""
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a database \
               reliability engineer providing migration \
               assessment summary."},
            {"role": "user", "content": "Summarise the output \
              of assessment text: \n" + text}
        ],
        temperature=0.4,
        max_tokens=150,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    summary = response.choices[0].message.content.strip()
    return summary

Understanding OpenAI API Parameters

Understanding the role and impact of various OpenAI API parameters is crucial for tailoring your query results. Here’s a brief overview:

temperature (0.4): This parameter controls the level of creativity or randomness in the responses generated by the model. A lower temperature, such as 0.4, results in more predictable and conservative outputs. Conversely, a higher temperature encourages diversity and creativity in the answers.

max_tokens (150): Specifies the maximum length of the generated response measured in tokens (words and characters). Setting this to 150 means the response will not exceed 150 tokens, ensuring concise and to-the-point answers.

top_p (1.0): Also known as “nucleus sampling,” this parameter filters the model’s token generation process. A value of 1.0 means no filtering is applied, allowing any token to be considered. Lowering this value helps in focusing the response generation on more likely token sequences, potentially enhancing relevance and coherence.

frequency_penalty (0.0): Adjusts the likelihood of the model repeating the same line of text. A value of 0.0 implies no penalty on repetition, enabling the model to freely reuse tokens. Increasing this value discourages repetition, fostering more varied and dynamic outputs.
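
presence_penalty (0.0): Complements the frequency penalty, but instead of scaling with how often a token has already been used, it applies a flat penalty once a token has appeared at all. A value of 0.0 leaves it off; raising it nudges the model toward introducing new topics rather than dwelling on the ones already mentioned.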

The Python methods above generated a modest summary from my sandbox environment – modest at this point; we can take this much further. Below is a small part of the whole summary, describing the migration effort from an MS SQL Server 2019 database to RDS for PostgreSQL.

Migration Plan Summary

Source Database

  • AdventureWorks2019.MSSQL
  • Microsoft SQL Server 2019 (RTM-CU22-GDR) – 15.0.4326.1 (X64)
  • Standard Edition (64-bit) on Windows Server 2019 Datacenter
  • Case sensitivity: OFF

Target Platform:

  • AWS RDS for PostgreSQL

Assessment Findings:

  • Storage Objects: 100% can be converted automatically or with minimal changes.
  • Code Objects: 77% can be converted automatically or with minimal changes.
  • Estimated 99.9% of code can be converted to AWS RDS for PostgreSQL automatically.
  • 515 conversion actions recommended ranging from simple to complex tasks

The AI-generated summary above can be a significant time saver for database administrators and engineers. Instead of going through pages of detailed reports, they can quickly glance through the summary and understand the key points. It can also be used as a reference guide during the migration process, helping to avoid potential issues and ensuring a smooth transition.

Building a fully automated OpenAI-Powered Python Module for PDF Analysis and Summary Generation

To generate the assessment summary using the OpenAI API, I developed the Python methods described above. These methods are components of a larger assessment framework that I’m currently developing. In this article, we focus exclusively on the integration with the OpenAI API. It’s worth noting that the PDF files used as input are generated through a fully automated process. However, the details of that process are beyond the scope of this blog post.

Python, with its versatility and powerful capabilities, is ideal for integrating with the OpenAI API. It offers libraries for API interactions and processing PDF files, enabling the automation of the entire workflow—from reading PDF files to generating summaries.

For the initial step, libraries such as PyPDF2, PDFMiner, or pymupdf—which I prefer—can be utilized to read the contents of PDF files. After extracting the text, this information can be processed by the OpenAI API. The API is designed to analyze the text, pinpoint the essential information, and compile a concise summary.

Subsequently, this summary can be saved either as a text file or within a database for easy access in the future. Moreover, the module can be configured to insert summaries into a database table, integrating them into a larger assessment data repository. This data can then be leveraged for generating reports, such as Power BI dashboards or other forms of reporting, allowing key stakeholders to stay informed about the migration process’s progress.

Setting Up Azure OpenAI API: Essentials and Differences from ChatGPT

The Azure OpenAI API is a cloud-based service enabling developers to integrate OpenAI’s capabilities into their applications. To utilize the Azure OpenAI API, one must have an Azure account and subscribe to the OpenAI service, in addition to generating an API key for authentication during API requests.

There are notable differences between utilizing ChatGPT and the Azure OpenAI API.

For ChatGPT, your Python module only requires the openai.api_key to be set, along with specifying the model, such as “gpt-4” in my example code. However, integrating with the Azure OpenAI API necessitates additional configuration:

    openai.api_base = os.getenv('AZURE_OPENAI_ENDPOINT')
    openai.api_key = os.getenv('AZURE_OPENAI_API_KEY')
    openai.api_version = os.getenv('AZURE_OPENAI_VERSION')
    openai.api_type = "azure"
    deployment_id = os.getenv('AZURE_OPENAI_DEPLOYMENTID')

It’s important to note that when using Azure OpenAI, the model parameter of the Python OpenAI API corresponds to your specific deployment name, instead of “gpt-4” as it was for the ChatGPT model in my earlier examples.
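
The configuration snippet above follows the older module-level style of the openai library. For completeness, with the 1.x Python client the same environment variables are passed to the AzureOpenAI class, and there the model parameter is indeed your deployment name. A sketch of how generate_summary would look against Azure under that assumption:

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_VERSION"),
)

def generate_summary_azure(text):
    """Same as generate_summary, but against an Azure OpenAI deployment."""
    response = client.chat.completions.create(
        model=os.getenv("AZURE_OPENAI_DEPLOYMENTID"),  # the deployment name, not "gpt-4"
        messages=[
            {"role": "system", "content": "You are a database reliability engineer "
                                          "providing a migration assessment summary."},
            {"role": "user", "content": "Summarise the output of assessment text:\n" + text},
        ],
        temperature=0.4,
        max_tokens=150,
    )
    return response.choices[0].message.content.strip()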

The Azure OpenAI API and ChatGPT OpenAI both offer advanced natural language processing capabilities, albeit tailored to different use cases. The Azure OpenAI API is specifically designed for embedding AI functionalities into applications, whereas ChatGPT OpenAI excels in conversational AI, facilitating human-like text interactions within applications.

Choosing between the two for summarizing database migration assessments hinges on your project’s unique needs. Azure OpenAI API is the preferable option for projects requiring deep AI integration. On the other hand, if your application benefits from conversational AI features, ChatGPT OpenAI is the way to go.

In summary, utilizing the OpenAI API can drastically streamline the database migration assessment process. The AWS Schema Conversion Tool yields a thorough report on your database schema and potential issues, which can efficiently be condensed using the OpenAI API. By developing a Python module, this summarization process becomes automated, thus conserving both time and resources. Regardless of whether Azure OpenAI API or ChatGPT OpenAI is chosen, each offers potent AI capabilities to facilitate your database migration endeavors.

Running benchmark: Comparing Ubuntu 20.04 and RHEL 8.5 performance

The claim

I found a claim on Quora that Ubuntu is slow compared to RHEL. I had never really thought about it. Is it really? It seemed like a sentimental statement with nothing to back it up. I questioned the claim and found that many people tried to support it, still without providing any kind of proof.

Instead of continuing to ask for evidence, I decided to dig up the evidence myself.

Is there any difference in performance between the two? I don’t really know, but if you think about a default server install with nothing extra, I doubt there could be any significant performance difference.

Ubuntu 20.04 currently has kernel 5.4, whereas RHEL 8.5 has 5.13. Libraries and software are pretty much the same, and the file system is XFS on both by default. I always enable LVM, although that could affect performance – certainly not by improving it, but there are other advantages. I don’t think LVM reduces performance much either, and as said, I always set it up anyway.

In this case I have two virtual machines, both with 4 GiB RAM and two 3.3 GHz CPU cores, running on QEMU/KVM. The host OS is, yes, you guessed right, Ubuntu 20.04, because on the desktop it has certain software I need, and it works well as a virtualization host too. That does not affect the results in any way, since the guest OS has no idea what the host OS is.

I haven’t tuned either of the guest OSes at all, except for one thing: I set the tuned profile to virtual-guest on both, which makes sense, since it is the profile that tuned-adm recommend suggests on both guest machines.

Put the HammerDB down

The last time I ran HammerDB I had to settle for the text-based version, but this time it had a nice working GUI. Even before the quick HammerDB installation, I downloaded Db2 11.5.7 Community Edition and installed it on both Ubuntu and RHEL. I created the SAMPLE database with db2sampl and timed it: no real difference. I knew it. OK, that doesn’t prove anything.

But, the real test does. HammerDB.

Ubuntu 20.04

Let’s start with Ubuntu. I read a tutorial on how to run a time-based benchmark with HammerDB. I want to do this fast: one virtual user only. Looks good.


The process goes:

  • Choose Engine and configure it (Db2)
  • Build schema
  • Configure and load driver
  • Configure virtual user
  • Create virtual user(s)
  • Run virtual users
  • Monitor and wait
Here are the results for Ubuntu.

The results for Ubuntu 20.04

System achieved 5178 NOPM from 22865 Db2 TPM

RHEL 8.5

Then the same thing for RHEL. The machine crashed twice, which reminded me of kernel parameters. We have only 4 GiB, so I might need to tune them. But no, it ran fine the third time. In my previous job, though, servers with low memory running Db2 on RHEL always crashed without tuning the kernel parameters. There’s a simple formula based on RAM to calculate the correct values for Db2. That said, I did not change anything from the defaults for either Ubuntu or RHEL; it wouldn’t be a fair comparison if I tuned the kernel parameters for one and not the other.

Running on RHEL 8.5
There’s finally some I/O wait
The winner is RHEL 8.5 by two New Orders Per Minute (NOPM)

The results for RHEL 8.5

System achieved 5180 NOPM from 22815 Db2 TPM

First conclusion

There is really no difference between Ubuntu and RHEL when it comes to the achieved performance results. The two new orders per minute make a 0.04% difference, which I’m pretty sure no one can notice just by “using the server a bit”.

Comparison between database engines

Since I had already started playing with HammerDB, why not try some more tests? I have earlier installed Db2 as well as MS SQL Server on the host machine itself. I also have a virtual machine running Oracle Linux 8 with the same 4 GiB RAM and two-CPU-core setup. MySQL and PostgreSQL I have running on the host itself.

The host OS, as said, is Ubuntu Desktop 20.04. It has 4 x 3.3 GHz cores, 32 GiB RAM and a fast 500 GiB NVMe M.2 PCIe SSD. This is a small form factor machine suitable for industrial use as a headless server running, for example, Linux, or you can use it as a desktop computer as well. My idea was to use it as a platform for several virtual guests, but I also wanted to see how it works as a Linux desktop computer.

Let’s do a few quick tests on the host itself with various database engines. This is more a test of HammerDB itself than a real comparison between the engines.

Db2

TEST RESULT Ubuntu 20.04 Desktop: System Achieved 6651 NOPM from 2928 TPM.

I’m a bit surprised it didn’t achieve more. I need to test more. It takes time for the buffer pools to warm up with automatic memory tuning, and with 32 GiB of memory I’m pretty sure we could get much better results.

MS SQL Server

But let’s check with the MS SQL Server I have running on the same machine. Surely Db2 beats MS SQL Server, right?

MS SQL Server gets higher TPM numbers compared to the virtual machines. Obviously the benchmark is somewhat different between the engines.

The winner is… oh no, MS SQL Server

8394 New Orders Per Minute with 19243 SQL Server TPM

Oracle

I have one Oracle 21c server running on the VM with Oracle Linux 8. Oh, but Oracle – I’m so lost with it. HammerDB asks too many questions, and it seems I need to create another pluggable database. I will do that – later.

PostgreSQL and MySQL

Out of curiosity I ran the test for PostgreSQL and MySQL:

PostgreSQL: TEST RESULT: 11607 NOPM from 26880 PostgreSQL TPM

MySQL: TEST RESULT: 1739 NOPM from 5252 MySQL TPM

I have no idea why the difference between the two is that significant. It might be for various reasons. I wouldn’t pay much attention to the difference, since running the test on the host OS rather than on virtual machines with a proper setup doesn’t make much sense – unlike the more serious comparison I did for Ubuntu Server and RHEL.

Final Conclusion

Without an official test for Oracle, we can only conclude that Oracle is the slowest of these three DB engines: Db2, MS SQL Server and Oracle. I’m kidding, of course; I’m no Oracle expert and just too slow myself to set up a proper test for Oracle. That might change once I have enough time to dig deeper into Oracle. For MySQL and PostgreSQL the test was also too quick; it was more a check of whether they behave similarly to Db2 and MS SQL when it comes to HammerDB.

As for the original claim that Ubuntu is slow overall, which surprisingly many are willing to believe, I think I have busted it.

We can speculate about real server environments, and please do, but until you have benchmarks showing otherwise, I take it as proven that Ubuntu and RHEL are equally slow or fast.

Also, as for MS SQL Server performance compared to Db2, this was obviously not the last word. Let’s try with 10 virtual users beating Db2 for a bit longer.

Db2 with 10 Virtual Users

Final results for Db2 running on this tiny Asus Mini PC PN41 were:

TEST RESULT: System achieved 15650 NOPM from 68735 Db2 TPM.

So we have a winner: Db2 11.5.7?

In a sense Db2 won, yes: it did get the highest number of new orders per minute. But I didn’t really organise any meaningful comparison tests between the other database engines this time.

Want to test yourself?

Prove me wrong. Run your own tests and send me your data and conclusions. I seriously doubt Ubuntu Server and RHEL differ much when it comes to performance. There are certainly plenty of other things that make the difference when choosing a distribution, such as support, cost, the platform you are running on, and so on. Red Hat certainly has its advantages in enterprise-level support, whereas Ubuntu started strong on the desktop but is also easy to deploy, for example on Azure, and fully supported.

Configure SQuirreL SQL Client for DB2 for Linux, UNIX and Windows


At least for me, the default settings for the SQuirreL DB2 driver didn’t work, so I have documented here how to modify the default “IBM DB2 App Driver” so that it works.

Configure the DB2 driver

  1. First I copied db2jcc4.jar from the DB2 instance. In my case the driver was under /opt/ibm/db2/V11.1/java/, but as the DB2 instance user you can also find it under $HOME/sqllib/java, since that is a symlink to the installation directory.

  2. I stored db2jcc4.jar under $HOME/Java, but you can choose a different location.
  3. Click Drivers tab and then double click “IBM DB2 App Driver” to modify it.
  4. Highlight the “Extra Class Path” tab and click “Add” to add the db2jcc4.jar you stored earlier on your computer.
  5. Set “Example URL” to jdbc:db2://<HOST>:<PORT>/<DATABASE_NAME>
  6. Set “Class Name” to com.ibm.db2.jcc.DB2Driver

Setup connection to DB2 database

  1. To set up your first DB2 database connection in SQuirreL SQL Client, select the “Aliases” tab and click the add icon to add a new database connection.
  2. From the “Driver” pull-down list select “IBM DB2 App Driver”.
  3. Choose a “Name” for your connection, i.e. the database name you are connecting to.
  4. The URL has the form you set up when creating the driver: “jdbc:db2://<HOST>:<PORT>/<DATABASE_NAME>”. Set the host, port and database name accordingly.

Now you can test your connection by hitting “Test” and then “Connect”:

Ready to go

Once the connection is set up you can connect to the DB2 database:

And start querying:

Questions?

If you have any questions or feedback, please connect with me.

Create Linux VM running CentOS 7.3 minimal with pyodbc and Netezza Client

This document describes how to create a Linux virtual machine (VM) to run on a macOS or Windows host. After following the steps in this document, you will have a CentOS 7.3 VM capable of running the Netezza Linux client, unixODBC, and Python 3.6 with pyodbc and pandas, among others. This setup is useful for developing Python code that needs a Netezza connection.

Especially macOS users will benefit from this kind of setup, since there is no Netezza client for macOS.

This document concentrates on deploying the VM on VirtualBox, but the CentOS setup portion is identical when using other hypervisors, e.g. VMware Player, VMware Workstation or VMware Fusion.

Note: The Linux VM created in this document has all its Python capabilities on Python 3.6. You execute Python code by calling python3.6 instead of just python, which points to Python 2.7.5.

Install CentOS 7.3 on VirtualBox

  1. Download newest version of VirtualBox and install it: https://www.virtualbox.org/wiki/Downloads
  2. Once installed go to VirtualBox menu and choose “Preferences” and click “Network”.
  3. Click “Host-only Networks” and click the add icon:
  4. Now you have new “Host-only Network” which is needed for incoming connections. You can check the details by double clicking vboxnet0:
  5. Next create VM with two network adapters:
    1. Choose “New” and select “Name”, “Type” and “Version”:
    2. Click continue. You can keep the memory on 1024MB which is the default.
    3. Click continue and choose “Create virtual hard disk now” and click “Create”.
    4.  “Hard disk type” can be VDI, if you do not plan to run VM on other hypervisors, but if you plan to run it on VMWare hypervisor, choose VMDK. Click “Continue”.
    5. For flexibility choose “Dynamically allocated” and for best performance choose “Fixed size”.
    6. For most purposes 8.0GB is enough, but your needs may vary. Choose “Create”.
    7. Now VM is created, but we need to change some of the network settings:
      1. While new VM is highlighted, choose “Settings” and select “Network” tab.
      2. “Adapter 1” default settings are ok for most cases, but we need to add “Adapter 2” so click “Adapter 2”. We need 2nd Network card for incoming connections, so we select “Enable Network Adapter” and set “Attached to: Host-only Adapter”:
      3. Click “OK”.
  6. We need the CentOS minimal installation image, which we can download from the CentOS site: https://www.centos.org/download/
  7. Choose your download site and store the image in the desired location. We only need it during installation.
  8. Once downloaded, go back to VM Settings on virtual box:
    1. Select “Storage” tab and “Controller: IDE” and click the CDROM icon and then another CDROM icon on right side from “Optical Drive” selection and “Choose Virtual Optical Disk File”.
    2. Select the CentOS minimal installation disk image you downloaded on previous step:

      1. Click “OK”
  9. Now we can start the CentOS minimal installation. Choose “Install….” when VM has booted.
  10. Next we get graphical installation screen. We can keep language settings as default and click “Continue”:
  11. Note that when you click the VM window, it will grab the mouse. To release the mouse, press the left Cmd key (on macOS).
  12. Click “Network & Host name”. You can specify your hostname as preferred.
  13. Both Network cards are off by default. Set them both to “ON”.
  14. For both Network cards, click “Configure” and on “General” tab choose “Automatically connect to this network when it is available”:
  15. Click “Done” to get out from Network settings, and click “Installation Destination” to confirm storage device selected by default is correct (no need to change anything). Then click “Done” to get back to main screen and you can start the installation by selecting “Begin Installation”.
  16. During installation, set the root password and choose to create a user. In my examples for setting up the Netezza client, I have created a “Netezza User” with the username “nz”. I will also make this user an administrator:
  17. Once the installation is done you can click “Finish configuration” and then “Reboot”.
  18. The VM now boots for the first time. You can ssh to the system (from Terminal on macOS, or using PuTTY on Windows) or use the VirtualBox console.
  19. If this is the only VM using the host-only network on VirtualBox, the IP is likely 192.168.56.101. You can check the IP of device enp0s8 with the command ip addr show when logged in through the VirtualBox console.
  20. After installation, the first thing to do is update all packages with yum. Either run “yum -y update” as root or “sudo yum -y update” as an administrative user.

Configure file sharing between Host and Guest OS

You might want to share, for instance, your PycharmProjects folder so that you can run the Python code you develop directly on the Linux VM. That is pretty much the whole point of the Linux VM in this case.

To achieve that, you need to enable file sharing. There are a few additional steps needed, which I’ll go through below:

  1. You need a few additional packages first. Run the following commands as root:
    1. yum -y update
    2. yum -y install gcc kernel-devel make bzip2
    3. reboot
  2. Once the LinuxVM has rebooted and you have the LinuxVM window active, select “Devices” –> “Insert Guest Additions CD Image…” from the menu. Then log in to the LinuxVM as root again (via the VirtualBox console or ssh) and run the following commands:
    1. mkdir /cdrom
    2. mount /dev/cdrom /cdrom
    3. /cdrom/VBoxLinuxAdditions.run
  3. Now select the folder you want to share from your host OS to the LinuxVM. Go to the VM settings, choose the “Shared Folders” tab, click the add icon, and then choose the folder you want to share:
  4. Note: Make sure you make the share permanent. There is no need for the automount option, since we mount it a bit differently below.
  5. Above we are sharing the PycharmProjects folder. We want it mounted in the nz user’s home directory on the LinuxVM, so as the nz user we first create the directory with the command: mkdir $HOME/PycharmProjects
  6. Then, as root, we add the following entry to /etc/fstab:
     PycharmProjects /home/nz/PycharmProjects vboxsf uid=nz,gid=nz 0 0
  7. After a reboot, you should now have your PycharmProjects folder mounted with read and write access under the nz user’s home directory (a quick check follows after the note below).

    Note: The purpose of the above share is this: when you develop your Python code with PyCharm on macOS and the code needs a connection to Netezza, you cannot run it on macOS, since there are no Netezza drivers for it. With this setup, you can run the code you edit in PyCharm seamlessly on the LinuxVM over an ssh connection, and once it is confirmed to work, commit your changes.
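
A quick way to confirm the share after the reboot (a small sketch, assuming the share name PycharmProjects and the nz user as above):

mount | grep vboxsf
ls -ld /home/nz/PycharmProjects
touch /home/nz/PycharmProjects/write_test && rm /home/nz/PycharmProjects/write_test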

Install Python 3.6 with pyodbc, pandas and sqlalchemy

Log in to the LinuxVM as root and run the following commands:

yum -y update
yum -y install git
yum -y install yum-utils
yum -y groupinstall development
yum -y install https://centos7.iuscommunity.org/ius-release.rpm
yum -y install python36u
yum -y install python36u-pip
yum -y install python36u-devel
pip3.6 install pandas
yum -y install unixODBC-devel
pip3.6 install pyodbc
yum -y install gcc-c++
yum -y install python-devel
yum -y install telnet
yum -y install compat-libstdc++-33.i686
yum -y install zlib-1.2.7-17.el7.i686
yum -y install ncurses-libs-5.9-13.20130511.el7.i686
yum -y install libcom_err-1.42.9-9.el7.i686
yum -y install wget
yum -y install net-tools
pip3.6 install sqlalchemy
pip3.6 install psycopg2
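
Once those commands have finished, a quick sanity check along these lines should confirm that Python 3.6 and the installed modules are in place (output will vary with versions):

python3.6 --version
pip3.6 --version
python3.6 -c "import pandas, pyodbc, sqlalchemy, psycopg2; print('modules OK')"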

Testing pyodbc

Edit the connection string accordingly:

[nz@nzlinux ~]$ python3.6
Python 3.6.2 (default, Jul 18 2017, 22:59:34) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyodbc
>>> pyodbc.connect(server="nz", database="TEST", dsn="NZSQL", user="admin", PWD="password", autocommit=False)
<pyodbc.Connection object at 0x7f66ef8566b0>
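
From there you can push a small query through the same connection, for example straight into a pandas DataFrame. A minimal sketch (the DSN name, credentials and the trivial SELECT are placeholders; adjust them to your environment):

import pandas as pd
import pyodbc

conn = pyodbc.connect(dsn="NZSQL", user="admin", PWD="password", autocommit=False)
cur = conn.cursor()
cur.execute("SELECT 1 AS test_col")   # any lightweight query will do
print(cur.fetchone())                 # expect a single one-column row back

df = pd.read_sql("SELECT 1 AS test_col", conn)   # same query into a DataFrame
print(df)
conn.close()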

Install Netezza Linux client

First you need to download the NPS Linux client from IBM Fix Central.

Then, as root, run the following commands (accept all defaults):

mkdir NPS
cd NPS
tar xvfz ../nz-linuxclient-v7.2.1.4-P2.tar.gz
cd linux
./unpack
cd ../linux64
./unpack
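
If the unpack scripts succeed with the defaults, the client is installed under /usr/local/nz, and a quick listing should show nzsql among the binaries (the path below is that default location):

ls /usr/local/nz/bin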

Now, log in as the nz user and add the following lines to $HOME/.bashrc (modify the server details and credentials NZ_HOST, NZ_USER and NZ_PASSWORD accordingly):

NZ_HOST=netezza.domain.com
NZ_DATABASE=SYSTEM
NZ_USER=admin
NZ_PASSWORD=password
export NZ_HOST NZ_DATABASE NZ_USER NZ_PASSWORD
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/nz/lib64
export PATH=$PATH:/usr/local/nz/bin
export ODBCINI=$HOME/.odbc.ini
export NZ_ODBC_INI_PATH=$HOME

To make the above changes effective without logging out and back in, you can run: . ~/.bashrc

Now you should be able to use nzsql:

[nz@nzlinux ~]$ nzsql
Welcome to nzsql, the IBM Netezza SQL interactive terminal.
Type:  \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit
SYSTEM.ADMIN(ADMIN)=>

Setup ODBC

Copy the following two files, odbc.ini and odbcinst.ini, to /etc as root (a rough sketch of their contents follows below):

odbc.ini
odbcinst.ini
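
The exact contents depend on your environment, but as a rough sketch a minimal Netezza DSN typically looks something like this (the DSN name NZSQL matches the pyodbc test above; the driver path assumes the default /usr/local/nz install location, so verify the library name on your system):

odbc.ini:

[NZSQL]
Driver     = NetezzaSQL
Servername = netezza.domain.com
Port       = 5480
Database   = SYSTEM
Username   = admin
Password   = password

odbcinst.ini:

[NetezzaSQL]
Driver = /usr/local/nz/lib64/libnzodbc.so
Setup  = /usr/local/nz/lib64/libnzodbc.so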

As the nz user, create the following symlinks in your home directory:

ln -s /etc/odbcinst.ini .
ln -s /etc/odbc.ini .
ln -s /etc/odbc.ini .odbc.ini
ln -s /etc/odbcinst.ini .odbcinst.ini

Questions

If you have any questions, please connect with me.

Update

The .odbc.ini and .odbcinst.ini issues seem to be fixed with newer Python versions, so neither creating symlinks in the user’s home directory nor creating the system files under /etc is required any more. Simply keeping .odbc.ini and .odbcinst.ini in the user’s home directory now works as intended.

How to set up SQuirreL SQL Client for Netezza

This article describes how to set up SQuirreL SQL Client for Netezza. There are not many free SQL query tools available for macOS and Linux, but SQuirreL is an exception. It uses JDBC to connect to Netezza, so the first thing you need is the Netezza JDBC driver. You can find it, for example, in the latest Netezza Linux client (inside npsclient.7.2.1.X-PX.tar.gz as lib/nzjdbc3.jar).

Download and install SQuirreL SQL Client

  1. Download SQuirreL jar installer package from: http://www.squirrelsql.org/
  2. You need to have the Java JDK installed to run and install SQuirreL. As of writing this HOWTO, the latest JDK can be downloaded from: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
  3. Once the jar installer is downloaded, run it with: java -jar squirrel-sql-3.7.1-MACOSX-install.jar
  4. Accept the defaults and choose which plugins you need. In my example I have chosen the plugins “Multi source”, “Data import”, “MySQL”, “Netezza”, “Oracle”, “PostgreSQL”, “Session Scripts”, “Smart Tools”, “SQL Parametrisation”, “SQL Replace” and “SQL Validator”.
  5. Click Next and Done, and you have SQuirreL SQL Client installed on your system.

Configure SQuirreL SQL Client for Netezza

Before configuring the driver, you need to have the Netezza JDBC driver nzjdbc3.jar stored on your computer. It can be found, for example, in the Netezza Linux client, under the 32-bit Linux tar archive.

Configure the Netezza driver

Once SQuirreL SQL Client is started, the first thing to do is to add the Netezza JDBC driver to the available drivers in SQuirreL SQL Client. There is no Netezza driver on the list of drivers by default, so we will add one.

  1. I stored the Netezza JDBC driver under $HOME/Java, but you can choose a different location.
  2. Click the “Drivers” tab and then click the + sign to add a new driver.
  3. Highlight the “Extra Class Path” tab and click “Add” to add the Netezza JDBC driver you stored earlier on your computer.
  4. Set “Name” to “Netezza”, “Example URL” to “jdbc:netezza://<host>:5480/<dbname>”, “Website URL” to “http://ibm.com”, and most importantly set “Class Name” to “org.netezza.Driver”. Once done, click “OK”.

Setup connection to Netezza database

Since SQuirreL SQL Client uses JDBC to connect to the database, each connection (alias) is tied to a single database, so you need to create one connection per database you work with. There are commercial tools like RazorSQL where a single database connection can be used to connect to and query all databases, but with SQuirreL SQL Client this is not the case.

  1. To set up your first Netezza database connection in SQuirreL SQL Client, select the “Aliases” tab and click the + icon to add a new database connection.
  2. From the “Driver” pull-down list, select “Netezza”.
  3. Choose a “Name” for your connection, i.e. the database name you are connecting to.
  4. The URL follows the form you set up when creating the driver: “jdbc:netezza://<host>:5480/<dbname>”. Set “host” and “dbname” accordingly (see the example after this list).
  5. Now you can test the connection. Click “Test” and then “Connect”. If the connection is unsuccessful, check the trace to find out the reason.
  6. Once the alias is created, you can connect to the database. Either double-click the alias or right-click it and select connect.
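
For example, with the placeholder host and database names used earlier in this guide (substitute your own), the alias URL would be:

jdbc:netezza://netezza.domain.com:5480/TEST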

Questions?

If you have any questions or feedback, please connect with me.