The Magic of AI can only be managed with robust Data Governance

The AI craze is in hyperdrive, and it shows no clear indication of dying down anytime soon. The question is, do we jump on the bandwagon or do we hold back for sanity to prevail?

You know the answer. We are not just jumping; we are flying with whatever we have to see where we land. The horizon has never been so bleak, and it doesn't matter which side is up and which is down. We are flying alright!

Dealing with AI with the Data on Hand

At Etherion, I have been actively looking into open-source data to clean, analyze, and turn into insights to share with the community. I have been eager to work with the new tools on the market, both AI-enabled and not, and it has been a lot of fun. I learned that working with AI-integrated products is enjoyable, but as a co-founder and part of the leadership team, I stay mindful of the objective and goal that have been set, because it is so easy to get lost in the flow.

Speaking of which, my technical flow looks something like this:

Acquire data - Kaggle, data.org, and web scraping
Cleaning and Exploratory Analysis
Data Normalization and Modeling
Data Warehouse & Pipeline
Data Visualization
Documenting the Insights and Dashboarding

It is a simple, proven step-by-step process that keeps me interested in the project and gives me some leeway to involve people as required. It also lets me try out new tools at each step. Maybe that is where the complication crept into the process.

My Cursor slipped onto Ollama, and the Rest is Magic.

While my process is modular and simple, I have been feeling quite adventurous lately, so I figured, why not run the entire analytics through AI? It is the flavor of the times, and the tools are great and easy to jump into without any prior homework. The steps are simple:

Download Ollama and any model you want to try. Llama 3.2 is available for free.
Set up the project in VS Code. I am using uv to set up the environment.
Load libraries like LangChain and its subcomponents.

A few tutorials and some documentation later, I was able to feed the CSV files with rental data I picked up from Kaggle into Llama 3.2 and chat with it to explore the data.
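For readers who want a feel for what "chatting with a CSV" involves, here is a minimal sketch that skips LangChain and talks to Ollama's local HTTP API directly using only the standard library. It is not the setup described in this post. The sample rows, column names, and helper functions are hypothetical, and it assumes Ollama is running on its default port with the llama3.2 model already pulled.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical rows standing in for the Kaggle rental data.
SAMPLE_ROWS = [
    {"listing_id": "1", "city": "Pune", "rent": "12000"},
    {"listing_id": "2", "city": "Mumbai", "rent": "45000"},
]


def rows_to_context(rows, limit=20):
    """Render up to `limit` rows as 'key: value' lines the model can read."""
    lines = []
    for row in rows[:limit]:
        lines.append(", ".join(f"{key}: {value}" for key, value in row.items()))
    return "\n".join(lines)


def build_payload(rows, question, model="llama3.2"):
    """Build the JSON body for a non-streaming request to Ollama."""
    prompt = (
        "Answer using only the rental listings below.\n\n"
        f"{rows_to_context(rows)}\n\nQuestion: {question}"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(payload, url=OLLAMA_URL):
    """Send the payload to a running Ollama server and return the answer text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


payload = build_payload(SAMPLE_ROWS, "Which city has the highest rent?")
print(payload["prompt"])
# With Ollama running locally, uncomment the next line to get an answer:
# print(ask_ollama(payload))
```

Pasting rows straight into the prompt only works for small files. For anything larger, a retrieval layer such as a vector database becomes necessary.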

I was able to get the minimum, maximum, and average rent before it started to throw type errors. After a few normalization steps, I got it working again, and it was quick to respond to my prompts as well. The surprising bit was how much it struggled wherever the data was not in a correct or acceptable format. It raised a question:
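As an illustration of the kind of normalization that clears up type errors like these, here is a minimal sketch using only the standard library. The sample CSV, its column names, and the messy values are all made up for the example and are not taken from the actual Kaggle dataset.

```python
import csv
import io
import re
import statistics

# Hypothetical sample: rent values arrive as inconsistent strings.
RAW_CSV = """listing_id,city,rent
1,Pune,"1,200"
2,Mumbai,$1450
3,Delhi,N/A
4,Pune,980.50
5,Mumbai,
"""


def parse_rent(value):
    """Return the rent as a float, or None if it cannot be parsed."""
    # Keep only digits and the decimal point, which drops currency
    # symbols, thousands separators, and whitespace.
    cleaned = re.sub(r"[^0-9.]", "", value or "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def rent_summary(csv_text):
    """Compute min, max, and mean rent, counting the rows that were skipped."""
    rents, skipped = [], 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        rent = parse_rent(row.get("rent"))
        if rent is None:
            skipped += 1
        else:
            rents.append(rent)
    return {
        "min": min(rents),
        "max": max(rents),
        "mean": statistics.mean(rents),
        "skipped": skipped,
    }


summary = rent_summary(RAW_CSV)
print(summary)
```

Counting the skipped rows matters as much as the statistics themselves: a summary computed over three of five rows is a data quality finding in its own right.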

Can a lack of Data Governance derail the performance of a well-trained model?

In my experience working with multiple clients on data integration, migration, and modernization projects, the crux of the problem is rarely technical. More often, it is process and business alignment issues that cause major projects to fail.

DAMA-DMBOK2 and the need for professional data expertise

Yes. Pre-trained models are only as good as the data they are trained on, and they need good data to produce accurate insights. So, what is DAMA-DMBOK?

DAMA-DMBOK stands for the DAMA Guide to the Data Management Body of Knowledge. It is the de facto global standard for best practices in data management, developed by DAMA International (Data Management Association).

DAMA also offers a series of certifications that help data professionals master the data management framework and work on large-scale data transformation projects in big organizations looking to take control of their data.

How does DAMA-DMBOK fit into the conversation with AI?

It's simple, isn't it? Any LLM trained on bad data will give bad results, and any LLM working with bad data is going to hallucinate to varying degrees. Data sits at the crux of this advent of AI, and it requires professionals to manage it. DAMA-DMBOK provides structural clarity and much-needed alignment within the organization when it comes to the generation, access, use, and deprecation of data.

The Framework to enable AI

Data Governance provides a framework for organizations to assess, design, map, and quality-assure the data used to train LLMs, so that the organization can be driven by AI.

DAMA has its own version of a data governance framework, which manages the flow of information as well as the structure of the organization to maintain data integrity. Here is the DAMA Wheel for reference:

Coming back to my project above (I haven't lost track of it), I saw the need for a data quality framework to make sure that my interaction with the data continues to be efficient and accurate.

There are multiple data quality tools and frameworks to follow, but the quickest and simplest one I have found is to integrate dbt into the pipeline. That, in turn, means creating a data pipeline so that Llama 3.2 works with quality data.
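To make the idea concrete, here is a minimal plain-Python sketch of the kind of checks such tests perform. It is not dbt itself, which declares its tests in YAML and runs them as SQL against a warehouse. Not-null and unique are among dbt's built-in generic tests, while the minimum-value check is just an extra illustration. The rows, column names, and function names are hypothetical.

```python
# Hypothetical rows standing in for the rental data after loading.
ROWS = [
    {"listing_id": "1", "city": "Pune", "rent": 12000.0},
    {"listing_id": "2", "city": "Mumbai", "rent": 45000.0},
    {"listing_id": "2", "city": "Delhi", "rent": None},
    {"listing_id": "4", "city": "Pune", "rent": -500.0},
]


def check_not_null(rows, column):
    """Return indexes of rows where the column is missing or empty."""
    return [i for i, row in enumerate(rows) if row.get(column) in (None, "")]


def check_unique(rows, column):
    """Return indexes of rows whose value repeats an earlier row."""
    seen, duplicates = set(), []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value in seen:
            duplicates.append(i)
        seen.add(value)
    return duplicates


def check_min_value(rows, column, minimum):
    """Return indexes of rows with a non-null value below the minimum."""
    return [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and row[column] < minimum
    ]


def run_checks(rows):
    """Run every check and map each check name to its failing row indexes."""
    return {
        "listing_id_unique": check_unique(rows, "listing_id"),
        "rent_not_null": check_not_null(rows, "rent"),
        "rent_non_negative": check_min_value(rows, "rent", 0),
    }


results = run_checks(ROWS)
failures = {name: indexes for name, indexes in results.items() if indexes}
print(failures)
```

Returning the failing row indexes, rather than a bare pass or fail, makes it easier to trace each problem back to its source before the data reaches the model.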

I created a data pipeline to download the updated data from Kaggle and wrote dbt tests to make sure the data is in the required format and of the required quality. The clean data was then exported to another CSV and fed into LangChain, which uses Chroma to create a vector database for quick processing with Llama 3.2.

I do not want to go into the technical details here. I will write another post with step-by-step details on integrating any dataset with a local LLM and creating an AI chatbot to interact with it.

Where do we stand with Data Governance at Etherion Consulting LLP?

Every time I work with a new dataset, and more so now when I work with local LLMs for a much smoother workflow, I go back to the same question: How can I make sure the data is accurate and of good quality?

The answer invariably takes me back to implementing a robust Data Governance Framework with integrated Data Security, Privacy, and Compliance measures.

At Etherion, we are working on implementing a governance-first approach whenever we work with clients who are looking to integrate AI into their workflow. There is no quick fix, and no way forward unless we assess the maturity of the organization's data and people's awareness of data compliance, security, and privacy.

Next Steps

Etherion is working on a playbook for implementing Data Governance within organizations that are looking to enable AI. It takes a top-down perspective on introducing AI, so that adoption feels seamless rather than forced on the organization by current market trends.

We want to empower our clients with the expertise we have gained by studying the frameworks proposed by DAMA, and to help them drive this AI revolution efficiently.

Catch you next time with another post. Subscribe to get notified.


