GenAI and Data Analytics

Introduction: Large language models (LLMs) are the heavyweights of generative AI (GenAI). They can handle a wide variety of tasks, and with the advent of multimodal AI, they can both generate and interpret images. Important capabilities include:

Speed: GenAI has made most administrative tasks and coding much faster.
Education: GenAI can serve as a tutor for any topic, including data analysis and coding. Khan Academy's Khanmigo individualizes tutoring for K-12 students. There is no reason this could not be expanded to colleges and to graduate and professional schools.
Code generation and debugging
Explainable insights from text, images, audio and video
Generate synthetic data
Create dashboards and a variety of reports
Extract data from images and videos
Content creation
Brainstorming
Graphic designs
Web scraping
Perform supervised and unsupervised learning
Anomaly detection
Perform descriptive, diagnostic, predictive and prescriptive analytics
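
Two of the items above, generating synthetic data and anomaly detection, can be illustrated together in a few lines of Python. This is a minimal sketch, not a production method: the sensor readings, the two injected outliers, and the 3x interquartile-range threshold are all assumptions chosen for the example.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: 200 hypothetical sensor readings centered on 50,
# plus two outliers injected by hand.
readings = [rng.gauss(50, 5) for _ in range(200)]
readings += [120.0, -30.0]

# Interquartile-range (IQR) rule: flag points far outside the middle 50% of the data.
q1, _, q3 = statistics.quantiles(readings, n=4)
iqr = q3 - q1
low, high = q1 - 3 * iqr, q3 + 3 * iqr  # 3x IQR is an assumed, deliberately strict cutoff

anomalies = [x for x in readings if x < low or x > high]
print("flagged:", anomalies)
```

An LLM asked to "find anomalies" in a numeric column will often propose a rule of this kind, or a z-score or isolation-forest variant, so it is worth checking which threshold it chose.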

Examples of AI analytics companies and platforms: Polymer, Tableau, Altair RapidMiner, Datalab, DataRobot, Sprout Social, Microsoft Power BI, Salesforce AI, Qlik, H2O.ai, Clarifai, and Dataiku.

Future of AI Analytics:

Advanced simulations: AI can run thousands of simulations and is central to digital twins.
Real-time problem detection: By leveraging the Internet of Things (IoT), edge computing, and live data streams, AI can detect problems before humans realize they exist. Intensive care unit (ICU) medicine is already moving in this direction.
Embedded Analytics: Seamlessly embedded models continuously and autonomously monitor services and products, without user intervention.
Prescriptive Analytics: This approach is advanced and rarely taken. With AI-recommended solutions, multiple options can be analyzed for improved outcomes.
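
To make the simulation and prescriptive ideas above more concrete, here is a minimal Monte Carlo sketch in Python. It runs thousands of simulated days for each candidate option and recommends the one with the lowest expected cost. The staffing options, the demand distribution, and every cost figure are invented for illustration and do not come from any real system.

```python
import random
import statistics

def simulate_cost(staff, rng, n_runs=5000):
    """Average daily cost of one staffing level under random demand (hypothetical model)."""
    costs = []
    for _ in range(n_runs):
        demand = rng.gauss(100, 20)          # assumed daily demand, in cases
        capacity = staff * 12                # assumed cases handled per staff member
        unmet = max(0.0, demand - capacity)  # cases that could not be handled
        costs.append(staff * 300 + unmet * 150)  # assumed wage and penalty costs
    return statistics.mean(costs)

rng = random.Random(42)   # fixed seed so the run is repeatable
options = [6, 8, 10, 12]  # candidate staffing levels to compare

# Descriptive step: estimate the expected cost of each option by simulation.
expected_cost = {staff: simulate_cost(staff, rng) for staff in options}

# Prescriptive step: recommend the option with the lowest expected cost.
best = min(expected_cost, key=expected_cost.get)

for staff, cost in expected_cost.items():
    print(f"staff={staff}: expected cost ~ {cost:.0f}")
print("recommended staffing level:", best)
```

The recommendation is only as good as the assumed model, which is one reason prescriptive analytics is harder to do well than descriptive or predictive work.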

AI Analytics with Tabular Data

Essentially every modern (frontier) LLM can analyze data. Upload a CSV or Excel file, give it a standard instruction such as "summarize this dataset," and most will do a good job. Python is the most common programming language for analyzing data with LLMs. Most LLMs display the code for each step and often give you the option to copy it. Why? Because most are not capable of rendering data visualizations such as a box plot or scatter plot themselves, so the user has to copy the code and paste it into a programming notebook such as Jupyter Notebook or Google Colab.
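
As a rough illustration of the kind of code an LLM might return for "summarize this dataset," here is a minimal sketch that uses only the Python standard library. The inline sample data, the column names (age, systolic_bp), and the summarize_csv helper are hypothetical and made up for this example. Real LLM output more often relies on libraries such as pandas and matplotlib.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE_CSV = """age,systolic_bp
34,118
51,135
47,128
62,150
29,110
"""

def summarize_csv(text):
    """Return count, mean, median, min, and max for each numeric column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return {}
    summary = {}
    for column in rows[0].keys():
        try:
            values = [float(row[column]) for row in rows]
        except ValueError:
            continue  # skip columns that are not fully numeric
        summary[column] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
    return summary

summary = summarize_csv(SAMPLE_CSV)
for column, stats in summary.items():
    print(column, stats)
```

In a notebook, the same column values could then be passed to a plotting library to draw the box plot or scatter plot that the chat interface did not render.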

One of the major advances in 2025 was the appearance of agentic systems, which consist of multiple agents that have memory, work semi-autonomously, and can access external tools. Examples of these systems include Manus AI and Genspark AI. In my personal experience, agentic systems may perform better when a data analysis involves multiple steps. Because LLMs on their own do not retain memory from one step to the next, they may lose their way in complex data analyses. To add to the confusion, Gemini 3 and KIMI 2 appeared in late 2025, and both have agentic qualities. We await testing of these two new models on complex datasets.
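
The role that memory and tools play in a multi-step analysis can be shown with a toy sketch in Python. This is not how Manus AI, Genspark AI, or any other product is implemented. The tool names, the fixed plan, and the crude outlier rule are all hypothetical, and in a real agentic system an LLM would choose the next step rather than follow a hard-coded list.

```python
import statistics

# Hypothetical tools the agent can call. Each one reads from shared memory.
def load_data(memory):
    return [12.0, 15.0, 14.0, 90.0, 13.0]  # stand-in for reading an uploaded file

def clean_data(memory):
    data = memory["load_data"]
    median = statistics.median(data)
    return [x for x in data if x < 3 * median]  # crude outlier filter (assumed rule)

def summarize(memory):
    data = memory["clean_data"]
    return {"n": len(data), "mean": statistics.mean(data)}

TOOLS = {"load_data": load_data, "clean_data": clean_data, "summarize": summarize}

def run_agent(plan):
    """Run each planned step in order, storing every result in shared memory."""
    memory = {}
    for step in plan:
        memory[step] = TOOLS[step](memory)  # later steps can read earlier results
    return memory

memory = run_agent(["load_data", "clean_data", "summarize"])
print(memory["summarize"])
```

Because every intermediate result stays in memory, the final step works on the cleaned data rather than the raw file, which is the kind of continuity that can get lost in a long single-model session.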

Insights