GenAI and Data Analytics

Introduction: Large language models (LLMs) are the heavyweights of generative AI (GenAI). They can handle a wide variety of tasks and, with the advent of multimodal AI, can both generate and interpret images. Important aspects include:

  • Speed: GenAI has made most administrative tasks and coding much faster

  • Education: GenAI can serve as a tutor for any topic, including data analysis and coding. Khan Academy's Khanmigo individualizes tutoring for K-12 students. There is no reason this could not be expanded to colleges and to graduate and professional schools.

  • Code generation and debugging

  • Explainable insights from text, images, audio and video

  • Generate synthetic data

  • Create dashboards and a variety of reports

  • Create data from images and videos

  • Content creation

  • Brainstorming

  • Graphic designs

  • Web scraping

  • Perform supervised and unsupervised learning

  • Anomaly detection

  • Perform descriptive, diagnostic, predictive and prescriptive analytics
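To make one of the items above concrete, here is a minimal sketch of anomaly detection using the interquartile range (IQR) rule. The function name `iqr_outliers`, the multiplier `k`, and the heart-rate readings are all hypothetical and chosen only for illustration. This is one simple statistical rule, not the method any particular GenAI tool uses.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical heart-rate readings (beats per minute); 190 is planted as an anomaly.
readings = [72, 75, 71, 78, 74, 190, 73, 76, 70, 77]
print(iqr_outliers(readings))  # prints [190]
```

A larger `k` flags fewer points. The value 1.5 is the conventional choice behind the whiskers of a box plot.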

Examples of AI analytics companies and platforms: Polymer, Tableau, Altair RapidMiner, Datalab, DataRobot, Sprout Social, Microsoft Power BI, Salesforce AI, Qlik, H2O.ai, Clarifai, and Dataiku

Future of AI Analytics:

  • Advanced simulations: AI can test thousands of simulations and is central to digital twins

  • Real-time problem detection: By leveraging the Internet of Things (IoT), edge computing, and live streaming, AI can detect problems before humans realize they exist. ICU medicine is already moving in this direction.

  • Embedded Analytics: Models embedded seamlessly in services and products monitor them continuously and autonomously, without requiring user intervention

  • Prescriptive Analytics: This advanced approach is still rarely used. AI can recommend solutions and analyze multiple options to find the one with the best expected outcome.
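The simulation and prescriptive items above can be combined in a toy sketch: simulate each candidate option thousands of times, then recommend the one with the best average outcome. Everything here is an assumption made for illustration, including the option names, the wait-time figures, and the use of a normal distribution. A real digital twin would use a far richer model.

```python
import random
import statistics

def simulate_option(mean_outcome, spread, runs=5000, seed=0):
    """Simulate `runs` outcomes for one option and return their average."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Outcomes are drawn from a normal distribution and floored at zero.
    outcomes = [max(0.0, rng.gauss(mean_outcome, spread)) for _ in range(runs)]
    return statistics.mean(outcomes)

# Hypothetical options: (expected wait in minutes, variability in minutes).
options = {
    "add_one_nurse": (30.0, 8.0),
    "add_triage_kiosk": (38.0, 5.0),
    "no_change": (45.0, 10.0),
}

results = {name: simulate_option(m, s) for name, (m, s) in options.items()}

# Prescriptive step: recommend the option with the lowest average wait.
recommended = min(results, key=results.get)
print(recommended)  # prints add_one_nurse
```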

AI Analytics with Tabular Data

Essentially every modern (frontier) LLM can analyze data. Just upload a CSV or Excel file and give it a standard instruction, such as "summarize this dataset," and most will do a good job. Python is the most common programming language for analyzing data with LLMs. Most LLMs display the code for each step and often give you the option to copy it. Why? Because most cannot render data visualizations such as a box plot or scatter plot themselves, so the user has to copy the code and paste it into a programming notebook such as Jupyter Notebook or Google Colab.
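The kind of Python code an LLM might hand back for a "summarize this dataset" request looks roughly like the sketch below. It assumes pandas and matplotlib are installed, which is the case by default in Google Colab. The column names `age` and `systolic_bp` and the eight rows of data are hypothetical stand-ins for an uploaded CSV file.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for an uploaded CSV file.
csv_text = """age,systolic_bp
34,118
51,135
46,128
29,112
62,142
58,139
41,124
37,121
"""
df = pd.read_csv(io.StringIO(csv_text))

# Descriptive summary: count, mean, std, min, quartiles, and max per column.
summary = df.describe()
print(summary)

# One figure with a box plot and a scatter plot side by side.
fig, (ax_box, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
ax_box.boxplot(df["systolic_bp"].to_numpy())
ax_box.set_title("Systolic BP (box plot)")
ax_box.set_ylabel("mmHg")

ax_scatter.scatter(df["age"], df["systolic_bp"])
ax_scatter.set_title("Systolic BP vs. age")
ax_scatter.set_xlabel("Age (years)")
ax_scatter.set_ylabel("mmHg")
fig.tight_layout()
```

In Jupyter Notebook or Google Colab the figure renders below the cell. In a plain script, add `plt.show()` or `fig.savefig("plots.png")` at the end. To use a real file, replace `io.StringIO(csv_text)` with the file path.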

One of the major advances in 2025 was the appearance of agentic systems, which consist of multiple agents that have memory, work semi-autonomously, and can access external tools. Examples include Manus AI and Genspark AI. In my experience, agentic systems may perform better when a data analysis has multiple steps: because LLMs do not have memory, they may lose their way in complex analyses. To add to the confusion, Gemini 3 and KIMI 2 appeared in late 2025, and both have agentic qualities. We await testing of these two new models on complex datasets.
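Why memory matters in a multi-step analysis can be shown with a toy sketch: each step reads and writes a shared memory, so later steps build on earlier results instead of starting over. This is purely illustrative and does not reflect how Manus AI, Genspark AI, or any other product is implemented. The step functions, the `memory` dictionary, and the data are all hypothetical.

```python
import statistics

def load_step(memory):
    # Hypothetical data; a real agent would call a file-reading tool here.
    memory["values"] = [12.0, None, 15.0, 11.0, 14.0, 13.0]

def clean_step(memory):
    # Relies on the result of load_step being remembered.
    memory["values"] = [v for v in memory["values"] if v is not None]

def summarize_step(memory):
    # Relies on the cleaned values left behind by clean_step.
    memory["mean"] = statistics.mean(memory["values"])

def run_pipeline(steps):
    """Run the steps in order, sharing one memory dictionary among them."""
    memory = {"log": []}
    for step in steps:
        step(memory)
        memory["log"].append(step.__name__)  # record what has been done so far
    return memory

result = run_pipeline([load_step, clean_step, summarize_step])
print(result["mean"])  # prints 13.0
print(result["log"])   # prints ['load_step', 'clean_step', 'summarize_step']
```

Without the shared `memory`, `summarize_step` would have no access to the cleaned values. That is the situation a memoryless model faces when an analysis spans many steps.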