Pre-trained large language models (LLMs) such as GPT-4 and Gemini have attracted much attention from organizations, which are eager to use LLMs to build applications such as chatbots and co-pilots. The latest MIT Technology Review report, titled “AI Readiness for C-Suite Leaders” and based on a survey conducted on behalf of ETL vendor Fivetran, found that scaling AI or GenAI was the “top priority” for 82% of the executives surveyed.
The survey found that 83% of organizations have identified data sources to use for AI or GenAI. However, there are questions about how prepared organizations are to actually connect those sources to GenAI and deliver data to GenAI applications when needed, in the appropriate format, cleaned and well prepared, all without compromising privacy or security.
The report states that, on average, organizations require “more than a dozen different technologies to gather all the intelligence about their data, and an equal number of technologies to integrate, transform, and replicate the data,” which presents significant challenges. Acquiring better data integration and ETL/data pipeline tools is clearly an important task, as data integration and ETL tools developed in the past for centralized data warehouse projects may not be suitable for new GenAI use cases.
Additionally, the survey found that while 64% of respondents said data integration and ETL/pipeline tools were one of their top two GenAI investment priorities, 35% prioritized data lakes and 31% prioritized data transformation tools. Data catalogs and LLM investments accounted for only 7%, while vector databases and compute layers were in the middle. Organizations face many challenges in building a data foundation, including data integration and building data pipelines, data governance and security, and data quality.
The survey also found that organizations face challenges with data governance, compliance, and reporting. The biggest challenges respondents cited in preparing data for use in AI were data governance and security (cited by 44% of respondents) and data integration or pipelines (cited by 45% of respondents). However, a deeper dive into the survey data revealed a stark divide. In particular, concerns about security and governance were concentrated among government and financial services organizations, and were not shared in the same proportion by technology executives in manufacturing, retail, and other industries.
"Organizations may not be able to control who is using the data in a business application and sending it to a generative AI model. These are important issues," the report quoted IDC's Bond as saying. Building a strong data foundation is a prerequisite for the success of GenAI. If organizations do not build a solid data foundation first, their data scientists will waste time on basic data integration and cleansing work.