Basic Principles of ConceptMiner

Here, we will explain the basic principles of ConceptMiner.

ConceptMiner integrates qualitative information analysis and quantitative data analysis through mathematical methods, enabling flexible yet precise analysis and the discovery of useful insights and ideas.

The traditional qualitative analysis methods most useful for comparison are the KJ method and the Grounded Theory Approach (GTA). These methods organise fragmented, scattered (chaotic) information expressed in natural language. Although the methods differ in various ways, at the level of fundamental principles they all follow the same steps:

  1. Begin by classifying the fragments of information based on their similarities,
  2. extract the common characteristics within each classified group,
  3. and explain the relationships between groups and characteristics.

People try to conduct qualitative analysis as systematically as possible, but as long as it is done by human beings, some degree of ambiguity is unavoidable. Moreover, the more strictly we try to apply rules, the more rigid and inefficient our thinking becomes. In many cases, it would have been more efficient to use one's own approach rather than faithfully following the procedures prescribed by a specific method.

In terms of quantitative-analysis analogues, step 1 corresponds to cluster analysis, step 2 to profile analysis (such as multiple comparison tests), and step 3 to graphical modelling (such as structural equation modelling or Bayesian networks). In the past, however, there was no way to link qualitative information directly to these quantitative methods, because qualitative information (natural language) could not be computed on. The situation has changed significantly in recent years: the widespread adoption of large language models (LLMs) has made embeddings easy to obtain.
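Under this analogy, and assuming entities are already represented as numeric vectors, steps 1 and 2 can be sketched with a minimal k-means clustering followed by a per-cluster profile of means. This is illustrative toy code (2-D points standing in for real embeddings), not ConceptMiner's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "entities": two groups of 2-D points standing in for embedding vectors.
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(10, 2))
group_b = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(10, 2))
X = np.vstack([group_a, group_b])

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: step 1 of the analogy (classify by similarity)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centre, then recompute the centres.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

labels, centers = kmeans(X, k=2)

# Step 2 of the analogy: extract each group's common characteristics
# as a profile of per-cluster means.
for j in range(2):
    print(f"cluster {j}: mean = {X[labels == j].mean(axis=0)}")
```

With well-separated groups, the two recovered clusters correspond to the two generating groups; step 3 (relating groups to each other) would then operate on these cluster-level summaries.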

In an LLM, texts are represented as high-dimensional vectors called ‘embeddings’ (e.g., 1536- or 3072-dimensional). Texts with similar meanings are thereby mapped to vectors that lie close to each other, even when they use superficially different words. By using embeddings, the subtle nuances contained in qualitative data (natural language) can be converted into numerical values that can be computed on precisely.
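As a minimal sketch of what ‘close to each other’ means, cosine similarity is the standard measure of proximity between embeddings. The 4-dimensional vectors below are made-up stand-ins; a real model would return 1536- or 3072-dimensional vectors whose values depend on the model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for embeddings of three short texts.
emb_cheap   = np.array([0.9, 0.1, 0.0, 0.2])  # "low price"
emb_bargain = np.array([0.8, 0.2, 0.1, 0.3])  # "great value for money"
emb_durable = np.array([0.1, 0.9, 0.8, 0.0])  # "lasts for years"

print(cosine_similarity(emb_cheap, emb_bargain))  # high: similar meaning
print(cosine_similarity(emb_cheap, emb_durable))  # low: different meaning
```

Two texts that express the same idea in different words end up with a similarity near 1, which is exactly the property that makes qualitative fragments computable.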

This can also be seen as making visible the enormous amount of information processing that humans perform subconsciously. The human brain is said to contain between 100 billion and 150 billion neurons, each with approximately 10,000 synapses. As a physical entity, then, the brain is capable of processing information in at least 10,000 dimensions. Unfortunately, our consciousness cannot understand or explain this process. Human consciousness can perceive at most three spatial dimensions, and even with effort can only reason over around seven, plus or minus two, comparison criteria at once (the so-called magical number). When the number of entities to be compared reaches hundreds or thousands, it becomes impossible to compare them all consistently within consciousness.

In other words, neither the traditional KJ method nor GTA can explain the actual process; all we can do is rationalise, after the fact, the results produced by lower-level information processing. Qualitative analysis is ambiguous precisely because the information processing taking place in the subconscious was hidden from view.

The ability to compute on qualitative information as vectors is significantly changing the traditional positioning and status of qualitative and quantitative analysis. For example, placing product or service entities in a semantic space using embeddings obtained from their descriptions is by no means inferior to traditional quantitative methods such as PCA or correspondence analysis. Conventional quantitative methods are based on measurable numerical values or survey responses, and merely provide accurate results from the perspective of those attributes; perspectives that are important but hard to measure are easily overlooked.

ConceptMiner uses its own data mining technology to convert and visualise ultra-high-dimensional embeddings (vectors) at a level that humans can understand. This makes it possible to summarise, at a human-comprehensible level, both the information processing that humans perform subconsciously and that of artificial neural networks, so that we can finally understand what is truly being said. Furthermore, by interacting with the AI through multidimensional vector spaces, we can guide its inference of new concepts rather than being at the mercy of the AI.

Fragments of text information (entities) collected in qualitative research are converted into vectors and organised using data mining technology, allowing each entity to be placed in a semantic space and clustered. The dimensionality of the vectors is reduced to a level humans can interpret, and the meaning of each dimension is interpreted, making it possible to see which clusters score highly on which dimensions. Furthermore, by attaching quantitative data, such as numerical or categorical values, to each entity, the characteristics of each cluster can be analysed from the perspective of that quantitative data.
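The pipeline described above can be sketched in a few lines of toy numpy code. This is an illustration under made-up assumptions, not ConceptMiner's actual data-mining technology: the 8-D entity vectors, the attached prices, and the cluster labels are all invented, and the dimensionality reduction here is plain PCA via SVD:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 8-D "embeddings" for 6 entities (real embeddings would be 1536-D or more).
X = rng.normal(size=(6, 8))
# Hypothetical quantitative data attached to each entity, e.g. a price.
price = np.array([120.0, 95.0, 110.0, 300.0, 280.0, 310.0])
# Hypothetical cluster assignment from a previous clustering step.
cluster = np.array([0, 0, 0, 1, 1, 1])

# Dimensionality reduction via PCA (SVD on centred data): project the 8-D
# vectors onto the 2 directions of greatest variance, giving each entity
# coordinates in a low-dimensional space that humans can inspect and interpret.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords_2d = Xc @ Vt[:2].T  # each entity as an interpretable 2-D point

# Profile each cluster from the perspective of the quantitative data.
for c in np.unique(cluster):
    print(f"cluster {c}: mean price = {price[cluster == c].mean():.1f}")
```

Interpreting what the reduced axes mean, and choosing how many dimensions to keep, is where the human analyst (or a tool like ConceptMiner) adds value beyond the raw arithmetic.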