Data Aggregation Techniques and Challenges in the Internet of Things: A Comprehensive Review

Abstract: The Internet of Things (IoT) has revolutionized the way people interact with their environment, generating massive amounts of data from interconnected devices. With the exponential growth of IoT devices, efficient data aggregation techniques are essential for extracting meaningful insights and reducing network traffic. This review paper provides a comprehensive overview of data aggregation techniques in IoT, focusing on their methodologies, advantages, and challenges. The paper begins by discussing the fundamentals of IoT data aggregation. It then categorizes data aggregation techniques into two main approaches: centralized and distributed. For each approach, various algorithms and protocols are explored, including clustering-based, tree-based, and centralized aggregation. The paper also investigates the trade-offs involved in data aggregation, such as energy consumption, latency, and data accuracy, and examines how factors such as data heterogeneity and security considerations affect the choice of aggregation technique. In addition, it discusses existing techniques and highlights emerging trends and future directions in IoT data aggregation. The paper concludes by summarizing the key findings and the challenges that remain to be addressed in the field of IoT data aggregation.


INTRODUCTION
The contemporary technology environment now includes a new computational component called the Internet of Things (IoT). The IoT comprises integrated sensors and objects that can communicate with one another without human intervention. The "things" in the Internet of Things are physical items such as sensors, data gatherers, and monitors of various kinds of data relating to machine and human social behaviour [1]. The IoT encourages networking, data sharing, information collection, and integration of the different items present in our surroundings [2]. IoT has become increasingly prevalent in daily life [3]. For example, in a smart home [4], smart devices are connected to external services, enabling interaction between the devices and those services. Electronic medical systems [5] may employ wearable technology to track a patient's vital signs, including blood pressure and heart rate. Intelligent transportation also makes frequent use of IoT. IoT refers to the networked connectivity of several heterogeneous systems [6]. The architecture of IoT-based systems consists of four layers: the sensing layer, the networking layer, the cloud computing layer, and the application layer [7].
Figure 1 illustrates the IoT architecture.
Sensing Layer: In this layer, many IoT and sensor devices continuously scan their surroundings to collect data and deliver it to the sink [8,9]. The monitoring region is covered by millions of IoT devices deployed to form a self-organized, multi-hop topology [10]. Some IoT devices are more likely to malfunction because of the particular region in which they are located [11]. Moreover, some devices exhaust their energy more quickly than others [12]. It is therefore crucial to use energy-efficient approaches for IoT data aggregation.
Network Layer: The networking layer is responsible for providing efficient topologies for data transfer between source and destination devices [13]. While IoT topologies are designed to give source devices high data transmission rates, these systems are constrained in terms of energy usage, throughput, and exposure to malicious attacks because of their differing topologies [14].
Cloud Computing Layer: Advances in cloud computing technology have made it feasible to manage massive amounts of data more rapidly and precisely [15]. The cloud computing layer receives data, processes it, makes decisions on it, and then sends the results to the other layers [16]. Cloud computing is the preferred option for storing and analysing data in IoT-based systems, while other strategies choose edge or fog computing to optimise cost and performance. Rather than completely replacing the cloud, the key objective of adopting an edge or fog architecture is to integrate it into the data collection process in order to handle scattered IoT devices, manage system heterogeneity, and separate critical data from generic data [17]. In fact, the demand for large-scale data aggregation in IoT applications drives techniques away from cloud computing and towards fog or edge computing.
Application Layer: This layer includes a variety of applications, including wireless sensor networks [18]. The application layer (also known as Layer 4) is made up of n data elements, which together make up the IoT data. Clustering is feasible because each data unit can store up to 64 bits of data. The layer also provides ways to interpret the content formats of data-carrying IoT messages. Consequently, it is now possible to increase the efficiency and speed of data aggregation performed across the IoT layered architecture [19].
IoT applications are also used to monitor emergency, disaster, and environmental conditions. Figure 2 shows the applications of IoT. In healthcare, IoT introduces new tools equipped with the latest technology that help improve care, ranging from personal fitness sensors to surgical robots [20].
Smart City: The smart city is an emerging and promising application of the IoT that has captured considerable attention. One notable use case is intelligent surveillance, which enhances security. IoT also enables improved automated transportation systems, energy management systems, efficient water distribution, enhanced urban security, and environmental monitoring. IoT has the potential to address significant urban challenges such as pollution, traffic congestion, and energy scarcity. For instance, IoT-enabled devices such as Smart Belly garbage bins, equipped with cellular communication capabilities, can notify municipal services when they need to be emptied. Through online applications and devices installed across the city, residents can easily locate available parking spaces. Such monitoring devices can also detect issues such as faulty installations, general system malfunctions, and meter fraud in the electrical system [21].
Connected Car: Vehicular digital technology has traditionally focused on optimizing the internal functions of cars. However, there is now a growing emphasis on enhancing the in-car experience.
Connected cars are vehicles that use on-board devices and internet connectivity to enhance their operation, maintenance, and passenger convenience. Several established automakers and new entrants are actively developing connected vehicle solutions. Prominent companies such as Tesla, BMW, Apple, and Google are at the forefront of driving the next automotive revolution. Connected vehicle technology encompasses a wide range of sensors, antennas, integrated systems, and other technologies that facilitate communication and enable smooth navigation through a complex world. This technology plays a critical role in making prompt, accurate, and reliable decisions. As steering-wheel control is relinquished and autonomous or self-driving cars become more prevalent on the roads, the importance of these requirements will only intensify [22].
Smart Home: Smart homes are becoming the new benchmark for success in the residential market, and it is predicted that they will soon be as common as mobile phones. The smart home is the most prominent application that comes to mind when thinking about IoT systems, and it ranks as the top IoT application across all platforms. Funding for startups in the smart home sector is expected to reach $2.5 billion and is steadily increasing. Smart home systems let residents turn off the lights after leaving the house or switch on the air conditioning before arriving. Residents can even unlock the doors remotely to admit visitors temporarily while no one is at home. It is therefore unsurprising that companies are creating IoT products to simplify daily life [23].
Smart Farming: Smart farming is an IoT use case that is frequently overlooked. However, because farmers typically manage a large number of remote agricultural operations and livestock, the IoT can monitor all of these and change how farmers conduct their business. The concept has not yet attracted much attention. It should not be dismissed, though: particularly for nations involved in the production and trade of agricultural products, smart farming has emerged as a promising field with a wide range of potential applications [24].
Smart Retail: Retailers have been adopting IoT solutions and integrating IoT-enabled systems into different applications, which has improved the efficacy and efficiency of their shop operations, including boosting sales, lowering fraud, enabling inventory management, and improving the purchasing experience for customers. Thanks to IoT, physical stores can compete more effectively with online rivals, draw customers back to the shop, and recover lost market share. Retailers' purchasing procedures are also made easier, allowing them to buy more goods for less money [25].
Smart Supply Chain: Supply networks have been becoming more intelligent for several years. Popular services include tracking products while they are in transit and helping vendors exchange inventory data. With IoT, factory equipment fitted with integrated sensors transmits information about factors such as pressure, temperature, and machine usage. To improve performance, the IoT system can also manage processes and modify equipment settings [26].

C. Data Aggregation
Data aggregation is the process of gathering information from various IoT devices and presenting it in a condensed form. IoT relies heavily on data aggregation methods to reduce traffic and energy usage. The most straightforward method is for every source node to transmit its gathered data, without any pre-processing, to a single destination, where one aggregator server executes the aggregation operations directly on the combined data [27]. The goal of data aggregation methods in the IoT is to achieve high QoS, which includes taking into account the importance of the data and providing low data transfer delay, high dependability, and minimal energy usage [28]. Data aggregation techniques generally provide the following benefits:
• They enhance the usefulness and accuracy of the data gathered across a vast network [29].
• They lessen traffic volume and conserve node energy [29].
• They reduce redundant information, since the data collected from nodes contains a certain amount of duplication [29].
Traffic volume: Through multi-hop transfer, a large number of IoT devices transmit their data to data repositories. When adjacent nodes degrade under high traffic loads, this communication pattern causes an imbalance in the network's traffic load. Monitoring traffic volumes can aid in the creation of more effective routing schemes and in improving node distribution [30].
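To make the idea of presenting data in a condensed form concrete, the following minimal sketch collapses a batch of raw readings into one fixed-size summary record. It is an illustration only and is not taken from any of the cited works; the `Summary` record layout and the sample temperature values are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Summary:
    """Condensed view of a batch of readings (hypothetical record layout)."""
    count: int
    minimum: float
    maximum: float
    mean: float


def aggregate(readings: Iterable[float]) -> Summary:
    """Collapse raw readings into a single fixed-size summary record."""
    values = list(readings)
    if not values:
        raise ValueError("cannot aggregate an empty batch")
    return Summary(len(values), min(values), max(values), sum(values) / len(values))


# Ten raw temperature readings (made-up values) become one record to transmit.
raw = [21.0, 21.5, 22.0, 21.5, 21.0, 22.5, 23.0, 22.0, 21.5, 22.0]
summary = aggregate(raw)
print(summary)
```

The record has the same size however many readings it summarizes, which is where the traffic and energy savings come from; the price is that the individual readings can no longer be recovered at the sink.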

D. Data Aggregation Mechanism on IoT
In IoT networks, data aggregation algorithms are used to collect and summarise data from many sources in order to increase network efficiency [31]. These strategies include protocols for clustering, compression, and encryption, and may be applied to lessen network traffic, save energy, lengthen network lifetime, and enhance security. Data aggregation strategies for IoT fall into two categories, client-server-based and mobile agent-based, which are detailed separately. In client-server-based data aggregation, IoT devices send their data to the sink in a multi-hop fashion, and certain intermediary devices can carry out aggregation along the way [32]. Several kinds of client-server-based methods are studied: cluster-based, tree-based, in-network, chain-based, grid-based, and centralized. In mobile agent-based data aggregation, individual software packets (agents) visit IoT devices one after another to gather their sensed data and then carry the collected data to the sink [33].

E. Client-Server-based Data Aggregation Mechanisms
Client-server-based data aggregation methods use a central server to gather and combine data from several clients [34]. Mobile agent-based data aggregation has been suggested as a more practical alternative to the client-server paradigm [35]. In client-server systems, an IoT device has enough memory to retain the data it senses and the packets it receives from other devices. Before sending the final packet to the next destination, it applies the aggregation function to the data accumulated in its memory [36]. Client-server-based mechanisms can improve the efficiency of data collection and transmission in IoT networks. Aggregating data at the client side before sending it to the server reduces the amount of traffic injected into the network, which helps alleviate congestion and reduce energy consumption, and ultimately extends the lifetime of IoT devices by conserving their battery. Aggregating data on the server side also reduces the amount of data that needs to be stored and processed, which further improves network efficiency [37].
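The buffer-then-aggregate behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the protocol of any cited work: the `Partial` packet layout (count, total, minimum, maximum) and the `ForwardingNode` class are hypothetical names introduced here.

```python
from typing import List, Tuple

# A partial aggregate: (count, total, minimum, maximum). Hypothetical packet layout.
Partial = Tuple[int, float, float, float]


def from_reading(value: float) -> Partial:
    """Wrap one sensed value as a partial aggregate."""
    return (1, value, value, value)


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial aggregates into one of the same size."""
    return (a[0] + b[0], a[1] + b[1], min(a[2], b[2]), max(a[3], b[3]))


class ForwardingNode:
    """Device that buffers its own readings and received packets in memory."""

    def __init__(self) -> None:
        self.buffer: List[Partial] = []

    def sense(self, value: float) -> None:
        self.buffer.append(from_reading(value))

    def receive(self, packet: Partial) -> None:
        self.buffer.append(packet)

    def flush(self) -> Partial:
        """Apply the aggregation function to the buffer and emit one packet."""
        if not self.buffer:
            raise ValueError("nothing buffered")
        result = self.buffer[0]
        for partial in self.buffer[1:]:
            result = merge(result, partial)
        self.buffer.clear()
        return result


leaf = ForwardingNode()
for value in (20.0, 22.0):
    leaf.sense(value)

relay = ForwardingNode()
relay.sense(30.0)
relay.receive(leaf.flush())   # one packet arrives from the leaf
packet = relay.flush()        # one packet leaves towards the sink
print(packet)
```

Because `merge` takes and returns the same fixed-size tuple, every hop forwards a single packet regardless of how many devices lie behind it.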

F. Cluster-based Data Aggregation Mechanisms
Cluster-based data aggregation algorithms are used in a Wireless Sensor Network (WSN) to aggregate data from many sources. They segment the network into clusters, each with a cluster head (CH) node in charge of collecting and aggregating information from the cluster's other nodes. The information is subsequently transmitted to a base station or central node, where it may be used for further applications and analysis. Cluster-based methods frequently use routing protocols to enable effective communication between the nodes and the cluster head. Typical examples include fuzzy similarity matrix-based clustering, the tree-based approach, the beta-dominating set centred cluster-based data aggregation mechanism (DSC2DAM) [38] [39], and the cooperative information aggregation (CIA) mechanism [40]. By sending fewer messages to the central node, these techniques help a WSN become more fault-tolerant and scalable while also consuming less energy.
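A minimal sketch of the cluster-based scheme is given below. It assumes the cluster heads are already chosen and that each node simply joins the nearest head; real mechanisms such as DSC2DAM select heads by other criteria that are not reproduced here. The node names, coordinates, and readings are made up for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def assign_to_heads(nodes: Dict[str, Point],
                    heads: Dict[str, Point]) -> Dict[str, List[str]]:
    """Attach every node to its nearest cluster head (Euclidean distance)."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for name, position in nodes.items():
        nearest = min(heads, key=lambda h: math.dist(position, heads[h]))
        clusters[nearest].append(name)
    return dict(clusters)


def cluster_reports(clusters: Dict[str, List[str]],
                    readings: Dict[str, float]) -> Dict[str, Tuple[int, float]]:
    """Each head sends one (member count, mean reading) message to the base station."""
    return {
        head: (len(members), sum(readings[m] for m in members) / len(members))
        for head, members in clusters.items()
    }


# Hypothetical deployment: two heads and four member nodes.
heads = {"CH1": (0.0, 0.0), "CH2": (10.0, 0.0)}
nodes = {"a": (1.0, 1.0), "b": (2.0, 0.0), "c": (9.0, 1.0), "d": (11.0, -1.0)}
readings = {"a": 20.0, "b": 22.0, "c": 30.0, "d": 34.0}

clusters = assign_to_heads(nodes, heads)
reports = cluster_reports(clusters, readings)
print(reports)
```

In this toy deployment the base station receives two messages instead of four, which illustrates the reduction in messages sent to the central node.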

G. Tree-based Data Aggregation Mechanisms
Tree-based data aggregation mechanisms are a type of data aggregation technique used in the IoT. In this technique [41], a tree structure is built in the network, with each tree node representing a sensor node. The nodes communicate with one another and forward the data they have gathered towards the root of the tree. Once aggregated, this information may be used for a variety of tasks, including analysis and decision-making. Because it reduces data redundancy and the amount of data that has to be carried, tree-based data aggregation is beneficial for networks with many nodes. Since data is only sent towards the tree's root, it can also help lower network power usage.
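The flow of data towards the root can be sketched recursively, as below. This is an illustrative model rather than the algorithm of [41]: the tree is given as a hypothetical `children` map, every node (including the root) is assumed to hold one reading, and cost is counted in link transmissions only.

```python
from typing import Dict, List, Tuple


def aggregate_subtree(node: str,
                      children: Dict[str, List[str]],
                      readings: Dict[str, float]) -> Tuple[int, float]:
    """Return (count, total) for the subtree rooted at `node`.

    Each node merges its own reading with the partial results of its
    children, so only one packet travels over each tree link.
    """
    count, total = 1, readings[node]
    for child in children.get(node, []):
        child_count, child_total = aggregate_subtree(child, children, readings)
        count += child_count
        total += child_total
    return count, total


def relay_cost_without_aggregation(node: str,
                                   children: Dict[str, List[str]],
                                   depth: int = 0) -> int:
    """Link transmissions needed if every reading is relayed unmerged to the root."""
    return depth + sum(
        relay_cost_without_aggregation(child, children, depth + 1)
        for child in children.get(node, [])
    )


# Hypothetical five-node tree: root -> n1, n2 and n1 -> n3, n4.
children = {"root": ["n1", "n2"], "n1": ["n3", "n4"]}
readings = {"root": 10.0, "n1": 20.0, "n2": 30.0, "n3": 40.0, "n4": 50.0}

result = aggregate_subtree("root", children, readings)
with_aggregation = sum(len(kids) for kids in children.values())  # one packet per link
without_aggregation = relay_cost_without_aggregation("root", children)
print(result, with_aggregation, without_aggregation)
```

Here aggregation needs four transmissions against six for plain relaying; the gap widens as the tree gets deeper, which is consistent with the benefit claimed for networks with many nodes.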

H. Centralized Data Aggregation Mechanisms
In centralized aggregation mechanisms, all IoT devices send their sensed data along the shortest route to the most powerful device (also known as the header device). The header device passes all received data through the aggregation function before sending a single packet to the sink [44] [45]. The benefits of centralized data aggregation include improved data integrity, decreased data redundancy, lower costs, more insightful data, and a reduction in the volume of data generated in Wireless Sensor Networks [46]. Centralized data aggregation techniques perform better when the network has fewer nodes [38].
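As a rough sketch of the centralized scheme, the example below computes hop counts to the header device with a breadth-first search and then has the header reduce everything it received to one packet. The topology, the device names, and the choice of hop count as the "shortest route" metric are assumptions made for illustration; the cited works may use other metrics.

```python
from collections import deque
from typing import Dict, List, Tuple


def hops_to(target: str, links: Dict[str, List[str]]) -> Dict[str, int]:
    """Breadth-first search: shortest hop count from every device to `target`."""
    distance = {target: 0}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for neighbour in links[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance


def header_packet(readings: Dict[str, float]) -> Tuple[int, float]:
    """The header applies the aggregation function and emits a single packet."""
    return (len(readings), sum(readings.values()) / len(readings))


# Hypothetical topology with bidirectional links.
links = {
    "header": ["a", "b"],
    "a": ["header", "c"],
    "b": ["header"],
    "c": ["a"],
}
readings = {"a": 10.0, "b": 20.0, "c": 30.0}

hops = hops_to("header", links)
packet = header_packet(readings)
print(hops, packet)
```

Every raw reading still has to travel its full hop count to the header before any merging happens, which gives one intuition for why the approach suits networks with fewer nodes.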

PROCEEDINGS OF THE RICE. HYDERABAD, 2023
The architecture of the data aggregation is shown in the accompanying figure. Mobile agents (MAs) acquire data directly from sensor nodes (SNs), saving energy and bandwidth [47]. In systems that aggregate data using a single agent, the sink dispatches a mobile agent across the network to collect information from all devices and then return with it. This lengthens the life of the IoT system, reduces device energy consumption, and improves network data flow. The use of mobile agents for data aggregation in the IoT is not without limitations, though [48]:
• Because the mobile agent visits the devices one at a time, long delays are incurred.
• Because the size of the mobile agent's packet grows over the course of the data collection, IoT devices near the sink require more energy.
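The second limitation can be illustrated with a short calculation. In the sketch below, a single agent carries its own code plus everything collected so far, so the packet each device must forward keeps growing along the itinerary. The sizes are arbitrary units chosen for illustration, and the model assumes the agent stores raw data without compressing or merging it.

```python
from typing import List


def forwarded_sizes(payload_sizes: List[int], agent_code_size: int) -> List[int]:
    """Size of the agent packet each device transmits after adding its data.

    Devices are listed in visiting order; the last one is closest to the sink.
    """
    sizes = []
    carried = agent_code_size
    for payload in payload_sizes:
        carried += payload          # the agent picks up this device's data
        sizes.append(carried)       # and the device forwards the larger packet
    return sizes


# Hypothetical itinerary: four devices, 10 units of data each, 50 units of agent code.
sizes = forwarded_sizes([10, 10, 10, 10], agent_code_size=50)
print(sizes)
```

If transmission energy grows with packet size, the last devices on the itinerary pay the most, and the total collection time grows with the number of sequential visits.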

II. RELATED WORK
The authors of [49] proposed a means of providing security and efficiency in data aggregation. The approach aims to produce a precise, secure data compilation in a way that accommodates the IoT network's transmission and processing constraints while taking security considerations into account. The method guarantees security, but its limitation is a heavy traffic burden.
The authors of [50] proposed a scalable and secure IoT storage system designed around scalability, security, reliability, and flexibility. The system enables users to meet the requirements of data mining and analytics on massively aggregated data. The design is based on a revised secret sharing scheme, which secures the data without the need for complicated key management.
The authors of [51] proposed a tree-based data aggregation technique that is both delay-aware and energy-efficient. By providing energy-efficient routing paths in the various monitoring regions, the technique reduces data transfer delay. To minimize end-to-end latency, it selects an immediate-forwarding method for delay-sensitive data; to decrease overall energy usage, it uses a wait-forwarding method for delay-tolerant data. To balance the energy consumption of IoT devices, the approach builds routing paths through regions with high residual energy, and the generated routes are changed frequently to preserve this balance. Performance evaluation shows that the method reduces network energy consumption and transfer delay for both delay-sensitive and delay-tolerant data, which extends the lifetime of the IoT ecosystem.
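The two forwarding modes described for [51] can be sketched as a small relay that sends delay-sensitive values at once and batches delay-tolerant ones. This is a simplified reading of the idea, not the authors' algorithm: the `Forwarder` class and its `batch_size` threshold are hypothetical, and the real scheme also involves energy-aware route construction that is not modelled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Forwarder:
    """Relay node with immediate-forwarding and wait-forwarding modes."""
    batch_size: int                                    # hypothetical threshold
    waiting: List[float] = field(default_factory=list)
    sent: List[List[float]] = field(default_factory=list)

    def handle(self, value: float, delay_sensitive: bool) -> None:
        if delay_sensitive:
            # Immediate forwarding: one transmission per value, minimal latency.
            self.sent.append([value])
            return
        # Wait-forwarding: hold delay-tolerant data until a batch is full.
        self.waiting.append(value)
        if len(self.waiting) >= self.batch_size:
            self.sent.append(self.waiting)
            self.waiting = []


relay = Forwarder(batch_size=3)
relay.handle(1.0, delay_sensitive=False)
relay.handle(2.0, delay_sensitive=True)
relay.handle(3.0, delay_sensitive=False)
relay.handle(4.0, delay_sensitive=False)
print(relay.sent)
```

Four values leave the relay in two transmissions: the urgent value goes out alone, while the three tolerant values share one packet at the cost of extra waiting time.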
To increase network lifespan and reduce data transfer latency, Li et al. [52] proposed a tree-based data aggregation method for heterogeneous and dynamic IoT. To extend the life of nearby devices, it adaptively adjusts the data transfer latencies of those devices. According to experimental findings, the technique performs well when the network's degree of heterogeneity and dynamism is greater.
Lu et al. [53] proposed a lightweight privacy-preserving method for improving the security of fog computing in IoT. The technique securely aggregates data from multiple devices. Using the Chinese Remainder Theorem, one-way hash chains, and homomorphic Paillier encryption, it combines composite data in a centralised process while rejecting any erroneous data injected into the system. The security analysis shows that this lightweight solution is reliable and efficient in a number of privacy scenarios.
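The property that makes Paillier encryption attractive for aggregation is that multiplying ciphertexts adds the underlying plaintexts, so an aggregator can sum readings it cannot read. The sketch below shows only this property with textbook Paillier and toy primes. It is not the scheme of Lu et al., which additionally uses the Chinese Remainder Theorem and one-way hash chains, and the tiny key is insecure and chosen purely so the arithmetic is easy to follow.

```python
import math
import random
from typing import Tuple


def keygen(p: int, q: int) -> Tuple[int, Tuple[int, int]]:
    """Textbook Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                 # modular inverse (Python 3.8+)
    return n, (lam, mu)


def encrypt(n: int, message: int, rng: random.Random) -> int:
    """c = (n+1)^m * r^n mod n^2 for a random r coprime to n."""
    n_squared = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n_squared) * pow(r, n, n_squared)) % n_squared


def decrypt(n: int, private_key: Tuple[int, int], ciphertext: int) -> int:
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n * mu) % n


# Toy primes for illustration only; real deployments need large random primes.
n, private_key = keygen(61, 53)
rng = random.Random(0)

device_readings = [12, 30, 7]
ciphertexts = [encrypt(n, m, rng) for m in device_readings]

# The aggregator multiplies ciphertexts without learning any single reading.
aggregate = 1
for c in ciphertexts:
    aggregate = (aggregate * c) % (n * n)

total = decrypt(n, private_key, aggregate)
print(total)
```

The decrypted sum is only correct while it stays below n, so a real system must size the modulus for the largest possible aggregate.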
The authors of [54] used distance, node degree, and residual energy to select the cluster head (CH). However, the method did not guarantee the optimal number of CHs needed to cover the full sensing region.
The paper in [55] provides a comprehensive overview of the security challenges associated with healthcare data aggregation and transmission in the IoT. It begins by discussing the importance of healthcare data security and the challenges posed by the IoT. The authors then present a taxonomy of security threats to healthcare data in the IoT and discuss the security mechanisms that can mitigate them. The paper concludes with open research challenges in healthcare data security in the IoT.
The study in [56] classifies the data aggregation processes used in the IoT into three main types: centralized, cluster-based, and tree-based. It offers recommendations for further research through an extensive comparison of the essential operations within each category. The evaluation compares the methods on fault tolerance, latency, heterogeneity, network lifetime, scalability, security, and traffic volume. The results identify areas that require more study to resolve the identified issues and improve the effectiveness of data aggregation in IoT.
The paper in [57] proposes a new data aggregation and routing algorithm called Energy-Efficient Data Aggregation and Routing (EDAAR). EDAAR works by first aggregating data from neighboring sensor nodes. It was reported to reduce the amount of data that needed to be transmitted by up to 75%.
The authors of [58] propose an effective flow aggregation method based on the SDN architecture for delay-insensitive traffic management. The research focuses on the case of numerous small delay-insensitive traffic flows. To merge and reduce traffic flows according to flow size while adapting to changes in network conditions, the authors developed a novel data structure called a flow tree. The method lowers both the storage cost in switch memory and the cost of communication between the controller and the OpenFlow switches.
The author of [59] compared various mechanisms published from 2016 to 2020 that provide efficient data aggregation on IoT to enhance security and privacy and to minimize energy and computational resource usage. For future research, the author suggested addressing the heterogeneity of IoT devices, particularly in dynamic monitoring areas.
The research paper "Comparison of Data Aggregation Techniques in Internet of Things (IoT)" [60] focuses on the goal of data aggregation strategies: to gather and merge data packets efficiently in order to lower power consumption, reduce traffic congestion, and extend the lifespan of the network while improving data accuracy. The author examines the number of transmissions necessary for data collection, since fewer transmissions result in less network traffic, delay, and power consumption. This also lengthens network lifespan and improves data precision.
The paper "On supporting IoT data aggregation through programmable data planes" [61] aimed to reduce the number of repeated packet headers by assembling packet data from several sources. The author reports that the approach improves network efficiency by 78% and gives users control over the average delay caused by data aggregation.
Table 1: An overview of data aggregation mechanisms and their performance
Clustering-based, tree-based, and centralized aggregation algorithms have been examined, each with its own advantages and challenges. This paper also highlighted the challenges and open research problems in the area of data aggregation, such as energy efficiency, scalability, and security. Finally, it concluded with some insights and future directions for research in this field.

Fig 2: Application of IoT
B. Application of Internet of Things (IoT)
Connected Health: IoT has many uses in the healthcare industry, including advanced smart sensors, equipment integration, and remote monitoring tools. It has the capacity to improve how doctors maintain their patients' wellbeing and provide treatment. IoT in healthcare can also increase patient engagement and satisfaction by letting patients spend more time talking to their physicians.

Fig 6: Tree-based Data Aggregation
Examples of tree-based data aggregation mechanisms include a Tree-based Data Aggregation Mechanism [42], an Aggregation Tree Based Data Aggregation Algorithm [38], and Tree-based Data Aggregation Algorithms in Wireless Sensor Networks [43]. Figure 6 illustrates the design of tree-based data aggregation techniques in IoT.

Fig 7: Centralized-based Data Aggregation
In the past few years, a lot of work has been done on data aggregation techniques, most of which has been shown to reduce energy consumption.
Many areas, such as delay, heterogeneity, and scalability, have still received very little attention, as the table above makes clear; it shows how far each area has been explored. This review has also touched upon privacy concerns in data aggregation. As IoT applications expand into sensitive domains such as healthcare and smart cities, preserving data confidentiality becomes paramount. Future research should focus on developing robust privacy-preserving aggregation techniques that ensure data security while still enabling meaningful analysis. The potential for integrating edge and fog computing concepts with data aggregation in the IoT is enormous. By leveraging the computational capabilities of edge devices and using localized aggregation, the burden on centralized cloud infrastructure can be reduced, leading to lower latency and improved scalability. Despite the challenges, data aggregation is an important technique for IoT. It can improve energy efficiency, network lifetime, scalability, and security. Researchers are working on new data aggregation techniques that can address these challenges and make data aggregation more efficient and secure.

IV. CONCLUSION
This review paper has offered a thorough rundown of data aggregation methods in the context of the Internet of Things (IoT). The analysis of various methodologies and algorithms has shed light on the importance of efficient data aggregation in managing the vast volume of data generated by interconnected IoT devices. The categorization of data aggregation techniques into centralized and distributed approaches has allowed for a deeper understanding of the different strategies employed in aggregating IoT data.
III. DISCUSSION & INSIGHTS
The discussion surrounding IoT data aggregation techniques opens up several avenues for further research. One key aspect that warrants attention is the scalability and adaptability of aggregation techniques in dynamic IoT environments. As the number of IoT devices continues to grow, it becomes crucial to develop aggregation methods that can handle large-scale data while maintaining efficiency and responsiveness. Another area for future exploration is the optimization of energy consumption in data aggregation. IoT devices often operate on limited battery power, and energy-efficient aggregation techniques can significantly prolong their operational lifetimes. Investigating innovative approaches that reduce energy consumption without compromising data accuracy is essential for sustainable IoT deployments.