Any opinions, findings, conclusions, or recommendations expressed in this dissertation are those of the authors and do not necessarily reflect the views of UKDiss.com.

Determining Potability of Ground Water Samples

Info: 22842 words (91 pages) Dissertation
Published: 13th Dec 2019

Tagged: Environmental Science

CHAPTER 1

INTRODUCTION

The internet has drastically changed the way we work, spend our leisure time and communicate with one another. The number of internet users has grown sharply over the last two and a half decades and is estimated at 3.2 billion worldwide [1]. In the early days of the World Wide Web (WWW), users accessed static web pages and documents, typically written in Hypertext Markup Language (HTML). Today the internet supports all kinds of dynamic media such as audio, video and mashups.

Technological advancements in computing and the growth of the internet have resulted in the generation of tremendous amounts of data and media. Video in particular has become an important component of internet user content and is now the most common way for users to share their thoughts and expressions. Streaming of media such as video has grown rapidly over the past few years, and it is expected to become more significant as internet and broadband technologies continue to improve. Today, streaming media is moving to Over-The-Top (OTT) streaming. OTT streaming refers to the streaming of audio, video and other media via the internet without the involvement of satellite based direct-broadcast television systems or multiple-system cable operators. The content is delivered through the internet from the media generation infrastructure to the client.

Live streaming technology is widely used these days to broadcast events such as sports, concerts and campaigns. OTT based live streaming allows users to view events in real time and to participate in them virtually. Users can access the live stream from a wide variety of devices such as mobile phones, tablets, laptops and personal computers.

Determining and predicting the potability of ground water samples has vast scope in today’s market. It can be used by the government to keep track of the various viable ground water sources in the state [1]. It can be used by pollution control boards to keep track of the levels of pollution in bore wells that contain water not fit for consumption. Hence the scope of usage of this project seems to be limitless. Ground water is ever changing and its quality can never be predicted accurately. Most environmental phenomena are cyclic in nature. Conventional methods for prediction and forecasting of water quality include gray system theory, neural networks and regression. Linear Regression and Naïve Bayes classification are used here to predict and forecast water potability [2].

A geographic information system (GIS) is a system designed to capture, store, manipulate, analyze, manage, and present all types of spatial or geographical data. The acronym GIS is sometimes also used for geographical information science or geospatial information studies, referring to the academic discipline or career of working with geographic information systems, which is a large domain within the broader academic discipline of Geoinformatics. What goes beyond a GIS is a spatial data infrastructure, a concept that has no such restrictive boundaries. In a general sense, the term describes any information system that integrates, stores, edits, analyzes, shares and displays geographic information. GIS applications are tools that allow users to create interactive queries (user-created searches), analyze spatial information, edit data in maps, and present the results of all these operations. Geographic information science is the science underlying geographic concepts, applications, and systems.

1.1           State of art development 

This section discusses the prediction and forecasting of ground water quality and the challenges in predicting and forecasting ground water in wells.

1.1.1 Need for prediction and forecasting

 

Ground water is ever changing and its quality can never be predicted accurately. Most environmental phenomena are cyclic in nature. These cycles usually have a long time period and are hard to recognize, so determining the potability of ground water is a very difficult task. Predicting water quality is nevertheless a significant issue, because it increases economic efficiency. Prediction is complicated by the complexity and diversity of the data, which affect both the quality of water and the accuracy of forecasts. At the same time, it is very hard to construct models that predict accurately from monitoring data.

Balakrishnan, Saleem and Mallikarjun [3] used geographic information system (GIS) techniques to study spatial variations in ground water quality in the corporation area of Gulbarga City, located in the northern part of Karnataka State, India.

Najafabadi [4] made use of WATMAPGIS (creating of the groundWATer MAPs using Geographical Information System), software prepared to create groundwater observation maps in the ArcInfo GIS environment. Groundwater monitoring results can be mapped and queried rapidly, so monthly fluctuations of groundwater can be displayed quickly.

Fangjie Cong and Yan Fang Dieo [5] developed a system that adopts a mixed Client/Server (C/S) and Browser/Server (B/S) structure and is separated into eight modules: groundwater exploitation management, groundwater permission management, groundwater record management, groundwater level monitor management, groundwater quality monitor management, groundwater payment management, official document management and information publishing management. Two of these modules, ground water level monitoring and quality monitoring, are useful for representing the parameter values and the ground water level in spatial form.

Hong Mei, Zhang [6] estimated a relative risk map of groundwater pollution through a procedure that identifies, cell by cell, the values of three factors: inherent vulnerability, load risk of the pollution source, and contamination hazard. By combining an early-warning model of groundwater pollution with Component Object Model Geographic Information System (COMGIS) technology, a regional groundwater pollution early-warning information system was developed and applied in Qiqihar, China.

Zhenghua Yang, Yan [7] combined the Ordinary Kriging interpolation method with techniques from geographic information systems to analyze the dynamic changes of the groundwater depth and the groundwater level in Minqin Oasis in January of each year between 1998 and 2010. The results show that: (1) the groundwater depth is greater in the south and smaller in the north; it has declined year by year, with an annual mean decline rate of 0.55 m/a. (2) The groundwater level is higher in the southwest and lower in the northeast, and it has been decreasing continuously year by year.

Mei, Song, Lijie [8] applied the theory of GIS to the evaluation and management of groundwater and developed a network of hydro-geological information systems. A model base management subsystem was designed that includes a parameter calculation model, a mining prediction model, and a groundwater resources management model. The groundwater resources management model was built and applied in practice, and it supports optimized management and rational use of water resources.

Sonam Pal and Rakesh Sharma [9] proposed an intelligent system to support decision making for the establishment of new resources in any geographical area. Using GIS, complete information about a geographical area can be obtained. GIS and Artificial Intelligence (AI) techniques are used here together with a spatial database.

K. A. Mogaji, H. S. Lim & K. Abdullah [10] developed a groundwater vulnerability prediction model based on a geographic information system-based ordered weighted average (OWA)-DRASTIC approach, which was investigated in the southern part of Perak, Malaysia. The results confirmed that the methodology holds significant potential to support the complexity of decision making in mapping groundwater pollution potential in the area.

Efren L. Linan, Victor B. Ella, Leonardo M. Florence [11] conducted a study to assess the vulnerability of groundwater resources to contamination by applying engineering technologies such as the DRASTIC model in combination with Quantum Geographic Information System (QGIS) in Boracay Island, Aklan, Philippines. The study showed that the combined use of the DRASTIC model and Quantum GIS is an effective method for groundwater contamination vulnerability assessment.

Muzafar N. Teli and Nisar A. Kuchhay [12] studied water samples collected from 92 locations. The samples were analyzed for physico-chemical parameters such as pH, total hardness, Ca, Mg, Fe, F, SO4, NO3, K, Cl and Na using standard laboratory techniques by the Department of Public Health Engineering, Srinagar (PHE) and the Central Groundwater Board Jammu (CGWB), and were compared with the WHO standards. The results obtained in the study and the spatial database established in GIS will be helpful for monitoring and managing ground water pollution in the study area.

Valmik and Meshram [13] proposed a Bayesian model for prediction of water level, since a Bayesian prediction model has the ability to learn new classes and its accuracy grows as the amount of learning data increases. The drawback of the approach is that if a predictor category is absent from the training data, the Bayesian algorithm assumes that a new record with that category has zero probability. According to this paper, the Bayesian model for water level prediction provides good accuracy. The parameters considered are station level pressure, mean sea level pressure, temperature, and rainfall. Less relevant features in the dataset are ignored in the model computation.
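The zero-probability issue mentioned above can be illustrated with a minimal sketch. The rainfall categories and counts are hypothetical, and add-one (Laplace) smoothing is shown only as one commonly used remedy; it is an assumption here and is not described in the cited paper.

```python
from collections import Counter

def category_likelihood(values, category, n_categories, alpha=0.0):
    """Estimate P(category | class) from the values observed for one class.

    alpha=0 gives the raw frequency estimate. alpha=1 is add-one (Laplace)
    smoothing, an assumed remedy that the cited paper does not describe.
    """
    counts = Counter(values)
    return (counts[category] + alpha) / (len(values) + alpha * n_categories)

# Hypothetical rainfall categories observed for the class "water level rises".
observed = ["high", "high", "medium", "high"]

# "low" never appears in the training data, so the raw estimate is zero and
# would wipe out the whole Naive Bayes product for any record with "low".
raw = category_likelihood(observed, "low", n_categories=3)

# With smoothing the unseen category keeps a small non-zero probability.
smoothed = category_likelihood(observed, "low", n_categories=3, alpha=1.0)
```

Here `raw` is 0 while `smoothed` is 1/7, so a single unseen category no longer forces the class probability to zero.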

Kannan, Prabhakaran and Rramachandran [14] computed the Pearson coefficient for five years of data and compared it with data predicted using a regression approach. Water level and rainfall were predicted using the multiple linear regression method. The predicted values lay in the range of 80% of the computed values. According to the results, the model does not show high accuracy but gives an approximate value.

Changjun and Qinghua [15] employed a grey clustering model to measure water quality. The evaluation was based on an assessment of the water quality of twenty divisions of the Suzhou River and was compared with that of a conventional framework. The proposed grey clustering model proved practically feasible and quite simple to apply. The proposed algorithm overcomes the drawbacks of single-factor evaluation and can assess the quality of water at present.

James, Bavy and Tharam [16] employed the Improved Naive Bayes Classifier (INBC) technique, which uses genetic algorithms (GAs) to select a subset of input features in classification problems. Two schemes were established: scheme I considers all basic input variables for water quality prediction, and scheme II considers an optimal subset of the input variables. According to the results, INBC achieved a 90% accuracy rate on the potable and not-potable classification problem.

Vanessa L. Lougheed et al. [17] conducted a study to examine water quality based on the relationship among different parameters such as chlorine, fluorine, chlorophyll, nitrate and pH. Linear regression was used to determine water quality and the Pearson correlation coefficient was used to derive the relationships. The results showed that the relationships obtained among the parameters were accurate, but that the results of the linear regression were approximate and not highly accurate.

Wang, Hui-hua sheng [18] proposed a General Regression Neural Network (GRNN) algorithm for annual quality prediction in Zhengzhou. The proposed model performs better in fitting and prediction: the accuracy of the GRNN model is better than that of the back propagation (BP) model, and the stepwise regression algorithm is inferior to both the BP and GRNN models in the accuracy of simulation and prediction results. The drawback of this method is that it cannot be applied efficiently to multivariate analysis.

Wen-jei Wu and Yan Xu [19] adopted Pearson’s correlation coefficient, which reflects the correlation between two variables, to analyze the correlation between the functions of parameters. The result is that the related variables are positively correlated with the main functions of the parameters, though the degree of correlation differs.

Sinin, Nannan et al. [20] proposed back propagation (BP) and radial basis function (RBF) neural networks for evaluation of water quality in the Fuyang River in Handan city. The evaluated water quality grade is III. The RBF network is superior to the BP network in the training process, since the BP network can fall into a local minimum. An RBF network with a Gaussian function is used for the evaluation. Traditional methods usually need the connection weights to be determined manually, which seriously impacts the reliability of the evaluation models.

Dingsheng and Yufeng [21] employed an annual average water quality prediction model based on multiple linear regression combined with a stepwise discriminant framework, and applied a Bayesian statistical framework to improve the accuracy of the prediction model. The proposed method provides satisfactory experimental results in terms of prediction accuracy, but the overall performance can still be improved further.

Scott, Amy and Ming [22] developed a technique that uses continuous Bayesian networks for prediction of water potability. The network trains on a subset of the data as well as a subset of the attributes in the data. The model is generated for continuous domains and is applied to discover significant parameters in the dataset and relationships among the parameters. They apply linear Gaussian distributions between those ensembles, allowing effective network-level inference. With this approach, they are able to represent nonlinear relationships.

Ramakrishnaiah, Sadashivaiah [23] developed a model aimed at assessing the water quality index (WQI) for the groundwater of Tumkur taluk. The index was determined by collecting groundwater samples and subjecting them to a comprehensive physicochemical analysis. For calculating the WQI, the following 12 parameters were considered: pH, total hardness, calcium, magnesium, bicarbonate, chloride, nitrate, sulphate, total dissolved solids, iron, manganese and fluorides. The WQI for these samples ranges from 89.21 to 660.56. The results of the analyses have been used to suggest models for predicting water quality. The analysis reveals that the groundwater of the area needs some degree of treatment before consumption, and that it also needs to be protected from the perils of contamination.

Ruiz-Luna, Berlanga-Robles [24] developed a model, using linear regression, which showed that along with salts, changes in maroons and agriculture also affect the water quality in the study area. A thematic map for each sub-image was produced using supervised classification with the Extraction and Classification of Homogeneous Objects (ECHO) algorithm. Aerial photography and field verification of testing points were used to validate the classification and to assess its accuracy using the Kappa coefficient.

Qiang Zhang, Chong-yu Xu, Jiang [25] concluded that annual maximum stream flow and annual maximum water level, and their variations, exert the most serious influences on human society. In this paper, temporal trends and frequency changes at three major stations of the Yangtze River, i.e. Yichang, Hankou and Datong, representing the upper, middle and lower reaches respectively, were detected with the help of the parametric t-test, Mann–Kendall (MK) analysis, linear regression and wavelet transform methods. Linear regression was used to predict the water level and maximum stream flow. The results showed that stream flow affects the water levels in the river under study, which makes the water level difficult to predict.

Wengao, Nakai [26] developed two models for the prediction of water quality: artificial neural networks and Linear Regression. The study found the Linear Regression method to be the more accurate of the two, with root mean square values of prediction of 0.144 and 0.949, 0.232 and 0.868, and 0.234 and 0.815 for the parameters under consideration. The superiority of the Linear Regression approach is attributed to its high prediction accuracy and its ability to compute the quality of water from multiple parameters.

Shoba, Shobha [27] researched data mining techniques. Data mining methods may be classified by the function they perform or by their class of applications. Using this approach, four major categories of processing algorithms and rule approaches emerge: 1) Classification, 2) Association, 3) Sequence and 4) Cluster. The paper explores various data mining techniques such as Artificial Neural Network, Back propagation, MLP, GRNN and Decision Tree used in prediction of water quality. The survey focuses on how data mining techniques are used for water quality prediction.

The methods above cover various machine learning techniques used for forecasting ground water potability and ground water levels. These techniques did not consider the interdependencies among the parameters. For the proposed system, therefore, a multivariate model is developed that uses Linear Regression improved with the Pearson coefficient to forecast the water quality parameters and the water levels for the upcoming year, and a Naïve Bayes classifier to classify water samples.
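As a rough illustration of the two building blocks named above, the sketch below computes a Pearson correlation coefficient and fits a least-squares line to forecast the next year's value. The yearly readings are made-up numbers, the function names are hypothetical, and the text does not specify how the Pearson coefficient is combined with the regression, so this shows only the individual calculations rather than the proposed system.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x

# Hypothetical yearly readings of one water quality parameter (mg/L).
years = [1, 2, 3, 4, 5]
readings = [410, 420, 430, 440, 450]

r = pearson(years, readings)               # strength of the linear trend
slope, intercept = fit_line(years, readings)
forecast_year_6 = slope * 6 + intercept    # extrapolate one year ahead
```

With these perfectly linear sample readings the correlation is 1 and the forecast for year 6 is 460; real measurements would give a weaker correlation and a correspondingly less reliable forecast.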

1.2           Motivation

Groundwater plays a vital part in the economic system, especially for the farmers of the Chikballapur district. Chikballapur is also called the land of Silk and Milk. Agriculture here is based on irrigation from tanks constructed earlier. Due to drought, farmers depend on bore wells for their irrigation needs. The water in this area is contaminated with many impurities, making it unfit for domestic use. Hence, the motivation behind this project was partly to assist the people of the district and partly to apply our knowledge of Computer Science to a social cause that would help in the betterment of society.

1.3           Problem Statement

 

Media streaming architectures typically have a large end-to-end delay. This delay can be extremely undesirable to viewers during live streaming of events such as sports. The end-to-end delay in OTT streaming should be minimal compared to direct-broadcast satellite based streaming.

The end-to-end delay in media streaming architectures is typically due to the three hops in the architecture. The first hop is generally the path between the encoder at the customer premises and the entry point of the media streaming architecture. The middle hop is the path the media stream travels through the different components of the architecture. The last hop is the path between the delivery servers and the client device. To reduce the latency that arises from these three hops, we focus on accelerating the media streams in each of them. First-mile, middle-mile and last-mile acceleration correspond to the first, middle and last hop respectively. Middle-mile latency can be reduced by techniques such as content prefetching, caching and compression, and can be reduced drastically by enhancing the protocols used in the middle mile. Last-mile latency is becoming significantly less of a concern as many users adopt high-speed broadband connections.

The first-mile latency, which arises on the path between the origin infrastructure and the entry point of the media streaming architecture, contributes a significant amount to the end-to-end latency. It is due to bottlenecks in the underlying protocols, such as the Transmission Control Protocol (TCP).

1.4           Objectives

 

The primary objective of this project is to improve the first-mile acceleration in media streaming. Some of the other objectives are:

  • To reduce end-to-end delay incurred in existing media streaming architecture.
  • To increase first-mile and middle-mile acceleration of media streams.
  • To transport media streams in shorter segments.
  • To support HTTP based OTT (Over-the-top) streaming.
  • To optimize the protocol in the path between the encoder and the ingest servers.
  • To benchmark performance with and without QUIC.

1.5           Scope

 

The system aims to determine the potability of ground water samples, which has vast scope in today’s market. The potability classification results can also be used to take the necessary actions for improvement of water quality. Further, this model can serve as a basic model to which extensive optimization approaches can be applied for better accuracy in the near future. The system can be used by the Government to keep track of the various viable groundwater sources in the state. It can be used by pollution control boards to keep track of the levels of pollution in bore wells that contain water not fit for consumption. Hence the usage of this project seems to be limitless.

1.6           Methodology

The existing TCP based transport in the first mile is replaced by a UDP based transport protocol implemented by Google called Quick UDP Internet Connections (QUIC). This protocol reduces latency compared to TCP. The media streams captured by the hardware devices (camera, microphone, etc.) are encoded using an encoder. The encoder compresses the streams, encodes them into formats and packages them into containers. The encoder simulator used here is FFmpeg. The streams are then passed into a Gstreamer pipeline. A QUIC server running on the encoder system transports the streams from the pipeline to the ingest server via a plugin, which connects the Gstreamer pipeline to the QUIC server.

A QUIC client running on the ingest server, which is the entry point to the media streaming architecture, accepts the streams from the server. The streams are transported from the encoder to the ingest server using the QUIC protocol in the form of QUIC packets. From the ingest server, the streams are transported to the mid-tier architecture for delivery.

1.7           Organization of the Report

 

This section gives a broad picture of the various chapters in the report.

  • Chapter 2 is the overview of the project or thesis domain which describes the details of the domain of the project carried out.
  • Chapter 3 is Software Requirement Specification which describes the user characteristics, assumptions and dependencies, constraints and functional requirements of the project.
  • Chapter 4 is High Level Design, which covers the design phase of the Software Development Life Cycle. This chapter discusses design considerations such as general constraints, development methods and architectural strategies, and explains the project System Architecture and Data Flow Diagrams.
  • Chapter 5 is Detailed Design, which explicates the project modules. The functionality of each module and the structural diagram of each system are explained in this chapter.
  • Chapter 6 is Implementation, which describes the technology used in the system. This chapter also explains the programming language, development environment and code conventions followed.
  • Chapter 7 is Software Testing, which elaborates the test environment and briefly explains the test cases that were executed during unit testing, integration testing and system testing.
  • Chapter 8 is Experimental Results, which presents the results of the experimental analysis on the available data and the inferences drawn from them.
  • Chapter 9 is Conclusion conveying the summary, limitations and future enhancements of the project.

1.8   Summary

 

This chapter introduced the machine learning techniques and the research gap. It also discussed the motivation, problem statement and objectives.

CHAPTER 2

 

FUNDAMENTAL CONCEPTS

 

Over the last few years, many attempts have been made to make the Internet faster. Internet Service Providers (ISPs) try to decrease web page load times by using fast and efficient networking devices and by increasing bandwidth. Web browser developers try to make browsers faster and more efficient to speed up Internet surfing.

2.1 Problems with Transmission Control Protocol (TCP)

 

Transmission Control Protocol (TCP) has been the backbone of the Internet for many years. Today, TCP and User Datagram Protocol (UDP) are the most used protocols in the Transport Layer. TCP is a connection-oriented protocol, whereas UDP is a connectionless protocol. Both have their own advantages and disadvantages. TCP provides a reliable connection that is established with a three-way handshake. UDP provides an unreliable service that uses a “send-and-forget” method of sending a packet. Although the three-way handshake makes the TCP connection reliable, it increases the latency of establishing a connection. This can be a problem if the connection is short-lived and not persistent.

Another problem is that only a single HTTP/1.1 request/response exchange can be outstanding on a TCP connection at a time. Also, a single HTTP/1.1 session may send a large number of small segments, which can result in a large overhead.

Moreover, since clients always initiate HTTP/1.1 transfers, the performance of HTTP/1.1 decreases significantly when embedded files are loaded, because the server has to wait for a request from the client, even if the server knows that a specific resource is needed by the client.

Head-of-line (HOL) blocking is another problem found in TCP. HOL blocking occurs when a packet is lost and the packets sent after it arrive at the destination first. The receiving host has to wait for the retransmission of the lost packet before it can start processing the packets that arrived after the loss. In media streaming applications, a small number of lost packets does not have a significant influence on the user experience, yet the receiver still has to wait for the lost packet before it can continue playing the video. This problem with TCP can be overcome by opening multiple connections between the same endpoints. This works well for a small number of connections. When many connections are opened simultaneously, however, the connections tend to sway between very small and very large congestion windows. This reduces throughput and consequently leads to a bad user experience.
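The effect can be seen in a small simulation of an in-order receive buffer. This is only an illustrative sketch: the function name and the arrival pattern are hypothetical, and real TCP works on byte ranges rather than whole-packet indices.

```python
def deliver_in_order(arrivals):
    """For each arriving packet, list the sequence numbers that can be
    handed to the application under strict in-order delivery."""
    expected = 0          # next sequence number the application may read
    buffered = set()      # packets received but not yet deliverable
    released = []
    for seq in arrivals:
        buffered.add(seq)
        batch = []
        # Release every packet that is now contiguous with what was delivered.
        while expected in buffered:
            buffered.remove(expected)
            batch.append(expected)
            expected += 1
        released.append(batch)
    return released

# Packet 1 is lost on its first transmission and arrives last, after its
# retransmission; packets 2 and 3 arrive on time but must wait for it.
timeline = deliver_in_order([0, 2, 3, 1])
```

The resulting `timeline` is `[[0], [], [], [1, 2, 3]]`: nothing is delivered while packets 2 and 3 sit in the buffer, and all three packets are released together only once packet 1 arrives.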

2.2 Overview of Google QUIC (Quick UDP Internet Connection)

QUIC (Quick UDP Internet Connection) is an experimental protocol developed by Google. QUIC aims to combine the advantages of TCP and UDP in a single unified protocol. The protocol supports a set of multiplexed connections over UDP, and was designed to provide security protection equivalent to TLS/SSL, along with reduced connection and transport latency. An experimental implementation is being put in place in Chrome by a team of engineers at Google.

QUIC uses UDP, and many of its mechanisms are inspired by TCP. Like TCP, the QUIC protocol uses acknowledgements to convey to the sender that segments have arrived at the destination. The loss recovery and congestion control mechanism of QUIC is a reimplementation of TCP Cubic with additional enhancements [2]. The TCP Cubic mechanism is enhanced for high bandwidth and high latency networks [3]. The QUIC protocol makes use of a timer for retransmission. Every segment that is not acknowledged within the period of the timer is considered lost by the protocol.

Figure 2.1 QUIC in the protocol stack

 

QUIC uses a fast retransmit mechanism to avoid retransmission timeouts. It is triggered when the sender receives three or more duplicate acknowledgements (ACKs). A duplicate ACK is an acknowledgement for a segment that has already been acknowledged, and it indicates a packet loss. When a retransmission timeout occurs, the congestion window is set to the maximum segment size. The fast retransmit mechanism instead sets the congestion window to a value that depends on its value before the loss, and the connection stays in the congestion avoidance phase. Unlike after a retransmission timeout, it does not begin a new slow start.
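The two reactions to loss can be contrasted in a short sketch. The segment size, the 70% reduction and the function names are assumptions chosen for illustration; the text only states that the new window depends on the value before the loss, and this is not the QUIC implementation itself.

```python
MSS = 1460  # bytes; an assumed maximum segment size

def fast_retransmit_triggered(acks, threshold=3):
    """True once the same ACK number has been repeated `threshold` times
    after its first appearance (i.e. three duplicate ACKs by default)."""
    duplicates = 0
    last = None
    for ack in acks:
        if ack == last:
            duplicates += 1
            if duplicates >= threshold:
                return True
        else:
            last, duplicates = ack, 0
    return False

def on_retransmission_timeout(cwnd):
    """Timeout: collapse the window to one segment and restart slow start."""
    return MSS, "slow_start"

def on_fast_retransmit(cwnd, keep_percent=70):
    """Fast retransmit: shrink the window relative to its pre-loss value and
    stay in congestion avoidance. keep_percent is an assumed factor."""
    return max(MSS, cwnd * keep_percent // 100), "congestion_avoidance"

cwnd_before = 20 * MSS
after_timeout = on_retransmission_timeout(cwnd_before)
after_fast_retransmit = on_fast_retransmit(cwnd_before)
```

Starting from a window of 20 segments, the timeout path falls back to a single segment in slow start, while the fast retransmit path keeps 14 segments' worth of window and remains in congestion avoidance.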

2.2 Improvements in QUIC protocol

 

The congestion control mechanism in QUIC is inspired by TCP and uses many mechanisms that TCP uses. Moreover, QUIC also has new mechanisms of congestion and flow control that are exclusive to this protocol.

2.2.1 Quicker establishment of connections

Connections between the sender and receiver in QUIC are established in 0-RTT or at most 1-RTT. When a QUIC client connects to a server for the first time, it sends an empty, inchoate client hello packet (CHLO). When the QUIC server receives this packet, it responds with a rejection packet (REJ) that carries the configuration of the server and the Secure Sockets Layer (SSL) certificates. The QUIC client then uses this information to send another CHLO packet, which is accepted by the server. After this, all data that is sent is authenticated and encrypted. When the QUIC server is already known to the QUIC client, the inchoate CHLO packet is not necessary, which results in a 0-RTT handshake. When the server is unknown, the inchoate CHLO packet has to be sent to the server, which results in a 1-RTT handshake.
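The difference between the two cases can be sketched as a function that counts the round trips spent before application data can be sent. The cache structure, the host name and the placeholder string are hypothetical; the sketch models only the control flow described above, not the actual packet contents or cryptography.

```python
def handshake_round_trips(client_cache, server_id):
    """Return the number of round trips needed before data can be sent."""
    if server_id in client_cache:
        # Server configuration is already known: the complete CHLO can be
        # sent together with the first data, so no round trip is spent.
        return 0
    # First contact: inchoate CHLO goes out, REJ comes back with the server
    # configuration and certificates, which the client stores for next time.
    client_cache[server_id] = "server-config-from-REJ"  # placeholder value
    return 1

cache = {}
first_connection = handshake_round_trips(cache, "ingest.example")
repeat_connection = handshake_round_trips(cache, "ingest.example")
```

The first connection to the hypothetical host costs one round trip and populates the cache, so the repeat connection costs none.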

2.2.2 Multiplexing

QUIC uses a 64-bit connection identifier that is randomly selected by the client. Multiple streams are used within each connection to transport the segments. Because a connection is identified by this identifier rather than by IP addresses and ports, clients can keep a connection alive across different IP addresses and UDP ports. Many ports can be used for an application, but the application has to listen to each of these ports. QUIC thus provides migration of a connection across end points: the connection can remain established even when the IP address of one or both of the end points changes.

Since the QUIC protocol supports multiple streams within a single connection, it addresses the issue of head-of-line (HOL) blocking by sending independent data on separate streams. Every stream within a connection is identified by a stream identifier (stream ID). Collisions between the stream IDs of server and client are avoided by using an even stream ID when the client initiates the stream and an odd stream ID when the server initiates it. Each participant increases the stream ID monotonically for every new stream that is created.
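A minimal Python sketch of this numbering rule is given below. The class name and the starting values 0 and 1 are assumptions for illustration; the only property taken from the text is that client-initiated IDs are even, server-initiated IDs are odd, and both increase monotonically.

```python
class StreamIdAllocator:
    """Hands out stream IDs for one end point of a connection."""

    def __init__(self, is_client):
        # Even IDs for the client, odd IDs for the server (starting values assumed)
        self._next_id = 0 if is_client else 1

    def new_stream(self):
        stream_id = self._next_id
        self._next_id += 2  # stepping by 2 preserves parity and monotonic growth
        return stream_id


client_ids = StreamIdAllocator(is_client=True)
server_ids = StreamIdAllocator(is_client=False)
print([client_ids.new_stream() for _ in range(3)])  # even IDs
print([server_ids.new_stream() for _ in range(3)])  # odd IDs
```

Because the two sequences never share a value, either side can open a stream without first coordinating with its peer.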

2.2.3 Increasing sequence numbers

In TCP, the same sequence numbers are used for the original segments and the retransmitted segments. This causes a problem, as the host cannot differentiate between an original segment and its retransmission. To solve this problem, QUIC uses sequence numbers that always increase for every segment sent. Therefore, even retransmitted segments have unique sequence numbers.

2.2.4 Verbose signaling

QUIC has more verbose signaling than TCP. As QUIC uses unique sequence numbers for both the original segments and the retransmitted segments, an original packet can easily be differentiated from its retransmitted equivalent. Therefore, the sender can calculate the RTT with greater accuracy, as a delayed acknowledgement of the original packet is easily distinguished from the acknowledgement of the retransmitted packet.
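The effect on RTT measurement can be sketched as follows. This Python example is a simplified illustration, not QUIC's actual loss recovery: the class name, the millisecond timestamps and the payload value are assumptions. Each transmission, including a retransmission of the same payload, gets a fresh packet number, so an acknowledgement always maps to exactly one send time.

```python
class PacketNumberedSender:
    """Assigns a new, strictly increasing packet number to every transmission."""

    def __init__(self):
        self.next_packet_number = 1
        self.sent = {}  # packet number -> (payload, send time in ms)

    def send(self, payload, now_ms):
        packet_number = self.next_packet_number
        self.next_packet_number += 1
        self.sent[packet_number] = (payload, now_ms)
        return packet_number

    def on_ack(self, packet_number, now_ms):
        """Return the RTT sample for the acknowledged transmission."""
        _payload, sent_at = self.sent.pop(packet_number)
        return now_ms - sent_at


sender = PacketNumberedSender()
original = sender.send(b"segment-7", now_ms=0)
retransmission = sender.send(b"segment-7", now_ms=300)  # same payload, new number
rtt = sender.on_ack(retransmission, now_ms=350)
print(original, retransmission, rtt)
```

With a shared sequence number, the acknowledgement arriving at 350 ms could have been attributed to either transmission; with distinct packet numbers the sample is unambiguous.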

TCP uses selective acknowledgements (SACKs) to inform the sender which packets have been received, so that the sender can retransmit the lost segments. QUIC instead uses negative acknowledgements (NACKs), which have a larger range of up to 255 instead of 3. Rather than implying unacknowledged packets by not sending ACKs, QUIC uses NACKs to report the lost packets directly. This larger range makes NACKs more advantageous than the SACKs that TCP uses.
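As a rough illustration of direct loss reporting, the Python sketch below derives the missing packet-number ranges from the set of packets a receiver has seen. The function name is hypothetical, and treating the limit of 255 as a cap on the number of reported ranges is an assumption about how the figure above is meant; this is not the QUIC frame format.

```python
def missing_ranges(received, largest_observed, max_ranges=255):
    """Return inclusive (first, last) ranges of packet numbers not yet received."""
    ranges = []
    start = None
    for packet_number in range(1, largest_observed + 1):
        if packet_number not in received:
            if start is None:
                start = packet_number  # a new gap begins here
        elif start is not None:
            ranges.append((start, packet_number - 1))  # the gap ended
            start = None
    if start is not None:
        ranges.append((start, largest_observed))
    return ranges[:max_ranges]  # assumed cap on the number of reported ranges


print(missing_ranges({1, 2, 5, 6, 9}, largest_observed=9))
```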

2.2.5 Forward error correction

QUIC uses forward error correction (FEC): it sends additional data along with the packets so that, when a packet is lost, this additional data can be used to reconstruct it. The technique uses a combination of XOR operations to reconstruct the lost packet and is simple and effective. However, it does not work when multiple packets are lost within a group; the packets cannot be reconstructed in that case.
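The XOR idea can be shown with a short Python sketch. The grouping into equal-length packets with a single parity packet per group, and all function names, are assumptions for illustration rather than the exact scheme used by QUIC.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """XOR all equal-length packets of a group into one parity packet."""
    parity = bytes(len(packets[0]))  # all zero bytes
    for packet in packets:
        parity = xor_bytes(parity, packet)
    return parity


def recover(received, parity):
    """Rebuild the single lost packet (marked None) of a group."""
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one lost packet per group")
    rebuilt = parity
    for packet in received:
        if packet is not None:
            rebuilt = xor_bytes(rebuilt, packet)
    return rebuilt


group = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(group)
print(recover([b"abcd", None, b"ijkl"], parity))  # the second packet is rebuilt
```

XOR-ing the parity with every surviving packet cancels their contribution and leaves the lost one. With two packets missing, the result would be the XOR of both, which is why the scheme fails for multiple losses in one group.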

2.2.6 Packet Pacing

Packet pacing is a technique used in QUIC that sends packets at a rate less than the full rate to make the transmission less bursty. This technique is very useful and effective in conditions with low bandwidth.
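A minimal sketch of the idea, assuming a fixed packet size and a fixed target rate (both hypothetical parameters, as is the function name): instead of sending all packets back to back, the sender spaces them by the time one packet takes at the target rate.

```python
def paced_send_times(num_packets, packet_size_bytes, rate_bytes_per_s, start=0.0):
    """Return evenly spaced send times (in seconds) for a burst of packets."""
    interval = packet_size_bytes / rate_bytes_per_s  # time budget per packet
    return [start + i * interval for i in range(num_packets)]


# Four 1000-byte packets paced at 8000 bytes per second
print(paced_send_times(4, 1000, 8000))
```

A real pacer would adapt the rate to the congestion controller's estimate; the fixed rate here only illustrates the spacing.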

2.2.7 Encryption and authentication of packets

When TCP is used on its own, neither the headers nor the payload are encrypted; additional protocols have to be used along with TCP to authenticate and encrypt the data. QUIC always encrypts and authenticates the data using SSL. The handshake in QUIC is inspired by the TLS handshake.

2.3 Summary

This chapter introduced the basic concepts used in the project. Google QUIC can be used for ingestion of streams from the encoder to the entry point of the media streaming architecture. As the path from the encoder to the entry point experiences high packet loss and RTT, it is ideal to use a UDP-based protocol such as QUIC.

CHAPTER 3

SOFTWARE REQUIREMENTS SPECIFICATION

A Software Requirements Specification (SRS) [45] is a detailed description of the behavior of the system to be constructed. It integrates the functional and non-functional requirements of the software. The functional requirements describe exactly what the software must do, while the non-functional requirements place constraints on the design or implementation of the system. A function is described as a set of inputs, the behavior, and the outputs. A non-functional requirement specifies criteria that can be used to judge the operation of a system, rather than specific behaviors.

3.1 Overall Description

This section describes the general factors that affect the system and its requirements. The software developed should provide acceleration of streams in the first mile. This section also deals with user characteristics, constraints on using the system and dependencies of the system on other applications.

3.1.1 Product Perspective

The system should be versatile, easy to use and easy to deploy. It should scale easily, respond quickly and be easy to maintain. The system is composed of several modules performing different tasks.

The intended audience for this system is businesses that aim to deliver video content at the highest quality, scale rapidly to a large number of users and provide a high quality viewing experience. The system is easy to deploy and provides efficient and reliable delivery of media streams. It is secure, as data sent from the encoder is always encrypted. The system can be used to broadcast live events such as music concerts, the Olympics, sports events and speeches.

3.1.2 Product Functions

The primary function of the system is to support live streaming video delivery while optimizing quality, scalability and security. The system should be able to integrate with an encoder. The media encoder takes a real-time audio-video signal from an audio-video hardware device. In the case of HLS, the encoder encodes the media using a format such as H.264 for video and AAC for audio and encapsulates it in an MPEG-2 Transport Stream. A media segmenter is used to split the MPEG-2 Transport Stream into a series of individual segments of equal length. The encoder creates an index file that contains references to the segmented files. The QUIC client running on the encoder machine takes segments from the encoder and POSTs them to the QUIC server running on the ingest server machine. The QUIC server retrieves the segments and the index file and forwards them to the ingest server. The ingest server then transfers the media segments to the streaming mid-tier.
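To make the role of the index file concrete, the following Python sketch builds a minimal HLS media playlist for a list of equal-length segments. The function name, the segment file names and the six-second duration are assumptions for illustration; the tags used (`#EXTM3U`, `#EXT-X-VERSION`, `#EXT-X-TARGETDURATION`, `#EXT-X-MEDIA-SEQUENCE`, `#EXTINF`, `#EXT-X-ENDLIST`) are standard HLS playlist tags.

```python
import math


def build_hls_index(segment_names, segment_seconds, live=True):
    """Return the text of a minimal HLS media playlist for equal-length segments."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{math.ceil(segment_seconds)}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for name in segment_names:
        lines.append(f"#EXTINF:{segment_seconds:.3f},")  # duration of the next segment
        lines.append(name)                               # reference to the segment file
    if not live:
        lines.append("#EXT-X-ENDLIST")  # only a finished stream is closed
    return "\n".join(lines) + "\n"


playlist = build_hls_index(["seg0.ts", "seg1.ts", "seg2.ts"], segment_seconds=6.0)
print(playlist)
```

For a live stream the end-list tag is omitted so that players keep polling the index for newly ingested segments.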

3.1.3 User Characteristics

The project aims to develop a software model that can be used effectively by water development authorities, with some features that may also be useful to the general public. Governments and municipal authorities can use the FCGWS results to devise desalination solutions and take the necessary corrective steps. Members of the public can use the system to learn about the condition of the water in a particular location and to avoid drinking it if it is contaminated. The users of the system would prefer a reasonably accurate model and also expect the system to be fast and user friendly.

3.1.4 Constraints and Dependencies

The encoder at the media generation infrastructure should accept the audio-video signals generated and send them to the entry point of the media streaming architecture. Any loss of data from the encoder can lead to insufficient data being transferred into the streaming architecture, which results in a bad viewing experience for users. Hence, there should be no loss of data from the encoder.

The project is dependent on the following factors:

  • The encoder should always accept the audio-video signals generated and transfer the data to the QUIC client.
  • The QUIC server should always be running to accept data from the QUIC client.
  • The QUIC client should encrypt the media stream segments before sending them.

3.2 Specific requirements

This section covers the SRS [45] in sufficient detail for designers to design a system that satisfies the requirements. The product perspective and user characteristics do not state the actual requirements that the system must satisfy. The specific requirements are the concrete statements on which the customer and the software provider can agree. The final system is expected to satisfy all the requirements mentioned here.

3.2.1 Functional Requirements

The functional requirements of the system are:

  • The system should encode the audio-video signals generated by the media generation infrastructure and forward them to the QUIC client.
  • The system should support HTTP-based streaming such as HTTP Live Streaming (HLS).
  • The QUIC client should encrypt the media segments and transport them to the QUIC server using the QUIC protocol.
  • The client should be able to access the media stream using a media player.

3.2.2 Performance Requirements

Traditional methods of media streaming use TCP as the transport protocol for transporting the streams from the encoder to the streaming architecture. UDP based streaming can improve the performance by reducing the latency.

Performance requirements include the following:

  • The system aims to use QUIC, a UDP based protocol for transporting stream segments from encoder to the streaming architecture.
  • The system must be reliable and secure and no loss of data should occur.

3.2.3 Supportability

The system is web based. Thus it should support all the available browsers. It must be deployed on the recent versions of the Apache Tomcat server.

3.2.4 Software Requirements

The software requirements of the application are as follows:

  • OS: Ubuntu 14.04 LTS or higher
  • Platform: Linux
  • Language: C and C++
  • Visual Interface: HTML, CSS
  • IDE/tool: Eclipse Neon.2 Release (4.6.2)

3.2.5 Hardware Requirements

The minimum hardware requirements for running the application are as follows:

  • Processor: x86 or x64 based processor
  • RAM: 4 GB
  • Storage: 100 GB

3.2.6 Design Constraints

The system is flexibly designed. The amount of data available is not optimal, and more data can make the application more accurate. The design constraints include a user-friendly UI. The system should be able to handle large amounts of data quickly. An internet connection is required to view the Google maps that indicate the location of the wells, their potability status and their water level.

3.2.7 Interfaces

This section describes the interfaces of the system in detail, starting with the user interfaces.

User Interfaces of the system:

The user interfaces include options for selecting operations such as browsing the related data and viewing the forecasts. The following are some of the user interface options provided in the system:

  • The input is the selection of a well number.
  • The user can select the well number from the list of wells.
  • The user can analyze the prediction for the next year based on the previous year.
  • The forecasted values of various salts for the selected well are displayed.
  • The forecasted values of ground water level for the selected well are displayed.
  • Provision for analysis of rainfall, ground water level and water quality prediction in the form of a graph.
  • An option to view the location of the wells, along with their forecasted water level and potability status, on Google maps.
  • An option to view the GIS maps that indicate the spatial representation of the salts’ values.
  • Finally, the GUI gives the admin of the system options to add a new taluk, add a new well, add the water quality and water level data for the upcoming years, and delete wells and taluks.

Software Interfaces of the system:

Since the system is implemented in Java, it can run on any system that supports the JVM. The following software is used in this system:

  • A persistence library for accessing the database.
  • Google APIs for plotting the graphs and for maps.
  • ARCGIS software for drawing the GIS maps.

3.2.8 Non-Functional Requirements

These are the requirements that specify criteria that can be used to judge the operation of the system rather than specific behavior and are not directly concerned with the specific functions delivered by the system [45].

  • Reliability: The system shall always perform the correct operation as specified by the client user and shall return the respective operation result code.
  • Availability: The application shall be up and running during the entire session of user operation.
  • Security: Only the admin of the system should be able to log in.
  • Portability: The system must be capable of being ported to any platform and operating system. The programming language is selected so that it caters to this requirement (Java is platform independent).

3.3 Summary

The specific requirements and constraints of this project have been detailed in this chapter. These include the hardware requirements, software requirements and functional requirements. This chapter also states the assumptions made by the developer of the water quality potability and rainfall forecasting system. All of these have to be taken into account while using the system.

CHAPTER 4

HIGH LEVEL DESIGN

Design is a significant phase in the development of software. It is essentially a creative procedure that describes the organization of the system and establishes that it satisfies the functional and non-functional system requirements [47]. Larger systems are divided into smaller sub-systems that contain related services. The output of the design phase describes the architecture of the software to be used for the development of the common endpoint service. This chapter depicts the issues that need to be covered or resolved before attempting to devise a complete design solution.

The detailed design includes an explanation for all the modules. It throws light on the purpose, functionality, input and output.

The software requirements specification has been studied in order to design the system using Linear Regression improved with the Pearson coefficient [16] and Naïve Bayes classification [18], in which the system is trained on previous data and then used to predict and forecast.

4.1 Design Considerations

There are several design consideration issues that need to be fixed before designing a solution for the system to be implemented. The following sections describe constraints that have heavy impact on the software, a method or approach used for the development and the architectural strategies. It also describes the overview of the system design.

4.1.1 General Constraints

General constraints which need to be considered to use the system are listed below:

  • The user should be entitled to use the system.
  • The user should be aware of, and have detailed information about, data such as the training dataset and the test dataset.
  • The amount of data available to train and test the model is not optimal, and hence very high precision in prediction and classification may not be achieved.

4.1.2 Development methods

The design method employed is highlighted in this section:

  • The data flow model is the design method employed for development of the system.
  • A data flow model models a system based on the data transformations that take place as the data is processed. The notations used represent functional processing and data stores.
  • Data flow models give a better understanding of how data is associated with a particular process by tracking and documenting it.

4.2 Architectural Strategies

This section provides the overall organization of the system and its high-level structure, and gives key insight into the mechanisms and strategies used in the system architecture.

4.2.1 Programming Language

The Java programming language is used to design the system. Java supports object-oriented programming, a wide range of data types and application programming interfaces for handling data. The user interface is developed using HTML, CSS and JavaScript.

Java [32][46][47] is the optimal programming language for the project for the following reasons:

  • Java is platform independent, which is one of its design goals. This results in a high level of cross-platform portability.
  • Java has simple methods for reading images from files of different formats. This is an important feature, as reading input images is a basic requirement of the project.

4.2.2 User Interface Paradigm

The GUI of the system is a multi-window system with simplified transfer of control from one window to another. The output is shown in a window that contains the predicted value as well as a graph of the results. The user selects a well number from the available set of wells, and the water quality and ground water level forecasts are obtained.

4.2.3 Error Detection and Recovery

Error detection and recovery is an important aspect of the implemented project. Exceptions may occur if the admin enters a wrong data type or leaves any of the fields blank when inserting the new data for upcoming years. Internal error detection is done using try-catch blocks.
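The kind of check described here can be sketched as follows. The example is written in Python, whose try/except plays the role of Java's try-catch; it is not the project's code, and the function names and form fields are hypothetical.

```python
def parse_reading(field_name, raw_value):
    """Convert one form field to a float, raising ValueError with a readable message."""
    if raw_value is None or raw_value.strip() == "":
        raise ValueError(f"{field_name} must not be blank")
    try:
        return float(raw_value)
    except ValueError:
        raise ValueError(f"{field_name} must be a number, got {raw_value!r}") from None


def validate_form(form):
    """Return (parsed values, error messages) for a dict of raw form fields."""
    values, errors = {}, []
    for name, raw in form.items():
        try:
            values[name] = parse_reading(name, raw)
        except ValueError as exc:
            errors.append(str(exc))  # collect the problem instead of aborting
    return values, errors


values, errors = validate_form({"pH": "7.2", "EC": "", "fluoride": "abc"})
print(values)
print(errors)
```

Collecting all errors before reporting lets the admin correct every faulty field in one pass rather than one at a time.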

4.2.4 Data Storage Management

Data storage management is necessary for the program to work efficiently. It ensures that all the data generated as results is also fed back into the training set and stored appropriately. This applies to both prediction and classification, as the model continues to learn after every run. The training data is passed to the system, and the results are stored back in the database, from where they are retrieved for plotting the graph.

4.3 System Architecture

This section focuses on the basic structure of the system. It identifies the major modules in the system and the communication among them. The approach used for the development of the common endpoint service is object oriented, wherein the system is divided into different objects that represent real-world entities. The architecture is depicted in the figure below.

Figure 4.1 System Architecture of FCGWS

The system architecture is shown in Figure 4.1. Initially, the collected data is inserted into the database. The data is then retrieved from the database and used to train the Linear Regression algorithm improved with the Pearson coefficient. The values of the salts and the water level are forecasted for the next year, and the forecasts are inserted back into the database. The forecasted salt values are retrieved from the database and classified as potable or not using the Naïve Bayes classifier. Finally, the salt values of the water in different wells are represented spatially on GIS maps using ARCGIS software.

4.4 Data flow diagram

A Data Flow Diagram (DFD) is a graphical representation of the “flow” of data through an information system [45]. Data flow models describe how data flows through a sequence of processing steps. A DFD is composed of four elements: process, data flow, external entity and data store. With a data flow diagram, users can easily visualize the operations within the system, what can be accomplished using the system and how the system is implemented. DFDs give end users an abstract idea of the data that is given as input to the system and the effect that the input will ultimately have upon the whole system.

4.4.1 Level 0 Data flow diagram

The Level 0 DFD describes the general operation of the system. There are two modules: water quality potability and ground water level prediction. The data is collected from CGWB, and the relevant data is passed as input to the corresponding module, which predicts the respective output. The Level 0 DFD is shown in Figure 4.2.

Figure 4.2 Level 0 Data flow Diagram of the FCGWS System

4.4.2 Level 1 Data flow diagram

The Level 1 DFD describes the system in more detail than the Level 0 DFD. This is shown in the figure below.

Figure 4.3 Level 1 Data flow Diagram for the FCGWS System

The Level 1 DFD in Figure 4.3 shows how the system is split into the constituent modules that perform the basic task of the software, i.e. forecasting. The prediction module forecasts the ground water level and the water quality potability. The system takes the water quality data and ground water level from the database as input and gives the respective water quality potability and ground water level forecasts as output to the government.

4.4.3 Level 2 Data flow diagram

The process in level 1 is expanded here. The Level 2 DFD for water quality potability forecasting is shown in Figure 4.4 and the Level 2 DFD for water level forecasting is shown in Figure 4.5.

Figure 4.4 Level 2 Data flow Diagram for Water quality prediction

Figure 4.4 shows the Level 2 DFD for water quality potability forecasting. The water quality data from the CGWB [1] is taken as input and fed into the system. The system takes the data from the database and is trained with previous years’ data using Linear Regression with the Pearson coefficient. The values of the salts are then forecasted for the next year. These values are input to the Naïve Bayes classifier, which classifies the water as good or bad for drinking. Finally, the values of the salts are spatially represented in GIS maps.

Figure 4.5 Level 2 Data flow Diagram for Ground water level

Figure 4.5 shows the Level 2 DFD for water level forecasting. The water level data from the CGWB [1] is taken as input and fed into the system. The system takes the data from the database and is trained with previous years’ data using Linear Regression with the Pearson coefficient. Finally, the values of the water levels are spatially represented in GIS maps.

4.4.4 Level 3 Data flow diagram

The Level 3 DFD depicts how the system is further split into sub-processes, each of which provides part of the functionality of the whole system. The Level 3 DFD for water quality potability forecasting is shown in Figure 4.6 and the Level 3 DFD for water level forecasting is shown in Figure 4.7.

Figure 4.6 Level 3 Data flow Diagram for Water quality potability

Figure 4.6 shows the Level 3 DFD for water quality potability forecasting. The water quality data from the CGWB [1] is taken as input and stored in the database. The system takes the data from the database and is trained with previous years’ data using Linear Regression with the Pearson coefficient. The values of the salts are then forecasted for the next year and inserted back into the database. The values are taken from the database and input to the Naïve Bayes classifier, which classifies the water as good or bad for drinking. The values to be spatially represented are converted into a format readable by the ARCGIS software, and the values of the salts are then spatially represented in GIS maps.

Figure 4.7 Level 3 Data flow Diagram for Ground water level

Figure 4.7 shows the Level 3 DFD for water level forecasting. The water level data from the CGWB [1] is taken as input and stored in the database. The system takes the data from the database and is trained with previous years’ data using Linear Regression with the Pearson coefficient. The values of the water levels are then forecasted for the next year and inserted back into the database. The values to be spatially represented are converted into a format readable by the ARCGIS software, and the values of the water levels are then spatially represented in GIS maps.

4.5 Summary

The above data models depict how data is processed by the system. This constitutes the analysis level. The notations applied above represent functional processing, data stores and data movement among the functions. The purpose of this chapter is to describe the major high-level processes and their interrelations in the system. All the above-mentioned levels of DFDs illustrate this.

CHAPTER 5

DETAILED DESIGN FOR FCGWS SYSTEM

In the Detailed Design phase [45], the internal logic of every module specified in the High Level Design (HLD) is determined. Specifically, in this phase the design of each module and its low-level components and subcomponents is described. After determining the HLD [46], a graphical representation of the software system being developed is drawn. Each module’s input and output types, along with the possible data structures and algorithms used, are documented during the detailed design phase. The following sections provide this information for the modules.

5.1 Structure Chart

The structure chart [45] shows the control flow among the modules in the system. It explains all the identified modules and sub-modules and the interaction between them. It also shows the input for each module and the output generated by each module.

In FCGWS, there are three sub-modules: forecasting, classification and spatial representation. The forecasting module is further divided into two: the water quality sub-module and the water level sub-module. The description of the sub-modules, the flow of data and the results of each sub-module are shown in Figure 5.1.

Figure 5.1 Structure Chart of the FCGWS system

5.2 Functional Description of Modules

The internal working of each of the modules is explained in this section. It also describes the software components and sub-components of the FCGWS system.

5.2.1 Water quality forecasting module

This module describes the forecasting of the water quality parameters for the upcoming year for a selected well. There are twelve parameters whose values have to be forecasted.

  • Purpose: The purpose of this module is to forecast the water quality parameters.
  • Functionality: The parameters’ values are retrieved from the database. The Pearson coefficient technique is used to derive the correlation between the parameters. The formula for the Pearson coefficient is explained in Chapter 2. For example, in this project, for the parameter pH, fluorine is the parameter with the highest correlation, with a Pearson coefficient value of 0.84.

After obtaining the two parameters, the linear regression technique discussed in Chapter 2 is applied to them. For example, let the two parameters be fluorine and pH. Linear regression is then applied with fluorine as the input parameter (X) and pH as the output parameter (Y), thus forecasting the pH value for the upcoming year.

  • Input: The water quality parameters from the database.
  • Output: Forecasted values of the parameters for the upcoming year.
  • Flowchart: The flowchart in Figure 5.2 explains the procedure followed in forecasting the values of the water quality parameters. First, the parameters are retrieved from the database and input to the Pearson coefficient method, after which two parameters are chosen. Linear regression is applied to the two selected parameters, with one parameter as input (X) and the other as output (Y), thus forecasting the value of Y for the upcoming year.
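The parameter-selection step can be sketched in Python as below. This is an illustration under stated assumptions, not the project's Java code: the function names are hypothetical, the sample series are made-up numbers rather than CGWB data, and ranking candidates by the absolute value of the coefficient is an assumption about what "highest correlation" means.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)  # fails for constant series


def best_predictor(target, candidates):
    """Return (name, r) of the candidate series most correlated with the target."""
    scores = {name: pearson(series, target) for name, series in candidates.items()}
    best = max(scores, key=lambda name: abs(scores[name]))  # strongest |r| (assumed)
    return best, scores[best]


# Made-up yearly readings, for illustration only
ph = [7.1, 7.3, 7.6, 7.8]
candidates = {
    "fluorine": [0.50, 0.58, 0.71, 0.80],
    "chloride": [120.0, 95.0, 130.0, 110.0],
}
name, r = best_predictor(ph, candidates)
print(name, round(r, 2))
```

The selected parameter then serves as the input (X) of the linear regression, with the target parameter as the output (Y).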

Figure 5.2 Water quality module flowchart

5.2.2 Water level forecasting module

This module describes the forecasting of the water level after three seasons for the upcoming year for a selected well.

  • Purpose: The purpose of this module is to forecast the water level for three seasons for the upcoming year.
  • Functionality: In this module, the parameters’ values are retrieved from the database. The Pearson coefficient technique is used to derive the correlation between rainfall and water level, which is found to be 0.75.

After obtaining the two parameters, linear regression is applied to them. Here, the two variables are the water level and the current year’s rainfall. Linear regression is applied with rainfall as the input parameter (X) and water level as the output parameter (Y), thus forecasting the water level values after three seasons for the upcoming year.

  • Input: The water level values from the database.
  • Output: Forecasted water level values after three seasons for the upcoming year.
  • Flowchart: The flowchart in Figure 5.3 explains the procedure followed in forecasting the water level values. First, the water level and rainfall values are retrieved from the database and input to the Pearson coefficient method to obtain the correlation. Linear regression is then applied with rainfall as the input (X) and water level as the output (Y), thus forecasting the water level value after three seasons for the upcoming year.
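The regression step can be sketched as an ordinary least-squares fit of a straight line. The Python example below is illustrative only: the function names are hypothetical, the rainfall and water level figures are invented, and fitting a single line Y = slope · X + intercept is the simplest reading of the method described above.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(xs, ys, next_x):
    """Predict y for a new x using the line fitted to past (x, y) pairs."""
    slope, intercept = fit_line(xs, ys)
    return slope * next_x + intercept


# Invented past values: rainfall in mm (X), water level in m (Y)
rainfall = [600.0, 700.0, 800.0, 900.0]
water_level = [10.0, 12.1, 13.9, 16.0]
print(round(forecast(rainfall, water_level, next_x=850.0), 2))
```

In the system described here the same fit would be repeated per well and per season, with the forecast stored back in the database.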

Figure 5.3 Water level module flowchart

5.2.3 Water quality potability forecasting module

After the water quality parameters are forecasted, the potability status is determined for the upcoming year for a selected well. The Naïve Bayes classifier is used.

  • Purpose: The purpose of this module is to forecast the potability status of the water sample, that is, whether the water is potable or not.
  • Functionality: In this module, the Naïve Bayes classifier explained in Chapter 2 is used to classify the water quality as potable or not. The posterior values for potable and non-potable are calculated and compared. If the posterior value for potable is greater than that for non-potable, the water sample is classified as potable; otherwise it is classified as not potable.
  • Input: The forecasted water quality parameter values from the database.
  • Output: Forecasted potability status for the upcoming year.
  • Flowchart: The flowchart in Figure 5.4 explains the procedure followed in the potability forecasting module. First, the forecasted water quality parameters are retrieved from the database. These values are the input to the Naïve Bayes classifier, which classifies the water as potable or not. If the water sample is not potable, the parameters that do not comply with the WHO standards are indicated.
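The posterior comparison can be sketched as follows. This Python example assumes a Gaussian likelihood for each parameter, which the text does not specify; the function names, the two features and the training rows are likewise invented for illustration. Log posteriors are compared instead of raw posteriors, which gives the same decision while avoiding numerical underflow.

```python
import math
from collections import defaultdict


def train(samples, labels):
    """Return {label: (prior, [(mean, variance) per feature])}."""
    grouped = defaultdict(list)
    for row, label in zip(samples, labels):
        grouped[label].append(row)
    model = {}
    for label, rows in grouped.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            variance = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, variance))  # small constant avoids zero variance
        model[label] = (len(rows) / len(samples), stats)
    return model


def log_posterior(model, label, row):
    """Unnormalized log posterior: log prior plus Gaussian log likelihoods."""
    prior, stats = model[label]
    total = math.log(prior)
    for value, (mean, variance) in zip(row, stats):
        total += -0.5 * math.log(2 * math.pi * variance)
        total -= (value - mean) ** 2 / (2 * variance)
    return total


def classify(model, row):
    """Pick the label with the larger posterior."""
    return max(model, key=lambda label: log_posterior(model, label, row))


# Invented training rows: (pH, fluoride in mg/l)
samples = [(7.0, 0.5), (7.2, 0.6), (9.5, 2.0), (9.8, 2.2)]
labels = ["potable", "potable", "not potable", "not potable"]
model = train(samples, labels)
print(classify(model, (7.1, 0.55)))
```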

Figure 5.4 Water quality classification model flowchart

5.2.4 GIS module

The forecasted values are spatially represented on GIS maps using the ARCGIS software.

  • Purpose: The purpose of this module is to plot the GIS maps for different parameters.
  • Functionality: In this module, the GIS [8] maps that represent the data spatially are plotted. First, the longitude and latitude values of the required wells are converted into the respective UTM co-ordinates, called easting and northing. The values of one of the parameters for these wells are taken and written into an Excel file, which is the input to the ARCGIS software. Next, the folder where all the intermediate files are to be stored is selected. Then the base map of the place containing all these wells is plotted. Once the wells are plotted, the IDW [10] technique is chosen along with the required contour interval, thus producing the required GIS maps.
  • Input: A Microsoft Excel file that consists of the longitude and latitude values of the wells and the values for one parameter (for example, EC or pH).
  • Output: A GIS map consisting of the wells and the values of one particular parameter for those wells (for example, EC or pH).
  • Flowchart: The flowchart in Figure 5.5 indicates the process of drawing the GIS maps. First, the northing and easting values of the wells are obtained. A base map of Chikballapur is plotted, and the wells are plotted on it. Then the IDW technique is used to plot and spatially represent the values of the parameters.
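The interpolation idea behind IDW (inverse distance weighting) can be sketched in a few lines of Python. This is a generic illustration of the technique, not a description of how the ARCGIS software implements it; the function name, the power of 2 and the well co-ordinates and values are assumptions.

```python
import math


def idw(points, x, y, power=2):
    """Estimate the value at (x, y) from (easting, northing, value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0:
            return value  # exactly at a well: use its measured value
        weight = 1.0 / distance ** power  # nearer wells count more
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Invented wells: (easting, northing, pH)
wells = [(0.0, 0.0, 7.0), (2.0, 0.0, 8.0)]
print(idw(wells, 1.0, 0.0))  # midway between the two wells
```

Evaluating this estimate over a grid of easting and northing values yields the surface from which contour lines at the chosen interval can be drawn.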

Figure 5.5 GIS module flowchart

5.3 Summary

The internal working of the project with the necessary data flow has been described in this chapter. A clear view on control flow within FCGWS was conveyed by the structure chart with the functionality of its modules being explained.

CHAPTER 6

IMPLEMENTATION OF FCGWS SYSTEM

The implementation phase is a significant phase in project development, as it delivers the final solution that solves the problem. In this phase, the low-level designs are transformed into language-specific programs such that the requirements given in the SRS [45] are satisfied. This phase requires the actual implementation of the ideas that were described in the analysis and design phases.

The techniques and methods used for implementing the software must support reusability and ease of maintenance, and should be well documented.

6.1        Programming Language Selection

 

The programming language chosen to implement the project is Java [32][46][47]. Java is one of the most widely used languages today, with extensive coverage in scientific computing domains such as Artificial Intelligence (AI), machine learning, astronomy and computer vision. The reasons for choosing Java are listed below:

  • Java has a simple, easy-to-program syntax. Though it is an object-oriented language, it can be used for functional development of the product as well.
  • Web applications can be easily developed in Java.
  • Java is platform independent, which is one of its design goals. This results in a high level of portability across platforms.
  • Java helps in handling huge amounts of input efficiently.
  • Retrieving data from the database and writing data into the database can be done easily and effectively.

6.2        Platform Selection

The system is designed to work on Windows operating systems. Windows was chosen mainly because of its relative simplicity and the abundant documentation available for it. Since there is no need to manipulate system data structures, an open-source operating system is not required. Also, since Java is platform independent, it is easy to adapt the project to work on other operating systems such as Linux.

6.3        Code Conventions

 

This section discusses the coding standards followed throughout the project. It includes the software applications that are necessary to complete the project. Proper coding standards should be followed because a large project should be coded in a consistent style, which makes it easier to understand any part of the code. Code conventions are important because they improve the readability of the software, allowing programmers to understand the code clearly.

6.3.1       Naming Conventions

 

Naming conventions make programs more understandable and therefore easier to read. The names given to packages, scripts, graphs and classes should be clear and precise so that their contents can easily be understood. The conventions followed for this project are as follows:

Classes: Class names must be nouns. The upper camel casing method is followed, in which the first letter of every word, including the first word, is capitalized. Example: AddWell, PredictPh.

Methods: Method names should be verbs. Upper camel casing is followed for methods as well. Example: GetEc( ).

Variables: Variable names must be short and meaningful. Example: wellno, which indicates the well number for which the prediction is to be made.
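
The three rules above can be seen together in the following minimal Java sketch. The class name PredictPh, the method name GetEc and the variable name wellno are taken from the examples above; the class body, the method GetWellNo and the returned placeholder value are assumptions made only for illustration.

```java
/** Class name: a noun phrase in upper camel case. */
final class PredictPh {

    // Variable name: short and meaningful (the well number to predict for).
    private final String wellno;

    PredictPh(String wellno) {
        this.wellno = wellno;
    }

    // Method name: a verb phrase in upper camel case, following the project's rule.
    // (General Java style guides usually write this as getEc(); the project's
    // convention is kept here on purpose.)
    double GetEc() {
        return 0.0; // placeholder; a real implementation would read the stored value
    }

    // Hypothetical accessor, named according to the same rule.
    String GetWellNo() {
        return wellno;
    }
}
```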

6.3.2      File Organization

 

The files used to implement the project were organized and kept in certain order based on their types.

  • All the layout files (.xml files) are stored in the resources folder of the application.
  • The Java [32][47] files are present in the source (src) folder of the application. The src folder contains three sub-packages: controller, entity and services. The controller package holds all the files related to the servlets. The entity package holds the code that retrieves data from the database. The services package holds the files that perform operations such as prediction.
  • All images and media used for the project are stored in the images folder of the application.
  • All the Java Server Pages (JSP) and HTML files are present in the web folder.

6.3.3         Declarations

 

Standard declaration conventions are followed. Standard names are given, which makes it easy to understand the role of each entity declared. Multiple declarations per line are not allowed, so that each declaration can be commented and ambiguity is reduced.

6.3.4     Comments

 

Comments are a necessary part of any coding convention, as they improve the understandability of the code. A single-line comment begins with the characters ‘//’, and anything after ‘//’ on the same line is ignored by the compiler.

In the project files, commented areas are printed in grey by default, so they should be easy to identify. There are two useful keyboard shortcuts for adding and removing chunks of comments. Select the code to be commented or uncommented, and then press Ctrl-K + Ctrl-C to comment the selected block, which places one ‘//’ symbol at the beginning of each line, or Ctrl-K + Ctrl-U to uncomment it. Comments for blocks of code start with ‘/*’ and end with ‘*/’.

Comments are useful for explaining what function a certain piece of code performs, especially if the code relies on implicit or subtle assumptions or otherwise performs subtle actions.

To make documentation easy, Java provides a special type of comment called Javadoc. In the project, Javadoc comments are declared for each and every class and function. A Javadoc comment looks as follows:

/**
 * Any comment
 * @ introduces a tag. It can be @author, @param etc.
 */
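
As a fuller illustration, the sketch below shows a class-level and a method-level Javadoc comment with the @author, @param and @return tags. The class EcConverter and its method are hypothetical and were written only for this example; they are not taken from the project sources.

```java
/**
 * Converts electrical conductivity (EC) readings between units.
 *
 * @author hypothetical author name
 */
final class EcConverter {

    /**
     * Converts a conductivity value from siemens per metre to
     * microsiemens per centimetre.
     *
     * @param siemensPerMetre conductivity in S/m
     * @return the same conductivity in microsiemens per centimetre
     */
    static double toMicroSiemensPerCm(double siemensPerMetre) {
        // 1 S/m = 1,000,000 microsiemens per 100 cm = 10,000 microsiemens per cm
        return siemensPerMetre * 10_000.0;
    }
}
```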

6.4        Difficulties Encountered and Strategies Used to Tackle

 

This section discusses the difficulties encountered in the development of the project. The main difficulties encountered were in the collection of data and the machine learning algorithms.

6.4.1  Getting data in required format

 

Data is needed to train [18][20] models for prediction as well as classification. The data collected was in Microsoft Excel sheets, with many values missing and with extra entries that were random and not useful. Also, the data for every well was present in a single file, which made it difficult to read the data for each well individually.

Several actions were taken to overcome these issues. First, the data for each well was manually placed in a separate file for clearer access, which made it possible to identify the wells with consistent data. The unnecessary columns were then removed from the data files, and only the required parameters were retained. Once the required data was obtained, it was inserted into the database. The data is retrieved using the Java Persistence API (JPA), which helps in preventing SQL injection attacks [47].

For the spatial representation of the data in GIS maps, the ARCGIS software is used. The data required for drawing the maps should be in an Excel file in .xls format (not .xlsx); this file is the input to the ARCGIS software, which produces the GIS maps.

6.4.2       Display Logic

 

The design of the user interface for the application was a challenging task because it had to be kept very simple, and yet the user should be able to access all the results with a few button clicks. Bootstrap, a front-end framework that makes UI design fast and simple, was used because it is extensively used in the industry and yields a user-friendly design. Developing the design with this framework was difficult, as only a few applications had been developed using this framework before.

6.5 Summary

 

This chapter deals with the programming language, development environment and code conventions followed during implementation of the project. It also explains the difficulties encountered in the course of implementation of the project and strategies used to tackle them.

CHAPTER 7

SOFTWARE TESTING FOR FCGWS SYSTEM

 

The aim of software testing [48] is to detect defects or errors by testing the components of programs individually. During testing, the components are combined to form a complete system. At this stage, testing demonstrates that the system meets the required functional goals and does not behave in abnormal ways. The test cases are chosen so that the system behavior can be tested for all combinations of inputs, and the expected behavior of the system under the different combinations is given. The selected test cases therefore include valid inputs whose outputs are as expected, invalid inputs for which suitable messages must be given, and inputs that occur infrequently and can be regarded as special cases.

In this chapter, several test cases are designed for testing the behavior of all modules. When all modules are implemented completely, they are integrated and deployed on the Tomcat server. The test cases are executed under the same environment and mainly test the functionality of all modules. Once the application passes all the test cases, it is deployed on the production environment for actual real-time use.

7.1           Test Environment

 

The proposed software mainly deals with the prediction and forecasting of the corresponding data. All inputs to the system are given through the User Interface (UI), built in HTML and CSS, and the output is also viewed through the UI. Various test strategies are used for testing the software, such as unit testing, integration testing, system testing and interface testing.

Several software components are required to test the components of the system. Within the NetBeans IDE [46] itself, the system can be debugged by setting breakpoints in the code and then running the application in the debugger. Executing the code line by line makes it possible to examine the application state in order to discover any problems. In this section, the various parameters which are directly responsible for the performance of the system will be explained.

7.2           Unit Testing

 

Unit testing focuses the verification effort on the smallest unit of software design, the software component or module. Testing in this way makes it clear where bugs occur. The following are the test cases on which this testing is performed.

7.2.1      Unit Testing of Water quality potability forecasting Module

 

The following are the test cases for water quality potability forecasting [16] on which this testing is performed. The testing of the forecasted parameter values is shown in the table below.

Table 7.1 Forecasting of Water quality potability

 

Sl No. of Test Case: 1
Name of Test Case: Forecasting of Water Quality Potability
Feature being Tested: Forecasted values of potability
Description: Ground water sample of selected well is to be forecasted
Sample Input: Well number
Expected Output: Salt contents in mg/L, pH and EC in S/m
Actual Output: Salt contents in mg/L, pH and EC in S/m
Remarks: Successful forecasting

 

 

Table 7.1 shows the values of the salts of water sample for the upcoming year. This test was successful. Here, the user has to input the well number for which the forecasted values have to be displayed. This module successfully displays the forecasted values with the corresponding units.

The accuracy of the individual parameters’ values is tested by comparing the predicted values of the parameters with the actual values for the year 2015.

The accuracy test for the parameter pH is shown in the table below.

Table 7.2 Accuracy test of Water quality parameters (pH)

 

Sl No. of Test Case: 2
Name of Test Case: Prediction of salts (pH)
Feature being Tested: Predicted value of each salt (pH)
Description: Ground water sample of selected well is to be predicted
Sample Input: Well number
Expected Output: pH value
Actual Output: pH value
Remarks: Successful prediction with 95% accuracy (for the year 2015)

 

 

Table 7.2 shows the accuracy of the parameter pH. The input to this system is the well number. Once the well number is entered, the comparison between the predicted pH value and the actual pH value for the year 2015 is displayed. An accuracy of about 95% is achieved.

The accuracy test for the parameter fluorine is shown in the table below.

Table 7.3 Accuracy test of Water quality parameters (fluorine)

 

Sl No. of Test Case: 3
Name of Test Case: Prediction of salts (fluorine)
Feature being Tested: Predicted value of each salt (fluorine)
Description: Ground water sample of selected well is to be predicted
Sample Input: Well number
Expected Output: Fluorine value in mg/L
Actual Output: Fluorine value in mg/L
Remarks: Successful prediction with 97% accuracy (for the year 2015)

 

 

Table 7.3 shows the accuracy of the parameter fluorine. The input to this system is the well number. Once the well number is entered, the comparison between the predicted fluorine value and the actual fluorine value, in mg/L for the year 2015 is displayed. An accuracy of about 97% is achieved.

The accuracy test for the parameter Electrical conductivity (EC) is shown in the table below.

Table 7.4, Accuracy test of Water quality parameters (EC)

 

Sl No. of Test Case: 4
Name of Test Case: Prediction of salts (EC)
Feature being Tested: Predicted value of each salt (EC)
Description: Ground water sample of selected well is to be predicted
Sample Input: Well number
Expected Output: EC value in µ
Actual Output: EC value in µ
Remarks: Successful prediction with 80% accuracy (for the year 2015)

 

 

Table 7.4 shows the accuracy of the parameter EC. The input to this system is the well number. Once the well number is entered, the comparison between the predicted EC value and the actual EC value for the year 2015 in µ is displayed. An accuracy of about 80% is achieved.

The accuracy test for the parameter sodium is shown in table below.

Table 7.5, Accuracy test of Water quality parameters (sodium)

 

Sl No. of Test Case: 5
Name of Test Case: Prediction of salts (sodium)
Feature being Tested: Predicted value of each salt (sodium)
Description: Ground water sample of selected well is to be predicted
Sample Input: Well number
Expected Output: Sodium in mg/L
Actual Output: Sodium in mg/L
Remarks: Successful prediction with 87% accuracy (for the year 2015)

 

 

Table 7.5 shows the accuracy of the parameter sodium. The input to this system is the well number. Once the well number is entered, the comparison between the predicted sodium value and the actual sodium value for the year 2015 in mg/L is displayed. An accuracy of about 87% is achieved.

The accuracy test for the parameter chlorine is shown in table below.

Table 7.6, Accuracy test of Water quality parameters (chlorine)

 

Sl No. of Test Case: 6
Name of Test Case: Prediction of salts (chlorine)
Feature being Tested: Predicted value of each salt (chlorine)
Description: Ground water sample of selected well is to be predicted
Sample Input: Well number
Expected Output: Chlorine in mg/L
Actual Output: Chlorine in mg/L
Remarks: Successful prediction with 75% accuracy (for the year 2015)

 

 

Table 7.6 shows the accuracy of the parameter chlorine. The input to this system is the well number. Once the well number is entered, the comparison between the predicted chlorine value and the actual chlorine value for the year 2015 in mg/L is displayed. An accuracy of about 75% is achieved.

The accuracy test for the parameter sulphate is shown in the table below.

Table 7.7, Accuracy test of Water quality parameters (sulphate)

 

Sl No. of Test Case: 7
Name of Test Case: Prediction of salts (sulphate)
Feature being Tested: Predicted value of each salt (sulphate)
Description: Ground water sample of selected well is to be predicted
Sample Input: Well number
Expected Output: Sulphate in mg/L
Actual Output: Sulphate in mg/L
Remarks: Successful prediction with 80% accuracy (for the year 2015)

 

 

Table 7.7 shows the accuracy of the parameter sulphate. The input to this system is  the well number. Once the well number is entered, the comparison between the predicted sulphate value and the actual sulphate value for the year 2015 in mg/L is displayed. An accuracy of about 80% is achieved.

The test for classification [18] of water quality as potable or not, is in table below.

Table 7.8, Classification test of Water quality potability

 

Sl No. of Test Case: 8
Name of Test Case: Classification of water for potability based on salinity
Feature being Tested: Classification of water for potability
Description: Ground water sample of selected well is to be predicted
Sample Input: Well number
Expected Output: Classified as potable or not based on WHO standards
Actual Output: Classified as potable or not based on WHO standards
Remarks: Successful classification

 

 

Table 7.8 shows the classification test for the water sample. The input to this system is the well number. Once the well number is entered, the forecasted values of the parameters are compared with the WHO standards. If any of the values is not in the permitted range, the sample is classified as not potable; otherwise it is classified as potable.
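
The rule described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the project's classifier: the class PotabilitySketch, the Range type and the map-based inputs are hypothetical, and the permitted ranges must be supplied by the caller from the WHO standards rather than being hard-coded here.

```java
import java.util.Map;

/** Illustrative potability check; the limits passed in are supplied by the caller. */
final class PotabilitySketch {

    /** Hypothetical permitted range (inclusive) for one parameter. */
    static final class Range {
        final double min;
        final double max;

        Range(double min, double max) {
            this.min = min;
            this.max = max;
        }

        boolean contains(double value) {
            return value >= min && value <= max;
        }
    }

    /**
     * Returns true only if every parameter that has a limit is present in the
     * forecast and lies inside its range. A missing value counts as not potable,
     * which is an assumption of this sketch.
     */
    static boolean isPotable(Map<String, Double> forecast, Map<String, Range> limits) {
        for (Map.Entry<String, Range> limit : limits.entrySet()) {
            Double value = forecast.get(limit.getKey());
            if (value == null || !limit.getValue().contains(value)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, with a placeholder range of 6.5 to 8.5 for pH, a forecasted pH of 7.2 is accepted and a forecasted pH of 9.1 causes the sample to be classified as not potable.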

7.2.2    Unit Testing of Ground Water Level Module

 

The ground water level of the wells is forecasted [20] for the upcoming year, and the testing of this forecast is performed as follows. The testing of the forecasted values is shown in table 7.9.

Table 7.9 Forecasting of the ground water level

 

Sl No. of Test Case: 9
Name of Test Case: Forecasting of water level
Feature being Tested: Forecasted values of water level
Description: Ground water sample of selected well is to be forecasted
Sample Input: Well number
Expected Output: Water level values in meters from surface (for three seasons)
Actual Output: Water level values in meters from surface (for three seasons)
Remarks: Successful forecasting

 

 

Table 7.9 shows the water level values of the upcoming year. This test was successful. Here, the user has to input the well number for which the forecasted values have to be displayed. This module successfully displays the forecasted values of water level in meters from the surface after the three seasons, that is, pre-monsoon, south-west monsoon and north-east monsoon.

The accuracy of the water level after the three seasons has also been tested, by comparing the predicted values with the actual values for the year 2014.

The accuracy of the water level after pre-monsoon season is shown in table below.

Table 7.10, Accuracy of the ground water level for pre-monsoon

 

Sl No. of Test Case: 10
Name of Test Case: Prediction of water level (for pre-monsoon season)
Feature being Tested: Prediction of values of water level (for pre-monsoon season)
Description: Ground water sample of selected well is to be forecasted
Sample Input: Well number
Expected Output: Water level values in meters from surface (for pre-monsoon season)
Actual Output: Water level values in meters from surface (for pre-monsoon season)
Remarks: Successful prediction with 86% accuracy for the year 2015 (pre-monsoon season)

 

 

Table 7.10 shows the accuracy of water level after pre-monsoon season. The input to this system is the well number. Once the well number is entered, the comparison between the predicted water level value and the actual water level value for the year 2015 in meters from the surface is displayed. An accuracy of about 86% is achieved.

The accuracy of the water level after south-west monsoon season is shown in the table below.

Table 7.11, Accuracy of the ground water level for sw-monsoon

 

Sl No. of Test Case: 11
Name of Test Case: Prediction of water level (for south-west monsoon season)
Feature being Tested: Prediction of values of water level (for south-west monsoon season)
Description: Ground water sample of selected well is to be forecasted
Sample Input: Well number
Expected Output: Water level values in meters from surface (for south-west monsoon season)
Actual Output: Water level values in meters from surface (for south-west monsoon season)
Remarks: Successful prediction with 84% accuracy for the year 2015 (south-west monsoon season)

 

 

Table 7.11 shows the accuracy of water level after south-west monsoon season. The input to this system is the well number. Once the well number is entered, the comparison between the predicted water level value and the actual water level value for the year 2015 in meters from the surface is displayed. An accuracy of about 84% is achieved.

The accuracy of the water level after north-east monsoon season is shown in the table below.

Table 7.12, Accuracy of the ground water level for ne-monsoon

 

Sl No. of Test Case: 12
Name of Test Case: Prediction of water level (for north-east monsoon season)
Feature being Tested: Prediction of values of water level (for north-east monsoon season)
Description: Ground water sample of selected well is to be forecasted
Sample Input: Well number
Expected Output: Water level values in meters from surface (for north-east monsoon season)
Actual Output: Water level values in meters from surface (for north-east monsoon season)
Remarks: Successful prediction with 88% accuracy for the year 2015 (north-east monsoon season)

 

 

Table 7.12 shows the accuracy of water level after north-east monsoon season. The input to this system is the well number. Once the well number is entered, the comparison between the predicted water level value and the actual water level value for the year 2015 in meters from the surface is displayed. An accuracy of about 88% is achieved.

7.2.3    Unit Testing of Update Module

 

The admin of the system can input the actual values for any particular year. Here, the validation of all the fields and correct insertion into the database is tested.
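
The kind of field validation exercised by these tests can be sketched as below. This is only an assumed shape of the check, written for illustration: the class UpdateValidationSketch, its method names and the field names used in the example are hypothetical and are not taken from the project code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative pre-insert validation; field names are hypothetical. */
final class UpdateValidationSketch {

    /** Returns the names of required fields that are absent, null or blank. */
    static List<String> findMissingFields(Map<String, String> form, String... requiredFields) {
        List<String> missing = new ArrayList<>();
        for (String field : requiredFields) {
            String value = form.get(field);
            if (value == null || value.trim().isEmpty()) {
                missing.add(field);
            }
        }
        return missing;
    }

    /** The insertion should proceed only when no required field is missing. */
    static boolean isReadyForInsert(Map<String, String> form, String... requiredFields) {
        return findMissingFields(form, requiredFields).isEmpty();
    }
}
```

A form in which the season is filled in but the well number is blank and the depth is absent would be rejected, with the well number and depth reported as missing.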

Insertion into the water level table is tested as shown in table below.

Table 7.13 Insertion into water level table

 

Sl No. of Test Case: 13
Name of Test Case: Water level table update
Feature being Tested: Updating the water level data in the table
Description: Admin inputs the values which should be inserted to database
Sample Input: Values for season, well number, depth fields etc.
Expected Output: Insertion into database (if the fields are not null) on submitting
Actual Output: Insertion into database occurred
Remarks: Test successful

 

 

Table 7.13 shows the testing of the correct insertion into the database. The admin of the system inputs the actual values of fields such as depth, season and well number and submits this data, which is then correctly inserted into the water level table in the database.

Insertion into the water quality table is tested as shown in table below.

Table 7.14, Insertion into water quality table

 

Sl No. of Test Case: 14
Name of Test Case: Water quality table update
Feature being Tested: Updating the water quality data in the table
Description: Admin inputs the values which should be inserted to database
Sample Input: Values for well number, values of the salts etc.
Expected Output: Insertion into database (if the fields are not null) on submitting
Actual Output: Insertion into database occurred
Remarks: Test successful

 

Table 7.14 shows the testing of the correct insertion into the database. The admin of the system inputs the actual values of fields such as well number, taluk name and the values of the salts and submits this data, which is then correctly inserted into the water quality table in the database.

Insertion into the taluk table is tested as shown in table below.

Table 7.15, Insertion into taluk name table

 

Sl No. of Test Case: 15
Name of Test Case: New taluk insertion
Feature being Tested: Inserting the new taluk into the table
Description: Admin inputs the values which should be inserted to database
Sample Input: Taluk name
Expected Output: Insertion into database (if the fields are not null) on submitting
Actual Output: Insertion into database occurred
Remarks: Test successful

 

 

Table 7.15 shows the testing of the correct insertion into database. The admin of the system inputs the taluk name and submits this data and the data is correctly inserted into the taluk name table in the database.

Insertion of new well is tested as shown in table below.

Table 7.16, Insertion of new well

 

Sl No. of Test Case: 16
Name of Test Case: New well insertion
Feature being Tested: Inserting the new well information
Description: Admin inputs the values which should be inserted to database
Sample Input: Values for season, well number, depth, salts fields etc.
Expected Output: Insertion into database (if the fields are not null) on submitting
Actual Output: Insertion into database occurred
Remarks: Test successful

 

Table 7.16 shows the testing of the correct insertion of a new well into the database. The admin of the system inputs the taluk name, the well number, the values of the salts and the water level values and submits this data. The water quality data is correctly inserted into the water quality table and the water level data into the water level table in the database.

7.3   Integration Testing

 

Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with interfacing. The objective is to take unit-tested components and build a program structure.

7.3.1  Integration of water quality forecasting and inserting it into database

 

The values of the parameters are forecasted and the forecasted values are inserted into the database. This testing is as shown in table below.

Table 7.17 Forecasting water quality and Insertion into database

 

Sl No. of Test Case: 17
Name of Test Case: Forecasted water quality values update
Feature being Tested: Updating the forecasted values of the salts
Description: The forecasted values of the salts for the upcoming year to be inserted into database
Sample Input: Forecasted values of the salts
Expected Output: Insertion of the forecasted values of salts into database
Actual Output: Insertion into database occurred
Remarks: Test successful

 

 

Table 7.17 shows the integration testing of water quality module and insertion of the forecasted values of the salts into the database. The forecasted values of salts for the upcoming year have been successfully inserted into the database.

7.3.2  Integration of water level forecasting and inserting it into database

 

The values of the water levels are forecasted and the forecasted values are inserted  into the database. This testing is as shown in table below.

Table 7.18, Forecasting water level and Insertion into database

 

Sl No. of Test Case: 18
Name of Test Case: Forecasted water level values update
Feature being Tested: Updating the forecasted values of the water levels
Description: The forecasted values of the water levels for the upcoming year to be inserted into database
Sample Input: Forecasted values of the water levels
Expected Output: Insertion of the forecasted values of water levels into database
Actual Output: Insertion into database occurred
Remarks: Test successful

 

 

Table 7.18 shows the integration testing of water level module and insertion of the forecasted values of the water levels into the database. The forecasted values of water levels for the upcoming year have been successfully inserted into the database.

7.4         System Testing

 

System testing [48] is the testing in which all modules that have passed integration testing are combined to form a single system. The system is tested to check that all the units are linked properly to satisfy the user-specific requirements. This test helps in removing the remaining bugs and improves the quality and assurance of the system. The proper functionality of the system is concluded in system testing.

The whole system is evaluated in this system testing, with the forecasting of ground water level and Water quality being tested. The system testing is as shown in table below.

Table 7.19, System testing

 

Sl No. of Test Case: 19
Name of Test Case: System test
Feature being Tested: System
Description: Testing if the whole system is working
Sample Input: Complete training and test data
Expected Output: Forecasted values of quality, classification and water level
Actual Output: Forecasted values obtained
Remarks: Test successful

 

 

Table 7.19 shows the system testing. Here all the modules are combined and tested. The system should forecast the values of the water quality parameters, compare these values with the WHO standards and display them along with the status of potability, that is, whether the water sample is potable or not.

It should also display the forecasted values of water levels. On testing, the system successfully met all the requirements.

7.5 Summary

 

This chapter describes the general testing process, which starts with unit testing of the modules, followed by integration testing, in which the modules are merged together. System testing, in which the entire system is tested for its functionality and correctness, was then performed. Finally, the functional testing of the user interface was performed, and the tests proved successful in most test cases. The causes of the failed tests were rectified.

CHAPTER 8

EXPERIMENTAL RESULTS AND ANALYSIS FOR FCGWS SYSTEM

In the analysis of a process, experiments are commonly used to evaluate which inputs have a significant impact on the output of the process, and what the target level of those inputs should be to achieve a desired result. The output obtained from the system is compared with the truth to verify the correctness of the system [49]. There are several metrics for comparison. Analyzing the experimental output means verifying whether the evaluation metrics are satisfied. This chapter discusses the performance characteristics of the system.

8.1   Evaluation Metrics

 

Evaluation metrics are the criteria for testing different algorithms. The behavior of the algorithms or techniques can be determined using these metrics. Some techniques satisfy only some of the metrics.

In this project, the outputs obtained from the different inputs given to the system are compared with the truth to check whether the metrics are satisfied. The metrics against which a good technique should be evaluated are:

  • The Root Mean Square (RMS) value: The RMS value indicates the magnitude of difference between obtained value and expected value [50]. This operation is performed on the test data set. This metric was used to check the accuracy of the forecasted values.
  • Accuracy Score: The Accuracy Score is defined as the ratio of the number of correctly predicted values to the total number of predicted values [51]. This metric was used for the problem of classification.
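
Both metrics can be computed with a few lines of Java. The sketch below assumes that the RMS value is the usual root mean square of the differences between expected and obtained values, and that classification results are held as boolean potable or not potable labels; the class and method names are hypothetical and chosen only for this illustration.

```java
/** Illustrative implementations of the two evaluation metrics. */
final class MetricsSketch {

    /** Root mean square of the differences between expected and obtained values. */
    static double rootMeanSquareError(double[] expected, double[] obtained) {
        if (expected.length == 0 || expected.length != obtained.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < expected.length; i++) {
            double difference = obtained[i] - expected[i];
            sumOfSquares += difference * difference;
        }
        return Math.sqrt(sumOfSquares / expected.length);
    }

    /** Ratio of correctly predicted labels to the total number of predictions. */
    static double accuracyScore(boolean[] actual, boolean[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == predicted[i]) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }
}
```

A lower RMS value indicates forecasts that are closer to the measured values, while an accuracy score of 1.0 means that every sample was classified correctly.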

8.2   Experimental Dataset

 

To ensure the correctness of a project or a system, it has to be tested under various conditions. For the forecasting of ground water level, the experimental dataset consists of two columns. The first column is the date on which the data was taken; readings are available for three seasons, in the months of January, May and August of the respective years. The second column is the depth of the water level. The water quality dataset consists of 12 parameters (pH, EC, Total dissolved solids, Total Hardness, Cl, SO4, NO3, Ca, Mg, Na, K and F) for each sample of water, which are used to predict and classify whether the water is potable or not. For every module, the user has to input the well number for which the results are to be displayed. The predictor, forecaster and classifier give results for that well.

8.2.1 Ground water level training data set

 

Out of 32 bore wells in Chikballapur district, consistent water level data is currently available for only 2 wells. The experimental dataset of ground water level consists of two columns. The first column is the date on which the data was taken, for the three seasons, in the months of January, May and August of the respective years. The second column is the depth of the water level. The partial dataset is tabulated and shown below in figure 8.1.

Figure 8.1, water level module training data set

 

8.2.2 Ground water quality training data set

Out of 32 bore wells in Chikballapur district, water quality data is available for only 2 bore wells of some taluks. The experimental dataset of water quality consists of 12 parameters. The values of pH, EC, Total dissolved solids, Total Hardness, Cl, SO4, NO3, Ca, Mg, Na, K and F in the groundwater have been used to predict the potability of water. The partial dataset is tabulated and shown below in figure 8.2.

Figure 8.2 Water quality data of Well  No SW57G2C2 of Gudibanda taluk

 

8.3   Performance analysis of results

 

This section presents the experimental results of this project. An accuracy of 87% was achieved in forecasting water quality, 92% in classification and 85% in water level forecasting.

8.3.1  Water level forecasting

 

The output obtained from the system is evaluated against different metrics to measure its performance. Figure 8.3 gives the forecast of the ground water level for the upcoming year. The ground water level data consists of readings for three seasons, taken in January, May and August of each year, and the results are forecasted for each season accordingly.

Figure 8.3 Experimental results of water level

 

Figure 8.3 shows a screenshot of the experimental results of water level for the upcoming year for the well SW5C2G2. The forecasted water level is 1.83 m from the surface after the pre-monsoon season, 2.07 m after the south-west monsoon and 1.34 m after the north-east monsoon.

Figure 8.4 shows a screenshot of a bar graph of the variation of the water level after the three seasons over the last five years. The blue bars indicate the water level values after the pre-monsoon season, the red bars the values after the north-east monsoon and the yellow bars the values after the south-west monsoon.

Figure 8.4 The statistics for water level


8.3.2  Water quality forecasting

 

The output obtained from the system is evaluated against different metrics to measure its performance. Figure 8.5 gives the forecast of the ground water quality data for the upcoming year. The ground water quality data consists of twelve parameters, and the result also indicates the potability status. The results are forecasted for the upcoming year.

Figure 8.5 Experimental results of water quality

 

 

Figure 8.5 shows a screenshot of the experimental results of water quality for the upcoming year for the well SW5C2G2, with the forecasted values of the twelve parameters. The parameters shown in red are those whose values fall outside the range given by the WHO standards and are therefore responsible for the potability status.

Figures 8.6 through 8.8 show the variations of water quality parameters such as EC, pH and others over the last five years.
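
The red highlighting described above amounts to a range check per parameter. A minimal Python sketch follows; the limit values are placeholders chosen for illustration and are not the WHO limits used in the project, and the function and variable names are likewise assumptions.

```python
# Placeholder permissible ranges (low, high). These numbers are assumptions
# for illustration only; they are NOT the WHO limits used in the project.
ASSUMED_LIMITS = {
    "pH": (6.5, 8.5),
    "TDS": (0.0, 500.0),
    "NO3": (0.0, 50.0),
}


def out_of_range(sample, limits):
    """Return the names of parameters whose value lies outside its range."""
    flagged = []
    for name, (low, high) in limits.items():
        value = sample[name]
        if not (low <= value <= high):
            flagged.append(name)
    return flagged


# Made-up forecasted sample, for illustration only.
sample = {"pH": 7.2, "TDS": 640.0, "NO3": 12.0}
print(out_of_range(sample, ASSUMED_LIMITS))  # parameters to show in red
```

The check only decides which values to highlight. The potability status itself comes from the classifier, not from this range test.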

Figure 8.6 The statistics for some water quality parameters

 

Figure 8.6 shows the changes in water quality parameters such as pH, EC and nitrate over the last 5 years.

Figure 8.7 The statistics for some water quality parameters

 

Figure 8.7 shows the changes in water quality parameters such as magnesium, sodium and potassium over the last 5 years.

Figure 8.8 The statistics for some water quality parameters

 

Figure 8.8 shows the changes in water quality parameters such as calcium, TDS and hardness over the last 5 years.

The multivariate analysis model developed in this project uses linear regression improved with the Pearson correlation coefficient to forecast the values of the water quality parameters. These forecasted values are then input to a Naïve Bayes classifier, which classifies the water sample as potable or not potable. The forecasting accuracy is 87% and the classification accuracy is 92%, compared to the univariate model [41], which had an accuracy of 85%.
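
The two stages can be sketched in Python as follows. This is one plausible reading of the approach, not the project's implementation: it assumes the Pearson coefficient is used to pick the most strongly correlated predictor for a least-squares line, and it assumes a Gaussian form of Naïve Bayes. All function names and all numbers are illustrative assumptions.

```python
from math import log, pi, sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_predictor(target, candidates):
    """Name of the candidate series most strongly correlated with target."""
    return max(candidates, key=lambda name: abs(pearson(candidates[name], target)))


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def train_gaussian_nb(rows, labels):
    """Per class: prior probability plus (mean, variance) for each feature."""
    model = {}
    for label in sorted(set(labels)):
        members = [row for row, lab in zip(rows, labels) if lab == label]
        stats = []
        for column in zip(*members):
            m = mean(column)
            variance = sum((v - m) ** 2 for v in column) / len(column)
            stats.append((m, variance + 1e-9))  # floor avoids division by zero
        model[label] = (len(members) / len(rows), stats)
    return model


def classify(model, row):
    """Label with the highest log prior plus summed Gaussian log-likelihoods."""
    def score(label):
        prior, stats = model[label]
        total = log(prior)
        for value, (m, variance) in zip(row, stats):
            total += -0.5 * log(2 * pi * variance) - (value - m) ** 2 / (2 * variance)
        return total
    return max(model, key=score)


# Made-up series, for illustration only.
target = [2.0, 4.0, 6.0, 8.0]
candidates = {"A": [1.0, 2.0, 3.0, 4.0], "B": [4.0, 1.0, 3.0, 2.0]}
chosen = best_predictor(target, candidates)
slope, intercept = fit_line(candidates[chosen], target)

# Made-up training samples with two features (pH, TDS), for illustration only.
rows = [(7.0, 300.0), (7.2, 350.0), (6.9, 320.0),
        (8.9, 900.0), (9.1, 1000.0), (8.8, 950.0)]
labels = ["potable"] * 3 + ["not potable"] * 3
model = train_gaussian_nb(rows, labels)
print(chosen, classify(model, (7.1, 330.0)))
```

Working with log-probabilities avoids numeric underflow when many parameters are multiplied together, which matters once all twelve parameters are used.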

8.4   Summary

 

The results obtained met all the expectations set at the beginning of the project. The forecasting of ground water level gave a minimal root mean square error with an accuracy of 85%, and the potability classification of ground water gave a good accuracy of nearly 92%. The ground water quality data for each well was considered over 20 years, and the salt contents were forecasted for the upcoming year. Similarly, the water level data was considered over 10 years, and the levels were forecasted for the upcoming year.
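
For reference, the root mean square error mentioned above can be computed as in the Python sketch below. The function name and the sample values are assumptions for illustration and are not the project's measurements.

```python
from math import sqrt


def rmse(actual, forecast):
    """Root mean square error between actual and forecasted values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Made-up water levels in metres, for illustration only.
actual_levels = [1.0, 2.0, 3.0]
forecast_levels = [2.0, 1.0, 4.0]
print(rmse(actual_levels, forecast_levels))  # every error is 1 m, so RMSE is 1.0
```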

CHAPTER 9

 

CONCLUSION AND FUTURE ENHANCEMENT

 

Groundwater is a critical resource. Correct and accurate estimation of this resource over time is one of the main focuses of this project. This aids not only sustainable use but also the formulation of new legislation, regulations and rules governing the groundwater resources in Karnataka. Groundwater resource estimation is one of the primary objectives of the Central Ground Water Board, Bangalore.

Of the 32 wells of Chikballapur, two wells are taken for implementation, as the remaining ones do not have satisfactory data for training and testing. The results obtained show an accuracy of 87% in forecasting the water quality parameters and 85% in forecasting the ground water level. The classification results show a good accuracy of around 92%. The forecasting of groundwater levels and the classification of groundwater for potability are made accessible to the concerned authorities as well as the common man through a web application.

From the results and analysis, the following conclusions are made:

  • The measured values of pH, EC, total dissolved solids, total hardness, Cl, SO4, NO3, Ca, Mg, Na, K and F in the groundwater have been used to predict the potability of water.
  • The system achieved a classification accuracy of about 92%.
  • Some salts such as Fe and Si do not contribute at all to the potability classification and hence were ignored.
  • Each salt has a different effect on the classification analysis and results.

9.1.   Limitations of the Project

 

It is not possible to achieve complete accuracy, especially in the field of prediction. It takes many professional hands working tirelessly for a long time to deliver nearly perfect software with optimum trade-offs between performance, accuracy and speed, and even then every success comes with certain limitations. The limitations of this project are as follows:

  • It requires a large amount of past data in order to predict the future. If sufficient data is not available, the software might not perform up to the mark.
  • It cannot give 100% accuracy, a limitation that arises naturally in prediction, as in the fields of fuzzy and probability theory.

9.2.   Future Enhancements

 

Enhancements are key to continuous improvement and increased efficiency. Stated below are a few enhancements, or items of future work, that can be incorporated in this project:

  • Achieving better accuracy with a larger data set.
  • Integrating other water quality parameters into the existing system, which is a good way to improve prediction accuracy. This requires collaboration with water evaluation experts, who can help classify water parameters into different information groups and develop models for each group.

References

 

[1] Julia Murphy and Max Roser (2017) – ‘Internet’. Published online at OurWorldInData.org. Retrieved from: https://ourworldindata.org/internet/ [Online Resource]

[2] QUIC at 10,000 feet. https://docs.google.com/document/d/1gY9-YNDNAB1eip-RTPbqphgySwSNSDHLq9D5Bty4FSU/edit

[3] CUBIC for Fast Long-Distance Networks. https://tools.ietf.org/html/draft-rhee-tcpm-cubic-02.

[4] Najafabadi, Alireza Taravat. “Preparing of groundwater maps by using Geographical Information System.” IEEE 17th International Conference on Geoinformatics, Fairfax, Virginia, 2009, pp. 1-4.

[5] Cong, Fangjie, and Yanfang Diao. “Development of Dalian Groundwater Resources Management Geographical Information System”, IEEE International conference on Web Information Systems and Mining (WISM), Sanya, China, Vol. 1, 2010, pp. 388-390.

[6] Weihong, Zhang, Zhao Yongsheng, Dong Jun, and Hong Mei. “An early warning system for groundwater pollution based on GIS.” IEEE International Symposium on Water Resource and Environmental Protection (ISWREP), Xi’an, Shaanxi Province, China, vol. 4, 2011, pp. 2773-2776.

[7] Yang, Zhenghua, Yanping Liu, and Haowen Yan. “GIS-based analysis and visualization of the groundwater changes in Minqin Oasis”. IEEE International Symposium on Geomatics for Integrated Water Resources Management (GIWRM), Lanzhou, China, 2012, pp. 1-4.

[8] Song, Mei, and Wang Lijie. “Study on groundwater optimization model based on GIS.”, IEEE 5th International Joint Conference on INC, IMS and IDC, Seoul, South Korea, 2011, pp. 977-980.

[9] Sonam Pal, Rakesh Sharma, “An Intelligent Decision Support System for Establishment of New Organization on Any Geographical Area Using GIS”, International Journal on Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 8, August 2013, pp 45-49.

[10] Mogaji, K. A., H. S. Lim, and K. Abdullah. “Modeling groundwater vulnerability prediction using geographic information system (GIS)-based ordered weighted average (OWA) method and DRASTIC model theory hybrid approach”, Arabian Journal of Geosciences, no. 12, June 2014, pp 5409-5429.

[11] Efren L. Linan, Victor B. Ella, Leonardo M. Florece, “DRASTIC Model and GIS-Based Assessment of Groundwater Vulnerability to Contamination in Boracay Island, Philippines”, Proceeding of the Intl Conf. on Advances In Applied Science and Environmental Engineering, 2014.

[12] Muzafar N. Teli, Nisar A. Kuchhay, “Spatial Interpolation Technique For Groundwater Quality Assessment Of District Anantnag J&K”, International Journal of Engineering Research and Development, Volume 10, Issue 3, March 2014, pp 55-66.

[13] Yiteng Huang, Jacob Benesty, Jingdong Chen, “Using the Pearson correlation coefficient to develop an optimally weighted cross relation based blind SIMO identification algorithm”, IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 2013, pp 3153-3156.

[14] M.Kannan, S.Prabhakaran and P.Ramachandran, “Rainfall Forecasting Using Data Mining Technique”, International Journal of Engineering and Technology, Vol.2, no.6, July 2010, pp 397-401.

[15] Changjun Zhu, Qinghua Liu, “Evaluation of water Quality using Grey Clustering”, Second International workshop on Knowledge Discovery and Data Mining, July 2009, pp 10-13.

[16] Andrew Kusiak, Xiupeng Wei, Anoop Prakash Verma and Evan Roz, “Modeling and Prediction of Quality Using Radar Reflectivity Data: A Data-Mining Approach”, IEEE Transactions on Geoscience and Remote Sensing, Vol.51, no.4, April 2013, pp 796-807.

[17] Lougheed, Vanessa L., Barb Crosbie, and Patricia Chow-Fraser, “Predictions on the effect of common carp (Cyprinus carpio) exclusion on water quality, zooplankton, and submergent macrophytes in a Great Lakes wetland” Canadian Journal of Fisheries and Aquatic Sciences, Volume 55, Issue 5, January 2008, pp 1189-1197.

[18] Giardino, Claudia, et al. “Assessment of water quality in Lake Garda (Italy) using Hyperion.” Remote Sensing of Environment, Volume 109, Issue 2, December 2007, pp 183-195.

[19] Wu, Wen-Jie, and Yan Xu. “Correlation analysis of visual verbs’ subcategorization based on Pearson’s correlation coefficient.” IEEE International Conference on Machine Learning and Cybernetics (ICMLC), Qingdao, China, Volume 4, June 2010, pp 2042-2046.

[20] Simin Li, Nannan Zhao, Zhennan Shi, Fengbing Tang, “Application of Artificial Neural Network on Water Quality Evaluation of Fuyang River in Handan City”, IEEE Transactions on neural network, Volume 7, no. 6, May 2010, pp 246-256.

[21] Wang, Bo, et al. “Finding Correlated item pairs through efficient pruning with a given threshold.” The Ninth International Conference on Web-Age Information Management (WAIM), Zhangjiajie Hunan, China, July 2008, pp 413-420.

[22] Scott Armstrong J., Fred Collopy, “Error Measures For Generalizing About Forecasting Methods: Empirical Comparisons”, International Journal of Forecasting, Volume 26, Issue 1, January 2008, pp 69-80.

[23] Ramakrishnaiah, C. R., C. Sadashivaiah, and G. Ranganna. “Assessment of water quality index for the groundwater in Tumkur Taluk, Karnataka State, India.” Journal of Chemistry, October 2009, pp 523-530.

[24] Ruiz-Luna, A., and C. A. Berlanga-Robles. “Modifications in coverage patterns and land use around the Huizache-Caimanero lagoon system, Sinaloa, Mexico: a multi-temporal analysis using LANDSAT images” Journal of Estuarine, Coastal and Shelf Science, July 1999, Volume 49, Issue 1, pp 37-44.

[25] Zhang, Qiang, et al. “Observed trends of annual maximum water level and streamflow during past 130 years in the Yangtze River basin, China.” International Journal of Hydrology, Volume 324, Issue 4, March 2006, pp 255-265.

[26] Lou, Wengao, and Shuryo Nakai. “Application of artificial neural networks for predicting the thermal inactivation of bacteria: a combined effect of temperature, pH and water activity.” Journal on Food Research International, Volume 34, Issue 1, July 2001, pp 573-579.

[27] Shoba G, Shobha G, “Rainfall Prediction Using Data Mining Techniques”, International Journal of Engg and Computer Science, Volume 3, Issue 5, May 2014, pp 6206-6211.

[28] David Meyer, “Support Vector Machines”, Technische University, Wien, Austria.

[29] Ian Sommerville, “Software Engineering”, Pearson Publications, 9th Edition, 2011, ISBN: 978-013703515.

[30] Vipin Kumar, “Introduction to Data Mining”, Pearson Publications, 2011, ISBN: 978-81317-1472-0.

[31] Herbert Schildt, “Java: The Complete Reference”, McGraw Hill Publications, 8th edition, ISBN: 9780072123548.

[32] Jim Keogh, “Java EE: The Complete reference”, McGraw Hill Publications, 5th edition, ISBN: 007222472X.

[33] Patrick Naughton, “The Java Handbook”, McGraw Hill Publications, ISBN: 0-07882199-1.

[34] Tim Lindholm, Bill Joy, “The Java Virtual Machine Specification”, Addison-Wesley Publications, ISBN: 0-201-69581-2.

[35] Zhu Xiaochun, Zhou Bo, “A test Automation solution on GUI Functional test”, IEEE international conference on industrial informatics, July 2008, pages 1413-1418.

[36] Benesty J, Jingdong Chen, Yiteng Huang, “On the importance of Pearson correlation coefficient in noise reduction”, IEEE transactions, May 2008, pages 757-765.

[37] Simon Kendal, “Object oriented programming with Java”, Pearson Publications, 1st edition, ISBN: 978-87-7681-501-1.

[38] Oracle whitepaper, “Advanced Java Diagnostics and Monitoring without performance overhead”, September 2013.

[39] Agarwal G., Li J.,Su Q., “Evaluating a demand driven technique for call graph construction”, 2002, pages 29-45.

[40] Forman, I.R., Forman, N., “Java Reflection in Action”, Manning Publications (2004).

[41] Shoba G, Shobha G, “Water Quality Prediction Using Data Mining techniques: A Survey”, International Journal Of Engineering And Computer Science, Volume 3 Issue 6, June 2014, pp 6299-6306.

[42] G. Shobha, Jayavardhana Gubbi, Krishna S Raghavan, “A novel fuzzy rule based system for assessment of ground water potability: A case study in South India”, IOSR Journal of Computer Engineering (IOSR-JCE), Volume 15, Issue 2, December 2013, pp 35-41.

[43] Hal Daumé III, “A Course in Machine Learning”, Version 0.8, August 2012.

[44] Tom M. Mitchell, “The Discipline of Machine Learning”, School of Computer Science, Carnegie Mellon University, Pittsburgh, July 2006, pp 312-317.

[45] Mu Huaxin, Jiang Shuai, “Design Patterns in Software Development”, Proceeding of the IEEE 2nd International Conference on Software Engineering and Service Sciences, Beijing, July 2011, pp 15-17.

[46] Rod Johnson, JuergenHoeller, AlefArendsen, “Overview of Spring Framework”, in Introduction to the Spring Framework, December 2005, pp 5-24.

[47] Jeff Linwood and Dave Minter, “An Introduction to Hibernate in Beginning Hibernate, An introduction to persistence using hibernate”, Springer, 2nd edition, June 2010, pp 1-10.

[48] Ron Patton, “Software Testing”, Pearson Education India, 2nd Edition, 2006.

[49] Susan Hyde, Thad Dunning, “The Analysis of Experimental Data: Comparing Techniques”, Proceeding of the Annual Meeting of American Political Science Association, Boston, December 2008, pp 233-242.

[50] Stanford University, “Introduction to Machine learning” by Andrew Ng.

[51] Marina Sokolova, Nathalie Japkowicz, Stan Szpakowicz, “Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation”, Advances in Artificial Intelligence, Vol 43, no.4, June 2006, pp 1015-1021.

Appendices

Appendix A – Screen Shots of FCGWS

Figure A.1 shows the home page of the web application. The user of the system is given options for viewing the water level forecast and the water quality potability forecast. The admin of the system can log in and insert the values of water quality and water level for the upcoming years.

Figure A.1 Home Page of the Web Application

 

Figure A.2 displays the home page of the water quality potability forecasting module, which is used to browse the respective quality data. This applies to all the cases under consideration, as illustrated in the figure below. The page also shows a Google map on which each well number is marked with its potability status.

Figure A.2 The water quality home page with Google maps

 

Figure A.3 displays the home page of the water level forecasting module, which is used to browse the respective water level data. This applies to all the cases under consideration, as illustrated in the figure below. The page also shows a Google map on which each well number is marked with its water level after each of the three seasons.

Figure A.3 The water level home page with Google maps

 

Figure A.4 illustrates the result page of the water quality forecasting module for the upcoming year. On selecting the well number and submitting it on the page in Figure A.2, the forecasted result for the next year is displayed. The page also indicates the potability status and shows in red the values that are responsible for non-potability.

Figure A.4 The forecasted values of water quality.

 

Figure A.5 illustrates the result page of the water level forecasting module for the upcoming year. On selecting the well number and submitting it on the page in Figure A.3, the forecasted result for the next year is displayed.

Figure A.5 The forecasted values of water level.

 

Figure A.6 shows the home page of the admin of the system. Only the admin has the rights to update the water quality and water level data and to insert a new taluk or well.

Figure A.6 Admin home page

 

Figure A.7 shows the GIS maps for different parameters for three different wells, giving a spatial representation and allowing a comparison between the values.

Figure A.7 GIS maps.

 

Figure A.8 shows the testing of the water quality parameters for the year 2015, which indicates an accuracy of 87% for forecasting.

Figure A.8 Testing for water quality

 

Figure A.9 shows the testing of the water level for the year 2015, which indicates an accuracy of 85% for forecasting.

Figure A.9 Testing for water level
