Solar Irradiance (Global, Direct and Diffuse) Quality Control Methodologies Review: Application to Time Series Measured At LES/LNEG, Lisboa, Portugal

Solar irradiance spatial and temporal quantification is essential to the development, implementation, and operation of solar systems, being used throughout a solar project lifecycle. Good quality data measured at meteorological and radiometric ground stations are crucial to enable the calibration and validation of irradiance models and data series. The Solar Energy Laboratory at LNEG operates a meteorological station gathering relevant parameters to characterize the solar irradiation profile for the city of Lisbon in Portugal. This work presents and compares the application of different methodologies used for quality control of solar irradiance measurements. Three methods - the CIE (1994) / Muneer and Fairooz (2002), the QCRad and the IEC - were tested against two synthetic data sets: a clear-sky year and a typical meteorological year randomly and uniformly infused with errors. The IEC method showed limitations regarding the extreme value criteria for beam normal irradiance, and the CIE method for the diffuse horizontal irradiance. The QCRad presented the best performance, with total sensitivity above 80% and maximum specificity. This method was applied to the data measured at LES-LNEG between 2015 and 2018. Most of the errors were detected during the coherence test stage, with a higher prevalence between 2015 and mid-2016, highlighting the need to modify the diffuse horizontal irradiance measuring system.


Introduction
Solar irradiance spatial and temporal quantification is essential to the development, implementation and operation of solar systems, being used throughout a solar project lifecycle (Sengupta et al., 2017): selection of installation location; solar systems' performance estimation and viability evaluation during the planning, financing and plant dimensioning stages; performance evaluation during commissioning and throughout the life of the solar system; production forecasting and operation planning.
Different types of meteorological data series can be used for the above purposes: data series from ground station measurements; data series obtained from satellite information; data series generated from stochastic models based on statistical information. However, even for the last two cases it is crucial to have data measured at meteorological and radiometric ground stations to enable the calibration and validation of models and data series.
The measurement of solar irradiance entails several uncertainty and error sources that need to be taken into consideration when building the data sets. Thus, it is necessary to have quality control procedures in place when treating and assembling raw irradiance data into data sets, otherwise the irradiance data will not be suitable for use in the aforementioned processes. Several methodologies have been proposed (see section 3) but, to the authors' best knowledge, there is a lack of systematic inter-methodology comparisons.
The Solar Energy Laboratory (LES) from the National Laboratory for Energy and Geology, I.P. (LNEG) measures and gathers meteorological data at the Lumiar campus in Lisbon, Portugal, within the scope of its research and solar collector testing activities. This information is used to characterize the solar irradiation profile for the city of Lisbon. Until now, data quality control has been performed by guaranteeing the calibration of the equipment on an annual or biennial basis and by graphical representation of the radiation components in order to identify inconsistencies. In order to implement an automated quality control method, a comparison between three well-known quality control methods was performed and is presented in this work. The comparison was performed for synthetic irradiance data for a typical meteorological year and a clear sky year for the city of Lisbon. Based on the comparison results, a method was selected and applied to the irradiance data acquired at LES between 2015 and November 2018.

Equipment characteristics - associated uncertainties and problems
When performing irradiation measurements, errors can arise from various sources: intrinsic characteristics of the equipment used; problems in the operation of that equipment; processing of the results. Some of these errors are systematic while others are random in nature. Some will persist even after careful debugging of the installation and its operational procedures. The creation of irradiance data sets for solar applications requires the detection and correction of the data entries affected by them.

Equipment characteristics and associated uncertainty
Equipment measurement uncertainty usually arises from the sensors and their construction (Muneer and Fairooz, 2002) or from the tracking system. Choosing an adequate sensor is fundamental to avoid this type of error. This uncertainty depends on equipment characteristics, which are summarized for pyrheliometers in Table 7.2 and for pyranometers in Table 7.4 of the WMO Guide No. 8 (WMO, 2018). These characteristics are response time, zero offset, resolution, stability, temperature response, non-linearity, spectral sensitivity and tilt response.
For each of these characteristics, the admissible uncertainty values are well defined for high and good quality in the case of pyrheliometers and for high, good or moderate quality in the case of pyranometers. If these are met, it is possible to consider an achievable uncertainty with 95% confidence level for hourly values as given in Table 1.
Tab. 1: Achievable uncertainty with 95% confidence level for hourly values.

Moderate quality 20 %
The usual classification of pyrheliometers and pyranometers, according to the ISO 9060:1990 standard (ISO, 1990), is secondary standard, first class and second class. It is assumed that these correspond respectively to high, good and moderate quality equipment.

Operation errors
The referred classification can only be considered, according the WMO Guide, No.

Shading caused by building structures
The presence of structures like buildings and trees in the path of the radiation.
Positioning the equipment sufficiently far away from any structure.
It is a systematic error that can be detected from its prevalence in the data. It can be mitigated through data imputation techniques.

Incorrect sensor levelling
Uncertainty in the sensor levelling adds uncertainty to measurements and calculations.
Performing an accurate levelling of the sensor.
It is a systematic error. When identified, it can be corrected in the data.

Electrical fields in the vicinity of cables
Electrical fields may distort the signal flowing in weakly shielded cables and cause misreadings in the recorded data.
Protecting cables from strong electric fields. Using cables with proper electromagnetic shielding.

Loading of Cables
Mechanical loading on the cables connecting the irradiance sensor to the datalogger may degrade the signal due to piezoelectric effects. Electrical charge is generated internally in response to the applied mechanical stress, which can cause noise in the data, usually observed as unusually high irradiance values.
Protecting the cables from the risk of mechanical loading. Not routing cables through zones where objects are deposited or through passageways where loads are moved.
It may be systematic or occasional in nature. Occasional mechanical loading can be identified by the "spikes" observed in the recorded data.

Surface obstruction
The presence of dust, snow, dew, water droplets, bird droppings, etc. on the sensor cover obstructs the irradiation arriving at the sensor. Weather and ambient conditions may obstruct the path of the radiation, causing misreadings of the current irradiation.
Regularly cleaning the sensor cover, especially at times when there is an increased risk of surface obstruction.

Complete or partial shade-ring misalignment
The misalignment of the shade-ring may allow some beam irradiation to hit the sensor.
Accurate alignment of the shade ring.

Station or equipment shut-down
Whether for maintenance, for repair, for lack of an operator or due to unforeseen circumstances, the equipment or the station may have to shut down, pause or operate while temporarily malfunctioning.
The observation of the measurements' unidentified errors or omissions is needed to identify and correct the cause.
Data imputation may minimize the problems and omissions in the data.

Quality control procedures
The LNEG-LES automatic quality control process starts with the filtering and treatment of measurements that may present anomalies related to incorrect operation of the measuring equipment, data acquisition and data storage (e.g., blank entries, not-a-number entries, etc.). This is followed by a frequency analysis of the data sampling period, where measurement gaps or irregular periods between sequential data points are identified and corrected, without modification of the measurement values. Finally, the coherence of the measured values is checked.
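The sampling-period analysis described above can be illustrated with a minimal sketch. It is not the LNEG-LES implementation: the function name `find_gaps`, the one-minute nominal period and the one-second tolerance are assumptions made for the example. The function only reports gaps and leaves the measured values untouched.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, nominal=timedelta(minutes=1),
              tolerance=timedelta(seconds=1)):
    """Return (start, end, missing) tuples for every interval between
    consecutive timestamps that exceeds the nominal sampling period.

    `nominal` and `tolerance` are illustrative assumptions.
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        step = curr - prev
        if step > nominal + tolerance:
            # Number of samples that should have been recorded in between.
            missing = round(step / nominal) - 1
            gaps.append((prev, curr, missing))
    return gaps

# Example: two one-minute samples are missing between 12:01 and 12:04.
stamps = [datetime(2015, 1, 1, 12, 0), datetime(2015, 1, 1, 12, 1),
          datetime(2015, 1, 1, 12, 4), datetime(2015, 1, 1, 12, 5)]
print(find_gaps(stamps))
```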
Several quality control procedures are available in the literature, e.g.: NREL (1993), CIE (1994), Long and Dutton (2002), Muneer and Fairooz (2002), Younes et al. (2005), Shi and Long (2008), Journée and Bertrand (2011), IEC (2017). In this work, three well-known procedures (CIE, QC Rad and IEC, described in the Nomenclature) were selected for comparison in order to evaluate the best option to implement within the LNEG-LES automatic procedure.
Each methodology considers different thresholds for the validity of the analysis:
• CIE notes that automatic testing should not be performed when the solar elevation is less than 4º (zenith angles larger than 86º) and when the global irradiance Gh is less than 20 W/m2; Younes et al. (2005) increase the solar elevation limit to 7º;
• QC Rad considers null the terms in its equations that calculate the horizontal component of the irradiance for zenith angles larger than 90º and restricts the validity of the tests to air temperatures between 170 K and 350 K;
• IEC does not mention limitations to the application of the method.
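As an illustration of these validity thresholds, the sketch below decides whether a record should be submitted to automatic testing. The function names are hypothetical, and the sketch assumes that either CIE condition alone (low solar elevation or low global irradiance) is enough to skip the tests, which is one reading of the text above.

```python
def cie_applicable(solar_elevation_deg, gh,
                   min_elevation_deg=4.0, min_gh=20.0):
    """True when the CIE tests may be applied to a record.

    Defaults follow the 4 degree / 20 W/m2 limits quoted in the text;
    pass min_elevation_deg=7.0 for the stricter limit of Younes et al.
    Assumption: failing either condition excludes the record.
    """
    return solar_elevation_deg >= min_elevation_deg and gh >= min_gh

def qcrad_applicable(air_temperature_k):
    """True when the air temperature lies in the 170 K to 350 K range
    for which the QC Rad tests are stated to be valid."""
    return 170.0 <= air_temperature_k <= 350.0

print(cie_applicable(10.0, 150.0))                        # sun well above the horizon
print(cie_applicable(5.0, 150.0, min_elevation_deg=7.0))  # excluded by the 7 degree limit
print(qcrad_applicable(288.15))
```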
In general, measurement coherence control is performed in three stages applied in the following order: 1 - evaluation of the measurements' physical feasibility; 2 - evaluation of rare occurrence values; 3 - evaluation of the irradiance measurements' coherence. However, different methodologies may include additional tests.
QC Rad and IEC follow the standard order of steps 1, then 2, then 3. CIE follows a different order: 1, then 3, then 2.
It should be noted that quality control methodologies do not attempt to correct values, but only to signal and/or discard those that are unreliable. If a measurement does not pass a test in a given stage, it is considered not to have passed the quality control and is not evaluated by the following tests/stages.
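The staged logic, in which a record is flagged with the first stage it fails and is not evaluated further, can be sketched as follows. This is only an illustration: the three test functions and all numerical limits in them are placeholders invented for the example, not the limits of any of the three methods.

```python
def run_quality_control(record, stages):
    """Return 0 if the record passes every stage, otherwise the label of
    the first stage it fails. `stages` is an ordered list of
    (label, test) pairs; later stages are skipped after a failure."""
    for label, test in stages:
        if not test(record):
            return label
    return 0

# Placeholder tests (hypothetical limits, for illustration only).
def physically_possible(rec):
    return 0.0 <= rec["gh"] <= 1500.0

def not_extremely_rare(rec):
    return rec["gh"] <= 1200.0

def coherent(rec):
    total = rec["gdh"] + rec["gbh"]
    return total > 0 and abs(rec["gh"] / total - 1.0) <= 0.1

STANDARD_ORDER = [(1, physically_possible), (2, not_extremely_rare), (3, coherent)]
CIE_ORDER = [(1, physically_possible), (3, coherent), (2, not_extremely_rare)]

suspect = {"gh": 1300.0, "gdh": 100.0, "gbh": 900.0}
print(run_quality_control(suspect, STANDARD_ORDER))  # flagged at stage 2
print(run_quality_control(suspect, CIE_ORDER))       # flagged at stage 3
```

The example shows that the same record can be attributed to different stages depending on the order in which the stages are applied, which matters when detections are broken down by stage.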

Physical feasibility
The first stage identifies values in the data that are not physically possible, therefore filtering out large measurement errors. Each of the 3 irradiance variables is tested to check if it is within the limits listed in Table 3.

Extremely rare values
The second stage identifies extremely rare values in the data, further restricting the interval of accepted values. It is physically possible for the irradiance to be outside this interval for very short periods of time or in extremely rare situations. However, it is advisable to filter these values when building representative data sets or for calculations.

CIE --
This step was added to the CIE method by Muneer and Fairooz (2002) and compares the diffuse and global horizontal irradiances to the values of the extreme conditions calculated with Page's models for very clear and heavily overcast skies. For the clear-sky, the same values of the respective series were used.

Coherence evaluation
The three irradiance values under consideration (Gh, Gd,h and Gb,n) are physically related to each other. Thus, the third step of the procedure checks for deviations from the physical relationships between the irradiance measurements (Table 5). In the table, r = Gh / (Gd,h + Gb,h).
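A minimal sketch of the ratio r is given below. It uses the standard geometric relation Gb,h = Gb,n cos(θz), where θz is the solar zenith angle, and sets the beam term to zero for zenith angles above 90º. The function names and the 8% tolerance are assumptions for illustration; the actual acceptance limits of each method are those of Table 5.

```python
import math

def closure_ratio(gh, gdh, gbn, zenith_deg):
    """r = Gh / (Gd,h + Gb,h), with Gb,h = Gb,n * cos(zenith).

    Returns None when the denominator is not positive, in which case
    the ratio is undefined and the test cannot be applied."""
    if zenith_deg < 90.0:
        gbh = gbn * math.cos(math.radians(zenith_deg))
    else:
        gbh = 0.0  # sun below the horizon: no horizontal beam component
    total = gdh + gbh
    if total <= 0.0:
        return None
    return gh / total

def is_coherent(r, tolerance=0.08):
    """Placeholder acceptance rule: |r - 1| within a hypothetical tolerance."""
    return r is not None and abs(r - 1.0) <= tolerance

r = closure_ratio(gh=600.0, gdh=100.0, gbn=1000.0, zenith_deg=60.0)
print(r, is_coherent(r))
```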

Methodology
Directly applying the three methods to measured data does not allow for a proper comparison of the methods' performance, since it is not possible to ascertain whether each method is identifying true errors or not, i.e., it is not possible to compare the number of false positives (data entries that are correct but are identified as errors), false negatives (data entries that have an error but are identified as correct) and correct error identifications for each method.
Thus, to compare the performance of the aforementioned methods, they were applied to synthetic data series (where all entries were correct) modified by the insertion of deliberate errors. Afterwards, the method with the best results was applied to a measured data set consisting of irradiance values measured at the LNEG-LES meteorological station between 2015 and 2018.

Generation of reference data
Two synthetic data sets were generated using the software Meteonorm v7.1 for the LNEG-LES location in Lisbon (38.774º N, 9.178º W, 184 m above mean sea level). Each data set presents hourly irradiance data (global and diffuse in the horizontal plane and beam normal). The first corresponds to a typical meteorological year (TMY) while the second was generated using a clear sky model (CS).
For each data set, synthetic errors were added to 25% of the entries, chosen randomly and uniformly from the whole. These entries were further divided randomly into 4 groups of identical size. The first group is intended for errors affecting the three irradiance variables simultaneously (e.g., sources of errors related to the measurement instant and affecting all irradiance variables at the same time), the second group is intended for errors affecting each irradiance variable at different times (e.g., sources of errors related to the measurement equipment), and the third and fourth groups are intended for errors simultaneously affecting two irradiance variables. Finally, each of these groups was divided into five sets, each corresponding to one of the following factors, 1.5, 2, 5, 10 or 100, multiplied by the original value. The synthetic errors added to a portion of the data in this analysis are therefore relative errors, where the multiplication factor represents the power gain on the final measurement due to additional noise power (e.g., it can be equal to 1 plus the inverse of the signal-to-noise ratio, the ratio of the signal power to the background noise power), thus encompassing multiple different possible sources of errors. Each data set ended up having a total of 40% of entries with at least one error in one of the measured variables. Figure 1 shows, for each data set, an annual plot of the location of the errors affecting the Global Horizontal Irradiance, where the uniform temporal dispersion of those errors is clearly visible.
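The error-insertion procedure can be sketched as below. This is a hedged illustration rather than the code used in this work: the function name `inject_errors`, the dictionary keys, the seed, and the way the affected variable (second group) or pair of variables (third and fourth groups) is chosen at random are all assumptions.

```python
import random

VARIABLES = ("gh", "gdh", "gbn")
FACTORS = (1.5, 2, 5, 10, 100)

def inject_errors(rows, fraction=0.25, seed=0):
    """Return (corrupted_rows, mask); mask[i] is the set of variables
    that were multiplied by an error factor in row i."""
    rng = random.Random(seed)
    corrupted = [dict(row) for row in rows]   # leave the input untouched
    mask = [set() for _ in rows]
    # Entries are drawn uniformly at random, so their order is random too.
    chosen = rng.sample(range(len(rows)), int(fraction * len(rows)))
    for position, index in enumerate(chosen):
        group = position % 4                               # four equal groups
        factor = FACTORS[(position // 4) % len(FACTORS)]   # five factor sets
        if group == 0:
            targets = VARIABLES                     # all three variables
        elif group == 1:
            targets = (rng.choice(VARIABLES),)      # a single variable (assumed random)
        else:
            targets = rng.sample(VARIABLES, 2)      # a pair of variables (assumed random)
        for name in targets:
            corrupted[index][name] *= factor
            mask[index].add(name)
    return corrupted, mask

clean = [{"gh": 100, "gdh": 50, "gbn": 80} for _ in range(400)]
noisy, mask = inject_errors(clean)
print(sum(1 for m in mask if m))  # number of corrupted entries
```

Because the errors are multiplicative, an entry whose original value is zero is left numerically unchanged even when it is selected.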

Radiometric data measurement
The meteorological station operated by LES is equipped with instruments and a data acquisition system to measure all solar radiation components - direct normal irradiance (DNI) and global and diffuse irradiance in the horizontal plane - and ambient dry bulb temperature. The measurements are performed with data acquisition rates of at least one measurement per minute. The existing equipment is listed in Table 6 and is subject to periodic calibration, annual for the pyranometers and biennial for the pyrheliometer.

Comparative analysis
Three methods were selected for the comparison: IEC, QC Rad and CIE with the extra step proposed by Muneer and Fairooz (2002). The application of the three methods to the synthetic data series was analyzed in terms of the number and ratio of true positives (erroneous entries signaled as errors) and false positives (correct entries signaled as errors).
Moreover, the sensitivity and specificity of each method were computed (Shreffler and Huecker 2020). The method sensitivity corresponds to the ratio between the number of true positives and the number of errors in the data set. The method specificity corresponds to the ratio between the number of true negatives and the number of correct entries in the data set. The positive likelihood ratio - the ratio between the method sensitivity and the complement of the specificity - was also computed (Shreffler and Huecker 2020).
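These three indicators follow directly from the counts of true/false positives and negatives. The sketch below is a minimal illustration with a hypothetical function name; it assumes the data contain at least one erroneous and one correct entry, and it returns infinity for the positive likelihood ratio when there are no false positives (specificity equal to 1).

```python
import math

def detection_metrics(has_error, flagged):
    """Sensitivity, specificity and positive likelihood ratio of a
    quality control method, from two equally long boolean sequences:
    has_error[i] - entry i really contains an error;
    flagged[i]   - the method signaled entry i as an error."""
    pairs = list(zip(has_error, flagged))
    tp = sum(1 for e, f in pairs if e and f)
    fn = sum(1 for e, f in pairs if e and not f)
    fp = sum(1 for e, f in pairs if not e and f)
    tn = sum(1 for e, f in pairs if not e and not f)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ = sensitivity / (1 - specificity); 1 - specificity = fp / (tn + fp).
    lr_plus = sensitivity / (fp / (tn + fp)) if fp else math.inf
    return sensitivity, specificity, lr_plus

truth = [True, True, True, True, False, False, False, False, False, False]
flags = [True, True, True, False, True, False, False, False, False, False]
print(detection_metrics(truth, flags))
```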

Comparison of the different methods
Figure 2 shows that all three methods have a high sensitivity, i.e., they detect a significant share of the synthetic errors (true positives). It should be noted that night values have been excluded from the total number. The QC Rad has the most constant performance of the three across all irradiance values, with sensitivities between 75% and 92% for both data sets. The CIE method has a sensitivity around that of the QC Rad for the CS data set. However, for the TMY data set, it presents a higher sensitivity for the diffuse horizontal irradiance, with almost all errors detected, and consequently presents a very high number of total true positives. On the other hand, the IEC method presents a better sensitivity for the beam normal irradiance in the CS data set, resulting in an excellent total sensitivity; however, the same does not happen for the TMY data set. It has a slightly lower sensitivity for the diffuse irradiance errors but a slightly higher one for the beam irradiance.
However, some methods detected a large number of false errors (i.e., false positives). The IEC presents the highest sensitivity for the beam normal irradiance, but it also has the lowest specificity for that variable, with 40% of false positive detections for the TMY data set and 82% for the CS data set (see Figure 3). This is an indication that one of the stages is signaling an excessive number of errors, generating false positives. Figure 4 shows that the IEC method is signaling an excessive number of errors due to violation of the extreme values criteria. When analyzed together, Figures 2 and 4 clearly show that there is a problem with the upper limit criterion set for the beam normal irradiance in the IEC method.
Similarly, the very high sensitivity of the CIE method when testing the diffuse horizontal irradiance for the TMY data set is matched by a very low specificity. This is due mostly to the last stage of the tests, the extremely rare values test.
Perhaps the use of Page's model is not adequate for this location. Naturally, when testing the CS data set, the problem is substantially reduced.
The QCRad, on the other hand, presents an impeccable total specificity, with zero false positives, i.e., a specificity of 1.
The specificity below one for the individual variables is due to the third quality control stage flagging a variable as a result of a true error in one of the other variables. To compare the value of performing each test, the positive likelihood ratio was calculated (see Figure 5). As expected from the observations above, performing the QC Rad quality control on the synthetic data sets yielded the largest value by far, only detecting true errors. The issues with CIE and IEC are clearly reflected in Figure 5: the lowest likelihood ratio for CIE when looking at diffuse irradiance and the lowest likelihood ratio for IEC when looking at beam normal irradiance. In view of these results, the QCRad method, which presents very good sensitivity values with perfect specificity (resulting in the highest positive likelihood ratio), was chosen for quality testing the LES data set with real measurements.

Application to measured data
The test detected around 10% of possible errors in the measured data. Figure 6 shows that the number of errors detected is larger for 2015 and decreases afterwards, being substantially lower for 2017 and 2018.

Conclusions
Three irradiance data quality testing methods - the CIE (1994) / Muneer and Fairooz (2002), the QCRad and the IEC - were tested against two synthetic data sets: a clear-sky year and a typical meteorological year randomly and uniformly infused with errors.
All tests had a sensitivity value from 68% to 98%. The QCRad presented the best performance, with sensitivities between 75% and 92% and maximum specificity. The other methods have a performance close to that, but present problems in some of the testing stages. The CIE method presents a problem when detecting errors for the diffuse horizontal irradiance due to an excess of error detections (false positives) originating from its extreme values criteria.
The IEC method presents a problem when detecting errors for the beam normal irradiance due to an excess of error detections generated by the application of its extreme values criteria.
Thus, it is advisable to review the extreme values criteria for the diffuse horizontal irradiance in the case of the CIE method and for the beam normal irradiance in the case of the IEC method.
The QCRad quality control method was applied to the data measured at LES-LNEG between 2015 and 2018. It proved efficient in detecting irradiance measurement errors matching deficiencies previously identified during the regular operation of the meteorological station. For example, it detected errors due to the shading ring that are consistent with the replacement of that apparatus by a more precise one, the shading sphere, proving able to signal the need to apply a correction factor to the data.
This work is part of the activities of the Infrastructure Project "INIESC -Infraestrutura Nacional de Investigação em Energia Solar de Concentração" (ALT20-03-0145-FEDER-022113), supported by national funds through FCT/MCTES (PIDDAC) and co-funded by the European Regional Development Fund through the Alentejo and Lisbon Regional Operational Programmes.

Nomenclature
• CIE: The first 2 levels of tests proposed by the International Commission on Illumination, CIE (1994), as described by Younes et al. (2005), plus an extra step proposed by Muneer and Fairooz (2002);
• QC Rad: (V2.0) by Long and Dutton (2002), used by the Baseline Surface Radiation Network (BSRN), the National Surface Radiation Budget Network for Atmospheric Research (SURFRAD), and others (Shi and Long 2008);
• IEC: the methodology for the creation of annual solar radiation data series presented in IEC TS 62862-1-2:2017 Solar thermal electric plants - Part 1-2: General - Creation of annual solar radiation data set for solar thermal electric (STE) plant simulation.

Fig. 1: Annual plots of the errors' location in the reference data sets. Left is the CS data and right is the TMY data (green line: civil dawn; blue line: civil twilight).

Fig. 2: Plot of the sensitivity of the tests on the two data sets: left is the CS, and right is the TMY. Results for the total of the entries with at least one error in one of the variables, and for each variable.

Fig. 3: Plot of the specificity of the tests on the two data sets: left is the CS, and right is the TMY. Results for the total of the entries with all true values detected in the variables, and for each variable.

Fig. 4: Annual plot for the quality control of the direct normal irradiance for the IEC method: left is the CS, and right is the TMY. The numbers correspond to the error detection stage, where 0 is no detection. The green line represents the civil dawn and the blue line the civil twilight.

Fig. 5: Plot of the positive likelihood ratio of the tests on the two data sets: left is the CS, and right is the TMY. Results for the total of the entries with at least one error in one of the variables, and for each variable.

Fig. 6: Plot of the percentage of errors detected in the data for different years.

Figure 7 reveals the time location of the flagged measurements and the test stage that flagged each of them. There are some stage 1 errors signaled for very small angles, but most errors are due to the coherence test. The yearly difference is likely due to a modification to the shading of the pyranometer used to measure diffuse horizontal irradiance, and signals the need to apply a shade-ring correction factor to the 2015 and 2016 data.

Fig. 7: Annual plot of the quality control results for the LES-LNEG global horizontal irradiance data. From the upper left corner to the lower right corner: 2015, 2016, 2017, 2018. The numbers correspond to the error detection stage, where 0 is no detection. The green line represents the civil dawn and the blue line the civil twilight.