# The IPO Problem

## Description:

The finance literature has long documented the existence of abnormal first-day trading returns in initial public offerings (IPOs). The difference between the offer price and the closing price on the first trading day generally represents a substantial price gain that cannot be easily justified. This initial return is widely regarded as money left on the table, since the shares could have been sold at a higher price and, therefore, raised more funds for the same stake in the company. Given the size of the IPO market, a great deal of academic work has been devoted to studying the nature of this phenomenon [RW02].


The problem we tackle is forecasting the initial return for a set of IPOs using a set of independent variables identified through a literature review. Most of them are related to the structure of the IPO. Forecasting the initial return of IPOs is very difficult, since the specific set of relevant variables is yet to be identified. In addition, the researcher faces the further challenge of dealing with the rather noisy nature of the target variable, which is common in empirical finance.

## Instances and best known solutions for those instances:

Most of the research deals with the identification of explanatory variables using linear regression models whose R² generally lies in the range 0.15-0.20. Little effort has been devoted to actually forecasting the initial return.

The provided data covers 1,007 companies taken public between 1996 and 1999 in the US. The sample includes AMEX, NASDAQ and NYSE IPOs and excludes ADRs, closed-end funds, financial institutions and unit offerings. It consists of the following fields:

Infr_aj: Dependent variable. It measures the percentage difference between the offer price and the first-trading-day close, adjusted for the market return.

Lsize: Natural log of the proceeds raised in the IPO (in dollars).

Retained: Number of shares sold divided by the pre-offering number of shares.

Price: Final offering price.

LowP: Lower end of the price range offered to potential investors during the roadshow.

HighP: Higher end of the price range offered to potential investors during the roadshow.

RanW: Difference between the higher and lower ends of the price range as a percentage of the lower end.

RanHan: This variable, suggested by Hanley [Han93], is the absolute value of the difference between the final offer price and the midpoint of the price range, as a percentage of that midpoint.

Employees: Number of employees at the time of the flotation.

SIC: Primary four digit Standard Industrial Classification code.

Techdummy: Binary variable that equals one if the primary SIC code fits the definition of a technology company, and zero otherwise.

Prestige: Binary variable whose value equals one if the main financial advisor was prestigious and zero otherwise. Financial advisors were classified according to the methodology suggested by Balvers et al. [BMM88]: a financial institution is labeled as prestigious if it was consistently ranked in the top 25 of the annual lead-manager rankings published by Institutional Investor Magazine during the years of study.
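The derived price-range variables can be computed directly from the raw fields. The following sketch follows the definitions above; the example values (a $16 offer price against a $12-$14 filing range) are made up for illustration.

```python
# Sketch: computing the derived price-range variables from the raw fields.
# Field names follow the dataset description; the example values are hypothetical.

def range_width(low_p: float, high_p: float) -> float:
    """RanW: width of the filing price range as a fraction of its lower end."""
    return (high_p - low_p) / low_p

def range_adjustment(price: float, low_p: float, high_p: float) -> float:
    """RanHan (Hanley): |offer price - range midpoint| / midpoint."""
    mid = (low_p + high_p) / 2.0
    return abs(price - mid) / mid

# Hypothetical IPO priced at $16 with a $12-$14 filing range.
print(range_width(12.0, 14.0))            # (14 - 12) / 12
print(range_adjustment(16.0, 12.0, 14.0)) # |16 - 13| / 13
```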

In [QLI05] we use a subset of the variables: Infr_aj, Lsize, Retained, Price, Techdummy, Prestige, and Range (defined as (HighP-LowP)/LowP). With these variables we compare standard regression methods with an evolutionary rule-based system (RBS). Model performance is assessed with the Normalized Mean Square Error (NMSE). The sample used in this analysis consists of 840 patterns selected at random for training, leaving the rest as a validation set. The regression methods are OLS (ordinary least squares) and LTS (least trimmed squares). LTS discards noisy data: the trimming constant determines how many points out of the initial 840-pattern training set are used to fit the regression. The rule-based system offers predictions for 90% of the validation set. Given that the prediction error of a regression can be estimated, we use this information to discard the predictions that are likely to be the worst, so it is fair to compare the RBS with each regression predicting only the best 90% of the validation set. The results reported in the table below show that the RBS approach outperforms the linear models.

| Model | Trimming Constant | NMSE Test | NMSE 90% Test |
|-------|-------------------|-----------|---------------|
| OLS   | -                 | 0.92302   | 0.77238       |
| LTS   | 424               | 0.98268   | 0.87663       |
| LTS   | 600               | 1.03793   | 0.96238       |
| LTS   | 800               | 0.94482   | 0.88741       |
| RBS   | -                 | -         | 0.43267       |
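The evaluation setup above can be sketched in a few lines. This is a minimal illustration, assuming NMSE means the mean squared error divided by the variance of the target (the papers do not spell out the normalization), with synthetic data standing in for the IPO sample and a crude refit-after-trimming step standing in for a true LTS estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the IPO data: 840 training / 167 validation patterns
# with a weak linear signal plus heavy noise (the real data is far noisier).
n_train, n_test, n_vars = 840, 167, 6
X = rng.normal(size=(n_train + n_test, n_vars))
y = X @ rng.normal(size=n_vars) * 0.3 + rng.normal(size=n_train + n_test)

X_tr, y_tr = X[:n_train], y[:n_train]
X_te, y_te = X[n_train:], y[n_train:]

def ols_fit(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def nmse(y_true, y_pred):
    """Normalized mean square error: MSE over the target variance (assumed)."""
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

# Plain OLS on all 840 training patterns.
beta = ols_fit(X_tr, y_tr)
print("OLS NMSE:", nmse(y_te, predict(beta, X_te)))

# A crude trimmed fit in the spirit of LTS with trimming constant h = 800:
# fit OLS, keep the h points with the smallest squared residuals, refit.
h = 800
resid = (y_tr - predict(beta, X_tr)) ** 2
keep = np.argsort(resid)[:h]
beta_trim = ols_fit(X_tr[keep], y_tr[keep])
print("Trimmed NMSE:", nmse(y_te, predict(beta_trim, X_te)))
```

On real IPO data both fits would hover around the NMSE values in the table, since the target is mostly noise.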

In another work (to be published), we use the same subset of variables: Infr_aj, Lsize, Retained, Price, Techdummy, Prestige, and Range (defined as (HighP-LowP)/LowP). We use these variables to compare the performance of a simple linear regression model with the combination of the same regression and a novel Constructive Induction (CI) method based on Genetic Programming (GP) [Koz92]. The CI method uses GP to project the data into a new data space in which the data behaves more linearly, improving the prediction accuracy of simple linear regression models. In some problems the generated data space is also smaller than the original one; in those cases, the dimensionality of the problem (the number of variables) is reduced.

The sample was split into a training set of 800 instances and a test set of 200. The quality of the regression is measured by the Normalized Mean Square Error (NMSE). In one of the experiments, our method obtained a regression NMSE of 0.73396. In [QLI05], an NMSE of 0.92302 was reported as the best result achieved with classic linear regression. Even when 20 outliers are removed from the dataset, classic linear regression only obtains an NMSE of 0.77238, which is still a larger error than that obtained by our method. Thus, our method achieves interesting improvements in the performance of a simple linear regression while reducing the number of variables of the problem.
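The idea behind the CI method, projecting data into a space where a linear model fits better, can be illustrated without GP at all. The sketch below is not the actual GP-based method: a hand-picked nonlinear feature (the product of two inputs) stands in for a GP-evolved projection, on toy data whose target is deliberately nonlinear in the original variables.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data with a multiplicative relationship that a plain linear fit misses.
X = rng.uniform(-2, 2, size=(1000, 2))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=1000)

# Same 800/200 train/test split as in the experiments.
X_tr, y_tr, X_te, y_te = X[:800], y[:800], X[800:], y[800:]

def nmse(y_true, y_pred):
    """MSE divided by the target variance (assumed NMSE definition)."""
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

def linreg_nmse(F_tr, F_te):
    """Fit linear regression on training features, return test NMSE."""
    A_tr = np.column_stack([np.ones(len(F_tr)), F_tr])
    A_te = np.column_stack([np.ones(len(F_te)), F_te])
    beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return nmse(y_te, A_te @ beta)

# Linear regression in the original two-variable space: near-useless here.
print("original space:", linreg_nmse(X_tr, X_te))

# Projection x1*x2, standing in for a GP-evolved feature: one variable
# instead of two, and the target is now linear in it.
print("projected space:", linreg_nmse(X_tr[:, :1] * X_tr[:, 1:],
                                      X_te[:, :1] * X_te[:, 1:]))
```

The projected space both improves the fit and halves the dimensionality, which is the effect the CI method aims for; the real method searches for such projections automatically.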

## Related Papers:

[BMM88] R.J. Balvers, B. McDonald and R.E. Miller, “Underpricing of New Issues and the Choice of Auditor as a Signal of Investment Banker Reputation”, Accounting Review, Vol. 63, 4, pp. 605-622 (1988)

[Han93] K.W. Hanley, “The Underpricing of Initial Public Offerings and the Partial Adjustment Phenomenon”, Journal of Financial Economics, Vol. 34, 2, pp. 231-250 (1993)

[RW02] J.R. Ritter and I. Welch, “A Review of IPO Activity, Pricing and Allocations”, Journal of Finance, Vol. 57, 4, pp. 1795-1828 (2002)

[QLI05] D. Quintana, C. Luque, P. Isasi, “Evolutionary Rule-Based System for IPO Underpricing Prediction”, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2005), Vol. 1, pp. 983-989 (2005)

[Koz92] J.R. Koza, “Genetic Programming: On the Programming of Computers by Means of Natural Selection”, MIT Press, Cambridge, MA (1992)

Last Updated: 21/10/04