In many shops, some of the most common queries run against large-scale RDBMSs such as Oracle are pattern searches within ranges of criteria, typically targeted searches that users run to meet specific business needs. Standardized reports or simple relational queries can answer these questions, but such mechanisms can be inflexible and costly to maintain. A more efficient way to address these challenges is through the power of Solr.

Getting the Data

After installing the Lucid Imagination distribution of Solr onto a standalone server outside of the production complex, the next step was configuring Solr so that we could get the data we needed. A few decisions were made at this point. The first decision was about the data itself. I decided to target many of the existing information structures within the application, which had already been simplified to meet other business reporting needs. Using these structures would also make configuration easier later on. The second decision was whether to store the data values in the index itself. While ideally the data would have been accessed from the production database instance, I decided instead to store the data within the index, both for easier retrieval and to reduce the queries against the production database instance. The final decision was how much of the data could be safely retrieved via the DataImportHandler and stored within Solr. This turned out to be pretty simple. The Oracle constructs held only a week's worth of data, per an agreement with the business users. I would start with that amount and from there determine how much more could be held within the Solr instance.

Searching with Solr

The data, once imported, was not very large: only 50GB overall. That size could be managed by adjusting the field types, whether data had to be stored or not, and the amount of historical information to be imported. Now that the data was available, searches could be executed against it.

I also found the packaged Schema Browser very handy. Admittedly, the Schema Browser has to process all the fields in the index, so with a lot of data this can take a while. The benefit, however, is that it can answer some of the more common questions that could be asked, such as: the number of documents per value, which helps for groups of items such as types of orders; how many documents actually have parent accounts; how many orders are provided by the various sending systems; how many orders are for a given state or postal code; etc. The data can also yield additional insights from more advanced searches such as faceted searches, for example: which postal codes are responding to which advertising or product promotions; which areas have the most activity for certain types of orders; or how many domains are covered per type of account. And the list goes on.

Operationally speaking, the Solr instances were managed in one of two ways: periodic updates from the main production instances, or continual updates in which application code not only added data to the Oracle database but inserted it into the Solr index as well. Hence the operations against the existing production instances could be managed to minimize impact and eliminate any unnecessary processing.

Conclusion

With these new capabilities, answers to key questions can be found in seconds. Data can be mined quickly, efficiently and flexibly without a lot of specialized training for business users. Additionally, the indexes could be managed so that additional data could be added to increase the scope of analysis, or so that subsets of data could be indexed and searched for specific business reasons such as service outages or legal matters.
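
The "number of documents per value" questions described under Searching with Solr can also be asked directly with a facet query, rather than through the Schema Browser. The following is a minimal sketch, not the configuration used in this project: the base URL, the `order_type` field name, and the sample response are assumptions made for illustration. It builds a select URL using Solr's standard `facet` and `facet.field` parameters and converts the flat value/count list that Solr returns in its default JSON format into a dictionary.

```python
from urllib.parse import urlencode


def build_facet_url(base_url, field, query="*:*", limit=10):
    """Build a Solr select URL that returns document counts per value of `field`."""
    params = [
        ("q", query),
        ("rows", 0),             # only the counts are needed, not the documents
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),
        ("facet.mincount", 1),   # skip values that match no documents
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def facet_counts(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical response fragment, shaped like Solr's default JSON facet output.
sample = {
    "facet_counts": {
        "facet_fields": {"order_type": ["NEW", 120, "CHANGE", 45, "CANCEL", 7]}
    }
}

# Assumed base URL and field name; adjust both to the actual instance and schema.
url = build_facet_url("http://localhost:8983/solr", "order_type")
counts = facet_counts(sample, "order_type")
```

Setting `rows` to 0 keeps the response small, since only the counts matter here. The same request with a different `facet.field` (a state or postal code field, say) would cover the other per-value questions listed above.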