EVENT - BigDataDay for San Diego, SoCal - SAT, 31 Jan

Data / analytics enthusiasts, colleagues, entrepreneurs,

We’re planning a “BigDataDay” for San Diego, SoCal: an all-day event on Saturday, Jan 31st (8:30 AM to 4 PM), free to attendees. (“We” = several companies, related meetup groups, SD IEEE, security groups, etc.)

+++  At this point we’re issuing a ‘call for speakers’ for the basic agenda / topic list below.


The event objective is to synchronize and harmonize “Big Data / Analytics” efforts in SD and put networking and collaboration into action. We will cover education, technical aspects, new products, and industry capabilities in an open forum = more DATA mojo for all!

The intent is to have a couple of keynote speakers (AM and lunch), along with a panel recap at the end. The AM keynote is IBM’s World Wide Director for Big Data, Public Sector; the lunch speaker is in the works. There will be three tracks during the day to appeal to a wider audience (added topic details provided below):

(1)  Technical = the mechanics of data tools, processes and methods, etc

(2)  Data science = Predictive analytics, statistical modeling, building for the cloud, etc

(3)  Applications = where to get started with big data projects, and what methods / tools are in play now.

(Target audience will be business leaders, start-ups – focus on success stories, notional use cases, and similar topics)

We’ll put out more information, a registration link, updated agenda, etc later. Just keep the 31st open for now.


REQUEST – For now, if you have an interest / specialty in these areas and want to speak, let either Jesse or Mike know asap. Preferably show an initial interest (speaker placeholder) by the end of this year, as we will finalize the agenda by noon on 16 Jan.


Thanks in advance for helping San Diego stand out for ‘DOING’ big data / predictive analytics, versus continuing to admire the problem (complexity, privacy, etc). (BTW – sponsors are welcome too. As the event is free, we need to fund refreshments, lunch, etc.)

Best regards,







Suggested topics / capability areas:

—Technical


1) Big Data 101 – Hadoop, NoSQL, Analytic Appliances, & Streaming – market survey, what to use when, and why (use cases will tie in here). Ideally we want an unbiased person, such as a computer science professor or industry analyst, to present with input from industry, so that everyone gets fair representation (maybe done as a panel).

2) Cloud & Big Data Stores – Discuss the different use cases for Hadoop as a data store versus just an analytic platform. Talk about how to do DBaaS and engineer systems for mobile, enterprise features vs Open Source features, and what databases are best for developers. One particular area to focus on would be the management capabilities and functionality of Apache and several commercial distributions. One key area often overlooked is the cost and complexity of managing a Hadoop cluster of 10+ nodes.

3) NoSQL Technologies – Overview and discussion of which NoSQL databases to use for different use cases/patterns. There are a number of different databases currently on the market (doc stores, key value stores, CouchDB vs MongoDB, etc). Our thinking is to start with an overview of what is on the market, what each does well, and where the gaps are in different types of DBs.

4) Others?:  Data architectures, Aggregation Options, Pig: The Prequel to SQL, etc…


—Data Science

1) Streaming Analytics – Developers Session. Best practices for developing analytics, analytic portability, design principles, etc. Companies frequently do sessions like this as a lab where folks can bring laptops and take freely available data sources to build sample training applications.

2) Analytic Data Warehouses – an overview of these is worth covering (Teradata, Netezza, and Exa are the big ones). They have been part of the market for a long time, but a number of folks recently talk about their “role” (or lack thereof) in a big data world (i.e., Hadoop), which misses the very important technical discussion of how these technologies can work in tandem with Hadoop deployments.

3) Building for the Cloud – when deploying software products, or even analytic appliances, there is a knowledge gap in how to engineer platforms such as Amazon Web Services to support the workload. This session would focus on the implications of deploying in the cloud.

4)  Others?   predictive analytics, parallel algorithms, statistical modeling, algorithms for data mining, Starting Small with Big (Data) Dreams, Data Science: Methods & Tools, etc

—Applications



1) Data Asset Discovery – leveraging technologies, such as Hadoop, to explore and analyze corporate (or command) data resources in order to understand and identify data assets available in the enterprise, use cases, and liability of data on a network. This is also very useful as a precursor to enterprise document management and creating defensible destruction, retention, and security policies.   (related note – This is a tangential topic to the data security aspects of Privacy by Design (PbD) – as it is one application for companies to explore and understand where unprotected or unknown data lies in their network.)

2) Cyber Security Analytics – using data-in-motion analytics to ingest and store voluminous amounts of data in order to provide real-time indications and warning (I&W) and allow for quicker action in the event of a cyber attack or data breach. This type of use case leverages a champion-challenger analytics model: a predictive alerting model runs on the streaming data, while the analyzed data is forwarded to storage in order to “challenge” that model. There is also a corollary track in data science, as not many such models are widely available in industry and this is only being done by “Type A” customers in the Defense & Intel communities. (This is also a tangential topic to PbD.)

3) Sensor Analytics – integrating, fusing, and modeling external sensors (to include social media) to inform decision making and augment the facts known about entities (customers, opposing forces, employees, etc). The goal is to create and continually enhance a virtual profile of each entity in order to launch targeted marketing campaigns, inform military decision making, etc. (We will likely pick 2 areas of focus: customer engagement & employee engagement.) (This use case is the most relevant for PbD, especially regarding customers or data collection for Intel purposes.)

4) Others?  Privacy by Design (PbD) / data security basics and overview, data start-ups / incubators, novel products, data for IoT and wearable devices, etc.